(Comments)

Original link: https://news.ycombinator.com/item?id=38471822

These comments indicate that grepping with `rg` is considered superior to traditional grep because of its higher performance, especially when processing large amounts of text or large datasets. In addition, tools such as `ag`, `awk`, `sed`, and `fgrep` offer advanced features, which adds complexity and depth to these utilities. However, `git grep` on its own does not recurse through checked-out and untracked trees, ignores gitignore files, and is limited to shallow searches within the current repository, which may restrict its usefulness compared with full-featured tools like `rg` or `ack`. Ultimately, the choice between these tools comes down to personal preference, required features, and individual usage patterns.

Original text
Ripgrep is faster than grep, ag, Git grep, ucg, pt, sift (2016) (burntsushi.net)
362 points by subset 21 hours ago | 178 comments

It's fast indeed. And I can't help promoting the combination with fzf :) For those who want to try it out, this is a PowerShell function, but the same principle applies in any shell. It runs ripgrep, then puts fuzzy searching over the resulting files+text on top, while showing context in bat:

  function frg {
    $result = rg --ignore-case --color=always --line-number --no-heading @Args |
      fzf --ansi `
          --color 'hl:-1:underline,hl+:-1:underline:reverse' `
          --delimiter ':' `
          --preview "bat --color=always {1} --theme='Solarized (light)' --highlight-line {2}" `
          --preview-window 'up,60%,border-bottom,+{2}+3/3,~3'
    if ($result) {
      & ($env:EDITOR).Trim("`"'") $result.Split(': ')[0]
    }
  }
There are other ways to approach this, but for me this is a very fast way of nailing down 'I know something exists in this multi-repo project but don't know where exactly nor the exact name'

edit: this comes out of https://github.com/junegunn/fzf/blob/master/ADVANCED.md and even though you might not want to use most of what is in there, it's still worth glancing over it to get ideas of what you could do with it



In fact, I would recommend going a step further and integrating ripgrep-all (rga) with fzf, which can do a fuzzy search not just on text files but on all types of files, including PDFs and zip files. More details here [1]

[1] https://github.com/phiresky/ripgrep-all/wiki/fzf-Integration



That's really nice, thanks. Long ago there was Google Desktop Search, where you could 'Google' your local documents. But the difference is that that worked with an index, so I imagine it's faster if you have thousands of PDFs and EPUBs.


Even longer ago, there was `glimpse`: https://www.linuxjournal.com/article/1164 which is still available. [1] glimpse's index-builds are like 10X slower than `qgrep` mentioned elsethread. `qgrep` also seems to have faster search (though I only tried a few patterns) and `qgrep` does not allow spelling errors like `glimpse`.

Neither `glimpse` nor `qgrep`, to my knowledge, directly supports pre-processing / document conversion (like `pdftotext`), though I imagine this would be easy to add to either, replicating Desktop Search. (Indirectly, at some space cost, you could always dump conversions into a shadow file hierarchy, index that, and then translate path names.)

[1] https://manpages.ubuntu.com/manpages/focal/man1/glimpse.1.ht...
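
If you wanted to try that shadow-hierarchy idea, here's a minimal bash sketch (all paths are hypothetical; assumes `pdftotext` from poppler is installed):

    src="$HOME/docs"; shadow="$HOME/.cache/docs-text"
    # mirror every PDF as a .txt file under the shadow tree
    find "$src" -name '*.pdf' -print0 |
      while IFS= read -r -d '' f; do
        out="$shadow/${f#"$src"/}.txt"
        mkdir -p "$(dirname "$out")" && pdftotext "$f" "$out"
      done
    # search (or index) the shadow tree, then map paths back by stripping $shadow
    rg 'some pattern' "$shadow"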



I wrote a bash version of this:

  function frg {
    result=`rg --ignore-case --color=always --line-number --no-heading "$@" |
      fzf --ansi \
          --color 'hl:-1:underline,hl+:-1:underline:reverse' \
          --delimiter ':' \
          --preview "bat --color=always {1} --theme='Solarized (light)' --highlight-line {2}" \
          --preview-window 'up,60%,border-bottom,+{2}+3/3,~3'`
    file="${result%%:*}"
    linenumber=`echo "${result}" | cut -d: -f2`
    if [ ! -z "$file" ]; then
            $EDITOR +"${linenumber}" "$file"
    fi
  }


I wrote a zsh version of this:

  function frg {
      result=$(rg --ignore-case --color=always --line-number --no-heading "$@" |
        fzf --ansi \
            --color 'hl:-1:underline,hl+:-1:underline:reverse' \
            --delimiter ':' \
            --preview "bat --color=always {1} --theme='Solarized (light)' --highlight-line {2}" \
            --preview-window 'up,60%,border-bottom,+{2}+3/3,~3')
      file=${result%%:*}
      linenumber=$(echo "${result}" | cut -d: -f2)
      if [[ -n "$file" ]]; then
              $EDITOR +"${linenumber}" "$file"
      fi
    }


I wrote a fish version, and simplified it:

    function frg --description "rg tui built with fzf and bat"
        rg --ignore-case --color=always --line-number --no-heading "$argv" |
            fzf --ansi \
                --color 'hl:-1:underline,hl+:-1:underline:reverse' \
                --delimiter ':' \
                --preview "bat --color=always {1} --theme='Solarized (light)' --highlight-line {2}" \
                --preview-window 'up,60%,border-bottom,+{2}+3/3,~3' \
                --bind "enter:become($EDITOR +{2} {1})"
    end
Still not a fan of the string-based injections based on the colon and newline characters, but all versions suffer from it. (also: nice that fzf does the right thing and prevents space and quote injection by default).


I love this, thank you! If anyone else wants to open the file in VScode, the command is

    code -g "$file:$linenumber"


Awesome. Thanks. Saved me some time. I haven't used the fzf integration like this.


I've never really seen PowerShell beyond minimal commands, but after seeing the parent, I definitely think it has the superior syntax of the shells. Especially for scripts.


I expected to like PowerShell when I began working somewhere with a lot of Windows (after decades of mostly Linux). On paper it sounds like it has learned many important lessons that Unix shells could have learned but (at least the popular ones) didn't: it's been given a blank canvas, the principles it's working toward make sense, it has good people behind it. So I even undertook to write a modest new piece of glue code in PowerShell; after all, if it had been on Linux I'd definitely consider the Bourne shell as well as Python for the work...

Then I tried it and I strongly dislike it. The syntax is clunky, it's really no better than popular Unix shells at being a "real" programming language, and yet it's not as good as they are at being just a shell either.

It also just doesn't feel like a quality product. On my work Windows laptop, Powershell will sometimes not quite bother flushing after it starts, so I get the banner text and then... I have to hit "enter" to get it to finish up and write a prompt. The provided JSON parser has some arbitrary limits... which vary from one version to another. So code which worked fine on machine #1 just silently doesn't work on machine #2, since the JSON parsers were changed and nobody apparently thought that was worth calling out. If you told me this was the beta of Microsoft's new product I'd be excited but feel I needed to provide lots of feedback. Knowing this is the finished product, I am underwhelmed.



I find the built-in commands rough. "curl https://jrock.us" to see if my website is up used to involve opening Internet Explorer to accept some sort of agreement. Now it just flashes the terminal, moves the cursor to the far right hand side of the screen, and blinks for a while. I like the Linux version of curl better...


Ironically, Windows 10+ comes with real curl installed; to use it, type curl.exe instead


I had no idea!

As it turns out, the reason that "curl ..." doesn't work is because it pops up a window below all of my other windows saying that certificate revocation information is unavailable, and would I like to proceed. After that it does download my web page!



> It also just doesn't feel like a quality product. On my work Windows laptop, Powershell will sometimes not quite bother flushing after it starts, so I get the banner text and then..

That’s independent of the shell, and is I believe a bug in the terminal emulator. There is an open source Windows Terminal you can separately install and that is so much better.



Nope, windows terminal definitely does this too. Just last week I was trying to install WSL, and thought it had frozen at 20% and was trying to figure out what went wrong ... turns out it had already booted but powershell had stopped flushing output.


> The syntax is clunky, it's really no better than popular Unix shells at being a "real" programming language...

YMMV (and obviously does). I think that powershell is night and day better than bash (etc) as a programming language.



Maybe so for the "popular" shells, but http://www.nushell.sh/ is looking better and better


nu acknowledged powershell as one of its inspirations, yeah!!


I use it near exclusively. Love nu.


A lot of people sleep on PowerShell, possibly because some of the syntax is a little clunky (and quite slow compared to some other shells, I will freely admit). That being said, I'd argue object oriented programming is a massive improvement over text oriented programming. I never want to touch awk again!


Unlike most Microsoft things it was not constrained by back compatibility.

I generally don't like MS software, but their commitment to back compatibility is worth calling out.



PowerShell is the worst of all worlds. It's a terrible shell compared to bash/zsh/whateversh, and for anything complex enough to need a long script you're far better off in Python.


The only thing making it less than a total win is its handling of piped errors and `set -e`. The programming model itself is far superior to 'stringly-typed' sh.


1. Thanks for this, instantly added to my dotfiles

2. You nerd-sniped me into getting rid of the unnecessary `cut` process :)

  file=${result%%:*}
  line=${result#*:}
  line=${line%%:*}


Thanks! I have tried using bash substitution to solve it but failed (I just learned the difference between "#" and "##").


FYI, these substitutions are POSIX:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...

Also a couple mnemonic hints:

% vs #. On US keyboards, # is shift-3 and is to the left of % which is shift-5. So # matches on the start (left) and % matches on the end (right).

# vs ## (or % vs %%). Doubling the character makes the match greedy. It's twice as wide so it needs to eat more.

Bash also supports ${parameter/pattern/string} and ${parameter//pattern/string} (and a bunch others besides) which are not POSIX:

https://www.gnu.org/software/bash/manual/html_node/Shell-Par...
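
A quick demo of the four forms (plain POSIX shell, using the rg-style `file:line:text` records from upthread):

    result='src/main.c:42:some matched text'

    # '#' trims the shortest matching prefix, '##' the longest
    echo "${result#*:}"     # -> 42:some matched text
    echo "${result##*:}"    # -> some matched text

    # '%' trims the shortest matching suffix, '%%' the longest
    echo "${result%:*}"     # -> src/main.c:42
    echo "${result%%:*}"    # -> src/main.c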



Thanks for the great information!


Jesus, bash really is an abomination


Wow, nice, thanks! :heart:


great stuff


Vim is almost broken for me without fzf+rg. Feels like I’m manually grinding coffee instead of using electricity.


This comment got me to get out my French press and manually grind some beans. It wasn't as meditative and calming as I remember, and the coffee tastes a little...dusty. I guess it's time for me to update my vimrc.


Aeropress is a direct upgrade from french press and uses way less coffee


I hadn't heard of this before. Thanks!


Add one or a few drops of water to your roasted coffee beans and shake well after weighing them out, to stop the grounds from sticking to the walls of your grinder due to static.


I love that a ripgrep article has such a deeply nerdy coffee thread…

> Add one or a few drops of water to your roasted coffee beans

Ah, RDT (Ross Droplet Technique)[0].

A little atomizer (“spritz” bottle) of plain water serves well here. NB: this is for single-dose grinding - e.g. measuring a small amount of beans loaded into a grinder to grind immediately. If you have a grinder with a “big” hopper on top that holds (e.g.) the week’s worth of coffee (even though you grind on-demand for ea. espresso/french press/aeropress/pourover/drip/…) this isn’t for you.

[0] https://thebasicbarista.com/en-us/blogs/topics/how-rdt-broke...



This thread took a turn


I use my Aeropress every day but I wouldn't say it's "better" than French press, it's just different. Using less coffee but finer ground changes the characteristics of the brew quite a bit (probably some technical reason about extraction level or something).


I’m glad my analogy sprouted another branch in this conversation :)


Which integration for ripgrep do you use with Vim?


Can't speak for OP, but I use telescope for neovim and I don't think I could use (neo)vim without it.


Telescope is cool, but last I checked it was neovim only or recommended and I’m a regular-Vim holdout.


I modified some functions from: https://github.com/junegunn/fzf.vim

And added my keyboard shortcuts.



With fzf you can add lots of files to git while skipping some if you want:

    fza = "!git ls-files -m -o --exclude-standard | fzf -m --print0 | xargs -0 git add"
With that in the [alias] section of a gitconfig file, running git fza brings up a list of modified and not yet added files, space toggles each entry and moves to the next entry.

That alias as well as fzf+fd really speed up some parts of my workflow.

Oh and shameless plug for my guide on what to include in your zsh setup on macOS: https://gist.github.com/aclarknexient/0ffcb98aa262c585c49d4b...



Add the preview to see what you're actually stashing:

   git ls-files -m -o --exclude-standard | fzf -m --print0 --preview "git diff {1}" | ....
And that's just the start: by binding a key to stage the selected file and another to fzf's reload action (to refresh the list in its finder), you could turn this into an interactive git staging tool.
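
A rough, untested sketch of that idea (the key choice is arbitrary):

    # ctrl-s stages the highlighted file and reloads the list,
    # so files drop out as you stage them
    git ls-files -m -o --exclude-standard |
      fzf --preview 'git diff --color=always -- {}' \
          --bind 'ctrl-s:execute-silent(git add -- {})+reload(git ls-files -m -o --exclude-standard)'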


Nice. I'll have to try that out!


That blew my mind. I've used fzf a couple time here and there, but now I Get It. Thanks!


This is pretty much my exact use of ripgrep, too. I use it as a starting point to zero in on files/projects in a several-hundred repo codebase, and then go from there....


Not bad. For those who want to try it, install all prereqs with: choco install fzf bat ripgrep

How do you scroll the preview window with the keyboard?



> How do you scroll the preview window with the keyboard?

    alias pf="fzf --preview='less {}' --bind shift-up:preview-page-up,shift-down:preview-page-down"
That will let you run `pf` to preview files in less and lets you use shift + arrow keys to scroll the preview window. No dependencies are needed except for fzf. If you want to use ripgrep with fzf you can set FZF_DEFAULT_COMMAND to run rg such as `export FZF_DEFAULT_COMMAND="rg ..."` where ... are your preferred rg flags. This full setup is in my dotfiles at https://github.com/nickjj/dotfiles.

I've made a video and blog post about it here: https://nickjanetakis.com/blog/customize-fzf-ctrl-t-binding-...

I also made https://nickjanetakis.com/blog/using-fzf-to-preview-text-fil... which covers how to modify fzf's built in CTRL+t shortcut to allow for previews too. CTRL+t is a hotkey driven way to fuzzy match a list of files.



shift-up/down


Oh. Thanks for the tip. This might make me finally embrace powershell. I’ve been using WSL+zsh+fzf as a Windows CLI for continuity with day job Mac tools, but git CLI performance is only usable inside the WSL file system.


You can also add a small script to your WSL under `/usr/local/bin/git`:

  GIT_WINDOWS="/mnt/c/Program Files/Git/bin/git.exe"
  GIT_LINUX="/usr/bin/git"
  
  case "$(pwd -P)" in
  /mnt/?/*)
    case "$@" in
    # Needed to fix prompt, but it breaks things like paging, colours, etc
    rev-parse*)
      # running linux git for rev-parse seems faster, even without translating paths
      exec "$GIT_LINUX" "$@"
      ;;
    *)
      exec "$GIT_WINDOWS" -c color.ui=always "$@"
      ;;
    esac
    ;;
  *)
    exec "$GIT_LINUX" "$@"
    ;;
  esac

This allows you to use `git` in your WSL shell but it'll pick whichever executable is suitable for the filesystem that the repo is in :)
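
To put it in place (a sketch; `git-wrapper.sh` is just whatever file you saved the script as, and this assumes /usr/local/bin precedes /usr/bin in your PATH):

    sudo install -m 755 git-wrapper.sh /usr/local/bin/git
    hash -r     # drop the shell's cached location for `git`
    type git    # should now report /usr/local/bin/git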


Thank you. I will use this.


The code as written above only works if you haven't changed the mountpoint for your Windows partition (i.e. from /mnt), so keep that in mind


> This might make me finally embrace powershell

Yeah, I have a bit of a love-hate relationship with it. But I actually have that with all shells out there. I don't know if it's just me or the shells, or (the most likely I think): a bit of both. But PS is available out of the box and using objects vs plain text is a major win in my book, and even though I still don't know half of the syntax by heart it feels less of an endless fight than other shells. And since I use the shell itself for rather basic things and for the rest only for tools (like shown here), we get along just fine.



Thanks for this!


This is a gem; thank you.


I use it in neovim with fzf search.


I use ripgrep with the Emacs packages project.el (comes out of the box) and dumb-jump (needs to be installed). This may not be the most popular way of using rg but I have been very pleased with the overall experience. All it takes is running package-install to install the dumb-jump package and configuring the following hook:

  (add-hook 'xref-backend-functions #'dumb-jump-xref-activate)
The Xref key sequences and commands work fine with it. If I type M-. (or C-u M-.) to find definitions of an identifier in a Python project, dumb-jump runs a command like the following, processes the results, and displays the results in an Xref buffer.

  rg --color never --no-heading --line-number -U --pcre2 --type py '\s*\bfoo\s*=[^=\n]+|def\s*foo\b\s*\(|class\s*foo\b\s*\(?' /path/to/git/project/
The above command shows how dumb-jump automatically restricts the search to the current file type within the current project directory. If no project directory is found, it defaults to the home directory.

By the way, dumb-jump supports the silver searcher tool ag too which happens to be quite fast as well. If neither ag nor rg is found, it defaults to grep which as one would expect can be quite slow while searching the whole home directory.



Addendum to my comment above:

Ripgrep can be used quite easily with the project.el package too that comes out of the box in Emacs. So it is not really necessary to install an external package to make use of ripgrep within Emacs. We first need to configure xref-search-program to ripgrep as shown below, otherwise it defaults to grep which can be quite slow on large directories:

  (setq xref-search-program 'ripgrep)
Then a project search with C-x p g foo RET ends up executing a command like the following on the current project directory:

  rg -i --null -nH --no-heading --no-messages -g '!*/' -e foo
The results are displayed in an Xref buffer again which in my opinion is the best thing about using external search tools within Emacs. The Xref key sequences like n (next match), p (previous match), RET (jump to source of match), C-o (show the source of the match in a split window), etc. make navigating the results a breeze!


Author of ripgrep here.

Looking at your regex---just by inspection, I haven't tried it, so I could be wrong---but I think you can drop the --pcre2 flag. I also think you can drop the second and third \b assertion. You might need the first one though.



The example I have posted in my comment is not a command I am typing myself. The dumb-jump package generates this command for us automatically. It is possible to customize the command it generates though. Indeed while running ripgrep manually, I do not use the --pcre2 option. Thank you for developing and maintaining this excellent tool!


Oooo gotcha! That makes more sense. Thanks for the clarification.


Deadgrep (uses ripgrep and evil-collection has a binding) takes me to my happy place -

https://github.com/Wilfred/deadgrep



This is a good option, but I still use rg.el for the occasions when I want to search several projects at once or a subfolder within a project, where I would otherwise use ‘rgrep’


What's interesting is that ripgrep now also powers VS Code search with a Node.js wrapper.

https://www.npmjs.com/package/@vscode/ripgrep



I've always wondered how in the world search in VS Code is so fast given it's an Electron app - now I know.


...which is awesome if you can request/install VS Code but not ripgrep.

You can find the rg binary in the VS Code installation (at least, I can on Windows at my place of employment).



Hello, thank you for pointing this out. I hate how slow grep is on Windows :( and I cannot install rg (I have no choice in the OS at work)


It's not new; it has been in VS Code for 7 years.


I did not know this; that's actually very interesting.


I've been using ripgrep for about 2 years now and I find it indispensable. The main reason I switched from grep was ease of use. From the README: "By default, ripgrep will respect gitignore rules and automatically skip hidden files/directories and binary files." Typing `rg search_term directory` is much better than the corresponding grep command, but the speed improvement is also a nice bonus.

Another helpful flag I use often is -M, for when some matches are way too long to read through and cause a lot of terminal chaos. Just add `-M 1000` (or adjust the number for your needs) and the really long matches will omit the text context in the results.
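
For a rough comparison (the grep flags only approximate ripgrep's defaults; plain grep has no gitignore support):

    # ripgrep defaults: respects .gitignore, skips hidden and binary files
    rg search_term directory

    # an approximate GNU grep equivalent
    grep -rI --exclude-dir=.git search_term directory

    # suppress the text of lines longer than 1000 columns
    rg -M 1000 search_term directory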



Yeah, the -M flag is wonderful (super handy for ignoring minified files that you don't want to see results from, etc), and the -g flag is also great (e.g. `-g *.cs` and you'll just search in files that have the .cs extension).

Also the fact that it is a standalone portable executable can be super handy. Often when working on a new machine, I'll drop in the executable and an alias for grep that points to rg, so if muscle memory kicks in and I type grep it will still use rg.
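
For instance (the alias is exactly that kind of shim; quoting the glob keeps the shell from expanding it):

    # muscle-memory shim
    alias grep='rg'

    # include or exclude files by glob
    rg -g '*.cs' pattern
    rg -g '!*.min.js' pattern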



If you're a fan of the -g flag to ripgrep then I also recommend checking out the -t flag, short for --type, which lets you search specific file types. You can see the full list with `rg --type-list`. For example, you could just search .cs files with `rg -tcs`.

This flag is especially convenient if you want to search e.g. .yml and .yaml in one go, or .c and .h in one go, etc.
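
A couple of concrete examples (type names taken from `rg --type-list`):

    # .yaml and .yml in one go
    rg -tyaml 'some_key'

    # .c and .h together
    rg -tc 'my_func'

    # check which globs a type covers
    rg --type-list | rg '^yaml'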



Thanks, I didn't know about `-t`, I'll read up on it.


-t is useful, but -g doesn't require a lookup in the help first. Maybe the worse-is-better principle?


Tbh I've always just typed -t followed by something that feels intuitive and it's always worked. Never really bothered looking in help until I made the above comment.


And this may still be true in 2023, but the problem is that most of the parallelized grep replacements (e.g. ripgrep, ag, etc.) are SO much faster than grep that the much smaller speed differences between them don't provide much of a basis for differentiating them.

I use ag (typically from inside Emacs) on a 900k LOC codebase and it is effectively instantaneous (on a 16 core Ryzen Threadripper 2950X). I just don't have a need to go from less than 1 second to "a bit less than less than 1 second".

Speed is not the defining attribute of the "new greps" - they need to be assessed and compared in other ways.



In 2016, I'd say speed was definitely a defining attribute. ag has very significant performance cliffs. You can see them in the blog post.

But as I mentioned in my comparison to qgrep elsewhere in the thread, everyone has different workloads. And for some workloads, perf differences might not matter. It really just depends. 900 KLOC isn't that big, and indeed, for simple queries pretty much any non-naive grep is going to chew through it very very quickly.

As for comparisons in other ways, at least for ag, it's on life support. I thought it was going to get removed from Debian, but it looks like someone rescued it: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=999962

The blog post also compares Unicode support, and contextualizes its performance. ag essentially has zero Unicode support. Unicode support isn't universally applicable of course---you may not care about it---but it satisfies your non-perf comparison criteria. :-)



The title needs “(2016)”. This is the original announcement, not new information.


Discussions:

"Ripgrep – A new command line search tool" https://news.ycombinator.com/item?id=12564442 (740 points | Sept 23, 2016 | 209 comments) - there are discussions related to speed too

"Ripgrep is faster (2016)" https://news.ycombinator.com/item?id=17941319 (98 points | Sept 8, 2018 | 40 comments)



Well... it is not faster than qgrep :) The two work very differently, and even though qgrep is based on RE2, its speed comes from the presence of an index. But then I wonder why people forget the qgrep option, since with large file stores it makes much more sense to use qgrep AND indices, rather than always going through all the files.

this above all true UNLESS you need multi-line matches with UTF8, where ripgrep is not so fast, because it needs to fall back to the other PCRE2 lib



Author of ripgrep here.

Yes, qgrep uses indexing, which will always give it a leg up over other tools that don't use indexing. But of course, now you need to set up and maintain an index. The UX isn't quite as simple as "just run a search."

But there isn't much of a mystery here. Someone might neglect to use qgrep for exactly the same reason that "grep is fast enough for me" might prevent someone from using ripgrep. And indeed, "grep is fast enough" is very much true in some non-trivial fraction of cases. There are many many searches in which you won't be able to perceive the speed difference between ripgrep and grep, if any exists. And, analogously, the difference between qgrep and ripgrep. The cases I'm thinking of tend to be small haystacks. If you have only a small thing to search, then perhaps even the speed of a "naive" grep is fast enough.

So if ripgrep, say, completes a search of the Linux kernel in under 100ms, is that annoying enough to push you towards a different kind of tool that uses indexing? Maybe, depends on what you're doing. But probably not for standard interactive usage.

This is my interpretation anyway of your wonderment of (in your words) "why people forget the qgrep option." YMMV.

I have flirted with the idea of adding indexing to ripgrep: https://github.com/BurntSushi/ripgrep/issues/1497

> this above all true UNLESS you need multi-line matches with UTF8, where ripgrep is not so fast, because it needs to fall back to the other PCRE2 lib

That's not true. Multiline searches certainly do not require PCRE2. I don't know what you mean by "with UTF8," but the default regex engine has Unicode support.

PCRE2 is a fully optional dependency of ripgrep. You can build ripgrep without PCRE2 and it will still have multiline search support.



Does `build.rs` build the project? One of my favorite (Big Corp) code-bases just had a single C file (build.c) that did all the dependency tracking, like Make, but in some nicely written (easy to understand) C code. The C file started with a shebang: a self-building-and-executing line, so we'd do this:

    ./build.c

... and then magic happened.



No. Cargo does. The `build.rs` is basically a Cargo hook that gets compiled as a Rust program and executed just before the ripgrep binary is compiled. It lets you do things like set linker flags[1] so that you can embed an XML manifest into the binary on Windows to enable "long path support."

ripgrep's build.rs used to do more, like build shell completions and the man page. But that's now part of ripgrep proper. e.g., `rg --generate man` writes roff to stdout.

[1]: https://github.com/BurntSushi/ripgrep/blob/2a4dba3fbfef944c5...

[2]: https://github.com/BurntSushi/ripgrep/blob/2a4dba3fbfef944c5...



I searched in Portage, and it seems there is another version that also works with other document types like PDFs and .doc files.

https://github.com/phiresky/ripgrep-all



What are the reasons for grep not being replaced/improved? This topic seems a bit old by now.


There's like a whole host of things you could use to explain it. Inertia. Compatibility. Resistance to change. Innovator's dilemma. And so on. (I do not say any of these things pejoratively! All of those things apply to me too.)

With respect to compatibility, see my FAQ on the topic: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...



For the same reason the 40yo chair I currently sit in is not being replaced with Razer UltraSeat XR3000-A. It's comfortable, fits the workplace around it, and there's no reason for getting a replacement and rebuilding everything. (Partially because a Razer-like chair already stands nearby taking care of my clothes, but that's where the analogy ends.)


grep is a general purpose tool for searching for text in all types of files, baked into the standards for UNIX. Some programmers use it to search source code. Other people use it for other types of text searches that have nothing to do with source code, they rely on it in scripts, they don't use it as part of a text-based programmer UI, they rely on it to never crash, etc.

ripgrep is a specialist, opinionated tool, designed primarily to search through source code repositories.

There's not much you can add to general purpose text search to make it faster; you can make it use mmap() at the risk of it crashing on truncated files, you can reduce the expressiveness of regular expressions so they can be computed faster. You could throw out general support for all locales and charsets and hardcode support for only UTF-8 / UTF-16, but you shouldn't.



> There's not much you can add to general purpose text search to make it faster

Oh I beg to differ! The blog post goes into this. Here's a simple demonstration using ripgrep 14:

    $ ls -l full.txt
    -rw-rw-r-- 1 andrew users 13113340782 Sep 29 12:30 full.txt

    $ time rg -c --no-mmap 'Clipton' full.txt
    294

    real    1.419
    user    0.539
    sys     0.879
    maxmem  15 MB
    faults  0

    $ time LC_ALL=C grep -c 'Clipton' full.txt
    294

    real    6.911
    user    6.078
    sys     0.829
    maxmem  15 MB
    faults  0

    $ time rg -c --no-mmap 'DMZ|Clipton' full.txt
    1070

    real    1.643
    user    0.747
    sys     0.894
    maxmem  15 MB
    faults  0

    $ time LC_ALL=C grep -E -c 'DMZ|Clipton' full.txt
    1070

    real    8.317
    user    7.384
    sys     0.930
    maxmem  15 MB
    faults  0
No memory maps. No multi-threading. No filtering. No fancy regex engine features or reducing expressiveness. No locales. No UTF-8. No UTF-16. Just a simple literal and a simple alternation of literals. It's just better algorithms.

Also, you can disable ripgrep's opinions with `-uuu`. It's not designed to just be for code searching. You can use it for normal grepping too. It will even automatically revert to the standard grep line format in shell pipelines.
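
Each `u` peels away one layer of filtering:

    rg -u   PATTERN   # don't respect .gitignore
    rg -uu  PATTERN   # ... and search hidden files/directories
    rg -uuu PATTERN   # ... and search binary files (roughly `grep -r`)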



> you can make it use mmap() at the risk of it crashing on truncated files

I was under the impression that grep removed mmap() support because it was slower than normal file I/O



I talked about this in the OP. Memory maps are sometimes a little faster. See:

    $ ls -l full.txt
    -rw-rw-r-- 1 andrew users 13113340782 Sep 29 12:30 full.txt

    $ time rg -c --no-mmap Clipton full.txt
    294

    real    1.337
    user    0.470
    sys     0.866
    maxmem  15 MB
    faults  0

    $ time rg -c --mmap Clipton full.txt
    294

    real    1.045
    user    0.722
    sys     0.323
    maxmem  12511 MB
    faults  0
But in recursive search, especially when used for lots of little files, they end up provoking substantial overhead that slows everything down.

And this might change depending on the platform.



There are multiple alternatives you can already use, like ripgrep. What are you proposing, switching out the command `grep` for another utility?

Sounds like that could introduce a ton of breakage, for little value. People who want a faster grep will use a different thing, while people who use grep can continue to use it. Sounds like an ideal situation already.



Another reason would be if you want to follow POSIX standards. For example, `GNU grep` supports `POSIXLY_CORRECT` environment variable.


It still wastes the time of all the people on the road to that realization, including the time spent looking for a better alternative


Someone designed unix based on the idea that some system functions are both core OS functions AND tools for human use. That leads to some bizarre outcomes decades later like "there must be a program called xyz that accepts these arguments and works exactly like this".


These benchmarking results are seven years old, so perhaps it has been.

My entirely anecdotal and unscientific impression is that rg and grep perform similarly on Linux (though rg has nicer defaults for searching through source code). The old version of grep that Apple preinstalls on the Mac was slower last time I checked though.



Lack of interest maybe? I don't use grep, I use a UI that lets me click and jump to a file. Or the builtin search in my IDE.


Honestly: I rely on my shell scripts working and any "grep replacement" has to work with all the old crusty shell scripts out there, likely including ones that use odd "quirks" and GNU options.

If you want to innovate in this space, why sign up for all that? Invent a better wheel, and if people like it, they'll migrate over time.

I remember using ag in the old days, and I use rg now. But there's things rg does by default that I don't like at times... so I go back to old fashioned grep.

rg is at the point where many programmers use it. I think it is on its way to becoming one of those "standard tools". It needs... another 5 years?

When POSIX has a rg standard... we'll know ripgrep "succeeded" and teargrep will soon come into existence ;)



Oh my heavens, I would never let ripgrep into POSIX. You can pry it out of my cold dead hands. :-)

> so I go back to old fashioned grep

If you do `rg -uuu` then it should search the same stuff grep will. Not sure if that's what you meant though.



I guess it is because after decades of use, grep has probably been fixed to handle lots of use cases that the new tools don't handle because they haven't found them yet.


Author of ripgrep here.

Like automatic encoding detection and transparently searching UTF-16?

Or simple ways for composing character classes, e.g., `[\pL&&\p{Greek}]` for all codepoints in the Greek script that are letters. Another favorite of mine is `\P{ascii}`, which will search for any codepoint that isn't in the ASCII subset.

Or more sophisticated filtering features that let you automatically respect things like gitignore rules.

Those are all things that ripgrep does that grep does not. So I do not favor this explanation personally.

ripgrep has just about all of the functionality that GNU grep does. I would say the two biggest missing pieces at this point are:

* POSIX locale support. (But this might be a feature[1].)

* Support for "basic" regexes or some equivalent that flips the escaping rules around. i.e., you need to write `\+` to match 1 or more things, whereas `+` will just match `+` literally.

Otherwise, ripgrep has unfortunately grown just about as many flags as GNU grep.

[1]: https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...
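
To make the escaping difference concrete (GNU grep defaults to BRE; ripgrep's syntax is ERE-like):

    # BRE: + is literal, \+ means "one or more" (a GNU extension)
    echo 'aaa' | grep 'a\+'    # matches
    echo 'a+'  | grep 'a+'     # matches the literal "a+"

    # ripgrep: the rules are flipped
    echo 'aaa' | rg 'a+'       # matches
    echo 'a+'  | rg 'a\+'      # matches the literal "a+"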



Complete guess: it works just fine for 99.9999% of users, but there's a greater than 0.0001% chance that it would break compatibility or have a bug. Anyone who would need the performance gain would know about specialized alternatives.


Using Ripgrep via Consult [1] in Emacs is bliss. It's like the rg+fzf thing that some have made, but all inside Emacs. I use the `consult-ripgrep` command all the time, and sometimes I use it to make project-wide edits too! Workflow is search with `consult-ripgrep` -> export results to buffer -> edit buffer -> commit edits back to files. Details at [2] (includes video of me working it)

[1]: https://github.com/minad/consult#grep-and-find [2]: https://lambdaland.org/posts/2023-05-31_warp_factor_refactor...



One thing I wish ripgrep had is support for AND conditions.


If you don't mind using PCRE, you can do it. For example:

    rg -P '(?=.*pat1)(?=.*pat2)(?=.*pat3)'
You could create a shell function shortcut if you need to use it often. But yeah, having it as a feature of the tool itself would be nice.
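
A hypothetical wrapper along those lines (the name and structure are made up):

    rgand() {
      local p pattern=''
      for p in "$@"; do pattern="${pattern}(?=.*${p})"; done
      rg -P "$pattern"
    }

    # usage: lines containing all three terms, in any order
    rgand foo bar baz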


Maybe I'm missing something but I only use it with AND conditions (usually in the form of 'foo 'bar and it only matches lines with foo AND bar both present)


A search for `rg -e foo -e bar` will return lines that match either foo or bar. Some lines may have both, but it isn't required.

The standard way to run "AND" queries is through shell pipelines. That is, `rg foo | rg bar` will only print lines containing both. But composition usually comes with costs. The output reverts to the standard grep format and it doesn't interact nicely with contextual options like -C/--context.

See: https://github.com/BurntSushi/ripgrep/issues/875



As I have overloaded my rg with a customized rg alias, I can't pipe multiple rg calls.

Otherwise it would look like this:

    # rg nokogiri | rg linux
    
    :11:Gemfile.lock:647:  nokogiri (1.15.5-x86_64-linux)
But that is a me problem.

The workaround is of course just to pipe into grep instead.



Or `\rg`, which will use the command directly and skip your alias.


Oh, look at that. Nice.

Still losing the coloring but you can't have everything.



That’s one of the reasons I made this, actually: https://github.com/boyter/cs

I wanted AND boolean syntax mixed with fzf instant search. It’s not as fast as ripgrep of course, but it’s not solving the same problem.



ripgrep is easily one of my most used and most loved tools. I use it directly and also have it set as my grepprg in neovim.


Grep is already pretty much instant so what does it matter?


I love grep, but ripgrep really improves on it.

Very often, I don't want to look at files that aren't tracked under Git VC, and I'm not looking for matches in binary files, and that's how ripgrep behaves by default, which can cut time by 99%. I used to grep in small dirs; now I can ripgrep in my whole home. Not that I do it, but I can. That + Sourcegraph on the master branch, and it makes searching for anything other than plain text feel sooo slow (Atlassian Confluence and Jira, Google Docs, etc.).

Thank you so much Burntsushi and contributors!



Did you try it on a big dataset? There is a huge difference actually


I've been using ripgrep for the last year to quickly search massive database dumps. I compared it with grep and it's a game changer.


I messed up my conda environments once when I changed my username. So many references to the old username in the conda folders. Ripgrep saved the day!


Why not just make new ones? Regardless of whether I use Conda, Pyenv or virtualenv, I always consider the environments to be disposable.


It takes forever (like an hour) to install the data science stack plus pytorch and cuda libraries plus other hardware related libraries.


fastmod is another nice and user-friendly Rust tool for such cases.


I love ripgrep, it's searched a directory in a half-second for a pattern that took GNU grep literally fifteen minutes.


Would be nice if ripgrep was drop in compatible with grep. I'd feel like a dick writing a shell script for other people to use and forcing them to install a new grep


Never will be: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...

If you need drop-in compatibility with grep, then use grep. :-)



Semi off-topic, I've coded a ncurses-frontend to navigate and filter grep-like results which might be of interest to some of you: https://github.com/gquere/ngp2


Why not just use :grep in vim and navigate the vim quick fix list?


I love ripgrep for the speed and the more sane defaults. I use it nearly every day.

For those just using it to search through a codebase, don't forget -F for string literals.



But is any of them using `_mm256_sad_epu8` for small, literal strings?


Not exactly but yes, it ultimately uses the `memchr` crate [1] which provides SIMD-optimized character and string search routines. But it uses `_mm256_cmpeq_epi8` instead of `_mm256_sad_epu8`.

[1] https://docs.rs/memchr/latest/memchr/



And also the SIMD in aho-corasick, which is used whenever a small number of literals are searched for. For example, `foo|bar` or `(?i)foo`.

https://github.com/BurntSushi/aho-corasick/blob/f227162f7c56...

But no `_mm256_sad_epu8`. What an oddly specific question..?



What is really oddly specific is this instruction :p but I believe it is a common trick to quickly scan for short string matches.

It computes `sum(|x[i] - y[i]|)` for consecutive `i` at different offsets, so it should be zero at substring matches.

For context: https://epubs.siam.org/doi/pdf/10.1137/1.9781611972931.10

I was slightly mistaken, the instruction of interest is _mm256_mpsadbw_epu8



Oh I see. Yes, that's what is commonly used in academic publications. But I've yet to see it used in the wild.

I mentioned exactly that paper (I believe) in my write-up on Teddy: https://github.com/BurntSushi/aho-corasick/tree/master/src/p...



I think Wojciech Muła, who devised the original SIMD-oriented Rabin-Karp algorithm, also measured MPSADBW approaches and found that they are not a good fit for general string-in-string searches [1]. Maybe not today though.

[1] http://0x80.pl/articles/simd-strfind.html#id7



_mm256_:(_epu8


I've been using ag forever - will check out ripgrep.


I switched from ripgrep to ugrep and never looked back. It's just as fast, but also comes with fuzzy matching (which is super useful), a TUI (useful for code reviews), and can also search in PDFs, archives, etc.

The optional Google search syntax is also very convenient.

https://ugrep.com
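
A few examples of those features, if I remember the flags right (check `ugrep --help` to be sure):

    # fuzzy match allowing up to 2 errors
    ugrep -Z2 'pattern' src/

    # interactive TUI
    ugrep -Q

    # search inside compressed files and archives
    ugrep -z 'pattern' corpus.zip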



I’m a die-hard ripgrep fan, but just recently found ugrep looking for one feature that ripgrep lacks: searching in zip archives (without decompressing them to disk).

Ugrep has that. In my case, I’m working with zipped corpora of millions of small text files, so I can skip unpacking the whole thing to the filesystem (certain filesystems have trouble at this scale).

I’m grateful for both tools. Thanks to the respective authors!



That's where ripgrep-all comes into play, which will grep through archives, PDFs, ebooks, documents, etc.


I'm scared that if I start using Google search syntax in my grepping that I'll mostly get results trying to sell me something :)


So I was casually searching for "ugrep vs ripgrep" articles when I stumbled upon a couple of reddit posts where the authors of ugrep and ripgrep apparently have a multi-year feud, eg. https://www.reddit.com/r/programming/comments/120wqvr/ripgre...

So weird. I mean, it's just about some open source tool, right? :-/



I came across ugrep recently and I immediately recognized the organization as one that I had dealt with starting about 15 years ago. The author is brilliant‡, but extremely prickly (sometimes even to paying customers). The author of ripgrep, on the other hand, has always seemed like someone who just wants to get on with the business of writing software that people use.

‡ The main commercial product of the ugrep author's company at the time was the gSOAP code generator (it may still be), and that it not only works but makes a reasonably good C and C++ API from WSDL is proof that it is the product of a genius madman. It also allowed you to create both the API and WSDL from a C++-ish header, and both .NET and Java WSDL tools worked perfectly with it. We needed it to work and work it did.

At the time, the generated API was just difficult enough to use that I generated another ~1k lines of code for that project. IIRC, the generated API is sort of handle-based, which requires a slightly different approach than the strict RAII approach we were using. Generating that code was a minor adventure (generating the gSOAP code from the header-ish file, generating doxygen XML from the generated gSOAP code, then generating the wrapper C++ from the doxygen XML).



Psst, don't tell him about emacs and vi(m)...

There are feuds about open source tools all the time. Text editors, Linux distros, shells, programming languages, desktop environments, etc... And ugrep vs ripgrep may be a poster child for C++ vs Rust.

It is not all bad, it drives progress, and it usually stays at a technical level. I've yet to see people killing each other over their choice of command line search tool.



This is so weird; even ripgrep's author is actively seeking conflict in ugrep's new release posts. Not a good colour on either of them.


Any examples? All I see in the recent release post is:

> ugrep is easily one of the if not most featureful grep programs in existence. And it is also fast.

which is burntsushi, ripgrep's author, defending ugrep from someone saying they only focus on performance at the cost of features.



At least how I read it, the linked post was ugrep's author seeking conflict, not ripgrep's.


How is it "seeking conflict" to correct factually wrong claims about your project?


Is the TUI better than just sending the results through fzf? For me the configurability and flexibility of fzf would be hard to compete with.


It's also a drop-in replacement for grep as it supports the same flags and regular expression syntax.


Thanks for mentioning this.

I think the killer feature is compatibility with existing grep command line switches. Not needing to learn a whole new set of options is quite nice.



Feels like it should be installed by default alongside grep by now.


git grep cannot even find a simple string in its repo.


What? Git grep is all you ever need in my experience, and it's ~faster than~ (edit: as fast as) ripgrep when searching a git repo.


> it's faster than ripgrep when searching a git repo.

Source? Ripgrep's benchmarks show it significantly faster.



See sibling comment


I really doubt git grep can outperform ripgrep in any tests... please provide some proof.


I tested this on a large repo in 2016 when I installed several tools (including rg and ag) to compare speed. I don't have the metrics anymore, but the results were pretty clear then. According to the benchmarks from the OP, git grep is pretty comparable to rg in a large git repo. I guess different benchmarks give slightly different results, but the OP acknowledges that git grep is very fast. Bonus is that it comes preinstalled with git and can search through commit history.


Author of ripgrep here.

It really just depends. The way I like to characterize `git grep` (at present) is that it has sharp performance cliffs. ripgrep has them too, to be sure, but I think it has fewer of them.

If you're just searching for a simple literal, `git grep` is decently fast:

    $ git remote -v
    origin  [email protected]:torvalds/linux (fetch)
    origin  [email protected]:torvalds/linux (push)

    $ git rev-parse HEAD
    f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6

    $ time LC_ALL=en_US.UTF-8 git grep -c -E 'PM_RESUME'
    Documentation/dev-tools/sparse.rst:3
    Documentation/translations/zh_CN/dev-tools/sparse.rst:3
    Documentation/translations/zh_TW/sparse.txt:3
    arch/arm/mach-omap2/omap-secure.h:1
    arch/arm/mach-omap2/pm33xx-core.c:1
    arch/x86/kernel/apm_32.c:1
    drivers/input/mouse/cyapa.h:1
    drivers/mtd/maps/pcmciamtd.c:1
    drivers/net/wireless/intersil/hostap/hostap_cs.c:1
    drivers/net/wwan/t7xx/t7xx_pci.c:15
    drivers/net/wwan/t7xx/t7xx_reg.h:7
    drivers/usb/mtu3/mtu3_hw_regs.h:1
    include/uapi/linux/apm_bios.h:1

    real    0.215
    user    0.421
    sys     1.226
    maxmem  161 MB
    faults  0

    $ time rg -c 'PM_RESUME'
    drivers/mtd/maps/pcmciamtd.c:1
    drivers/net/wwan/t7xx/t7xx_reg.h:7
    drivers/net/wwan/t7xx/t7xx_pci.c:15
    drivers/net/wireless/intersil/hostap/hostap_cs.c:1
    drivers/usb/mtu3/mtu3_hw_regs.h:1
    drivers/input/mouse/cyapa.h:1
    arch/x86/kernel/apm_32.c:1
    Documentation/translations/zh_CN/dev-tools/sparse.rst:3
    Documentation/translations/zh_TW/sparse.txt:3
    Documentation/dev-tools/sparse.rst:3
    arch/arm/mach-omap2/pm33xx-core.c:1
    arch/arm/mach-omap2/omap-secure.h:1
    include/uapi/linux/apm_bios.h:1

    real    0.078
    user    0.259
    sys     0.577
    maxmem  15 MB
    faults  0
But if you switch it up and start adding regex things to your pattern, there can be substantial slowdowns:

    $ time LC_ALL=C git grep -c -E '\w{5,}\s+PM_RESUME'
    Documentation/dev-tools/sparse.rst:1
    Documentation/translations/zh_CN/dev-tools/sparse.rst:1
    Documentation/translations/zh_TW/sparse.txt:1

    real    5.704
    user    55.671
    sys     0.585
    maxmem  207 MB
    faults  0

    $ time LC_ALL=en_US.UTF-8 git grep -c -E '\w{5,}\s+PM_RESUME'
    Documentation/dev-tools/sparse.rst:1
    Documentation/translations/zh_CN/dev-tools/sparse.rst:1
    Documentation/translations/zh_TW/sparse.txt:1

    real    24.529
    user    4:34.42
    sys     0.753
    maxmem  211 MB
    faults  0

    $ time LC_ALL=en_US.UTF-8 git grep -c -P '\w{5,}\s+PM_RESUME'
    Documentation/dev-tools/sparse.rst:1
    Documentation/translations/zh_CN/dev-tools/sparse.rst:1
    Documentation/translations/zh_TW/sparse.txt:1

    real    1.372
    user    16.980
    sys     0.647
    maxmem  211 MB
    faults  1

    $ time rg -c '\w{5,}\s+PM_RESUME'
    Documentation/translations/zh_CN/dev-tools/sparse.rst:1
    Documentation/dev-tools/sparse.rst:1
    Documentation/translations/zh_TW/sparse.txt:1

    real    0.082
    user    0.226
    sys     0.612
    maxmem  18 MB
    faults  0
In the above cases, ripgrep has Unicode enabled. (It's enabled by default irrespective of locale settings. ripgrep doesn't interact with POSIX locales at all.)


Thanks for clarifying! I use `git grep -IPn --color=always --recurse-submodules` many times a day, every day. It hasn't let me down yet, but I don't search for Unicode when working on source code. I do use regex though, using the -P switch.


git grep is shallow: only the current git repo. rg is fully recursive: all submodules and also untracked (and not ignored) directories.

In some trees, git grep will be a lot faster because it searches a smaller part of them.



No, git grep can recurse into submodules if you pass the flag, just like all other git commands: --recurse-submodules


I don’t think there’s been a point to using `git grep` since ack started parsing gitignore. As far as I’m concerned the use case of `git grep` is to search into non-checked-out trees (by giving it a tree-ish). And it’s not super great at that, because it searches a static tree-ish, so pickaxe filters are generally more useful (though they’re slow).

Once again mercurial has/had more useful defaults: `hg grep` searches through the history by default, that's its job.



Small clarification: ack did not and does not respect your gitignore files. I just tried it myself, and indeed it doesn't. And this is consistent with the feature chart maintained by the author of ack: https://beyondgrep.com/feature-comparison/

One practical result of this is that it will mean `ack` will be quite slow when searching typical checkouts of Node.js or Rust projects, because it won't automatically ignore the `node_modules` or `target` directories. In both cases, those directories can become enormous.

`ack` will ignore things like `.git` by default though.

I believe `ag` was the first widely used grep-like tool that attempted to respect your .gitignore files automatically. (Besides, of course, `git grep`. But `git grep` behaves a little differently. It only searches what is tracked in the repo, and that may or may not be in sync with the rules in your gitignores.)



[flagged]



So? My distro doesn't come with almost any of the tools I need for day to day work. It has never been a problem for me to install a new editor or compiler on a new machine, I don't see why ripgrep would be any different. Especially since it's usually a single command to install anyway.


I can't think of a single time I've used grep where I thought "I wish this was faster".


Even if the answer is instant, you have a 50% performance improvement in your search just from typing "rg" instead of "grep"!

From my perspective it's a no brainer. I don't HAVE a grep (because I don't have a Unix) so when I install a grep, any grep, reaching for rg is natural. It's modern and maintained. I have no scripts anywhere that might expect grep to be called "grep".

Of course if you already have a grep (e.g. you run Unix/Linux) then the story is different. Your system probably already has a grep. Replacing it takes effort and that effort needs to have some return.



Well, a cmd script for msys64 grep in my \CmdTools is named `gr`. It feels more natural, because index-then-middle finger also does. Thinking of it, I actually hate starting anything with a middle finger (no pun). Also learning new things that do the same thing as the old one.


Even faster, I have an alias 'ss' (mnemonic for 'super search') for rg. Fitts' Law to the max!


What do you use a single "s" for?


git status --untracked-files=all

sn is 'git status --untracked-files=no'.



I am amused by this comment, because it shows a dramatically different type of thinking. I have probably have thought "I wish this was faster" for nearly everything I do on a computer :)


Multi-GB log files. Even with LC_ALL=C, grep is painfully slow.


That's probably true - but good Lord, one should probably do something to reduce the size of log files that large.


A few years ago I worked on a Solaris box that would lock the whole machine up whenever I grepped through the log files. Like it wouldn't just be slow, the web server that was running on it would literally stop serving requests while it was grepping.

I never worked out how that could be happening.



My best guess is your grep search was saturating I/O bandwidth, which slowed everything else to a crawl.

Another possibility is that your grep search was hogging up your system's memory. That might make it swap. On my systems which do not have swap enabled but do have overcommit enabled, I experience out-of-memory conditions as my system essentially freezing for some period of time until Linux's OOM-killer kicks in and kills the offending process.

I would say the first is more likely than the second. In order for grep to hog up memory, you need to be searching some pretty specific kinds of files. A simple log file probably won't do it. But... a big binary file? Sure:

    grep -a burntsushi /proc/self/pagemap
Don't try that one at home kids. You've been warned. (ripgrep should suffer the same fate.)

(There are other reasons for a system to lock up, but the above two are the ones that are pretty common for me. Well, in the past anyway. Now my machines have oodles of RAM and lots of I/O bandwidth.)



On extremely slow systems, such as Windows. There I can search in multiple repos only with rg.
