#8: Advanced usage of grep


grep is probably one of the best-known, or rather one of the most used, command-line tools on UNIX systems. It's often used for things like

ps aux | grep someprocess

or something similar. That's very basic usage, but grep is so much more than just a very simple text search command.

First of all, let me make clear that grep is not only built to work with streams but also with files. As stated in Article #4: Cut your use of cat of this Advent series, there is a lot of meaningless use of grep in combination with cat and pipes. That usage is pointless. So if you work on files, don't use pipes, just write this:

grep expr file

where expr is your search expression and file is your file name. So much for the introduction; let's move on to the more important points.
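To make the cat point concrete, here is a minimal sketch. The temporary file and its contents are invented for this example; both commands print the same line, but the second one saves a process and a pipe.

```shell
# Create a small sample file (contents are made up for illustration).
tmpfile=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$tmpfile"

# Needless: cat does nothing but feed the file into a pipe.
via_cat=$(cat "$tmpfile" | grep 'et')

# Better: grep opens the file itself.
direct=$(grep 'et' "$tmpfile")

echo "$via_cat"
echo "$direct"
rm -f "$tmpfile"
```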

Are you familiar with regular expressions? You should be. In my opinion, every programmer and system administrator should know at least the basics of regular expressions. Personally I love regexps, but I know that they give many people headaches. There is no reason for that; don't be afraid of regular expressions. Once you've understood the concept, they're quite simple.

grep basically understands three types of regular expressions: basic regexp, extended regexp and Perl compatible regexp (PCRE). When you run grep without any further parameter (or with -G), grep will treat your expression as a basic regexp. Basic regular expressions are specified by POSIX (Portable Operating System Interface for UNIX), but GNU grep adds a few extensions such as the quantifiers \? and \+. And here you already see the basic syntax: special meta characters have to be escaped with backslashes. So if you've ever wondered why grep doesn't interpret your regular expression the way you meant it, it may be because you haven't escaped the meta characters:

echo 'foooooooooobar' | grep 'fo\+b.\?r'

That'll work fine. The same applies to parentheses and quantifiers in braces, but not to square brackets, which define character classes:

echo 'foofoobar' | grep '\(foo\)\{1,2\}[abr]\{3\}'

Notice the unescaped square brackets. Escaping these would remove their special meaning. The same goes for the meta characters . (any single character) and * (quantifier equal to \{0,\}) and of course the backslash \ itself. These only have a special meaning without backslashes, so escaping them turns them into normal characters.
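The escaping rules are easy to try out. The following is a small sketch, assuming GNU grep (the \+ quantifier is an extension); the sample strings are made up, and `|| true` only keeps the script going when grep finds nothing.

```shell
# In a basic regexp an unescaped + is just a literal plus sign.
literal_plus=$(echo 'fo+bar' | grep 'fo+b')          # matches the text "fo+b"
no_match=$(echo 'foooobar' | grep 'fo+b' || true)    # no literal + in the input

# Escaped, \+ becomes the "one or more" quantifier.
quantifier=$(echo 'foooobar' | grep 'fo\+b')

# The dot is literal inside square brackets and when escaped outside them.
dot_class=$(echo 'a.b' | grep 'a[.]b')
dot_escaped=$(echo 'axb' | grep 'a\.b' || true)      # x is not a dot

echo "literal_plus=$literal_plus no_match=$no_match quantifier=$quantifier"
echo "dot_class=$dot_class dot_escaped=$dot_escaped"
```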

This syntax is not very convenient, so there are extended regular expressions where all these meta characters are written without backslashes. To use extended regexps, specify the parameter -E:

echo 'foobar' | grep -E 'fo+b.?r'

GNU grep also understands a few backslash shorthands such as \w (word characters, equal to [_[:alnum:]], i.e. letters, digits and the underscore), \W (non-word characters, i.e. the opposite) and \b (word boundaries). These work in basic regexps as well. There are some more; have a look at the man page for those.
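Here is a short sketch of extended regexps together with \b and \w, assuming GNU grep. The input strings are invented, and -o (print only the matched text) is used just to make the \w matches visible.

```shell
# -E: groups and brace quantifiers need no backslashes.
ere=$(echo 'foofoobar' | grep -E '(foo){1,2}[abr]{3}')

# \b anchors at a word boundary: "foo" as its own word, not inside "foobar".
boundary=$(printf 'foo bar\nfoobar\n' | grep -E '\bfoo\b')

# \w+ matches runs of word characters; -o prints each match on its own line.
words=$(echo 'grep 2 win' | grep -oE '\w+')

echo "$ere"
echo "$boundary"
echo "$words"
```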

Extended regular expressions are far more comfortable than basic regular expressions, but they also have a few limitations. For instance, Perl-style escape sequences like \d for digits are not defined (GNU grep does accept \s for whitespace, though). These are included in the next level of regexp: the Perl compatible regular expressions. To use PCREs, specify the parameter -P:

echo 'foo2bar blablub' | grep -P '^[^\W]o+\d\w+\s(?:bl[aub]{1,2}){2}$'

If you know Perl or PHP you might have worked with PCREs already. If not, now is the time, it's fun! ;-)
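Not every grep is built with PCRE support, so a script may want to probe for -P before relying on it. A minimal sketch, assuming GNU grep; the 'unsupported' fallback value is my own convention for this example, not anything grep provides.

```shell
# -P only works if grep was compiled with PCRE, so test it first.
if echo 'x' | grep -qP 'x' 2>/dev/null; then
    # \d is the PCRE shorthand for a digit; -o prints only the matched part.
    digits=$(echo 'foo2bar blablub' | grep -oP '\d+')
    # (?:...) groups without capturing, as in the example above.
    line=$(echo 'foo2bar blablub' | grep -P '^\w+\d\w+\s(?:bl[aub]{1,2}){2}$')
else
    digits='unsupported'
    line='unsupported'
fi

echo "$digits"
echo "$line"
```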

That's the very, very basic introduction to regular expressions with grep. By the way, if you don't want to use regular expressions at all, set the -F parameter, which tells grep to treat your expression as a fixed string that has to match as is.
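The effect of -F shows best with a pattern that contains a meta character. A small sketch with made-up input lines:

```shell
# Sample input: one line with a literal dot, one without.
input=$(printf 'version 1.5\nversion 105\n')

# As a regexp, the dot matches any character, so both lines match.
as_regex=$(printf '%s\n' "$input" | grep '1.5')

# With -F the pattern is a plain string; only the literal "1.5" matches.
as_fixed=$(printf '%s\n' "$input" | grep -F '1.5')

echo "$as_regex"
echo "$as_fixed"
```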

So far we've passed our expression to grep as a single parameter. But you can also provide multiple expressions, and grep prints every line that matches at least one of them. For instance:

echo 'foobar' | grep -e foo -e bar

If you have just one expression, -e can be omitted. Another possibility is to load search patterns from a file with -f. Assuming we have a file called mypattern that contains our regexps, one per line, we can use it to match our string with:

echo 'foobar' | grep -f mypattern
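Both variants can be compared side by side. In this sketch the pattern file is a temporary stand-in for mypattern, and its two patterns are invented for illustration:

```shell
# Hypothetical pattern file: one pattern per line.
patterns=$(mktemp)
printf 'foo\n^bar$\n' > "$patterns"

input=$(printf 'foobar\nbar\nbaz\n')

# Several -e options: a line is printed if it matches any of the patterns.
with_e=$(printf '%s\n' "$input" | grep -e 'foo' -e '^bar$')

# -f reads the same patterns from the file.
with_f=$(printf '%s\n' "$input" | grep -f "$patterns")

echo "$with_e"
echo "$with_f"
rm -f "$patterns"
```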

grep has tons of other parameters which are very interesting and useful. I list the most important here:

  • -i: make the pattern case-insensitive (so x becomes equal to X).
  • -w: only match whole words
  • -c: don't print the matching lines but just the number of lines that match.
  • -m N: stop after N matching lines (N is a number).
  • -l: don't print the matches but the names of all files with matches.
  • -L: don't print the matches but the names of all files without matches.
  • -H: print the file name before each match. This is the default if you specified more than just one file to search (if working on STDIN, the output will be (standard input)).
  • -h: suppress file names, this is the default if you're searching only one file or operating on STDIN.
  • -n: print the line number before each matching line.
  • -o: print only the matched parts of each line, not the whole line.
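Most of these options can be tried on a pair of throwaway files. The following is a sketch only: the file names and log lines are invented, and the comments state what each call should return for exactly this input.

```shell
# Two temporary files with made-up contents.
dir=$(mktemp -d)
printf 'Error: disk full\nerror: retry\nno errors here\nall fine\n' > "$dir/a.log"
printf 'all fine\n' > "$dir/b.log"

count_i=$(grep -ci 'error' "$dir/a.log")      # matching lines, ignoring case
count_w=$(grep -ciw 'error' "$dir/a.log")     # whole word only: "errors" is skipped
first=$(grep -i -m 1 'error' "$dir/a.log")    # stop after the first matching line
numbered=$(grep -n 'retry' "$dir/a.log")      # prefix the line number
only=$(grep -o 'r[a-z]*y' "$dir/a.log")       # print just the matched text
with_match=$(cd "$dir" && grep -l 'error' a.log b.log)     # files with a match
without_match=$(cd "$dir" && grep -L 'error' a.log b.log)  # files without one

echo "$count_i $count_w"
echo "$first"
echo "$numbered"
echo "$only"
echo "$with_match $without_match"
rm -rf "$dir"
```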

There are many, many other parameters, but I can't list them all. All of the ones above also have long names such as --ignore-case for -i and --word-regexp for -w, but I prefer the short ones. Either way, I advise you to read the grep manual carefully. There might be a lot you didn't know this tool can do. You can also work on binaries, device files and FIFOs, output the complete lines containing a match rather than just the matches themselves, and much more. There is a lot of hidden functionality that not many people know about. So have fun with it and yeah… become a grep and (reg)expert!

Comments

Basti wrote:

Hey,

yesterday i got an exception thrown by a php autoloader messaging me “no this.php found”. Short thought later i tried to grep for “new.+\$” on my shell using regex.
I tried it first using :
grep -P /new\s+\$/i *
-> no result
short friend advice later i escaped the $ with two backslashes, weird, but still no result (as the bash needs a double escaping for whatever reason). I ended up using egrep.

At the very end i tried different delimiters with grep -P and the double escaping on dollar -> got the same result with egrep.

Long story short grep -P seems to be still “highly experimental” :-)

nice article and best regards

Janek Bevendorff wrote:

PCRE with grep actually IS experimental (as stated in the manual).
However, you just have to leave out the embedding slashes. Use this:

grep -iP 'new\s+\$' *

The parameter -i makes the expression case-insensitive.

Furthermore, egrep is the same as grep -E and fgrep the same as grep -F, but egrep and fgrep are both deprecated and you shouldn’t use them anymore.

Basti wrote:

That’s why i quoted “highly exp..” just wanted to throw in that short story – don’t blame me for that addition :-)

Janek Bevendorff wrote:

I don’t blame you at all but appreciate all your comments very much. I just told you that PCREs in grep don’t have surrounding slashes and modifiers behind them. No harm meant. ;-)

cephas Kamuchira wrote:

Hi guys
How can I use grep, in one command on Linux, to find a mail that was sent from one address to another, i.e. from=<test@test.com and to=<sumone@someone.com, in /var/log/mail.log?

Janek Bevendorff wrote:

Hi cephas,

That highly depends on the format of your mail.log. But in general you only need to get the regular expression right.
If you’re not so good at regexp, you can use this website for testing your expressions.

Design and Code Copyright © 2010-2017 Janek Bevendorff. Content on this site is published under the terms of the GNU Free Documentation License (GFDL). You may redistribute content only in compliance with these terms.