ZSH Gem #23: Working with extended regular expressions


Two ZSH modules let you work with regular expressions easily: zsh/regex provides POSIX extended regular expressions (POSIX ERE), and zsh/pcre provides Perl-compatible regular expressions (PCRE), which are more powerful than POSIX ERE. You can use either one of them or both at the same time. That's entirely up to you. I'll show you both.

First let me illustrate zsh/regex, which is the simpler of the two. Once loaded, zsh/regex provides the new conditional expression -regex-match, which can be used with the [[ command (e.g. in if conditions or loops):

# Load module
zmodload zsh/regex

# Execute POSIX ERE
[[ "foobar_123" -regex-match "^([a-zA-Z0-9]+)_([0-9]+)$" ]] && echo match || echo no match

The condition returns true if the expression matches, otherwise false. On a successful match, the whole matching part is stored in the $MATCH parameter, and if the pattern contains groups in parentheses, the substrings they captured are available in the array $match (in our example you'd have an array with two elements containing foobar and 123).

Now that I've shown you zsh/regex, let's move on to the more complex zsh/pcre. zsh/pcre also provides a new conditional expression, called -pcre-match, which works much the same as -regex-match except that it accepts Perl-compatible regular expressions. So we could rewrite our example from above as follows:

# Load module
zmodload zsh/pcre

# Execute PCRE
[[ "foobar_123" -pcre-match "^(\w+)_(\d+)$" ]] && echo match || echo no match

That's a little less to write. But zsh/pcre also provides a few new commands besides the conditional expression -pcre-match. The most important ones to know are pcre_compile and pcre_match. With the first one you compile a regular expression from a string and with the second one you use this compiled regular expression on other strings. That means you always need to use both in combination.

Both commands take several flags. The most important ones for pcre_compile are -m, which compiles a multi-line pattern (^ and $ then also match at newlines inside the string), -s, which makes the dot (.) match newline characters as well, and -i, which makes the pattern case-insensitive.

The most important flags for pcre_match are -v and -a, which let you choose different names for the variable holding the whole matching part and for the array holding the substrings captured by parentheses (by default these are again $MATCH and $match).

Our example from above with pcre_compile and pcre_match would look like this:

pcre_compile "^(\w+)_(\d+)$"
pcre_match "foobar_123" && echo match || echo no match

This is sometimes more to type, but the flags both commands accept also give you more flexibility. For example, the following simple case-insensitive regular expression

pcre_compile -i "^foobar\s+\d+$"
pcre_match "fOoBaR   123" && echo match || echo no match

would need a monster expression like this if performed with -pcre-match alone, without falling back on PCRE's inline (?i) option:

[[ "fOoBaR   123" -pcre-match "^[fF][oO]{2}[bB][aA][rR]\s+\d+$" ]] && echo match || echo no match

In this case the second variant is not only more to type (yes, really, count the characters); the first one is also much easier to read and less error-prone, so I'd prefer that one.

Which variant you choose, and whether you go for POSIX regular expressions or PCRE, depends on the situation. All of them give you the full power of regular expressions, so use them!




Design and Code Copyright © 2010-2024 Janek Bevendorff Content on this site is published under the terms of the GNU Free Documentation License (GFDL). You may redistribute content only in compliance with these terms.