ZSH Gem #23: Working with extended regular expressions
There are two ZSH modules that let you work with regular expressions: zsh/regex for POSIX extended regular expressions (POSIX ERE) and zsh/pcre for Perl compatible regular expressions (PCRE), which are more powerful than POSIX ERE. You can use either one of them or both at the same time. That's entirely up to you. I'll show you both.
First let me illustrate zsh/regex, which is the simpler of the two. Once loaded, zsh/regex provides the new conditional expression -regex-match, which can be used with the [[ command (e.g. in if conditions or loops):
# Load module
zmodload zsh/regex
# Execute POSIX ERE
[[ "foobar_123" -regex-match "^([a-zA-Z0-9]+)_([0-9]+)$" ]] && echo match || echo no match
The condition returns true if the expression matches, otherwise false. On a successful match, the whole matching part is stored in the $MATCH parameter, and if the pattern contains groups in parentheses, the text they captured is available in the array $match (in our example you'd have an array with two elements containing foobar and 123).
Now that I've shown you zsh/regex, let's move on to the more complex zsh/pcre. zsh/pcre also provides a new conditional expression called -pcre-match, which works much like -regex-match except that it accepts Perl compatible regular expressions. So we could rewrite our example from above as follows:
# Load module
zmodload zsh/pcre
# Execute PCRE
[[ "foobar_123" -pcre-match "^(\w+)_(\d+)$" ]] && echo match || echo no match
That's a little less to write. But zsh/pcre also provides a few new commands besides the conditional expression -pcre-match. The most important ones to know are pcre_compile and pcre_match. With the first one you compile a regular expression from a string, and with the second one you apply this compiled regular expression to other strings. That means you always need to use both in combination: pcre_match always tests against the previously compiled expression.
Both commands take several flags. The most important ones for pcre_compile are -m, which makes ^ and $ match at the start and end of each line rather than only of the whole string, -s, which makes the dot (.) match newlines as well, and -i, which makes the pattern case-insensitive.
The most important flags for pcre_match are -v and -a, which let you set different names for the variable containing the whole matching part and for the array containing the substrings captured by parentheses (by default these are again $MATCH and $match).
Our example from above with pcre_compile and pcre_match would look like this:
pcre_compile "^(\w+)_(\d+)$"
pcre_match "foobar_123" && echo match || echo no match
Sometimes it may be more to write, but it also gives you some more flexibility due to the arguments both commands can take. For example, the following simple case-insensitive regular expression
pcre_compile -i "^foobar\s+\d+$"
pcre_match "fOoBaR 123" && echo match || echo no match
would turn into a monster like this if you spelled out every upper and lower case variant by hand for -pcre-match:
[[ "fOoBaR 123" -pcre-match "^[fF][oO]{2}[bB][aA][rR]\s+\d+$" ]] && echo match || echo no match
In this case the second variant is not just more to type (yes, that's true, count the characters), it is also much harder to read and more error-prone, so I'd prefer the first one.
Whichever variant you take and whether you prefer POSIX regular expressions or PCRE always depends on the situation. But all of them give you the full power of regular expressions. So use them!
Read more about zsh/regex and zsh/pcre:
- zsh.sf.net: The zsh/regex Module
- zsh.sf.net: The zsh/pcre Module
- zsh.sf.net: Conditional Expressions
- Wikipedia.org: Regular expression