#20: Multi-line sed search and replace


sed is a well-known stream editor on UNIX systems. It is powerful and versatile and makes manipulating streams and text files easy. However, it has a steep learning curve. Like vi/vim, sed might seem unwieldy at the beginning, but as soon as you begin to understand the tool, it makes your workflow very efficient.

sed is built to process strings (either from STDIN or from a file) line by line. Therefore, you can't search across multiple lines like this:

sed 's/foo\nbar/bla\nblub/'

That wouldn't work because the pattern space, in which all operations are performed, only contains one line at a time. Replacing newline characters wouldn't work that way either, because the trailing newline is stripped when a line is read into pattern space. But there are several ways to work around this. I'll be showing you three ways of performing multi-line replacements.
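
To see the problem for yourself, you can feed two lines through the naive command. This is only a minimal sketch, assuming GNU sed and a shell with printf; the variable name out is made up for illustration.

```shell
# Two input lines: "foo" and "bar".
out=$(printf 'foo\nbar\n' | sed 's/foo\nbar/bla\nblub/')

# The pattern space never holds both lines at once,
# so the input comes back unchanged.
printf '%s\n' "$out"
```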

The first way is the one you normally find when googling for this topic. It makes use of the N command, which reads the next line and appends it to the pattern space, separated by a newline character.

sed '/foo/{ N; s/foo\nbar/bla\nblub/ }'

This looks for foo and, if found, appends the next line and does the replacement. But this method has a catch. It won't work if you have an input string like this:

foo
foo
bar

The first occurrence of foo is found and the next line is appended to pattern space. But of course

foo
foo

doesn't match the multi-line pattern, so the pattern space is printed unchanged and then replaced by the next line, which is

bar

but this doesn't match foo. You see, this method is very rough and only works in a few cases.
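
The catch can be reproduced with a short sketch. It assumes GNU sed (which accepts the closing brace directly after the s command); the variable names script, works and fails are made up for illustration.

```shell
# The N-based script from above, stored in a variable for reuse.
script='/foo/{ N; s/foo\nbar/bla\nblub/ }'

# Works: foo is directly followed by bar, so the output is bla, blub.
works=$(printf 'foo\nbar\n' | sed "$script")
printf '%s\n' "$works"

# Fails: the first foo pulls in the second foo via N, bar is then seen
# on its own, and the input comes back unchanged.
fails=$(printf 'foo\nfoo\nbar\n' | sed "$script")
printf '%s\n' "$fails"
```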

But we can improve on this. Besides pattern space, sed also provides the so-called hold space. This is just a temporary buffer on which no operations are performed. We can use it to collect the whole input first and then replace the pattern space with the contents of hold space. That looks like this:

sed -n '1h; 1!H; ${ g; s/foo\nbar/bla\nblub/; p }'

That first copies the first line from pattern space into hold space (1h), replacing whatever hold space contained before. Then all lines except line 1 are appended to hold space (1!H). The reason why we cannot simply use H for every line is that H always puts a newline in front of the text it appends, even when hold space is still empty, so the result would start with a blank line. As soon as the last line is reached (address $), a block is opened which copies the contents of hold space into pattern space (g) and does the replacement. Because sed normally prints the pattern space at the end of every cycle, every line before the last would be printed on its own, followed by the complete string. To avoid this, the parameter -n (no automatic output) is set and the edited final string is printed manually with the p command from within the block.

This method works remarkably well, but you should note that it is much slower and needs more memory if the stream/file is very long. One advantage of sed over many other tools is that it reads line by line, so it doesn't take more memory when working on long input. That advantage is lost with this method. Keep that in mind.
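
Both points can be checked with a small sketch, again assuming GNU sed. The variable names with_blank and fixed are made up for illustration, and the sketch puts a semicolon before p so that it is parsed as a separate command.

```shell
# H on every line: H always adds a newline before the appended text,
# so the collected string starts with a blank line.
with_blank=$(printf 'a\nb\n' | sed -n 'H; ${ g; p }')
printf '%s\n' "$with_blank"

# 1h; 1!H avoids the blank line, and the multi-line pattern now matches
# even in the foo/foo/bar input that defeated the N approach.
fixed=$(printf 'foo\nfoo\nbar\n' | sed -n '1h; 1!H; ${ g; s/foo\nbar/bla\nblub/; p }')
printf '%s\n' "$fixed"
```

The first command prints an empty line followed by a and b; the second prints foo, bla and blub.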

Another way that came to my mind is to omit the hold space and append lines directly to pattern space with N. That's a mixture of methods one and two.

sed '1!N; s/foo\nbar/bla\nblub/'

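sed automatically reads one line into pattern space at the start of every cycle. With 1!N, every cycle except the one that starts on line 1 also appends the following line, and then the replacement is performed. Done! Short and nifty. Be aware, though, that this does not collect the whole input: line 1 is processed on its own, then lines 2 and 3 together, then lines 4 and 5, and so on. The pattern is therefore only found if foo sits on an even-numbered line with bar directly after it, as in the foo, foo, bar input from above. For the same reason, this method has problems with multiple replacements (g flag). For those, better use the second method. Of course, you can also use this with the -n parameter and the p command, but then you have to put a semicolon after the replacement command. Otherwise p is taken as a flag of the s command and you'd only get the parts which have been replaced. The rest of the string would not be printed to the screen. So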

sed -n '1!N; s/foo\nbar/bla\nblub/ p'

is something different than

sed -n '1!N; s/foo\nbar/bla\nblub/; p'

With the foo, foo, bar input from above, the first one would only output

bla
blub

and the second one

foo
bla
blub
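
The following sketch reproduces both outputs and also shows the pairing limit of 1!N. It assumes GNU sed; all variable names are made up for illustration, and the flag variant is written without the space before p to make it plain that p belongs to the s command. The last command is not one of the three methods from this article but a commonly used loop idiom (a label, N, and a branch on every line but the last), shown only as a possible way to really get the whole input into pattern space.

```shell
input=$(printf 'foo\nfoo\nbar')

# p as a flag of s: only pattern spaces in which a replacement
# happened are printed, here bla and blub.
flag=$(printf '%s\n' "$input" | sed -n '1!N; s/foo\nbar/bla\nblub/p')
printf '%s\n' "$flag"

# p as a separate command: every pattern space is printed,
# here foo, bla and blub.
cmd=$(printf '%s\n' "$input" | sed -n '1!N; s/foo\nbar/bla\nblub/; p')
printf '%s\n' "$cmd"

# Pairing limit: line 1 is handled alone, so a foo on line 1 never
# meets the bar on line 2 and the input comes back unchanged.
missed=$(printf 'foo\nbar\n' | sed '1!N; s/foo\nbar/bla\nblub/')
printf '%s\n' "$missed"

# Loop idiom (not from the article): keep appending lines with N until
# the last line, then run the replacement on the whole input.
looped=$(printf 'foo\nbar\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/foo\nbar/bla\nblub/g')
printf '%s\n' "$looped"
```

Like the hold space method, the loop keeps the entire input in memory, so the same caveat about long files applies.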

That's it. I hope you learned a bit.

Comments

Jeff wrote:
This was really informative, but I still haven't been able to get it to work. What is your input to the command? A full working example would really make this article fantastic. Thanks

Janek Bevendorff wrote:
Hi Jeff, you can either specify an input file like so:

sed 'pattern' input-file

or pipe the input in through STDIN:

echo foobar | sed 'pattern'

Virginia wrote:
Hi, thanks for this topic! I read it, but I am still too "stupid" to replace the second of two lines. In my example file I have these two lines:

Type^M
green^M

and I want to replace them with

Type^M
red^M

My code looks like this:

REPLACELINE1="Type^M"
REPLACELINE2="green"
REPLACEWITHLINE2="red"
sed -n "1!N; s/$REPLACELINE1\n$REPLACELINE2/$REPLACEWITHLINE2/; p" "$FILE" > "$FILE.NEU"

Do you know what I did wrong? I would appreciate any help! /Virginia

Janek Bevendorff wrote:
Hi Virginia, you might want to try this:

REPLACELINE1='Type^M'
REPLACELINE2='green'
REPLACEWITHLINE2='red'
sed -n "1h; 1!H; \${ g; s/$REPLACELINE1\n$REPLACELINE2/$REPLACEWITHLINE2/ p }" "$FILE" > "$FILE.NEU"

(Be careful when doing copy & paste, as the blog software replaces the quotation marks in the comments with their typographic counterparts.)

SPR wrote:
Hi, this is Raj. I am new to shell scripting. I have to write a script to search and replace multiple lines in a file. Example:

# cat myfile.txt
-a myown car dkdjdj / ghee 500>= / -w mdjdfdj -S jfdjf -F nfjfjf . . .etc..

I have to search for multiple lines and, if they are not found, replace that line in a file. I used the command below, but it does not work for me:

sed -i 's/-a djdjdjdj -s nff -f 500> = / /-a djdjdjdj -s nff -f 500> =//g' myfile.txt

Please suggest how to do this in a better way. Thanks in advance. SPR

Janek Bevendorff wrote:
Hi, I don't really know what you want to replace, since your pattern doesn't match the contents of the file (there is no "-a djdjdjdj" in the file, only "-a myown car dkdjdj"). What I can say for sure, though, is that you have a syntax error: you either need to use different delimiters (e.g. | instead of /) or escape every occurrence of / in the search and replace strings.

J.P.Paillet wrote:
Thank you very much, this is a very clear explanation. For the first time in years, I could use sed instead of awk for simple cases. What follows could help me to automate updates on our DNS server's config files.

sed -e '/^@[ ]*IN[ ]*SOA/N;s/^\(@.*\n[^0-9]*\)[0-9]*/\1'$(date +"%Y%m%d%H")'/'

Basil wrote:
Hi Janek, I need to replace with The new line character and tab spaces in the first search string cause the issues. Can you please guide me to replace it?

Janek Bevendorff wrote:
You could try it like this:

sed -n '1h; 1!H; ${ g; s/\(\)/\1TRUE\2/g; p }'

(ignore the double-escaped HTML tags, they should actually be < and >, not &lt; and &gt;)

However, I don't think sed is the right tool for you. Replacing HTML or XML with regular expressions isn't really a good idea (XML is not a regular language). For a few quick ad-hoc fixes it might work, but if you need to make more changes (and ones guaranteed to be valid), you should really use an XML parser. Remember that the regular expression also won't match if the attributes are in a different order, the tag is not self-closing, etc.

NC wrote:
How do I print a file? I want to first search between StartPattern1 and EndPattern1 for Matchpattern1 and print from StartPattern1 to EndPattern1. Then I want to do the same with StartPattern1 and EndPattern1, this time with Matchpattern2. I was able to accomplish this with the command below:

sed -n '/interface dsl/h;/interface dsl/!H; /!/ {x;/dsl profile INTRO/p;}' /home/ProfileChangeSelect/FILE

Now I want to add " dsl profile VALUE" in addition to dsl profile INTRO. What is the syntax? Thanks

Design and Code Copyright © 2010-2024 Janek Bevendorff. Content on this site is published under the terms of the GNU Free Documentation License (GFDL). You may redistribute content only in compliance with these terms.