Latest revision as of 19:39, 30 March 2019

Sed (the stream editor) is a *nix utility that ships with nearly every *nix system. It applies string manipulations to text as it streams through, or to files in place, and can reduce a whole lot of typing to a few lines. It can get ugly, but once you start to use it you'll wonder how you ever lived without it.

Sed introduction and tutorial: http://www.grymoire.com/Unix/Sed.html

Editing Files In-Place

Sed can be used to edit files in-place using the -i flag.

Find and Replace

You can find and replace instances of a string in a file using:

$ sed -i -e 's/peanut butter/jelly/g' file{1,2,3}.txt

This replaces peanut butter with jelly in file1.txt, file2.txt, and file3.txt. To replace more than one thing, use

$ sed -i -e 's/peanut butter/jelly/g' \
         -e 's/green eggs/ham/g'      \
         -e 's/water/wine/g'          \
         file{1,2,3}.txt

or, more succinctly, with the expressions separated by semicolons:

$ sed -i -e 's/peanut butter/jelly/g;s/green eggs/ham/g;s/water/wine/g' \
         file{1,2,3}.txt
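
Because -i overwrites its input, it can help to try a command on scratch copies first. The sketch below is a self-contained version of the find-and-replace above. The temporary directory and the file contents are made up for illustration, and the -i.bak form (which keeps a backup of each file) is used on the assumption that it is accepted by both GNU and BSD sed.

```shell
# Scratch directory and sample files (hypothetical contents, for illustration only)
workdir=$(mktemp -d)
printf 'peanut butter and green eggs\n' > "$workdir/file1.txt"
printf 'water and peanut butter\n'      > "$workdir/file2.txt"

# Edit both files in place; -i.bak keeps each original as <name>.bak
sed -i.bak -e 's/peanut butter/jelly/g' \
           -e 's/green eggs/ham/g'      \
           -e 's/water/wine/g'          \
           "$workdir/file1.txt" "$workdir/file2.txt"

result1=$(cat "$workdir/file1.txt")       # edited copy
result2=$(cat "$workdir/file2.txt")       # edited copy
backup1=$(cat "$workdir/file1.txt.bak")   # untouched original
printf '%s\n' "$result1" "$result2" "$backup1"
```

Once the result looks right, the .bak files can be deleted.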

Sed Patterns

Repeating Search Patterns in Replacements

If you are searching for a pattern, and want to repeat the pattern in the replacement pattern, you can surround it in (escaped) parentheses, like this: \(pattern_to_repeat\)

This can then be put into the replacement pattern by using \1. An example:

$ echo "peanut butter and jelly" | \
  sed -e 's/\(jelly\)/strawberry \1/'

peanut butter and strawberry jelly

This can be done with an arbitrary number of patterns, e.g.:

$ echo "pattern1 pattern2 pattern3 pattern4 pattern5" | \
  sed -e 's/\(pattern1\) \(pattern2\) \(pattern3\) \(pattern4\) \(pattern5\)/\5 \4 \3 \2 \1/'

pattern5 pattern4 pattern3 pattern2 pattern1

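The search pattern may contain more than nine groups, but only \1 through \9 can be referenced in the replacement; \10 is read as \1 followed by a literal 0. The following command looks as if it references groups 10 and 11, but it produces the expected output only because group 1 is "p1", so \10 and \11 expand to "p10" and "p11":

$ echo "p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11" | \
  sed -e 's/\(p1\) \(p2\) \(p3\) \(p4\) \(p5\) \(p6\) \(p7\) \(p8\) \(p9\) \(p10\) \(p11\)/\3 \2 \1 \4 \6 \5 \9 \8 \7 \10 \11/'

p3 p2 p1 p4 p6 p5 p9 p8 p7 p10 p11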
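
Whether \10 means "group 10" can be checked with distinct placeholder values. In the sketch below (the letters a through j are made-up input), sed reads \10 as \1 followed by a literal 0, since only \1 through \9 are defined as back-references:

```shell
# Ten groups; the replacement asks for "\10"
out=$(echo "a b c d e f g h i j" | \
  sed -e 's/\(a\) \(b\) \(c\) \(d\) \(e\) \(f\) \(g\) \(h\) \(i\) \(j\)/\10/')

# Prints group 1 ("a") followed by a literal 0, not group 10 ("j")
echo "$out"
```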

Special/Escape Characters

Main article: Regular Expressions

NOTE: This section is specific to GNU sed; other versions of sed may behave differently.

Sometimes you want to look for generic patterns, like "four numbers in a row", rather than something specific, like "5555". This can be done using special/escape characters.

Numerical Characters

To match any single digit (0 through 9), use [0-9], like this:

$ echo "5" | sed -e 's/[0-9]/replacement/'
replacement

To match a run of exactly N digits, follow [0-9] with \{N\}, like this:

$ echo "5678" | sed -e 's/[0-9]\{4\}/replacement/'
replacement

If you want to match a run of digits and know it will be somewhere between M and N digits long, you can use the syntax \{M,N\}. For example, if you want to replace a number between 2 and 4 digits long:

$ echo "56" | sed -e 's/[0-9]\{2,4\}/replacement/'
replacement

$ echo "5234678" | sed -e 's/[0-9]\{2,4\}/replacement/'
replacement678

$ echo "5" | sed -e 's/[0-9]\{2,4\}/replacement/'
5

Note that in the last command executed, the replacement doesn't show up because the longest run of digits has length 1, which does not fall in the range of 2 to 4.
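
As a sketch of the "four numbers in a row" case mentioned at the start of this section, the commands below use made-up sample text. The & in the replacement is standard sed for "the whole match"; it is used here only to make the matched run visible.

```shell
# Replace the first run of exactly four digits
out1=$(echo "order 5555 shipped" | sed -e 's/[0-9]\{4\}/NNNN/')

# An interval is greedy: \{2,4\} takes up to four digits from the leftmost match
out2=$(echo "id 1234567" | sed -e 's/[0-9]\{2,4\}/<&>/')

# With the g flag, the remaining digits are matched as a second run
out3=$(echo "id 1234567" | sed -e 's/[0-9]\{2,4\}/<&>/g')

printf '%s\n' "$out1" "$out2" "$out3"
```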

Since \{M,N\} is ugly and burdensome to type, you can use the sed flag -r or --regexp-extended to eliminate the need for backslashes:

$ echo "5234678" | sed -e 's/[0-9]\{2,4\}/replacement/'
replacement678

$ echo "5234678" | sed -re 's/[0-9]{2,4}/replacement/'
replacement678

To leave the upper bound of the number size unspecified, use \{N,\} (or {N,} with -r, as in these examples):

$ echo "52" | sed -re 's/[0-9]{2,}/replacement/'
replacement

$ echo "5234678" | sed -re 's/[0-9]{2,}/replacement/'
replacement

$ echo "5223902949082309448792387234" | sed -re 's/[0-9]{2,}/replacement/'
replacement
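
Extended mode also removes the backslashes from grouping parentheses, which helps when intervals and groups are combined. The sketch below rewrites a sample date (chosen only for illustration) both ways. It uses -E, which GNU sed documents as a synonym for -r; treating it as available on other sed versions is an assumption.

```shell
# Basic regular expressions: groups and intervals need backslashes
out_bre=$(echo "2019-03-30" | \
  sed -e 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\3.\2.\1/')

# Extended regular expressions: same substitution, no backslashes
out_ere=$(echo "2019-03-30" | \
  sed -E -e 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3.\2.\1/')

printf '%s\n' "$out_bre" "$out_ere"
```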

Sed Commands

Less Common Commands

w command

To search for a pattern and write the matching lines to a file, use the w command:

$ cat list_file 
Phoenix
New York City
San Francisco
Orlando
Atlanta
Seattle
San Antonio
St. Louis

$ sed -n '/San/w search_results' list_file

$ cat search_results 
San Francisco
San Antonio
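
Several w commands can be combined to split one input into multiple files in a single pass. The sketch below assumes hypothetical output names (san_cities, st_cities) in a scratch directory. Each w command gets its own -e because everything after w up to the end of the expression is taken as the file name.

```shell
workdir=$(mktemp -d)
printf '%s\n' Phoenix "San Francisco" Orlando "San Antonio" "St. Louis" \
  > "$workdir/list_file"

# -n suppresses normal output; each matching line goes only to its file
sed -n -e "/San/w $workdir/san_cities" \
       -e "/^St\./w $workdir/st_cities" \
       "$workdir/list_file"

san=$(cat "$workdir/san_cities")
st=$(cat "$workdir/st_cities")
printf '%s\n' "$san" "$st"
```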

e command

To run a shell command and insert its output as new lines, use the e command (a GNU sed extension). For example, the contents of a small file (called new_item in this example) could be inserted before a line of the file list_file. Here are the two files:

$ cat list_file
Phoenix
New York
San Francisco
Orlando
Atlanta
Seattle
San Antonio
St. Louis


$ cat new_item
Boston

Now we can prepend the output of the command "cat new_item" to the line matched by the search "/New York/", which results in the line Boston appearing above the line New York:

$ sed '/New York/e cat new_item' list_file
Phoenix
Boston
New York
San Francisco
Orlando
Atlanta
Seattle
San Antonio
St. Louis
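
For comparison, the sketch below runs the e command next to the standard r command, which this page does not otherwise cover. As far as I know, e is GNU-only and places the command output before the matching line, while r reads a file and appends it after the matching line. The shortened list and the scratch directory are assumptions for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' Phoenix "New York" Orlando > "$workdir/list_file"
printf 'Boston\n' > "$workdir/new_item"

# GNU sed: run a command and emit its output BEFORE the matching line
before=$(sed "/New York/e cat $workdir/new_item" "$workdir/list_file")

# Standard sed: r appends the contents of a file AFTER the matching line
after=$(sed "/New York/r $workdir/new_item" "$workdir/list_file")

printf '%s\n---\n%s\n' "$before" "$after"
```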

Examples

Renaming files, case 1

I had a set of simulation outputs whose names looked like this:

i8_j8_k8
i9_j9_k9
i10_j10_k10
i11_j11_k11
[...]
i101_j101_k101
i102_j102_k102
i103_j103_k103

This became problematic, since, doing a string sort, these go out of order (e.g. i10 comes before i8). I wanted to rename them to be something like this:

i008_j008_k008
i009_j009_k009
i010_j010_k010
i011_j011_k011
[...]
i101_j101_k101
i102_j102_k102
i103_j103_k103

I will explain this three-part command, as follows:

ls -1c i*

This command will list all of the files, with one file name on each line. This is then piped to the sed command.

/bin/sed \
 -e 'p' \
 -e 's/i\([0-9]\{1\}\)_/i00\1_/' \
 -e 's/i\([0-9]\{2\}\)_/i0\1_/'  \
 \
 -e 's/j\([0-9]\{1\}\)_/j00\1_/' \
 -e 's/j\([0-9]\{2\}\)_/j0\1_/'  \
 \
 -e 's/k\([0-9]\{1\}\)$/k00\1/'  \
 -e 's/k\([0-9]\{2\}\)$/k0\1/'

This sed command has four parts. The first is the print statement, 'p': this prints the name of the file, before any manipulation is performed by sed.

The next three parts transform the i's, j's, and k's into the desired form. The first line looks for a number of the form iN (where N is a single digit 0-9) and replaces it with i00N, and the second line looks for a number of the form iNN and replaces it with i0NN.

The syntax \{1\} means 1 instance of the preceding regular expression; the syntax \{2\} means 2 instances of the preceding regular expression; etc. (See the Regular expressions page).

The parentheses that surround the number pattern \([0-9]\{1\}\) are used to store the pattern, so that it can be inserted in the replacement string (this is what the \1 does).

Finally, the last part of the command is an Xargs command that will take two arguments at a time; the first argument is the original file name (printed with the sed 'p' command), and the second argument is the manipulated string (now in the desired format, iNNN_jNNN_kNNN). These are passed two at a time to the mv command.

I put this in the file script.sh and ran it. The result is:

$ ./script.sh
mv i8_j8_k8 i008_j008_k008
mv i9_j9_k9 i009_j009_k009
mv i10_j10_k10 i010_j010_k010
mv i11_j11_k11 i011_j011_k011
[...]
mv i101_j101_k101 i101_j101_k101
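
The three parts can be tried together in a scratch directory. This is only a sketch: the page does not show the exact xargs invocation, so "xargs -n2 mv" is an assumption based on the description above, and the file list comes from a shell glob with printf rather than from ls.

```shell
workdir=$(mktemp -d)
(
  cd "$workdir" || exit 1
  touch i8_j8_k8 i9_j9_k9 i10_j10_k10 i11_j11_k11

  # 1. list the names, 2. print each name followed by its zero-padded form,
  # 3. hand the names to mv two at a time (old name, new name)
  printf '%s\n' i*_j*_k* | sed \
    -e 'p' \
    -e 's/i\([0-9]\{1\}\)_/i00\1_/' -e 's/i\([0-9]\{2\}\)_/i0\1_/' \
    -e 's/j\([0-9]\{1\}\)_/j00\1_/' -e 's/j\([0-9]\{2\}\)_/j0\1_/' \
    -e 's/k\([0-9]\{1\}\)$/k00\1/'  -e 's/k\([0-9]\{2\}\)$/k0\1/' \
  | xargs -n2 mv
)
ls "$workdir"
```

Replacing mv with "echo mv" in the last step previews the commands without renaming anything.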

Postscript: I had to modify this script and re-run it in the same directory, which caused a bunch of errors like this:

mv: `i059_j072_k072' and `i059_j072_k072' are the same file
mv i042_j072_k072 i042_j072_k072 
mv: `i042_j072_k072' and `i042_j072_k072' are the same file
mv i018_j072_k072 i018_j072_k072 
mv: `i018_j072_k072' and `i018_j072_k072' are the same file
mv i026_j072_k072 i026_j072_k072 
mv: `i026_j072_k072' and `i026_j072_k072' are the same file
mv i016_j072_k072 i016_j072_k072 
mv: `i016_j072_k072' and `i016_j072_k072' are the same file
mv i142_j072_k072 i142_j072_k072 
mv: `i142_j072_k072' and `i142_j072_k072' are the same file
mv i129_j072_k072 i129_j072_k072 
mv: `i129_j072_k072' and `i129_j072_k072' are the same file
mv i135_j072_k072 i135_j072_k072 
mv: `i135_j072_k072' and `i135_j072_k072' are the same file
mv i125_j072_k072 i125_j072_k072 
mv: `i125_j072_k072' and `i125_j072_k072' are the same file
mv i127_j072_k072 i127_j072_k072 
mv: `i127_j072_k072' and `i127_j072_k072' are the same file
mv i119_j072_k072 i119_j072_k072 
mv: `i119_j072_k072' and `i119_j072_k072' are the same file
mv i114_j072_k072 i114_j072_k072 
mv: `i114_j072_k072' and `i114_j072_k072' are the same file
mv i100_j072_k072 i100_j072_k072 
mv: `i100_j072_k072' and `i100_j072_k072' are the same file

The reason is that sed still prints two things, the filename and the transformed filename, but a filename that is already in the desired form is left unchanged by the substitutions, so the same name is fed to mv twice.

I ended up using Awk to check if the arguments being fed to mv were duplicates: see Awk#Renaming Files, If Names Not Duplicates
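
The duplicates can also be filtered inside sed itself. The sketch below is not the Awk approach referred to above; it is an alternative that uses sed's hold space (the h, x, and G commands) to build "old name, newline, new name" and prints the pair only when the two lines differ. The sample file names are made up.

```shell
workdir=$(mktemp -d)
(
  cd "$workdir" || exit 1
  touch i8_j8_k8 i101_j101_k101   # the second name is already in final form

  printf '%s\n' i*_j*_k* | sed -n \
    -e 'h' \
    -e 's/i\([0-9]\)_/i00\1_/' -e 's/i\([0-9]\{2\}\)_/i0\1_/' \
    -e 's/j\([0-9]\)_/j00\1_/' -e 's/j\([0-9]\{2\}\)_/j0\1_/' \
    -e 's/k\([0-9]\)$/k00\1/'  -e 's/k\([0-9]\{2\}\)$/k0\1/' \
    -e 'x' -e 'G' \
    -e '/^\(.*\)\n\1$/!p' \
  | xargs -n2 mv
)
ls "$workdir"
```

Here h saves the original name, x swaps it back into the pattern space after the substitutions, and G appends the transformed name on a second line. The final address matches only when both lines are identical, and !p prints every pair that does not match it.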

Renaming files, case 2

I had a couple of ebook files that were named "Title - Author.mobi" and I needed to rename them to be "Author - Title.mobi". The challenge was that they had spaces in their names:

My Inventions - Nickola Tessla.mobi
My Uncle Oswald - Roald Dahl.mobi
Myths to live by - Joseph Campbell.mobi
Revolting rhymes - Roald Dahl.mobi
Screwjack - Hunter S. Thompson.mobi
The Hacker Crackdown - Bruce Sterling.mobi
The Information - Martin Amis.mobi
The Moronic Inferno & Other Visits to America - Martin Amis.mobi
The Power of Myth - Joseph Campbell w_ Bill Moyers.mobi
The Rachel Papers - Martin Amis.mobi
The Rum Diary_ A Novel - Hunter S. Thompson.mobi
The Silmarillion - J. R. R. Tolkien.mobi
The Witches - Roald Dahl.mobi
The critical period of American history, 1783-1789 - John Fiske.mobi
The interpretation of dreams - Sigmund Freud.mobi
The murder on the links - Agatha Christie.mobi
The time machine - H. G. Wells.mobi
The virtue of selfishness_ a new concept of egoism - Ayn Rand.mobi
Time's Arrow - Martin Amis.mobi

The trick for doing this was to parse the name into three pieces: the piece appearing before " - ", the piece appearing after " - " and before ".mobi", and the ".mobi" file extension.

Combine the first two pieces \1 and \2 with double-quotes to reorder the title and author, and feed that new file name to xargs for the renaming.

My initial script did not work:

$ ls -1 *.mobi  \
 | sed 's/\(.*\) - \(.*\)\.mobi/"\1 - \2\.mobi" "\2 - \1\.mobi" /g' \
 | xargs -0 -n1 -I% mv %

usage: mv [-f | -i | -n] [-v] source target
       mv [-f | -i | -n] [-v] source ... directory

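i.e. mv was called without a usable source and target. The cause is the xargs options rather than sed: -0 tells xargs to split its input on NUL characters only, so the newline-separated output of sed arrives as a single item with the double quotes taken literally, and -I% substitutes that whole item as one argument. If I just run it as "xargs -0", then it prints everything as expected (with no command given, xargs runs echo), but mv never receives two separate file names.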

So I ended up having to hack a solution, like so:

$ ls -1 *.mobi \
 | sed 's/\(.*\) - \(.*\)\.mobi/\1 - \2\.mobi \2 - \1\.mobi /g' \
 | sed 's/\ /\\\ /g' \
 | sed "s/\'/\\\'/g" \
 | sed 's/mobi\\/mobi/g' \
 | xargs -n2 mv

This consists of 4 sed commands plus an xargs command. The 4 sed commands do the following:

The first command is:

sed 's/\(.*\) - \(.*\)\.mobi/\1 - \2\.mobi \2 - \1\.mobi /g'

This command takes anything of the form "Title - Author.mobi" and transforms it into "Title - Author.mobi Author - Title.mobi". The point here is to print the original file name, then print the destination file name, all on one line.

The next command is:

sed 's/\ /\\\ /g'

This command replaces all spaces with an escaped space, "\ ", which will prevent the need for double quotes (which is part of the reason xargs was choking on the output of the command that failed above).

The next command is:

sed "s/\'/\\\'/g"

This just escapes the single quotes in the file names.

The last command is:

sed 's/mobi\\/mobi/g'

and this command finds the escaped space that follows a ".mobi" extension (in "Title\ -\ Author.mobi\ Author\ -\ Title.mobi", the space separating the two filenames) and removes the escape character. This ensures that the two filenames are kept separate and distinct.

Finally, all of this is fed to xargs, which then feeds it to the mv command.

Certainly not the most elegant solution, but it was necessary because of some kind of problem with xargs, whitespaces, and double quotes.
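
One way to sidestep the quoting problem is to keep sed for computing the new name and let a shell loop call mv, so each file name travels as a single quoted argument. This is a sketch of that alternative, not the method used above; the scratch directory and the two sample files are assumptions.

```shell
workdir=$(mktemp -d)
touch "$workdir/The Witches - Roald Dahl.mobi" \
      "$workdir/Time's Arrow - Martin Amis.mobi"

for path in "$workdir"/*.mobi; do
  old=$(basename "$path")
  # Swap the parts around " - " and keep the extension
  new=$(printf '%s\n' "$old" | sed 's/^\(.*\) - \(.*\)\.mobi$/\2 - \1.mobi/')
  # Skip names the pattern did not change
  [ "$old" != "$new" ] && mv -- "$path" "$workdir/$new"
done

ls "$workdir"
```

Since the names are never re-parsed by xargs, spaces and apostrophes need no escaping.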

Renaming Files to put Path in Name

I wanted to move a bunch of files from:

Pictures/Centralia/some category/some category 1.jpg
Pictures/Centralia/some category/some category 2.jpg
Pictures/Centralia/some category/some category 3.jpg

Pictures/Centralia/another category/another category 1.jpg
Pictures/Centralia/another category/another category 2.jpg
Pictures/Centralia/another category/another category 3.jpg

to

Pictures/Centralia_some_category_1.jpg
...

Pictures/Centralia_another_category_1.jpg
...

To do this, I used the following command:

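find Centralia -name "*.jpg" | \
  sed -e 'p;s/\//_/g' \
      -e 's/ /_/g'    \
      -e 's/_\([a-z]\{1,\}_[a-z]\{1,\}_\)\{1,3\}\([0-9]\)/_\1\2/g' | \
  sed -e 's/^/"/' -e 's/$/"/' | \
  xargs -n2 mv

The find command finds all the files I wanted to rename.

The second line prints the original path (p) and then replaces slashes with underscores; the third line replaces spaces with underscores.

The fourth line looks for one to three repetitions of a two-word "some_category_" pattern followed by a digit, and keeps only the last repetition and the digit, which collapses the duplicated category name.

The second sed command wraps each line in double quotes, and xargs passes the lines two at a time (original path, new name) to mv.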
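
The name transformation can be checked on its own before any files are moved. The sketch below feeds two of the sample paths through the substitutions without the p command and without mv. It assumes the interval \{1,3\} on the repeated group, so that a category name duplicated once (as in the sample paths) is matched.

```shell
transform() {
  sed -e 's/\//_/g' \
      -e 's/ /_/g' \
      -e 's/_\([a-z]\{1,\}_[a-z]\{1,\}_\)\{1,3\}\([0-9]\)/_\1\2/g'
}

name1=$(printf '%s\n' 'Centralia/some category/some category 1.jpg' | transform)
name2=$(printf '%s\n' 'Centralia/another category/another category 2.jpg' | transform)

printf '%s\n' "$name1" "$name2"
```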

References

This page has more information on special/escape characters: http://sed.sourceforge.net/sedfaq6.html

One-line sed commands: http://sed.sourceforge.net/grabbag/tutorials/sed1line.txt

O'Reilly Sed/Awk book: https://docstore.mik.ua/orelly/unix/sedawk/
- Chapter 5: Basic sed commands http://docstore.mik.ua/orelly/unix/sedawk/ch05_01.htm
- Chapter 6: Advanced sed commands http://docstore.mik.ua/orelly/unix/sedawk/ch06_01.htm

Sed info file: http://www.gnu.org/software/sed/manual/sed.html#Common-Commands


From charlesreid1