Sed: Difference between revisions
From charlesreid1
Sed is a *nix system utility that ships with virtually every *nix system. It is a stream editor: it reads text line by line, applies editing commands (most often substitutions), and writes out the result, which can turn a whole lot of typing into a few lines of string manipulation. It can get ugly, but once you start to use it you'll wonder how you ever lived without it.
Sed introduction and tutorial: http://www.grymoire.com/Unix/Sed.html
Sed can be used to edit files in-place using the <code>-i</code> flag.
=Find and Replace= | |||
You can find and replace instances of a string in a file using:
$ sed -i -e 's/peanut butter/jelly/g;s/green eggs/ham/g' \
file{1,2,3}.txt
</source> | |||
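Because <code>-i</code> overwrites the files, it can be worth previewing a substitution script on a sample string first. The sketch below pipes a made-up sentence through the same two substitutions without touching any file; the function name <code>preview_replace</code> is hypothetical and only used for illustration.

```shell
# preview_replace: apply both substitutions to a sample line read from a pipe,
# so no file is modified. (The function name is illustrative, not from the page.)
preview_replace() {
  echo "peanut butter with green eggs" | \
    sed -e 's/peanut butter/jelly/g;s/green eggs/ham/g'
}

preview_replace
# prints: jelly with ham
```

Once the output looks right, the same script can be run with <code>-i</code> against the real files.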
=Sed Patterns= | |||
==Repeating Search Patterns in Replacements== | |||
If you are searching for a pattern, and want to repeat the pattern in the replacement pattern, you can surround it in (escaped) parentheses, like this: <code>\(pattern_to_repeat\)</code> | |||
This can then be put into the replacement pattern by using <code>\1</code>. An example: | |||
<source lang="bash"> | |||
$ echo "peanut butter and jelly" | \ | |||
sed -e 's/\(jelly\)/strawberry \1/' | |||
peanut butter and strawberry jelly | |||
</source> | |||
This can be done with an arbitrary number of patterns, e.g.: | |||
<source lang="bash"> | |||
$ echo "pattern1 pattern2 pattern3 pattern4 pattern5" | \ | |||
sed -e 's/\(pattern1\) \(pattern2\) \(pattern3\) \(pattern4\) \(pattern5\)/\5 \4 \3 \2 \1/' | |||
pattern5 pattern4 pattern3 pattern2 pattern1 | |||
</source> | |||
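Note that sed only supports the back-references <code>\1</code> through <code>\9</code>. In the example below, <code>\10</code> and <code>\11</code> are read as <code>\1</code> followed by a literal 0 or 1; the output only looks correct because the first group happens to be "p1":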
<source lang="bash"> | |||
echo "p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11" | \ | |||
sed -e 's/\(p1\) \(p2\) \(p3\) \(p4\) \(p5\) \(p6\) \(p7\) \(p8\) \(p9\) \(p10\) \(p11\)/\3 \2 \1 \4 \6 \5 \9 \8 \7 \10 \11/' | |||
p3 p2 p1 p4 p6 p5 p9 p8 p7 p10 p11 | |||
</source>
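A quick way to check how sed reads a two-digit reference such as <code>\10</code> is to capture ten single letters and ask for <code>\10</code> in the replacement. This is a minimal sketch; the function name <code>backref_demo</code> and the sample string are made up for illustration.

```shell
# Ten groups, each capturing one letter. If \10 meant "group 10" the result
# would be "j"; sed instead emits group 1 followed by a literal 0.
backref_demo() {
  echo "abcdefghij" | \
    sed -e 's/\(a\)\(b\)\(c\)\(d\)\(e\)\(f\)\(g\)\(h\)\(i\)\(j\)/\10/'
}

backref_demo
# prints: a0
```

If more than nine pieces need to be rearranged, one option is to split the work across several substitutions, each using at most nine groups.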
==Special/Escape Characters==
{{Main|Regular Expressions}}
NOTE: This section is specific to GNU sed; other versions of sed will likely behave differently.
</pre>
=Sed Commands= | |||
<!-- | |||
<pre> | |||
# | |||
[No addresses allowed.] | |||
The # character begins a comment; the comment continues until the next newline. | |||
If you are concerned about portability, be aware that some implementations of sed (which are not posix conformant) may only support a single one-line comment, and then only when the very first character of the script is a #. | |||
Warning: if the first two characters of the sed script are #n, then the -n (no-autoprint) option is forced. If you want to put a comment in the first line of your script and that comment begins with the letter ‘n’ and you do not want this behavior, then be sure to either use a capital ‘N’, or place at least one space before the ‘n’. | |||
q [exit-code] | |||
This command only accepts a single address. | |||
Exit sed without processing any more commands or input. Note that the current pattern space is printed if auto-print is not disabled with the -n options. The ability to return an exit code from the sed script is a GNU sed extension. | |||
d | |||
Delete the pattern space; immediately start next cycle. | |||
p | |||
Print out the pattern space (to the standard output). This command is usually only used in conjunction with the -n command-line option. | |||
n | |||
If auto-print is not disabled, print the pattern space, then, regardless, replace the pattern space with the next line of input. If there is no more input then sed exits without processing any more commands. | |||
{ commands } | |||
A group of commands may be enclosed between { and } characters. This is particularly useful when you want a group of commands to be triggered by a single address (or address-range) match. | |||
</pre> | |||
<pre>3.5 The s Command | |||
The syntax of the s (as in substitute) command is ‘s/regexp/replacement/flags’. The / characters may be uniformly replaced by any other single character within any given s command. The / character (or whatever other character is used in its stead) can appear in the regexp or replacement only if it is preceded by a \ character. | |||
The s command is probably the most important in sed and has a lot of different options. Its basic concept is simple: the s command attempts to match the pattern space against the supplied regexp; if the match is successful, then that portion of the pattern space which was matched is replaced with replacement. | |||
The replacement can contain \n (n being a number from 1 to 9, inclusive) references, which refer to the portion of the match which is contained between the nth \( and its matching \). Also, the replacement can contain unescaped & characters which reference the whole matched portion of the pattern space. Finally, as a GNU sed extension, you can include a special sequence made of a backslash and one of the letters L, l, U, u, or E. The meaning is as follows: | |||
\L | |||
Turn the replacement to lowercase until a \U or \E is found, | |||
\l | |||
Turn the next character to lowercase, | |||
\U | |||
Turn the replacement to uppercase until a \L or \E is found, | |||
\u | |||
Turn the next character to uppercase, | |||
\E | |||
Stop case conversion started by \L or \U. | |||
To include a literal \, &, or newline in the final replacement, be sure to precede the desired \, &, or newline in the replacement with a \. | |||
The s command can be followed by zero or more of the following flags: | |||
g | |||
Apply the replacement to all matches to the regexp, not just the first. | |||
number | |||
Only replace the numberth match of the regexp. | |||
Note: the posix standard does not specify what should happen when you mix the g and number modifiers, and currently there is no widely agreed upon meaning across sed implementations. For GNU sed, the interaction is defined to be: ignore matches before the numberth, and then match and replace all matches from the numberth on. | |||
p | |||
If the substitution was made, then print the new pattern space. | |||
Note: when both the p and e options are specified, the relative ordering of the two produces very different results. In general, ep (evaluate then print) is what you want, but operating the other way round can be useful for debugging. For this reason, the current version of GNU sed interprets specially the presence of p options both before and after e, printing the pattern space before and after evaluation, while in general flags for the s command show their effect just once. This behavior, although documented, might change in future versions. | |||
w file-name | |||
If the substitution was made, then write out the result to the named file. As a GNU sed extension, two special values of file-name are supported: /dev/stderr, which writes the result to the standard error, and /dev/stdout, which writes to the standard output.4 | |||
e | |||
This command allows one to pipe input from a shell command into pattern space. If a substitution was made, the command that is found in pattern space is executed and pattern space is replaced with its output. A trailing newline is suppressed; results are undefined if the command to be executed contains a nul character. This is a GNU sed extension. | |||
I | |||
i | |||
The I modifier to regular-expression matching is a GNU extension which makes sed match regexp in a case-insensitive manner. | |||
M | |||
m | |||
The M modifier to regular-expression matching is a GNU sed extension which causes ^ and $ to match respectively (in addition to the normal behavior) the empty string after a newline, and the empty string before a newline. There are special character sequences (\` and \') which always match the beginning or the end of the buffer. M stands for multi-line. | |||
</pre> | |||
<pre> | |||
Though perhaps less frequently used than those in the previous section, some very small yet useful sed scripts can be built with these commands. | |||
y/source-chars/dest-chars/ | |||
(The / characters may be uniformly replaced by any other single character within any given y command.) | |||
Transliterate any characters in the pattern space which match any of the source-chars with the corresponding character in dest-chars. | |||
Instances of the / (or whatever other character is used in its stead), \, or newlines can appear in the source-chars or dest-chars lists, provide that each instance is escaped by a \. The source-chars and dest-chars lists must contain the same number of characters (after de-escaping). | |||
a\ | |||
text | |||
As a GNU extension, this command accepts two addresses. | |||
Queue the lines of text which follow this command (each but the last ending with a \, which are removed from the output) to be output at the end of the current cycle, or when the next input line is read. | |||
Escape sequences in text are processed, so you should use \\ in text to print a single backslash. | |||
As a GNU extension, if between the a and the newline there is other than a whitespace-\ sequence, then the text of this line, starting at the first non-whitespace character after the a, is taken as the first line of the text block. (This enables a simplification in scripting a one-line add.) This extension also works with the i and c commands. | |||
i\ | |||
text | |||
As a GNU extension, this command accepts two addresses. | |||
Immediately output the lines of text which follow this command (each but the last ending with a \, which are removed from the output). | |||
c\ | |||
text | |||
Delete the lines matching the address or address-range, and output the lines of text which follow this command (each but the last ending with a \, which are removed from the output) in place of the last line (or in place of each line, if no addresses were specified). A new cycle is started after this command is done, since the pattern space will have been deleted. | |||
= | |||
As a GNU extension, this command accepts two addresses. | |||
Print out the current input line number (with a trailing newline). | |||
l n | |||
Print the pattern space in an unambiguous form: non-printable characters (and the \ character) are printed in C-style escaped form; long lines are split, with a trailing \ character to indicate the split; the end of each line is marked with a $. | |||
n specifies the desired line-wrap length; a length of 0 (zero) means to never wrap long lines. If omitted, the default as specified on the command line is used. The n parameter is a GNU sed extension. | |||
r filename | |||
As a GNU extension, this command accepts two addresses. | |||
Queue the contents of filename to be read and inserted into the output stream at the end of the current cycle, or when the next input line is read. Note that if filename cannot be read, it is treated as if it were an empty file, without any error indication. | |||
As a GNU sed extension, the special value /dev/stdin is supported for the file name, which reads the contents of the standard input. | |||
w filename | |||
Write the pattern space to filename. As a GNU sed extension, two special values of file-name are supported: /dev/stderr, which writes the result to the standard error, and /dev/stdout, which writes to the standard output.5 | |||
The file will be created (or truncated) before the first input line is read; all w commands (including instances of the w flag on successful s commands) which refer to the same filename are output without closing and reopening the file. | |||
D | |||
If pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the first newline, and restart cycle with the resultant pattern space, without reading a new line of input. | |||
N | |||
Add a newline to the pattern space, then append the next line of input to the pattern space. If there is no more input then sed exits without processing any more commands. | |||
P | |||
Print out the portion of the pattern space up to the first newline. | |||
h | |||
Replace the contents of the hold space with the contents of the pattern space. | |||
H | |||
Append a newline to the contents of the hold space, and then append the contents of the pattern space to that of the hold space. | |||
g | |||
Replace the contents of the pattern space with the contents of the hold space. | |||
G | |||
Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space. | |||
x | |||
Exchange the contents of the hold and pattern spaces. | |||
</pre> | |||
--> | |||
==Less Common Commands== | |||
===w command=== | |||
To write every line that matches a pattern to a file, use the <code>w</code> command:
<pre> | |||
$ cat list_file | |||
Phoenix | |||
New York City | |||
San Francisco | |||
Orlando | |||
Atlanta | |||
Seattle | |||
San Antonio | |||
St. Louis | |||
$ sed -n '/San/w search_results' list_file | |||
$ cat search_results | |||
San Francisco | |||
San Antonio | |||
</pre> | |||
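In the example above, <code>-n</code> only silences sed's normal output; the <code>w</code> command writes the matching lines either way. The sketch below illustrates this with a shortened city list and a temporary file created by <code>mktemp</code>; the function name <code>w_demo</code> is hypothetical.

```shell
# w_demo: run the w command WITHOUT -n. All four input lines still reach
# standard output (counted here with wc), while only the matching lines
# are written to the temporary file.
w_demo() {
  out=$(mktemp)
  printf '%s\n' "Phoenix" "San Francisco" "Orlando" "San Antonio" \
    | sed "/San/w $out" | wc -l | tr -d ' '
  cat "$out"
  rm -f "$out"
}

w_demo
# prints:
# 4
# San Francisco
# San Antonio
```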
===e command=== | |||
To run a shell command and insert its output before a matching line, use the <code>e</code> command (a GNU sed extension). For example, the contents of a small file (called new_item in this example) can be inserted into the output produced from the file list_file. Here are the two files:
<pre> | |||
$ cat list_file | |||
Phoenix | |||
New York | |||
San Francisco | |||
Orlando | |||
Atlanta | |||
Seattle | |||
San Antonio | |||
St. Louis | |||
$ cat new_item | |||
Boston | |||
</pre> | |||
Now we can run the command "cat new_item" on the line matching "/New York/". Its output is emitted before that line, so the line Boston appears above the line New York:
<pre> | |||
$ sed '/New York/e cat new_item' list_file | |||
Phoenix | |||
Boston | |||
New York | |||
San Francisco | |||
Orlando | |||
Atlanta | |||
Seattle | |||
San Antonio | |||
St. Louis | |||
</pre> | |||
=Examples= | |||
==Renaming files, case 1== | |||
I had a set of simulation outputs whose names looked like this: | |||
<pre> | |||
i8_j8_k8 | |||
i9_j9_k9 | |||
i10_j10_k10 | |||
i11_j11_k11 | |||
[...] | |||
i101_j101_k101 | |||
i102_j102_k102 | |||
i103_j103_k103 | |||
</pre> | |||
This became problematic, since, doing a string sort, these go out of order (e.g. i80 comes after i8). I wanted to rename them to be something like this: | |||
<pre> | |||
i008_j008_k008 | |||
i009_j009_k009 | |||
i010_j010_k010 | |||
i011_j011_k011 | |||
[...] | |||
i101_j101_k101 | |||
i102_j102_k102 | |||
i103_j103_k103 | |||
</pre> | |||
{{Ambox | |||
|image=none | |||
|type=delete | |||
|text=To do this, I used the following sed script: | |||
<source lang="bash"> | |||
#!/bin/sh | |||
ls -1c i* | /bin/sed \ | |||
-e 'p' \ | |||
-e 's/i\([0-9]\{1\}\)_/i00\1_/' \ | |||
-e 's/i\([0-9]\{2\}\)_/i0\1_/' \ | |||
-e 's/j\([0-9]\{1\}\)_/j00\1_/' \ | |||
-e 's/j\([0-9]\{2\}\)_/j0\1_/' \ | |||
-e 's/k\([0-9]\{1\}\)$/k00\1/' \ | |||
-e 's/k\([0-9]\{2\}\)$/k0\1/' \ | |||
| xargs -I {} -n2 -t mv | |||
</source> | |||
}} | |||
I will explain this three-part command, as follows: | |||
<source lang="bash"> | |||
ls -1c i* | |||
</source> | |||
This command will list all of the files, with one file name on each line. This is then piped to the sed command. | |||
<source lang="bash"> | |||
/bin/sed \ | |||
-e 'p' \ | |||
-e 's/i\([0-9]\{1\}\)_/i00\1_/' \ | |||
-e 's/i\([0-9]\{2\}\)_/i0\1_/' \ | |||
\ | |||
-e 's/j\([0-9]\{1\}\)_/j00\1_/' \ | |||
-e 's/j\([0-9]\{2\}\)_/j0\1_/' \ | |||
\ | |||
-e 's/k\([0-9]\{1\}\)$/k00\1/' \ | |||
-e 's/k\([0-9]\{2\}\)$/k0\1/' | |||
</source> | |||
This sed command has four parts. The first is the print statement, 'p': this prints the name of the file, before any manipulation is performed by sed. | |||
The next three parts transform the i's, j's, and k's into the desired form. The first substitution looks for an index of the form iN (where N is a single digit from 0-9) and replaces it with i00N, and the second looks for an index of the form iNN and replaces it with i0NN.
The syntax <code>\{1\}</code> means exactly 1 instance of the preceding regular expression; the syntax <code>\{2\}</code> means exactly 2 instances of the preceding regular expression; etc. (See the [[Regular expressions]] page).
The parentheses that surround the number pattern <code>\([0-9]\{1\}\)</code> are used to store the pattern, so that it can be inserted in the replacement string (this is what the <code>\1</code> does). | |||
Finally, the last part of the command is an [[Xargs]] command that will take two arguments at a time; the first argument is the original file name (printed with the sed 'p' command), and the second argument is the manipulated string (now in the desired format, iNNN_jNNN_kNNN). These are passed two at a time to the mv command. | |||
I put this in the file <code>script.sh</code> and ran it. The result is: | |||
<source lang="bash"> | |||
$ ./script.sh | |||
mv i8_j8_k8 i008_j008_k008 | |||
mv i9_j9_k9 i009_j009_k009 | |||
mv i10_j10_k10 i010_j010_k010 | |||
mv i11_j11_k11 i011_j011_k011 | |||
[...] | |||
mv i101_j101_k101 i101_j101_k101 | |||
</source> | |||
'''Postscript''': I had to modify this script and re-run it in the same directory, which caused a bunch of errors like this: | |||
<pre> | |||
mv: `i059_j072_k072' and `i059_j072_k072' are the same file | |||
mv i042_j072_k072 i042_j072_k072 | |||
mv: `i042_j072_k072' and `i042_j072_k072' are the same file | |||
mv i018_j072_k072 i018_j072_k072 | |||
mv: `i018_j072_k072' and `i018_j072_k072' are the same file | |||
mv i026_j072_k072 i026_j072_k072 | |||
mv: `i026_j072_k072' and `i026_j072_k072' are the same file | |||
mv i016_j072_k072 i016_j072_k072 | |||
mv: `i016_j072_k072' and `i016_j072_k072' are the same file | |||
mv i142_j072_k072 i142_j072_k072 | |||
mv: `i142_j072_k072' and `i142_j072_k072' are the same file | |||
mv i129_j072_k072 i129_j072_k072 | |||
mv: `i129_j072_k072' and `i129_j072_k072' are the same file | |||
mv i135_j072_k072 i135_j072_k072 | |||
mv: `i135_j072_k072' and `i135_j072_k072' are the same file | |||
mv i125_j072_k072 i125_j072_k072 | |||
mv: `i125_j072_k072' and `i125_j072_k072' are the same file | |||
mv i127_j072_k072 i127_j072_k072 | |||
mv: `i127_j072_k072' and `i127_j072_k072' are the same file | |||
mv i119_j072_k072 i119_j072_k072 | |||
mv: `i119_j072_k072' and `i119_j072_k072' are the same file | |||
mv i114_j072_k072 i114_j072_k072 | |||
mv: `i114_j072_k072' and `i114_j072_k072' are the same file | |||
mv i100_j072_k072 i100_j072_k072 | |||
mv: `i100_j072_k072' and `i100_j072_k072' are the same file | |||
</pre> | |||
The reason is that sed still prints two things for every file, the original filename and the transformed filename. If a filename was already in the padded form, none of the substitutions applied, so the two names were identical and mv was handed the same argument twice.
I ended up using [[Awk]] to check if the arguments being fed to mv were duplicates: see [[Awk#Renaming_Files.2C_If_Names_Not_Duplicates|Awk#Renaming Files, If Names Not Duplicates]] | |||
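The same guard can also be sketched in plain shell rather than Awk: compute the new name with the sed substitutions from above, and only call mv when the name actually changes. This is an alternative sketch, not the script used on this page; the function names <code>pad_name</code> and <code>rename_padded</code> are hypothetical.

```shell
# pad_name: zero-pad the i/j/k indices of ONE file name to three digits.
# Names that are already padded pass through unchanged.
pad_name() {
  printf '%s\n' "$1" | sed \
    -e 's/i\([0-9]\)_/i00\1_/' \
    -e 's/i\([0-9]\{2\}\)_/i0\1_/' \
    -e 's/j\([0-9]\)_/j00\1_/' \
    -e 's/j\([0-9]\{2\}\)_/j0\1_/' \
    -e 's/k\([0-9]\)$/k00\1/' \
    -e 's/k\([0-9]\{2\}\)$/k0\1/'
}

# rename_padded: rename matching files in the current directory, skipping
# any file whose name would not change (so re-running is harmless).
rename_padded() {
  for old in i*_j*_k*; do
    [ -e "$old" ] || continue          # glob matched nothing
    new=$(pad_name "$old")
    if [ "$old" != "$new" ]; then
      mv -- "$old" "$new"
    fi
  done
}

pad_name i8_j8_k8
# prints: i008_j008_k008
```

Calling <code>rename_padded</code> inside the output directory performs the renames; the block above only defines it.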
==Renaming files, case 2== | |||
I had a couple of ebook files that were named "Title - Author.mobi" and I needed to rename them to be "Author - Title.mobi". The challenge was that they had spaces in their names: | |||
<pre> | |||
My Inventions - Nickola Tessla.mobi | |||
My Uncle Oswald - Roald Dahl.mobi | |||
Myths to live by - Joseph Campbell.mobi | |||
Revolting rhymes - Roald Dahl.mobi | |||
Screwjack - Hunter S. Thompson.mobi | |||
The Hacker Crackdown - Bruce Sterling.mobi | |||
The Information - Martin Amis.mobi | |||
The Moronic Inferno & Other Visits to America - Martin Amis.mobi | |||
The Power of Myth - Joseph Campbell w_ Bill Moyers.mobi | |||
The Rachel Papers - Martin Amis.mobi | |||
The Rum Diary_ A Novel - Hunter S. Thompson.mobi | |||
The Silmarillion - J. R. R. Tolkien.mobi | |||
The Witches - Roald Dahl.mobi | |||
The critical period of American history, 1783-1789 - John Fiske.mobi | |||
The interpretation of dreams - Sigmund Freud.mobi | |||
The murder on the links - Agatha Christie.mobi | |||
The time machine - H. G. Wells.mobi | |||
The virtue of selfishness_ a new concept of egoism - Ayn Rand.mobi | |||
Time's Arrow - Martin Amis.mobi | |||
</pre> | |||
The trick for doing this was to parse the name into three pieces: the piece appearing before " - ", the piece appearing after " - " and before ".mobi", and the ".mobi" file extension. | |||
Combine the first two pieces <code>\1</code> and <code>\2</code> with double-quotes to reorder the title and author, and feed that new file name to xargs for the renaming. | |||
{{Ambox | |||
|image=none | |||
|type=delete | |||
|text=To do this, I used the following sed script: | |||
<source lang="bash"> | |||
#!/bin/sh | |||
ls -1 *.mobi \
| sed 's/\(.*\) - \(.*\)\.mobi/\1 - \2\.mobi \2 - \1\.mobi /g' \ | |||
| sed 's/\ /\\\ /g' \ | |||
| sed "s/\'/\\\'/g" \ | |||
| sed 's/mobi\\/mobi/g' \ | |||
| xargs -n2 mv | |||
</source> | |||
}} | |||
This was problematic because my initial script was not working: | |||
<source lang="bash"> | |||
$ ls -1 *.mobi \ | |||
| sed 's/\(.*\) - \(.*\)\.mobi/"\1 - \2\.mobi" "\2 - \1\.mobi" /g' \ | |||
| xargs -0 -n1 -I% mv % | |||
usage: mv [-f | -i | -n] [-v] source target | |||
mv [-f | -i | -n] [-v] source ... directory | |||
</source> | |||
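i.e. mv was not given a source and a target. The likely cause is the <code>-0</code> flag: it tells xargs to split its input only on NUL characters, but the output of ls and sed is newline-separated, so the entire listing arrives as a single item. With <code>-I%</code>, that item is substituted into a single argument, so mv sees only one operand and prints its usage message. Running it as plain "xargs -0" appears to work only because the default command (echo) prints that one item back unchanged.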
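How <code>-0</code> changes the way xargs splits its input can be checked with a small experiment. The sketch below counts how many times xargs invokes a trivial command for newline-separated versus NUL-separated input; the helper name <code>count_items</code> and the sample strings are made up for illustration.

```shell
# count_items: run a trivial command once per item xargs sees on stdin,
# then count the invocations.
count_items() {
  xargs -0 -n1 sh -c 'echo item' sh | wc -l | tr -d ' '
}

# Newline-separated input: with -0 there is no NUL to split on,
# so both lines arrive as ONE item.
printf 'a b\nc d\n' | count_items
# prints: 1

# NUL-separated input: two items, with the spaces kept inside each item.
printf 'a b\000c d\000' | count_items
# prints: 2
```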
So I ended up having to hack a solution, like so: | |||
<source lang="bash"> | |||
$ ls -1 *.mobi \ | |||
| sed 's/\(.*\) - \(.*\)\.mobi/\1 - \2\.mobi \2 - \1\.mobi /g' \ | |||
| sed 's/\ /\\\ /g' \ | |||
| sed "s/\'/\\\'/g" \ | |||
| sed 's/mobi\\/mobi/g' \ | |||
| xargs -n2 mv | |||
</source> | |||
This consists of 4 sed commands plus an xargs command. The 4 sed commands do the following: | |||
The first command is: | |||
<pre> | |||
sed 's/\(.*\) - \(.*\)\.mobi/\1 - \2\.mobi \2 - \1\.mobi /g' | |||
</pre> | |||
This command takes anything of the form "Title - Author.mobi" and transforms it into "Title - Author.mobi Author - Title.mobi". The point here is to print the original file name, then print the destination file name, all on one line. | |||
The next command is: | |||
<pre> | |||
sed 's/\ /\\\ /g' | |||
</pre> | |||
This command replaces all spaces with an escaped space, "<code>\ </code>", which will prevent the need for double quotes (which is part of the reason xargs was choking on the output of the command that failed above). | |||
The next command is: | |||
<pre> | |||
sed "s/\'/\\\'/g" | |||
</pre> | |||
This just escapes the single quotes in the file names. | |||
The last command is: | |||
<pre> | |||
sed 's/mobi\\/mobi/g' | |||
</pre> | |||
and this command finds the escaped space that follows ".mobi" (in "Title\ -\ Author.mobi\ Author\ -\ Title.mobi", this is the space separating the two filenames), and removes the escape character. This ensures that the two filenames are kept separate and distinct.
Finally, all of this is fed to xargs, which then feeds it to the mv command. | |||
Certainly not the most elegant solution, but it was necessary because of some kind of problem with xargs, whitespaces, and double quotes. | |||
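One way to sidestep the xargs quoting problem altogether is a plain shell loop: sed computes the new name for one file at a time, and the shell passes both names to mv as quoted variables, so spaces and apostrophes need no escaping. This is an alternative sketch rather than the approach used above; the function names <code>swap_title_author</code> and <code>rename_mobi</code> are hypothetical.

```shell
# swap_title_author: turn "Title - Author.mobi" into "Author - Title.mobi".
# The first \(.*\) is greedy, so the split happens at the LAST " - ".
swap_title_author() {
  printf '%s\n' "$1" | sed 's/^\(.*\) - \(.*\)\.mobi$/\2 - \1.mobi/'
}

# rename_mobi: rename every .mobi file in the current directory.
# Quoting "$old" and "$new" keeps each name as a single mv argument.
rename_mobi() {
  for old in *.mobi; do
    [ -e "$old" ] || continue          # glob matched nothing
    new=$(swap_title_author "$old")
    [ "$old" = "$new" ] || mv -- "$old" "$new"
  done
}

swap_title_author "Time's Arrow - Martin Amis.mobi"
# prints: Martin Amis - Time's Arrow.mobi
```

Calling <code>rename_mobi</code> in the ebook directory performs the renames; the block above only defines it.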
==Renaming Files to put Path in Name== | |||
I wanted to move a bunch of files from: | |||
<pre> | |||
Pictures/Centralia/some category/some category 1.jpg | |||
Pictures/Centralia/some category/some category 2.jpg | |||
Pictures/Centralia/some category/some category 3.jpg | |||
Pictures/Centralia/another category/another category 1.jpg | |||
Pictures/Centralia/another category/another category 2.jpg | |||
Pictures/Centralia/another category/another category 3.jpg | |||
</pre> | |||
to | |||
<pre> | |||
Pictures/Centralia_some_category_1.jpg | |||
... | |||
Pictures/Centralia_another_category_1.jpg | |||
... | |||
</pre> | |||
To do this, I used the following command: | |||
<pre> | |||
find Centralia -name "*.jpg" | \ | |||
sed -e 'p;s/\//_/g' \ | |||
-e 's/ /_/g' \ | |||
-e 's/_\([a-z]\{1,\}_[a-z]\{1,\}_\)\{3\}\([0-9]\)/_\1\2/g' | \ | |||
sed -e 's/^/"/' -e 's/$/"/' | \ | |||
xargs -n2 mv | |||
</pre> | |||
The find command finds all the files I wanted to rename. | |||
The second line prints the original path and then replaces slashes with underscores; the third line replaces spaces with underscores.
The fourth line looks for an instance of a one-to-three "some category" | |||
=References= | |||
* This page has more information on special/escape characters: http://sed.sourceforge.net/sedfaq6.html | |||
* One-line sed commands: http://sed.sourceforge.net/grabbag/tutorials/sed1line.txt | |||
* O'Reilly Sed/Awk book: https://docstore.mik.ua/orelly/unix/sedawk/ | |||
** Chapter 5: Basic sed commands http://docstore.mik.ua/orelly/unix/sedawk/ch05_01.htm | |||
** Chapter 6: Advanced sed commands http://docstore.mik.ua/orelly/unix/sedawk/ch06_01.htm | |||
* Sed info file: http://www.gnu.org/software/sed/manual/sed.html#Common-Commands | |||
{{Programs}} | |||
{{Unix Programs}} | |||
{{Languages}} | |||
Latest revision as of 19:39, 30 March 2019
Sed is a *nix system utility that ships with virtually every *nix system. It is a stream editor: it reads text line by line, applies editing commands (most often substitutions), and writes out the result, which can turn a whole lot of typing into a few lines of string manipulation. It can get ugly, but once you start to use it you'll wonder how you ever lived without it.
Sed introduction and tutorial: http://www.grymoire.com/Unix/Sed.html
Editing Files In-Place
Sed can be used to edit files in-place using the -i flag.
Find and Replace
You can find and replace instances of a string in a file using:
$ sed -i -e 's/peanut butter/jelly/g' file{1,2,3}.txt
This replaces peanut butter with jelly in file1.txt, file2.txt, and file3.txt. To replace more than one thing, use
$ sed -i -e 's/peanut butter/jelly/g' \
-e 's/green eggs/ham/g' \
-e 's/water/wine/g' \
file{1,2,3}.txt
or, more succinctly:
$ sed -i -e 's/peanut butter/jelly/g;s/green eggs/ham/g' \
file{1,2,3}.txt
Sed Patterns
Repeating Search Patterns in Replacements
If you are searching for a pattern, and want to repeat the pattern in the replacement pattern, you can surround it in (escaped) parentheses, like this: \(pattern_to_repeat\)
This can then be put into the replacement pattern by using \1. An example:
$ echo "peanut butter and jelly" | \
sed -e 's/\(jelly\)/strawberry \1/'
peanut butter and strawberry jelly
This can be done with an arbitrary number of patterns, e.g.:
$ echo "pattern1 pattern2 pattern3 pattern4 pattern5" | \
sed -e 's/\(pattern1\) \(pattern2\) \(pattern3\) \(pattern4\) \(pattern5\)/\5 \4 \3 \2 \1/'
pattern5 pattern4 pattern3 pattern2 pattern1
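Note that sed only supports the back-references \1 through \9. In the example below, \10 and \11 are read as \1 followed by a literal 0 or 1; the output only looks correct because the first group happens to be "p1":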
echo "p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11" | \
sed -e 's/\(p1\) \(p2\) \(p3\) \(p4\) \(p5\) \(p6\) \(p7\) \(p8\) \(p9\) \(p10\) \(p11\)/\3 \2 \1 \4 \6 \5 \9 \8 \7 \10 \11/'
p3 p2 p1 p4 p6 p5 p9 p8 p7 p10 p11
Special/Escape Characters
NOTE: This section is specific to GNU sed, other versions of sed will likely behave differently.
Sometimes you want to look for generic patterns, like "four numbers in a row", rather than something specific, like "5555". This can be done using special/escape characters.
Numerical Characters
To match any number between 0 and 9, use [0-9], like this:
$ echo "5" | sed -e 's/[0-9]/replacement/'
replacement
To match a pattern of N numbers between 0 and 9, use \{N\}, like this:
$ echo "5678" | sed -e 's/[0-9]\{4\}/replacement/'
replacement
If you want to match a pattern of numbers between 0 and 9, and know there will be somewhere between M and N numbers, you can use the syntax \{M,N\}. For example, if you want to replace a number between 2 and 4 digits long:
$ echo "56" | sed -e 's/[0-9]\{2,4\}/replacement/'
replacement
$ echo "5234678" | sed -e 's/[0-9]\{2,4\}/replacement/'
replacement678
$ echo "5" | sed -e 's/[0-9]\{2,4\}/replacement/'
5
Note that in the last command, no replacement occurs: the input contains only a single digit, which is shorter than the required run of 2 to 4 digits.
Since \{M,N\} is ugly and burdensome to type, you can use the sed flag -r or --regexp-extended to eliminate the need for backslashes:
$ echo "5234678" | sed -e 's/[0-9]\{2,4\}/replacement/'
replacement678
$ echo "5234678" | sed -re 's/[0-9]{2,4}/replacement/'
replacement678
To leave the upper bound of the number size unspecified, use \{N,\} (written {N,} when the -r flag is used, as in the examples below):
$ echo "52" | sed -re 's/[0-9]{2,}/replacement/'
replacement
$ echo "5234678" | sed -re 's/[0-9]{2,}/replacement/'
replacement
$ echo "5223902949082309448792387234" | sed -re 's/[0-9]{2,}/replacement/'
replacement
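Two behaviours of these interval patterns are easy to check on sample input: the g flag applies the pattern to every qualifying run of digits, and a bounded interval such as \{2,4\} takes the leftmost, longest match. In this minimal sketch the function names and sample strings are made up for illustration, and & in the replacement stands for the matched text.

```shell
# every_run: replace each run of two or more digits; single digits are kept.
every_run() {
  echo "a1 b22 c333 d4444" | sed -e 's/[0-9]\{2,\}/#/g'
}

# show_match: wrap the matched text in <> to see what \{2,4\} consumed.
show_match() {
  echo "5234678" | sed -e 's/[0-9]\{2,4\}/<&>/'
}

every_run
# prints: a1 b# c# d#
show_match
# prints: <5234>678
```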
Sed Commands
Less Common Commands
w command
To write every line that matches a pattern to a file, use the w command:
$ cat list_file
Phoenix
New York City
San Francisco
Orlando
Atlanta
Seattle
San Antonio
St. Louis
$ sed -n '/San/w search_results' list_file
$ cat search_results
San Francisco
San Antonio
e command
To run a shell command and insert its output before a matching line, use the e command (a GNU sed extension). For example, the contents of a small file (called new_item in this example) can be inserted into the output produced from the file list_file. Here are the two files:
$ cat list_file
Phoenix
New York
San Francisco
Orlando
Atlanta
Seattle
San Antonio
St. Louis
$ cat new_item
Boston
Now we can prepend the contents of the command "cat new_item" to the result of the search "/New York/", which results in the line Boston appearing above the line New York:
$ sed '/New York/e cat new_item' list_file
Phoenix
Boston
New York
San Francisco
Orlando
Atlanta
Seattle
San Antonio
St. Louis
Examples
Renaming files, case 1
I had a set of simulation outputs whose names looked like this:
i8_j8_k8
i9_j9_k9
i10_j10_k10
i11_j11_k11
[...]
i101_j101_k101
i102_j102_k102
i103_j103_k103
This became problematic, since, doing a string sort, these go out of order (e.g. i80 comes after i8). I wanted to rename them to be something like this:
i008_j008_k008
i009_j009_k009
i010_j010_k010
i011_j011_k011
[...]
i101_j101_k101
i102_j102_k102
i103_j103_k103
To do this, I used the following sed script:
#!/bin/sh
ls -1c i* | /bin/sed \
-e 'p' \
-e 's/i\([0-9]\{1\}\)_/i00\1_/' \
-e 's/i\([0-9]\{2\}\)_/i0\1_/' \
-e 's/j\([0-9]\{1\}\)_/j00\1_/' \
-e 's/j\([0-9]\{2\}\)_/j0\1_/' \
-e 's/k\([0-9]\{1\}\)$/k00\1/' \
-e 's/k\([0-9]\{2\}\)$/k0\1/' \
| xargs -I {} -n2 -t mv
I will explain this three-part command, as follows:
ls -1c i*
This command will list all of the files, with one file name on each line. This is then piped to the sed command.
/bin/sed \
-e 'p' \
-e 's/i\([0-9]\{1\}\)_/i00\1_/' \
-e 's/i\([0-9]\{2\}\)_/i0\1_/' \
\
-e 's/j\([0-9]\{1\}\)_/j00\1_/' \
-e 's/j\([0-9]\{2\}\)_/j0\1_/' \
\
-e 's/k\([0-9]\{1\}\)$/k00\1/' \
-e 's/k\([0-9]\{2\}\)$/k0\1/'
This sed command has four parts. The first is the print statement, 'p': this prints the name of the file, before any manipulation is performed by sed.
The next three parts transform the i's, j's, and k's into the desired form. The first substitution looks for an index of the form iN (where N is a single digit from 0-9) and replaces it with i00N, and the second looks for an index of the form iNN and replaces it with i0NN.
The syntax \{1\} means exactly 1 instance of the preceding regular expression; the syntax \{2\} means exactly 2 instances of the preceding regular expression; etc. (See the Regular expressions page).
The parentheses that surround the number pattern \([0-9]\{1\}\) are used to store the pattern, so that it can be inserted in the replacement string (this is what the \1 does).
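The substitutions can be checked on their own before any file is renamed. The sketch below is a dry run that pipes a few of the sample names through the same six expressions; the function name pad_names is made up for illustration and no files are touched.

```shell
# Dry run: apply the zero-padding substitutions to sample names only.
# pad_names is a hypothetical helper name; nothing is renamed here.
pad_names() {
    sed \
        -e 's/i\([0-9]\{1\}\)_/i00\1_/' \
        -e 's/i\([0-9]\{2\}\)_/i0\1_/' \
        -e 's/j\([0-9]\{1\}\)_/j00\1_/' \
        -e 's/j\([0-9]\{2\}\)_/j0\1_/' \
        -e 's/k\([0-9]\{1\}\)$/k00\1/' \
        -e 's/k\([0-9]\{2\}\)$/k0\1/'
}

# One-digit, two-digit, and three-digit sample names.
padded=$(printf '%s\n' i8_j8_k8 i10_j10_k10 i101_j101_k101 | pad_names)
printf '%s\n' "$padded"
```

Names that already have three digits pass through unchanged, because neither the one-digit nor the two-digit pattern matches them.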
Finally, the last part of the command is an Xargs command that will take two arguments at a time; the first argument is the original file name (printed with the sed 'p' command), and the second argument is the manipulated string (now in the desired format, iNNN_jNNN_kNNN). These are passed two at a time to the mv command.
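The pairing done by xargs can be seen in isolation. A minimal sketch, with echo standing in for mv so that nothing is actually moved:

```shell
# xargs -n2 groups its input two items at a time.
# "echo mv" only prints the command that would be run.
pairs=$(printf '%s\n' i8_j8_k8 i008_j008_k008 i9_j9_k9 i009_j009_k009 \
    | xargs -n2 echo mv)
printf '%s\n' "$pairs"
```

Each output line holds one original name followed by one transformed name, which is the argument order mv expects.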
I put this in the file script.sh and ran it. The result is:
$ ./script.sh
mv i8_j8_k8 i008_j008_k008
mv i9_j9_k9 i009_j009_k009
mv i10_j10_k10 i010_j010_k010
mv i11_j11_k11 i011_j011_k011
[...]
mv i101_j101_k101 i101_j101_k101
Postscript: I had to modify this script and re-run it in the same directory, which caused a bunch of errors like this:
mv: `i059_j072_k072' and `i059_j072_k072' are the same file
mv i042_j072_k072 i042_j072_k072
mv: `i042_j072_k072' and `i042_j072_k072' are the same file
mv i018_j072_k072 i018_j072_k072
mv: `i018_j072_k072' and `i018_j072_k072' are the same file
mv i026_j072_k072 i026_j072_k072
mv: `i026_j072_k072' and `i026_j072_k072' are the same file
mv i016_j072_k072 i016_j072_k072
mv: `i016_j072_k072' and `i016_j072_k072' are the same file
mv i142_j072_k072 i142_j072_k072
mv: `i142_j072_k072' and `i142_j072_k072' are the same file
mv i129_j072_k072 i129_j072_k072
mv: `i129_j072_k072' and `i129_j072_k072' are the same file
mv i135_j072_k072 i135_j072_k072
mv: `i135_j072_k072' and `i135_j072_k072' are the same file
mv i125_j072_k072 i125_j072_k072
mv: `i125_j072_k072' and `i125_j072_k072' are the same file
mv i127_j072_k072 i127_j072_k072
mv: `i127_j072_k072' and `i127_j072_k072' are the same file
mv i119_j072_k072 i119_j072_k072
mv: `i119_j072_k072' and `i119_j072_k072' are the same file
mv i114_j072_k072 i114_j072_k072
mv: `i114_j072_k072' and `i114_j072_k072' are the same file
mv i100_j072_k072 i100_j072_k072
mv: `i100_j072_k072' and `i100_j072_k072' are the same file
The reason is that sed still prints two lines per file, the original name and the transformed name. A name that is already in the iNNN_jNNN_kNNN form matches none of the substitutions, so both lines are identical and mv receives the same name as source and target.
I ended up using Awk to check if the arguments being fed to mv were duplicates: see Awk#Renaming Files, If Names Not Duplicates
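For comparison, the same check can be sketched in sed alone. This is not the Awk approach referred to above, only an alternative under the assumption that file names contain no newlines; changed_pairs is a made-up helper name. It saves the original name in the hold space, applies the substitutions, and prints an "old new" pair only when the two differ.

```shell
# Print "old new" pairs only for names that the substitutions actually change.
# changed_pairs is a hypothetical helper; nothing is renamed here.
changed_pairs() {
    sed -n \
        -e 'h' \
        -e 's/i\([0-9]\{1\}\)_/i00\1_/' \
        -e 's/i\([0-9]\{2\}\)_/i0\1_/' \
        -e 's/j\([0-9]\{1\}\)_/j00\1_/' \
        -e 's/j\([0-9]\{2\}\)_/j0\1_/' \
        -e 's/k\([0-9]\{1\}\)$/k00\1/' \
        -e 's/k\([0-9]\{2\}\)$/k0\1/' \
        -e 'G' \
        -e '/^\(.*\)\n\1$/d' \
        -e 's/^\(.*\)\n\(.*\)$/\2 \1/p'
}

# The second name is already padded, so it should produce no pair.
to_rename=$(printf '%s\n' i8_j8_k8 i008_j008_k008 | changed_pairs)
printf '%s\n' "$to_rename"
```

After the G command the pattern space holds the new name, a newline, and the old name; the back-reference test deletes the line when both halves are equal, and the final substitution puts the old name first.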
Renaming files, case 2
I had a couple of ebook files that were named "Title - Author.mobi" and I needed to rename them to be "Author - Title.mobi". The challenge was that they had spaces in their names:
My Inventions - Nickola Tessla.mobi
My Uncle Oswald - Roald Dahl.mobi
Myths to live by - Joseph Campbell.mobi
Revolting rhymes - Roald Dahl.mobi
Screwjack - Hunter S. Thompson.mobi
The Hacker Crackdown - Bruce Sterling.mobi
The Information - Martin Amis.mobi
The Moronic Inferno & Other Visits to America - Martin Amis.mobi
The Power of Myth - Joseph Campbell w_ Bill Moyers.mobi
The Rachel Papers - Martin Amis.mobi
The Rum Diary_ A Novel - Hunter S. Thompson.mobi
The Silmarillion - J. R. R. Tolkien.mobi
The Witches - Roald Dahl.mobi
The critical period of American history, 1783-1789 - John Fiske.mobi
The interpretation of dreams - Sigmund Freud.mobi
The murder on the links - Agatha Christie.mobi
The time machine - H. G. Wells.mobi
The virtue of selfishness_ a new concept of egoism - Ayn Rand.mobi
Time's Arrow - Martin Amis.mobi
The trick for doing this was to parse the name into three pieces: the piece appearing before " - ", the piece appearing after " - " and before ".mobi", and the ".mobi" file extension.
Combine the first two pieces \1 and \2 with double-quotes to reorder the title and author, and feed that new file name to xargs for the renaming.
To do this, I used the following sed script:
#!/bin/sh
ls -1 *.mobi \
| sed 's/\(.*\) - \(.*\)\.mobi/\1 - \2\.mobi \2 - \1\.mobi /g' \
| sed 's/\ /\\\ /g' \
| sed "s/\'/\\\'/g" \
| sed 's/mobi\\/mobi/g' \
| xargs -n2 mv
This was problematic because my initial script was not working:
$ ls -1 *.mobi \
| sed 's/\(.*\) - \(.*\)\.mobi/"\1 - \2\.mobi" "\2 - \1\.mobi" /g' \
| xargs -0 -n1 -I% mv %
usage: mv [-f | -i | -n] [-v] source target
mv [-f | -i | -n] [-v] source ... directory
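i.e. mv was not given a source and a target. The likely cause is the -0 flag: it tells xargs to split its input on NUL characters, but the output of sed is newline-separated, so the whole listing arrives as a single item and mv never sees two operands. Running "xargs -0" with no command prints everything as expected, which hides the problem, because the single item is simply echoed back.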
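The effect of -0 on newline-separated input can be seen without involving mv. A small sketch (count_args is a made-up helper) that reports how many arguments xargs hands to a command, first with newline-separated input and then with the newlines converted to NUL characters:

```shell
# count_args is a hypothetical helper: it prints the number of arguments
# that "xargs -0" passes to the command it runs.
count_args() {
    xargs -0 sh -c 'echo "$#"' sh
}

# Newline-separated input: -0 sees no NUL, so everything is one item.
newline_count=$(printf 'one two\nthree four\n' | count_args)

# NUL-separated input: each line becomes its own item.
nul_count=$(printf 'one two\nthree four\n' | tr '\n' '\0' | count_args)

printf 'newline-separated: %s\nNUL-separated: %s\n' "$newline_count" "$nul_count"
```

With newline-separated input the count is 1, which is consistent with mv complaining about its arguments in the failed command above.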
So I ended up having to hack a solution, like so:
$ ls -1 *.mobi \
| sed 's/\(.*\) - \(.*\)\.mobi/\1 - \2\.mobi \2 - \1\.mobi /g' \
| sed 's/\ /\\\ /g' \
| sed "s/\'/\\\'/g" \
| sed 's/mobi\\/mobi/g' \
| xargs -n2 mv
This consists of 4 sed commands plus an xargs command. The 4 sed commands do the following:
The first command is:
sed 's/\(.*\) - \(.*\)\.mobi/\1 - \2\.mobi \2 - \1\.mobi /g'
This command takes anything of the form "Title - Author.mobi" and transforms it into "Title - Author.mobi Author - Title.mobi". The point here is to print the original file name, then print the destination file name, all on one line.
The next command is:
sed 's/\ /\\\ /g'
This command replaces all spaces with an escaped space, "\ ", which will prevent the need for double quotes (which is part of the reason xargs was choking on the output of the command that failed above).
The next command is:
sed "s/\'/\\\'/g"
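This escapes the single quotes in the file names. Note that in GNU sed the sequence \' inside a pattern is an anchor for the end of the pattern space rather than a literal quote, so with GNU sed the pattern should be written as a bare ' instead.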
The last command is:
sed 's/mobi\\/mobi/g'
This command removes the backslash that follows "mobi", that is, the escape on the space that separates the two file names (as in "Title\ -\ Author.mobi\ Author\ -\ Title.mobi"). With that one space left unescaped, the two file names are kept separate and distinct.
Finally, all of this is fed to xargs, which then feeds it to the mv command.
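The whole chain can be tried as a dry run on a single name. The sketch below uses one of the file names listed earlier, writes the space-escaping step in a portable form ('s/ /\\ /g'), leaves out the quote-escaping step because the sample name has no apostrophe, and lets printf stand in for mv so nothing is renamed.

```shell
# Dry run of the escape-and-pair chain on one sample name.
name='My Uncle Oswald - Roald Dahl.mobi'

plan=$(printf '%s\n' "$name" \
    | sed 's/\(.*\) - \(.*\)\.mobi/\1 - \2.mobi \2 - \1.mobi /' \
    | sed 's/ /\\ /g' \
    | sed 's/mobi\\/mobi/g' \
    | xargs -n2 printf 'mv [%s] [%s]\n')

# The brackets show where each argument begins and ends.
printf '%s\n' "$plan"
```

The brackets make it visible that xargs turns each escaped space back into a plain space and splits only at the unescaped space between the two names.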
Certainly not the most elegant solution, but it was necessary because of some kind of problem with xargs, whitespaces, and double quotes.
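An alternative that sidesteps xargs quoting altogether is to let the shell loop over the files and use sed only to compute the new name. This is a sketch of a different technique, not the method used above; swap_name is a made-up helper name.

```shell
# swap_name is a hypothetical helper:
# "Title - Author.mobi" becomes "Author - Title.mobi".
swap_name() {
    printf '%s\n' "$1" | sed 's/^\(.*\) - \(.*\)\.mobi$/\2 - \1.mobi/'
}

# Try it on a name that contains both spaces and an apostrophe.
swapped=$(swap_name "Time's Arrow - Martin Amis.mobi")
printf '%s\n' "$swapped"

# To rename for real, quote both names so spaces and apostrophes survive:
# for f in *.mobi; do mv -- "$f" "$(swap_name "$f")"; done
```

Because both names are passed to mv as quoted shell variables, no escaping of spaces or quotes is needed.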
Renaming Files to put Path in Name
I wanted to move a bunch of files from:
Pictures/Centralia/some category/some category 1.jpg
Pictures/Centralia/some category/some category 2.jpg
Pictures/Centralia/some category/some category 3.jpg
Pictures/Centralia/another category/another category 1.jpg
Pictures/Centralia/another category/another category 2.jpg
Pictures/Centralia/another category/another category 3.jpg
to
Pictures/Centralia_some_category_1.jpg
...
Pictures/Centralia_another_category_1.jpg
...
To do this, I used the following command:
find Centralia -name "*.jpg" | \
sed -e 'p;s/\//_/g' \
-e 's/ /_/g' \
-e 's/_\([a-z]\{1,\}_[a-z]\{1,\}_\)\{3\}\([0-9]\)/_\1\2/g' | \
sed -e 's/^/"/' -e 's/$/"/' | \
xargs -n2 mv
The find command finds all the files I wanted to rename.
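The second line prints the original path (so that it can serve as the source argument for mv) and then replaces slashes with underscores; the third line replaces spaces with underscores.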
The fourth line looks for an instance of a one-to-three "some category"
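A dry run on one of the sample paths shows what the slash and space substitutions produce. The last expression in this sketch is a simplified stand-in for the fourth line of the command above, not a copy of it: it assumes the category name appears twice in a row and collapses the repeat with a back-reference. flatten_path is a made-up helper name.

```shell
# Dry run on one sample path; flatten_path is a hypothetical helper name.
flatten_path() {
    # 1) slashes to underscores, 2) spaces to underscores,
    # 3) collapse "_X_X_" into "_X_" (assumed stand-in for the original step).
    sed -e 's/\//_/g' \
        -e 's/ /_/g' \
        -e 's/_\([a-z_]*\)_\1_/_\1_/'
}

flat=$(printf '%s\n' 'Centralia/some category/some category 1.jpg' | flatten_path)
printf '%s\n' "$flat"
```

Without the third expression the result would still contain the category twice, as in Centralia_some_category_some_category_1.jpg.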
References
- This page has more information on special/escape characters: http://sed.sourceforge.net/sedfaq6.html
- One-line sed commands: http://sed.sourceforge.net/grabbag/tutorials/sed1line.txt
- O'Reilly Sed/Awk book: https://docstore.mik.ua/orelly/unix/sedawk/
- Chapter 5: Basic sed commands http://docstore.mik.ua/orelly/unix/sedawk/ch05_01.htm
- Chapter 6: Advanced sed commands http://docstore.mik.ua/orelly/unix/sedawk/ch06_01.htm