From charlesreid1

Revision as of 01:06, 6 April 2011

xargs is a very handy command line utility that allows you to perform operations on a list. For example, you can take a list of files and tell xargs to run the "rm" command on the list to remove each file on the list. Or you can use xargs to unzip a list of files. Think of it as a quick-and-dirty way to do a bash loop like

for i in `cat list_of_files`; do
command $i
done

except with the flexibility to do more complex things with either the input or the output of the loop. It has the additional advantage that it splits a long argument list into batches that fit within the system's limit on command-line length. To understand why this is an advantage, see wikipedia:xargs - the example they give is for the "rm" command, which fails with an "Argument list too long" error when handed an extremely long list of files. To circumvent this, xargs will run the "rm" command as many times as needed, each time with as many file names as will fit.
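A quick way to see how xargs groups its arguments is to put "echo" in front of the command, so that each constructed command line is printed rather than run. This is only a small sketch; the file names are placeholders and nothing is actually deleted.

```shell
# Three arguments arrive on stdin, one per line.
args=$(printf '%s\n' file1 file2 file3)

# Default behaviour: xargs packs as many arguments as fit into one command line.
batched=$(printf '%s\n' "$args" | xargs echo rm)

# With -n 1, xargs starts a new command for every single argument.
one_each=$(printf '%s\n' "$args" | xargs -n 1 echo rm)

printf 'batched:\n%s\n\none per call:\n%s\n' "$batched" "$one_each"
```

With only three short names everything fits into a single "rm" call; the splitting into several calls only kicks in when the list is too long for one command line, or when -n asks for it.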

Usage

Basic Usage

xargs is almost always called to receive information from a pipe.

One of the most convenient uses of xargs is to grep through files, or to remove lists of files.

For example, to search through C++ code files for the word "turbulence," you could use xargs as follows:

$ find . -name "*.cc" | xargs grep "turbulence"

./Arches.cc:  //db->require("turbulence_model", turbModel);
./BoundaryCondition.cc:    // --- new turbulence inlet flow generator --- 
./BoundaryCondition.cc:      turb_db->get("turbulence_intensity",intensity);
./CompDynamicProcedure.cc:// Schedule recomputation of the turbulence sub model 
./CompDynamicProcedure.cc:// Schedule recomputation of the turbulence sub model 
./CompLocalDynamicProcedure.cc:// Schedule recomputation of the turbulence sub model 
./CompLocalDynamicProcedure.cc:// Schedule recomputation of the turbulence sub model 
./EnthalpySolver.cc:    // if it is set in turbulence model
./IncDynamicProcedure.cc:// Schedule recomputation of the turbulence sub model 
./IncDynamicProcedure.cc:// Schedule recomputation of the turbulence sub model 
./OdtClosure.cc:// Schedule recomputation of the turbulence sub model 
./ReactiveScalarSolver.cc:    // if it is set in turbulence model
./ScalarSolver.cc:    // if it is set in turbulence model
./ScalarSolver.cc:    // we only need to set mixture fraction Pr number in turbulence model
./ScaleSimilarityModel.cc:// Schedule recomputation of the turbulence sub model 
./SmagorinskyModel.cc:// Schedule recomputation of the turbulence sub model 
./SmagorinskyModel.cc:// Schedule recomputation of the turbulence sub model 
./SmagorinskyModel.cc:// Schedule recomputation of the turbulence sub model

Or, to remove a set of files whose names are listed in a file, you could use:

$ cat list_file | xargs rm

REMEMBER: before running "xargs rm" with ANYTHING, you should first run the part of the pipeline that comes before it by itself, or run "xargs echo" in place of "xargs rm", to print the names of the files that will be removed. This prevents you from shooting yourself in the foot:

for i in $EXTREMITIES; do
  rm $i;
done
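One way to do that kind of dry run is sketched here, in a scratch directory so nothing important is at risk. The file names (a.txt, b.txt, keep.txt) and the use of mktemp are assumptions made for the illustration.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
touch a.txt b.txt keep.txt
printf '%s\n' a.txt b.txt > list_file

# Dry run: echo prints the command that xargs would build instead of running it.
preview=$(xargs echo rm < list_file)
echo "would run: $preview"

# Real run, once the preview looks right.
xargs rm < list_file
remaining=$(ls)
echo "left over: $remaining"
```

Reading the list with "< list_file" does the same job as "cat list_file |" with one process fewer.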

Creating Files

I mentioned there are lots of creative uses for xargs. One example would be to create a bunch of files from a list of file names:

$ cat list_of_files | xargs touch

Or you may want to touch a bunch of files matching certain criteria. For example, if you're on a filesystem that deletes files older than X days, and you want to touch any directory that is X-1 days old, you can give find arguments so that it returns directories that are X-1 days old, and then feed that list to "xargs touch".
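A rough sketch of that idea follows, run against a scratch directory. The directory names and the 6-day threshold are made up for the example, and "find -mtime +N" selects entries last modified more than N whole days ago, so the number would need adjusting to the real purge policy.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/stale_dir" "$work/fresh_dir"
# Backdate one directory so it looks old (POSIX touch -t, arbitrary timestamp).
touch -t 200001010000 "$work/stale_dir"

max_age=6   # hypothetical threshold, in days

# Directories that find considers too old, before and after the touch.
stale_before=$(find "$work" -type d -mtime +"$max_age")
find "$work" -type d -mtime +"$max_age" -print0 | xargs -0 touch
stale_after=$(find "$work" -type d -mtime +"$max_age")

echo "before: $stale_before"
echo "after:  $stale_after"
```

Note that if find matches nothing, some xargs implementations still run "touch" once with no arguments, which produces a harmless usage error.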

Double-Checking Xargs Commands

The -p (lowercase p) and -t flags can be very useful in making sure that you are executing the commands correctly.

Using the -p flag causes xargs to print out each command that it will execute and ask for confirmation from the user.

Using the -t flag causes xargs to print out each command that it will execute to stderr before it executes the command.
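Because -t writes to stderr, the trace can be separated from the command's normal output. A small sketch, assuming a scratch directory for the two capture files (the names "out" and "trace" are arbitrary):

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# -t writes each constructed command to stderr; the command's own output
# still goes to stdout, so the two streams can be captured separately.
printf '%s\n' file1 file2 | xargs -t echo > "$work/out" 2> "$work/trace"

trace=$(cat "$work/trace")
out=$(cat "$work/out")
echo "stderr: $trace"
echo "stdout: $out"
```

The -p flag is not shown here because it waits for an interactive answer on the terminal.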


Advanced Usage

Dealing with Newline Characters

One of the problems with combining find with xargs is if you've got spaces or newlines or weird characters in your list of files. In this case, you can use some arguments for find and xargs to prevent you from screwing stuff up:

$ find . -name "*.cc" -print0 | xargs -0 grep "something"

The -print0 argument for find is explained in the man page for find (man find):

     -print0
             This primary always evaluates to true.  It prints the pathname of the current file to standard output, 
             followed by an ASCII NUL character (character code 0).

Likewise, the -0 argument for xargs is explained (in man xargs):

     -0      Change xargs to expect NUL (``\0'') characters as separators, instead of spaces and newlines.  This is 
             expected to be used in concert with the -print0 function in find(1).
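The difference is easy to reproduce with a file name that contains a space. The sketch below uses a scratch directory and two made-up files; only "my file.cc" contains the search word.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
printf 'something\n' > 'my file.cc'
printf 'unrelated\n' > other.cc

# Without -print0/-0 the name is split into "./my" and "file.cc", which do not
# exist, so grep finds nothing (its error messages are discarded here).
broken=$(find . -name '*.cc' | xargs grep -l 'something' 2>/dev/null) || true

# With NUL separators the name reaches grep as a single argument.
matches=$(find . -name '*.cc' -print0 | xargs -0 grep -l 'something')

echo "without -0: '$broken'"
echo "with -0:    '$matches'"
```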

Manipulating the Argument List

Sometimes you don't just want to append the argument list to the end of a command. You might want to insert the name of the file somewhere in the middle of the command - "mv" and "cp" are examples of this. For example, you don't say "cp file" (which is what would happen if you pipe the names of files to "xargs cp"), you say "cp file /destination/dir/".

You can insert the name of the file by using the -I flag. From the xargs manpage (man xargs):

     -I replstr
             Execute utility for each input line, replacing one or more occurrences of replstr in up to 
             replacements (or 5 if no -R flag is specified) arguments to utility with the entire line of 
             input.  The resulting arguments, after replacement is done, will not be allowed to grow beyond 
             255 bytes; this is implemented by concatenating as much of the argument containing replstr as 
             possible, to the constructed arguments to utility, up to 255 bytes.  The 255 byte limit does 
             not apply to arguments to utility which do not contain replstr, and furthermore, no replacement 
             will be done on utility itself. 
             Implies -x.

So, an example of this would be:

$ find . -name "*.cc" -print0 | xargs -0 -I {} cp {} /temp/.

This would copy all of the *.cc files returned by find into the /temp/ directory. Note that this would destroy the directory structure. If you want to preserve the directory structure, you would probably want to use plain old "cp -R".
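The flattening can be seen in a small, self-contained sketch. The src/ and dest/ directories and the file names are invented for the illustration; the real command above copies into /temp/.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/sub dir" "$work/dest"
printf 'a\n' > "$work/src/top.cc"
printf 'b\n' > "$work/src/sub dir/nested.cc"

# {} is replaced by one path per invocation, so cp runs once per file.
find "$work/src" -name '*.cc' -print0 | xargs -0 -I {} cp {} "$work/dest/"

# Both files end up side by side; "sub dir" itself is not recreated.
copied=$(ls "$work/dest")
echo "$copied"
```

One consequence of the flattening is that two files with the same name in different subdirectories would overwrite each other in the destination.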

-I vs. -J

-I and -J seem at first blush to be redundant - both replace strings, both allow for slightly more complex commands to be executed. However, there is a very big difference:

Using "xargs -J rplstr" will cause xargs to insert all of the arguments being piped to it, all at once, in place of the first rplstr it finds in the command.

Using "xargs -I rplstr" will cause xargs to run the command once per input line, inserting that one line wherever it finds rplstr.

This becomes more clear with some examples. Let's say we have a list of file names that we want to create:

$ cat list_of_files
file1
file2
file3

We can pipe this list to "xargs touch" to create the files.

$ cat list_of_files | xargs -t touch
touch file1 file2 file3

This appends everything read from stdin to the end of the command. The -J argument will do the exact same thing, except instead of appending everything to the end of the command, it will insert it wherever rplstr shows up.

This example copies all the files that were just touched (file1, file2, file3) into one directory higher:

$ cat list_of_files | xargs -t -J % cp % ../
cp file1 file2 file3 ../
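The -J flag is a BSD extension and is not available in GNU xargs. Where it is missing, a similar "all arguments in the middle of the command" effect can be sketched with a small inline shell script. The scratch directories and the ../dest/ destination here are assumptions made for the example.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/src" "$work/dest"
cd "$work/src" || exit 1
printf '%s\n' file1 file2 file3 > list_of_files
xargs touch < list_of_files

# "$@" expands to every argument xargs appended, so they all land before the
# destination. The lone "sh" after the script only fills $0 of the inner shell.
xargs sh -c 'cp "$@" ../dest/' sh < list_of_files

copied=$(ls ../dest)
echo "$copied"
```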

But now let's say we want to rename each file, from "file1" to "new_file1", "file2" to "new_file2", and "file3" to "new_file3":

$ cat list_of_files | xargs -t -J % mv % new_%
mv file1 file2 file3 new_%
usage: mv [-f | -i | -n] [-v] source target
       mv [-f | -i | -n] [-v] source ... directory

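Oops - that doesn't work as expected. This is because -J plops everything fed to xargs down in place of the first stand-alone % symbol, and leaves "new_%" untouched, since the replacement string is only recognized when it is an argument all by itself.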

Using -I fixes the problem. First, touching files with -I shows that something different is happening:

$ cat list_of_files | xargs -t -I % touch %
touch file1
touch file2
touch file3

Each file is being processed, one at a time.

Likewise:

$ cat list_of_files | xargs -t -I % cp % ../
cp file1 ../
cp file2 ../
cp file3 ../

This shows each file is copied, one at a time.

Now when the move command is given, it works:

$ cat list_of_files | xargs -t -I % mv % new_%
mv file1 new_file1
mv file2 new_file2
mv file3 new_file3

The -I flag can also handle spaces:

$ cat list_of_files2
file name one
file name two
file name three

$ cat list_of_files2 | xargs -t -I {} touch {}
touch file name one
touch file name two
touch file name three

$ ls -1
file name one
file name three
file name two
list_of_files
list_of_files2

The -I argument correctly creates three files named "file name one", "file name two", and "file name three", whereas -J would mess this command up and create files named "file", "name", "one", "two", and "three":

$ cat list_of_files2 | xargs -t -J {} touch {}
touch file name one file name two file name three

$ ls -1
file
list_of_files
list_of_files2
name
one
three
two

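Furthermore, the -I flag can be combined with -n N, which makes xargs split the input into words and substitute them N at a time. (The output below is from BSD xargs; GNU xargs treats -n and -I as mutually exclusive.)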

$ cat list_of_files2 | xargs -t -n1 -I {} touch {}
touch file
touch name
touch one
touch file
touch name
touch two
touch file
touch name
touch three

$ cat list_of_files2 | xargs -t -n2 -I {} touch {}
touch file name
touch one
touch file name
touch two
touch file name
touch three

Furthermore, you can use the replacement string fed to -I in double-quotes, whereas it doesn't work exactly right using the replacement string fed to -J:

$ cat list_of_files2 | xargs -I {} echo "Hello from {}"
Hello from file name one
Hello from file name two
Hello from file name three

$ cat list_of_files2 | xargs -J {} echo "Hello from {}"
Hello from {} file name one file name two file name three

Doing More Than One Thing With Input List

Let's say you want to do two or three things with the list of inputs coming to xargs. In this case, you can pipe xargs commands to more xargs commands.

Example:

$ cat list_of_files
/path/to/file name one
/path/to/file name two
/path/to/file name three

What if we want to create files named "file name one", "file name two", and "file name three" in the current directory, leaving off the path?

The path can be removed using the "basename" utility, and the file can be created using "touch". But the normal/intuitive way these might be chained together doesn't work:

$ cat list_of_files | xargs -t -I {} touch `basename {}`
touch /path/to/file name one
touch: /path/to/file name one: No such file or directory
touch /path/to/file name two
touch: /path/to/file name two: No such file or directory
touch /path/to/file name three
touch: /path/to/file name three: No such file or directory

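This doesn't work because the shell evaluates the back-ticks before xargs ever runs: `basename {}` simply returns {}, so what xargs actually receives is "touch {}", and the full path is substituted in.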

To solve this problem, xargs can be piped to xargs:

$ cat list_of_files | xargs -t -I {} basename {} | xargs -t -I {} touch {}
basename /path/to/file name one
touch file name one
basename /path/to/file name two
touch file name two
basename /path/to/file name three
touch file name three

$ ls -1
file name one
file name three
file name two
list_of_files
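An alternative to chaining two xargs calls is to hand each line to a small inline shell script, so that "basename" runs after the substitution rather than before it. This is only a sketch; the scratch directory is an assumption, and the paths are the same made-up ones as above.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
printf '%s\n' '/path/to/file name one' '/path/to/file name two' > list_of_files

# The calling shell runs the back-ticks before xargs starts, so xargs would
# only ever see the literal result, which is "{}" again.
seen_by_xargs=`basename {}`
echo "back-ticks produce: $seen_by_xargs"

# Single quotes defer the substitution to the inner sh, which runs once per
# input line and receives that line as $1.
xargs -I {} sh -c 'touch "$(basename "$1")"' sh {} < list_of_files

created=$(ls)
echo "$created"
```

Passing the line as $1 rather than pasting {} into the script text keeps odd characters in file names from being interpreted as shell code.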


Xargs Jujitsu

SVN

I use xargs in pipes to do some handy stuff. For example, if I am in a directory that is part of an SVN repository, I can run the command "svn status" to tell me about files that have been changed, etc.:

$ svn status .
M      Core/Grid/Patch.h
M      Core/Grid/Level.cc
M      Core/Grid/Level.h
M      Core/Grid/sub.mk
?      Core/Grid/AMR_CoarsenRefine.cc
?      Core/Grid/LevelP.h
?      Core/Grid/AMR_CoarsenRefine.h
M      Core/Math/Matrix3.cc
M      Core/Math/Matrix3.h
M      Core/Parallel/Parallel.cc
M      Core/Parallel/Parallel.h
M      Core/Containers/Handle.h
M      Core/Containers/RunLengthEncoder.h

Those "?"s mean the files have not been added to the repository yet. In this case, there are only three, so it would be straightforward to copy-and-paste the file names into an "svn add" command to add the files to the repository. But the smart solution is to make the computer work for you!

First, use grep to reduce the list only to files that haven't been added to the repository yet:

$ svn status . | grep "?"
?      Core/Grid/AMR_CoarsenRefine.cc
?      Core/Grid/LevelP.h
?      Core/Grid/AMR_CoarsenRefine.h

Then use Awk to strip the leading "?":

$ svn status . | grep "?" | awk -F" " '{print $2}'
Core/Grid/AMR_CoarsenRefine.cc
Core/Grid/LevelP.h
Core/Grid/AMR_CoarsenRefine.h

And finally, use xargs to pass the file names to "svn add", which adds them to the repository:

$ svn status . | grep "?" | awk -F" " '{print $2}' | xargs svn add
A      Core/Grid/AMR_CoarsenRefine.cc
A      Core/Grid/LevelP.h
A      Core/Grid/AMR_CoarsenRefine.h
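The pipeline above can be made a little more defensive. The sketch below works on canned, hypothetical "svn status" output so that it runs without a repository, and it uses "echo svn add" as a dry run. It anchors the match to the status column, so a modified file with "?" in its name is skipped, and it uses NUL separators so that a name containing a space stays in one piece.

```shell
# Hypothetical "svn status" output; the "M" line has a "?" in its file name.
status_output='M      Core/Grid/what?.h
?      Core/Grid/LevelP.h
?      Core/Grid/new file.cc'

# Keep only lines whose status column is "?", strip the marker and its padding,
# then switch to NUL separators before handing the names to xargs.
preview=$(printf '%s\n' "$status_output" \
  | grep '^?' \
  | sed 's/^? *//' \
  | tr '\n' '\0' \
  | xargs -0 echo svn add)

echo "$preview"
```

Replacing "echo svn add" with "svn add", and the canned text with a real "svn status .", would turn the dry run into the real thing.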

Or, if we wanted to delete them (this would be the case if we wanted to "clean out" and revert a directory to the repository version), we could replace "svn add" with "rm":

$ svn status . | grep "?" | awk -F" " '{print $2}' | xargs rm

REMEMBER: before you run the "rm" command on anything, leave off the "xargs" portion, and make sure that you're slicing-and-dicing correctly (doing everything up until the awk command will just print the names of the files on which "rm" will operate).