Listing 15. Examples of using the xargs tool
~/tmp $ ls -1 | xargs
December_Report.pdf README a
~/tmp $ ls -1 | xargs file
December_Report.pdf: PDF document, version 1.3
README: ASCII text
a: directory
: POSIX tar archive
: Bourne shell script text executable
~/tmp $
The xargs command is useful for more than passing file names. You can also use it any time you need to filter text into a single line:
Listing 16. Good Habits 7 Example: Use the xargs tool to filter text into a single line
~/tmp $ ls -l | xargs
-rw-r--r-- 7 joe joe 12043 Jan 27 20:36 December_Report.pdf -rw-r--r-- 1 \
root root 238 Dec 03 08:19 README drwxr-xr-x 38 joe joe 354082 Nov 02 \
16:07 a -rw-r--r-- 3 joe joe 5096 Dec 14 14:26 -rwxr-xr-x 1 \
joe joe 3239 Sep 30 12:40
~/tmp $
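The same joining behavior can be seen without any files at all. The following is a minimal sketch using made-up input (the words alpha, beta, and gamma are placeholders, not from the listings above); it relies on xargs running echo when no command is given:

```shell
# Three lines of sample input, joined by xargs into a single line.
# With no command named, xargs passes its input to echo.
joined=$(printf 'alpha\nbeta\ngamma\n' | xargs)
echo "$joined"
```

The three input lines come out as one space-separated line.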
Use xargs with caution
Technically, there is a rare situation in which xargs can get you into trouble. By default, the end-of-file string is an underscore (_); if that character is sent as a single input argument, everything after it is ignored. This default varies between implementations: GNU xargs, for instance, uses no end-of-file string unless one is specified. As a precaution, use the -e flag, which, given without an argument, turns off the end-of-file string completely.
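As a hedged sketch of that precaution: the example below uses the POSIX spelling of the option, -E with an empty string, rather than the -e form named above, on the assumption that your xargs accepts it (GNU and BSD versions do). The input words are invented for illustration.

```shell
# A lone underscore sits in the middle of the input.
# -E '' disables the logical end-of-file string, so nothing after it is dropped.
guarded=$(printf 'one\n_\ntwo\n' | xargs -E '' echo)
echo "$guarded"
```

On an implementation that still treats the underscore as the end-of-file string, running the same pipeline without -E '' would stop after the first word.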
Know when grep should do the counting, and when it should step aside
Avoid piping grep to wc -l to count the number of lines of output. The -c option to grep gives a count of the lines that match the specified pattern and is generally faster than a pipe to wc, as shown in the following example:
Listing 17. Good Habits 8 Examples: Line Count with and without grep
~ $ time grep and tmp/a/ | wc -l
2811
real 0m0.097s
user 0m0.006s
sys 0m0.032s
~ $ time grep -c and tmp/a/
2811
real 0m0.013s
user 0m0.006s
sys 0m0.005s
~ $
In addition to the speed factor, the -c option is also a better way to do the counting. With multiple files, grep with the -c option returns a separate count for each file, one per line, whereas a pipe to wc gives a single total for all files combined.
However, regardless of speed considerations, this example showcases another common error to avoid. These counting methods only give the number of lines that contain the matched pattern; if that is what you are looking for, that's fine. But when a line can contain multiple instances of the pattern, these methods do not give you a true count of the number of instances matched. To count instances, you have to use wc after all. First, run grep with the -o option, if your version supports it. This option outputs only the matched text, one match per line, and not the lines themselves. It cannot be combined with the -c option to count instances, because -c still counts matching lines, so pipe the output to wc -l instead, as shown in the following example:
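The per-file versus combined behavior can be reproduced with two small throwaway files. This is only a sketch: the file names one.txt and two.txt and their contents are invented for illustration and are created in a temporary directory.

```shell
# Build two small sample files in a scratch directory.
dir=$(mktemp -d)
printf 'salt and pepper\nplain\n' > "$dir/one.txt"
printf 'this and that\nhere and there\nnone\n' > "$dir/two.txt"

# grep -c reports one count per file, prefixed with the file name.
per_file=$(cd "$dir" && grep -c and one.txt two.txt)

# Piping to wc -l collapses everything into a single total.
# tr strips the padding that some wc versions add.
total=$(cd "$dir" && grep and one.txt two.txt | wc -l | tr -d ' ')

echo "$per_file"
echo "$total"
rm -rf "$dir"
```

Here grep -c prints a separate line for each file, while the pipe to wc prints only the sum.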
Listing 18. Good Habits 8 Example: Counting pattern instances using grep
~ $ grep -o and tmp/a/ | wc -l
3402
~ $
In this case, a call to wc is slightly faster than a second call to grep with a dummy pattern put in to match and count each line (such as grep -c).
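To see how far the two counts can diverge, consider a small invented sample in which some lines contain the pattern more than once. This sketch assumes a grep that supports -o (GNU and BSD versions do):

```shell
# Four lines of sample text; "and" appears several times on some lines.
sample='sand and land
no match here
and
band stand'

# -c counts lines that contain at least one match.
line_count=$(printf '%s\n' "$sample" | grep -c and)

# -o prints each match on its own line, so wc -l counts instances.
match_count=$(printf '%s\n' "$sample" | grep -o and | wc -l | tr -d ' ')

echo "lines: $line_count, instances: $match_count"
```

Three lines match, but the pattern occurs six times in total, which is the number only the -o pipeline reports.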
Match certain fields in output, not just lines
A tool such as awk is preferable to grep when you want to match a pattern only in specific fields of the output lines.
The simplified example below demonstrates how to list only files modified in December.
Listing 19. Example of Bad Habits 9: Use grep to find patterns in specific fields
~/tmp $ ls -l /tmp/a/b/c | grep Dec
-rw-r--r-- 7 joe joe 12043 Jan 27 20:36 December_Report.pdf
-rw-r--r-- 1 root root 238 Dec 03 08:19 README
-rw-r--r-- 3 joe joe 5096 Dec 14 14:26
~/tmp $
In this example, grep filters the lines and outputs every file with Dec in either its modification date or its name. Therefore, a file such as December_Report.pdf is matched, even though it has not been modified since January. This is probably not what you want. To match a pattern in a particular field, it is better to use awk, where a relational operator tests that exact field, as shown in the following example:
Listing 20. Good Habits 9 Example: Use awk to find patterns in specific fields
~/tmp $ ls -l | awk '$6 == "Dec"'
-rw-r--r-- 3 joe joe 5096 Dec 14 14:26
-rw-r--r-- 1 root root 238 Dec 03 08:19 README
~/tmp $
For more details on how to use awk, see Resources.
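The contrast between the two approaches can be checked against fixed text instead of a live directory. The sketch below feeds three lines, modeled on the listings above, to both tools; it assumes the month is the sixth whitespace-separated field and the file name the ninth, which holds for this sample but depends on the ls output format of your system.

```shell
# Fixed sample modeled on ls -l output, so results do not depend on real files.
listing='-rw-r--r-- 7 joe joe 12043 Jan 27 20:36 December_Report.pdf
-rw-r--r-- 1 root root 238 Dec 03 08:19 README
drwxr-xr-x 38 joe joe 354082 Nov 02 16:07 a'

# grep matches "Dec" anywhere on the line, including inside a file name.
grep_hits=$(printf '%s\n' "$listing" | grep -c Dec)

# awk compares only field 6 (the month) and prints field 9 (the name).
awk_names=$(printf '%s\n' "$listing" | awk '$6 == "Dec" { print $9 }')

echo "grep matched $grep_hits lines; awk matched: $awk_names"
```

grep reports two lines because December_Report.pdf matches by name, while awk selects only README.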
Stop piping cat
A common basic usage error with grep is to pipe the output of cat to grep in order to search the contents of a single file. This is unnecessary and a waste of time, because tools such as grep take file names as arguments. You do not need cat in this situation at all, as shown in the following example:
Listing 21. Examples of Good and Bad Habits 10: Using grep with and without cat
~ $ time cat tmp/a/ | grep and
2811
real 0m0.015s
user 0m0.003s
sys 0m0.013s
~ $ time grep and tmp/a/
2811
real 0m0.010s
user 0m0.006s
sys 0m0.004s
~ $
This mistake applies to many tools. Because most tools accept a hyphen (-) as an argument that stands for standard input, even the case for using cat to intersperse multiple files with stdin is often not valid. Concatenating before a pipe is really only necessary when you use cat with one of its several filtering options.
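These three cases can be sketched with a throwaway file. The file name notes.txt and its contents are invented for illustration; the sketch also assumes that your grep treats a lone hyphen as standard input and that your cat supports the -n line-numbering option, both of which are common but not guaranteed everywhere.

```shell
dir=$(mktemp -d)
printf 'first and second\nthird\n' > "$dir/notes.txt"

# Needless cat: the pipe adds a process but changes nothing.
with_cat=$(cat "$dir/notes.txt" | grep -c and)

# Same result with the file name passed straight to grep.
without_cat=$(grep -c and "$dir/notes.txt")

# A hyphen stands for standard input, so a file and stdin mix without cat.
mixed=$(printf 'stdin and more\n' | grep and "$dir/notes.txt" - | wc -l | tr -d ' ')

# A case where cat earns its pipe: -n numbers the lines before filtering.
numbered=$(cat -n "$dir/notes.txt" | grep third)

echo "$with_cat $without_cat $mixed"
echo "$numbered"
rm -rf "$dir"
```

The first two counts are identical, the hyphen form picks up one match from the file and one from stdin, and only the last pipeline needs cat, because the line numbers come from cat itself.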
Conclusion: Develop good habits
It is good to examine your command-line habits for any bad usage patterns. Bad usage patterns slow you down and often lead to unexpected errors. This article presents 10 new habits that can help you break away from many of the most common usage errors. Picking up these good habits is a positive step toward sharpening your UNIX command-line skills.