SoFunction
Updated on 2025-03-10

Usage of the awk command in Linux

Let's start with an example:
Given a file a, compute the average of the floating-point numbers in its first column, counting only the lines whose first field actually is a floating-point number. With awk this takes just one line:
$cat a
1.021 33
1#.ll   44
2.53 6
ss    7

awk 'BEGIN{total = 0;len = 0} {if($1~/^[0-9]+\.[0-9]*/){total += $1; len++}} END{print total/len}' a
(Analysis: $1~/^[0-9]+\.[0-9]*/ tests whether $1 matches the regular expression between the slashes. If it matches, $1 is added to total and len is incremented, i.e. the count of numbers goes up by 1. In the regular expression "^[0-9]+\.[0-9]*", "^[0-9]+" means the field starts with one or more digits, "\." is an escaped "." matching the literal decimal point, and "[0-9]*" means zero or more digits after it.)
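The one-liner can be checked end to end with a scratch input file (the name sample.txt is just for illustration):

```shell
# Create a scratch file matching the example data above.
cat > sample.txt <<'EOF'
1.021 33
1#.ll   44
2.53 6
ss    7
EOF

# Only lines whose first field looks like a float (1.021 and 2.53) count,
# so the expected average is (1.021 + 2.53) / 2 = 1.7755.
avg=$(awk 'BEGIN{total = 0; len = 0}
           {if($1~/^[0-9]+\.[0-9]*/){total += $1; len++}}
           END{print total/len}' sample.txt)
echo "$avg"
rm -f sample.txt
```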

The general syntax of awk is:
awk [-parameter variable] 'BEGIN{initialization} condition1{action1} condition2{action2} ... END{post-processing}'
The statements in the BEGIN block run before the input file is read, and those in the END block run after it has been read completely; think of them as initialization and wrap-up.

(1) Parameter description:
-F re: sets awk's field separator to the regular expression re
-v var=$v: assigns the value of the shell variable $v to the awk variable var. To pass several variables, write several -v options, one per assignment.
For example, to print the lines of file a from line num through line num+num1:
awk -v num=$num -v num1=$num1 'NR==num,NR==num+num1{print}' a
-f progfile: makes awk load and execute the program file progfile, which of course must be a valid awk program.
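A minimal sketch of the -f form, saving the averaging logic from the opening example into its own program file (the names prog.awk and nums.txt are arbitrary):

```shell
# Save the awk program in a file, then run it with -f.
cat > prog.awk <<'EOF'
BEGIN { total = 0; len = 0 }
$1 ~ /^[0-9]+\.[0-9]*/ { total += $1; len++ }
END { print total/len }
EOF

printf '1.5 a\n2.5 b\nxx c\n' > nums.txt
result=$(awk -f prog.awk nums.txt)   # (1.5 + 2.5) / 2 = 2
echo "$result"
rm -f prog.awk nums.txt
```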

(2) Awk built-in variables:
ARGC    number of command-line arguments
ARGV    array of command-line arguments
ARGIND  index in ARGV of the file currently being processed (a gawk extension)
Suppose there are two files, a and b:
awk '{if(ARGIND==1){print "process file a"} if(ARGIND==2){print "process file b"}}' a b
The files are processed in order: file a is scanned first, then file b.

NR     number of records read so far (cumulative across all input files)
FNR    number of records read so far in the current file
The above example can also be written like this:
awk 'NR==FNR{print "process file a"} NR > FNR{print "process file b"}' a b
With input files a and b, a is scanned first, so NR==FNR holds throughout a. When b is scanned, FNR restarts from 1 while NR keeps counting past the lines of a, hence NR > FNR.
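The NR==FNR trick can be verified with two throwaway files (a.txt and b.txt are illustrative names):

```shell
# Two small input files; NR==FNR is true only while the first file is read.
printf 'x\ny\n' > a.txt
printf 'z\n' > b.txt

out=$(awk 'NR==FNR{print "process file a"} NR > FNR{print "process file b"}' a.txt b.txt)
echo "$out"
rm -f a.txt b.txt
```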

To display lines 10 to 15 of the file
awk 'NR==10,NR==15{print}' a

FS: input field separator (default is a space), equivalent to the -F option
awk -F ':' '{print}' a    is equivalent to    awk 'BEGIN{FS=":"}{print}' a

OFS: output field separator (default is a space)
awk -F ':' 'BEGIN{OFS=";"}{print $1,$2,$3}' b
If cat b is
1:2:3
4:5:6
then, with OFS set to ";", the output is
1;2;3
4;5;6
(Side note: awk refers to the split fields as $1, $2, $3, ..., while $0 denotes the entire record, usually a whole line.)
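Reproducing the OFS example with a scratch file (b.txt is an illustrative name):

```shell
# Re-join colon-separated fields with ";" as the output separator.
printf '1:2:3\n4:5:6\n' > b.txt
out=$(awk -F ':' 'BEGIN{OFS=";"}{print $1,$2,$3}' b.txt)
echo "$out"
rm -f b.txt
```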

NF: The number of fields in the current record
The output of awk -F ':' '{print NF}' b is
3
3
This means each line of b, split on the delimiter ":", has 3 fields.
NF can be used to keep only the lines with the expected number of fields, which is handy for filtering out malformed lines:
awk -F ':' '{if (NF == 3)print}' b
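A quick check of the NF filter against input with a malformed line (b.txt is an illustrative name):

```shell
# Keep only well-formed lines with exactly 3 colon-separated fields;
# the line "broken" has NF == 1 and is dropped.
printf '1:2:3\nbroken\n4:5:6\n' > b.txt
out=$(awk -F ':' '{if (NF == 3) print}' b.txt)
echo "$out"
rm -f b.txt
```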

RS: input record separator, default is "\n"
By default awk treats each line as one record; if RS is set, awk splits records on RS instead.
For example, given a file c whose content (cat c) is
hello world; I want to go swimming tomorrow;hiahia
The result of running awk 'BEGIN{ RS = ";" } {print}' c is
hello world
I want to go swimming tomorrow
hiahia
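The RS example is easy to reproduce with a scratch file (c.txt is an illustrative name):

```shell
# Split records on ";" instead of the default newline.
printf 'hello world;I want to go swimming tomorrow;hiahia\n' > c.txt
out=$(awk 'BEGIN{ RS = ";" } {print}' c.txt)
echo "$out"
rm -f c.txt
```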
Used together, RS and FS let awk handle more input formats — for example, processing several lines as one record. Say the content of file d (cat d) is
1 2
3 4 5

6 7
8 9 10
11 12

hello
Records are separated by blank lines and fields by newlines. The awk program is easy to write:
awk 'BEGIN{ FS = "\n"; RS = ""} {print NF}' d outputs
2
3
1

ORS: output record separator, default is a newline; it is what gets printed after each print statement
awk 'BEGIN{ FS = "\n"; RS = ""; ORS = ";"} {print NF}' d outputs
2;3;1; (note the trailing ";": ORS is appended after every print, including the last one)
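Paragraph mode plus ORS, reproduced with a scratch file (d.txt is an illustrative name):

```shell
# Blank lines separate records, newlines separate fields, and ORS=";"
# joins the per-record field counts onto one line.
printf '1 2\n3 4 5\n\n6 7\n8 9 10\n11 12\n\nhello\n' > d.txt
out=$(awk 'BEGIN{ FS = "\n"; RS = ""; ORS = ";"} {print NF}' d.txt)
echo "$out"
rm -f d.txt
```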

(3) Reading shell variables in awk
This can be done with the -v option:
     $b=1
     $cat f
     apple

$awk -v var=$b '{print var, $var}' f
1 apple
As for passing a variable from awk back to the shell: the shell invoking awk forks a child process, and a child cannot set variables in its parent. The only way back is through output capture or redirection (including pipes), for example:
a=$(awk '{print $b, '$b'}' f)
$echo $a
apple 1
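Both directions in one sketch: a shell variable passed into awk with -v, and awk's output captured back via command substitution (f.txt is an illustrative name):

```shell
# b goes into awk as var; awk's output comes back into out.
b=1
printf 'apple\n' > f.txt
out=$(awk -v var="$b" '{print var, $var}' f.txt)   # var=1, so $var is $1
echo "$out"
rm -f f.txt
```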

(4) Output redirection

Output redirection in awk is similar to the shell's: ">" writes to the target file and ">>" appends to it, and the target file name must be a double-quoted string.
$awk '$4 >=70 {print $1,$2 > "destfile" }' filename
$awk '$4 >=70 {print $1,$2 >> "destfile" }' filename
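A runnable sketch of in-awk redirection, using made-up data (scores.txt and destfile are illustrative names):

```shell
# Lines whose 4th field is >= 70 have their first two fields
# written to "destfile" from inside awk.
printf 'a b c 80\nd e f 60\n' > scores.txt
awk '$4 >= 70 {print $1, $2 > "destfile"}' scores.txt
out=$(cat destfile)
echo "$out"
rm -f scores.txt destfile
```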

(5) Calling shell commands from awk:

1) Using a pipe
The pipe concept in awk is similar to the shell's, and uses the same "|" symbol. The shell command must be given as a double-quoted string. If you plan to read from or write to the same file or pipe again later in the program, you may have to close it first: a pipe, once opened, stays open until it is explicitly closed or awk exits. In particular, statements in the END block are still affected by a pipe that is open, so it is common to close the pipe on the first line of END.
There are two syntaxes for using pipes in awk, namely:
awk output | shell input
shell output | awk input

In awk output | shell input, the shell command receives awk's output and processes it. Note that awk's output is buffered in the pipe first; the shell command runs only once, after the output is complete — that is, when the awk program ends or when the pipe is explicitly closed.
$awk '/west/{count++} {printf "%s %s\t\t%-15s\n", $3,$4,$1 | "sort +1"} END{close("sort +1"); printf "The number of sales persons in the western "; printf "region is " count "." }' datafile (Explanation: /west/{count++} increments count for every line that matches "west")
The printf function formats the output and sends it down the pipe. All the output is collected and handed to the sort command together. The pipe must be closed with exactly the same command string used to open it ("sort +1"), otherwise the output from the END block would get sorted along with the earlier output. The sort command here runs only once.
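A minimal sketch of the same pattern (not the datafile example above, whose contents are not shown): awk output piped to sort, with the pipe closed in END so the trailing message is not sorted along with the data (fruits.txt is an illustrative name):

```shell
# Pipe each line into sort; close("sort") in END flushes the pipe and
# waits for sort to finish before "done" is printed.
printf 'banana\napple\ncherry\n' > fruits.txt
out=$(awk '{ print $1 | "sort" }
           END { close("sort"); print "done" }' fruits.txt)
echo "$out"
rm -f fruits.txt
```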

For shell output | awk input, the only mechanism is the getline function. The result of the shell command is buffered in the pipe and then handed to awk; if there are several lines of data, getline may need to be called several times.
$awk 'BEGIN{ while(("ls" | getline d) > 0) print d}' f
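A deterministic variant of the getline loop, with printf standing in for ls so the output is predictable:

```shell
# Read a command's output line by line with getline, closing the
# pipe when done; any shell command (e.g. ls) works the same way.
out=$(awk 'BEGIN{ cmd = "printf \"one\\ntwo\\n\""
                  while ((cmd | getline d) > 0) print d
                  close(cmd) }')
echo "$out"
```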