
How to read and process CSV files using Bash

Introduction

Reading a simple CSV file and processing it in a Linux Bash script is easy to write, so I'll cover how to do it here.

How to use the cut command

A common way to read and process a CSV file in a Bash script is to read the file line by line from standard input and extract each column into a variable with the cut command.

Script contents

#!/bin/bash

while read -r line
do
  # Split the line stored in $line into columns with the cut command
  col1=$(echo "${line}" | cut -d , -f 1)
  col2=$(echo "${line}" | cut -d , -f 2)
  col3=$(echo "${line}" | cut -d , -f 3)

  # Describe the processing here; $colX refers to the column text
  echo "col1:$col1 col2:$col2 col3:$col3"
done < "$1"

Contents of the CSV file

$ cat  
a1,a2,a3
b1,b2,b3
c1,c2,c3
$ 

Run the script with the CSV file as a parameter

$ ./  
col1:a1 col2:a2 col3:a3
col1:b1 col2:b2 col3:b3
col1:c1 col2:c2 col3:c3
$ 

How to use IFS to store columns in variables

By changing the delimiter variable IFS to a comma and giving the read command multiple variable names, you can write this more simply, without the cut command.

Script contents

#!/bin/bash

# Store the columns of each line of the CSV file in multiple variables
while IFS=, read -r col1 col2 col3
do
  # Describe the processing here; $colX refers to the column text
  echo "col1:$col1 col2:$col2 col3:$col3"
done < "$1"
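
One caveat with this form: if a line contains more columns than there are variable names, read stores the remainder of the line, delimiters included, in the last variable. A minimal sketch (the variable names here are just examples):

#!/bin/bash

# With fewer variables than columns, the last variable receives
# everything left over, commas included.
while IFS=, read -r col1 col2 rest
do
  echo "col1:$col1 col2:$col2 rest:$rest"   # for a1,a2,a3,a4 -> rest:a3,a4
done < "$1"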

How to use IFS to store columns in an array (recommended)

You can also use the -a option of the read command to store the split columns in an array.

#!/bin/bash

while IFS=, read -r -a col
do
  echo "col1:${col[0]} col2:${col[1]} col3:${col[2]}"
done < "$1"

This is the method I recommend most: because the columns are stored in an array, it is easy to loop over them, add and delete columns, and make flexible use of Bash's parameter expansions, as shown below.

#!/bin/bash

while IFS=, read -r -a col
do
  # Loop over all columns
  for c in "${col[@]}"
  do
    echo "loop:$c"
  done

  # Delete the third column, then append a new column at the end
  unset 'col[2]'
  col+=(lastcol)

  echo "${col[@]}"          # all columns
  echo "${col[@]:1}"        # columns from the second onward
  echo "${col[@]/#/col:}"   # prefix each column with "col:"
done < "$1"
$ ./  
loop:a1
loop:a2
loop:a3
a1 a2 lastcol
a2 lastcol
col:a1 col:a2 col:lastcol
loop:b1
loop:b2
loop:b3
b1 b2 lastcol
b2 lastcol
col:b1 col:b2 col:lastcol
loop:c1
loop:c2
loop:c3
c1 c2 lastcol
c2 lastcol
col:c1 col:c2 col:lastcol
$ 

For space or tab-delimited files

If the file is space-separated (SSV) or tab-separated (TSV), the script can process it without setting IFS at all, because the default value of IFS is already space, tab, and newline.

#!/bin/bash

while read -r -a col
do
  echo "col1:${col[0]} col2:${col[1]} col3:${col[2]}"
done < "$1"

However, if a column is empty, read collapses the consecutive spaces or tabs around it into a single delimiter, so the remaining values shift into the wrong positions.

$ cat 
a1 a2 a3
b1  b3
  c3
$ 
$ ./ 
col1:a1 col2:a2 col3:a3
col1:b1 col2:b3 col3:
col1:c3 col2: col3:
$ 

To prevent this, replace the spaces or tabs with commas and split on the commas instead. Because a comma is not an IFS whitespace character, consecutive commas are not collapsed and empty columns keep their positions.

#!/bin/bash

# Space-separated files: replace each space with a comma before splitting
IFS=,
while read -r line
do
  col=(${line// /,})
  echo "col1:${col[0]} col2:${col[1]} col3:${col[2]}"
done < "$1"

For tab-separated files, replace the tabs instead:

#!/bin/bash

# Tab-separated files: replace each tab with a comma before splitting
IFS=,
while read -r line
do
  col=(${line//$'\t'/,})
  echo "col1:${col[0]} col2:${col[1]} col3:${col[2]}"
done < "$1"

How to preprocess the CSV file before reading it

If you want to preprocess the CSV file, for example read it in reverse order or replace a specific character first, you can run the command inside a process substitution <( ) and pass its output to standard input.

#!/bin/bash

while IFS=, read -r -a col
do
  echo "col1:${col[0]} col2:${col[1]} col3:${col[2]}"
done < <(tac "$1")

$ cat  
a1,a2,a3
b1,b2,b3
c1,c2,c3
$ 
$ ./  
col1:c1 col2:c2 col3:c3
col1:b1 col2:b2 col3:b3
col1:a1 col2:a2 col3:a3
$ 
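
The character-replacement case works the same way. A minimal sketch, assuming you simply want to upper-case the values with tr before reading them:

#!/bin/bash

# Upper-case the file contents before the loop reads them;
# tr here is just one example of a preprocessing command.
while IFS=, read -r -a col
do
  echo "col1:${col[0]} col2:${col[1]} col3:${col[2]}"
done < <(tr '[:lower:]' '[:upper:]' < "$1")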

How to use the awk command

If you want simple processing or aggregation of a CSV file, the awk command may be easier to use.
The awk command reads the file given as a parameter line by line from the beginning, automatically stores the fields split by the delimiter in the variables $1, $2, ..., and applies the described processing to each line. When processing CSV files, specify a comma as the separator with the -F option.

$ awk -F, '{print "col1:"$1,"col2:"$2,"col3:"$3}' 
col1:a1 col2:a2 col3:a3
col1:b1 col2:b2 col3:b3
col1:c1 col2:c2 col3:c3
$ 
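
As a sketch of the aggregation mentioned above, awk can total a column with an END block. Here numbers.csv is a hypothetical file whose second column is numeric:

# Sum the second column over all lines and print the total at the end
awk -F, '{sum += $2} END {print "total:" sum}' numbers.csv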

Furthermore, if the processing becomes more complex, you can write it in a separate file and pass it to awk as a script with the -f option.

{
  print "col1:"$1,"col2:"$2,"col3:"$3
}
$ awk -F, -f   
col1:a1 col2:a2 col3:a3
col1:b1 col2:b2 col3:b3
col1:c1 col2:c2 col3:c3
$ 
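
A script file also makes it easy to add BEGIN and END blocks for setup and summary work. A minimal sketch, run with awk -F, -f as above (the header line and the line count are arbitrary examples):

BEGIN { print "col1 col2 col3" }
{ print $1, $2, $3 }
END { print NR " lines processed" }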

If the file is space-separated (SSV) or tab-separated (TSV), awk splits on whitespace by default, so you don't need to specify an option just to read it. However, just like read in a Bash script, the default splitting collapses consecutive delimiters and shifts empty columns, so it is safer to specify the separator explicitly.

$ awk -F'[ ]' '{print "col1:"$1,"col2:"$2,"col3:"$3}' 
col1:a1 col2:a2 col3:a3
col1:b1 col2: col3:b3
col1: col2: col3:c3
$ 
$ awk -F'[\t]' '{print "col1:"$1,"col2:"$2,"col3:"$3}'  
col1:a1 col2:a2 col3:a3
col1:b1 col2: col3:b3
col1: col2: col3:c3
$ 

The awk command is powerful, but there is yet another option: the perl command can also handle CSV files and is even more flexible, although more complex. It is included by default on most Linux servers, so give it a try if you are interested.
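
For reference, a minimal perl sketch of the same column printing, using perl's standard autosplit switches (the file name is hypothetical):

# -n loops over lines, -a splits each into @F, -F',' sets the separator,
# -l handles the trailing newline
perl -F',' -lane 'print "col1:$F[0] col2:$F[1] col3:$F[2]"' file.csv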

This concludes the article on how to read and process CSV files with Bash. For more on reading and processing CSV files in Bash, search my previous articles or browse the related articles below. I hope you will continue to support this site!