awk
awk is a powerful text-processing tool. It can work on rows and columns at the same time, and its syntax and several of its built-in functions (such as printf) are modeled on C.
Its basic pattern is `awk 'BEGIN {print "start"} pattern {commands} END {print "end"}' file`. BEGIN and END are optional. Their actions run before and after the input is processed, respectively.
Variable syntax
- `NR`: number of the current row
- `NF`: number of fields in the current row; the default delimiter is whitespace (spaces or tabs)
- `$0`: content of the current row
- `$1`: content of the first field
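As a minimal sketch of these variables, the snippet below prints them for each row of a small made-up input. The sample data and the variable name `vars_out` are assumptions for illustration only.

```shell
# Hypothetical sample input: three whitespace-separated fields per row.
sample='alice 90 A
bob 85 B'

# For each row, print the row number, the field count, the first field, and the whole row.
vars_out=$(printf '%s\n' "$sample" | awk '{print NR ": NF=" NF ", first=" $1 ", row=" $0}')
printf '%s\n' "$vars_out"
```

Each output line starts with the row number, followed by the field count (3 for both rows here).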
Logic
- Execute the `BEGIN{ }` block.
- Process the input:
  - Read a row of content (from a file or stdin).
  - If the row matches `pattern`, execute `{commands}`; otherwise skip it. If no pattern is given, execute `{commands}` for every row.
  - Repeat until the input is exhausted.
- Execute the `END{ }` block.
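The flow above can be traced with a small sketch. The input lines, the counter `n`, and the variable name `flow_out` are made up for illustration.

```shell
# Hypothetical input lines; only those containing "Tom" match the pattern.
flow_out=$(printf 'Tom 1\nAnn 2\nTom 3\n' | awk '
  BEGIN { print "start" }          # runs once, before any input is read
  /Tom/ { n++; print "hit:", $2 }  # runs for each row that matches the pattern
  END   { print "end, hits=" n }   # runs once, after the last row
')
printf '%s\n' "$flow_out"
```

The `Ann 2` row is read but produces no output, because it does not match `/Tom/`.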
Examples
- `awk '{print $1}' student.csv`: Print the first field (fields are split on whitespace unless `-F` is given)
- `awk '/Tom/ {print $2}' student.csv`: If the line contains `Tom`, print the second field
- `awk -F ',' '{print $NF}' student.csv`: Set the delimiter to `,`; print the last field
- `awk -F ',' '{s+=$3} END {print s}' student.csv`: Sum column 3 of a file that has no header row
- `awk -F ',' 'BEGIN {getline; print $0} {s+=$3} END {print s}' student.csv`: Print and skip the header line; sum column 3
- `awk 'END{print NR}' file`: Count the lines
- `awk -F"," 'BEGIN{getline} max < $3 {max = $3; maxline=$0} END{print maxline}' student.csv`: Find the maximum of column 3 and print that line
- `awk -F"," 'BEGIN{OFS=","} {tmp=$3; $3=$4; $4=tmp; print $0}' student.csv`: Swap column 3 and column 4. `OFS` is the Output Field Separator, a space by default.
- `awk 'BEGIN {getline; print "id," $0} {print NR-1 "," $0}' student.csv`: Prepend a column holding the row number
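To try a few of these commands end to end, here is a sketch that builds a small stand-in for student.csv. The column layout (name, class, math, english) and the values are assumptions, since the real file is not shown.

```shell
# Hypothetical data: a header row plus three students (name,class,math,english).
csv=$(mktemp)
cat > "$csv" <<'EOF'
name,class,math,english
Tom,A,90,70
Ann,B,85,95
Bob,A,60,80
EOF

# Skip the header with getline in BEGIN, then sum column 3.
sum=$(awk -F ',' 'BEGIN {getline} {s += $3} END {print s}' "$csv")

# Remember the row with the largest value in column 3.
top=$(awk -F ',' 'BEGIN {getline} max < $3 {max = $3; maxline = $0} END {print maxline}' "$csv")

# Swap columns 3 and 4; setting OFS keeps the output comma-separated.
swapped=$(awk -F ',' 'BEGIN {OFS = ","} {tmp = $3; $3 = $4; $4 = tmp; print $0}' "$csv")

echo "sum=$sum"
echo "top=$top"
printf '%s\n' "$swapped"
rm -f "$csv"
```

With this sample data the sum is 235 and the top row is `Tom,A,90,70`. Without `OFS=","` the swapped rows would be rejoined with spaces, because awk rebuilds `$0` using `OFS` whenever a field is assigned.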
sed
sed is a stream editor. It can print, delete, and substitute text. Its basic format is `sed [options] commands [file-to-edit]`. The `commands` argument is the key component and follows the pattern `[addr]X[options]`. `file-to-edit` is the input file; if it is omitted, sed reads from stdin.
- `addr` specifies the range of rows to operate on, e.g. the 1st row, or rows 3 to 100. It can also be given as a regular expression.
- `X` is a single-character sed command, e.g. `p` is print; `d` is delete; `s` is substitute.
- `options` are options for `X`, e.g. `g` with the command `s` means global.
By default, sed prints every line after processing it. `-n` suppresses this automatic printing, so only lines printed explicitly (e.g. with `p`) appear.
- `sed '' filename`: Like `cat`
- `sed -n '1p' filename`: Print the first line
- `sed -n '10,20p' filename`: Print lines 10 to 20
- `sed -n '10,+9p' filename`: Print 10 lines starting from line 10 (GNU sed)
- `sed -n '1~2p' filename`: Print every second line starting from line 1, i.e. the odd-numbered lines (GNU sed)
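The effect of `-n` is easiest to see side by side. The sketch below feeds numbered lines through sed; the input and the variable names are made up for illustration, and only portable addresses are used.

```shell
# Hypothetical input: the lines 1 to 5, one per row.
first_line=$(printf '%s\n' 1 2 3 4 5 | sed -n '1p')    # only line 1
line_range=$(printf '%s\n' 1 2 3 4 5 | sed -n '2,4p')  # lines 2 through 4

# Without -n, p prints each line in addition to the automatic print,
# so every line appears twice.
doubled=$(printf '%s\n' 1 2 3 | sed 'p')

printf '%s\n' "$first_line" "$line_range" "$doubled"
```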
delete
- `sed '1d' filename`: Delete the first line
- `sed -i '1d' filename`: In-place; modify the file directly (GNU sed syntax)
- `sed -i.bak '1d' filename`: In-place, but save a backup as `filename.bak` first
- `sed '2,10d' filename`: Delete lines 2 to 10
- `sed '/^$/d' filename`: Delete blank lines
- `sed '/^foo/d' filename`: Delete lines starting with `foo`
- `sed '/ERROR/!d' filename`: Delete lines without `ERROR`; `!` negates the address
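A short sketch of the pattern-based deletes, run against a made-up log snippet rather than a real file. The contents of `log` and the variable names are assumptions.

```shell
# Hypothetical log with one blank line and two ERROR lines.
log='INFO boot

ERROR disk
foo bar
ERROR net'

no_blank=$(printf '%s\n' "$log" | sed '/^$/d')   # drop the blank line
no_foo=$(printf '%s\n' "$log" | sed '/^foo/d')   # drop lines starting with foo
errors=$(printf '%s\n' "$log" | sed '/ERROR/!d') # keep only lines containing ERROR

printf '%s\n' "$errors"
```

Single-quoting the sed command keeps the shell from interpreting `$` and `!` before sed sees them.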
substitute
The format is `sed 's/regex/replacement/' filename`. An address range can be placed before `s` as well.
- `sed 's/this/This/' filename`: Substitute only the first occurrence in each line
- `sed 's/this/This/g' filename`: `g`, global; substitute every occurrence
- `sed 's/this/This/2' filename`: Substitute the second occurrence in each matched row, e.g. `echo "thisthisthis" | sed 's/this/This/2'` prints `thisThisthis`
- `sed -n 's/this/This/2p' filename`: Print only the lines where a substitution was made
- `sed 's/this/This/i' filename`: `i`, case-insensitive (GNU sed)
- `sed -e 's/this/This/' -e 's/that/That/' filename`: Apply multiple commands
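The difference between the substitution flags can be checked on a single line. This is a sketch with a made-up input string and variable names; the GNU-only `i` flag is left out so that it runs on any sed.

```shell
# Hypothetical input line with three occurrences of "this".
line='this and this and this'

sub_first=$(printf '%s\n' "$line" | sed 's/this/This/')    # first occurrence only
sub_all=$(printf '%s\n' "$line" | sed 's/this/This/g')     # every occurrence
sub_second=$(printf '%s\n' "$line" | sed 's/this/This/2')  # second occurrence only

# Two commands applied in order with -e.
sub_multi=$(printf 'this that\n' | sed -e 's/this/This/' -e 's/that/That/')

printf '%s\n' "$sub_first" "$sub_all" "$sub_second" "$sub_multi"
```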
Tutorial
See tutorials as below.
cut
cut extracts selected columns (fields) from delimited text such as csv files.
Options
- `-d`: field delimiter
- `-f`: fields to select
Examples
- `cut -d ',' -f1 filename`: Get the first column
- `cut -d ',' -f1,3 filename`: Get the first and the third columns
- `cut -d':' -f2-4 filename`: Get the second to the fourth columns with delimiter `:`
- `cut -d ',' -f3 --complement filename`: Get all columns other than the third (GNU cut)
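A quick sketch of the field selections on one made-up csv row. The row contents and variable names are assumptions, and `--complement` is left out because it is specific to GNU cut.

```shell
# Hypothetical csv row with four fields.
row='Tom,A,90,70'

col1=$(printf '%s\n' "$row" | cut -d ',' -f1)      # first field
col13=$(printf '%s\n' "$row" | cut -d ',' -f1,3)   # first and third fields
col24=$(printf '%s\n' "$row" | cut -d ',' -f2-4)   # second through fourth fields

printf '%s\n' "$col1" "$col13" "$col24"
```

Selected fields are joined with the same delimiter in the output, so `-f1,3` gives `Tom,90`.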
