awk
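`awk` is a powerful text-processing tool. It can work on rows and columns at the same time, and its syntax and many of its built-in functions (such as `printf`) resemble C.
Its basic pattern is `awk 'BEGIN {print "start"} pattern {commands} END {print "end"}' file`. Note that the file name goes outside the quoted program.
`BEGIN` and `END` are optional. Their actions run before the input is processed and after it has been processed, respectively.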
Variable syntax
- `NR`: number of the current row
- `NF`: number of fields in the current row; the default delimiter is a space ` ` (any run of spaces or tabs)
- `$0`: content of the current row
- `$1`: content of the first field
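To make these variables concrete, here is a minimal sketch that prints them for each row. The two-line input is made up for illustration.

```shell
# Hypothetical input: the first row has 3 fields, the second has 2.
vars=$(printf 'a b c\nd e\n' | awk '{print NR, NF, $1, $0}')

# Each output line shows: row number, field count, first field, whole row.
echo "$vars"
```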
Logic
- Execute the `BEGIN{ }` block.
- Process the input:
  - Read a row of content (from the file or stdin).
  - If the row matches `pattern`, execute `{commands}`; otherwise skip it. If no pattern is given, execute `{commands}` for every row.
  - Repeat until the input is exhausted.
- Execute the `END{ }` block.
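The order of execution can be checked with a small sketch. The three-line input and the `/b/` pattern are hypothetical and chosen only so that exactly one row matches.

```shell
# BEGIN runs once before any input, the /b/ rule runs only for matching rows,
# and END runs once after the last row (NR then holds the total row count).
flow=$(printf 'a 1\nb 2\nc 3\n' |
  awk 'BEGIN {print "start"} /b/ {print NR, $2} END {print "end", NR}')

echo "$flow"
```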
Examples
- `awk '{print $1}' student.csv`: Print the first field
- `awk '/Tom/ {print $2}' student.csv`: If the line contains `Tom`, print the second field
- `awk -F ',' '{print $NF}' student.csv`: Set the delimiter to `,`; print the last field
- `awk '{s+=$3} END {print s}' student.csv`: Calculate the sum of column 3 (for a file without a header)
- `awk 'BEGIN {getline; print $0} {s+=$3} END {print s}' student.csv`: Skip the header line (after printing it); calculate the sum of column 3
- `awk 'END{print NR}' file`: Count the lines
- `awk -F"," 'BEGIN{getline} max < $3 {max = $3; maxline=$0} END{print maxline}' student.csv`: Find the max of column 3; print the line that contains it
- `awk -F"," 'BEGIN{OFS=","} {tmp=$3; $3=$4; $4=tmp; print $0}' student.csv`: Swap column 3 and column 4. `OFS` is the Output Field Separator, a space by default.
- `awk 'BEGIN {getline; print "id," $0} {print NR-1 "," $0}' student.csv`: Add a column showing the row number
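A few of these one-liners can be tried end to end with the sketch below. The contents of `student.csv` here are invented for illustration; the real file may have different columns. This sketch skips the header with an `NR > 1` pattern rather than `getline`, which is just an alternative way to get the same effect.

```shell
# Hypothetical sample data written to a temporary directory.
tmp=$(mktemp -d)
printf 'name,age,math,english\nTom,18,90,70\nAnn,19,85,95\nBob,18,60,80\n' > "$tmp/student.csv"

# Sum of column 3, ignoring the header row.
sum=$(awk -F ',' 'NR > 1 {s += $3} END {print s}' "$tmp/student.csv")

# Row holding the largest value in column 3.
best=$(awk -F ',' 'NR > 1 && $3 > max {max = $3; line = $0} END {print line}' "$tmp/student.csv")

# Swap columns 3 and 4; OFS keeps the output comma-separated.
swapped=$(awk -F ',' 'BEGIN {OFS = ","} {t = $3; $3 = $4; $4 = t; print}' "$tmp/student.csv")

echo "$sum"
echo "$best"
echo "$swapped"
rm -rf "$tmp"
```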
sed
`sed` is a stream editor. It can print, delete and substitute text. Its basic format is `sed [options] commands [file-to-edit]`.
`commands` is the key component. Each command has the form `[addr]X[options]`.
`file-to-edit` is the file to be edited; `sed` can also read its input from stdin.
- `addr` specifies the range of rows to modify, e.g. the 1st row, or rows 3 to 100. It can also be given as a regular expression.
- `X` is a single-character sed command, e.g. `p` is print; `d` is delete; `s` is substitute.
- `options` are options for `X`, e.g. `g` with the `s` command means global.
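As a rough illustration of how the three parts combine, the sketch below uses the address `2,3`, the command `s`, and the options `g` and `p` in a single command. The four-line input is hypothetical.

```shell
# addr = 2,3    -> only rows 2 and 3 are touched
# X    = s      -> substitute a with A
# options = gp  -> g: every occurrence in the row, p: print the changed row
# -n turns off the default output, so only the printed rows appear.
parts=$(printf 'aa\naa\naa\naa\n' | sed -n '2,3s/a/A/gp')

echo "$parts"
```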
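By default, `sed` echoes every input line to stdout after processing it, whether or not a command matched. `-n` suppresses this automatic printing, so only lines printed explicitly (e.g. with `p`) are shown.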
- `sed '' filename`: Like `cat`
- `sed -n '1p' filename`: Print the first line
- `sed -n '10,20p' filename`: Print lines 10-20
- `sed -n '10,+9p' filename`: Print 10 lines starting from line 10 (GNU sed)
- `sed -n '1~2p' filename`: Print every second line starting from line 1, i.e. the odd-numbered lines (GNU sed)
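The effect of `-n` and of the different address forms can be seen in this small sketch. The six numbered input lines are made up, and the `first~step` address assumes GNU sed.

```shell
# Six numbered lines as hypothetical input.
input=$(printf '%s\n' 1 2 3 4 5 6)

# Without -n every line is echoed, and p prints line 2 a second time.
dup=$(printf '%s\n' "$input" | sed '2p' | paste -s -d ' ' -)

# With -n only the addressed lines are printed.
range=$(printf '%s\n' "$input" | sed -n '2,4p' | paste -s -d ' ' -)

# first~step (GNU sed extension): start at line 1, then every 2nd line.
odd=$(printf '%s\n' "$input" | sed -n '1~2p' | paste -s -d ' ' -)

echo "$dup"
echo "$range"
echo "$odd"
```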
delete
- `sed '1d' filename`: Delete the first line
- `sed -i '1d' filename`: In-place; modify the file directly
- `sed -i.bak '1d' filename`: In-place, but back up the original to `filename.bak` first
- `sed '2,10d' filename`: Delete lines 2-10
- `sed '/^$/d' filename`: Delete blank lines
- `sed '/^foo/d' filename`: Delete lines starting with `foo`
- `sed '/ERROR/!d' filename`: Delete lines without `ERROR`; `!` negates the address. Quote the command so the shell does not interpret `!` or `$`.
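The sketch below exercises a few of the delete commands. The log lines and the file name `data.txt` are hypothetical and exist only for this illustration.

```shell
# Hypothetical log with one blank line.
log=$(printf '%s\n' 'INFO start' '' 'ERROR disk full' 'foo bar' 'ERROR timeout')

# Drop blank lines.
no_blank=$(printf '%s\n' "$log" | sed '/^$/d')

# Keep only lines containing ERROR by deleting everything else.
only_err=$(printf '%s\n' "$log" | sed '/ERROR/!d')

# In-place edit with a backup: the original content survives in data.txt.bak.
tmp=$(mktemp -d)
printf 'header\nrow1\nrow2\n' > "$tmp/data.txt"
sed -i.bak '1d' "$tmp/data.txt"
edited=$(cat "$tmp/data.txt")
backup=$(cat "$tmp/data.txt.bak")
rm -rf "$tmp"

echo "$no_blank"
echo "$only_err"
echo "$edited"
```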
substitute
The format is `sed 's/regex/replacement/' filename`. We can specify a range before `s` as well.
- `sed 's/this/This/' filename`: Substitute only the first occurrence in each line
- `sed 's/this/This/g' filename`: `g`, global; substitute every occurrence
- `sed 's/this/This/2' filename`: Substitute the second occurrence in each matched row, e.g. `echo "thisthisthis" | sed 's/this/This/2'`
- `sed -n 's/this/This/2p' filename`: Print only the lines where a substitution was made
- `sed 's/this/This/i' filename`: `i`, case insensitive (GNU sed)
- `sed -e 's/this/This/' -e 's/that/That/' filename`: Run multiple sed commands
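The differences between the substitute flags are easiest to see on a single short string. This is a minimal sketch with made-up input.

```shell
# No flag: only the first occurrence in the line is replaced.
first=$(echo "thisthisthis" | sed 's/this/This/')

# g: every occurrence is replaced.
all=$(echo "thisthisthis" | sed 's/this/This/g')

# 2: only the second occurrence is replaced.
second=$(echo "thisthisthis" | sed 's/this/This/2')

# An address before s limits the substitution to line 2.
ranged=$(printf 'this\nthis\n' | sed '2s/this/This/')

# Two commands chained with -e are applied in order to each line.
multi=$(echo "this and that" | sed -e 's/this/This/' -e 's/that/That/')

echo "$first $all $second"
echo "$ranged"
echo "$multi"
```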
Tutorial
See tutorials as below.
cut
`cut` can do some simple manipulations on CSV files.
Options
- `-d`: field delimiter
- `-f`: fields to select
Examples
- `cut -d ',' -f1 filename`: Get the first column
- `cut -d ',' -f1,3 filename`: Get the first and the third columns
- `cut -d':' -f2-4 filename`: Get the second to the fourth columns with delimiter `:`
- `cut -d ',' -f3 --complement filename`: Get all columns other than the third (`--complement` is a GNU extension)
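The field selections above can be checked on a single row. The row contents are invented for illustration, and the last command assumes GNU `cut`, since `--complement` is not available everywhere.

```shell
# Hypothetical comma-separated row.
row='Tom,18,90,70'

# A single field, a list of fields, and a range of fields.
one=$(echo "$row" | cut -d ',' -f1)
pair=$(echo "$row" | cut -d ',' -f1,3)
span=$(echo 'a:b:c:d:e' | cut -d ':' -f2-4)

# Everything except the third field (GNU cut only).
rest=$(echo "$row" | cut -d ',' -f3 --complement)

echo "$one"
echo "$pair"
echo "$span"
echo "$rest"
```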