awk, sed, cut: content manipulation techniques in shell

awk

awk is a powerful text-processing tool. It can work on rows and columns at the same time. Its syntax is C-like, and it offers C-style functions such as printf.

Its basic pattern is awk 'BEGIN{print "start"} pattern {commands} END {print "end"}' file. BEGIN and END are optional. They run before and after the input is processed, respectively.

Variable syntax

NR: number of current row
NF: number of fields in the current row; the default delimiter is whitespace (spaces and tabs)
$0: content of current row
$1: the content of the first field
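
As a small illustration of these variables, the sketch below prints NR, NF, $1 and $0 for each row. The two sample rows are made up for this example.

```shell
# Hypothetical sample data: three whitespace-separated fields per row
sample_rows=$(printf 'Tom 90 A\nAmy 85 B\n')

# For every row print: row number, field count, first field, whole row
vars_out=$(printf '%s\n' "$sample_rows" | awk '{print NR, NF, $1, $0}')
printf '%s\n' "$vars_out"
```

Each output line starts with the row number and the field count (3 here), followed by the first field and the full row.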

Logic

Execute the BEGIN{ }
Process
- Read a row of content (file or stdin).
- If the row matches pattern, execute {commands}; otherwise skip it. If no pattern is given, execute {commands} for every row.
- Repeat until the input is exhausted.
Execute the END{ }
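
The flow above can be seen in a minimal sketch like the following. The input words and the /^a/ pattern are assumptions chosen only to show which block runs when.

```shell
# BEGIN runs once before any input, the pattern block runs for each
# row starting with "a", and END runs once after the last row.
logic_out=$(printf 'apple\nbanana\navocado\n' |
  awk 'BEGIN {print "start"} /^a/ {n++; print NR ": " $0} END {print "matched " n}')
printf '%s\n' "$logic_out"
```

Row 2 (banana) does not match the pattern, so it is read and counted by NR but its commands are skipped.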

Examples

awk '{print $1}' student.csv: Print the first field (fields are split on whitespace unless -F is given)
awk '/Tom/ {print $2}' student.csv: If the line contains Tom, print the second field
awk -F ',' '{print $NF}' student.csv: Set the delimiter to ,; print the last field
awk -F ',' '{s+=$3} END {print s}' student.csv: Calculate the sum of column 3, assuming the file has no header line
awk -F ',' 'BEGIN {getline; print $0} {s+=$3} END {print s}' student.csv: Print and skip the header line; calculate the sum of column 3
awk 'END{print NR}' file: Count the lines in the file
awk -F"," 'BEGIN{getline} max < $3 {max = $3; maxline=$0} END{print maxline}' student.csv: Find the maximum of column 3, skipping the header; print that line
awk -F"," 'BEGIN{OFS=","} {tmp=$3; $3=$4; $4=tmp; print $0}' student.csv: Swap column 3 and column 4. OFS is Output Field Separator, space by default.
awk 'BEGIN {getline; print "id," $0} {print NR-1 "," $0}' student.csv: Add a column showing row number
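
To try a few of these on real input, here is a self-contained sketch. The contents of student.csv (column names and values) are invented for the example, and it skips the header with NR > 1, which is an alternative to calling getline in BEGIN.

```shell
# Build a hypothetical student.csv in a temporary directory
awk_tmp=$(mktemp -d)
cat > "$awk_tmp/student.csv" <<'EOF'
name,class,math,art
Tom,A,90,70
Amy,B,85,95
Bob,A,60,80
EOF

# Sum column 3, skipping the header row
sum=$(awk -F',' 'NR > 1 {s += $3} END {print s}' "$awk_tmp/student.csv")

# Row with the largest column 3 value ($3+0 forces a numeric comparison)
best=$(awk -F',' 'NR > 1 && $3+0 > max {max = $3+0; line = $0} END {print line}' "$awk_tmp/student.csv")

# Swap columns 3 and 4, keeping "," as the output separator
swapped=$(awk -F',' 'BEGIN {OFS=","} {tmp=$3; $3=$4; $4=tmp; print}' "$awk_tmp/student.csv")

printf 'sum=%s\nbest=%s\n%s\n' "$sum" "$best" "$swapped"
rm -rf "$awk_tmp"
```

With this sample data the sum is 235 and the best row is Tom's.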

sed

sed is a stream editor. It can print, delete, and substitute text. Its basic format is sed [options] commands [file-to-edit]. The commands argument is the key component; each command has the form [addr]X[options]. file-to-edit is the file to be edited; if it is omitted, sed reads from stdin.

addr specifies the rows the command applies to, e.g. the 1st row, or rows 3 to 100. It can also be a regular expression.
X is a single-character sed command, e.g. p is print; d is delete; s is substitute.
options are the options for X, e.g. g with command s means global.
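
A minimal sketch of how the three parts combine: in '2,3s/a/A/g' the address is 2,3, the command is s, and the option is g. The four identical input rows are made up so the effect of the address is easy to see.

```shell
# addr = 2,3   X = s   options = g
addr_out=$(printf 'banana\nbanana\nbanana\nbanana\n' | sed '2,3s/a/A/g')
printf '%s\n' "$addr_out"
```

Only rows 2 and 3 become bAnAnA; rows 1 and 4 are outside the address and pass through unchanged.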

print

By default sed echoes every input line to stdout after processing it. -n suppresses this automatic printing, so only lines printed explicitly with p appear.

sed '' filename: Like cat
sed -n '1p' filename: Print the first line
sed -n '10,20p' filename: Print lines 10 to 20
sed -n '10,+9p' filename: Print 10 lines starting from line 10 (GNU sed)
sed -n '1~2p' filename: Print every second line starting from line 1, i.e. the odd-numbered lines (GNU sed)
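
The effect of -n is easiest to see side by side. This sketch uses three made-up input lines and only portable sed features.

```shell
# With -n, only the line selected by p is printed
with_n=$(printf 'a\nb\nc\n' | sed -n '2p')

# Without -n, every line is echoed and line 2 is printed a second time
without_n=$(printf 'a\nb\nc\n' | sed '2p')

# A range address prints a block of consecutive lines
range_out=$(printf 'a\nb\nc\n' | sed -n '2,3p')

printf '%s\n---\n%s\n---\n%s\n' "$with_n" "$without_n" "$range_out"
```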

delete

sed '1d' filename: Delete the first line
sed -i '1d' filename: In-place, modify the file directly
sed -i.bak '1d' filename: In-place but do backup first
sed '2,10d' filename: Delete lines 2 to 10
sed '/^$/d' filename: Delete blank lines
sed '/^foo/d' filename: Delete lines starting with foo
sed '/ERROR/!d' filename: Delete lines without ERROR; ! negates the address. Keep the single quotes, otherwise an interactive shell may interpret the !
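
A short sketch of the delete commands on invented log lines, followed by an in-place delete with a backup. The file name data.txt and its contents are assumptions for the example.

```shell
# Hypothetical log with one blank line
log_lines=$(printf 'INFO start\n\nERROR disk full\nfoo bar\nERROR timeout\n')

no_blank=$(printf '%s\n' "$log_lines" | sed '/^$/d')      # drop blank lines
only_err=$(printf '%s\n' "$log_lines" | sed '/ERROR/!d')  # keep only ERROR lines

# In-place delete of the first line, keeping the original as data.txt.bak
del_tmp=$(mktemp -d)
printf 'header\nrow1\nrow2\n' > "$del_tmp/data.txt"
sed -i.bak '1d' "$del_tmp/data.txt"
edited=$(cat "$del_tmp/data.txt")
backup=$(cat "$del_tmp/data.txt.bak")
rm -rf "$del_tmp"

printf '%s\n---\n%s\n---\n%s\n' "$no_blank" "$only_err" "$edited"
```

After the in-place edit, data.txt holds only the two data rows while data.txt.bak still has the header.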

substitute

The format is sed 's/regex/replacement/' filename. We can specify range before s as well.

sed 's/this/This/' filename: Substitute only the first occurrence in each line
sed 's/this/This/g' filename: g, Global
sed 's/this/This/2' filename: Substitute the second occurrence in each matched row
- echo "thisthisthis" | sed 's/this/This/2'
sed -n 's/this/This/2p' filename: Print only the lines where a substitution was made
sed 's/this/This/i' filename: i, case insensitive (GNU sed)
sed -e 's/this/This/' -e 's/that/That/' filename: Multiple commands in one sed call
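
The difference between the flags shows up clearly on one line with several matches. The input strings below are made up, and only portable flags are used.

```shell
sub_first=$(printf 'this this this\n' | sed 's/this/This/')     # first match only
sub_all=$(printf 'this this this\n' | sed 's/this/This/g')      # every match
sub_second=$(printf 'this this this\n' | sed 's/this/This/2')   # second match only

# -n with the p flag prints only lines that were actually changed
sub_changed=$(printf 'this one\nthat one\n' | sed -n 's/this/This/p')

# Two commands applied in order to the same line
sub_multi=$(printf 'this and that\n' | sed -e 's/this/This/' -e 's/that/That/')

printf '%s\n' "$sub_first" "$sub_all" "$sub_second" "$sub_changed" "$sub_multi"
```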

Tutorial

See tutorials as below.

cut

cut extracts selected columns from delimited text such as csv files.

Options

-d: field delimiter
-f: fields

Examples

cut -d ',' -f1 filename: Get the first column
cut -d ',' -f1,3 filename: Get the first and the third columns
cut -d':' -f2-4 filename: Get the second to the fourth columns with delimiter :
cut -d ',' -f3 --complement filename: Get all columns other than the third (--complement is a GNU extension)
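
A quick sketch of the field selections on a single made-up csv row. The last command lists the wanted fields explicitly, which is one portable way to get the same result as --complement when that option is not available.

```shell
cut_row='Tom,A,90,70'   # hypothetical row with four fields

cut_c1=$(printf '%s\n' "$cut_row" | cut -d ',' -f1)          # first column
cut_c13=$(printf '%s\n' "$cut_row" | cut -d ',' -f1,3)       # first and third
cut_c24=$(printf '%s\n' "$cut_row" | cut -d ',' -f2-4)       # second to fourth
cut_not3=$(printf '%s\n' "$cut_row" | cut -d ',' -f1-2,4-)   # everything except the third

printf '%s\n' "$cut_c1" "$cut_c13" "$cut_c24" "$cut_not3"
```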
