Sed
Sed
is a “Stream Editor”. Meaning unlike a traditional editor, examination window is limited to a single screen. Sed is called as sed <options> <script> input_files
. The line which sed
is currently reading is loaded into the “pattern space”. We then modify this line via the script, and then we either hold the line, or output the modified file. The modified line goes into std::out
; but can be redirected.
Passing Scripts to sed
<script>
variable is passed either in-line enclosed within quotes, or with the -f
option and writing the script in a text file and doing sed -f <script file> input_files
.
Regular Expression Syntax
/^x1/
: Search for lines beginning withx1
Options
-n
: Don’t print lines by default. That is, usually,sed in.txt
would print all lines, and we don’t want that. So, if we dosed -n '/search/p' in.txt
would only print lines with the word “search”. Similarlysed -n '1,10 p' in.txt
would print the first 10 lines.-s
: Treats independent files differently. Meaning,sed -n '1,10 p' in1.txt in2.txt
would only apply the script to in1.txt and not in2.txt. But if we wanted to print the first ten lines of both, then this option is used.sed -n -s '1,10 p' in1.txt in2.txt
-r
: enables usage of extended regular expressions, for convenience-e
: Allows grouping of commands together, example given below.
Scripts
A script is of the form: [address] [!] command
; []
represent that the argument is optional. If address is not given, command is applied for all lines. !
is the NOT
operator.
Address Usage
sed -n '3 p' in.txt
: prints the third linesed -n '$ p' in.txt
: Prints the last linesed -n -e '10 s/__/ /' -e '10 p' in.txt
: ins/__/ /
;s
stands for replace, and here we replace ‘__’ with ‘ ‘; after which that line is printed.sed -n -r '/19D[0-9]{6}/ p' in.txt
: Print lines in the form “19D123456”
Using regular expressions, we can do some powerful stuff. Suppose, I wanted to print the lines before line 15, which started with the word engi
; we do this as sed -n -r '/^engi/,15 p' in.txt
and to print all statements which are between lines starting with x1
and x2
; sed -n -r '/^x1/,/^x2/ p' in.txt
Nesting is done using the curly braces {}
; for example, to show all statements between lines 4 and 20 beginning with x1
is done as sed -n -e '4,20{^x1 p}' in.txt
Commands
=
: print the line number andp
: print the line
Modify Commands
-
Insert
i\
: Used to insert the given string before the line address.sed '10 i\ /'This string is inserted at line 10'/' in.txt
-
Append
i\
: Similar to insert, but the new line is AFTER the given line number. Insert and append dont work for ranged addresses. -
change
c\
: Change the entire matched line with given text.sed -n -r '/19D[0-9]{6}/ c\ Dual Degree' in.txt
would change entire row to just “Dual Degree” -
delete
d\
: Similar to Change, but deletes the entire line. Can be used for all types of addresses. -
Substitute
s
:s/x1/x2/[flags]
; replaces the first occurrence ofx1
withx2
. If/g
(global) is passed into flag; all occurrences are removed.&
: To findx1
, and replace withx1+x2
; dosed -n 's/x1/&x2/' in.txt
n\
: Suppose we wanna replace “Blue is very sus” to “Red is not sus” (Imagine we have roll no instead of colours and using -r)sed 's/Blue( is )very( sus)/Red\1not\2' in.txt
;\1
holds the value in the first parentheses and so on. -
Transform
y
Syntax:[address1],[address2]y/x1/x2/
: replace occurrences ofx1
withx2
; x1 and x2 must have same length
I/O Commands
n
: (default) Read a line, without\n
char, execute commands, print if-n
is not usedN
: Read a line, add a newline, add to pattern space and execute. Used to join lines.sed -r 'N; s/\n//' in.txt
; joins lines 1,2; 3,4 and so onp
: Copies entire pattern space to outputP
: Prints only the first line of the pattern spaceh
: Overwrites pattern space onto hold spaceH
: Appends pattern space to hold space (pattern space unchanged in both)g
: Overwrites hold space onto pattern spaceG
: Appends hold space to pattern space(hold space unchanged in both)d
: deletes pattern spacer filename
: reads file and sends to output. The file is not passed to pattern space.w filename
: writes to the file from the pattern spaceb
: Similar togoto
statement; can be used for if-else statements ig, usage given belowq
: Quit reading the file
Script Writing
#!/bin/bash
#just put each indiv word in a new line
sed -n -r '
/19D[0-9]{6}/ b save #branch to save; otherwise continue
w others.txt
b #branch to end
:save
w dd-students.txt
'
in.txt
Awk
Scans every line, splits each line into field, compare fields to pattern, perform action. The difference between awk
and sed
is that awk
has capability to remember history between files. The syntax of the script is very similar to C.
#Basic Awk Syntax
awk [options] 'script' file(s) #Direct console
awk [options] -f scriptfile file(s) #input via file
The script part is as follows: pattern [action]
; if pattern
is missing then action is done for all lines, and if action
is missing, then matched line is printed. Either pattern or action must be given.
A line is divided into fields by awk
by using a delimiter (default - whitespace
); and each of these fields can be accessed by using variables, $1
refers to the first field and so on.
Pre-Defined Variables
FS
– Field Separator, default: whitespaceRS
– Record Separator, default: newlineNF
– Number of fields in current recordNR
– Number of the current record – idxOFS
– Output Field Separator, default: whitespaceFILENAME
- Current Filename
ls -al | awk '{print NR, $9}' #Print all files with numbering
##
Script structure
The scripts are divided into three parts; BEGIN, BODY, END. The statements in BEGIN are executed once(declare variables and stuff), the ones in BODY are executed for every line, and the statements in END are executed once at the end of the file (post processing).
#Structure of Body
pattern {statement}
pattern {statement; statement; ...}
pattern {
statement
statement
...
}
Pattern Types
-
awk '/special/ {print}'
: prints the lines with the word “special”. -
Instead of this, we can check if the field matches a regular expression as:
awk -F, '$1~/19D[0-9]{6}/{print NR, $0}'
; meaning if the student is dd, then print his name with numbering. -
~
stands for Matching; and!~
stands for not matching in this case -
However, they need just be pattern matching; we can use boolean arguments as well such as
$1 + $2 > 5 {statement}
Expressions
#We are feeding the input from wc -c *; in which the first variable is the number of characters of the given line
awk '
BEGIN{
lines=0; char=0;
}
{
lines++; char+=$1;
}
END{
print lines, " lines read";
print "Total characters: ",char;
}
'
Passing Variables to awk
To pass the variables from the shell script to the awk script, use the flag -v
; as:- awk -v var1="$shell_var" '{script}'
Associative Arrays
No need for pre-allocating, they get defined with use.
awk -F, '
#($9~/[A-Z][A-Z]/){
state[$9]+=$10
}
END{
for(i in state){
print state[i], i;
}
}
'
Output Statements
print
: simple w/o no formattingprintf
: print with formattingsprintf
: format a string