Grep, Regex, and Cut
Streams and Redirects
Every Linux process has three standard streams, which are connected to the terminal by default:
- STDIN - Standard input (file descriptor 0)
- STDOUT - Standard output (file descriptor 1)
- STDERR - Standard error (file descriptor 2)
Display contents to screen:
echo "Hey!"
To redirect stdout to a file, use ">". Note that this overwrites the file's existing contents:
cat file1 > new-file
To redirect stdout and append to a file instead, use ">>". This adds the output at the end of the existing contents:
cat file1 >> new-file
To redirect echoed message to a file:
echo "Hey!" > newfile
To append echoed message to a file:
echo "How r u?" >> newfile
To list the files in a directory and redirect the listing to a file:
ls /dir > newfile
To redirect stderr to a file, use "2>":
<command> 2> file1
Example: List unknown dir and redirect error to file-err.txt.
ls /weird/ 2> file-err.txt
If we don't want to keep the errors, redirect them to /dev/null:
ls /weird/ 2> /dev/null
To redirect stdout and stderr at the same time:
<command> > file-out 2> file-err
Example: Display an existing file and a non-existing file100.
cat file1 file100 > outfile 2> errfile
Display contents of file1 and redirect error of file100 to a file:
cat file1 file100 2> errorfile
Display the error on screen and redirect the contents of the existing file1 to file2:
cat file1 file100 > file2
Redirect both stdout and stderr to a single file, file3. The "2>&1" must come after "> file3":
cat file1 file100 > file3 2>&1
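The redirects above can be tried safely in a scratch directory. The sketch below is only an illustration: the directory comes from mktemp, and the names outfile, errfile, and both are placeholders rather than anything the commands require. Redirections are processed left to right, which is why "2>&1" is written after "> both"; writing it first would leave stderr on the terminal.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first line" > file1          # file1 exists; file100 does not

# cat exits non-zero because file100 is missing; "|| true" keeps a
# "set -e" script going.
# stdout goes to outfile, stderr goes to errfile.
cat file1 file100 > outfile 2> errfile || true

# Both streams end up in the same file.
cat file1 file100 > both 2>&1 || true

out_lines=$(wc -l < outfile)       # lines captured from stdout
err_lines=$(wc -l < errfile)       # lines captured from stderr
both_lines=$(wc -l < both)         # stdout and stderr together
echo "outfile=$out_lines errfile=$err_lines both=$both_lines"
```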
To prevent overwriting when using ">":
set -o noclobber
To allow overwriting (default):
set +o noclobber
To see other options:
set -o
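As a rough illustration of what noclobber changes, the sketch below tries to overwrite a file with the option on, then uses ">|" to force the overwrite for a single command. The file name notes.txt and the variable overwrite_blocked are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "original" > notes.txt

set -o noclobber
# With noclobber on, ">" refuses to overwrite an existing file.
# The attempt runs in a subshell so the failure cannot stop this script.
if ( echo "replacement" > notes.txt ) 2> /dev/null; then
    overwrite_blocked=no
else
    overwrite_blocked=yes
fi

# ">|" overrides noclobber for a single command.
echo "forced" >| notes.txt
# ">>" is still allowed, since appending never truncates the file.
echo "appended" >> notes.txt
set +o noclobber

echo "overwrite blocked: $overwrite_blocked"
cat notes.txt
```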
grep, egrep, and fgrep
The grep, egrep, and fgrep commands are used for searching text using patterns. They differ mainly in the type of pattern matching they support and their default behavior.
grep
grep searches text using basic regular expressions.
- Global Regular Expression Print
- It prints lines that match a given pattern.
Syntax:
grep [options] pattern [file...]
Examples:
- Search for the word "hello" in a file:
grep "hello" filename.txt
- Search for lines that start with "hello":
grep "^hello" filename.txt
- Search for lines that end with "world":
grep "world$" filename.txt
- Search recursively in all .txt files under a directory:
grep -r --include="*.txt" "hello" /path/to/directory
- Search for lines that contain the letter "h" (quote the brackets so the shell does not expand them as a filename glob):
grep "[h]" file1.txt
- Search for lines that contain any one of several specific letters:
grep "[hzjkl]" file1.txt
- Search for lines that contain any lowercase letter from "a" to "g":
grep "[a-g]" file1
- Search for lines that contain any digit from "3" to "8":
grep "[3-8]" file1
- We can also put the pattern in a file and reference that file when running grep. Example:
# patternfile
[4-6]
To reference the file, use "-f":
grep -f patternfile file1.txt
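To see bracket expressions and "-f" in action, here is a small sketch that builds its own sample data. The contents of file1.txt are invented for the example, and "-c" (count matching lines) is used only to make the result easy to check.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'room 3' 'room 5' 'room 9' 'lobby' > file1.txt
printf '%s\n' '[4-6]' > patternfile

# Quote the pattern so the shell does not treat [ ] as a filename glob.
in_range=$(grep '[3-8]' file1.txt)
# -f reads one pattern per line from patternfile.
from_file=$(grep -f patternfile file1.txt)
# -c counts matching lines instead of printing them.
count=$(grep -c '[a-g]' file1.txt)

echo "$in_range"
echo "$from_file"
echo "lines with a letter from a to g: $count"
```

Only "lobby" contains a letter in the a-g range (the "b"), so the count is 1 even though every line contains letters.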
egrep
egrep is equivalent to grep -E. Recent GNU grep releases mark egrep as obsolescent, so grep -E is the more portable spelling.
- Extended Global Regular Expression Print
- Used to search text using extended regular expressions.
Syntax:
egrep [options] pattern [file...]
Examples:
- Search for lines that contain either "cat" or "dog":
egrep "cat|dog" filename.txt
- Search for lines that contain "cat" followed by any number of characters (this selects the same lines as the pattern "cat" alone):
egrep "cat.*" filename.txt
- Search for lines that contain "hello" followed later on the same line by "world":
egrep "hello.*world" filename.txt
- Search for lines containing a digit:
egrep "[0-9]" filename.txt
- Search recursively for lines that contain either "foo" or "bar":
egrep -r "foo|bar" /path/to/directory
- Search for all lines that do not contain either "hello" or "world":
egrep -v "hello|world" filename.txt
- Search for all lines that contain either "hello" or "world" but do not contain "hey":
egrep "hello|world" file1 | grep -v "hey"
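The difference between "either", "one after the other", and "both in any order" is easy to mix up, so the following sketch runs each variant against the same invented sample lines. It uses grep -E, which the text above describes as equivalent to egrep. Chaining two greps is one common way to require both words regardless of order.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'hello there' 'hey world' 'hello world' \
    'world says hello' 'goodbye' > file1

# Either word anywhere on the line.
either=$(grep -E 'hello|world' file1)
# "hello" first, "world" later on the same line.
ordered=$(grep -E 'hello.*world' file1)
# Both words in any order: filter twice.
both=$(grep 'hello' file1 | grep 'world')
# Neither word.
neither=$(grep -E -v 'hello|world' file1)
# Either word, but not "hey".
no_hey=$(grep -E 'hello|world' file1 | grep -v 'hey')

printf 'ordered: %s\n' "$ordered"
printf 'both:\n%s\n' "$both"
```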
fgrep
fgrep is equivalent to grep -F. As with egrep, recent GNU grep releases mark fgrep as obsolescent in favor of grep -F.
- Fixed-string Global Regular Expression Print
- Used to search text using fixed strings (no regular expressions).
Syntax:
fgrep [options] string [file...]
Examples:
- Search for the exact string "hello.world" without interpreting . as a wildcard:
fgrep "hello.world" filename.txt
- Search for lines containing the exact phrase "error occurred":
fgrep "error occurred" filename.txt
- Search for lines containing any of the patterns listed in a file:
fgrep -f patterns.txt filename.txt
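A quick way to see why fixed-string matching matters is to count matches with and without it. This sketch uses grep -F (the equivalent of fgrep mentioned above) on an invented two-line file.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'hello.world' 'helloXworld' > sample.txt

# As a regex, "." matches any character, so both lines match.
regex_hits=$(grep -c 'hello.world' sample.txt)
# With -F, "." is a literal dot, so only the first line matches.
fixed_hits=$(grep -c -F 'hello.world' sample.txt)

echo "regex=$regex_hits fixed=$fixed_hits"
```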
Common Options
- -i: Ignore case.
grep -i "hello" filename.txt
- -v: Invert match (show lines that do not match the pattern).
grep -v "hello" filename.txt
- -r: Recursively search directories.
grep -r "hello" /path/to/directory
- -l: Print only the names of files containing matches.
grep -l "hello" *.txt
- -n: Print line numbers with output.
grep -n "hello" filename.txt
Combined Use
To search for the pattern "error" in all .log files in a directory, ignoring case, and showing line numbers:
grep -i -n "error" /path/to/directory/*.log
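For a feel of what the combined options print, the sketch below runs them on two made-up log files (app.log and other.log are placeholders). With a single file argument grep prints only "line:text"; with several files it also prefixes each match with the file name.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'startup ok' 'Error: disk full' 'retrying' \
    'ERROR: giving up' > app.log
printf '%s\n' 'all good' > other.log

# -i ignores case, -n prefixes each match with its line number.
numbered=$(grep -i -n 'error' app.log)
# -l prints only the names of files that contain a match.
files=$(grep -i -l 'error' *.log)

echo "$numbered"
echo "files with matches: $files"
```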
Regex
Regular expressions (regex) are patterns used to match character combinations in strings. They are supported in many programming languages and tools.
Basic Syntax
- Literal Characters:
  - Matches the exact characters.
  - Example: hello matches the string "hello".
- Dot (.):
  - Matches any single character except newline.
  - Example: h.llo matches "hello", "hallo", "hxllo".
- Caret (^):
  - Matches the start of a line.
  - Example: ^hello matches "hello" at the beginning of a line.
- Dollar ($):
  - Matches the end of a line.
  - Example: world$ matches "world" at the end of a line.
- Asterisk (*):
  - Matches zero or more occurrences of the preceding element.
  - Example: he*llo matches "hello", "hllo", "heeeello".
- Plus (+):
  - Matches one or more occurrences of the preceding element.
  - Example: he+llo matches "hello", "heeeello", but not "hllo".
- Question Mark (?):
  - Matches zero or one occurrence of the preceding element.
  - Example: he?llo matches "hello" and "hllo".
- Braces ({n,m}):
  - Matches between n and m occurrences of the preceding element.
  - Example: he{2,3}llo matches "heello" and "heeello".
- Brackets ([]):
  - Matches any one of the enclosed characters.
  - Example: h[aeiou]llo matches "hallo", "hello", "hillo".
- Parentheses (()):
  - Groups elements together.
  - Example: (hello|hi) matches "hello" or "hi".
- Backslash (\):
  - Escapes a special character.
  - Example: hello\. matches "hello.".
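The quantifiers above are easiest to compare side by side. This sketch counts how many of four invented words each pattern matches. It assumes grep -E, because +, ?, and {n,m} are extended-regex operators, and it anchors each pattern with ^ and $ so the whole word has to match.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'hllo' 'hello' 'heello' 'heeeello' > words.txt

star=$(grep -E -c '^he*llo$' words.txt)        # zero or more "e"
plus=$(grep -E -c '^he+llo$' words.txt)        # one or more "e"
optional=$(grep -E -c '^he?llo$' words.txt)    # zero or one "e"
braces=$(grep -E -c '^he{2,3}llo$' words.txt)  # two or three "e"

echo "star=$star plus=$plus optional=$optional braces=$braces"
# prints: star=4 plus=3 optional=2 braces=1
```

"heeeello" has four e's, so it falls outside {2,3} and only "heello" is counted for the braces pattern.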
Advanced Syntax
- Alternation (|):
  - Matches either the expression before or the expression after.
  - Example: cat|dog matches "cat" or "dog".
- Character Classes:
  - \d: Matches any digit (equivalent to [0-9]).
  - \D: Matches any non-digit.
  - \w: Matches any word character (equivalent to [a-zA-Z0-9_]).
  - \W: Matches any non-word character.
  - \s: Matches any whitespace character.
  - \S: Matches any non-whitespace character.
- Anchors:
  - \b: Matches a word boundary.
  - \B: Matches a non-word boundary.
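One caveat when carrying these shorthands over to grep: \d belongs to Perl-style regex and is not part of POSIX basic or extended regex, so grep and grep -E cannot be relied on to understand it (GNU grep offers -P for Perl-style patterns where it is built in). A portable alternative is the POSIX bracket classes, sketched below on invented sample lines.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'order 66' 'nodigits' 'under_score' > classes.txt

# [[:digit:]] stands in for \d.
digits=$(grep -E -c '[[:digit:]]' classes.txt)
# [[:space:]] stands in for \s.
spaces=$(grep -E -c '[[:space:]]' classes.txt)
# [[:alnum:]_] stands in for \w; the anchors require the whole line
# to consist of word characters.
wordy=$(grep -E -c '^[[:alnum:]_]+$' classes.txt)

echo "digits=$digits spaces=$spaces wordy=$wordy"
```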
Examples
- Match a phone number pattern:
  - Pattern: \d{3}-\d{3}-\d{4}
  - Matches: "123-456-7890"
- Match an email address:
  - Pattern: \w+@\w+\.\w+
  - Matches: "example@example.com"
- Match a URL:
  - Pattern: https?://(\w+\.)*\w+
  - Matches: "http://example.com", "https://example.com"
- Match a date (YYYY-MM-DD):
  - Pattern: \d{4}-\d{2}-\d{2}
  - Matches: "2023-06-30"
- Match a word starting with "a" and ending with "e":
  - Pattern: \ba\w*e\b
  - Matches: "apple", "arise"
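To try the phone number and date patterns with grep, one option is to write \d as [0-9], which works with grep -E. The sketch below does that on invented sample text and uses -o (supported by GNU and BSD grep) to print only the matched part of each line.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'call 123-456-7890 today' 'released 2023-06-30' \
    'nothing here' > data.txt

# -o prints only the text that matched, not the whole line.
phone=$(grep -E -o '[0-9]{3}-[0-9]{3}-[0-9]{4}' data.txt)
iso_date=$(grep -E -o '[0-9]{4}-[0-9]{2}-[0-9]{2}' data.txt)

echo "phone: $phone"
echo "date:  $iso_date"
```

These patterns check only the shape of the text; they would accept a "date" such as 9999-99-99.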
Using Regex with grep
- Basic grep:
grep "pattern" file.txt
- Extended regex with egrep or grep -E:
egrep "pattern" file.txt
# or
grep -E "pattern" file.txt
- Case-insensitive search:
grep -i "pattern" file.txt
- Recursive search in directories:
grep -r "pattern" /path/to/directory
Using Regex with sed
sed (stream editor) is used for parsing and transforming text using regex.
- Substitute every occurrence of a pattern (the result is printed to stdout; the file itself is unchanged):
sed 's/oldpattern/newpattern/g' file.txt
- Delete lines matching a pattern:
sed '/pattern/d' file.txt
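A small sketch of both sed commands on an invented file, pets.txt. It also checks that the input file is left alone, since sed writes its result to stdout unless told otherwise.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'old cat' 'keep me' 'old dog old' > pets.txt

# The trailing "g" replaces every match on a line, not just the first.
replaced=$(sed 's/old/new/g' pets.txt)
# Lines matching /keep/ are dropped from the output.
deleted=$(sed '/keep/d' pets.txt)

echo "$replaced"
echo "---"
echo "$deleted"
echo "---"
cat pets.txt    # still the original three lines
```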
Using Regex with awk
awk is a powerful text processing language with regex support.
- Print lines matching a pattern:
awk '/pattern/ {print}' file.txt
- Print specific fields of lines matching a pattern:
awk '/pattern/ {print $1, $3}' file.txt
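As an illustration of the field-printing form, the sketch below runs it on an invented whitespace-separated file, users.txt. By default awk splits each line on whitespace, and the comma in print puts a single space between the fields.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'alice 30 admin' 'bob 25 user' 'carol 41 admin' > users.txt

# For lines matching /admin/, print the first and third fields.
admins=$(awk '/admin/ {print $1, $3}' users.txt)

echo "$admins"
```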
More examples
This is a sample regtext file:
$ cat regtext
bt
bit
bite
boot
bloat
boat
Search for lines containing "b", then any single character, then "t". The middle character can be anything, but it must be present, so "bt" does not match.
# '.' represents a single character
$ grep 'b.t' regtext
bit
bite
Search for 'b' followed by zero or more of any character, which is then followed by 't'.
# '.*' means zero or more of any character
$ grep 'b.*t' regtext
bt
bit
bite
boot
bloat
boat
Search for 'b' followed by zero or more 'o' characters, followed by 't'. Here '*' applies only to the preceding 'o', so "boat" and "bit" do not match.
$ grep 'bo*t' regtext
bt
boot
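Stacking two repetition operators, as in 'b*?t', is not a way to say "anything in between": POSIX leaves the combination undefined, and in the run below it matches every line that contains 't':
$ egrep 'b*?t' regtext
bt
bit
bite
boot
bloat
boat
To match 'b' and 't' with at most one character between them, use '.?':
$ egrep 'b.?t' regtext
bt
bit
bite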
Cut
cut extracts specific fields from each line of a file.
As an example, to get only the user names from the /etc/passwd file, specify the field number (-f1) and the character acting as the delimiter (-d:):
# to get only users:
cut -f1 -d: /etc/passwd
# to get only the shells
cut -f7 -d: /etc/passwd
# to get user, pw, uid, gid, and comment in one go,
# use "/" as the delimiter instead: everything before
# the first "/" on each line is field 1
cut -f1 -d/ /etc/passwd
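To experiment without reading the real /etc/passwd, the sketch below writes two invented entries in the same colon-separated layout to a file named passwd.sample. It also shows that -f accepts a comma-separated list of fields.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
    'demo:x:1000:1000:Demo User:/home/demo:/bin/sh' > passwd.sample

users=$(cut -f1 -d: passwd.sample)             # field 1: user name
shells=$(cut -f7 -d: passwd.sample)            # field 7: login shell
user_and_shell=$(cut -f1,7 -d: passwd.sample)  # fields 1 and 7 together

echo "$users"
echo "$shells"
echo "$user_and_shell"
```

When several fields are selected, cut joins them with the same delimiter it split on, so the last command prints lines such as "root:/bin/bash".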