Overview of Unix Filters and Text Processing Utilities
In this tutorial, we will learn about filters and then work with various filter commands. Filters are commands that read input from stdin and write output to stdout.
By default, in a shell terminal, stdin comes from the keyboard and stdout goes to the terminal screen. The mechanisms for changing stdin and stdout are covered in detail in later tutorials.
This Unix Text Processing Commands Tutorial is divided into 3 parts:
- Unix Filters
- Unix Pipes
- More Filter Commands like awk and sed
Text Processing in Unix
Unix provides a number of powerful commands for processing text in different ways. These text processing commands are often implemented as filters.
Filters are commands that read their input from ‘stdin’ and write their output to ‘stdout’. Users can use file redirection and ‘pipes’ to set up ‘stdin’ and ‘stdout’ as needed. A pipe directs the ‘stdout’ stream of one command into the ‘stdin’ stream of the next command.
Some standard filter commands are described below. Most of them also accept an input file name as a parameter; when no file is specified, they read from stdin and operate as filters.
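As a quick illustration, the following sketch runs sort on the same data in three ways: with a file parameter, with stdin redirected from the file, and with stdin fed by a pipe. The file fruits.txt and its contents are made up for this example; all three variables end up holding the same sorted lines.

```shell
#!/bin/sh
# Sketch: one filter (sort), three ways of supplying its input.
# "fruits.txt" is a hypothetical example file created in a temp directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'pear\napple\nfig\n' > "$dir/fruits.txt"

# 1. File given as a parameter: sort opens the file itself.
by_arg=$(sort "$dir/fruits.txt")

# 2. No file parameter: sort acts as a filter and reads stdin,
#    which the shell has redirected from the file with "<".
by_redirect=$(sort < "$dir/fruits.txt")

# 3. No file parameter: stdin is connected to the stdout of cat by a pipe.
by_pipe=$(cat "$dir/fruits.txt" | sort)

printf '%s\n' "$by_arg"
```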
Unix Filter Commands
- grep: Find lines in stdin that match a pattern and print them to stdout.
- sort: Sort the lines in stdin, and print the result to stdout.
- uniq: Read lines from stdin and print them to stdout, collapsing each run of identical adjacent lines into a single line.
- cat: Read lines from stdin (or from one or more files) and concatenate them to stdout.
- more: Read lines from stdin (or from a file) and provide a paginated view to stdout.
- cut: Cut the specified bytes, characters, or fields from each line of stdin and print them to stdout.
- paste: Read lines from stdin (or from one or more files) and merge them line by line to stdout.
- head: Read the first few lines (10 by default) from stdin (or from one or more files) and print them to stdout.
- tail: Read the last few lines (10 by default) from stdin (or from one or more files) and print them to stdout.
- wc: Read from stdin, and print the number of newlines, words, and bytes to stdout.
- tr: Translate or delete characters read from stdin and print to stdout.
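The sketch below tries several of these filters on a small, hypothetical colon-separated file (users.txt, one name:language pair per line). It assumes a POSIX shell with the standard cut, sort, uniq, head, tail, tr, and wc. Note that uniq only removes adjacent duplicates, which is why its input is sorted first.

```shell
#!/bin/sh
# Sketch: several filters applied to one small, made-up file.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'alice:c\nbob:go\ncarol:c\ndave:rust\n' > "$dir/users.txt"

# cut: print field 2 of each line, using ':' as the field delimiter.
langs=$(cut -d: -f2 "$dir/users.txt")                    # c, go, c, rust

# sort + uniq: uniq only collapses adjacent duplicates, so sort first.
distinct=$(cut -d: -f2 "$dir/users.txt" | sort | uniq)   # c, go, rust

# head / tail: first and last line of the file.
first=$(head -n 1 "$dir/users.txt")                      # alice:c
last=$(tail -n 1 "$dir/users.txt")                       # dave:rust

# tr reads only stdin: translate lowercase letters to uppercase.
upper=$(tr '[:lower:]' '[:upper:]' < "$dir/users.txt")

# wc -l counts lines; tr -d ' ' deletes any padding spaces wc prints.
count=$(wc -l < "$dir/users.txt" | tr -d ' ')

printf '%s\n' "$distinct"
```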
Next, let’s work through some of these commands in detail:
Command | grep - Searches a file for a pattern and prints the lines that contain it. If no file name is given, grep searches stdin. |
---|---|
Common Syntax | $ grep [OPTION]... PATTERN [FILE]... |
Example | $ grep '[A-M]' file1 prints the lines of file1 that contain at least one capital letter in the range A to M. |
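Here is a small sketch of the grep example, using a made-up file1. It also shows grep reading stdin when no file is named, and the -c option, which prints the number of matching lines instead of the lines themselves. LC_ALL=C is set as a precaution so that the range [A-M] means the ASCII capitals A to M; in some other locales, bracket ranges can match additional characters.

```shell
#!/bin/sh
# Sketch of the grep example; file1 and its contents are made up here.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'Apple pie\nzebra\nNorth\nmango Lassi\n' > "$dir/file1"

# Lines containing at least one capital letter from A to M.
matches=$(LC_ALL=C grep '[A-M]' "$dir/file1")

# -c prints only the number of matching lines.
count=$(LC_ALL=C grep -c '[A-M]' "$dir/file1")

# With no file name, grep filters stdin instead.
from_stdin=$(printf 'Kiwi\nlime\n' | LC_ALL=C grep '[A-M]')

printf '%s\n' "$matches"
```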
Command | wc - Counts the number of lines, words, and bytes (characters) in each file, or in stdin if no file is given. |
---|---|
Common Syntax | $ wc [OPTION]... [FILE]... |
Example | $ cat file1 shows two lines, "Hello" and "How do you do". $ wc file1 then prints 2 5 20 file1: 2 lines, 5 words, and 20 characters (bytes). |
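The wc example can be reproduced with the sketch below. It writes the same two lines into a temporary file1 and asks wc for each count separately. Reading the file through "<" keeps the file name out of the output, and tr -d ' ' deletes the padding spaces that some wc implementations print.

```shell
#!/bin/sh
# Recreates the wc example: file1 holds "Hello" and "How do you do".
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'Hello\nHow do you do\n' > "$dir/file1"

lines=$(wc -l < "$dir/file1" | tr -d ' ')   # newline count
words=$(wc -w < "$dir/file1" | tr -d ' ')   # word count
bytes=$(wc -c < "$dir/file1" | tr -d ' ')   # byte count: 6 + 14

echo "$lines $words $bytes"                 # prints: 2 5 20
```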
Command | more - Displays a file, or its stdin, one screen at a time. |
---|---|
Common Syntax | $ more [OPTIONS] [FILE]... |
Example | $ ls -l \| more displays the long listing of files and directories one screen at a time. |
Command | paste - Merges corresponding lines of files side by side, separated by a TAB by default. |
---|---|
Common Syntax | $ paste [OPTION]... [FILE]... |
Example | $ paste file1 file2 combines file1 and file2 line by line: each output line holds a line of file1, a TAB, and the corresponding line of file2. |
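The sketch below shows paste on two made-up files, along with two standard options: -d, which replaces the TAB with another delimiter, and -s, which joins all lines of one file onto a single line. The file names and contents are illustrative only.

```shell
#!/bin/sh
# Sketch of paste on two small, made-up files.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'one\ntwo\nthree\n' > "$dir/file1"
printf '1\n2\n3\n' > "$dir/file2"

# Default: line n of file1, a TAB, then line n of file2.
tabbed=$(paste "$dir/file1" "$dir/file2")

# -d sets the delimiter; here a comma instead of a TAB.
csv=$(paste -d, "$dir/file1" "$dir/file2")

# -s pastes all lines of one file serially onto a single line.
serial=$(paste -s -d, "$dir/file1")

printf '%s\n' "$csv"
```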
We hope you enjoyed this tutorial. Check out our upcoming tutorial to learn more about text processing in Unix with pipes.