7 Unix Streams and Pipes

Author

Laurent Modolo

Creative Commons License

Objective: understand the function of streams and pipes in Unix systems

When you read a file, you start at the top and go from left to right: you read a flow of information that stops at the end of the file.

Unix streams work much the same way: instead of opening a file as one whole block of data, a process can read it as a stream. There are 3 standard Unix streams:

  1. stdin, the standard input (stream number 0)
  2. stdout, the standard output (stream number 1)
  3. stderr, the standard error (stream number 2)
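A minimal sketch of the three streams in action, assuming a bash-compatible shell. The function talk is hypothetical: it reads one line on stdin, answers on stdout, and writes a warning on stderr (>&2 is a redirection, covered in the next section). On a terminal, stdout and stderr both end up on the display.

```shell
# Hypothetical demo function using the three standard streams
talk() {
  read -r line                       # read one line from stdin (0)
  echo "you said: $line"             # write to stdout (1)
  echo "warning: just a demo" >&2    # write to stderr (2)
}

# Both lines appear on the display, but they travel on different streams
echo "hello" | talk
```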

Historically, stdin was the card reader or the keyboard, while the other two were the card puncher or the display.

The command cat simply reads from stdin and displays the result on stdout. Type a few lines, then press Ctrl+D on an empty line to end the input:

cat
I can talk with
myself

cat can also read files and display their content on stdout:

cat .bashrc

7.1 Stream manipulation

You can use the > character to redirect a stream to a file. The following command makes a copy of your .bashrc file:

cat .bashrc > my_bashrc
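A short sketch of the same copy, using a hypothetical stand-in file demo_rc instead of .bashrc so that it runs anywhere. The standard cmp command then checks that the copy is byte-for-byte identical.

```shell
# Hypothetical stand-in for .bashrc
printf 'alias ll="ls -l"\nexport EDITOR=nano\n' > demo_rc

# cat writes the file on stdout; > sends that stream into a new file
cat demo_rc > demo_rc_copy

# cmp prints nothing and succeeds when both files are identical
cmp demo_rc demo_rc_copy && echo "identical copy"
```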

Check the results of your command with less.

Following the same principle, create a my_cal file containing the calendar of the current month. Check the result with the command less.

Reuse the same command with the unnamed option (positional argument) 1999. Check the result with the command less. What happened?

Try the following command:

cal -N 2 > my_cal

What is the content of my_cal? What happened?

The > operator can take a stream number: 1> redirects stdout to a file, and it is also the default (plain > is equivalent to 1>). Here the -N option doesn’t exist, so cal throws an error. Errors are sent to stderr, which has the number 2.
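A minimal sketch of sending each stream to its own file, assuming bash. The function noisy is hypothetical and simply writes one line on each stream; the file names are arbitrary.

```shell
# Hypothetical command writing to both streams
noisy() {
  echo "a normal result"        # stdout, number 1
  echo "an error message" >&2   # stderr, number 2
}

noisy 1> demo_out.txt 2> demo_err.txt   # 1> is the same as >
cat demo_out.txt                        # prints: a normal result
cat demo_err.txt                        # prints: an error message

noisy 2> /dev/null                      # discard stderr, keep stdout on the display
```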

Save the error message in my_cal and check the results with less.

We have seen that > overwrites the content of the file. Try the following commands:

cal 2020 > my_cal
cal >> my_cal
cal -N 2 2>> my_cal

Check the results with the command less.
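The difference between > and >> can also be seen without cal, using echo and a hypothetical file demo_log.txt:

```shell
echo "first"  > demo_log.txt   # > creates the file, or empties it if it exists
echo "second" > demo_log.txt   # ... so "first" is lost
echo "third" >> demo_log.txt   # >> appends at the end of the file
cat demo_log.txt               # prints: second, then third
```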

The > operator sends the stream from the command on its left to the file on its right. Try the following:

cat < my_cal

What is the function of the < operator?
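One way to see what < does is to compare two forms of wc -l (line count) on a hypothetical file demo_lines.txt. With a file argument, wc opens the file itself and prints its name; with <, the shell opens the file and connects it to stdin, so wc only sees an anonymous stream.

```shell
printf 'one\ntwo\nthree\n' > demo_lines.txt

wc -l demo_lines.txt     # wc opens the file: prints the count and the file name
wc -l < demo_lines.txt   # the shell feeds the file on stdin: prints only the count
```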

You can combine several redirections on the same process. Try the following command:

cat <<EOF > my_notes

Type some text, then type EOF alone on a new line. EOF stands for end of file: the word written after << is the delimiter that marks the end of the text sent to stdin. EOF is only a convention; any other word would work.

What happened? Can you check the content of my_notes? How would you modify this command to add new notes?
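The same construct (a "here-document") also works in scripts, where nobody types the text. A small sketch, assuming any POSIX shell: the delimiter here is END rather than EOF, the text block feeds the stdin of sort, and > sends the sorted result to a hypothetical file demo_sorted.txt.

```shell
# Everything between <<END and the line containing only END goes to sort's stdin
sort <<END > demo_sorted.txt
banana
cherry
apple
END
cat demo_sorted.txt   # prints: apple, banana, cherry
```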

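Finally, you can redirect a stream toward another stream. The syntax 2>&1 sends stderr (2) to wherever stdout (1) points at that moment, so it must come after the file redirection. In bash, &> and &>> (bash 4 or later) are shorthands that redirect both stdout and stderr to a file.

cal -N 2 > my_redirection 2>&1
cal >> my_redirection 2>&1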
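A minimal sketch of why the position of 2>&1 matters, assuming bash. The function mixed is hypothetical and writes one line on each stream. Redirections are applied from left to right, and 2>&1 copies the current target of stdout.

```shell
# Hypothetical command writing one line on each stream
mixed() {
  echo "to stdout"
  echo "to stderr" >&2
}

# stdout goes to the file first, then stderr follows it: both lines in the file
mixed > demo_all.txt 2>&1

# stderr is copied to the terminal first, then stdout moves to the file:
# "to stderr" still appears on the display
mixed 2>&1 > demo_only_out.txt
```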

7.2 Pipes

The last stream manipulation that we are going to see is the pipe, which turns the stdout of a process into the stdin of the next one. Pipes are useful to chain multiple simple operations. The pipe operator is |.

cal 2020 | less

What is the difference with this command?

cal 2020 | cat | cat | less
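A sketch of chaining simple operations with pipes, assuming a POSIX shell. The helper count_distinct is hypothetical: sort groups identical lines, uniq removes consecutive repeats, and wc -l counts the remaining lines. A shell function placed in a pipeline reads the previous command's stdout on its own stdin.

```shell
# Hypothetical helper: count distinct lines read on stdin
count_distinct() {
  sort | uniq | wc -l
}

# Six lines, three distinct values: prints 3 (some systems pad it with spaces)
printf 'b\na\nb\nc\nb\na\n' | count_distinct
```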

The command zcat has the same function as the command cat but for compressed files in gzip format.
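A small sketch with a hypothetical file demo.txt.gz: gzip -c writes compressed data on stdout, and zcat reads it back. On some systems (macOS, for example) zcat expects .Z files, so gzip -dc (-d to decompress, -c to write to stdout) is shown as a portable equivalent.

```shell
# Compress a small stream into a file
printf 'ACGT\nTTGA\n' | gzip -c > demo.txt.gz

zcat demo.txt.gz       # GNU systems: prints the two lines
gzip -dc demo.txt.gz   # portable equivalent
```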

The command wget downloads a file from a URL and saves it under the corresponding file name. Don’t run the following command, which would download the human genome:

wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

We are going to use the -q switch, which silences wget (no download progress bar or similar messages), and the -O option, which lets us set the name of the output file. By Unix convention, setting the output file to - writes the output to the stdout stream.
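The same - convention also works for input with many commands: POSIX cat, for example, reads stdin wherever - appears among its file arguments. A short sketch with hypothetical files demo_first.txt and demo_last.txt:

```shell
printf 'header\n' > demo_first.txt
printf 'footer\n' > demo_last.txt

# "-" stands for stdin, so the piped line lands between the two files
printf 'piped line\n' | cat demo_first.txt - demo_last.txt
# prints: header, piped line, footer
```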

Analyze the following command: what would it do?

wget -q -O - http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz | gzip -dc | less
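To try the same shape of pipeline without downloading anything, here is an offline sketch. The file demo.fa.gz is a tiny hypothetical FASTA file; cat stands in for wget -q -O - as the command that puts the compressed data on stdout, and head replaces less so the example runs non-interactively.

```shell
# Hypothetical, tiny gzip-compressed FASTA file
printf '>chrDemo\nACGTACGT\nTTGACCGA\n' | gzip -c > demo.fa.gz

# compressed stream on stdout -> decompressed on the fly -> first two lines shown
cat demo.fa.gz | gzip -dc | head -n 2
```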

Remember that most Unix commands process their input and output line by line, which means that you can process huge datasets without intermediate files or a huge amount of RAM.
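A small illustration, assuming seq, awk and wc are available: one million numbers flow through the pipeline, awk keeps the even ones, and wc counts them. No intermediate file is written, and no command needs to hold the whole list in memory.

```shell
# seq streams the numbers, awk filters each line as it arrives, wc counts
seq 1 1000000 | awk '$1 % 2 == 0' | wc -l   # prints 500000
```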

We have used the following commands:

  • cat / zcat to display information on stdout
  • > / >> / < / << to redirect a stream
  • | the pipe operator to connect processes
  • wget to download files

You can head to the next session to practice pipes and stream manipulation.

License: Creative Commons CC-BY-SA-4.0.