How does piping work? If I run a program via CLI and redirect output to a file will I be able to pipe that file into another program as it is being written?
Basically when one line is written to the file I would like it to be piped immediately to my second application (I am trying to dynamically draw a graph off an existing program). Just unsure if piping completes the first command before moving on to the next command.
Any feedback would be greatly appreciated!
If you want to redirect the output of one program into the input of another, just use a simple pipeline:
program1 arg arg | program2 arg arg
If you want to save the output of program1 into a file and also pipe it into program2, you can use
program1 arg arg | tee output-file | program2 arg arg
All programs in a pipeline run simultaneously. Most programs use blocking I/O: when a program tries to read its input and nothing is there yet, it blocks: that is, it stops, and the operating system de-schedules it until more input becomes available (this avoids eating up the CPU). Similarly, if a program earlier in the pipeline writes data faster than a later program can read it, the pipe's buffer eventually fills up and the writer blocks: the OS de-schedules it until the reader drains the pipe's buffer, and then it can continue writing.
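Under the hood, the shell sets this up with pipe(2), fork(2), and blocking read(2)/write(2). A minimal C sketch of the mechanism behind "program1 | program2" (POSIX assumed, error handling trimmed for brevity):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                  /* child: the "writer" side */
        close(fds[0]);
        const char *msg = "hello from the writer\n";
        write(fds[1], msg, strlen(msg));
        close(fds[1]);               /* closing the write end gives the reader EOF */
        _exit(0);
    }

    /* parent: the "reader" side; read() blocks until data arrives */
    close(fds[1]);
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    close(fds[0]);
    wait(NULL);
    return 0;
}

Both processes exist at the same time; the reader simply sleeps inside read() whenever the pipe is empty, which is the blocking behavior described above.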
If you want to use the output of program1 as the command-line parameters for program2, you can use backquotes or the $() syntax:

# Runs "program1 arg", and uses the output as the command-line
# arguments for program2
program2 `program1 arg`

# Same as above
program2 $(program1 arg)

The $() syntax should be preferred, since it is clearer and it can be nested.
Piping does not complete the first command before running the second. Unix (and Linux) piping runs all commands concurrently. A command will be suspended if:
It is starved for input.
It has produced significantly more output than its successor is ready to consume.
For most programs, output is buffered, which means that the C library accumulates a substantial amount of output (perhaps 8000 characters or so) before passing it on to the next stage of the pipeline. This buffering is used to avoid too much switching back and forth between processes and the kernel.
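This buffering can be observed directly. In the sketch below (illustrative, POSIX assumed), a short line written through a pipe-backed stdio stream does not reach the reader until fflush() is called, because a stream that is not connected to a terminal is fully buffered:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }
    fcntl(fds[0], F_SETFL, O_NONBLOCK);   /* an empty pipe then returns -1/EAGAIN */

    FILE *out = fdopen(fds[1], "w");      /* not a terminal, so fully buffered */
    fputs("short line\n", out);           /* sits in the stdio buffer for now */

    char buf[64];
    ssize_t n = read(fds[0], buf, sizeof buf);
    printf("before fflush: read returned %zd\n", n);   /* -1: pipe still empty */

    fflush(out);                          /* now the data actually hits the pipe */
    n = read(fds[0], buf, sizeof buf);
    printf("after fflush: read returned %zd\n", n);    /* 11: the line arrived */
    return 0;
}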
If you want output on a pipeline to be sent right away, you can use unbuffered I/O, which in C means calling something like fflush() to be sure that any buffered output is immediately sent on to the next process. Unbuffered input is also possible but is generally unnecessary, because a process that is starved for input typically does not wait for a full buffer but will process whatever input it can get.
For typical applications unbuffered output is not recommended; you generally get the best performance with the defaults. In your case, however, where you want to do dynamic graphing as soon as the first process has the info available, you definitely want to be using unbuffered output. If you're using C, calling fflush(stdout) whenever you want output sent will be sufficient.
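As a sketch of the graphing case, a producer might emit one x-y pair per line and flush after each one, so the grapher on the other side of the pipe sees every point immediately (the data values and the line format here are made up for illustration):

#include <stdio.h>

int main(void) {
    for (int i = 0; i < 5; i++) {
        printf("%d %d\n", i, i * i);   /* one data point per line for the grapher */
        fflush(stdout);                /* push it through the pipe right now */
        /* a real producer would pace itself here, e.g. sleep(1) */
    }
    return 0;
}

Without the fflush(), all five lines would likely arrive at the grapher in one burst when the program exits.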
If your programs are communicating using stdout, then make sure that you are either calling fflush(stdout) after you write, or find some way to disable standard I/O buffering. The best references I can think of that really describe how to best implement pipelines in C/C++ are Advanced Programming in the UNIX Environment and UNIX Network Programming: Volume 2. You could probably start with this article as well.
If your two programs insist on reading and writing to files and do not use stdin/stdout, you may find you can use a named pipe instead of a file.
Create a named pipe with the mknod(1) command:
$ mknod /tmp/named-pipe p
Then configure your programs to read and write to /tmp/named-pipe (use whatever path/name you feel is appropriate).
In this case, both programs will run in parallel, blocking as necessary when the pipe becomes full/empty as described in the other answers.
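For illustration, here is a minimal C sketch of the same idea with both ends in one program: mkfifo(3) creates the named pipe (the path is arbitrary), and each open(2) on a FIFO blocks until the other end is opened too, which is how the two programs synchronize:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>

int main(void) {
    const char *path = "/tmp/demo-fifo";   /* illustrative path */
    unlink(path);                          /* remove any stale FIFO */
    if (mkfifo(path, 0600) == -1) { perror("mkfifo"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                        /* writer process */
        int fd = open(path, O_WRONLY);     /* blocks until a reader opens */
        const char *line = "one line through the fifo\n";
        write(fd, line, strlen(line));
        close(fd);
        _exit(0);
    }

    int fd = open(path, O_RDONLY);         /* blocks until the writer opens */
    char buf[128];
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n > 0) { buf[n] = '\0'; fputs(buf, stdout); }
    close(fd);
    wait(NULL);
    unlink(path);
    return 0;
}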