(echo red; echo green 1>&2) | echo blue
This creates two subshells separated by the pipe symbol (|).
A subshell is a child process of the current shell, and as such it inherits important properties of the current shell, notably including the open file descriptor table. Since they are child processes, both subshells run concurrently, while their parent shell will simply wait() for all child processes to terminate. The order in which the children get to run is largely unpredictable; on a multi-core system they may literally run at the same time.
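You can see the separate child processes directly. A quick bash check (using $BASHPID, which, unlike $$, reports each subshell's own PID; the PIDs below are just example values, and the >&2 keeps the left side's output out of the pipe):

    $ echo "parent: $BASHPID"; (echo "left: $BASHPID" >&2) | (echo "right: $BASHPID" >&2)
    parent: 4200
    left: 4201
    right: 4202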
Now, before the subshells get to process their actual tasks, file redirections have to be performed. The left subshell gets its stdout redirected to the write end of the kernel pipe object that is "created" by the pipe symbol. Likewise, the right subshell gets stdin redirected to the read end of the pipe object.
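On Linux you can watch the shell set this up via /proc (Linux-specific; the pipe inode numbers will differ, and the ls output is abbreviated here):

    # left side of a pipe: fd 1 points at the write end of a pipe object
    $ ls -l /proc/self/fd/1 | cat
    l-wx------ ... /proc/self/fd/1 -> 'pipe:[123456]'
    # right side: fd 0 points at the read end
    $ true | ls -l /proc/self/fd/0
    lr-x------ ... /proc/self/fd/0 -> 'pipe:[123457]'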
The first subshell runs two commands ("red" and "green") in sequence (";"). "Red" is indeed printed to stdout and thus (because of the redirection) sent to the pipe. However, nothing is ever read out of the pipe: the only process that is connected to the read end of the pipe ("echo blue") never reads anything; it is output only.
Unlike "echo red", "echo green >&2" doesn't have stdout connected to the pipe. Its stdout is redirected to whatever stderr is connected to. Here is the explanation what ">&2" (or equivalently, "1>&2") means: For the execution of "echo green", make stdout (1) point to the same object that stderr (2) points to. You can imagine it as being a simple assignment: fd[1] = fd[2].
For "echo blue", stdout isn't explicitly redirected, so it gets run with stdout set to whatever it inherited from its parent shell, which is (probably) your terminal.
Seeing that both "echo green" and "echo blue" write directly to the same file (again, probably your terminal), we have a race -- who wins is basically a question of who gets scheduled to run first. For one reason or another, it seems that blue is more likely to win on your system. It might be due to the fact that the left subshell needs to finish "echo red" first, which does print to the pipe, and that might introduce a delay or a yield, or such.
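You can make the race visible by simply repeating the command; here is one possible interleaving (yours will likely differ from run to run, and per the SIGPIPE discussion below, "green" can occasionally be missing entirely):

    $ for i in 1 2 3; do (echo red; echo green 1>&2) | echo blue; done
    blue
    green
    green
    blue
    blue
    green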
Yes, the pipe runs two subcommands in parallel, but that is not why the blog post is interesting (or why its author was surprised). It's because 'echo red' is supposed to block, thus introducing synchronization between the two branches of the pipe, yet it doesn't!
And I must confess, when reading the command, my first thought was: "Ok so that first echo will die with a SIGPIPE and stderr will be all about the broken pipe." And I was wrong, because of that small buffer.
I wonder which other Unices allow a write to a broken pipe to complete successfully?
It is not actually supposed to block. Pipes block when they are full, but there's not enough data here to fill a pipe buffer. When pipes are broken, SIGPIPE is sent to the writer. Pipes do not block just because nobody is reading from the read end--as long as the read end is still open somewhere, a process could read from it, and that is enough.
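Both behaviours are easy to demonstrate in bash (PIPESTATUS is bash-specific; on Linux the default pipe buffer is 64 KiB, and exit status 141 = 128 + 13, i.e. SIGPIPE):

    # `yes` fills the pipe buffer and blocks; when `sleep` exits, the read
    # end closes and the blocked writer is killed with SIGPIPE:
    $ yes | sleep 1; echo "${PIPESTATUS[@]}"
    141 0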
When you see only "blue", what happened is that the left-hand side of the pipe got killed: the right-hand side had already finished before "echo red" ran, which closed the read end completely, and then "echo red" got killed with SIGPIPE. That takes out "echo green" with it, because "echo" is a built-in, so it runs inside the subshell process itself rather than as a subprocess. If you use "/bin/echo red" instead, then "green" will always be printed (because SIGPIPE goes to /bin/echo, and not to the entire shell).
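A rough way to see the difference (`:` is a builtin that does nothing and exits immediately, closing the read end; the timing is still racy, but "green" should survive because SIGPIPE can only hit /bin/echo, not the subshell):

    $ (/bin/echo red; echo green 1>&2) | :
    green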
In other circumstances, "echo blue" will never read stdin, but the kernel doesn't know or care. As far as the kernel is concerned, "echo blue" could possibly read from stdin, as long as stdin is open.
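For contrast with the SIGPIPE case above: here the reader never reads but keeps the read end open, so the small write just sits in the pipe buffer and the writer exits normally (PIPESTATUS is again bash-specific):

    $ echo hi | sleep 1; echo "${PIPESTATUS[@]}"
    0 0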
But indeed, the author wasn't aware that readers and writers of the pipe aren't fully synchronized, because the buffer in between allows for some concurrency. My writeup wasn't very explicit about that (at least not about the fact that writing to the pipe can block when the pipe is full), but I think it's technically accurate and hope it can clear up some confusion -- a lot of readers probably don't understand well how the shell works.
If you want a program that reads from stdin and writes to stdout, you can use `cat`, e.g. `echo a | cat` will print “a”.
Lastly, be aware that `echo` is usually a shell builtin that functions like `print`. I’m not sure of all the ways it might behave differently, but it’s something to be aware of (it’s not a child process the way `cat` is).
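You can ask the shell what it will actually run with `type` (bash; the path to cat may differ on your system):

    $ type echo
    echo is a shell builtin
    $ type cat
    cat is /usr/bin/cat
    # to force a real child process, call the external binary explicitly:
    $ /bin/echo hello
    hello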
Pipes would still be a useful way to structure your program. They would just be less useful.
Nushell looks like it also has an internal implementation of pipelines. But I can't read Rust, so that's just my assumption.
Every Powershell command has a begin, process, and end block. (If you don't write these explicitly, your code goes in an implicit end block.)
When a pipeline is evaluated (see the sketch after this list):
1. From left to right, the begin block of each command is run, sequentially. No process or end blocks are run until every begin block has run.
2. Each command's process block is run, once per object piped in. A process block can output zero, one or many objects; I'd have to check on a computer, but IIRC this is "breadth-first" - each object that a process block outputs is passed to the next process block before returning control to the current process block.
3. After all process blocks are exhausted, from left to right, each command's end block is run. Commands that did not declare a process block receive all piped objects as a single collection. Any output from the end block triggers the process block to the right.
4. When all end blocks have completed, the pipeline is stopped.
5. Errors in Powershell can be terminating or non-terminating. When a terminating error is thrown, the pipeline is stopped.
6. There is a special StopPipeline error which stops the pipeline but is handled by the engine so the user never sees it. That's how `select -First 5` works (for PS `select`, not gnu select).
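A minimal PowerShell sketch of those three blocks and the evaluation order described above (the function name Trace is made up for illustration):

    function Trace {
        param(
            [string] $Name,
            [Parameter(ValueFromPipeline)] $x
        )
        begin   { Write-Host "$Name begin" }           # step 1: all begins run first
        process { Write-Host "$Name process $x"; $x }  # step 2: once per piped object
        end     { Write-Host "$Name end" }             # step 3: after input is exhausted
    }

    PS> 1, 2 | Trace A | Trace B | Out-Null
    A begin
    B begin
    A process 1
    B process 1
    A process 2
    B process 2
    A end
    B end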
Pipelines only operate on streams 0 and 1, as with OS pipes. The other streams (ps has 7) are handled immediately, modulo some buffering behaviour introduced for performance reasons. Broadly speaking, the alternate streams are suppressed or enabled by defaults and by switches on each command individually and are rendered by the engine and given to the console to display. But they can also be redirected or captured in variables.
You can do asynchrony in Powershell; threading is offered by a construct called "runspaces". These are not inherently connected to the pipeline, but pipelined commands can implement them, e.g. `foreach -Parallel {do-stuff}`
PS: Precision of language to avoid confusion: "Indeterministic" is a philosophy term, while the CS term is "nondeterministic".
The command, perhaps intentionally, looks unusual (any code reviewer would certainly be scratching their head):
There's an "echo red" in there but it's never sent anywhere (perhaps a joke with "red herring"?).
There's an "echo green" sent to stderr, that will only be visible if it terminates before "echo blue".
The exact order depends on output buffering, which in turn depends on which process gets scheduled first, which will vary with the number of CPUs and their respective load. So yes, it will be indeterministic, but in the same way "top" is.
>>> Note: The ordering of "green" and "blue" in the output might vary because these streams (stdout and stderr) might be buffered differently by the shell or operating system. Most commonly, you will see the output as illustrated above.
https://www.gibney.org/the_output_of_linux_pipes_can_be_inde...