Comment by 4death4 - Hacker Neue

4death4 Oct 5, 2023

That may have been surprising, but if you think about it a little more deeply, it makes perfect sense. Programs in a pipeline execute concurrently. If they didn’t, pipelines wouldn’t be useful. Take, for instance, a pipeline that downloads a tar file with curl and untars it. If you wait for curl to finish before running tar, you run into all sorts of problems: for example, where do you store the intermediate tar file if it’s really large? Tar needs to run while curl is running to keep buffers small and execution fast. The only control flow between the programs in a pipeline goes through stdin and stdout. Your example program writes to stderr, so naturally that output is not part of the deterministic control flow.
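A minimal sketch of the concurrency argument above, assuming Python threads stand in for the two processes and a bounded `queue.Queue` stands in for the kernel's pipe buffer (the names `downloader`, `extractor`, and the chunk sizes are hypothetical). The extractor consumes chunks while the downloader is still producing them, and the producer blocks whenever the buffer is full, so memory use stays bounded however large the download is.

```python
import queue
import threading

CHUNKS = 100          # hypothetical size of the "download", in chunks
CHUNK = b"x" * 4096   # hypothetical 4 KiB chunk standing in for curl's output
BUFFER_SLOTS = 4      # bounded, like a pipe buffer: put() blocks when it is full
EOF = object()        # sentinel marking the end of the stream


def downloader(q):
    # Stands in for curl: emits chunks as they "arrive".
    for _ in range(CHUNKS):
        q.put(CHUNK)  # blocks while the consumer is behind (back-pressure)
    q.put(EOF)


def extractor(q, stats):
    # Stands in for tar: handles each chunk as soon as it is available.
    while True:
        item = q.get()
        if item is EOF:
            break
        stats["bytes"] += len(item)
        # Chunks held in memory right now: those still queued plus this one.
        stats["max_in_flight"] = max(stats["max_in_flight"], q.qsize() + 1)


def run_pipeline():
    q = queue.Queue(maxsize=BUFFER_SLOTS)
    stats = {"bytes": 0, "max_in_flight": 0}
    threads = [
        threading.Thread(target=downloader, args=(q,)),
        threading.Thread(target=extractor, args=(q, stats)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


print(run_pipeline())  # bytes is always 409600; max_in_flight never exceeds 5
```

The exact interleaving of the two threads varies from run to run; only the total and the bound on buffered chunks are guaranteed.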

eru Oct 6, 2023

> If they didn’t, pipelines wouldn’t be useful.

Pipes would still be a useful way to structure your program. They would just be less useful.

psd1 Oct 7, 2023

Powershell implements pipelines deterministically and without concurrency, and you can be very precise about it. Of course, it will use OS pipes if you include binaries in your pipeline.

Nushell also appears to have its own internal implementation of pipelines, but I can't read Rust, so that's just my assumption.

4death4 OP Oct 7, 2023

What do you mean “without concurrency”? One program runs entirely before the other starts?

psd1 Oct 7, 2023

Powershell pipelines are an engine construct rather than OS pipes or file descriptors. (If you include OS binaries in a PS pipeline, it will map the internal pipeline to OS pipes for that element of the pipeline, of course.)

Every Powershell command has a begin, process, and end block. (If you don't write these explicitly, your code goes in an implicit end block.)

When a pipeline is evaluated:

1. From left to right, the begin block of each command is run, sequentially. No process or end blocks are run until every begin block has run.

2. Each command's process block is run, once per object piped in. A process block can output zero, one or many objects; I'd have to check on a computer, but IIRC this is "breadth-first" - each object that a process block outputs is passed to the next process block before returning control to the current process block.

3. After all process blocks are exhausted, from left to right, each command's end block is run. Commands that did not declare a process block receive all piped objects as a single collection. Any output from the end block triggers the process block to the right.

4. When all end blocks have completed, the pipeline is stopped.

5. Errors in Powershell can be terminating or non-terminating. When a terminating error is thrown, the pipeline is stopped.

6. There is a special StopPipeline error which stops the pipeline but is handled by the engine so the user never sees it. That's how `select -First 5` works (for PS `select`, not gnu select).
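The evaluation order in the list above can be modelled in a few dozen lines. This is a simplified Python sketch, not PowerShell's engine: `Command`, `run_pipeline`, `StopPipeline`, and the example commands are hypothetical names, output from begin blocks is ignored, and stopping simply skips every remaining block. Each object a process block yields is forwarded downstream before that block is resumed, as described in step 2, and everything runs on one thread, so the order is identical on every run.

```python
class StopPipeline(Exception):
    """Stand-in for the engine-handled stop signal; run_pipeline swallows it."""


class Command:
    """Base command with begin/process/end blocks (all hypothetical names)."""
    has_process = True  # False models a command that declares no process block

    def begin(self):
        pass  # output from begin blocks is ignored in this sketch

    def process(self, obj):
        yield obj  # a process block may yield zero, one or many objects

    def end(self, collected):
        return []  # `collected` is non-empty only when has_process is False


def run_pipeline(commands, inputs):
    output = []
    collected = [[] for _ in commands]

    def push(i, obj):
        # Deliver one object to command i. Each object it outputs is forwarded to
        # command i + 1 before command i is resumed to produce its next object.
        if i == len(commands):
            output.append(obj)
        elif not commands[i].has_process:
            collected[i].append(obj)  # saved for this command's end block
        else:
            for out in commands[i].process(obj):
                push(i + 1, out)

    try:
        for cmd in commands:  # step 1: every begin block, left to right
            cmd.begin()
        for obj in inputs:  # step 2: piped objects, one at a time
            push(0, obj)
        for i, cmd in enumerate(commands):  # step 3: end blocks, left to right
            for out in cmd.end(collected[i]):
                push(i + 1, out)  # end-block output drives the next process block
    except StopPipeline:
        pass  # step 6: the pipeline stops and the caller sees no error
    return output


class Trace(Command):
    """Logs every block it runs so the evaluation order is visible."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def begin(self):
        self.log.append(f"{self.name}:begin")

    def process(self, obj):
        self.log.append(f"{self.name}:process {obj}")
        yield obj

    def end(self, collected):
        self.log.append(f"{self.name}:end")
        return []


class Sum(Command):
    """Declares no process block, so its end block gets all piped objects at once."""
    has_process = False

    def end(self, collected):
        return [sum(collected)]


class SelectFirst(Command):
    """Rough analogue of `select -First n`."""

    def __init__(self, n):
        self.n, self.seen = n, 0

    def process(self, obj):
        self.seen += 1
        yield obj
        if self.seen >= self.n:
            raise StopPipeline()


log = []
print(run_pipeline([Trace("A", log), Trace("B", log)], [1, 2]))  # [1, 2]
print(log)  # A:begin, B:begin, A:process 1, B:process 1, ... A:end, B:end
print(run_pipeline([Trace("A", []), Sum()], [1, 2, 3]))  # [6]
print(run_pipeline([SelectFirst(2)], range(10**9)))  # [0, 1]
```

`SelectFirst(2)` returns immediately even though its input is a billion-element `range`, which is the point of stopping the pipeline instead of letting upstream commands run to completion.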

Pipelines only operate on streams 0 and 1, as with OS pipes. The other streams (PS has 7) are handled immediately, modulo some buffering behaviour introduced for performance reasons. Broadly speaking, the alternate streams are suppressed or enabled by default and by switches on each command individually, and are rendered by the engine and passed to the console for display. But they can also be redirected or captured in variables.
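To illustrate the split between the pipeline and the other streams, here is a small Python sketch, assuming a single-threaded engine in which side-stream writes are appended straight to a host list (`write_host` and the stream labels are hypothetical). Because nothing runs concurrently, the warnings and the pipeline output interleave in the same order every time.

```python
def write_host(host, stream, message):
    # Non-pipeline streams bypass the pipeline and go straight to the host
    # (a plain list standing in for the console or a capture variable).
    host.append(f"{stream}: {message}")


def producer(host):
    for color in ("red", "green"):
        write_host(host, "warning", f"about to emit {color}")
        yield color  # success output flows down the pipeline


def consumer(items, host):
    for item in items:
        write_host(host, "output", item)


def run():
    host = []
    consumer(producer(host), host)
    return host


print(run())
# ['warning: about to emit red', 'output: red',
#  'warning: about to emit green', 'output: green']
```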

You can do asynchrony in Powershell; threading is offered by a construct called "runspaces". These are not inherently connected to the pipeline, but pipelined commands can use them internally, e.g. `foreach -Parallel {do-stuff}`.
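A rough Python analogue of a pipelined command that uses threads internally, assuming a `ThreadPoolExecutor` in place of runspaces (`parallel_stage` and `slow_square` are hypothetical). Only this one stage is concurrent; the surrounding pipeline still consumes its output sequentially. `Executor.map` returns results in input order, which is a choice made for this sketch and not a claim about how `foreach -Parallel` orders its output.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    time.sleep(0.01)  # pretend each item is expensive
    return n * n


def parallel_stage(items, worker, max_workers=4):
    # One stage that fans its work out to a thread pool. Executor.map yields
    # results in input order, so the next stage still sees a fixed sequence.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        yield from pool.map(worker, items)


def pipeline(items):
    return [x + 1 for x in parallel_stage(items, slow_square)]


print(pipeline(range(8)))  # [1, 2, 5, 10, 17, 26, 37, 50]
```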

4death4 OP Oct 7, 2023

Ok, so it sounds like Powershell would have the exact same issue as the Linux pipes. The issue has nothing to do with determinism with the pipeline construction and everything to do with the fact that part of the pipeline writes to stderr, which you could call stream 2.

Dylan16807 Oct 8, 2023

The fact that echo green writes to stderr mainly just means that you can see the non-determinism happening, because if it wrote to stdout its output would be invisible.

The big part that's not deterministic is whether echo red succeeds or dies, along with which order the programs exit in. That would be nondeterministic even if you just ran "echo red | echo blue". But in that case you would always see "blue" so it would be hard to tell.
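The two outcomes for `echo red` can be reproduced deterministically with a raw pipe. This is a Python sketch for Linux or another POSIX system, and the function names are hypothetical. In a real `echo red | echo blue`, which case you get depends on whether `echo blue` has already exited and closed its end of the pipe by the time `echo red` writes, and that is decided by the scheduler.

```python
import os


def write_after_reader_exited(data=b"red\n"):
    """The writer runs after the reader ("echo blue") has already closed its end."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    try:
        os.write(write_fd, data)
        return "write succeeded"
    except BrokenPipeError:
        # CPython ignores SIGPIPE at startup, so the kernel's refusal surfaces as
        # EPIPE. A process with the default SIGPIPE disposition would die instead.
        return "broken pipe"
    finally:
        os.close(write_fd)


def write_while_reader_open(data=b"red\n"):
    """The writer runs while the reader still holds its end open."""
    read_fd, write_fd = os.pipe()
    try:
        # Small writes fit in the pipe buffer, so this succeeds even though
        # nothing ever reads the data.
        return os.write(write_fd, data)
    finally:
        os.close(read_fd)
        os.close(write_fd)


print(write_while_reader_open())    # 4 (bytes written)
print(write_after_reader_exited())  # broken pipe
```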

In powershell, it would be deterministic. It sounds like echo red would always succeed.

psd1 Oct 8, 2023

You mean https://www.gibney.org/the_output_of_linux_pipes_can_be_inde... ?

Absolutely not, that would never happen in Powershell. I just explained how it works...?
