Comment by packetlost

packetlost Oct 5, 2023

This is why threads aren't nearly as important as many programmers seem to think. Chances are, whatever application you're building can be done in a cleaner way using pipes + processes or green/user-space threads depending on the workload in question. It can be less convenient, but message passing is usually preferable to deadlock hell.

jstimpfle Oct 5, 2023

Pipes are FIFO data buffers implemented in the kernel. For communication between threads of the same process, you can replace any pipe object by a userspace queue implementation protected by e.g. mutex + condition variable. It is functionally equivalent and has potential to be faster. And if you wrap all accesses in lock/unlock pairs (without locking any other objects in between) there is no danger of introducing any more deadlocks compared to using kernel pipes.
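To make the "userspace queue protected by mutex + condition variable" idea concrete, here is a minimal Python sketch. The class name `PipeLikeQueue`, the capacity of 8, and the helper `run_demo` are made up for illustration; it is one possible shape, not a reference implementation. Note that every access is a single lock/unlock pair with no other lock taken in between, as described above.

```python
import threading
from collections import deque


class PipeLikeQueue:
    """Bounded FIFO guarded by one mutex and two condition variables."""

    def __init__(self, capacity=64):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        # Both conditions share the same mutex, so there is only one lock.
        self._not_empty = threading.Condition(self._lock)
        self._not_full = threading.Condition(self._lock)

    def put(self, item):
        with self._lock:
            # Block while the buffer is full, like a writer on a full pipe.
            while len(self._items) >= self._capacity:
                self._not_full.wait()
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._lock:
            # Block while the buffer is empty, like a reader on an empty pipe.
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item


def run_demo(n=100):
    """Send n integers from the calling thread to a consumer thread."""
    q = PipeLikeQueue(capacity=8)
    received = []

    def consumer():
        for _ in range(n):
            received.append(q.get())

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(n):
        q.put(i)
    t.join()
    return received
```

Unlike a pipe, the queue carries object references rather than bytes, so nothing is serialized or copied on the way through.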

Threads are an important structuring mechanism: You can assume that all your threads continue to run, or in the event of a crash, all your threads die.

Also, unidirectional pipes aren't exactly sufficient for inter-process / inter-thread synchronisation. They are ok for simple batch processing, but that's about it.

gpderetta Oct 5, 2023

Incidentally, you can use the exact same setup (plus mmap) for interprocess queues.

The advantage of threads is that you can pass pointers to your data through the queue, while that's harder to do between processes and you have to resort to copying data in the queue instead.

another2another Oct 6, 2023

>while that's harder to do between processes and you have to resort to copying data in the queue instead.

I could be wrong - I've never done it, but I understood that you can even store POSIX mutexes and condition vars in shared mem so that 2 processes (or more?) can process data without copying, so long as they both use the same locks stored in the shared memory.

gpderetta Oct 6, 2023

Yes, when the mutex or condvar is inited with attribute PTHREAD_PROCESS_SHARED.
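A rough Python sketch of the shared-memory part of this exchange, for Unix-like systems only. As far as I know the Python standard library does not expose `PTHREAD_PROCESS_SHARED` directly, so this sketch sidesteps locking entirely: the parent simply waits for the child to exit before reading. The function name `sum_in_child` is made up for illustration.

```python
import mmap
import os
import struct


def sum_in_child(values):
    """Have a forked child write its result into memory shared with the parent."""
    # Anonymous mapping. On Unix the default flag is MAP_SHARED, so the
    # page stays shared between parent and child after fork().
    shared = mmap.mmap(-1, mmap.PAGESIZE)
    pid = os.fork()
    if pid == 0:
        # Child: write the result straight into the shared page.
        try:
            struct.pack_into("q", shared, 0, sum(values))
        finally:
            os._exit(0)
    # Parent: waiting for the child to exit is the only synchronisation
    # used here. Concurrent access would need a process-shared lock.
    os.waitpid(pid, 0)
    (result,) = struct.unpack_from("q", shared, 0)
    shared.close()
    return result
```

The result never passes through a pipe or socket: the child writes it into the mapped page and the parent reads it from that same page.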

packetlost OP Oct 6, 2023

There are domain sockets if you need something more, such as passing file descriptors. Both pipes and sockets (including TCP, with obvious limitations) can be done with zero copy given the right set of flags, though things get harder if you have a complicated runtime (i.e. garbage collection) involved. There are always explicitly mapped shared pages.
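As an illustration of the descriptor-passing point, here is a minimal Python sketch using `socket.send_fds` and `socket.recv_fds` (Python 3.9+, Unix only). For brevity both ends of the socket pair live in one process; in real use they would belong to two different processes. The function name `pass_fd_demo` and the payloads are made up for the example.

```python
import os
import socket


def pass_fd_demo():
    """Send the read end of a pipe over a Unix domain socket, then use it."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    received = None
    try:
        # The descriptor travels as ancillary data next to a small payload.
        socket.send_fds(left, [b"fd"], [read_end])
        tag, fds, _flags, _addr = socket.recv_fds(right, 16, 1)
        received = fds[0]  # a new descriptor referring to the same pipe

        # Data written to the pipe is readable through the received descriptor.
        os.write(write_end, b"hello")
        payload = os.read(received, 5)
        return tag, payload
    finally:
        for fd in (read_end, write_end, received):
            if fd is not None:
                os.close(fd)
        left.close()
        right.close()
```

The receiver gets its own descriptor number for the same underlying pipe, which is what makes this usable across unrelated processes.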

rewmie Oct 5, 2023

> This is why threads aren't nearly as important as many programmers seem to think. Chances are, whatever application you're building can be done in a cleaner way using pipes + processes or green/user-space threads depending on the workload in question.

I think you're making wild claims based on putting up your overgeneralized strawman (i.e., "threads aren't nearly as important as many programmers seem to think") that afterwards you try to water down with weasel words ("depending on the workload in question").

Threads are widely used because they bring most of the benefits of processes (concurrent control flow, and in multicore processors also performance) without the constraints and limitations they bring (exclusive memory space, slow creation, performance penalty caused by serialization in IPC, awkward API, etc).

In multithreaded apps, to get threads to communicate with each other, all you need to do is point to the memory address of the object you instantiated. No serialization needed, no nothing. You simply cannot beat this in terms of "clean way" of doing things.

> It can be less convenient, but (...)

That's quite the euphemism, and overlooks why threads are largely preferred.

packetlost OP Oct 6, 2023

> afterwards you try to water down with weasel words ("depending on the workload in question")

I was saying that the choice between multi-process with message passing or userspace/green-threads depends on workload, not watering down my assertion, though there are exceptions to that statement (see below).

> without the constraints and limitations they bring (exclusive memory space, slow creation, performance penalty caused by serialization in IPC, awkward API, etc).

That just isn't true for pretty much any UNIX-like system, but is sorta true for native Windows. Threads are processes: they are created, scheduled, and killed in the same way as processes on *nix systems. You pass a flag to a `fork()`-style call (`clone()` on Linux, for example) that tells it to give thread semantics (i.e. shared memory) to the newly forked process and that's it. There's some implicit handling of signal masks and a few other important things that get saner defaults for threads, but that's about it. There are many ways to share data efficiently between processes that don't even involve copying. You can map shared memory pages if you really don't want to be using pipes or sockets, but the latter can both be used with zero copy and zero serialization. Sure, the native APIs for those are wonky, but nothing stops languages from making them less so.

> In multithreaded apps, to get threads to communicate with each other, all you need to do is point to the memory address of the object you instantiated. No serialization needed, no nothing. You simply cannot beat this in terms of "clean way" of doing things.

I was referring to the fact that being able to share memory freely like that encourages bad application designs because you aren't forced to distinguish between shared and unshared memory, it's just all shared by default.

The main exception to this is certain high-performance applications on Windows, which mostly means video games these days (there are obviously others, but it's the most obvious case). I think those are one of the few cases where there isn't really a way to hit your targets without threads.

Regardless of all of this, I'm mostly coming at this from the programming language design perspective, not the OS perspective. Threads are a helpful abstraction, but mostly one of convenience.

Anyways, here's some cold hard data to back up my claims:

- The 2 most popular languages on the planet, JavaScript and Python, have single-threaded runtimes with green-thread/async-await concurrency (just google this one, it's not controversial)
- The most popular RDBMS, PostgreSQL, as well as nginx[0], the most popular web server, do not use threads, yet are highly performant and flexible
- Scaling is often done horizontally across a network these days, which lends itself nicely to a message-passing architecture

[0]: https://w3techs.com/technologies/overview/web_server

gpderetta Oct 5, 2023

Message pass enough and you'll easily deadlock as well.
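A minimal Python sketch of how this can happen with no shared locks in user code at all: two workers that each wait to receive before they send. The function names and the 0.2 second timeout are made up for illustration; the timeout exists only so the demo terminates instead of hanging.

```python
import queue
import threading


def receive_then_send(inbox, outbox, name, results, timeout=0.2):
    """Wait for a message first, and reply only after one arrives."""
    try:
        inbox.get(timeout=timeout)
    except queue.Empty:
        # Without the timeout this worker would block forever.
        results[name] = "stuck"
        return
    outbox.put("reply")
    results[name] = "ok"


def deadlock_demo():
    """Run two workers that are each waiting for the other to speak first."""
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    results = {}
    workers = [
        threading.Thread(target=receive_then_send,
                         args=(b_to_a, a_to_b, "a", results)),
        threading.Thread(target=receive_then_send,
                         args=(a_to_b, b_to_a, "b", results)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Each worker depends on a message the other will only send after receiving one, so neither makes progress. The same cycle can form between processes talking over pipes or sockets.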

djbusby Oct 6, 2023

Like how Postfix works. That's a fun architecture to look at. Multiple processes and file based queue. Meanwhile I panic if I don't have PostgreSQL to save my data :/

packetlost OP Oct 6, 2023

Postgres doesn't use threads, it's a multiprocess architecture. Postfix probably does that on purpose to prevent losing outgoing (or incoming emails if you're doing POP3) in the event of a system crash/power loss.

lelanthran Oct 6, 2023

The problem with pipes is that passing a message involves a kernel context switch, no matter how small the message is.

Passing a message in-process is orders of magnitude faster than passing a message out-of-process.
