
dspillett
> The problem is ‘make -j’ spinning up 100s of C++ compilation jobs, using up all of the system’s RAM+swap, and causing major instability.

I would put that in the “using it improperly” category. I never use⁰ --jobs without specifying a limit.

Perhaps there should have been a much more cautious default instead of ∞: something like four¹, or even just two. If people really wanted unlimited parallelism they could specify a number big enough to encompass every task the build could possibly run. Or perhaps --load-average should have defaulted to something like max(2, CPUs×2) whenever --jobs was in effect⁴.
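
To make that concrete, this is roughly the kind of invocation I mean with modern GNU make; the numbers are only illustrative, and nproc is just a convenient way to pick up the core count:

  # hard cap on parallel jobs, as suggested above
  make -j4

  # or scale with the machine, and also back off once the load average climbs
  make --jobs="$(nproc)" --load-average="$(nproc)"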

The biggest bottleneck hit when using --jobs back then wasn't RAM or CPU though, it was random IO on traditional high-latency drives. A couple of parallel jobs could make much better use of even a single single-core CPU, because the CPU-crunching of a busy task or two would overlap with the IO of the others; but too many concurrent tasks produced an IO flood that could practically stall the affected drives for a time, putting the CPU back into a state of waiting ages for IO (probably longer than it would have waited with no parallel jobs at all). That would throttle a machine² before it ran out of RAM, even with the small amounts of RAM we had back then compared to today. With modern IO and core counts, I can imagine RAM being the bigger issue now.
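
If you want to see that effect in action, watching iowait while an over-parallel build runs makes it fairly obvious; something along these lines with the usual procps/sysstat tools (nothing make-specific about it):

  # the "wa" column shows the CPU sitting idle waiting on IO
  vmstat 1

  # per-device utilisation and queue sizes (sysstat package)
  iostat -x 1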

--------

[0] Well, used; I've not touched make for quite some time.

[1] Back when I last used make much at all, small USB sticks and SD cards were not uncommon, but SSDs big, quick, and hardy enough for system or work drives were an expensive dream. With frisbee-based drives I found a four-job limit was often a good compromise: approaching, but not hitting, significantly diminishing returns if you had sufficient otherwise unused RAM, while keeping a near-zero chance of effectively stalling the machine with a flood of random IO.

[2] Or every machine… I remember some fool³ bogging down the shared file server used by most of the department with a vast parallel job, ignoring the standing request to run large jobs on local filesystems where possible.

[3] Not me, I learned the lesson by DoSing my home PC!

[4] Though in the case of an IO storm on a remote filesystem, a load-average limit might be much less effective, since the load average is measured on the machine running make rather than on the file server taking the hammering.


davemp
Thanks for the historical perspective. It was probably less of an issue on older hardware because you could Ctrl-C if you were IO starved. Linux userspace does not do well when the OOM killer comes out to play.

Personally, I don’t think these footguns need to exist.

dspillett OP
Though in the shared drive example, a Ctrl-C only helps on the host causing the problem. Running something on the file server to work out the culprit (by checking the owner of the files being accessed, for instance) will itself be pretty much blocked behind everything else affected by the IO storm.
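
(The sort of thing I mean by checking owners is roughly the below, run on the file server itself; the export path is made up, and of course it crawls precisely because the IO storm it is trying to diagnose is already in full swing:)

  # owners of files accessed in the last few minutes under the shared export
  find /export/shared -amin -10 -printf '%u\n' | sort | uniq -c | sort -rn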
