Comment by debugnik - Hacker Neue

debugnik Jan 22, 2025 parent

No amount of locking can make the getenv API thread-safe, because it returns a pointer which gets invalidated by setenv, but lacks a way to release ownership over it and unblock setenv safely (or to free a returned copy).

So setenv's existence makes getenv inherently unsafe unless you can ensure the entire application is at a safe point to use them.

tsimionescu Jan 23, 2025

This is actually not that hard to fix.

Getenv() could keep several copies of the value around: one internal copy protected by a mutex, that it never returns, and one copy per thread that it stores in thread local storage. When you call getenv(), it locks the mutex, checks if the current thread's value exists, populates it from the internal copy if not, and returns it. It will also install a new setenv-specific signal handler on this thread and store info about this thread having a copy.

Setenv() will then take the same mutex as getenv(), check if the internal copy is different from the new value; if it is, it will modify the internal copy, modify the local thread's copy if that has one, and then signal each other thread in the process that has a copy in TLS. The setenv signal handler will modify the local copy that thread holds.

It's gonna be slow for a large multi-threaded program, but since setenv() used to corrupt memory for such programs, they probably don't care. And for single-threaded programs, or even for programs that don't access getenv()/setenv() on multiple threads, there should be no extra overhead other than the mutex and the bookkeeping.

The only issues that would remain are programs which send the pointer they get from getenv() to other threads without ensuring locked access, and programs which rely on modifying the pointer from getenv() directly as a way to set an env var, and expect this to be visible across threads. Those are just hopelessly broken and can't use the same API - but they aren't more broken than they are today.

Of course, in addition to this complex work to make the old API (mostly) thread safe, it should also offer a new API that simply returns a copy every time, doesn't promise to show modifications to your copy when setenv() gets called (you need to call getenv() again), and puts the onus on you to free that copy explicitly.
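A minimal sketch of the "returns a copy every time" idea, assuming every writer in the program goes through the same lock. The names `env_lock`, `getenv_copy` and `setenv_locked` are made up for illustration and are not part of any libc:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical process-wide lock; it only helps if all environment
 * access in the program is routed through these wrappers. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns a malloc'd copy of the variable's value, or NULL if the
 * variable is unset or allocation fails. The caller must free() it.
 * The copy does not change when the variable is later modified. */
char *getenv_copy(const char *name)
{
    char *copy = NULL;

    pthread_mutex_lock(&env_lock);
    const char *value = getenv(name);
    if (value != NULL) {
        size_t len = strlen(value) + 1;   /* include the terminating NUL */
        copy = malloc(len);
        if (copy != NULL)
            memcpy(copy, value, len);
    }
    pthread_mutex_unlock(&env_lock);
    return copy;
}

/* Writer side: takes the same lock, so it cannot invalidate the
 * pointer while getenv_copy() is still copying from it. */
int setenv_locked(const char *name, const char *value)
{
    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

Code that calls plain getenv()/setenv()/putenv() or touches `environ` directly bypasses the lock, so this only covers callers that opt in.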

OskarS Jan 23, 2025

> it should also offer a new API that simply returns a copy every time

Returning a copy isn't great (memory allocation!), the API should probably be something like:

    int getenv(const char *varName, char *buf, size_t bufSize, size_t *varSize);

Where the caller manages the buffer and getenv writes into it (so it can e.g. be stack or statically allocated), the third argument is the size of the caller-managed buffer, then the last variable is an "out parameter" that returns the "true" length of the environment variable. Then afterwards, you can check if `*varSize > bufSize`, and if so, you need to make your buffer larger. The return value is an error code.

Doing it like this, you can easily implement the "return a malloced copy" if you want to, but it also gives you the option to avoid allocation entirely. This is important for e.g. embedded or real-time applications, or anything that just likes to avoid `malloc()/free()`.
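As a rough sketch, the signature above could be layered over the existing getenv like this. It is renamed `getenv_buf` here (a hypothetical name) so it doesn't clash with the real one. Two details are assumptions the description leaves open: `*varSize` counts the terminating NUL, so that the `*varSize > bufSize` check is exact, and an unset variable is reported as `ENOENT`. The lock is also an assumption and only protects against writers that take the same lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lock shared with whoever modifies the environment. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copies the value of varName into the caller-managed buffer.
 * Returns 0 if the variable exists, ENOENT if it does not.
 * *varSize receives the bytes needed including the NUL (0 if unset).
 * If *varSize > bufSize nothing is copied and buf is left untouched. */
int getenv_buf(const char *varName, char *buf, size_t bufSize, size_t *varSize)
{
    int rc = ENOENT;

    pthread_mutex_lock(&env_lock);
    const char *value = getenv(varName);
    if (value != NULL) {
        size_t needed = strlen(value) + 1;
        *varSize = needed;
        if (needed <= bufSize)
            memcpy(buf, value, needed);   /* no allocation on this path */
        rc = 0;
    } else {
        *varSize = 0;
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

A caller can start with a stack buffer, and only if `*varSize > bufSize` fall back to a larger one and call again.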

o11c Jan 23, 2025

If you only consider `getenv`/`setenv` there are indeed many solutions, but it's not that simple. You also need to consider `putenv` (not that nasty, you just need to treat it like initial environment, which means you can't use a single range check) and accessing the `environ` variable directly (nasty).

Your particular solution doesn't work because people expect `getenv` to be async-signal-safe, which means you shouldn't be allocating memory.

Hmm ... doing an incref-like operation during `getenv` for a previously `setenv`ed variable that hasn't yet been accessed in this thread would be fine ... clear those refs during calls we know indicate knowledge refreshes ...

oguz-ismail Jan 23, 2025

>`putenv` (not that nasty,

It's equally nasty. POSIX requires that the argument to `putenv()` not be copied, so it's not very different from assigning to `environ` directly.
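A small sketch of what "not copied" means in practice, assuming a libc that follows the POSIX wording here. The variable name `GR_DEMO_PUTENV` and the helper function are made up for the demo:

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* The buffer handed to putenv() must outlive the variable, because
 * the environment ends up pointing at this exact memory. */
static char demo_entry[] = "GR_DEMO_PUTENV=before";

/* Returns 1 if an in-place edit of the caller's buffer is visible
 * through getenv(), 0 if it is not, and -1 if putenv() failed. */
int putenv_aliases_caller_buffer(void)
{
    if (putenv(demo_entry) != 0)
        return -1;

    /* Overwrite the value part in place with a same-length string;
     * no setenv()/putenv() call is involved in this change. */
    memcpy(demo_entry + strlen("GR_DEMO_PUTENV="), "after!", 6);

    const char *seen = getenv("GR_DEMO_PUTENV");
    return seen != NULL && strcmp(seen, "after!") == 0;
}
```

Since the program can rewrite that buffer at any time without calling into libc, a lock inside libc has nothing to hook.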

gpderetta Jan 23, 2025

> accessing the `environ` variable directly (nasty).

"easy": protect the page containing environ and handle the mutation from the signal handler.

/s of course.

fch42 Jan 23, 2025

"mutating" there involves the need to (re)allocate memory. To do so in a signal handler is hard ... because memory allocators are, while threadsafe, not async-signal-safe. You can't make a hard problem easy by asserting dependence on another (unsolved) hard problem.

Btw, you can _also_ substitute libc's setenv/getenv/putenv with your own (locking) implementations, courtesy preload and all the funky features of ELF symbol resolution. Actually easy. But impossible if you link against static code using it (go ... away). Hmm. easy ? impossible ? damn this grey world. Gimme some color.

throwaway2037 Jan 23, 2025

Someone above mentioned getenv_r(). I needed to Google it. It is not impl'd by GNU glibc (as far as I know). I do see it on NetBSD: https://man.netbsd.org/getenv_r.3

It looks useful.

alkonaut Jan 23, 2025

There has to be some sort of nuance regarding why this seemingly simple fix hasn't been made yet. Changing from crashing to blocking doesn't seem like a big breaking change.

account42 Jan 23, 2025

Because it doesn't actually solve anything: you're still replacing whatever getenv returned from under the nose of the program code - whether that happens in another thread or in a signal handler in the same thread doesn't make any difference.

And that's before you even get to the `extern char **environ` global.

xxs Jan 23, 2025

B/c you never need setenv outside of single-threaded command-line utilities, and even then it's questionable.

kazinator Jan 24, 2025

According to ISO C, getenv returns a pointer to storage that can be overwritten by another call to getenv! Only POSIX slightly fixes it: the string comes from the environ array, and operations on environ by the library preserve the strings themselves (when not replacing or deleting them), just not the array. A program that calls nothing but getenv is okay on POSIX, not necessarily on ISO C.

josefx Jan 22, 2025

C could provide functions to lock/unlock a mutex and require that any attempt to access the environment has to be done holding the mutex. This would still leave the correctness in the hands of the user, but at least it would provide a standard API to secure the environment in a multi threaded application that library and application developers could adopt.
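A sketch of how such a pair might be used, assuming hypothetical `env_lock()`/`env_unlock()` functions (nothing like them exists in standard C today). Several threads do a read-modify-write on one variable, and the pointer from getenv() is only used while the lock is held:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_DEMO_THREADS 16
#define BUMPS_PER_THREAD 1000

/* Hypothetical standard-provided environment mutex. */
static pthread_mutex_t env_mutex = PTHREAD_MUTEX_INITIALIZER;

void env_lock(void)   { pthread_mutex_lock(&env_mutex); }
void env_unlock(void) { pthread_mutex_unlock(&env_mutex); }

/* Thread body: increments the integer stored in the named variable. */
static void *bump_counter(void *arg)
{
    const char *name = arg;
    for (int i = 0; i < BUMPS_PER_THREAD; i++) {
        char next[32];
        env_lock();
        const char *cur = getenv(name);           /* valid only under the lock */
        long n = cur ? strtol(cur, NULL, 10) : 0;
        snprintf(next, sizeof next, "%ld", n + 1);
        setenv(name, next, 1);
        env_unlock();
    }
    return NULL;
}

/* Starts up to MAX_DEMO_THREADS threads and returns the final counter
 * value, or -1 if the variable could not be read back. */
long run_counter_demo(const char *name, int thread_count)
{
    pthread_t threads[MAX_DEMO_THREADS];
    int started = 0;
    long total = -1;

    if (thread_count > MAX_DEMO_THREADS)
        thread_count = MAX_DEMO_THREADS;

    env_lock();
    setenv(name, "0", 1);
    env_unlock();

    for (int i = 0; i < thread_count; i++) {
        if (pthread_create(&threads[started], NULL, bump_counter, (void *)name) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    env_lock();
    const char *final = getenv(name);
    if (final != NULL)
        total = strtol(final, NULL, 10);
    env_unlock();
    return total;
}
```

This stays correct only as long as every piece of code in the process, libraries included, takes the same lock before touching the environment.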

fch42 Jan 23, 2025

That is basically "what it means" if an interface is non-MT: you can call this no-problem if you know you're singlethreaded, and if you're not, find your own way to serialize (meaning: have your own locking primitive you acquire/release where you make calls to these functions).

One could "dream of" a func that tells libc "acquire/drop this mutex of mine around get/set/putenv calls" but that'd simply move the problem - because the nifty "frameworks" would do that (independently of each other, we're sovereign and entitled frameworks around here) and race each other's state nonetheless.

josefx Jan 23, 2025

> because the nifty "frameworks"

Malicious software exists, does that mean we should remove all threading primitives from the standard?

fch42 Jan 23, 2025

Obviously not, but _threading_ primitives are not the subject of this post at all. Declared-as non-threadsafe interfaces are. And of course (as is happening here) one can argue whether all "system runtimes" shall be threadsafe. Right now though, they are not, and agreed/sanctioned standards don't require them to be. Again (also as is happening here) opinions may differ on whether changes-to-make-threadsafe would be bugfixes, enhancements, or (require) new interfaces. I have expressed my views on this. Happy to agree to disagree, though.

xxs Jan 23, 2025

They can have copy on write of course.
