
charleslmunger
1,241 karma

  1. Yup, this works, but there's as yet no UHBR13.5-or-better input, so you're not getting a full HDMI 2.1 equivalent. But if you don't care about 24-bits-per-pixel DSC then you can have an otherwise flawless 4K 120 Hz experience.

    https://trychen.com/feature/video-bandwidth

  2. It's so weird to see the leading heroin story phrased like a hypothetical, when:

    1. Heroin itself was marketed as a "non-addictive morphine substitute" and sold to the public. It didn't become a controlled substance until 1914 (according to Wikipedia).

    2. The opioid crisis was basically started and perpetuated by Purdue Pharma, again marketing oxycodone with the label "Delayed absorption, as provided by OxyContin tablets, is believed to reduce the abuse liability of a drug" and other more egregious advertising.

    3. Britain went to war with China twice to force the Qing dynasty to allow them to sell opium there.

    4. President Franklin D. Roosevelt's grandfather made a ton of money in the opium trade.

    It's supposed to be a sort of shocking hypothetical, except that's basically the history of the actual drug.

  3. That depends on your workload. If you're making a game that's expected to use near 100% of system resources, or a real time service pinned to specific cores, your local application is the overall system.
  4. >Critical section under 100ns, low contention (2-4 threads): Spinlock. You’ll waste less time spinning than you would on a context switch.

    If your sections are that short then you can use a hybrid mutex and never actually park - unless you're wrong about how long things take, in which case parking will save you.

    >alignas(64) in C++

        std::hardware_destructive_interference_size
    
    Exists so you don't have to guess, although in practice it'll basically always be 64.

    The code samples also don't obey the basic best practices for spinlocks for x86_64 or arm64. Spinlocks should perform a relaxed read in the loop, and only attempt a compare and set with acquire order if the first check shows the lock is unowned. This avoids hammering the CPU with cache coherency traffic.

    Similarly, the x86 PAUSE instruction isn't mentioned, even though it exists specifically to signal spin-wait sections to the CPU.

    Spinlocks outside the kernel are a bad idea in almost all cases except dedicated non-preemptible ones; use a hybrid mutex. Spinning consumer threads can make sense in specialty exclusive-thread-per-core setups where you want to minimize wakeup costs, but that's not the same as a spinlock, which would cause any contending thread to spin.
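    A minimal sketch of the relaxed-read-then-CAS pattern plus the spin hint, assuming C++17 and GCC/Clang on x86-64 or AArch64 (all names here are mine, for illustration, not from the article's samples):

```cpp
#include <atomic>
#include <thread>
#if defined(__x86_64__)
#include <immintrin.h>  // _mm_pause
#endif

// Tell the CPU we're in a spin-wait loop (PAUSE on x86, YIELD on ARM).
inline void spin_hint() {
#if defined(__x86_64__)
  _mm_pause();
#elif defined(__aarch64__)
  asm volatile("yield");
#endif
}

class Spinlock {
  std::atomic<bool> locked_{false};

 public:
  void lock() {
    for (;;) {
      // Cheap relaxed read first ("test-and-test-and-set"): waiters spin on
      // their local cache copy and only attempt the expensive
      // read-modify-write when the lock looks free.
      if (!locked_.load(std::memory_order_relaxed)) {
        bool expected = false;
        if (locked_.compare_exchange_weak(expected, true,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed)) {
          return;
        }
      }
      spin_hint();
    }
  }
  void unlock() { locked_.store(false, std::memory_order_release); }
};

// Tiny demo: two threads bump a counter under the lock.
long run_demo() {
  Spinlock lock;
  long counter = 0;
  auto work = [&] {
    for (int i = 0; i < 100000; ++i) {
      lock.lock();
      ++counter;
      lock.unlock();
    }
  };
  std::thread t1(work), t2(work);
  t1.join();
  t2.join();
  return counter;  // 200000 if the lock provides mutual exclusion
}
```

    On contention the waiters mostly hit their own cache line copy and only generate coherency traffic when the holder's release actually changes it.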

  5. >The compiled compressed binary for an APK

    This doesn't undermine your argument at all, but we should not be compressing native libs in APKs.

    https://developer.android.com/guide/topics/manifest/applicat...

  6. >Not at all? Most memory-safety issues will never even show up in the radar

    Citation needed? There's all sorts of problems that don't "show up" but are bad. Obvious historical examples would be heartbleed and cloudbleed, or this ancient GTA bug [1].

    1: https://cookieplmonster.github.io/2025/04/23/gta-san-andreas...

  7. Unfortunately the standard library mutex is designed in such a way that condition variables can't use requeue, and so they require unnecessary wakeups. I believe parking_lot doesn't have this problem.
  8. You can influence the choice of conditional moves (usually inserting them) with

    __builtin_expect_with_probability(..., 0.5)

    https://github.com/protocolbuffers/protobuf/commit/9f29f02a3...
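    A sketch of what that looks like (the builtin is GCC/Clang-specific; `clamp_to_zero` is a made-up example, not code from the linked commit):

```cpp
// Marking a branch as 50/50 with __builtin_expect_with_probability tells
// the optimizer the branch is unpredictable, which nudges it toward a
// conditional move (cmov/csel) instead of a conditional jump.
long clamp_to_zero(long x) {
  if (__builtin_expect_with_probability(x < 0, 1, 0.5)) {
    return 0;
  }
  return x;
}
```

    The semantics are unchanged; only the lowering is influenced, so it's safe to apply speculatively and check the effect in the disassembly.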

  9. JetBrains IDEs let you configure this - my favorite use is to highlight Kotlin extension functions differently than normal functions.

    This kind of highlighting as a secondary information channel for compiler feedback is great. Color, weight, italics, underlines - all help increase information density when reading code.

  10. If you're working on something where the cost of bugs is high and they're tricky to detect, LLM generated code may not be a winning strategy if you're already a skilled programmer. However, LLMs are great for code review in these circumstances - there is a class of bugs that are hard to spot if you're the author.

    As a simple example, accidentally inverting feature flag logic will not cause tests to fail if the new behavior you're guarding does not actually break existing tests. I and very senior developers I know have occasionally made this mistake and the "thinking" models are very good at catching issues like this, especially when prompted with a list of error categories to look for. Writing an LLM prompt for an issue class is much easier than a compiler plugin or static analysis pass, and in many cases works better because it can infer intent from comments and symbol names. False positives on issues can be annoying but aren't risky, and also can be a useful signal that the code is not written in a clear way.

  11. Surprised not to see the Foreign Function & Memory API on this list, or VarHandle. Try-with-resources was a 10/10 feature for sure though.
  12. Mobile apps store SQLite DBs in their private data directory, which only they can access. In order to exploit a vulnerability you'd have to first break the sandbox. Desktop OSes generally have far weaker protections than that; if you have access to the user's profile directory you can steal all of their credentials, plant executables, etc.

    When I think application file format I think of something like .txt, .pdf, or .doc, where it's expected that untrusted input will be passed around. In that case it makes a lot more sense to restrict which features of SQLite are accessible, and even then I'd worry about using it widely - there's so much surface area, plus the user confusion of shm and wal files.

  13. That's true, but most usage of SQLite is not as an application file format, and many of those CVEs don't apply even to that use case. The reason people have policies around CVE scanning is because CVEs often represent real vulnerabilities. But there's also stuff like "this regex has exponential or polynomial runtime on bad inputs", which is a real security issue for some projects and not others, depending on what the input to the regex is. That's also true for SQLite, and I'm guessing that the author of that page has spent a bunch of time explaining to people worried about some CVE that their usage is not vulnerable. The maintainer of cURL has expressed similar frustration.
  14. Very cool. Hardware ASan did not catch the pointer provenance bug in the previous implementation of that code because it relies on tag bits, and the produced pointer was bit-identical to the intended one. It sounds like Fil-C would have caught it because the pointer capabilities are stored elsewhere.
  15. Out of curiosity, does this idiom work in Fil-C?

    https://github.com/protocolbuffers/protobuf/blob/cb873c8987d...

          // This somewhat silly looking add-and-subtract behavior provides provenance
          // from the original input buffer's pointer. After optimization it produces
          // the same assembly as just casting `(uintptr_t)ptr + input_delta`
          // https://godbolt.org/z/zosG88oPn
          size_t position =
              (uintptr_t)ptr + e->input_delta - (uintptr_t)e->buffer_start;
          return e->buffer_start + position;
    
    It does rely on the implementation-defined behavior that a char pointer plus 1, cast to uintptr_t, is the same as casting to uintptr_t and then adding 1.
  16. >and they seem to get annoyed that people run fuzzers against SQLite, even though application file formats should definitely be fuzzed.

    I think that's an unfair reading. SQLite runs fuzzers itself and quickly addresses bugs found by external fuzzers. There's an entire section in their documentation about their own fuzzers, thanking third-party fuzzers and crediting individual engineers.

    https://www.sqlite.org/testing.html

    The tone of the CVE docs is because people freak out about CVEs flagged by automated tools when those CVEs are for issues that have no security impact for typical usage of SQLite, or have prerequisites that would already have resulted in some form of compromise.

  17. This is really cool! I noticed

    >The fast path of a pollcheck is just a load-and-branch.

    A neat technique I've seen used to avoid these branches is documented at https://android-developers.googleblog.com/2023/11/the-secret... under "Implicit suspend checks".

  18. >If you switch to WAL mode, the default behavior is that transactions are durable across application crashes (or SIGKILL or similar) but are not necessarily durable across OS crashes or power failures. Transactions are atomic across OS crashes and power failures. But if you commit a transaction in WAL mode and take a power loss shortly thereafter, the transaction might be rolled back after power is restored.

    How is this behavior reconciled with the documentation cited in my comment above? Are the docs just out of date?

  19. >EXTRA synchronous is like FULL with the addition that the directory containing a rollback journal is synced after that journal is unlinked to commit a transaction in DELETE mode. EXTRA provides additional durability if the commit is followed closely by a power loss.

    It depends on your filesystem whether this is necessary. In any case I'm pretty sure it's not relevant for WAL mode.

  20. The default is FULL

    https://sqlite.org/compile.html#default_synchronous

    >SQLITE_DEFAULT_SYNCHRONOUS=<0-3> This macro determines the default value of the PRAGMA synchronous setting. If not overridden at compile-time, the default setting is 2 (FULL).

    >SQLITE_DEFAULT_WAL_SYNCHRONOUS=<0-3> This macro determines the default value of the PRAGMA synchronous setting for database files that open in WAL mode. If not overridden at compile-time, this value is the same as SQLITE_DEFAULT_SYNCHRONOUS.

    Many wrappers for SQLite take this advice and change the default, but the default is FULL.

  21. I too interpret those docs as contradictory, and I wonder whether, like how Java 5 strengthened volatile semantics, something similar happened at some point in C# and the docs weren't updated? Either way, the specification, which the docs say is definitive, says it's acquire/release.

    https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...

    "When a field_declaration includes a volatile modifier, the fields introduced by that declaration are volatile fields. [...] For volatile fields, such reordering optimizations are restricted:

        A read of a volatile field is called a volatile read. A volatile read has “acquire semantics”; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
    
        A write of a volatile field is called a volatile write. A volatile write has “release semantics”; that is, it is guaranteed to happen after any memory references prior to the write instruction in the instruction sequence."
  22. >A volatile write operation prevents earlier memory operations on the thread from being reordered to occur after the volatile write. A volatile read operation prevents later memory operations on the thread from being reordered to occur before the volatile read

    Looks like release/acquire to me? A total ordering would be sequential consistency.

  23. Avoiding false sharing is a separate problem best solved by explicitly aligning the struct or relevant members to std::hardware_destructive_interference_size.
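    For example (a sketch; the feature-test-macro guard is needed because some standard libraries still don't define the constant, and `PaddedCounter`/`kCacheLine` are illustrative names):

```cpp
#include <atomic>
#include <cstddef>
#include <new>  // std::hardware_destructive_interference_size

#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kCacheLine =
    std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kCacheLine = 64;  // common value in practice
#endif

// Give each counter its own cache line so writers on different cores
// don't invalidate each other's lines (false sharing).
struct alignas(kCacheLine) PaddedCounter {
  std::atomic<long> value{0};
};

static_assert(alignof(PaddedCounter) == kCacheLine);
static_assert(sizeof(PaddedCounter) % kCacheLine == 0);
```

    The alignas also pads sizeof up to a multiple of the alignment, so an array of these never shares a line between elements.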
  24. I tried implementing an optimized varint encoder that did something similar, populating an 8-byte value and then storing it to RAM, but unaligned overlapping stores caused big regressions. The approach that worked required a different technique. This one is for encoding backwards:

    1. One branch for the one-byte case, always inlined, just store the byte

    2. Calculate the size of the unencoded zero prefix with a branch-free construction: (((uint32_t)clz + 7) * 9) >> 6

    3. Hand-roll a jump table, taking advantage of ARM's fixed instruction width to calculate the jump target with a shift.

    https://github.com/protocolbuffers/protobuf/commit/b039dfe26...

    This results in one branch for 1-byte varints, plus one additional branch for any larger size, and the branch predictor does not have to track a varying trip count through a loop. This approach resulted in a 2.64% speedup for overall encoding (which includes a lot more than just varints) on mid-sized ARM cores.

    I think it's very difficult to beat a single comparison and branch for the 1-byte case, for actual encoding forwards or backwards, unless you know there will be long runs of one-byte values.
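    The step-2 formula can be sanity-checked against a naive loop; this is my own sketch assuming GCC/Clang's `__builtin_clzll`, not the protobuf code:

```cpp
#include <cstdint>

// Branch-free count of leading zero *bytes* in the (up to 10-byte)
// varint encoding of v; requires v != 0 (clz is undefined for 0).
inline uint32_t varint_zero_prefix(uint64_t v) {
  uint32_t clz = (uint32_t)__builtin_clzll(v);
  return ((clz + 7) * 9) >> 6;
}

// Encoded length follows directly: a 64-bit varint is at most 10 bytes.
inline uint32_t varint_len(uint64_t v) {
  return 10 - varint_zero_prefix(v);
}

// Naive reference: one output byte per 7 bits of input.
inline uint32_t varint_len_loop(uint64_t v) {
  uint32_t n = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++n;
  }
  return n;
}
```

    For example, v = 1 has clz = 63, so the prefix is (70 * 9) >> 6 = 9 and the length is 1; v = UINT64_MAX has clz = 0, prefix 0, length 10.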

  25. A connection pool is absolutely a best practice. One of the biggest benefits is managing a cache of prepared statements, the page cache, etc. Maybe you have temp tables or temp triggers too.

    Even better is to have separate pools for the writer connection and readers in WAL mode. Then you can cache write-relevant statements only once. I am skeptical about a dedicated thread per call because that seems like it would add a bunch of latency.

  26. How did you make it wait free with only compare and swap?
  27. It's too hard because the variations you could add to it (multithreading) that give it enough depth to be challenging make it too hard, in my opinion. If you look at the implementation I linked in my previous comment, it's fully lock-free, which is pretty unreasonable to expect from anyone who isn't already familiar with lock-free concurrency. On the other hand, the version with a lock is basically identical to the single-threaded version. Asking for the two-lock queue is also a bad interview question, because it's not something you'd reasonably expect someone to derive in an interview.

    The other examples given for fleshing it out are all pretty similar; if a candidate can do one, chances are they can do the others too. If you want to get a decent signal of candidate skill, you have to ask a question easy enough that any candidate you'd accept can answer it, then incrementally add difficulty until you've given the candidate a chance to show off the limit of their abilities (at least as applied to your question).

    Otherwise you ask a too-easy question which everyone nails, then make it way too hard and everyone fails. Or you ask a too-easy question and follow it up with additional enhancements that don't actually add much difficulty, and again all the candidates look similar. That's just my experience; the author seems pleased with the question so maybe they're getting good signal out of it.
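    For contrast, the single-lock version really is just the single-threaded queue with a mutex around it - a hypothetical C++17 sketch (the names and shape are mine, not the implementation linked above):

```cpp
#include <mutex>
#include <optional>
#include <queue>

// Coarse-grained thread-safe queue: every operation under one mutex.
template <typename T>
class LockedQueue {
  std::mutex mu_;
  std::queue<T> q_;

 public:
  void push(T v) {
    std::lock_guard<std::mutex> g(mu_);
    q_.push(std::move(v));
  }

  // Returns nullopt when empty instead of blocking.
  std::optional<T> pop() {
    std::lock_guard<std::mutex> g(mu_);
    if (q_.empty()) return std::nullopt;
    T v = std::move(q_.front());
    q_.pop();
    return v;
  }
};

// Quick check: FIFO order and empty-pop behavior.
bool demo() {
  LockedQueue<int> q;
  q.push(1);
  q.push(2);
  return q.pop() == 1 && q.pop() == 2 && !q.pop().has_value();
}
```

    Delete the mutex and it's the single-threaded version, which is the point: the lock adds no algorithmic difficulty, and the lock-free version adds far too much.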

  28. I've implemented multiple production versions of this problem (but not in JavaScript)[1], so maybe my view of it is miscalibrated...

    This feels both too easy and too hard for an interview? I would expect almost any new grad to be able to implement this in the language of their choice. Adding delays makes it less trivial, except that the answer is... Just use the function provided by the language. That's the right answer for real code, but what are you really assessing by asking it?

    [1] https://github.com/google/guava/blob/master/guava/src/com/go...

  29. Check out CVE-2017-13156 which is a real exploit that leveraged differences in zip parsing to bypass a signature scheme.
