Comment by wbl - Hacker Neue

wbl Oct 20, 2025 parent

Dynamic types have classically used the lower bits freed by alignment constraints. If I know a cons cell is 16 bytes then I can use the low 4 bits of an address to store enough type info to disambiguate.
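
A minimal sketch of that low-bit tagging in C. The tag names and the `value` typedef are hypothetical, chosen only for illustration; the one assumption taken from the comment is that every tagged object is 16-byte aligned, so the low 4 bits of its address are always zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values; a real runtime would pick its own set. */
enum tag { TAG_CONS = 0x1, TAG_SYMBOL = 0x2, TAG_STRING = 0x3 };

/* 16-byte alignment leaves the low 4 bits of an address free. */
#define TAG_MASK ((uintptr_t)0xF)

typedef uintptr_t value;

/* Pack an aligned pointer and a 4-bit tag into one machine word. */
static inline value tag_pointer(void *p, enum tag t) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0); /* the alignment is what frees the bits */
    return bits | (uintptr_t)t;
}

/* Read the tag back out of the low bits. */
static inline enum tag value_tag(value v) {
    return (enum tag)(v & TAG_MASK);
}

/* Mask the tag off to recover the original pointer. */
static inline void *value_pointer(value v) {
    return (void *)(v & ~TAG_MASK);
}
```

Checking the tag is a single mask and compare, and recovering the pointer is a single mask, which is a large part of why this scheme has been popular.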

afishhh Oct 20, 2025

There's a technique known as "NaN boxing" which exploits the fact that double-precision floats let you store almost 52 bits of extra data in what would otherwise be NaNs.

If you assume the top 16 bits of a pointer are unused[1], you can fit a pointer in there. This lets you store a pointer or a full double by-value (and still have tag bits left for other types!).

Last I checked LuaJIT and WebKit both still used this to represent their values.

[1] On amd64 they actually need to be sort of "sign extended" so require some fixup once extracted.
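
A rough sketch of NaN boxing in C, under some loudly stated assumptions: a 64-bit target, user-space pointers whose top 16 bits are zero (so no sign-extension fixup is needed), and a hypothetical tag pattern `0xFFFC` in the top 16 bits to mark a boxed pointer. Real engines such as LuaJIT and WebKit use their own encodings; this is not a reproduction of either.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t value;

/* Assumed layout: top 16 bits 0xFFFC mark a boxed pointer. */
#define BOX_TAG      UINT64_C(0xFFFC000000000000)
#define BOX_MASK     UINT64_C(0xFFFF000000000000)
#define PAYLOAD_MASK UINT64_C(0x0000FFFFFFFFFFFF)
/* Every real NaN is collapsed to this one pattern before storing. */
#define CANON_NAN    UINT64_C(0x7FF8000000000000)

/* Store a double by value; NaNs are canonicalized so that no
   arithmetic result can be mistaken for a boxed pointer. */
static inline value box_double(double d) {
    value v;
    if (d != d) return CANON_NAN;
    memcpy(&v, &d, sizeof v);
    return v;
}

static inline double unbox_double(value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Store a pointer in the 48-bit payload of a NaN. */
static inline value box_pointer(void *p) {
    uint64_t bits = (uint64_t)(uintptr_t)p;
    assert((bits & BOX_MASK) == 0); /* assumption: top 16 bits unused */
    return BOX_TAG | bits;
}

static inline int is_pointer(value v) {
    return (v & BOX_MASK) == BOX_TAG;
}

static inline void *unbox_pointer(value v) {
    return (void *)(uintptr_t)(v & PAYLOAD_MASK);
}
```

Anything that is not tagged as a pointer is read as a plain double, and other top-16-bit patterns inside the NaN space remain available as tags for further types.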

sparkie Oct 21, 2025

> On amd64 they actually need to be sort of "sign extended" so require some fixup once extracted.

Pointers need to be canonical if LAM/UAI is not enabled. The simplest way to do it is to shift left by 16, then shift arithmetic right by 16. (Or 7 if using 5-level paging). Alternatively, you can store the pointer shifted left by 16 bits, and have the tag in the lower 16 bits, then canonicalizing the pointer is just a single shift-arithmetic-right. If combining with NaN-boxing, then you rotate right to recover the double. (Demo: https://godbolt.org/z/MvvPcq9Ej). This is actually more efficient than messing with the high bits directly.

With LAM/UAI, the requirement is that bit 63 matches bit 47 (or bit 56), which gives 15 bits of tag space on LAM48 and 6 bits of tag space on LAM57.

With LAM enabled, care needs to be taken when doing any pointer comparison, as two pointers which point to the same address may not be equal. There have been multiple exploits with LAM, including speculative execution exploits.
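
The two shift-based layouts described in this comment can be sketched in C as follows. The function names are made up for illustration, and this is not the code from the godbolt link. It assumes 4-level paging (48-bit addresses) and a compiler where right-shifting a negative signed integer is an arithmetic shift, which is implementation-defined in C but holds on GCC, Clang, and MSVC.

```c
#include <assert.h>
#include <stdint.h>

/* Layout A: tag in the high 16 bits. Shift left to drop the tag,
   then arithmetic-shift right to sign-extend from bit 47. */
static inline uint64_t canonicalize48(uint64_t tagged) {
    return (uint64_t)((int64_t)(tagged << 16) >> 16);
}

/* Layout B: pointer stored shifted left by 16, tag in the low 16 bits. */
static inline uint64_t pack_shifted(uint64_t addr, uint16_t tag) {
    return (addr << 16) | tag;
}

/* With layout B, one arithmetic shift recovers a canonical address. */
static inline uint64_t unpack_addr(uint64_t packed) {
    return (uint64_t)((int64_t)packed >> 16);
}

static inline uint16_t unpack_tag(uint64_t packed) {
    return (uint16_t)packed;
}

/* Tagged pointers must be canonicalized before comparison: two words
   with different tags can still refer to the same address. */
static inline int same_address(uint64_t a, uint64_t b) {
    return canonicalize48(a) == canonicalize48(b);
}
```

For example, `canonicalize48(0xABCD7FFFDEADBEEF)` yields `0x00007FFFDEADBEEF`, and a word with bit 47 set comes back with its upper 16 bits all ones.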

sparkie Oct 21, 2025

Apologies, there's a mistake in the godbolt link above. `SIGN_BIT` should be `0x8000` and not `0x1000`.

jandrewrogers Oct 21, 2025

If you restrict yourself to all variants of x86 and ARM, the number of high bits for which I could not find conflicting uses is 6 bits (bits 57-62). The other high bits are reserved in some hardware contexts and therefore may create conflicts.

Using 16 bits may be risky on recent x86. For example, IIRC Linux enables 5-level page tables on microarchitectures that support it, which can put valid address data in bits 48-56.

There is no guarantee that those 6 bits are safe either. They are just the only bits for which I could not find existing or roadmap usage across x86 and ARM sources when I last did a search.

sparkie Oct 21, 2025

> Using 16 bits may be risky on recent x86. For example, IIRC Linux enables 5-level page tables on microarchitectures that support it, which can put valid address data in bits 48-56.

Linux will not allocate past the 47-bit range, even with 5-level paging enabled, unless specifically requested, by providing a pointer hint to `mmap` with a higher address.
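
As a hedged illustration of that opt-in, the sketch below contrasts an ordinary `mmap` call with one that passes a hint above the 47-bit range. The helper names are hypothetical and the hint value `1 << 48` is an arbitrary choice; it assumes a 64-bit Linux target. On a machine without 5-level paging the high hint should simply fall back to an ordinary mapping.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Ordinary mapping: no hint, so the kernel stays in its default window. */
static inline void *map_default(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Opt-in mapping: a hint above the 47-bit range tells the kernel that
   this caller can cope with addresses from the larger address space. */
static inline void *map_with_high_hint(size_t len) {
    void *hint = (void *)(uintptr_t)(UINT64_C(1) << 48);
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* True if the address leaves the top 17 bits clear. */
static inline int fits_in_47_bits(const void *p) {
    return ((uintptr_t)p >> 47) == 0;
}
```

A runtime that stores tags in the upper 16 bits would use only the first form and could check `fits_in_47_bits` on what it gets back as a sanity test on x86-64.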

https://www.kernel.org/doc/html/v5.14/x86/x86_64/5level-pagi...

jandrewrogers Oct 21, 2025

Ah, thanks for the detail! I was unaware that this was how it worked.

sparkie Oct 21, 2025

There are numerous techniques in use. Many are covered in Gudeman's 1993 paper "Representing Type Information in Dynamically Typed Languages"[1], which includes low-bits tagging, high-bits tagging, and NaN-boxing.

The high bits let us tag more types, and can be used in conjunction with low bits tagging. Eg, we might use the low bits for GC marking.

[1]:https://web.archive.org/web/20170705085007/ftp://ftp.cs.indi...

monocasa Oct 20, 2025

Depends on the architecture. Top-bit usage lets you do what the hardware thinks of as an 'is negative' check very cheaply on a lot of archs, for instance.
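
A small sketch of that idea in C, with hypothetical helper names. It assumes a 64-bit word whose top bit is otherwise unused, so the tag test becomes a signed comparison against zero, which compilers can usually lower to a plain sign test.

```c
#include <assert.h>
#include <stdint.h>

#define TOP_TAG (UINT64_C(1) << 63)

/* Set the top bit to mark the word as tagged. */
static inline uint64_t mark_top(uint64_t word) {
    return word | TOP_TAG;
}

/* The tag check is just "is this negative when read as signed?". */
static inline int has_top_tag(uint64_t word) {
    return (int64_t)word < 0;
}

/* Clear the top bit to recover the untagged word. */
static inline uint64_t clear_top(uint64_t word) {
    return word & ~TOP_TAG;
}
```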

themafia Oct 20, 2025

Is it a guarantee that a 16 byte object would be 16 byte aligned?

vidarh Oct 20, 2025

Not in general, but a runtime where all allocations are 16-byte cons cells can quite trivially choose to make that guarantee.

ksherlock Oct 20, 2025

For memory allocation, POSIX (posix_memalign) has been guaranteeing alignment since 2001. C11 added equivalent functionality (aligned_alloc). C++17 incorporated it (std::aligned_alloc) as well.
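
For illustration, here is how a 16-byte cell could be requested with each of the two C-level interfaces mentioned above. The `struct cons` layout and helper names are assumptions made for the example; the struct is 16 bytes only on targets with 8-byte pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cons cell: two pointers, 16 bytes on typical 64-bit targets. */
struct cons {
    void *car;
    void *cdr;
};

/* C11 aligned_alloc: alignment first, then a size that is a multiple of it. */
static inline struct cons *alloc_cons(void) {
    return aligned_alloc(16, 16);
}

/* POSIX posix_memalign: returns 0 on success and stores the pointer. */
static inline struct cons *alloc_cons_posix(void) {
    void *p = NULL;
    if (posix_memalign(&p, 16, sizeof(struct cons)) != 0)
        return NULL;
    return p;
}

/* True if the low 4 bits of the address are zero. */
static inline int is_aligned16(const void *p) {
    return ((uintptr_t)p & 0xF) == 0;
}
```

Memory from either call is released with plain `free`.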

fweimer Oct 21, 2025

More importantly, C++17 no longer ignores alignment in dynamic memory allocation: https://en.cppreference.com/w/cpp/memory/new/operator_new

C++11 already had alignas, but it was not really integrated well.

bluGill Oct 20, 2025

If you implement malloc you can do that. The OS generally gives you 4 KiB (or some other amount in that range) at a time and malloc subdivides it.

Language runtimes can call malloc whatever they want.

secondcoming Oct 20, 2025

In C++ you can force that with alignas(), I would imagine other low level languages offer something similar.

If you're using a custom allocator you'd have to enforce it yourself, which should be fine since you have full control.

https://en.cppreference.com/w/cpp/language/alignas.html

sparkie Oct 21, 2025

C23 also has `alignas` and `alignof` (`_Alignas`/`_Alignof` in C11, with the lowercase spellings as macros in `stdalign.h`), and also provides `aligned_alloc` and `free_aligned_sized` in `stdlib.h`.
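
A short sketch of the C spelling, written so that it compiles as C11 (via the `stdalign.h` macros) as well as C23. The `struct cell` type is hypothetical. Note that this only covers objects with static or automatic storage; heap objects still need an allocator that honors the alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical cell type; aligning the first member aligns the struct. */
struct cell {
    alignas(16) void *car;
    void *cdr;
};

/* Compile-time checks that the type really carries the alignment. */
static_assert(alignof(struct cell) == 16, "cell must be 16-byte aligned");
static_assert(sizeof(struct cell) % 16 == 0, "size is padded to the alignment");

/* True if the low 4 bits of the cell's address are free for tagging. */
static inline int low_bits_free(const struct cell *c) {
    return ((uintptr_t)c & 0xF) == 0;
}
```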

fweimer Oct 21, 2025

Dynamic languages usually come with their own memory manager. They can come up with their own alignment constraints. That being said, most contemporary (Linux) architectures require that malloc returns 16-byte-aligned pointers. Some mallocs only promise this for allocations larger than 8 bytes, though (and I think the C standard was updated to permit that).

ComputerGuru Oct 20, 2025

No. It depends on the object.
