I've been using this recently in WASM, in particular for the counters feature. It's really great and makes it super easy to track the evolution of your app's memory usage!

In my console, I have something akin to this:

  TRACE client_wasm::plugins::allocation: Memory stats counters=Counters { allocation_count: 165454, total_allocation_count: 18756119, allocated_bytes: 34654828, total_allocated_bytes: 3185258585, available_bytes: 82802636, fragment_count: 5026, heap_count: 1, total_heap_count: 1, claimed_bytes: 118423552, total_claimed_bytes: 118423552 }
I haven't carefully benchmarked dlmalloc (Rust's default WASM allocator, https://github.com/alexcrichton/dlmalloc-rs), but it's nothing special (to my knowledge). The swap to Talc is pretty trivial and it's clear that the author is paying attention to its performance.
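
For reference, the swap is basically just declaring Talc as the global allocator. A sketch along the lines of the crate's README (exact type and method names may differ between versions, and the arena size here is arbitrary):

  use talc::*;

  // Static arena backing the allocator; sized arbitrarily for illustration.
  // Talc can also claim more memory on demand via its OOM handler system.
  static mut ARENA: [u8; 1 << 20] = [0; 1 << 20];

  #[global_allocator]
  static ALLOCATOR: Talck<spin::Mutex<()>, ClaimOnOom> = Talc::new(unsafe {
      ClaimOnOom::new(Span::from_const_array(core::ptr::addr_of!(ARENA)))
  }).lock();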
[Author of talc] Glad this feature is proving useful. Seeing this makes me think I should implement a better-looking Display implementation than the default Debug impl though. Something for the next update ^-^
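
Something along these lines, maybe (a hypothetical sketch using the fields from the log above, not actual code from the crate):

  use core::fmt;

  // Hypothetical hand-rolled Display for the counters shown above.
  impl fmt::Display for Counters {
      fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
          writeln!(f, "allocations: {} live / {} total",
              self.allocation_count, self.total_allocation_count)?;
          writeln!(f, "allocated:   {} B live / {} B total",
              self.allocated_bytes, self.total_allocated_bytes)?;
          writeln!(f, "available:   {} B across {} fragments",
              self.available_bytes, self.fragment_count)?;
          write!(f, "claimed:     {} B over {} heap(s)",
              self.claimed_bytes, self.heap_count)
      }
  }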
That's a pretty big heap size for a wasm bundle! What are you doing in wasm that allocates so much and so often?
34 MB? Doesn't seem that big to me.
I was confused too at first, because I was looking at total_allocated_bytes, but I guess that includes allocations that have since been freed.
Opened an issue [1] to add TLSF to the benchmarks, as it's likely to be faster in a single-threaded environment according to the rlsf crate [2].

[1] https://github.com/SFBdragon/talc/issues/26 [2] https://github.com/yvt/rlsf

Thanks for opening the issue. The allocator looks pretty interesting. Happy to try adding it to the benchmarks, although doing apples-to-apples tests with its limitations might not be possible without some changes.
Some extra context for comparison: Talc is faster than Frusa when there is no contention, but slower when there are concurrent allocations. Both are much slower than Rust's system allocator. Benchmark here: https://crates.io/crates/frusa.
Your results caught me off guard. In particular, the (Linux) system allocator seems suspiciously fast. I think the simplicity of the benchmark (allocating and immediately deallocating) might be causing issues... perhaps unwanted optimizations? I'm not sure.
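
The shape I have in mind is roughly this (a hypothetical sketch, not Frusa's actual benchmark code):

  use std::hint::black_box;

  // Each iteration allocates and immediately frees the same size, so the
  // allocator keeps serving the same block from its hottest fast path, and
  // without black_box the compiler may elide the alloc/dealloc pair entirely.
  fn bench_alloc_dealloc(iters: usize, size: usize) {
      for _ in 0..iters {
          let v: Vec<u8> = Vec::with_capacity(black_box(size));
          black_box(&v);
          // v dropped here: immediate deallocation
      }
  }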

On my random-actions benchmarks (these arguably resemble real allocation patterns somewhat better):

- 1 thread: Talc is faster than Frusa and System, Frusa is comparable to System

- 4 threads: System is fastest, Frusa does about half as well, and Talc about half as well as Frusa

Our benchmarks agree on the Frusa vs Talc comparison.

Benchmarks aside, Frusa seems neat. In particular, I had some misconceptions about how to tackle concurrency in Talc which Frusa's code demonstrates not to be true. I may give writing a concurrent version of Talc another shot soon.

Apologies, the benchmark is fine. The reason the system allocator is faster than I expected is that Linux's slab allocator takes over for especially small allocation sizes, and it's terrifically fast.

I'm changing up my random-actions benchmark to display results over various allocation sizes, as some allocators do much better than others at different sizes. As a heads up, Frusa takes a large hit at higher allocation sizes. Perhaps tuning bucket sizes or something could help? I'll try to have the benchmarks on GitHub this weekend so you can play around with them, if you'd like to investigate.

As a guy who lives in the JVM most days and mostly ignores allocation optimizations: what are some examples of things that are actually new in a project like this? Isn't allocation a mostly solved problem? Does something about WebAssembly or no_std actually require different features?
In a no_std Rust environment there is no default allocator and no heap, so you have to bring your own allocator to use things like `Vec` or `String`.

This is very common in embedded contexts, where you can take nothing for granted.

What's the point of using `no_std` if you're just going to add an allocator anyway? You may as well just use `std` at that point no? (You can use `std` on embedded devices with a small amount of work.)
To get std you need some kind of libc replacement. You don't need that if you just want to use alloc.
Yeah, just adding: in Rust there are three main levels:

1. no_std (like no libc + no malloc in C)

2. no_std + alloc (like no libc + malloc in C)

3. std (like a full libc + malloc in C)

The difference between 1 and 2 is like three lines. The difference between 2 and 3 is a change to the entire standard library. ATM only ESP32 devices support option 3 (they build a standard library implementation on top of FreeRTOS/ESP-IDF).
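
Those "three lines" are roughly the following (a sketch; the allocator is illustrative, e.g. the embedded-alloc crate shown here, or talc itself, and embedded-alloc still needs its arena initialized at startup):

  extern crate alloc; // opt in to Vec, String, Box, etc. under no_std

  // Any GlobalAlloc implementation will do here; embedded_alloc is just
  // one common choice on embedded targets.
  #[global_allocator]
  static HEAP: embedded_alloc::Heap = embedded_alloc::Heap::empty();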

The `core` library is still available on `no_std` and contains a lot of useful stuff, so it’s not exactly like no libc on C. That would be `no_core` which is pretty hardcore (heh). The big things missing in `no_std` are

* file and other I/O, including filesystem ops

* access to system time

* threads

* collections and some other things that require an allocator (not many things actually do in Rust’s stdlib!)

* floating-point functions (the types themselves and builtin operators work fine)

`alloc` gives you `Vec`, `String`, `Box`, `BTreeMap/Set`, ref-counted pointers, and a few `Vec`-derived collections like `VecDeque`. Very annoyingly not `HashMap/Set` though, due to a (literally single-line) dependence on a system entropy source that happens not to be easily factorable out, because reasons.
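
That said, `std`'s `HashMap` is itself a thin wrapper around the hashbrown crate, which does work under `no_std` + `alloc` if you accept a non-random default hasher (sketch below; feature and hasher details vary by hashbrown version):

  extern crate alloc;

  // hashbrown is the implementation behind std's HashMap; its default
  // hasher feature sidesteps the std entropy dependency, at some
  // DoS-resistance cost on no_std.
  use hashbrown::HashMap;

  fn demo() {
      let mut map: HashMap<&str, u32> = HashMap::new();
      map.insert("talc", 1);
      assert_eq!(map.get("talc"), Some(&1));
  }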

You don't. It currently requires `-Z build-std=std,panic_abort` and some nightly features (e.g. `#![feature(restricted_std)]`), but you can build `std` programs for bare-metal targets. I can't remember exactly what it does if you try to open files or start threads (probably panics?), but you can compile and run it. If you don't do any of those things, it works fine.

Currently the `sys` crate implementation is hard-coded into the compiler, but eventually you will be able to provide it without modifying the compiler, so you can e.g. target an RTOS.

It looks like that work started really recently actually:

https://github.com/rust-lang/rust/commit/99128b7e45f8b95d962...

Maybe you're implementing your own "std".
Small binary size (and thus a simpler implementation) is among the main things you want for wasm and embedded targets. For wasm this is because it's being sent over the network; for embedded, because the device may have very little RAM and storage available.

It's also the case that such targets often can't take advantage of the advanced features (like multithreading optimisations) that "full fat" allocators provide.

An allocator and deallocator go hand in hand. For the JVM, most of the GCs are compacting, so they can use bump allocators; in that case, yes, it is a solved problem. However, it depends on the allocator/deallocator pair, and traditional malloc/free implementations involve many trade-offs, so allocation work continues.
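
To make the bump-allocator point concrete: allocation under a compacting GC is essentially a pointer increment (minimal single-threaded sketch; freeing is a no-op because the GC reclaims the whole region by compacting):

  // Minimal bump allocator sketch: why allocation under a compacting GC
  // is nearly free. There is no per-object free; the region is reclaimed
  // (or compacted) all at once.
  struct Bump {
      next: usize, // current bump pointer (as an address)
      end: usize,  // end of the region
  }

  impl Bump {
      // `align` must be a nonzero power of two.
      fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
          let start = self.next.checked_add(align - 1)? & !(align - 1); // align up
          let new_next = start.checked_add(size)?;
          if new_next > self.end {
              return None; // out of space: time to collect/compact
          }
          self.next = new_next;
          Some(start) // address of the newly allocated object
      }
  }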
It's only a solved problem if you can live with the downsides of automatic memory management. For instance: do you know exactly how much time your garbage collector spends on memory management and when that time is spent, where objects are located in memory relative to each other, and how much time is wasted on cache misses when accessing those objects? If those questions aren't important, then automatic memory management is good enough. For other applications they may matter, and automatic memory management is then usually harder to optimise than coming up with specialised manual allocation strategies.
I also live with GC (mostly .NET and Go) but recently took a look at Zig.

Zig treating allocators as totally opaque interfaces is interesting, and you get to learn a lot if you dig deeper. Tons of different designs: allocators on top of allocators, tiny ones for small bundles, non-deallocating ones for one-shot apps. Stuff you never care about in a GC environment.

I suggest checking it out if it interests you.

I’d say it’s solved if you want a garbage collector. If you don’t, there’s plenty of room for innovation.
Why isn't it benchmarked against the vanilla glibc allocator? Is it similar enough to dlmalloc that the differences don't matter?
Because the glibc allocator is designed for hosted systems with threading (it uses pthreads) and memory-management facilities not found on bare metal or other smaller platforms. You shouldn't be using Talc where MiMalloc, Jemalloc, the glibc allocator, etc. would be used instead, besides some very particular situations. (Correct me if I'm wrong.)

I could add these benchmarks. They were there at one point in the past, but it's a disingenuous comparison unless the reader understands the particulars of the workload and of the trade-offs each allocator makes. Talc will probably beat these allocators in single-threaded allocation, but it will suffer under heavily multithreaded loads and does not currently have the system integration to release unused blocks of memory back to the OS (this can be achieved, to a degree, via the OOM handler system, but I haven't implemented anything like that yet), nor will it make syscalls like mmap/sbrk at all.

There is the case where you'd want a faster single-threaded allocation pool within a larger application, though, which is an argument for using Talc even when you have access to the system allocator or mimalloc/jemalloc. Perhaps I'll set up something for that.
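
For the curious, the OOM handler system mentioned above is a trait you implement to try to recover memory when an allocation fails. Very roughly (a hypothetical sketch from memory, not the crate's exact API; names and signatures may differ between versions):

  use core::alloc::Layout;
  use talc::{OomHandler, Span, Talc};

  // Hypothetical: hand the allocator a spare static arena on OOM. A real
  // handler must avoid claiming the same span twice, and might instead
  // grow WASM memory or mmap a new region.
  struct ClaimSpareArena;

  static mut SPARE: [u8; 1 << 20] = [0; 1 << 20];

  impl OomHandler for ClaimSpareArena {
      fn handle_oom(talc: &mut Talc<Self>, _layout: Layout) -> Result<(), ()> {
          // claim() gives the allocator a new span to manage; returning
          // Ok(()) tells talc to retry the failed allocation.
          unsafe { talc.claim(Span::from_const_array(core::ptr::addr_of!(SPARE))) }
              .map(|_| ())
      }
  }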

Pardon my ignorance, but is this something that a language with a GC could potentially use to run in a WebAssembly environment?
A GC language would be better served by Wasm-GC.
