- Yes. I did the exact same thing a few weeks ago (in Java, but the idea is to port it to my own programming language). I wrote a terminal UI. Lots of small challenges, and many things to learn: minimax, bitboards, immutable vs mutable data structures.
- There is a stable in-place merge sort [1], which runs in O(n*log(n)^2) and is not that complex or hard to implement (about 80 lines, and that includes the ~15 lines of binarySearch, which you might need anyway).
[1] https://github.com/thomasmueller/bau-lang/blob/main/src/test...
- There is a stable in-place merge sort; it runs in O(n*log(n)^2) and is about 3 times more complex than shell sort. I implemented it here https://github.com/thomasmueller/bau-lang/blob/main/src/test... (most sort algos you mentioned above are in the same directory, btw).
You didn't mention heap sort. A simple implementation that, like shell sort, doesn't make any method calls (it is also next to the merge sort above) is about twice as complex as shell sort.
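For illustration, here is a minimal Java sketch of the rotation-based in-place merge idea (sometimes called "symmerge"); this is a simplified version for int arrays, not the linked implementation: sort both halves, then merge in place by picking the median of the larger run, binary-searching the matching split point in the other run, rotating the middle blocks together, and recursing on the two parts.

    import java.util.Arrays;

    public class InPlaceMergeSort {

        // Sort a[from..to); merges are done in place, no extra buffer.
        static void sort(int[] a, int from, int to) {
            if (to - from < 2) {
                return;
            }
            int mid = (from + to) >>> 1;
            sort(a, from, mid);
            sort(a, mid, to);
            merge(a, from, mid, to);
        }

        // Merge the sorted runs a[from..mid) and a[mid..to) in place.
        // Each merge costs up to O(n*log(n)), giving O(n*log(n)^2) overall.
        static void merge(int[] a, int from, int mid, int to) {
            if (from >= mid || mid >= to) {
                return;
            }
            if (to - from == 2) {
                if (a[from] > a[from + 1]) {
                    int t = a[from]; a[from] = a[from + 1]; a[from + 1] = t;
                }
                return;
            }
            int cut1, cut2;
            if (mid - from > to - mid) {
                // median of the left run; find where it belongs in the right run
                cut1 = from + (mid - from) / 2;
                cut2 = lowerBound(a, mid, to, a[cut1]);
            } else {
                // median of the right run; find where it belongs in the left run
                cut2 = mid + (to - mid) / 2;
                cut1 = upperBound(a, from, mid, a[cut2]);
            }
            // bring the two middle blocks into their final relative order
            rotate(a, cut1, mid, cut2);
            int newMid = cut1 + (cut2 - mid);
            merge(a, from, cut1, newMid);
            merge(a, newMid, cut2, to);
        }

        // First index in a[from..to) whose value is >= key.
        static int lowerBound(int[] a, int from, int to, int key) {
            while (from < to) {
                int m = (from + to) >>> 1;
                if (a[m] < key) from = m + 1; else to = m;
            }
            return from;
        }

        // First index in a[from..to) whose value is > key.
        static int upperBound(int[] a, int from, int to, int key) {
            while (from < to) {
                int m = (from + to) >>> 1;
                if (a[m] <= key) from = m + 1; else to = m;
            }
            return from;
        }

        // Rotate so that a[mid..to) comes before a[from..mid), via three reversals.
        static void rotate(int[] a, int from, int mid, int to) {
            reverse(a, from, mid);
            reverse(a, mid, to);
            reverse(a, from, to);
        }

        static void reverse(int[] a, int from, int to) {
            for (int i = from, j = to - 1; i < j; i++, j--) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }

        public static void main(String[] args) {
            int[] a = {5, 3, 8, 3, 1, 9, 2};
            sort(a, 0, a.length);
            System.out.println(Arrays.toString(a)); // [1, 2, 3, 3, 5, 8, 9]
        }
    }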
- Yes, I know Java and the challenges with exceptions there (checked vs unchecked exceptions, errors). But at least (arguably) in Java, a method declares (for checked exceptions) what exception classes it can throw. I personally do not think wrapping exceptions in other exception types, in Java, is a major problem. In Swift, you just have "throws" without _any_ type. And so the caller has to be prepared for everything: a later version of the library might suddenly throw a new type of exception.
One could argue Rust is slightly better than Java, because in Rust there are no unchecked exceptions. However, in Rust there is panic, which is in a way like an unchecked exception, and which you can also catch (with panic unwinding). But at least in Rust, regular error handling (via Result) is fast.
- I saw that in Swift, a method can declare that it throws an exception, but it doesn't (can't) declare the exception _type_. I'm not a regular user of Swift (I usually use Java - I'm not sure what other languages you are familiar with), but just thinking about it: isn't it strange that you don't know the exception type? Isn't this kind of like an untyped language, where you have to read the documentation on what a method can return? Isn't this a source of errors itself, in practice?
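To make the contrast concrete, a small Java sketch (the Store/Client names are made up for illustration): the checked exception type is part of the method signature, so the compiler tells every caller exactly which type to handle.

    import java.io.IOException;

    // The checked exception is part of the method's contract:
    interface Store {
        byte[] read(String key) throws IOException; // callers see the exact type
    }

    class Client {
        static byte[] readOrEmpty(Store store, String key) {
            try {
                return store.read(key);
            } catch (IOException e) { // the compiler enforces handling this type
                return new byte[0];
            }
        }
    }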
- I agree security issues are often hyped nowadays. I think this is often due to two factors: (A) Security researchers get more money if they can convince people a CVE is worse than it is. So of course they make it sound extremely bad. (B) Security "review" teams in software companies do the least amount of work, and so it's just a binary "is a dependency with a vulnerability used, yes/no", and then they force the engineering team to update the dependency, even though it's useless. I have seen (and was involved in) a number of such cases. This wastes a lot of time. Long term, this can mean the engineering team will try to reduce the dependencies, which is not the worst of outcomes.
- > then the sort order there is indeterminate
Well, each programming language has a "sort" method that sorts arrays. Should this method throw an exception in case of NaN? I think the NaN rules were the wrong decision. Because of these rules, everywhere there are floating point numbers, the libraries have to have special code for NaN, even if they don't care about NaN. Otherwise there might be ugly bugs, like sorting running into endless loops, data loss, etc. But well, it can't be changed now.
The best description of the decision is probably [1], where Stephen Canon (a former member of the IEEE-754 committee, if I understand correctly) explains the reasoning.
[1] https://stackoverflow.com/questions/1565164/what-is-the-rati...
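A small Java example of this special-case handling: raw < / > comparisons do not form a total order once NaN is involved, so Arrays.sort(double[]) is specified to use the total order of Double.compareTo instead (NaN after all other values, -0.0 before 0.0).

    import java.util.Arrays;

    public class NanSort {
        public static void main(String[] args) {
            double[] a = {3.0, Double.NaN, 1.0, 2.0};
            // For NaN, the comparisons x < NaN, x > NaN and x == NaN are all
            // false, so a sort based on raw comparisons has no consistent order.
            // Arrays.sort uses Double.compareTo's total order instead:
            Arrays.sort(a);
            System.out.println(Arrays.toString(a)); // [1.0, 2.0, 3.0, NaN]
        }
    }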
- Well, floating point operations never throw an exception, which I kind of like, personally. I would rather go in the opposite direction and change integer division by zero to return MAX / MIN / 0.
But NaN could be defined to be smaller or larger than any other value.
Well, there are multiple NaN values. And NaN isn't actually the only weirdness: there's also -0, and we have -0 == 0. Floating point equality is weird anyway, so why not just define -0 < 0?
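A short Java example showing both quirks: == treats -0.0 and 0.0 as equal (and NaN as unequal to itself), while the total order used for sorting does exactly the opposite.

    public class FloatOrder {
        public static void main(String[] args) {
            System.out.println(-0.0 == 0.0);                   // true
            System.out.println(Double.compare(-0.0, 0.0) < 0); // true: -0.0 sorts first
            System.out.println(Double.NaN == Double.NaN);      // false
            System.out.println(Double.compare(Double.NaN, Double.NaN)); // 0: equal in the total order
        }
    }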
- Well, I didn't mean GC is the reason for Fil-C being slower. I mean the performance drop of Fil-C (as described in the article) limits the usage, and the GC (independently) limits the usage.
I understand raw speed (of the main thread) of Fil-C can be faster with tracing GC than without. But I think there's a limit on how fast and memory efficient Fil-C can get, given it necessarily has to do a lot of things at runtime, versus compile time. Energy usage and memory usage of a programming language that uses a tracing GC are higher than those of one without; at least, if memory management logic can be done at compile time.
For Fil-C, a lot of the memory management logic, and checks, necessarily need to happen at runtime. Unless the code is annotated somehow, but then it wouldn't be pure C any longer.
- I find floating point NaN != NaN quite annoying. But this is not related to Rust: it affects all programming languages that support floating point. All libraries that want to support ordering for floating point need to handle this special case, that is, all sort algorithms, hash table implementations, etc. Maybe it would cause fewer issues if NaN didn't exist, or if NaN == NaN were true. At least, it would be much easier to understand and more consistent with other types.
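Java's hash tables are one example of this special-case code: Double.equals() deliberately deviates from == so that NaN can be used as a key at all.

    import java.util.HashMap;
    import java.util.Map;

    public class NanKey {
        public static void main(String[] args) {
            Map<Double, String> map = new HashMap<>();
            map.put(Double.NaN, "found");
            // == says NaN != NaN, but equals()/hashCode() treat NaN as equal
            // to itself, otherwise this entry could never be looked up again:
            System.out.println(map.get(Double.NaN)); // found
        }
    }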
- Sure, it's not a big performance drain. For the vast majority of software, it is fine. Usually, the ability to write programs more quickly in eg. Java (not having to care about memory management) outweighs the possible gain of Rust, which can reduce memory usage and total energy usage (because no background threads are needed for GC). I also write most software in Java. Right now, the ergonomic cost of languages that don't require a tracing GC is just too high. But I don't think this is a law of nature; it's just that there are no better languages yet that don't require a tracing GC. The closest is probably Swift, from a memory / energy usage perspective, but it has other issues.
- I'm not sure what you mean. Do you mean there is a bug _in the garbage collection algorithm_ if the object is not freed in the very next garbage collection cycle? Well, it depends: the garbage collector could defer collection of some objects until memory is low. Multi-generational garbage collection algorithms often do this.
- You are right, languages with tracing GC are fast. Often, they are faster than C or Rust, if you measure peak performance of a micro-benchmark that does a lot of memory management. But that is only true if you just measure the speed of the main thread :-) Tracing garbage collection does most of the work in separate threads, and so it is often not visible in benchmarks. Memory usage is also not easily visible, but languages with tracing GC need about twice the amount of memory compared to eg. C or Rust. (When using an arena allocator in C, you can get even faster, at the cost of memory usage.)
Yes, Python is especially slow, but I think that's probably more because it's dynamically typed and not compiled. I found PyPy is quite fast.
- I agree transpiling to C will not result in the fastest code (and of course not the fastest toolchain), but having the ability to convert to C does help in some cases. Besides the ability to support some more obscure targets, I found it useful for building a language, for unit tests [1]. One of the targets, in my case, is the XCC C compiler, which can run in WASM and convert to WASM, and so I built the playground for my language using that.
> transpiling to C (even Go and Lua)
Go: I'm sorry, I thought TinyGo internally converts to C, but it turns out that's not true (any more?). That leaves https://github.com/opd-ai/go2c which uses TinyGo and then converts the LLVM IR to C. So, I'm mistaken, sorry.
Lua: One is https://github.com/davidm/lua2c but I thought eLua also converts to C.
> You're only forced to use it when you're storing references within a struct.
Well, that happens quite often, in my view.
> Not sure when the last time was you tried to write rust code.
I'm not a regular user, that's true [2]. But I do have some knowledge of quite a few languages now [3], and so I think I have a reasonable understanding of the advantages and disadvantages of Rust as well.
> Any language targeting the performance envelope rust does needs GC to be opt in.
Yes, I fully agree. I just think that Rust has the wrong default: it uses single ownership / borrowing by _default_, and Rc/Arc is more like an exception. I think most programs could use Rc/Arc by default, and only use ownership / borrowing where performance is critical.
> The main disadvantage of Rust, in my view, is that it's verbose.
>> Sure, but compared to what?
Compared to most languages, actually [4]. Rust is similar to Java and Zig in this regard. Sure, we can argue that the use case of Rust is different from eg. Python's.
[1] https://github.com/thomasmueller/bau-lang [2] https://github.com/thomasmueller/lz4_simple [3] https://github.com/thomasmueller/bau-lang/tree/main/src/test... [4] https://github.com/thomasmueller/bau-lang/blob/main/doc/conc...
- Yes. I do like Swift as a language. The main disadvantages of Swift, in my view, are: (A) The lack of an (optional) "ownership" model for memory management, so you _have_ to use reference counting everywhere. That limits the performance. This is measurable: I converted some micro-benchmarks to various languages, and Swift does suffer in the memory management intensive tasks [1]. (B) Swift is too Apple-centric currently. Sure, this might become a non-issue over time.
[1] https://github.com/thomasmueller/bau-lang/blob/main/doc/perf...
- Yes, Fil-C uses some kind of garbage collector. But it can still detect use-after-free: in the 'free' call, the object is marked as freed. During garbage collection (in the mark phase), if a reference to a freed object is detected, the program panics. Sure, it is also possible to simply ignore the 'free' call - in which case you "just" have a memory leak. I don't think that's what Fil-C does by default, however. (That would be more like the behavior of the Boehm GC library for C, if I understand correctly.)
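A toy Java sketch of the scheme as described above (this is only an illustration of the idea, not Fil-C's actual implementation): free() merely marks the object, and the mark phase panics when it finds a reachable reference to a freed object.

    import java.util.*;

    // Toy model: free() only marks; the GC mark phase detects dangling references.
    class ToyGc {
        static class Obj {
            boolean freed;
            List<Obj> refs = new ArrayList<>();
        }

        static void free(Obj o) {
            o.freed = true; // memory is not reused yet, only marked as freed
        }

        static void mark(Collection<Obj> roots) {
            Deque<Obj> stack = new ArrayDeque<>(roots);
            Set<Obj> seen = new HashSet<>();
            while (!stack.isEmpty()) {
                Obj o = stack.pop();
                if (!seen.add(o)) continue; // already marked
                for (Obj r : o.refs) {
                    if (r.freed) {
                        // a live object still references freed memory: panic
                        throw new IllegalStateException("use-after-free detected");
                    }
                    stack.push(r);
                }
            }
        }
    }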
- I don't think it's an odd statement. It's not about segfaults, but about use-after-free (and similar) bugs, which don't crash in C, but do crash in Fil-C. With Fil-C, if there is such a bug, it will crash, but if the density of such bugs is low enough, that is tolerable: it will just crash the program, but will not cause an expensive and urgent CVE ticket. The bug itself may still need to be fixed.
The paragraph refers to detecting such bugs during compilation versus crashing at runtime. The "almost all programs have paths that crash" means all programs have a few bugs that can cause crashes, and that's true. Professional coders do not attempt to write 100% bug-free code, as that wouldn't be an efficient use of their time. Now the question is: should professional coders convert the (existing) C code to eg. Rust (where the compiler likely detects the bug), or should they use Fil-C, and so save the time of converting the code?
- Well "transpiling to C" does include GCC and clang, right? Sure, trying to support _all_ C compilers is nearly impossible, and not what I mean. Quite many languages support transpiling to C (even Go and Lua), but in my view that alone is not sufficient for a C replacement in places like the Linux kernel: for this to work, tracing GC can not be used. And this is what prevents Fil-C and many other languages to be used in that area.
Rust borrow checker: the problem I see is not so much that it's hard to learn, but requires constant effort. In Rust, you are basically forced to use it, even if the code is not performance critical. Sure, Rust also supports reference counting GC, but that is more _verbose_ to use... It should be _simpler_ to use in my view, similar to Python. The main disadvantage of Rust, in my view, is that it's verbose. (Also, there is a tendency to add too many features, similar to C++, but that's a secondary concern).
- I agree. Nim is memory safe, concise, and fast. In my view, Nim lacks a very clear memory management strategy: it supports ARC, ORC, manual (unsafe) allocation, and move semantics. Maybe supporting fewer options would be better? Usually, adding things that are lacking is easier than removing features, especially if the community is small and you don't want to alienate too many people.
- Yes, safety has become more important, and it's great to support old C code in a safe way. The performance drop and especially the GC of Fil-C do limit the usage, however. I read there are some ideas for Fil-C without GC; I would love to hear more about that!
But all existing programming languages seem to have some disadvantage: C is fast but unsafe. Fil-C is C compatible but requires GC, needs more memory, and is slower. Rust is fast and uses little memory, but is verbose and hard to use (borrow checker). Python, Java, C#, etc. are easy to use and concise, but, like Fil-C, require a tracing GC and so more memory, and are slow.
I think the 'perfect' language would be as concise as Python, statically typed, not require a tracing GC (use reference counting, like Swift), and support some kind of borrow checker like Rust (for the most performance critical sections). And it would leverage the C ecosystem, by transpiling to C, and so would run on almost all existing hardware, and could even be used in the kernel.
- PyPy is great for performance. I'm writing my own programming language (it transpiles to C), and for this purpose I converted a few benchmarks to some popular languages (C, Java, Rust, Swift, Python, Go, Nim, Zig, V). Most languages have similar performance, except for Python, which is about 50 times slower [1]. But with PyPy, performance is much better. I don't know the limitations of PyPy, because these algorithms are very simple.
But even though Python is very slow, it is still very popular. So the language itself must be very good, in my view, otherwise fewer people would use it.
[1] https://github.com/thomasmueller/bau-lang/blob/main/doc/perf...
- Interesting, I was not aware of OxCaml. (If this is what you mean.) It does seem to tick a few boxes, actually. For my taste, the syntax is not as concise / clean as Python's, and for me it is "too functional". It shares tracing garbage collection with Python (and many high-level languages), but maybe that is the price to pay for an easy-to-use, memory safe language.
In my view, a "better" language would be a simple language as concise as Python, but fully typed (via type inference); memory safe, but without the need for a tracing GC. I think memory management should be a mix of Swift and Rust (that is, a mix of reference counting and single ownership with borrowing, where speed is needed).
- > the language most of the team know the best
I fully agree. The challenge is, some will want to use the latest languages and technologies because they want to learn them (personal development, meaning: the next job). Sometimes the "new thing" can be limited to (non-critical) testing and utilities. But having many languages and technologies just increases the friction, complicates things, and prevents refactoring. Even mixing just scripts with regular languages is a problem; calling one language from another is similar. The same with unnecessary remote APIs. Fewer technologies are often better, even if those technologies are not the best (eg. using PostgreSQL for features like fulltext search, event processing, etc.).
This is a bit related to external dependencies vs building it yourself (AKA reinventing the wheel). Quite often the external library, long term, causes more issues than building it yourself (assuming you _can_ build a competent implementation).
- The author of Fil-C does have some ideas to avoid a garbage collector [1]; in summary: use-after-free at worst means you might see an object of the same size, but you cannot corrupt data structures (no pointer / integer confusion). This would be more secure than standard C, but less secure than Fil-C with GC.