MyObject can't be null. MyObject? can be null. Handling nullability as a special thing might help with the billion-dollar mistake without generating pressure to have a fully fleshed out ADT solution and everything downstream of that.
To people who would dismiss ADTs as a hard problem in terms of ergonomics: Rust makes it less miserable thanks to things like the question-mark shorthand and a bazillion trait methods. Languages like Haskell solve it with monads + do-notation + operator overloading galore. Languages like Scala _don't_ solve it for Result/Option in any fun way and are thus miserable on this point, IMHO.
Nullability is a good retrofit, like Java's type-erased generics, or the DSL (digital subscriber line) technology that crams a reasonable short-distance network protocol onto the existing copper telephone lines. But in the same way that you probably wouldn't start with type-erased generics, or build a new city with copper telephone cables, nullability isn't worth it for a new language IMO.
- `Option<T>` and `Result<T,E>` at the core;
- `?T` and `T!E` as type declaration syntax that desugars to them;
- and `.?` and `.!` operators so chains like `foo()?.bar()!.baz()` can be written and all the relevant possible return branches are inserted without a fuss.
Having `Option` and `Result` be simply normal types (and not special-casing "nullable") has benefits that are... obvious, I'd say. They're just _normal_. Not being special cases is great. Then, having syntactic sugar to make the very, _very_ common cases easy to describe is just a huge win that makes correct typing more accessible to many more people by simply minimizing keystrokes.
The type declaration sugar is perhaps merely nice to have, but I think it really does change the way the average programmer is willing to write. The chaining operators, though... I would say I borderline can't live without those, anymore.
Chaining operators can change the SLOC count of some functions by as much as... say, 75%, if we consider a language like Go with its infamous `if err != nil` check that conventionally spreads across three lines.
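To make the chaining point concrete, here's a rough Rust analogue (a sketch only: Rust has a single `?` operator rather than separate `.?`/`.!`, and `parse_port` is a made-up name):

    use std::num::ParseIntError;

    // Chained form: the `?` inserts the early-return error branch for us.
    fn parse_port(text: &str) -> Result<u16, ParseIntError> {
        Ok(text.trim().parse::<u16>()?)
    }

    // Roughly what the `?` expands to: the branch we no longer write by hand.
    fn parse_port_expanded(text: &str) -> Result<u16, ParseIntError> {
        match text.trim().parse::<u16>() {
            Ok(port) => Ok(port),
            Err(e) => Err(e),
        }
    }

    fn main() {
        assert_eq!(parse_port(" 8080 "), Ok(8080));
        assert_eq!(parse_port("nope").is_err(), parse_port_expanded("nope").is_err());
        println!("ok");
    }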
And if you only regard parametricity as valuable rather than essential, then you can choose to relax that and say: OK, you're allowed to specialize, but if you do then you're no longer parametric and the lovely consequences that follow go away, leaving it to programmers to decide whether parametricity is worth it here.
I agree that maintaining parametricity or not is a design decision. However, recent languages that break it (e.g. Zig) don't seem to understand what they're doing in this regard. At least I've never seen a design justification for it, but I have seen criticism of their approach. Given that type classes and their ilk (implicit parameters; modular implicits) give the benefits of ad-hoc polymorphism while maintaining parametricity, and are well established enough that Java is considering adding them (https://www.youtube.com/watch?v=Gz7Or9C0TpM), I don't see any compelling reason to drop parametricity.
It's interesting that effect system-ish ideas are in Zig and Odin as well. Odin has "context". There was a blog post saying it's basically for passing around a memory allocator (IIRC), which I think is a failure of imagination. Zig's new IO model is essentially pass around the IO implementation. Both capture some of the core ideas of effect systems, without the type system work that make effect systems extensible and more pleasant to use.
The ergonomics of Scala for-comprehensions are, in my mind, needlessly gnarly and unpleasant to use, despite the semantics being the same.
E.g. if you have a list-finding function that returns `X?`, then if you give it a list of `MyObject?`, you don't know if you found a null element or found nothing at all.
It's still obviously way better than having all object types include the null value.
Common sum types let you get around this, because they always do this "mapping" intrinsically through their structure/constructors when you use `Either`/`Maybe`/`Option` instead of `|`. However, they still don't always let you distinguish things after "mixing" various optionalities: if `find` for Maps, Lists, etc. all return `Option<MyObj>` and you have a bunch of them, you also don't know which of those it came from. This is often what one wants, but if you don't, you will still have to map to another sum type as above. And when you don't care about null vs. not-found, you have the dual problem: you need to flatten nested sum types, since the List `find` would return `Option<Option<MyObj>>`. `flatten`/`flat_map`/similar have to be used regularly, and they aren't necessary with anonymous sum types, which do this implicitly.
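A minimal Rust sketch of the nesting described above (the values are made up; `find` here is the standard `Iterator::find`):

    fn main() {
        let items: Vec<Option<i32>> = vec![Some(1), None, Some(3)];

        // Iterator::find over Option<i32> elements yields Option<Option<i32>>:
        // the outer layer answers "did find() locate an element at all?", the
        // inner one answers "was that element itself None?".
        let nested: Option<Option<i32>> = items.iter().copied().find(|x| x.is_none());
        assert_eq!(nested, Some(None)); // found an element, and it was None

        // When we don't care about that distinction, we flatten explicitly.
        let flat: Option<i32> = items.iter().copied().find(|x| x.is_some()).flatten();
        assert_eq!(flat, Some(1));
        println!("{nested:?} {flat:?}");
    }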
Both communicate similar but slightly different intent in the types of an API. Anonymous sum types are great for errors, for example: they avoid global definitions of all error cases, precisely specify which ones can happen for a given function, and accumulate multiple cases without wrapping/mapping/reordering. Sadly, most programming languages do not support both.
This is a problem with the signature of the function in the first place. If it's:
    template <typename T>
    T* FindObject(ListType<T> items, std::function<bool(const T&)> predicate)

then whether T is MyObject or MyObject?, you're still using null pointers as a sentinel value:

    MyObject* Result = FindObject(items, predicate);

The solution is for FindObject to return a result type:

    template <typename T>
    Result<T&> FindObject(ListType<T> items, std::function<bool(const T&)> predicate)

where the _result_ is responsible for wrapping the return value. Making this not copy is a more advanced exercise that borders on impossible (safely) in C++, but Rust and newer languages have no excuse for it.

    inline fun <T> Sequence<T>.last(predicate: (T) -> Boolean): T {
        var last: T? = null
        var found = false
        for (element in this) {
            if (predicate(element)) {
                last = element
                found = true
            }
        }
        if (!found) throw NoSuchElementException("Sequence contains no element matching the predicate.")
        @Suppress("UNCHECKED_CAST")
        return last as T
    }
A proper option type like Swift's or Rust's cleans up this function nicely.
The two are not equal, and only the second one evaluates to true when compared to a naked nil.
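For comparison, a minimal Rust sketch of the Kotlin `last` above using an option type (not actual standard-library code; `last_matching` is a made-up name):

    fn last_matching<T, I>(items: I, predicate: impl Fn(&T) -> bool) -> Option<T>
    where
        I: IntoIterator<Item = T>,
    {
        let mut last = None;
        for element in items {
            if predicate(&element) {
                last = Some(element);
            }
        }
        last // absence lives in the type: no sentinel, no unchecked cast
    }

    fn main() {
        assert_eq!(last_matching(vec![1, 2, 3, 4], |x| x % 2 == 1), Some(3));
        assert_eq!(last_matching(Vec::<i32>::new(), |_| true), None);
        println!("ok");
    }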
In languages such as OCaml, Haskell and Rust this of course works as you say.
    fn find<T>(self: List<T>) -> (T, bool)

to express what you want. But exactly like Go's error handling via a (fake) unnamed tuple, it's very much error-prone (and the return value might contain absurd combinations like `(someInstanceOfT, false)`). So yeah, I also prefer a language with ADTs, which solves this via a sum type rather than being stuck with product types forever.
I guess if one is always able to construct default values of T then this is not a problem.
This is how Go handles it:

    func do_thing(val string) (string, error)

is expected to return `"", errors.New("invalid state")`, which... sucks for performance and for actually coding.
An Option type is a cleaner representation.
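A rough Rust sketch of the sum-type alternative (`do_thing` is a made-up name mirroring the Go signature above; the error type is just a String to keep the sketch small):

    fn do_thing(val: &str) -> Result<String, String> {
        if val.is_empty() {
            // The failure branch carries no bogus "" value alongside the error.
            return Err("invalid state".to_string());
        }
        Ok(val.to_uppercase())
    }

    fn main() {
        // The compiler forces us to handle both cases; there is no
        // `("", err)` / `(value, false)`-style absurd combination to build.
        match do_thing("hi") {
            Ok(s) => println!("{s}"),
            Err(e) => eprintln!("{e}"),
        }
    }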
In theory, NULL is still a perfectly valid memory address; it is just that we have decided on the convention that NULL is useful for marking a pointer as unset.
Many languages (including Odin) now have support for maybe/option types or nullable types (monads); however, I am still not a huge fan of them in practice, as I rarely require them in systems-level programming. I know very well this is a "controversial" opinion, but systems-level programming languages deal with memory all the time and can easily get into "unsafe" states on purpose. Restricting this can actually make things like custom allocators very difficult to implement, along with other things.
n.b. Odin's `Maybe(^T)` is identical to `Option<&T>` in Rust in terms of semantics and optimizations.
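On the Rust side, the optimization being referred to is the niche optimization: `None` is represented by the (otherwise impossible) null address, so the option adds no space. A quick way to see it:

    use std::mem::size_of;

    fn main() {
        // Option<&T> is the same size as the reference itself; the null
        // address is reserved as the representation of None.
        assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
        assert_eq!(size_of::<&u64>(), size_of::<*const u64>());
        println!("Option<&u64>: {} bytes", size_of::<Option<&u64>>());
    }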
It's a trade-off in design which is not immediately obvious. If you want to make pointers not have a `nil` state by default, this requires one of two possibilities: requiring the programmer to test every pointer on use, or assuming pointers cannot be `nil`. The former is really annoying, and the latter requires something which I did not want to do (and which you will most likely not agree with, just because it doesn't _seem_ like a bad thing from the start): explicit initialization of every value everywhere.
The first option is solved with `Maybe(^T)` in Odin, and that's fine. It's actually rarely needed in practice, and when it is needed, it's either for documenting foreign code's usage of pointers (i.e. non-Odin code), or it's for things which are not pointers at all (in Odin code).
The second option is a subtle one: it forces a specific style and architectural practice whilst programming. Odin is designed around two things: being a C alternative which still feels like C whilst programming, and "try to make the zero value useful", and a lot of the constructs in the language and core library are structured around this. Odin is trying to be a C alternative, and as such it is not trying to change how most C programmers actually program in the first place. This is why you are allowed to declare variables without an explicit initializer, but the difference from C is that variables will be zero-initialized by default (you can do `x: T = ---` to make it uninitialized stack memory, so it is an opt-in approach).
Fundamentally, this idea of explicit individual-value-based initialization everywhere is a viral concept which does lead to what I think are bad architectural decisions in the long run. Compilers are dumb: they cannot do everything for you, and especially cannot know the cost of the architectural decisions throughout your code, which requires knowing the intent of your code. When people argue that a lot of the explicit initialization can be "optimized" out, this is only thinking from a local position of individual values, which does total up to being slower in some cases.
To give an example of what I mean, take `make([]Some_Struct, N)` in Odin: it just zeroes the memory, because in some cases that is literally free (e.g. freshly `mmap`ed pages must already be zero). However, when you need to initialize each value of that slice, you are now turning an O(1) problem into an O(N) problem. And it can get worse if each field in the struct also needs its own form of construction.
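A hedged Rust sketch of that distinction (not Odin code; `SomeStruct` is a made-up type): bulk zero-initialization versus per-element construction:

    #[derive(Clone, Default)]
    struct SomeStruct {
        a: u64,
        b: u64,
    }

    fn main() {
        let n = 1_000_000;

        // Bulk zero-initialization: conceptually one operation over the whole
        // allocation (and freshly mapped pages are already zero anyway).
        let zeroed = vec![0u8; n];

        // Per-value construction: inherently O(N) work, one element at a time,
        // and worse if each field needs its own constructor logic.
        let constructed: Vec<SomeStruct> = (0..n).map(|_| SomeStruct::default()).collect();

        println!(
            "{} bytes zeroed, {} structs built (checksum {})",
            zeroed.len(),
            constructed.len(),
            constructed.iter().map(|s| s.a + s.b).sum::<u64>()
        );
    }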
But as I said in my original comment, I do not think the `nil` pointer problem, especially in Odin since it has other array constructs, is actually an empirical problem in practice. I know a lot of people want to "prove" whatever they can at compile time, but I still have to test a lot of code in the first place, and for this very, very specific example, it is a trivial one to catch.
P.S. This "point" has been brought up a lot before, and I do think I need to write an article on the topic explaining my position because it is a bit tiring rewriting the same points out each time.
P.P.S. I also find that this "gotcha" people bring up is the most common one because it _seems_ like an obvious "simple" win, and I'd argue it's the exact opposite of both "simple" and a "win". Language design is all about trade-offs and compromises, as there is never going to be a perfect language for everyone's problems. Even if you designed a DSL for your task, you'd still have loads of issues, especially with specific semantics (not just syntax).
This is true. However, you have made these fixes after noticing them at runtime. This means that you have solved the null problem for a certain control + data state in the code, but you don't know where else it might crop up again. In millions of lines of code, this quickly becomes a game of whack-a-mole.
When you use references in Rust, you statically prove that a function cannot have a null error for any input it might get. This static elimination is helpful. You also force programmers to distinguish between `&T`, `Option<&T>`, and `Result<&T,E>`, all of which are very common in systems code.
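A small Rust sketch (the types and names are made up) of how those three shapes communicate different contracts at a glance:

    struct Device {
        name: String,
    }

    struct Registry {
        devices: Vec<Device>,
    }

    impl Registry {
        // `&Device`: always present; the caller never checks for null.
        fn first(&self) -> &Device {
            &self.devices[0] // internal invariant: a registry is never empty
        }

        // `Option<&Device>`: may be absent, and absence is not an error.
        fn find(&self, name: &str) -> Option<&Device> {
            self.devices.iter().find(|d| d.name == name)
        }

        // `Result<&Device, String>`: may fail, and the failure carries a reason.
        fn find_required(&self, name: &str) -> Result<&Device, String> {
            self.find(name)
                .ok_or_else(|| format!("no device named {name}"))
        }
    }

    fn main() {
        let reg = Registry {
            devices: vec![Device { name: "eth0".into() }],
        };
        assert!(reg.find("eth0").is_some());
        assert!(reg.find_required("wlan0").is_err());
        println!("{}", reg.first().name);
    }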
Today it is safe to assume that a byte is 8 bits. Similarly, it is safe to assume that the first page in virtual memory is non-readable and non-writable; why not make use of this foreknowledge?
> This is related to the drunkard’s search principle (a drunk man looks for his lost keys at night under a lamppost because he can see in that area).
This is a nice example and I do agree in spirit. But then I would offer the following argument: say a problem (an illegal/bad virtual address) is caused 60% of the time by one culprit (NULL dereference) and 40% of the time by a long tail of culprits (wrong virtual memory address, use after free, etc.). One can be a purist and say "Hey, using Rust-style references only solves the 60% case; addresses can be bad for so many other reasons!" Or one can be pragmatic and try to deal with the 60%.
I cringe every time I see `*some_struct` in Linux kernel/system code as a function argument/return. Does NULL indicate something semantically important? Do we need to check for NULL in code that consumes this pointer? All these questions arise every time I see a function signature. Theoretically I need to understand the whole program to truly know whether it is redundant or necessary to check for NULL. That is why I like what Rust and Zig do.
But to answer your general points here: Odin is a different language with a different way of doing things compared to others, so their "solutions" to specific "problems" do not usually apply to a language like Odin.
The problem with solving the "60% case" is that you have now introduced a completely different way of programming, which might solve that case, but at the expense of so many others. It's a hard language design question, and people focusing on this specific case have not really considered how it affects everything else. Sadly, language features and constructs are rarely isolated from other things, even the architecture of the code the programmer writes.
As for C code, I agree it's bad that there is no way to know if a pointer uses NULL to indicate something or not, but that's pretty much not a problem in Odin. If people want to explicitly state that, they either use multiple return values, which is much more common in Odin (which is akin to Result<&T, E> in Rust, but of course not the same for numerous reasons), or they use `Maybe(^T)` (akin to Option<&T> in Rust).
I understand the problems that C programmers face (I am one), which is why I've tried to fix a lot of them without straying far from the general C "vibe".
> It's a hard language design question, and people focusing on this specific case have not really considered how it affects everything else. Sadly, language features and constructs are rarely isolated from other things, even the architecture of the code the programmer writes
And my suggestion is that Rust has got the balance right when it comes to pointers. In Rust I can use the traditional unsafe nullable pointer (`*const SomeStruct` / `*mut SomeStruct`) when it can be null, and `&SomeStruct` when it cannot be NULL. Initialization can be a bit painful in Rust, but the wins are worth it. Yes, initialization can be less efficient in Rust, but then most of the time is spent in algorithms when they run, not during initialization of the data structures.
Rust has needless complexity when it comes to asynchronous programming. The complexity is off the charts and the language just becomes unusable. But the non-async subset of Rust feels consistent and well engineered from a programming theory perspective.
In summary, Rust has not compromised itself by using non-null references as the default pointer type, and neither would Odin if it took a different approach towards references. Take the example of OCaml: it also takes a very principled approach towards NULL. OTOH, Java suffers from NULL problems, as every object could be null in disguise.
Nevertheless Odin is a remarkably clean and powerful language and I'm following its progress closely ! Thanks for building it !
Rust was designed from day-0 around explicit individual-value based initialization. Odin from day-0 was not designed around this (explicitly).
This ever so minor choice might not seem like a big thing to you, as you have stated in your comment, but it leads to MASSIVE architectural decisions later on when the user programs.
Odin could not "just add" non-nil pointers and have it be "fine". It would not actually be Odin any more; the entire language would not even be a C alternative any more, and would feel much more like C++ or Rust. Rust and OCaml (which Rust is based on) are different kinds of languages from Odin, and their approach does not translate well to what Odin (or C) is trying to do.
Unfortunately I will not be able to explain this to you in a comment or article, and it is something that takes a while to understand since it is a subtle effect of locality affecting globality. The best video on this might be from Casey Muratori: https://www.youtube.com/watch?v=xt1KNDmOYqA
> Yes, initialization can be less efficient in Rust, but then most of the time is spent in algorithms when they run, not during initialization of the data structures.
Sadly this isn't as true as you think it is. I've written a lot of C++ code before and doing its boilerplate for ctors/dtors actually leads to much slower code in general, and I'd argue this does apply to Rust too. Most of the time isn't necessarily spent in "algorithms", especially when "algorithms" also include initialization of values. You've turned something which could be O(1) into at best O(N), which does not help when things scale, especially with "algorithms".
See, this seems like something that's nice to actually put into the types: a `Ptr<Foo>` is a real pointer that all the normal optimizations can be applied to, but which cannot be null or otherwise invalid, while an `UnsafePtr` makes the compiler keep its distance and allows whatever tricks you want.
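A rough Rust analogue of that split (a sketch only; `Ptr`/`UnsafePtr` are the hypothetical names from the comment, mapped onto `&T`/`NonNull<T>` and raw pointers):

    use std::mem::size_of;
    use std::ptr::NonNull;

    fn main() {
        let mut value = 42u32;

        // &mut T: cannot be null; the compiler relies on and optimizes for that.
        let safe_ref: &mut u32 = &mut value;
        *safe_ref += 1;

        // NonNull<T>: a raw-pointer wrapper that is statically non-null, so
        // Option<NonNull<T>> stays pointer-sized (the `Ptr<Foo>` idea above).
        let non_null: NonNull<u32> = NonNull::from(&value);
        assert_eq!(size_of::<Option<NonNull<u32>>>(), size_of::<*const u32>());

        // *mut T: the "compiler keeps its distance" pointer; it may be null,
        // and any dereference requires an unsafe block.
        let raw: *mut u32 = std::ptr::null_mut();
        assert!(raw.is_null());

        println!("{value} lives at {:p}", non_null);
    }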
In other words: after all this time, I feel that Tony Hoare framing null references as the billion-dollar mistake may be overselling it at least a little. Making references non-nullable by default is an improvement, but the same problem still plays out so long as you ever have a situation where the type system is insufficient to guarantee the presence of a value you "know" must be there. (And even with formal specifications/proofs, I am not sure we'll ever get to the point where that is always feasible to prove.) The only real question is how much of the problem is solved by not having null references, and I think it's less than people acknowledge.
(edit: Of course, it might actually be possible to quantify this, but I wasn't able to find publicly-available data. If any organization were positioned to be able to, I reckon it would probably be Uber, since they've developed and deployed both NullAway (Java) and NilAway (Go). But sadly, I don't think they've actually published any information on the number of NPEs/panics before and after. My guess is that it's split: it probably did help some services significantly reduce production issues, but I bet it's even better at preventing the kinds of bugs that are likely to get caught pre-production even earlier.)
The NaNs are, as their name indicates, not numbers. So the fact that this 32-bit floating-point parameter might be NaN, which isn't even a number, is as unhelpful as finding that the Goose you were passed as a parameter is null (i.e. not actually a Goose at all).
There's a good chance you've run into at least one bug where oops, that's NaN and now the NaN has spread and everything is ruined.
The IEEE NaNs are baked into the hardware everybody uses, so we'll find it harder to break away from this situation than from the Billion Dollar Mistake, but it's clearly not a coincidence that this kind of problem occurs for other types, so I'd say Hoare was right on the money and that we're finally moving in the correct direction.
A language with non-nullability-by-default in its reference types is no worse than a language with no null. I say this because, again, there will always be situations where you may or may not have a value. For example, grabbing the first item in a list; the list may be empty. Even if you "know" the list contains at least one item, the compiler does not. Even if you check the invariant to ensure that it is true, the case where it is false may be too broken to handle, and thus crashing really is the only reasonable thing to do. By the time the type system has reached its limits, you're already boned, as it can't statically prevent the problem. It doesn't matter if this is a nullable reference or if it's an Option type.
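A tiny Rust illustration of that point (`build_batch` is a made-up helper standing in for "code we know produces at least one item"): the type system can't see the "I know it's non-empty" invariant, so you end up asserting it at runtime either way:

    fn build_batch() -> Vec<u32> {
        vec![1, 2, 3]
    }

    fn main() {
        let batch = build_batch();

        // first() honestly returns Option<&u32>; but if our "never empty"
        // invariant is ever wrong, this is still a runtime crash, Option or not.
        let head = batch.first().expect("batch is never empty by construction");
        println!("{head}");
    }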
Because of that, we're not really comparing languages that have null vs languages that don't. We're comparing languages that have references that can be non-nullable (or functionally equivalent: references that can't be null, but optional wrapper types) versus languages that have references that are always nullable. "Always nullable" is so plainly obviously worse that it doesn't warrant any further justification, but the question isn't whether or not it's worse, it's how much worse.
Maybe not a billion dollars worse after all.
P.S.: NaN is very much the same. It's easy to assign blame to NaN, and NaN can indeed cause problems that wouldn't have existed without it. However, if we had signalling NaNs by default everywhere, I strongly suspect that we would still curse NaN, possibly even worse. The problem isn't really NaN. It's the thing that makes NaN necessary to begin with. I'm not defending null as in trying to suggest that it isn't involved in causing problems, instead I'm suggesting that the reasons why we still use null are the true root issue. You really do fix some problems by killing null, but the true root issue still exists even after.
`Maybe(T)` would be for my own internal code. I would need to wrap/unwrap `Maybe` at all interfaces with external code.
In my view, a huge value-add going from plain C to Zig/Rust has been eliminating the possibility of NULL pointers in the default pointer type. Odin makes the same mistake as Golang did. That's not excusable, IMHO, in such a new language.
Even ignoring the practical consequences, this means the programmer probably doesn't understand what their code does, because there are unstated assumptions all over the codebase: the type system doesn't do a good job of writing down what was meant. You might almost as well use B (which doesn't have types).
That's true. Which is why I wrote "IMHO" i.e. "In My Humble Opinion".
I'm not looking to create any controversy. _I_ don't like Odin's treatment of NULL. If you're cool with it, that's fine too.
`Maybe` does exist in Odin. So if you want a `nil` string either use `Maybe(string)` or `cstring` (which has both `nil` (since it is a pointer) and `""` which is the empty string, a non-nil pointer). Also, Odin doesn't have "interface"s since it's a strictly imperative procedural language.
As for your question about base values, I am not sure what you mean by this. Odin doesn't have an "everything is a pointer internally" approach like many GC'd languages. Odin follows in the C tradition, so the "base value" is just whatever the type is.
I have an important metric for new "systems" languages: does the language allow NULL for its commonly used pointer type? Rust by default does not (references may _not_ be NULL). Here is where I think Odin makes a mistake.
In the linked blog post Odin mentions `^os.File`, which means pointer to `File` (somewhat like `FILE*` in C). Theoretically the pointer can be NULL. In practice, maybe I would need to check for NULL or maybe I would not (I would have to scan the Odin function's documentation to see what the contract is).
In Rust, if a function returns &File or &mut File or Box<File> etc. I know that it will never be NULL.
So Odin repeats the famous "billion-dollar mistake" (Tony Hoare). Zig, in comparison, is a bit more fussy about NULL in pointers, so it wins my vote.
Currently this is my biggest complaint about Odin. While Odin packs in a lot of powerful programming idioms (and feels a bit higher-level and more ergonomic than Zig), it makes the same mistake that Golang, C and others make in allowing NULL in the default pointer type.