Region inference is another strategy in this space. It can limit the need for full-blown garbage collection in many cases, but it comes with its own trade-offs.
Reference counting is just a different kind of garbage collection, really. In many ways it acts like the dual construction to a tracing GC, and if you start optimizing both, you tend to converge on the same ideas over time. Refcounting isn't free of latency problems either: if I have a long linked list and snip the last pointer to it, we have to collect the whole list, which takes O(n) time in the length of the list. To avoid that pause you'd want to defer collecting the list rather than freeing it all at once, at which point you are converging toward a tracing GC that runs concurrently with the mutator. See e.g. Go's garbage collector.
These latency issues are inherent to deterministic destruction, which is otherwise an often-desirable feature; they have little to do with reference counting itself. In principle, they can be addressed by "parking" objects whose delayed disposal is unproblematic onto a separate, lower-priority task.
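A minimal sketch of that "parking" idea in JavaScript (the queue, its per-tick budget, and the `release`/`children` fields are all made up for illustration; a real runtime would hook this into its destructor machinery):

// Instead of tearing a large structure down inline when its count hits
// zero, hand it to a queue that disposes a bounded number of objects per
// tick from a low-priority callback.
class DisposalQueue {
  constructor(budgetPerTick = 100) {
    this.pending = [];
    this.budget = budgetPerTick;
  }
  park(obj) {
    // Called where the destructor would otherwise run synchronously.
    this.pending.push(obj);
  }
  tick() {
    // Run from an idle/low-priority task; bounded work keeps latency low.
    for (let i = 0; i < this.budget && this.pending.length > 0; i++) {
      const obj = this.pending.pop();
      if (obj.children) this.pending.push(...obj.children); // walk lazily
      if (obj.release) obj.release(); // free this object's own resources
    }
  }
}

Snipping the long list from the comment above then costs a bounded amount of work per tick instead of one O(n) pause.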
yeah one of the most helpful realizations I’ve read is that tracing and ref counting are essentially two formulations of the same problem - one is finding objects that are alive (by tracing), and the other is finding things that are dead (i.e. their ref counts reach zero). and of course, every object is either dead or alive!
Another solution is to make things immutable (like Erlang), or "as-if" immutable (like Koka), which guarantees that data can only point to things that have already been defined, preventing cycles.* Erlang uses this to simplify generational collection - because old data can't point to young data, it doesn't need a card table or anything like that.
I think it's perfectly possible to have a general-purpose language without cycles: you can just use integer indices into an array instead of pointers if you want cyclic data structures. This is common in Rust when people want to avoid the overhead of reference counting but don't want to use unsafe code.
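A tiny sketch of the index idea (shown in JavaScript for consistency with the examples below; in Rust the same role is played by a Vec of nodes and usize indices):

// Nodes live in one array; "pointers" are just indices into that array.
// The cycle exists only at the index level, so no reference counts are
// involved, and dropping the whole array frees every node at once.
const nodes = [];
function addNode(value) {
  nodes.push({ value, next: -1 }); // -1 means "no next node yet"
  return nodes.length - 1;         // the index is the node's handle
}
const a = addNode("a");
const b = addNode("b");
nodes[a].next = b;
nodes[b].next = a; // a two-element cycle, expressed purely with indices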
* A hidden assumption here is that the language is eagerly evaluated. There are languages like Haskell that have immutability and cyclic data structures.
Edit: I think the common idea with both solutions is that our objects have some weak order (the order in which their types were defined, and the time at which the object was created, respectively), and objects are only allowed to point to objects strictly less than them in this order.
-- circular linked list with one item
repeat x = let xs = x:xs in xs
Here's a rough equivalent in JavaScript that doesn't use thunks, just functions:

function repeat(x) {
  function xs() {
    return [x, xs]; // [first, rest]
  }
  return xs;
}
The Haskell version has a cycle because, after evaluation, `repeat x` will be a circular linked list, but all the "lists" we create in the JavaScript code above are just the closure `xs`.

For completeness, here's a JavaScript version that uses thunks:
class Thunk {
  constructor(f) { this.f = f; }
  get() { if (this.f) { this.v = (this.f)(); delete this.f; } return this.v; }
}
function repeat(x) {
  let xs = new Thunk(() => {
    return [x, xs]; // [first, rest]
  });
  return xs;
}
If you try calling `x = repeat(1); x.get()`, you can see that we get a circular list.

In general, I think that cannot be done, but if one restricts what programs can do, solutions exist.
A simple way to do it is to require all references “pointing out of” an object to be set the moment the object is created, and to be immutable afterwards (that’s what Lisp cons (https://en.wikipedia.org/wiki/Cons) does; without setf or similar, Lisp code cannot create cycles).
That rules out quite a lot of code that modifies structures without ever introducing cycles, but it still leaves plenty of code that works.
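A minimal sketch of that construction-time rule in JavaScript (Object.freeze stands in here for a language-level guarantee):

// Outgoing references are fixed when the cell is built and never change.
function cons(head, tail) {
  return Object.freeze({ head, tail });
}
// A cell can only refer to cells that already existed when it was created,
// so following `tail` always moves toward older cells and can never loop
// back to a newer one: no cycles.
const list = cons(1, cons(2, cons(3, null)));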
One could also store an ‘age’ field with each object and check, whenever a reference in an object is updated, that it points to an object older than the one being modified. That gives some more leeway, at the price of using more memory (a lot more, in code that uses many small objects).
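A sketch of that check, assuming the runtime funnels every reference update through a helper (the `age` counter and `setRef` are invented for illustration):

let nextAge = 0;
function makeObject(fields) {
  // Creation order doubles as the object's age.
  return { age: nextAge++, ...fields };
}
function setRef(obj, field, target) {
  // Only allow references to strictly older objects; following references
  // then always decreases age, so they can never form a loop.
  if (target !== null && target.age >= obj.age) {
    throw new Error("reference to a same-age or younger object");
  }
  obj[field] = target;
}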
Another idea is to add a bit to each object meaning “no cycle contains this object”, and have the runtime clear it when it can no longer guarantee that. (Edit: unfortunately, maintaining that invariant can be very costly. Whenever code does foo.field = bar, with both foo and bar known not to be part of a cycle, you still have to search all objects reachable from bar to check whether a cycle was created and, if so, clear the bit on every object in the cycle(s). That makes this idea impractical.)
If, as I suspect happens in programming languages which are “mostly immutable”, there are many objects for which that flag stays set, that can significantly speed up checking for the creation of cycles.
For instance, a named lexical function can have itself in scope so that it can call itself recursively. This means that, as an object, it has a pointer to an environment, and that environment has an entry which contains that function itself: cycle.
If you deny cycles, that blows up at the starting line.
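Loosely, in JavaScript terms (the cycle is between the function object and the environment it closes over):

// A named function expression has its own name in scope inside its body,
// so the function object references an environment whose `fact` entry
// references the function object right back: a cycle from the start.
const fact = function fact(n) {
  return n <= 1 ? 1 : n * fact(n - 1);
};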
But if it were a simple scripting language and you needed that constraint, it’s relatively easy to implement.
It probably wouldn't be usable for a general-purpose programming language, but for a special-purpose scripting language I could see it making the language implementation easier.