A somewhat different approach was recently proposed here: https://www.hackerneue.com/item?id=44319427 but it seems to have non-trivial overhead. (Still very much worthwhile, given the potential advantages of deterministic cycle collection.) The paper you reference is quite a bit older so it would of course be interesting to do a proper comparison.
The talk for this paper came up on YouTube just the other day: https://www.youtube.com/watch?v=GwXjydSQjD8
I'll look at that.
About performance: people in practice have always favored GC, so I think there's a lot to be discovered in optimization of reference counting algorithms, including concurrent traversal (which is easier because each node has local info in the form of refcounts and flags) and maybe detection of problematic worse-case graphs
https://savannah.nongnu.org/p/alisp