My favorite example is something like Substance designer's node graph or Disney's SeExpr. You'd often want custom nodes that do often something trivial like a lookup from a custom data format or a small math evaluation, but you're calling the node potentially a handful of times per pixel, on millions of pixels. The calling overhead often comes out to take as much time or more than the operation, but there's no easy way to rearrange the operations without making things a lot more complicated for everyone.
I kind of like python's approach, make it so slow that it's easy to notice when you're hitting the bottleneck. Encourages you to write stuff that works in larger operations, and you get stuff like numpy and tensorflow which are some of the fastest things out there despite the slowest binding.
https://www.disneyanimation.com/technology/seexpr-expression...
Those commands and buffers are represented as C structs. If you're in a language that can't speak C structs (like Java, Go, Dart, JavaScript, etc...), all of those command & buffer setup become function calls rather than simple field writes.
I guess I wasn't clear, but I meant the difference between C and Luajit.
Vulkan. Any sort of binding to Vulkan over a non-trivial FFI (so like, not from C++, Rust, etc...) is going to be murdered by this FFI overhead cost. Especially since for bindings from something like Java you're either paying FFI on every field set on a struct, or you're paying non-trivial marshalling costs to convert from a Java class to a C struct to then finally call the corresponding Vulkan function.