I think the whole point of BOLT is that in practice, you can't reliably map instructions in the final binary back to where they came from.
And it's not even about instructions as much as control flow. LLVM, GCC, and other good compilers (like the ones I wrote for JSC) can and absolutely will fuck up the control flow graph for fun and profit. So if the point of the FDO is to create better code layout, then feeding the profiling samples in *before* the compiler does its fuckery will put you up shit creek without a paddle: the basic blocks and branches that the profiler is telling you about don't exist yet, and won't until the compiler does its thing.
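To make the layout point concrete, here's a toy sketch of what "better code layout" means in this context: given branch counts measured on the *final* basic blocks, chain blocks so the hot branches become fall-throughs. This is not BOLT's actual algorithm (it uses fancier heuristics like ext-TSP); the block names and counts are made up.

```python
def greedy_layout(edges, entry):
    """Toy profile-guided block layout, NOT BOLT's real algorithm.

    edges: dict mapping (src, dst) -> taken count, measured on the
    post-optimization CFG. Returns a block order where the hottest
    successor of each block is placed as its fall-through when possible.
    """
    best = {}
    # Visit edges hottest-first; give each block at most one fall-through
    # successor, and let each block be a fall-through target at most once.
    for (src, dst), _count in sorted(edges.items(), key=lambda kv: -kv[1]):
        if src not in best and dst not in set(best.values()):
            best[src] = dst
    blocks = {b for e in edges for b in e}
    order, seen = [], set()
    # Emit chains starting from the entry block, then any unplaced block.
    for start in [entry] + sorted(blocks - {entry}):
        b = start
        while b is not None and b not in seen:
            order.append(b)
            seen.add(b)
            b = best.get(b)
    return order

# Hot path A -> B -> D (count 100) vs. cold detour through C (count 1):
# the hot path ends up laid out consecutively, C gets pushed to the end.
layout = greedy_layout(
    {("A", "B"): 100, ("A", "C"): 1, ("B", "D"): 100, ("C", "D"): 1},
    entry="A",
)
# layout == ["A", "B", "D", "C"]
```

The whole exercise only makes sense because the edge counts refer to blocks that actually exist in the emitted binary, which is exactly why feeding the samples in earlier doesn't work.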
You could try to run the compiler forward until it recreates the control flow structure that the profiler is talking about, but that seems hella sketchy since at that point you're leaning on the compiler's determinism in a way that would make me (and probably others) uncomfortable. It would rule out running BOLT on binaries optimized with PGO and it would create super nasty requirements for build system maintainers.
Maybe the bigger problem is at what point the profiles feed back. Since a compiler may generate many object files which are then linked to form the final binary, you'd sort of maybe want to do this in the linker (or later) vs. earlier on.
I guess specifically with the kernel there's an extra layer of complexity. It looks like they use `perf` to record the profile, which is cool, and then they apply the results directly to the binary, which is also cool.
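As a toy sketch of why the final binary is the sane place to apply the profile: the addresses `perf` records are only meaningful against the laid-out binary, so attribution is basically a bisect into the final block layout. The block names and addresses here are invented for illustration; this isn't how `perf2bolt` is implemented.

```python
import bisect

def attribute_samples(block_starts, samples):
    """Attribute raw sample addresses to basic blocks in the final binary.

    block_starts: list of (start_addr, name), sorted by start address,
    describing the binary's post-link block layout (invented here).
    samples: iterable of sampled instruction addresses from the profiler.
    Returns a dict mapping block name -> hit count.
    """
    addrs = [a for a, _ in block_starts]
    counts = {}
    for pc in samples:
        # Find the last block whose start address is <= pc.
        i = bisect.bisect_right(addrs, pc) - 1
        if i >= 0:
            name = block_starts[i][1]
            counts[name] = counts.get(name, 0) + 1
    return counts

blocks = [(0x1000, "entry"), (0x1020, "loop"), (0x1080, "exit")]
hits = attribute_samples(blocks, [0x1024, 0x1030, 0x1005, 0x1090])
# hits == {"loop": 2, "entry": 1, "exit": 1}
```

None of this works if the layout you're bisecting into is some pre-optimization CFG the binary no longer resembles, which is the whole argument above.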