Pretty sure it is doable with consumer cameras, although of course matching the physical movement would be a lot harder. For instance, a Sony a7R IV has a 1/20s readout. You can see that with the electronic shutter, because the camera scans from top to bottom, which is bad for video. But it does mean that you can record 10fps full-frame compressed raw photos, over a horizontal resolution of 6336 pixels. So that would be an “acquisition rate” of about 63 kHz (10 x 6336 ≈ 63,000 lines per second).
The problem, of course, is that you need to shift the camera by one sensor width every tenth of a second, accurate to the pixel, if you want to make use of that full horizontal temporal resolution. And I’m not sure how you reconcile the 1/20s readout with all of that. So pessimistically, maybe only ~30 kHz.
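A rough sketch of where the two figures could come from, using only the numbers quoted above. The function names are made up for illustration, and reading the ~30 kHz estimate as "only the readout half of each frame interval counts" is an assumption, not something the numbers force.

```python
# Back-of-envelope arithmetic; names are illustrative, values are the ones quoted above.
FRAMES_PER_SECOND = 10       # burst rate for compressed raw
LINES_PER_FRAME = 6336       # pixel count used in the estimate above
READOUT_SECONDS = 1 / 20     # electronic-shutter readout time


def line_rate_hz(fps, lines_per_frame):
    """Lines captured per second if every line of every frame is usable."""
    return fps * lines_per_frame


def readout_duty_cycle(fps, readout_seconds):
    """Fraction of each frame interval during which the sensor is actually scanning."""
    return readout_seconds * fps


optimistic = line_rate_hz(FRAMES_PER_SECOND, LINES_PER_FRAME)
duty = readout_duty_cycle(FRAMES_PER_SECOND, READOUT_SECONDS)
# Assumption: the pessimistic case scales the rate by the scanning duty cycle.
pessimistic = optimistic * duty

print(f"optimistic:  {optimistic / 1000:.1f} kHz")
print(f"duty cycle:  {duty:.0%}")
print(f"pessimistic: {pessimistic / 1000:.1f} kHz")
```

With a 1/20s readout inside a 1/10s frame interval, the sensor is scanning about half the time, which is one way to land near the ~30 kHz guess.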
Actually, I did the math, and if you can accept video compression, the video modes might be sufficient. 4K@30fps looks like ~64 kHz. And if you had a more capable video camera, that could be 4-8 times better.
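The same arithmetic for video, as a sketch. It assumes the ~64 kHz figure counts 2160 lines per UHD frame (30 x 2160 is about 64,800); that is a guess at how the number was reached. The function name is hypothetical.

```python
def video_line_rate_hz(fps, lines_per_frame):
    """Lines per second for a video mode, assuming every line of every frame counts."""
    return fps * lines_per_frame


# Assumption: "4K" here means UHD with 2160 lines per frame.
uhd_30 = video_line_rate_hz(30, 2160)
print(f"4K@30fps: {uhd_30 / 1000:.1f} kHz")

# The "4-8 times better" range quoted above, applied to that baseline.
for factor in (4, 8):
    print(f"{factor}x: {uhd_30 * factor / 1000:.1f} kHz")
```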