I shared the full assembly code in the TASVideos writeup: https://tasvideos.org/8991S#HereSTheAsmCode
To summarize what I put in the writeup, the 7-bit PCM audio was streamed in at approximately 25 kHz (reading from the controller and writing to address $4011 every 71 CPU cycles), occasionally dipping to 9 kHz while streaming in the graphics data.
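(For reference, the NTSC NES CPU runs at about 1,789,773 Hz, so one $4011 write every 71 cycles works out to 1,789,773 / 71 ≈ 25,208 samples per second.)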
My method of downsampling was complicated. Since the creation of the TAS was being automated, and since I also needed to stream in graphics data occasionally, I ran into the issue of needing to know exactly which byte to read from the .wav file at any given moment. I used a custom NES emulator to emulate the generated inputs, and I had it count CPU cycles so I could convert that into seconds, then parse the .wav file with that info.
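In other words, each audio write maps its CPU cycle count to a time, and that time to the nearest source sample. A minimal sketch of that idea (my illustration, not the actual tooling; it assumes a mono 16-bit .wav and plain nearest-neighbor picking):

```python
import struct
import wave

CPU_HZ = 1_789_773  # NTSC NES CPU clock

def sample_at_cycle(wav_path, cpu_cycle):
    """Return the source sample closest in time to a given CPU cycle count."""
    with wave.open(wav_path, "rb") as w:
        rate = w.getframerate()
        idx = min(round(cpu_cycle / CPU_HZ * rate), w.getnframes() - 1)
        w.setpos(idx)
        frame = w.readframes(1)
    (value,) = struct.unpack("<h", frame[:2])  # assumes 16-bit mono PCM
    return value
```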
To be completely honest, this project was my first time directly reading the contents of a .wav file like this, and I had no prior experience writing code for audio conversion or playback. If I were to do this project again, I'd look into noise dithering + noise shaping, as well as filtering methods. I know that at the very end of the TAS there are certainly some weird audio artifacts that I couldn't figure out how to fix at the time.
As a very quick fix, you can dither by just adding a random value drawn uniformly from [-0.5, +0.5] before rounding (to -64..+63 or whatever your range is). It will give you a dither, and probably sound slightly better; a bit more noise for much less distortion. Noise shaping is left as an exercise for the reader :-) (It is probably nontrivial to get perfect with a variable sample rate anyway.)
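In code, that quick fix is roughly this (a sketch assuming samples normalized to [-1.0, 1.0) and the signed 7-bit range mentioned above):

```python
import random

def quantize_7bit(x):
    """Quantize a sample in [-1.0, 1.0) to -64..+63, dithering before rounding."""
    dithered = x * 64 + random.uniform(-0.5, 0.5)
    return max(-64, min(63, round(dithered)))
```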
> I used a custom NES emulator to emulate the generated inputs, and I had it count CPU cycles so I could convert that into seconds, then parse the .wav file with that info.
It sounds like you are just picking one sample without any filtering/averaging/anything (nearest neighbor); this will cause aliasing, which is another part of the reason for the “roughness” you may hear in the sound. You can do a very cheap trick here as well: take some audio software you trust (say, Audacity) and convert the .wav file to 25208 Hz. This means that you'll get good filtering for most of your audio, and less-bad filtering for the 13.85 kHz parts.
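If you'd rather script that step than go through Audacity, the equivalent can be done with SciPy's polyphase resampler; this is just an illustration (SciPy isn't part of the original workflow, and the filenames are placeholders):

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

rate, data = wavfile.read("input.wav")                      # e.g. 44100 Hz, int16
out = resample_poly(data.astype(np.float64), 25208, rate)   # filter + resample to 25208 Hz
out = np.clip(out, -32768, 32767).astype(np.int16)
wavfile.write("input_25208.wav", 25208, out)
```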
1. Get the game into a specific state (by performing certain actions, moving to certain positions, entering certain inputs, etc.) so that a portion of the game state in RAM happens to be an executable program.
2. Jump to that executable code, e.g. by corrupting a return address on the stack with a buffer overflow.
3. (optional) The program from step 1 may be a simple "bootstrap" program that lets the player directly write a new, larger program using controller inputs, then jumps to the new program.
4. The program reads the video and audio from the stream of controller inputs, decodes them, and displays them. The encoding is usually an ad-hoc scheme designed to take advantage of the available hardware. The stream of replayed inputs is computed directly from the media files.
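Purely as an illustration of that last step (this is not the actual encoder; the real payload interleaves graphics data and uses its own ad-hoc framing): conceptually, the audio half of the encoder just quantizes each resampled sample to the 7-bit value the playback routine will write to $4011 and emits it as one byte of the scripted input stream.

```python
def encode_audio_payload(samples):
    """Toy encoder: map normalized samples in [-1.0, 1.0) to 7-bit $4011 levels.

    'samples' would already be resampled to the playback rate; a real encoder
    also packs graphics data and whatever framing the decoder expects.
    """
    payload = bytearray()
    for s in samples:
        level = int(round((s + 1.0) * 63.5))  # map [-1, 1) to 0..127
        payload.append(max(0, min(127, level)))
    return bytes(payload)
```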
The only "modification" is wholly external to the system, and is necessary to feed the controller inputs at a superhuman rate. The SMB1 (and SMB3) code is the exact same code Nintendo shipped on mask ROMs, and the Famicom (or NES) is also completely unmodified.
It's impressive what can be done if a lot of effort is put in.
Is it mentioned anywhere how big the payload is? How many button presses? Are the audio samples "streamed" or does it all fit in NES RAM?