> God knows what process led them to do video streaming for showing their AI agent work in the first place.
This was my first thought, too. Something I want to harp on, because people keep saying this:
Video streaming is not complicated. Every YouTuber, Twitch streamer, and influencer can manage it. By this I mean the actual act of tweaking your encoding settings to get good quality at a low bitrate.
In 3 months with an LLM, they learned less about video streaming than you can learn from a 12-year-old's 10-minute YouTube video about how to set up HyperCam 2.
Millions and millions of literal children figured this out.
Keep this in mind next time anyone says LLMs are good for learning new things!
Video codecs are some of the most complex software I've ever encountered, with both the most numerous and the most opaque options.
It's easy for streamers because they don't have options; Twitch et al. give you about three total choices, so there's nothing to figure out.
I've built the exact pipeline OP has done - video, over TCP, over WebSockets - precisely because I had to deliver video through a corporate firewall. Wolf, Moonlight, and maybe even GStreamer just show they didn't even try to understand what they were doing, and just threw every buzzword into an LLM.
To give you some perspective, 40 Mbps is an incredible amount of bandwidth. Blu-ray is 40 Mbps. This video, in 8K on YouTube, is 20 Mbps: https://www.youtube.com/watch?v=1La4QzGeaaQ
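A quick back-of-envelope check on those numbers. The Blu-ray and 8K figures are the ones quoted above; the Twitch figure is my addition (their long-published ingest guideline tops out around 6 Mbps):

```python
# Back-of-envelope: what each quoted bitrate costs a viewer in raw transfer.
# Blu-ray and 8K figures are from the comment above; the Twitch ceiling is
# an assumption based on their published ~6 Mbps recommendation.
MBIT = 1_000_000

bitrates_mbps = {
    "OP's screen-share stream": 40,
    "Blu-ray video": 40,
    "8K YouTube (linked video)": 20,
    "Twitch recommended max": 6,
}

for name, mbps in bitrates_mbps.items():
    mb_per_minute = mbps * MBIT / 8 * 60 / 1e6  # megabytes per minute
    print(f"{name}: {mbps} Mbps ≈ {mb_per_minute:.0f} MB/minute")
```

At 40 Mbps, a viewer downloads roughly 300 MB per minute just to watch a terminal session.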
There's really no explanation for this.
I had a situation where I wanted to chop one encoded video into multiple parts without re-encoding (I had a deadline), and the difficulty getting ffmpeg to do sensible things in that context was insane. One way of splitting the video without re-encoding just left the first GOP without an I-frame, so the first seconds of video were broken. Then another attempt left me with video that just got re-timed, and the audio was desynced entirely. I know encoding some frames will be necessary to fix where cuts would break P- and B-frames, but why is it so hard to get it to "smartly" encode only those broken GOPs when splicing and cutting video? Clearly I was missing some other parameters or knowledge or incantation that would have done exactly that.
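The usual low-tech workaround is to only ever cut at keyframes: stream-copy cutting (ffmpeg's `-c copy`) can start a segment cleanly only on an I-frame, since anything mid-GOP references frames you threw away — which is exactly the broken first seconds described above. A minimal sketch of that snapping logic (function name and keyframe list are hypothetical, for illustration):

```python
from bisect import bisect_right

def snap_cut_to_keyframe(requested_start: float, keyframe_times: list[float]) -> float:
    """Return the latest keyframe at or before the requested cut point.

    Cutting here instead of at the requested time avoids the broken-GOP
    problem at the cost of a slightly earlier (up to one GOP) start.
    """
    i = bisect_right(keyframe_times, requested_start)
    if i == 0:
        raise ValueError("no keyframe at or before requested start")
    return keyframe_times[i - 1]

# Hypothetical GOP structure: a keyframe every 2 seconds.
kfs = [0.0, 2.0, 4.0, 6.0, 8.0]
print(snap_cut_to_keyframe(5.3, kfs))  # snaps back to 4.0
```

The "smart" version — re-encoding only the broken GOP at each cut and stream-copying the rest — is what editing tools call smart rendering, and ffmpeg has no single flag for it as far as I know.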
The few knobs that actual video encoder users need to tweak are clearly exposed and usable in every application I have ever used.
>twitch et al give you about three total choices
You don't configure your video encoding through Twitch, you do it in OBS. OBS has a lot of configuration available. Also, those options (rate-control type, bitrate value, profile, "how much encoding time to take", and the """quality""" magic number) are the exact knobs they should have been tweaking to come up with an intuition about what was happening.
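For concreteness, here are those knobs as an illustrative x264-style settings sketch. All values are hypothetical examples, not recommendations, and CRF versus a fixed bitrate are alternative rate-control modes, not used together:

```python
# The handful of knobs described above, as an illustrative settings dict.
# Values are hypothetical examples, not a recommendation.
encoder_settings = {
    "rate_control": "CBR",   # bitrate type: constant vs. variable vs. CRF
    "bitrate_kbps": 6000,    # bitrate value: a Twitch-range target
    "profile": "high",       # H.264 profile
    "preset": "veryfast",    # how much encoding time to take per frame
    # "crf": 23,             # the "quality" magic number, if using CRF instead
}
print(encoder_settings)
```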
Regardless, my entire point is that they were screwing around with video encoding pipelines despite having absolutely no intuition at all about video encoding.
They weren't even using FFmpeg. They were using an open-source implementation of a video game streaming encoder. Again, they demonstrably have no freaking clue about even the basics of the space. Even that encoder should be capable of better than what they ended up with.
We've been doing this exact thing for decades. None of this is new. None of this is novel. There's immense literature and expertise and tons of entry-level content to build up intuition and experience with what bandwidth you should expect encoded video to take. Worse, Microsoft RDP and old-fashioned X apps were doing this over shitty dial-up connections decades ago, mostly by avoiding video encoding entirely. Like, we made video with readable text work off CDs in a 2x drive!
Again, Twitch has a max bandwidth much lower than 40 Mbps, and people stream coding on it all the time without issue. That they never noticed how obscenely off the mark they are is sad.
It would be like if a car company wrote a blog post about how "We replaced tires on our car with legs and it works so much better" and they mention all the trouble they had with their glass tires in the blog.
They are charging people money for this, and don't seem to have any desire to fix massive gaps in their knowledge, or even wonder if someone else has done this before. It's lame. At any point, did they even say "Okay, we did some research and in the market we are targeting we should expect a bandwidth budget of X mb/s"?
"AI" people often say they are super helpful for research, and then stuff like this shows up.
God knows what process led them to do video streaming for showing their AI agent work in the first place. Some fool must have put "I want to see video of the agent working" in... and well, the LLM obliged!