I would imagine this could be one of the inputs, along with an STT system, serving as context to an LLM. In general we can speak faster than we can write or type, and for me specifically, after a certain point in the day typing creates a higher cognitive load than speaking. For example:
1. "Create a reminder for reading this email at 5:00 pm" and this could infer what to do from the screen shot's description(plus a local MCP tool for calendar)
2. "Can you fetch that file form that project in that workspace and implement the pattern in the code on my vscode terminal?" It can lower cognitive fatigue of typing and clicking a bunch of place.
3. Take notes as I describe something on the screen. It could also help with prompt composition, e.g. grab the link from my browser and the file open in vscode, then write code that does XYZ.
1. "Create a reminder for reading this email at 5:00 pm" and this could infer what to do from the screen shot's description(plus a local MCP tool for calendar)
2. "Can you fetch that file form that project in that workspace and implement the pattern in the code on my vscode terminal?" It can lower cognitive fatigue of typing and clicking a bunch of place.
3. Take notes as I describe something on the screen. It could be for prompt composition e.g. get the link from my browser and the file on vscode and write code that does XYZ.
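A minimal sketch of what use case 1 might look like, assuming hypothetical stand-ins: `transcribe_audio`, `describe_screenshot`, `ask_llm`, and the `calendar.create_reminder` tool are all placeholders, not real APIs.

```python
# Sketch: voice command + screenshot description -> LLM -> local MCP-style calendar tool.
# All helpers below are hypothetical stubs; a real setup would wire in an
# actual STT model, the screenshot-description model, and an MCP client.
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


def transcribe_audio(audio_path: str) -> str:
    # Stand-in for an STT system.
    return "Create a reminder for reading this email at 5:00 pm"


def describe_screenshot(image_path: str) -> str:
    # Stand-in for the screenshot-description model discussed above.
    return "Mail client open on an email titled 'Q3 budget review'"


def ask_llm(prompt: str) -> ToolCall:
    # Stand-in for an LLM that picks a local tool from the combined context.
    # A real implementation would parse structured tool-call output instead.
    return ToolCall(
        name="calendar.create_reminder",
        args={"title": "Read email: Q3 budget review", "time": "17:00"},
    )


def create_reminder(title: str, time: str) -> None:
    # Stand-in for a local MCP calendar tool.
    print(f"Reminder set: {title!r} at {time}")


if __name__ == "__main__":
    transcript = transcribe_audio("mic.wav")
    screen = describe_screenshot("screen.png")
    prompt = f"Screen: {screen}\nUser said: {transcript}\nChoose a tool to call."
    call = ask_llm(prompt)
    if call.name == "calendar.create_reminder":
        create_reminder(**call.args)
```

The point is just the shape of the pipeline: the screenshot description and the transcript go into the prompt together, and the LLM only has to map that to a tool call instead of me typing anything.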