In the very short term, we're deploying this tech more in a post-operation/training role. Imagine being a student pilot, getting in from your solo cross country, and pulling up the debrief with all your comms laid out and transcribed. In this setting, it's helpful for the student to get immediate feedback such as "your readback here missed this detail", etc. Controllers also have phraseology and QA reviews every 30 days where this is helpful. This will make human pilots and controllers better.
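To make that concrete, here's a toy sketch of the kind of readback check a debrief could run; the field names and inputs are invented for illustration, not our actual pipeline:

# Toy sketch of a post-flight readback check: compare structured fields
# extracted from the clearance with those extracted from the readback.
# Field names and values are illustrative assumptions only.
def readback_gaps(clearance: dict, readback: dict) -> list[str]:
    gaps = []
    for field, cleared_value in clearance.items():
        heard_value = readback.get(field)
        if heard_value is None:
            gaps.append(f"readback omitted {field} ({cleared_value})")
        elif heard_value != cleared_value:
            gaps.append(f"readback said {heard_value} for {field}, cleared {cleared_value}")
    return gaps

# e.g. "climb and maintain 5,000, turn left heading 270" read back without the heading
print(readback_gaps({"altitude": 5000, "heading": 270}, {"altitude": 5000}))
# ['readback omitted heading (270)']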
Next, we'll step up to active advisory (mapping to low assurance levels in the certification requirements). There's always a human in the loop that can respond to rare errors and override the system with their own judgement. We're designing with observability as a first-class consideration.
Looking out 5-10 years, it's conceivable that the error rates on a lot of these systems will be super-human (non-zero, but better than a human). It's also conceivable that you could actually respond "Say Again" to a speech-to-speech model that can correct and repeat any mistakes as they're happening.
Of course, that's a long way off. And there will always be a human in the loop to make a final judgement as needed.
I imagine that aviation regulatory bodies have high standards for this - a tool being fully additive to existing tools does not necessarily mean that it's cleared for use in a cockpit or in an ATC tower, right? Do you have thoughts about how you'll approach this? Also curious from a broader perspective - how do you sell any alerting tool into a niche that's highly conscious of distractions, and of not just false positive alerts but false negatives as well?
There are lots of small steps on this ladder.
The first is post-operational. An alert is triggered asynchronously and someone reviews it after the fact. Tools like this bring awareness to hot spots or patterns of error, which the human controller can then apply in real time later.
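As a rough illustration of that loop (the event fields here are made up for the example): queue the alerts with a bit of context, then roll them up so the reviewer sees where errors cluster.

# Hypothetical sketch: aggregate queued post-operational alerts into "hot spots"
# so a reviewer sees where errors cluster. Field names are illustrative only.
from collections import Counter

def hot_spots(alerts: list[dict], top_n: int = 3) -> list[tuple]:
    # Count alerts per (location, alert type) pair and return the most frequent.
    counts = Counter((a["location"], a["kind"]) for a in alerts)
    return counts.most_common(top_n)

alerts = [
    {"location": "RWY 28L", "kind": "readback_mismatch"},
    {"location": "RWY 28L", "kind": "readback_mismatch"},
    {"location": "DUMBA", "kind": "altitude_deviation"},
]
print(hot_spots(alerts))
# [(('RWY 28L', 'readback_mismatch'), 2), (('DUMBA', 'altitude_deviation'), 1)]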
A step up from that is real-time alerting, but not to the main station controller. There's always a manager in the tower who's looking over everyone's shoulder and triaging anything that comes up. That person is not as focused on any single area as the main controllers are. There's precedent for tools surfacing alerts to the manager, who then decides whether it's worth stepping in. This is probably where our product will sit for a while.
The bar to get in front of an active station controller is extremely high. But it's also not necessary for a safety net product like this to be helpful in real time.
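To make the ladder concrete, a toy routing policy might look like the sketch below; the tier names and the confidence threshold are invented, not how the product actually decides.

# Toy sketch of the routing ladder described above. Tiers and the threshold
# are assumptions for illustration, not product behavior.
from enum import Enum

class Route(Enum):
    POST_OP_LOG = "log for after-the-fact review"
    SUPERVISOR = "surface to the tower supervisor/manager"
    STATION = "push to the active station controller"

def route_alert(confidence: float, supervisor_threshold: float = 0.95) -> Route:
    if confidence >= supervisor_threshold:
        return Route.SUPERVISOR   # a human still decides whether to step in
    return Route.POST_OP_LOG      # everything else waits for the debrief

# Note: nothing routes to STATION; the bar to interrupt an active controller
# is deliberately not crossed in this sketch.
print(route_alert(0.99))  # Route.SUPERVISOR
print(route_alert(0.80))  # Route.POST_OP_LOG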
To me, speech to text and back seems like an incremental solution, but the holy grail would be the ability to symbolically encode the meaning of the words and translate to and from that meaning. People's phraseology varies wildly (even though it often shouldn't). For example, if I'm requesting VFR flight following, I can do it many different ways, and give the information ATC needs in any order. A system that can convert my words to "NorCal Approach Skyhawk one two three sierra papa is a Cessna one seventy two slant golf, ten north-east of Stockton, four thousand three hundred climbing six thousand five hundred requesting flight following to Palo Alto at six thousand five hundred," is nice, but wouldn't it be amazing if it could translate that audio into structured data:
{
  atc: NORCAL,
  requester: "N123SP",
  request: "VFR",
  type: CESSNA_172,
  equipment: [G],
  location: <approx. lat/lon>,
  altitude: 4300,
  cruise_altitude: 6500,
  destination: KPAO,
}
...for potential ingestion into other digital-only analysis systems. You could structure all sorts of routine and non-routine requests like this, check them for completeness, use them later for training, and so on. Maybe one day, display it in real time on ATC's terminal and in the pilot's EFIS. With structured data, you could associate people's spoken tail numbers with info broadcast over ADS-B and match them up in real time, too. I don't know, maybe this already exists and I just re-invented something that's already 20 years old, no idea. IMO there's lots of innovation possible bringing VHF transmissions into the digital world!

Kidding aside, yes, you're exactly right. We're already doing this to a large degree and getting better. Lots of our own data labeling and model training to make this good.
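As a purely illustrative sketch of the downstream checks that structured output enables (the field names roughly follow the example above; none of this reflects a real system): validate a flight-following request for completeness and tie the spoken tail number to a live ADS-B target.

# Illustrative only: completeness check plus ADS-B association for a parsed
# VFR flight-following request. Field names follow the example above.
from dataclasses import dataclass, fields

@dataclass
class FlightFollowingRequest:
    atc: str
    requester: str                      # spoken tail number, e.g. "N123SP"
    request: str
    type: str
    altitude: int | None = None
    cruise_altitude: int | None = None
    destination: str | None = None

def missing_fields(req: FlightFollowingRequest) -> list[str]:
    # Which fields ATC would still need to ask the pilot for.
    return [f.name for f in fields(req) if getattr(req, f.name) is None]

def match_ads_b(req: FlightFollowingRequest, targets: dict) -> dict | None:
    # Associate the spoken tail number with a live ADS-B target, if one exists.
    return targets.get(req.requester)

req = FlightFollowingRequest(
    atc="NORCAL", requester="N123SP", request="VFR", type="C172",
    altitude=4300, cruise_altitude=6500,   # destination never stated
)
print(missing_fields(req))   # ['destination'] -> prompt the pilot to complete the request
print(match_ads_b(req, {"N123SP": {"alt": 4275, "squawk": "1200"}}))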
This is effectively AGI.
And I've not seen anyone reputable suggest that our current LLM track will get us to that point. In fact, there's no known path to AGI. It would require another breakthrough in pure research, in an environment where money is being pulled out of universities.
When, not if. "Artificial intelligence" as it is presently understood is statistical in nature. To rely on it for air traffic control seems quite irresponsible.
I'd be curious about what happens when the ASR fails. This is not the place to guess or AI-hallucinate. As a pilot, I can always ask "Say Again" over the radio if I didn't understand. ASR can't do that. Also, it would be pretty annoying if my readback was correct, but the system misunderstood either the ATC clearance or my readback and said NO.
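One plausible fail-safe, sketched below with invented names and an invented threshold, would be for the system to abstain below a confidence floor (its own version of "say again") rather than assert a mismatch it isn't sure about.

# Hypothetical fail-safe sketch: abstain on low ASR confidence instead of
# guessing. The threshold and verdicts are illustrative, not a real policy.
from enum import Enum

class Verdict(Enum):
    OK = "readback matches the clearance"
    MISMATCH = "readback differs from the clearance"
    UNSURE = "low confidence: no alert, log for post-op review"

def check_readback(clearance: str, readback: str, asr_confidence: float,
                   min_confidence: float = 0.9) -> Verdict:
    if asr_confidence < min_confidence:
        return Verdict.UNSURE    # the system's own "say again"
    return Verdict.OK if clearance == readback else Verdict.MISMATCH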