My guess is that it's currently quite naive: automatically transcribe the English speech to text (bad phonetic misspellings and all), then pass that to a language translation layer, which garbles the output even more. It would be interesting if this is in fact AI, showing that it's still in its infancy and has a long way to go.
Just the idea of effectively translating one language into another in real time seems daunting, especially given that there are any number of words and phrases that cannot be translated word-for-word to begin with. How would AI do so much better a job of this?
Incorporating an LLM layer should help, because LLMs are essentially sequence predictors that can take a long stretch of preceding context into account. An LLM should do a better job of automatically smoothing over common mistakes, e.g., it should be able to un-twist the transcribed English "I'll have my meet medium wear" back to "I'll have my meat medium rare."
Adding that extra step, which is effectively a shortcut for NLU auto-correction, should produce significantly better output. We've been finding that LLMs outperform our prior NLU efforts.
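
For what it's worth, here is a rough sketch of where that extra step would sit: transcript in, LLM correction, then translation. Everything named here is hypothetical and only for illustration. `fake_llm` and `fake_translate` are toy stand-ins, not a real model or translation API, and the prompt wording is just one guess at how you might ask for a homophone fix without rephrasing.

```python
from typing import Callable

# Hypothetical stage type: every stage maps text to text.
Stage = Callable[[str], str]


def build_pipeline(*stages: Stage) -> Stage:
    """Chain stages left to right into a single text -> text function."""
    def run(text: str) -> str:
        for stage in stages:
            text = stage(text)
        return text
    return run


def build_correction_prompt(transcript: str) -> str:
    """Ask the model to repair sound-alike errors and change nothing else."""
    return (
        "The following text came from speech recognition and may contain "
        "words that sound right but are spelled wrong. Rewrite it with the "
        "intended words and change nothing else.\n\n"
        f"Transcript: {transcript}\nCorrected:"
    )


def make_corrector(complete: Callable[[str], str]) -> Stage:
    """Wrap any prompt -> completion function as a correction stage."""
    def correct(transcript: str) -> str:
        return complete(build_correction_prompt(transcript)).strip()
    return correct


def fake_llm(prompt: str) -> str:
    """Toy stand-in for a real model: a lookup table, NOT an LLM."""
    transcript = prompt.split("Transcript: ", 1)[1].rsplit("\nCorrected:", 1)[0]
    fixes = {"meet": "meat", "wear": "rare"}
    return " ".join(fixes.get(word, word) for word in transcript.split())


def fake_translate(text: str) -> str:
    """Toy stand-in for the translation layer: it only tags the text."""
    return f"[translated] {text}"


naive = build_pipeline(fake_translate)
with_llm = build_pipeline(make_corrector(fake_llm), fake_translate)

heard = "I'll have my meet medium wear"
print(naive(heard))
print(with_llm(heard))
```

The point of passing `complete` in as a plain function is that a real model call could replace the lookup table without touching the rest of the chain. A real LLM would use the whole sentence as context to pick "meat" and "rare", which the word-by-word table here obviously does not.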