It's possible to identify a surprisingly large number of matching words by learning a linear transformation mapping word vectors from two different languages into the same space (e.g. https://arxiv.org/abs/1805.06297 ).
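To give a sense of what that looks like in practice, here's a toy numpy sketch of the simplest version of the trick: learn an orthogonal map between the two embedding spaces from a small seed dictionary (orthogonal Procrustes), then propose new word pairs by nearest-neighbour search in the shared space. The linked paper does something more elaborate (and IIRC doesn't need a seed dictionary), and all the names below are made up for illustration:

```python
# Toy sketch, not the method from the linked paper: align two embedding
# spaces with a linear map learned from a handful of known word pairs,
# then look for new matches by nearest neighbour in the shared space.
import numpy as np

def learn_mapping(src_vecs: np.ndarray, tgt_vecs: np.ndarray) -> np.ndarray:
    """Orthogonal W minimising ||src_vecs @ W - tgt_vecs||_F (Procrustes)."""
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

def nearest_word(src_vec: np.ndarray, W: np.ndarray,
                 tgt_matrix: np.ndarray, tgt_words: list[str]) -> str:
    """Map one source-language vector across and return the closest target word."""
    mapped = src_vec @ W
    sims = (tgt_matrix @ mapped) / (
        np.linalg.norm(tgt_matrix, axis=1) * np.linalg.norm(mapped) + 1e-9)
    return tgt_words[int(np.argmax(sims))]
```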
But the problem with ancient languages is typically that there's not enough data to usefully constrain a large enough model. Doubly so for undeciphered scripts where scholars might not even agree on how many different letters there are.
Presumably, they'd want to get at embeddings and compare the spaces somehow, along the lines of: 'the relations between tokens a, b, c in this model are close to the relations between tokens a1, b1, c1 in a similar model trained on texts in a known language of (apparently) the same family, and likewise up to aN, bN, cN; and out of these N candidate correspondences, candidate X makes the most sense given existing examples'.
(As you can tell, the argument involves some handwaving, but it may be possible?)
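To make the handwaving slightly more concrete, here's one toy way to score that kind of relational similarity without even aligning the two spaces first: compare the internal similarity structure of a token tuple from the unknown-script model against candidate tuples from the known-language model, and rank the candidates. Purely illustrative, every name here is invented:

```python
# Hand-wavy sketch of the 'relations between tokens' comparison: rank
# candidate tuples from the known-language model by how closely their
# pairwise similarity structure matches that of the unknown-script tuple.
import numpy as np

def cos(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def relation_signature(vecs: list[np.ndarray]) -> np.ndarray:
    """Pairwise cosine similarities within a tuple of token vectors."""
    n = len(vecs)
    return np.array([cos(vecs[i], vecs[j])
                     for i in range(n) for j in range(i + 1, n)])

def rank_candidates(unknown_tuple: list[np.ndarray],
                    candidate_tuples: list[list[np.ndarray]]) -> list[int]:
    """Indices of candidate tuples, best-matching relational structure first."""
    target = relation_signature(unknown_tuple)
    dists = [np.linalg.norm(relation_signature(c) - target)
             for c in candidate_tuples]
    return list(np.argsort(dists))
```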
In English. The decoder translates the Dhofari into tokens the LLM understands. So you present the LLM with the decoded Dhofari and a question in English, like "Please express the following in modern English", and the LLM would answer in English. There's also a chance the decoded Dhofari would be intelligible to humans directly, though I don't know how large that chance is.
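The pipeline I have in mind is roughly this shape; both functions below are placeholders for a trained decoder and whatever LLM interface you have, not real APIs:

```python
# Toy sketch of the flow described above; both pieces are stand-ins.
def decode_dhofari(text: str) -> str:
    """Placeholder for the decoder that maps Dhofari into tokens the LLM understands."""
    raise NotImplementedError

def ask_llm(prompt: str) -> str:
    """Placeholder for whatever LLM interface is available."""
    raise NotImplementedError

def render_in_english(dhofari_text: str) -> str:
    decoded = decode_dhofari(dhofari_text)
    prompt = "Please express the following in modern English:\n" + decoded
    return ask_llm(prompt)  # the answer comes back in English
```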