My somewhat contrarian take: don't try to automatically replace the text with pre-recorded content. Instead, focus on ensuring that the user is aware that both are available and can access whichever is most appropriate for them, based on the tools they have at their disposal.
Some more background would help. From your description, it sounds like this is an academic or research site that contains fragments of text in these languages, with audio, but where the rest of the site structure (headings and supporting narrative text) is in some 'well-supported' language such as English. Is that right? (Also, what encoding system is used for this text?)
If so...
Be aware that a screenreader user does not typically read an entire page top-to-bottom in a completely linear fashion; they can browse the page using the heading structure. In a well-marked-up page, the user has the freedom to skip over the portions that they are not interested in or that are not relevant to them. Focus on providing this flexibility rather than making (well-meaning, but potentially incorrect) policy decisions on behalf of the user.
Don't assume that a screenreader user is using speech in the first place; they could be using Braille, either because speech output is not an option for them or simply because Braille is their preferred form of output.
Finally, don't assume that because a screenreader user can't hear the text properly (due to text-to-speech limitations), the textual form of the content should be hidden from them entirely. They may still want to copy and paste the characters that represent the text so that they can send them to a colleague, for example. Or, depending on the writing system used, they may still be able to step through the text character by character and have the words spelled out to them; many screenreaders can announce non-Latin characters by their Unicode name.
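To give a rough idea of what "spelled out by Unicode name" means, here is a minimal Python sketch. It is only an illustration, not how any particular screenreader is implemented: the function name spell_out and the "U+XXXX" fallback for unnamed characters are my own assumptions, and the Greek sample string is just a stand-in for your actual text. It relies on the standard library's unicodedata.name, which looks up the name assigned to a character in the Unicode Character Database.

```python
import unicodedata


def spell_out(text):
    """Return one announcement per character, using its Unicode name.

    Hypothetical helper for illustration only. Characters with no
    assigned name (e.g. private-use code points) fall back to a
    'U+XXXX' label, which is an assumption of this sketch.
    """
    return [unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in text]


# Sample input: Greek alpha and beta, standing in for the real content.
for announcement in spell_out("\u03b1\u03b2"):
    print(announcement)
# prints:
# GREEK SMALL LETTER ALPHA
# GREEK SMALL LETTER BETA
```

If the characters on your site live in the private-use area or in a custom font encoding, a lookup like this returns nothing useful, which is one reason the question about the encoding matters.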