I have a list of plant names in various languages (botanical names, plus English, German and loads of historical variants of these two languages) in XML; transforming it into any other serialization won't be a problem. This is what it looks like now:
<ingredients>
<ingredient>
<lemma>Rosmarinus officinalis L.</lemma>
<syn lang="deu">Rosmarin</syn>
<syn lang="ang">boþen</syn>
<syn lang="ang">feldmædere</syn>
<syn lang="ang">lauendie</syn>
<syn lang="eng">rosemary</syn>
<syn lang="ang">sundēaw</syn>
</ingredient>
<ingredient>
...
</ingredients>
Now I am wondering whether there is a way to automatically assign Wikidata URIs and/or URIs from other controlled vocabularies to the entries in this list, accepting a manageable error rate for strings that don't match 100%. What approaches could be used to get:
<ingredient wikidata="Q122679">
<lemma>Rosmarinus officinalis L.</lemma>
...
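To make the goal more concrete, here is a rough sketch of the kind of transformation I have in mind. It is not a solution: the `CANDIDATES` table, the `normalise` helper and the `cutoff` value are all placeholders I made up for illustration, and the real label-to-ID data would have to come from Wikidata or another vocabulary. It only uses Python's standard `difflib` and `xml.etree.ElementTree` to show fuzzy matching of the lemma and writing the ID back as an attribute.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical label -> Wikidata ID table. In practice this would be built
# from a Wikidata dump or query result, which is not shown here.
CANDIDATES = {
    "Rosmarinus officinalis": "Q122679",
}


def normalise(lemma):
    """Keep only genus and species, dropping an author abbreviation such as 'L.'.

    This is a simplistic assumption and will not hold for every botanical name.
    """
    return " ".join(lemma.split()[:2])


def annotate(xml_text, candidates, cutoff=0.85):
    """Add a wikidata attribute to each <ingredient> whose lemma matches closely enough."""
    root = ET.fromstring(xml_text)
    labels = list(candidates)
    for ingredient in root.iter("ingredient"):
        lemma = ingredient.findtext("lemma")
        if not lemma:
            continue
        # get_close_matches returns the best labels whose similarity ratio >= cutoff
        match = difflib.get_close_matches(normalise(lemma), labels, n=1, cutoff=cutoff)
        if match:
            ingredient.set("wikidata", candidates[match[0]])
    return root
```

Entries without a sufficiently close match are left untouched, so they could be reviewed by hand. What I don't know is how best to obtain and match the candidate labels, especially for the historical synonyms.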
Thanks for any ideas!