I have a list of plant names in various languages (botanical names, plus English, German and loads of historical variants of these two languages) in XML (transforming it into another serialization wouldn't be a problem, though). This is what it looks like now:

<ingredients>
  <ingredient>
    <lemma>Rosmarinus officinalis L.</lemma>
    <syn lang="deu">Rosmarin</syn>
    <syn lang="ang">boþen</syn>
    <syn lang="ang">feldmædere</syn>
    <syn lang="ang">lauendie</syn>
    <syn lang="eng">rosemary</syn>
    <syn lang="ang">sundēaw</syn>
  </ingredient>
  <ingredient>
    ...
</ingredients>

Now I was wondering whether there is a way to automatically assign Wikidata URIs and/or URIs from other controlled vocabularies to this list, with a manageable error rate for strings that don't match 100%. What approaches could be used to get:

 <ingredient wikidata="Q122679">
    <lemma>Rosmarinus officinalis L.</lemma>
    ...

Thanks for any ideas!

1 Answer

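Just wanted to say that OpenRefine does exactly what I needed! Its reconciliation feature matches a column of names against Wikidata and lets you review the uncertain matches before accepting them.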
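
For anyone who would rather script this than use the OpenRefine UI, here is a rough sketch of the same idea in Python. It is not what OpenRefine does internally. It assumes the public Wikidata `wbsearchentities` API endpoint, and it assumes that the first two words of each `<lemma>` are the binomial name, with anything after that (like "L.") being the author citation. The names `search_wikidata` and `annotate` and the User-Agent string are made up for this example.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_wikidata(name, language="en"):
    """Return the ID of the first wbsearchentities hit for `name`, or None."""
    params = urllib.parse.urlencode({
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": 1,
        "format": "json",
    })
    request = urllib.request.Request(
        f"{WIKIDATA_API}?{params}",
        # Hypothetical User-Agent; replace it with one identifying your script.
        headers={"User-Agent": "plant-list-annotator/0.1 (example)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        hits = json.load(response).get("search", [])
    return hits[0]["id"] if hits else None


def annotate(xml_text, lookup=search_wikidata):
    """Add a wikidata="Q..." attribute to every <ingredient> that `lookup` resolves."""
    root = ET.fromstring(xml_text)
    for ingredient in root.iter("ingredient"):
        lemma = ingredient.findtext("lemma", default="").strip()
        if not lemma:
            continue
        # Assumption: genus + species come first, the author citation follows.
        binomial = " ".join(lemma.split()[:2])
        qid = lookup(binomial)
        if qid:
            ingredient.set("wikidata", qid)
    return ET.tostring(root, encoding="unicode")
```

Calling `annotate(xml_text)` on a complete, well-formed version of the list returns the XML with the attribute added wherever a hit was found. Note that this blindly takes the first search hit and only looks at the lemma, so the results need manual review. Passing a different `lookup` function makes it possible to try the synonyms or another vocabulary instead.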