
Yet another English Wiktionary parsing question.

Overall, I am prepared to parse the wikitext format, so the standard API works for me.

The trouble is that I want to use the English Wiktionary API to obtain the declension tables, and in the wikitext those tables appear only as template codes. Sometimes the forms are in the output, but in most cases they are missing. E.g. a call for a Russian word like http://en.wiktionary.org/w/api.php?format=xml&action=query&titles=крот&rvprop=content&prop=revisions&redirects=1 yields:

====Declension====
{{ru-noun-table|b|a=an}}

How do I convert it into a full declension table?

I played with a bunch of parameters from here: https://www.mediawiki.org/wiki/API:Query, with no result.

One workaround I found is to use the new Wiktionary RESTful API, like this: https://en.wiktionary.org/api/rest_v1/page/html/крот (reference: https://en.wiktionary.org/api/rest_v1/#/). But it only returns HTML, which is more difficult to parse!

Is that the best that can be done?

Is there a special call to the declension tables perhaps? I mean, if it gets generated, there's got to be a way.

Vadim Berman

1 Answer


The table is generated by a Wiktionary module, namely Module:ru-noun, which is a Lua script. It works like a regular MediaWiki template call: the script receives the parameters (b, a=an) and has access to the page name (крот).
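To make the "template call with parameters" point concrete, here is a minimal sketch that splits a call such as the one in the question into its template name, positional parameters and named parameters. The function name parse_template_call is hypothetical, and the sketch assumes a flat call with no nested templates or wiki links inside the arguments. It only reads the call; it does not produce the declension forms, which still come from the Lua module.

```python
def parse_template_call(wikitext):
    """Split a flat template call into (name, positional, named).

    Assumption: no nested {{...}} or [[...|...]] inside the arguments.
    """
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call: %r" % wikitext)

    # Drop the surrounding braces and split on the parameter separator.
    parts = [p.strip() for p in text[2:-2].split("|")]
    name, positional, named = parts[0], [], {}

    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:
            # "a=an" style: named parameter
            named[key.strip()] = value.strip()
        else:
            # "b" style: positional parameter
            positional.append(part)
    return name, positional, named


name, positional, named = parse_template_call("{{ru-noun-table|b|a=an}}")
print(name, positional, named)
# ru-noun-table ['b'] {'a': 'an'}
```

For anything beyond flat calls, a dedicated wikitext parser is probably a better fit than string splitting.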

See "Wikinflection: Massive semi-supervised generation of multilingual inflectional corpus from Wiktionary" for the rationale behind this, then the resulting Dictionary builder project.

IRA1777
  • Thanks for the background, @ira1777! Is the module exposed via any of the APIs? – Vadim Berman Jul 20 '20 at 01:05
  • No. In your case, I think HTML parsing is the simplest solution. – IRA1777 Jul 20 '20 at 07:03
  • Is this generated data included in a data dump? For example, I'd like the masc / fem / sing / pl IPA transcriptions contained in the table on https://fr.wiktionary.org/wiki/contagieux. But these do not show up in the XML data dump, or even when you attempt to edit ( https://fr.wiktionary.org/w/index.php?title=contagieux&action=edit ). – zadrozny Mar 15 '21 at 16:15
  • No, it's not. It is purely on-the-fly data generation. – IRA1777 Mar 16 '21 at 15:37
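
For the HTML-parsing route suggested in the comments, a rough sketch using only Python's standard library could look like the following. The TableExtractor class and the extract_tables helper are hypothetical names, and SAMPLE_HTML is a hand-written, simplified stand-in rather than actual Wiktionary output; the real page markup is more involved (extra attributes, transliterations, possibly nested tables), so treat this as a starting point only.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by row and table.

    Assumption: tables are not nested; cells of a nested table would be
    appended to the most recently opened table.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables, each a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hand-written sample, NOT real API output.
SAMPLE_HTML = """
<table>
<tr><th></th><th>singular</th><th>plural</th></tr>
<tr><th>nominative</th><td><span lang="ru">крот</span></td><td><span lang="ru">кроты</span></td></tr>
<tr><th>genitive</th><td><span lang="ru">крота</span></td><td><span lang="ru">кротов</span></td></tr>
</table>
"""

for row in extract_tables(SAMPLE_HTML)[0]:
    print(row)
```

In practice the input would be the HTML fetched from the REST endpoint mentioned in the question, and the declension table would still have to be picked out from the other tables on the page, for example by checking its header row.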