
I'm trying to parse Wiktionary wikitext on the client side (with JavaScript). I found Wiky.js, but it has problems with some markup, such as {{}}, + etc. Do you know of any JavaScript library that could help with this? I found that the MediaWiki API can translate wikitext into HTML, but I already get my data from the API using the query action, so it would be wasteful to request the server twice. Is there some way to get HTML instead of wikitext using the query action? I also found the render action, but it sends me the entire page, not only the article.

//edit

Here is part of sample Wikitext:

=====Translations=====
{{trans-top|on fingers and toes}}
* [[Afrikaans]]: [[nael]]
* Albanian: [[thua]] {{f}}
* Arabic: {{Arab|[[ظفر]]}} (ẓufr)
* Armenian: {{t-|hy|եղունգ|tr=eġung}}
*: Old Armenian: {{tø|xcl|եղունգն|tr=ełungn|sc=Armn|xs=Old Armenian}}
* [[Azeri]]: {{t+|az|dırnaq|xs=Azeri}}
* Bosnian: {{t-|bs|nokat|m}}
* [[Breton]]: [[ivin]] {{m}}, ivinoù {{p}}
* [[Campidanese Sardinian]]: [[unga]] {{f}}
* [[Catalan]]: [[ungla]] {{f}}
* Chinese: {{zh-zh-p|指甲|zhǐjia}}
* Croatian: {{t+|hr|nokat|m|alt=nȍkat}}
* Czech: {{t+|cs|nehet|m}}
* Danish: {{t+|da|negl}}
* Dutch: {{t+|nl|nagel|m}}
* [[Erzya]]: [[кенже]] (kenzhe)
* Esperanto: {{t-|eo|ungo|xs=Esperanto}}
* Estonian: [[küüs]]
* Finnish: {{t+|fi|kynsi}}
* French: {{t+|fr|ongle|m}}
* [[Galician]]: [[unlla]] {{f}}, [[uña]] {{f}}
* Georgian: {{t-|ka|ფრჩხილი|tr=p'rč'xili|sc=Geor|xs=Georgian}}
* German: {{t+|de|Nagel|m}}
* Greek:
*: Anciemt: {{tø|grc|ὄνυξ|m|tr=onyx|xs=Ancient Greek}}
*: Modern: {{t+|el|νύχι|n|tr=nýchi}}
* [[Gujarati]]: [[નખ]] (nakh) {{m}}
* Hindi: {{t-|hi|नाख़ुन|m|tr=nāḵẖun|xs=Hindi}}

and Wiky.toHtml() output:

<h4>Translations</h4>
<p u"="" style="{trans-top</p></td>?(c_u) <li class=">Arabic: {{t-</p>
</li>
arصرعm?(c_u)
<li class="u">Bengali;"&gt;}, {{t-bspadavica?(c_u) </li>
<li class="u">Chinese: *: Mandarin: {{t</li>
cmn癲癇sc=Hani}}, {{tcmn癫痫tr=diānxiánsc=Hani}}, {{tcmn癲癇癥sc=Hani}}, {{tcmn癫痫症tr=diānxiánzhèng?(c_u)
<li class="u">Croatian: {{t-</li>
hrepilepsijafalt=epilèpsija}}, {{t-hrpadavicaf?(c_u)
<li class="u">Czech: {{t-</li>
csepilepsie?(c_u)
<li class="u">Estonian: {{t+</li>
etepilepsia}}, {{t+et?(c_u)
<li class="u">Finnish: {{t+</li>
fi?(c_u)
<li class="u">French: {{t+</li>
frépilepsie?(c_u)
<li class="u">German: {{t+</li>
deEpilepsief}}, {{t-deFallsucht?(c_u)
<li class="u">Greek: {{t+</li>
elεπιληψία?(c_u)
<li class="u">Hindi: {{t-</li>
hiअपस्मारtr=apasmārxs=Hindi}}, {{thiमिर्गीtr=mirgī?(c_u) 
Nemo
ciembor

1 Answer


Wikitext has very complicated edge cases, so you cannot expect a JavaScript library to parse it reliably (though it should be possible to do a much better job than Wiky does). The best option is to use action=render and then extract the relevant part from the response (I'm not sure what you mean by entire page vs. article).
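
A minimal sketch of the action=render request, in case it helps. The function names `buildRenderUrl` and `fetchRenderedHtml` are made up for illustration, and the Wiktionary `index.php` path in the usage note is an assumption. It also assumes `fetch` is available, and in a browser the request may be blocked by cross-origin restrictions, so treat this as a starting point rather than a tested solution.

```javascript
// Build an index.php URL that asks MediaWiki for the rendered article body.
// baseUrl is assumed to point at the wiki's index.php script.
function buildRenderUrl(baseUrl, title) {
  const url = new URL(baseUrl);
  url.searchParams.set('title', title);
  url.searchParams.set('action', 'render');
  return url.toString();
}

// Fetch the rendered HTML as a string.
// Assumes a global fetch; browsers may apply cross-origin restrictions.
async function fetchRenderedHtml(baseUrl, title) {
  const response = await fetch(buildRenderUrl(baseUrl, title));
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  return response.text();
}
```

For example, `buildRenderUrl('http://en.wiktionary.org/w/index.php', 'nail')` produces a URL ending in `?title=nail&action=render`, and `fetchRenderedHtml` returns a promise for the HTML, from which the relevant section can then be picked out.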

Tgr
  • I think the most problematic things are templates, which are different for every MediaWiki-based site. Render gives me the entire site, with menu, search form, etc. I have to extract some data from wikitext into an HTML list, and I think (at this moment) the easiest way is to use regular expressions. But anyway, thanks for the reply. – ciembor Aug 03 '11 at 23:20
  • @ciembor: Render should only include the content of the article, no search bar or menu (see e.g. [this](http://en.wikipedia.org/w/index.php?title=Elephant&action=render)). Maybe you are using a mistyped URL? – Tgr Aug 04 '11 at 21:09
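
The regular-expression route mentioned in the comments could look roughly like the sketch below. It only handles the `{{t+|…}}`, `{{t-|…}}`, `{{tø|…}}` and `{{t|…}}` templates seen in the sample wikitext, ignores every other template and nested markup, and uses hypothetical function names (`extractTranslations`, `escapeHtml`, `toHtmlList`), so it is an illustration of the idea and not a general wikitext parser.

```javascript
// Find translation templates such as {{t+|de|Nagel|m}} and return
// the language code and the word from each one.
function extractTranslations(wikitext) {
  // {{t, optional +, - or ø, then |lang|word, then any further parameters.
  const pattern = /\{\{t[+\-ø]?\|([^|{}]+)\|([^|{}]+)(?:\|[^{}]*)?\}\}/g;
  const results = [];
  for (const match of wikitext.matchAll(pattern)) {
    results.push({ lang: match[1], word: match[2] });
  }
  return results;
}

// Escape characters that are special in HTML text.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn the extracted items into a simple HTML list.
function toHtmlList(items) {
  const rows = items.map(
    (item) => '<li>' + escapeHtml(item.lang) + ': ' + escapeHtml(item.word) + '</li>'
  );
  return '<ul>' + rows.join('') + '</ul>';
}
```

Lines that use plain links (such as `[[nael]]`) or other templates (such as `{{zh-zh-p|…}}`) are skipped by this pattern and would need their own handling.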