Access wikicode from an XML file in C++

Asked Jan 01 '17 at 10:47

Active Jan 02 '17 at 22:07

Viewed 53 times

I have to read Wikipedia dumps and extract the headings, bold words, italic words, etc. The formatting is written in wikicode (wiki markup). I am using pugixml to parse the XML document, but I have no idea how to read the wiki markup inside it and extract the text. How can I do this?

edited Jan 02 '17 at 22:07

Christian Gollhardt


asked Jan 01 '17 at 10:47

Rmcf

Can you show some code of what you've tried that didn't work? – Paweł Łukasik Jan 01 '17 at 11:28
I don't have any code for this part yet. I want to do this with a regex but haven't been able to. Can you guide me a bit on this? – Rmcf Jan 02 '17 at 18:19
I have made the following regex: – Rmcf Jan 02 '17 at 18:47
\'.*([a-z]|[A-Z])+.*\' But it only returns the entire string, whereas I want to extract each word of the string. How can I do that? – Rmcf Jan 02 '17 at 18:47
You need to show a representative example of the text you want to split with the regex; otherwise it will be hard to help. – Paweł Łukasik Jan 02 '17 at 18:50
My text is the XML file of Simple Wikipedia. – Rmcf Jan 04 '17 at 18:24
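
A note on the regex in the comments: `.*` is greedy, so it runs from the first apostrophe to the last one on the line, which is why the whole string comes back. As a rough starting point only, here is a minimal sketch of the regex approach using std::regex. It assumes the wikitext of one page has already been read out of the dump's `<text>` element (for example with pugixml) into a std::string. The names WikiMarkup, extract_markup and split_words are made up for this sketch. It only handles `== heading ==`, `'''bold'''` and `''italic''` on a single line, and it ignores nesting, combined bold-italic (`'''''`), templates and links.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for what was found in one page's wikitext.
struct WikiMarkup {
    std::vector<std::string> headings;
    std::vector<std::string> bold;
    std::vector<std::string> italic;
};

// Append capture group 1 of every non-overlapping match of `re` in `text`.
static void collect(const std::string& text, const std::regex& re,
                    std::vector<std::string>& out) {
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        out.push_back((*it)[1].str());
    }
}

// Scan wikitext line by line (assumption: the markup of interest never spans lines).
WikiMarkup extract_markup(const std::string& wikitext) {
    // A heading is a whole line with the same run of 2 to 6 '=' on both sides.
    static const std::regex heading_re(R"((={2,6})\s*(.+?)\s*\1\s*)");
    // Lazy quantifiers stop at the nearest closing marker instead of the last one.
    static const std::regex bold_re(R"('''(.+?)''')");
    static const std::regex italic_re(R"(''(.+?)'')");

    WikiMarkup result;
    std::istringstream lines(wikitext);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch m;
        if (std::regex_match(line, m, heading_re)) {
            result.headings.push_back(m[2].str());
            continue;
        }
        collect(line, bold_re, result.bold);
        // Remove bold spans first so ''' is not misread as '' plus a stray quote.
        // Side effect: italics nested inside a bold span are not reported.
        const std::string without_bold = std::regex_replace(line, bold_re, "");
        collect(without_bold, italic_re, result.italic);
    }
    return result;
}

// Split an extracted span into whitespace-separated words.
std::vector<std::string> split_words(const std::string& span) {
    std::istringstream in(span);
    std::vector<std::string> words;
    for (std::string w; in >> w;) {
        words.push_back(w);
    }
    return words;
}
```

Calling extract_markup on each page's text and then split_words on each entry of bold or italic gives the individual words. Punctuation stays attached to the words, and real dump text will likely need a proper wikitext parser once templates and links are involved.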


0 Answers