How to parse Wikipedia talk page content by contributor?

Question

I want to parse a Wikipedia talk page (e.g., https://en.wikipedia.org/wiki/Talk:Elon_Musk) and loop through the text written by each contributor/editor, but I am not sure how to do it. So far, I have the following code:

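import pywikibot as pw

# Use the canonical title: "elon_musk" normalizes to "Elon musk", not "Elon Musk".
site = pw.Site('en', 'wikipedia')
page = pw.Page(site, 'Elon Musk')
talkpage = page.toggleTalkPage()   # the matching "Talk:" page
s = talkpage.text                  # raw wikitext of the whole talk page
cs = talkpage.contributors()       # usernames mapped to their edit counts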

It seems hard to parse the raw text (i.e., s) and find the comments made by each contributor. I cannot tell where one contributor's comment begins and ends, or which comments are replies to comments made by others. Is there a way to get the talk page as segments that I can loop through?

Many thanks for your help!

score 1 · Accepted Answer · answered Mar 09 '20 at 10:34


I don't know about pywikibot, but you can do this via the regular MediaWiki API. This request fetches up to 500 of the page's revisions, with the timestamp, user, edit summary, and IDs of each: https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Talk:Elon%20Musk&rvlimit=500&rvprop=timestamp|user|comment|ids
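
A minimal Python sketch of that request, using only the standard library. The function names (`revisions_url`, `group_by_user`, `fetch_revisions`) and the User-Agent string are made up for illustration, and `format=json` with `formatversion=2` are added here as an assumption so the response is easy to parse. It only covers the first batch of revisions; older ones would need the API's continuation mechanism.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

API = "https://en.wikipedia.org/w/api.php"


def revisions_url(title, limit=500):
    """Build the revisions query URL (same parameters as the link above)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": limit,
        "rvprop": "timestamp|user|comment|ids",
        "format": "json",        # assumption: JSON output
        "formatversion": 2,      # assumption: "pages" comes back as a list
    }
    return API + "?" + urllib.parse.urlencode(params)


def group_by_user(data):
    """Group the revisions of a formatversion=2 response by username."""
    grouped = defaultdict(list)
    for page in data.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            # "user" can be missing when the username has been hidden.
            grouped[rev.get("user", "<hidden>")].append(rev)
    return dict(grouped)


def fetch_revisions(title):
    """Perform the request (needs network access) and group the result."""
    request = urllib.request.Request(
        revisions_url(title),
        headers={"User-Agent": "talk-page-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        return group_by_user(json.load(response))
```

Calling `fetch_revisions("Talk:Elon Musk")` would then give a dictionary you can loop over by contributor, where each revision carries its `revid` and `parentid`.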

Then you can pass a pair of revision IDs to the compare action to get the change made in each edit, e.g. https://en.wikipedia.org/w/api.php?action=compare&fromrev=944235185&torev=944237256
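
A rough Python sketch of that second step. It assumes the diff comes back as an HTML table fragment in which added lines sit in `td` cells whose class includes `diff-addedline`, and that the fragment is stored under `body` (or `*` in the older response format). The names `compare_url`, `AddedLines`, `added_text`, and `fetch_added_text` are hypothetical helpers, not part of any library.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

API = "https://en.wikipedia.org/w/api.php"


def compare_url(fromrev, torev):
    """Build the compare URL for two revision IDs."""
    params = {"action": "compare", "fromrev": fromrev, "torev": torev,
              "format": "json", "formatversion": 2}  # assumption: JSON output
    return API + "?" + urllib.parse.urlencode(params)


class AddedLines(HTMLParser):
    """Collect the text of diff cells marked as added lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._in_added = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "diff-addedline" in classes:
            self._in_added = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_added:
            self._in_added = False
            text = "".join(self._buffer).strip()
            if text:
                self.lines.append(text)

    def handle_data(self, data):
        if self._in_added:
            self._buffer.append(data)


def added_text(diff_html):
    """Return the lines that the newer revision added, as plain text."""
    parser = AddedLines()
    parser.feed(diff_html)
    parser.close()
    return parser.lines


def fetch_added_text(fromrev, torev):
    """Perform the compare request (needs network access)."""
    request = urllib.request.Request(
        compare_url(fromrev, torev),
        headers={"User-Agent": "talk-page-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        compare = json.load(response).get("compare", {})
    return added_text(compare.get("body") or compare.get("*") or "")
```

Pairing each revision's `parentid` with its `revid` as `fromrev` and `torev` would give the text that one user added in that edit. This recovers what each contributor wrote, but not the reply structure of the thread.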

answered Mar 09 '20 at 10:34

smartse


Thank you, but I am not sure that going through talk page revisions is a good approach for getting each editor's comments and the related threaded conversations. – SanMelkote Mar 10 '20 at 09:57

Well, then it might help if you explained what you are actually looking for. – smartse Mar 10 '20 at 17:01
