I want to parse a Wikipedia talk page (e.g., https://en.wikipedia.org/wiki/Talk:Elon_Musk) and loop through the text written by each contributor/editor, but I am not sure how to do it. So far I have the following code:

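import pywikibot as pw

site = pw.Site('en', 'wikipedia')
page = pw.Page(site, 'Elon Musk')   # article title as it appears on Wikipedia
talkpage = page.toggleTalkPage()    # the matching page in the Talk: namespace
s = talkpage.text                   # raw wikitext of the whole talk page
cs = talkpage.contributors()        # usernames mapped to their edit counts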

The text (i.e., s) is hard to parse: I cannot tell where one contributor's comment begins and ends, or which comment is a reply to which. Is there a way to get the talk page as segments that I can loop through?

Many thanks for your help!

SanMelkote

1 Answer

I don't know about pywikibot, but you can do this via the regular MediaWiki API. This fetches the page's revisions (up to 500 per request) with the timestamp, user, edit summary and ids of each: https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Talk:Elon%20Musk&rvlimit=500&rvprop=timestamp|user|comment|ids

Then you can pass the revision ids to get the change in each edit: e.g. https://en.wikipedia.org/w/api.php?action=compare&fromrev=944235185&torev=944237256
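
The two requests above could be scripted roughly as follows. This is only a sketch under a few assumptions: the helper names (revisions_url, compare_url, extract_revisions, edit_pairs, fetch_json) are made up for illustration, format=json is added to the URLs so the replies can be parsed, each edit is compared against the parentid returned by rvprop=ids, and the User-Agent string is a placeholder you should replace with your own.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_url(title, limit=500):
    """Build the revisions query shown above, asking for a JSON reply."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": limit,
        "rvprop": "timestamp|user|comment|ids",
        "format": "json",  # assumption: not part of the original URL
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def compare_url(from_rev, to_rev):
    """Build the action=compare URL for two revision ids."""
    params = {"action": "compare", "fromrev": from_rev,
              "torev": to_rev, "format": "json"}
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_revisions(response):
    """Collect the revision dicts from a parsed query response."""
    revisions = []
    for page in response.get("query", {}).get("pages", {}).values():
        revisions.extend(page.get("revisions", []))
    return revisions


def edit_pairs(revisions):
    """Return (parent id, revision id) for every edit that has a parent."""
    return [(rev["parentid"], rev["revid"])
            for rev in revisions if rev.get("parentid")]


def fetch_json(url):
    """Fetch a URL and parse the JSON reply (needs network access)."""
    # Placeholder User-Agent; use one that identifies your own script.
    request = urllib.request.Request(
        url, headers={"User-Agent": "talk-page-example/0.1"})
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


if __name__ == "__main__":
    data = fetch_json(revisions_url("Talk:Elon Musk", limit=5))
    revisions = extract_revisions(data)
    for rev in revisions:
        print(rev["revid"], rev["timestamp"], rev.get("user"), rev.get("comment"))
    for parent, revid in edit_pairs(revisions):
        print(compare_url(parent, revid))
```

Note that this gives you one diff per edit, not one segment per comment: a single edit may add several comments or only fix a typo, so you would still have to interpret each diff yourself.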

smartse
  • Thank you. But I am not sure that going through talk page revisions is a good way to get each editor's comments and the related threaded conversations. – SanMelkote Mar 10 '20 at 09:57
  • Well, then it might help if you explained what you are actually looking for. – smartse Mar 10 '20 at 17:01