I want to create a graph database of actors and the movies in which they've acted. To get the list of actors and movies, I'm trying to use pywikibot, but so far I've only been able to get the full page text, when all I want is the filmography portion of the page. Is there a way to parse the page so that I obtain just the filmography? Here's what I've done so far:
import pywikibot as pw
site = pw.Site()
page = pw.Page(site, actor_name)  # will be put into a loop to get multiple actors
print(page.text)  # prints the full wikitext of the page, in the format below
print(list(page.linkedPages()))  # linkedPages() is a method; it yields the pages this page links to
One idea I had was to return all the linked pages associated with the actor, since most movies are linked. The format in which I get the text data is as follows:
{{Infobox person
| name =
| birth name =
}}
Summary
==Early life==
==Career==
==Filmography==
What can I do to only get the Filmography portion of the page?
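For reference, here is a rough sketch of the kind of extraction I have in mind. It works on the raw wikitext string only and assumes the section heading is literally `==Filmography==` (a level-2 heading) and that the section runs until the next level-2 heading. The helper names `extract_section` and `linked_titles` are hypothetical, not part of pywikibot, and the regular expressions are a simplification of real wikitext:

```python
import re


def extract_section(wikitext, title="Filmography"):
    """Return the body of the level-2 section `title`, or None if it is absent.

    Hypothetical helper: assumes headings look like "==Title==" on their own line.
    """
    # Match "==Title==" exactly at level 2, allowing spaces around the title.
    heading = re.compile(
        r"^==[ \t]*" + re.escape(title) + r"[ \t]*==[ \t]*$",
        re.MULTILINE | re.IGNORECASE,
    )
    match = heading.search(wikitext)
    if match is None:
        return None
    rest = wikitext[match.end():]
    # Stop at the next level-2 heading; "===Subsection===" lines stay in the body.
    next_heading = re.search(r"^==[^=].*?==[ \t]*$", rest, re.MULTILINE)
    body = rest[:next_heading.start()] if next_heading else rest
    return body.strip()


def linked_titles(section_text):
    """Return link targets from [[Target]], [[Target|label]] or [[Target#anchor]].

    Note: this also picks up non-film links (e.g. File: or Category: links).
    """
    return [t.strip() for t in re.findall(r"\[\[([^\]|#]+)", section_text)]


sample = (
    "Summary\n"
    "==Career==\n"
    "Acting.\n"
    "==Filmography==\n"
    "===Film===\n"
    "* ''[[Heat (1995 film)|Heat]]'' (1995)\n"
    "* ''[[Ronin (film)]]'' (1998)\n"
    "==References==\n"
    "refs\n"
)

section = extract_section(sample)
titles = linked_titles(section)
```

With pywikibot, the input would presumably be `page.text` instead of `sample`. This would not help for actors whose filmography lives on a separate page, or whose section uses a different heading.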