
I would like to get all 87 subcategories and all 200 pages listed in the "Pages in category "Masculine given names"" section on this site: https://en.wikipedia.org/wiki/Category:Masculine_given_names

I tried it with the following code:

import pywikibot
site = pywikibot.Site("en", "wikipedia")
page = pywikibot.Page(site, 'Category:Masculine_given_names')
print(list(page.categories()))

But with that I only get the categories listed at the very bottom of the page. How can I get the subcategories and (sub)pages on this site?

Rapid1898

1 Answer


How can I get the subcategories and (sub)pages of a given category?

First, use the Category class instead of the Page class. It is created in much the same way:

  >>> import pywikibot
  >>> site = pywikibot.Site("en", "wikipedia")
  >>> cat = pywikibot.Category(site, 'Masculine_given_names')

A Category object has additional methods; refer to the documentation for further information and the available parameters. The categoryinfo property, for example, gives a short overview of the category's content:

  >>> cat.categoryinfo
  {'size': 1425, 'pages': 1336, 'files': 0, 'subcats': 89}

In this case the category has 1425 entries in total: 1336 pages and 89 subcategories.

To get all subcategories, use the subcategories() method:

  >>> gen = cat.subcategories()

Note that this is a generator. As shown below, it yields all of the subcategories counted in categoryinfo above:

  >>> len(list(gen))
  89
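
Because it is a generator, it can be consumed only once. Here is a minimal sketch of that behaviour, using a plain Python generator as a stand-in for cat.subcategories(); the function name and the titles are made up for illustration and no request to Wikipedia is made:

```python
def fake_subcategories():
    """Hypothetical stand-in for cat.subcategories(): yields three titles."""
    yield from ["Category:A", "Category:B", "Category:C"]


gen = fake_subcategories()

# The first pass consumes the generator completely.
first_pass = list(gen)

# A second pass over the same generator yields nothing.
second_pass = list(gen)

print(len(first_pass), len(second_pass))
```

If you need the items more than once, keep the list (or call subcategories() again to get a fresh generator).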

To get all pages (articles), use the articles() method, e.g.

  >>> gen = cat.articles()

Guess how many entries the corresponding list will have.

Finally, the members() method returns all members of the category, including pages, files and subcategories:

  >>> gen = cat.members()
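
If you iterate over members(), you may want to sort the mixed result back into its three kinds. The following is only a sketch: split_members is a hypothetical helper, and it assumes that each member offers pywikibot's is_categorypage() and is_filepage() methods.

```python
def split_members(members):
    """Split an iterable of page-like objects into (subcats, files, pages).

    Hypothetical helper: it only relies on is_categorypage() and
    is_filepage() being available on every member.
    """
    subcats, files, pages = [], [], []
    for member in members:
        if member.is_categorypage():
            subcats.append(member)
        elif member.is_filepage():
            files.append(member)
        else:
            # Everything else is treated as an ordinary page.
            pages.append(member)
    return subcats, files, pages
```

With the cat object from above, this would be called as subcats, files, pages = split_members(cat.members()).
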
xqt
Can you get all nested sub-categories with this method? How do you deal with recursivity? – 8oris Jul 24 '23 at 09:26
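
On the recursion question: as far as I know, subcategories(), articles() and members() accept a recurse argument, so check the documentation for that first. If you prefer to control the traversal yourself, one possible approach is a depth-limited breadth-first walk that remembers which categories it has already seen, since category graphs can contain cycles. This is a sketch, not pywikibot's own implementation: walk_subcategories and max_depth are hypothetical names, and it only assumes that each category object has subcategories() and title().

```python
from collections import deque


def walk_subcategories(root, max_depth=2):
    """Collect nested subcategories of root, up to max_depth levels deep.

    Hypothetical helper: categories are identified by title(), and a
    'seen' set prevents endless loops when categories contain each other.
    """
    seen = {root.title()}
    queue = deque([(root, 0)])
    found = []
    while queue:
        cat, depth = queue.popleft()
        if depth >= max_depth:
            # Do not descend below the requested depth.
            continue
        for sub in cat.subcategories():
            key = sub.title()
            if key in seen:
                continue
            seen.add(key)
            found.append(sub)
            queue.append((sub, depth + 1))
    return found
```

With the cat object from the answer, walk_subcategories(cat, max_depth=2) would return the direct subcategories and their subcategories. Keep max_depth small, because every level triggers further API requests.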