
I am using Python and MySQL to query a MediaWiki database for the current status of articles (i.e. whether an article is FA, GA, GAN, etc.), but have been unable to do so.

I know the current status is stored in the old_text field of the text table. I was trying to do something like:

loc = "locate('currentstatus', old_text)"
query = "select substring(old_text, %s, 20) from wikidb where page_id = 1234" % loc

but unfortunately, loc gives the first occurrence of currentstatus rather than the last, which is not very 'current', since the newest status is at the bottom.

I am not sure how to fix it or if I am using the right approach.

wallyk
hopeful
  • What SQL API are you using? Where does the data come from? Even if you just give the format of the `currentstatus` field, I'm sure someone could help you. – Michael Mior Jul 06 '11 at 17:52
  • What is the format of the database field `old_text`? – Michael Mior Jul 06 '11 at 18:24
  • I used Special:Export to download the articles and then imported them into the MediaWiki database. I'm using MySQLdb (a Python module for MySQL) to query it. old_text is a blob, and the currentstatus format can be viewed at http://en.wikipedia.org/wiki/Template:ArticleHistory – hopeful Jul 08 '11 at 04:58

1 Answer


For Wikipedia, it would be more to the point to examine the categories the article is in. Or if processing raw wikitext, look for the corresponding template:

  • Featured articles (FA) are in [[category:Featured articles]] and use {{featured article}}, which references [[template:featured article]]
  • Good articles (GA) are in [[category:Good articles]] and use {{good article}}, which references [[template:good article]]
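
If you already have the article's raw wikitext in Python, the template check can be sketched roughly as below. The helper name status_from_wikitext is made up for illustration, and the patterns only cover the two template names listed above; redirects or alternative spellings of those templates are not handled.

```python
import re

# Hypothetical helper, not part of MediaWiki or MySQLdb.
# Assumption: the templates are spelled as in the list above; template
# redirects and other aliases are ignored.
_TEMPLATES = {
    "FA": re.compile(r"\{\{\s*featured[ _]article\s*(\||\}\})", re.IGNORECASE),
    "GA": re.compile(r"\{\{\s*good[ _]article\s*(\||\}\})", re.IGNORECASE),
}


def status_from_wikitext(wikitext):
    """Return 'FA', 'GA', or None depending on which template appears."""
    for status, pattern in _TEMPLATES.items():
        if pattern.search(wikitext):
            return status
    return None
```

Since old_text is a blob, the value may come back from the driver as bytes, in which case it needs decoding (probably as UTF-8) before being passed in.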

Both those categories are hidden, so you would have to enable the preference for displaying hidden categories, or traverse the category contents to see if the article is there.
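
As for checking category membership from the database itself, here is a rough sketch. It assumes the classic MediaWiki categorylinks table, where cl_from holds the page_id of the member page and cl_to holds the category title with underscores instead of spaces, and it assumes the link tables were populated after the import. An in-memory sqlite3 database stands in for MySQL so the example runs on its own; status_from_categories is a made-up name.

```python
import sqlite3

# Assumption: categorylinks(cl_from, cl_to) as in the classic MediaWiki schema.
QUERY = "SELECT cl_to FROM categorylinks WHERE cl_from = ? AND cl_to IN (?, ?)"

STATUS_BY_CATEGORY = {"Featured_articles": "FA", "Good_articles": "GA"}


def status_from_categories(conn, page_id):
    """Return 'FA', 'GA', or None based on category membership."""
    rows = conn.execute(QUERY, (page_id, *STATUS_BY_CATEGORY)).fetchall()
    for (category,) in rows:
        return STATUS_BY_CATEGORY[category]
    return None


# Stand-in database so the sketch runs without a MediaWiki install.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(1234, "Featured_articles"), (1234, "Living_people"), (5678, "Good_articles")],
)
```

With MySQLdb the placeholder is %s rather than ?, and the query would be run through a cursor instead of directly on the connection.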

Other article classes (A, B, C, FL, Start, Stub, List, undefined) are assessed on the corresponding talk page using one or more WikiProject templates. There is no standard.
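
If you do want to stay with the talk-page wikitext, it is probably easier to fetch old_text once and do the searching in Python than in SQL. The sketch below takes the last currentstatus value (the question says the newest one is at the bottom) and collects any class= parameters. It assumes parameters are written as "|name=value", as on Template:ArticleHistory; both function names are hypothetical.

```python
import re

# Assumption: parameters look like "|name=value", optionally with spaces.
_CURRENT_STATUS = re.compile(r"\|\s*currentstatus\s*=\s*([^\s|}]+)", re.IGNORECASE)
_CLASS = re.compile(r"\|\s*class\s*=\s*([^\s|}]+)", re.IGNORECASE)


def last_current_status(talk_wikitext):
    """Return the value of the last currentstatus parameter, or None."""
    matches = _CURRENT_STATUS.findall(talk_wikitext)
    return matches[-1] if matches else None


def project_classes(talk_wikitext):
    """Return every class= value found, in order of appearance."""
    return _CLASS.findall(talk_wikitext)
```

Because WikiProject templates differ, the class= values should be treated as hints rather than a single authoritative rating.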

wallyk