
I have a Python script that performs data analysis on a set of Excel files. Now we are trying to automate this step by periodically searching the MediaWiki server for recently uploaded Excel files and applying my script to them.

We are trying to find out whether there is any way to at least get the URLs of the Excel files.

Could anyone help us?



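To get up to 100 uploads to English Wikipedia since midnight UTC on April 4th, 2021, in XML format, send https://en.wikipedia.org/w/api.php?action=query&list=logevents&leaction=upload/upload&lestart=2021-04-04T00:00:00Z&ledir=newer&lelimit=100&format=xml. The ledir=newer parameter matters: by default the log is listed newest first, so lestart would act as an upper bound and return uploads made before that moment.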
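A minimal Python sketch of the same logevents request, using only the standard library. The function names are hypothetical, and the response is parsed with `xml.etree.ElementTree`. `ledir=newer` makes the listing run forward in time from `lestart`.

```python
import urllib.parse
import xml.etree.ElementTree as ET

API_URL = "https://en.wikipedia.org/w/api.php"


def upload_log_url(api_url: str, since: str, limit: int = 100) -> str:
    """Build a list=logevents query for new file uploads from `since` onward."""
    params = {
        "action": "query",
        "list": "logevents",
        "leaction": "upload/upload",  # new uploads only; overwrites are upload/overwrite
        "lestart": since,             # ISO 8601 UTC timestamp, e.g. 2021-04-04T00:00:00Z
        "ledir": "newer",             # enumerate forward in time starting at lestart
        "lelimit": limit,
        "format": "xml",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def uploaded_titles(xml_text) -> list[str]:
    """Return the title attribute of every <item> in a logevents XML response."""
    root = ET.fromstring(xml_text)
    return [item.get("title") for item in root.iter("item")]
```

Keeping the parsing separate from the HTTP call means it can be tried on a saved response before pointing it at a live wiki.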

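To extract the names of uploaded files ending in .xls or .xlsx, use the XPath 2.0 query (//item[ends-with(@title,'.xls')] | //item[ends-with(@title,'.xlsx')])/@title. Note that ends-with() does not exist in XPath 1.0; with an XPath 1.0 processor you would have to compare substring() and string-length() instead.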
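Python's `xml.etree.ElementTree` supports only a small XPath subset (no `ends-with()`), so this sketch applies the same filter with `str.endswith`. The function name is hypothetical, and unlike the XPath it ignores case.

```python
import xml.etree.ElementTree as ET

EXCEL_SUFFIXES = (".xls", ".xlsx")


def excel_titles(xml_text) -> list[str]:
    """Case-insensitive variant of
    (//item[ends-with(@title,'.xls')] | //item[ends-with(@title,'.xlsx')])/@title
    """
    root = ET.fromstring(xml_text)
    titles = (item.get("title", "") for item in root.iter("item"))
    # Lower-casing also catches names such as "Report.XLSX", which the XPath would miss
    return [t for t in titles if t.lower().endswith(EXCEL_SUFFIXES)]
```

Like the XPath union, the result keeps the titles in document order.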

To get the actual file URLs from the file pages' names, use https://en.wikipedia.org/w/api.php?action=query&titles=File:Limbo Royal Blood.jpg|File:Photo of Miriam Roth.jpg&prop=imageinfo&iilimit=100&iiprop=url&format=xml and apply the XPath //imageinfo/ii/@url. The format=xml parameter is needed for XPath to apply, because the API returns JSON by default.
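A sketch of the imageinfo step in Python. The helper names are hypothetical. The titles are joined with `|` as in the URL above. `iilimit` is left at its default of 1, so only the current version of each file is returned.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def imageinfo_url(api_url: str, titles: list[str]) -> str:
    """Build a prop=imageinfo query that returns the URL of each file's current version."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # normally at most 50 titles per request
        "prop": "imageinfo",
        "iiprop": "url",             # iilimit left at its default of 1 (latest version)
        "format": "xml",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def file_urls(xml_text) -> list[str]:
    """Python equivalent of the XPath //imageinfo/ii/@url."""
    root = ET.fromstring(xml_text)
    return [ii.get("url") for ii in root.iterfind(".//imageinfo/ii")]
```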

See https://www.mediawiki.org/wiki/API:Logevents.

Alexander Mashin
  • Thank you very much! Building on your response, I used the MediaWiki API and was able to filter for Excel files only with "https://******.org/api.php?action=query&list=allimages&aiprop=user|mime|timestamp|url&aisort=timestamp&aidir=older&ailimit=500&aimime=application/vnd.openxmlformats-officedocument.spreadsheetml.sheet&format=json". We plan to read the JSON response, load it into a pandas DataFrame, and use the URLs to download the Excel files and read them into another DataFrame. The only problem right now is that we get "readapidenied" when I try to read the JSON. Please help if you can! Thanks again! – The explorer Apr 07 '21 at 14:39
  • Switch Miser mode off (https://www.mediawiki.org/wiki/Manual:$wgMiserMode), since the aimime filter is disabled in miser mode. You may also need to send a cookie (https://www.mediawiki.org/wiki/API:Tokens_(action)#Important_Note) obtained after authentication (https://www.mediawiki.org/wiki/API:Tokens). Finally, make sure that the API module is not disabled for everyone or for your user group (https://www.mediawiki.org/wiki/API:Restricting_API_usage). – Alexander Mashin Apr 07 '21 at 15:26
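A stdlib-only sketch of the allimages query from the comment above, which follows API continuation and turns an API error such as "readapidenied" into an exception instead of silently returning an empty list. The function names are hypothetical. The `opener` argument can be any urllib opener, for example one that carries a logged-in session cookie.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def allimages_params(cont: dict | None = None) -> dict:
    """Parameters of the allimages query from the comment, plus any continuation values."""
    params = {
        "action": "query",
        "list": "allimages",
        "aiprop": "user|mime|timestamp|url",
        "aisort": "timestamp",
        "aidir": "older",
        "ailimit": 500,
        "aimime": XLSX_MIME,
        "format": "json",
    }
    params.update(cont or {})
    return params


def parse_allimages(payload: dict) -> tuple[list[dict], dict | None]:
    """Return (file records, continuation dict or None); raise if the API reported an error."""
    if "error" in payload:
        err = payload["error"]
        # "readapidenied" means the request was not made as a user allowed to read the wiki
        raise RuntimeError(f"{err.get('code')}: {err.get('info')}")
    return payload.get("query", {}).get("allimages", []), payload.get("continue")


def fetch_all(api_url: str, opener: urllib.request.OpenerDirector | None = None) -> list[dict]:
    """Follow API continuation until every matching file record has been collected."""
    opener = opener or urllib.request.build_opener()
    records: list[dict] = []
    cont = None
    while True:
        url = api_url + "?" + urllib.parse.urlencode(allimages_params(cont))
        with opener.open(url) as resp:
            batch, cont = parse_allimages(json.load(resp))
        records.extend(batch)
        if cont is None:
            return records
```

From there, `pandas.DataFrame(records)` would give the metadata table described in the comment, and each record's `url` could be passed to `pandas.read_excel`. On a wiki that requires login, it is probably safer to download the bytes through the same authenticated opener and pass `io.BytesIO(data)` to `read_excel`.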
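A sketch of the authentication step suggested above, using `urllib` with `http.cookiejar`. It assumes the wiki accepts `action=login` with a bot password created at Special:BotPasswords; main-account passwords through `action=login` are deprecated. All function names are hypothetical. The key point is that every later request must go through the same cookie-aware opener, so the session cookie is sent and the read permission applies.

```python
from __future__ import annotations

import http.cookiejar
import json
import urllib.parse
import urllib.request


def make_opener() -> tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """An opener that stores and resends session cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def call_api(opener, api_url: str, params: dict, post: bool = False) -> dict:
    """Send one API request (GET or form-encoded POST) and decode the JSON reply."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    if post:
        resp = opener.open(api_url, data=query.encode())
    else:
        resp = opener.open(api_url + "?" + query)
    with resp:
        return json.load(resp)


def login(opener, api_url: str, username: str, password: str) -> None:
    """Fetch a login token, then POST action=login; `username`/`password` from a bot password."""
    token = call_api(opener, api_url,
                     {"action": "query", "meta": "tokens", "type": "login"})["query"]["tokens"]["logintoken"]
    result = call_api(opener, api_url,
                      {"action": "login", "lgname": username,
                       "lgpassword": password, "lgtoken": token}, post=True)
    if result.get("login", {}).get("result") != "Success":
        raise RuntimeError(f"login failed: {result}")
```

After `login(...)` succeeds, the cookie jar holds the session, and reusing `opener` for the allimages and file downloads should get past "readapidenied", provided the user group is allowed to read.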