
I'm grabbing some image metadata from the Wikipedia API, but I noticed that the text is truncated.

On this page:
http://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=comment&format=xml&titles=File:BrolinFoxFassbenderJonahHexJuly09.jpg

I only see:

{{OTRS pending|year=2009|month=August|day=16}} {{Information
|Description={{en|Josh Brolin, Megan Fox, and Michael Fassbender promoting
the 2010 film ''Jonah Hex'' at San Diego Comic-Con.}} |Source=
http://www.flickr.com/photos/sdnatasha/3767292285/ |Date=

If I look at the real data for the file:
http://commons.wikimedia.org/wiki/Special:Export/File:BrolinFoxFassbenderJonahHexJuly09.jpg

I see the full information:

== {{int:filedesc}} =={{Information|Description={{en|Josh Brolin, Megan Fox,
and Michael Fassbender promoting the 2010 film ''Jonah Hex'' at San Diego
Comic-Con.}}|Source=
http://www.flickr.com/photos/sdnatasha/3767292285/|Date=2009-07-28|Author=NatashaBaucas
at
http://www.flickr.com/photos/sdnatasha/|Permission=Creative Commons
Attribution|other_versions=}}{{Location dec|32.705573|-117.160391|}}==
{{int:license}} =={{self|cc-by-2.0|author=Natasha
Baucas}}{{PermissionOTRS|ticket=
https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom&TicketID=3519937}}[[Category:MeganFox
in 2009]][[Category:Josh Brolin]][[Category:Michael
Fassbender]][[Category:2009 Comic-Con International]][[Category:Images
uploaded by User:Nehrams2020]

Can I use the Wikipedia API to get the non-truncated comments?

logi-kal
tommy chheng

1 Answer


The comments you get with iiprop=comment are the short bits of text shown in the "File history" table. They're truncated to 255 bytes because that's how they're stored in the database to begin with.

What you want, instead, is the content of the file description page, which you get the same way as you'd get any page content: prop=revisions with rvprop=content.
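As a rough sketch of what that query could look like from Python: the helper names `build_content_url` and `extract_content` are made up for illustration, the Commons host and `https` scheme are assumptions, and the sample XML is hand-written to match the shape I'd expect from `format=xml` (page text inside a `<rev>` element), not copied from a real response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_content_url(title, host="commons.wikimedia.org"):
    """Build an API URL asking for the latest revision text of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": "xml",
        "titles": title,
    }
    return "https://" + host + "/w/api.php?" + urlencode(params)


def extract_content(xml_text):
    """Return the text of the first <rev> element, or None if there is none."""
    rev = ET.fromstring(xml_text).find(".//rev")
    return rev.text if rev is not None else None


# Hand-written sample in the assumed response shape, not real API output.
SAMPLE = (
    '<api><query><pages><page ns="6" title="File:Example.jpg">'
    '<revisions><rev xml:space="preserve">{{Information|Description=sample}}</rev></revisions>'
    "</page></pages></query></api>"
)

print(build_content_url("File:BrolinFoxFassbenderJonahHexJuly09.jpg"))
print(extract_content(SAMPLE))
```

Fetching the URL is left out on purpose; any HTTP client will do, and the returned body can be passed straight to `extract_content`.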

(It's confusing because the default MediaWiki upload interface is kind of weird. The image comments are really meant to be short notes similar to edit summaries, but when you first upload a new image, the same text you enter is used both for the page content and for the image comment. If it's too long to fit in a comment, as is common on Wikipedia these days, the comment is silently truncated, but the full text still goes into the page content. I guess someone thought that made sense back when that interface was first written; image descriptions tended to be much shorter back then.)

Ilmari Karonen
  • It looks like I have to use the Commons API. So en.wikipedia.org has direct DB access to the Commons database? And do I need to know ahead of time where the image comes from (Commons or en)? http://commons.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&titles=File:BrolinFoxFassbenderJonahHexJuly09.jpg – tommy chheng Oct 30 '11 at 17:28
  • Yeah. If you're [also doing an `imageinfo` query](http://en.wikipedia.org/w/api.php?action=query&prop=revisions|imageinfo&rvprop=content&format=xml&titles=File:BrolinFoxFassbenderJonahHexJuly09.jpg), you'll see `imagerepository="shared"` for Commons images. Or you can just assume that any missing images are probably at Commons and check there. – Ilmari Karonen Oct 30 '11 at 20:39
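The `imagerepository` check from the last comment could be sketched like this in Python. The function names `image_repository` and `description_host` are hypothetical, the sample XML is hand-written in the shape I'd assume for a `format=xml` response (the attribute sitting on the `<page>` element), and treating any value other than `shared` or `local` as "unknown" is my own simplification.

```python
import xml.etree.ElementTree as ET


def image_repository(xml_text):
    """Return the imagerepository attribute of the first <page>, or None."""
    page = ET.fromstring(xml_text).find(".//page")
    return page.get("imagerepository") if page is not None else None


def description_host(repository, local_host="en.wikipedia.org"):
    """Pick the wiki whose API should be asked for the description page."""
    if repository == "shared":
        return "commons.wikimedia.org"  # file is hosted on Commons
    if repository == "local":
        return local_host  # file was uploaded to the local wiki
    return None  # missing or unrecognised value: caller decides what to do


# Hand-written sample in the assumed response shape, not real API output.
SAMPLE = (
    '<api><query><pages>'
    '<page ns="6" title="File:Example.jpg" missing="" imagerepository="shared" />'
    "</pages></query></api>"
)

print(description_host(image_repository(SAMPLE)))
```

This way one request to en.wikipedia.org is enough to decide whether the follow-up `rvprop=content` query should go to Commons instead.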