I would like to get a structured version of a Wikiquote page as JSON (basically, I need all the quotes).

Example: http://en.wikiquote.org/wiki/Fight_Club_(film)

I tried with: http://en.wikiquote.org/w/api.php?format=xml&action=parse&page=Fight_Club_(film)&prop=text

but I get the entire HTML source. I need each phrase as an element of an array.

How could I achieve that with DBpedia?

http://f.cl.ly/items/2v3w1U2c0J0z1M0V0k0b/Schermata%2012-2456269%20alle%2013.06.24.png

sparkle

2 Answers


For one thing, I am not sure whether you can query Wikiquote using DBpedia. Secondly, DBpedia only gives you infobox data in a structured way; it does not provide the article content in a structured form. Instead, with a little bit of trouble, you can use the MediaWiki API to get the data.


EDIT

The URI you are trying returns the rendered text of the page, which makes things easier, but it does not solve the problem completely.

Try this piece of code in your console:

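require 'json'
require 'open-uri'
require 'nokogiri'

# Fetch the parsed page as JSON; the rendered HTML sits under parse -> text -> "*"
url = "http://en.wikiquote.org/w/api.php?format=json&action=parse&page=Fight_Club_%28film%29&prop=text"
content = JSON.parse(URI.open(url).read)

data = content['parse']['text']['*']

# Parse the HTML and collect the text of every list item
xpath_data = Nokogiri::HTML(data)

xpath_data.xpath("//ul/li").map { |data_node| data_node.text }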

This is the closest I have come to an answer. Of course, it is not completely right, because you will get a lot of unnecessary data. But if you dig into Nokogiri and XPath and find out how to pinpoint the nodes you need, you can get a solution that gives you the correct quotes at least 90% of the time.

Rolv Apneseth
djd

Just change the format to JSON. Look up the MediaWiki API documentation for more details. http://en.wikiquote.org/w/api.php?format=json&action=parse&page=Fight_Club_(film)&prop=text
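To show roughly what comes back with format=json, here is a small sketch that pulls the HTML out of the response. The sample body is hypothetical and heavily shortened, and html_from_parse_response is an illustrative helper name; only the parse -> text -> "*" nesting is assumed to match the real response.

```ruby
require 'json'

# Hypothetical, shortened response body; the real one carries the
# full page HTML in the "*" field.
SAMPLE_BODY = '{"parse":{"text":{"*":"<ul><li>Sample quote</li></ul>"}}}'

# Returns the HTML string stored under parse -> text -> "*",
# or nil if the response does not have that shape.
def html_from_parse_response(body)
  JSON.parse(body).dig('parse', 'text', '*')
end

html = html_from_parse_response(SAMPLE_BODY)
puts html
```

Note that the value is still one HTML string, so getting individual quotes requires parsing that HTML separately.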

R891
    Although the returned response is structured as a JSON object, the interesting data remains unstructured in a single field that contains a huge HTML string. – Simon Steinberger Apr 08 '16 at 10:52