
I am making this request:

http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=self-administration&prop=revisions&rvprop=content&rvparse=&rvsection=0

My goal is to get the plain text of an article's intro.

It gives me back some HTML inside an XML file. After running strip_tags and a preg_replace to remove the references, I get this:

Self-administration is, in its medical sense, the process of a subject administering a pharmacological substance to him-, her-, or itself. [...] Cite error: There are tags on this page, but the references will not show without a {{Reflist}} template or a tag; see the help page.

I want to remove

Cite error: There are tags on this page, but the references will not show without a {{Reflist}} template or a tag; see the help page.

How can I get rid of that, either with PHP (preg_replace?) or in my initial query (by ignoring errors)?

Wistar

1 Answer

$bad = ' <br /><strong class="error">Cite error: There are <code>&lt;ref&gt;</code> tags on this page, but the references will not show without a <code>&#123;&#123;Reflist&#125;&#125;</code> template or a <code>&lt;references /&gt;</code> tag; see the <a href="/wiki/Help:Cite_errors/Cite_error_refs_without_references" title="Help:Cite errors/Cite error refs without references">help page</a>.</strong> ';

$good = str_replace($bad, '', $intro);
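
The exact-match replacement above stops working if the wording or the help link in the error changes. As a rough sketch only, a pattern-based variant could look like the following. The function names are hypothetical, and the patterns assume the error is always wrapped in <strong class="error"> as in the markup quoted above, or always starts with "Cite error:" and ends with "see the help page." once the tags are stripped.

```php
<?php
// Hypothetical helper: remove the error element from the parsed HTML
// BEFORE calling strip_tags. Assumes the error is wrapped in
// <strong class="error">...</strong>, as in the markup quoted above.
function remove_cite_errors(string $html): string
{
    // Optional leading <br />, then the whole error element (lazy match).
    $pattern = '~(?:\s*<br\s*/?>)?\s*<strong class="error">.*?</strong>~s';
    // preg_replace returns null on failure; fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

// Hypothetical fallback for text that has already been through strip_tags.
// Assumes the message starts with "Cite error:" and ends with
// "see the help page." as in the output shown in the question.
function remove_cite_errors_from_text(string $text): string
{
    $pattern = '~\s*Cite error:.*?see the help page\.~s';
    return preg_replace($pattern, '', $text) ?? $text;
}

// Made-up sample input, shortened from the real API response.
$intro = '<p>Self-administration is a process.</p> <br /><strong class="error">'
       . 'Cite error: There are <code>&lt;ref&gt;</code> tags on this page; see the '
       . '<a href="/wiki/Help:Cite_errors">help page</a>.</strong> ';

$plain = trim(strip_tags(remove_cite_errors($intro)));
echo $plain, "\n";
```

Removing the element while the HTML is still intact is the safer of the two, because the class attribute is a more stable marker than the English wording of the message.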
Alix Axel