
I am scraping this page and am working with the following HTML to fetch the section details:

<h2>
    <span class="mw-headline" id="Volume_one:_Quicksilver_.282003.29">Volume one:
        <i>
            <a href="https://en.wikipedia.org/wiki/Quicksilver_(novel)"
                class="extiw"
                title="w:Quicksilver (novel)">Quicksilver</a>
        </i> (2003)
    </span>
    <span class="mw-editsection">
        <span class="mw-editsection-bracket">[</span>
        <a href="/w/index.php?title=The_Baroque_Cycle&amp;action=edit&amp;section=1"
            title="Edit section: Volume one: Quicksilver (2003)">edit</a>
        <span class="mw-editsection-bracket">]</span>
    </span>
</h2>

I want to grab the element whose id is Volume_one:_Quicksilver_.282003.29. For that I wrote the following code:

$sectionid = '#Volume_one:_Quicksilver_.282003.29';
print($crawler->filter( $sectionid ));

But it does not return the element even though it is there. What am I doing wrong? The same approach fetches the #Epilogs section fine.

Please help.

Alvin Bunk
Volatil3

1 Answer

Have you tried:

print( $crawler->filterXPath('//*[@id="Volume_one:_Quicksilver_.282003.29"]') );

I used "Inspect in FirePath" in a Firefox browser (with Firebug installed) to get the XPath from that page.
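A likely reason the CSS selector fails: in CSS selector syntax a `:` starts a pseudo-class and a `.` starts a class selector, so `#Volume_one:_Quicksilver_.282003.29` is not read as one id. `#Epilogs` contains neither character, which would explain why it works. An XPath attribute test compares the id as a plain string and avoids the problem. Here is a minimal sketch using only PHP's built-in DOM extension (not the crawler itself); the `$html` fragment is a shortened stand-in for the real page, and the same expression should be usable with `filterXPath`:

```php
<?php
// Shortened stand-in for the fetched page (assumption: the real HTML has more markup).
$html = <<<'HTML'
<h2>
  <span class="mw-headline" id="Volume_one:_Quicksilver_.282003.29">Volume one: <i>Quicksilver</i> (2003)</span>
</h2>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// The id is compared as an ordinary string, so ':' and '.' need no escaping.
$id = 'Volume_one:_Quicksilver_.282003.29';
$nodes = $xpath->query('//*[@id="' . $id . '"]');

foreach ($nodes as $node) {
    // Print the text of each matching element.
    echo trim($node->textContent), PHP_EOL;
}
```

If the ids come from user input or may contain double quotes, building the expression by string concatenation like this is not safe; for the ids on that page it should be fine.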

Alvin Bunk
  • Can you please edit your post and show the code where you create the client, do the GET request, and create the crawler? You may have missed something... – Alvin Bunk Apr 24 '17 at 16:18