I have a document written by a naughty web developer, which looks something like:

<div id="details">
    Here is some text without a p tag. Oh, let's write some more.
    <br>
    <br>
    And some more.
    <table id="non-unique">
        ...
    </table>
    Replaces the following numbers:
    <table id="non-unique">
        ... good stuff in here
    </table>
</div>

So, it's not well marked up. I need to get hold of the table with the good stuff in it; however, it doesn't have a unique id value, and it is not always in the same position or last in the div, etc.

The only consistent pattern is that it always follows the text "Replaces the following numbers:", though this text may appear bare, as in the example above, or sometimes inside an h4 element!

Is it possible to use an XPath expression to wrangle this table out by searching for the "Replaces" string and then asking for the next table element?

Thanks!

Edwardr

3 Answers

That looks valid to me:

//text()[contains(.,"Replaces the following numbers")]/following-sibling::table[1]

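Nothing in XPath requires id values to be unique, so the duplicate ids do not get in the way (HTML itself does require unique ids, but browsers tolerate duplicates). This covers the plain-text case; when the label is inside an h4, the matching text node is a child of the h4 and has no table siblings, so start from the h4 instead, e.g. //h4[contains(.,"Replaces the following numbers")]/following-sibling::table[1].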
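To see what this expression does outside a browser: Python's stdlib xml.etree.ElementTree has only partial XPath support (no text() steps and no following-sibling axis) and no separate text nodes, so the sketch below emulates the two steps by hand. Text that follows an element is stored as that element's tail, so "the matching text node, then the first following table sibling" becomes "the child whose tail contains the marker, then the first later child tagged table". It assumes the markup has been made well-formed (<br /> instead of <br>); the helper name first_table_after_text and the table cells are hypothetical placeholders for the question's "...":

```python
import xml.etree.ElementTree as ET

# Well-formed version of the question's markup (<br> closed as <br />).
# The table cells are hypothetical placeholders for the question's "...".
DOC = """<div id="details">
    Here is some text without a p tag. Oh, let's write some more.
    <br />
    <br />
    And some more.
    <table id="non-unique"><tr><td>other</td></tr></table>
    Replaces the following numbers:
    <table id="non-unique"><tr><td>good</td></tr></table>
</div>"""


def first_table_after_text(parent, marker):
    """Hypothetical helper emulating, for the direct children of `parent`:
    text()[contains(., marker)]/following-sibling::table[1]

    ElementTree has no text nodes: text before the first child is
    parent.text, and text after a child is stored as that child's .tail.
    """
    children = list(parent)
    start = None  # index of the first sibling after the matching text
    if parent.text and marker in parent.text:
        start = 0
    else:
        for i, child in enumerate(children):
            if child.tail and marker in child.tail:
                start = i + 1
                break
    if start is None:
        return None
    # following-sibling::table[1]: the first table among the later
    # siblings, even if other elements come in between.
    for sibling in children[start:]:
        if sibling.tag == "table":
            return sibling
    return None


root = ET.fromstring(DOC)
table = first_table_after_text(root, "Replaces the following numbers")
print(table.find("tr/td").text)  # good
```

Note that table[1] here means "the first table among the following siblings", so an element sitting between the text and the table would simply be skipped.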

pguardiario

Use:

//node()[self::h4 or self::text()]
         [normalize-space() = 'Replaces the following numbers:']
           /following-sibling::*[1][self::table]

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
   "//node()[self::h4 or self::text()]
             [normalize-space() = 'Replaces the following numbers:']
               /following-sibling::*[1][self::table]
   "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided document (corrected to be a well-formed XML document):

<div id="details">
 Here is some text without a p tag. Oh, let's write some more.
    <br />
    <br />
    And some more.     
    <table id="non-unique">
     ...
  </table>
  Replaces the following numbers:
    <table id="non-unique">
    ... good stuff in here
    </table>
</div>

the XPath expression is evaluated and the selected node(s) are copied to the output:

<table id="non-unique">
    ... good stuff in here
    </table>

When the same transformation (XPath expression) is applied to this XML document:

<div id="details">
 Here is some text without a p tag. Oh, let's write some more.
    <br />
    <br />
    And some more.     
    <table id="non-unique">
     ...
  </table>
  <h4>Replaces the following numbers:</h4>
    <table id="non-unique">
    ... good stuff in here
    </table>
</div>

again the wanted element is selected and output:

<table id="non-unique">
    ... good stuff in here
    </table>
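The expression in this answer handles both layouts because of two details: normalize-space() ignores the surrounding whitespace and indentation, and following-sibling::*[1][self::table] takes the very next element sibling and keeps it only if it is a table (unlike following-sibling::table[1], which would skip over other elements). As a rough cross-check, the sketch below mimics those semantics with Python's stdlib ElementTree. The helper names are hypothetical, the documents are trimmed versions of the two test documents above with placeholder cells in place of "...", and Python's str.split() treats a slightly wider set of characters as whitespace than XPath does:

```python
import xml.etree.ElementTree as ET

MARKER = "Replaces the following numbers:"

# Trimmed versions of the two test documents, with hypothetical
# placeholder cells instead of "...".
DOC_TEXT = """<div id="details">
    And some more.
    <table id="non-unique"><tr><td>other</td></tr></table>
    Replaces the following numbers:
    <table id="non-unique"><tr><td>good stuff</td></tr></table>
</div>"""
DOC_H4 = DOC_TEXT.replace(MARKER, "<h4>" + MARKER + "</h4>")


def normalize_space(s):
    # Like XPath normalize-space(): trim and collapse whitespace runs.
    # (str.split() also splits on non-ASCII whitespace, which XPath does not.)
    return " ".join(s.split())


def tables_after_marker(root, marker=MARKER):
    """Hypothetical helper emulating
    //node()[self::h4 or self::text()][normalize-space() = marker]
        /following-sibling::*[1][self::table]
    """
    found = []
    for parent in root.iter():
        children = list(parent)
        # Index of the element sibling right after each matching node.
        next_positions = []
        # Text node before the first child element.
        if parent.text and normalize_space(parent.text) == marker:
            next_positions.append(0)
        for i, child in enumerate(children):
            # An h4 element whose string value matches.
            if child.tag == "h4" and normalize_space("".join(child.itertext())) == marker:
                next_positions.append(i + 1)
            # The text node that follows this child (its tail).
            if child.tail and normalize_space(child.tail) == marker:
                next_positions.append(i + 1)
        # following-sibling::*[1][self::table]: keep the very next element
        # sibling only if it is a table (no skipping over other elements).
        for pos in next_positions:
            if pos < len(children) and children[pos].tag == "table":
                if children[pos] not in found:  # node-sets have no duplicates
                    found.append(children[pos])
    return found


for doc in (DOC_TEXT, DOC_H4):
    [table] = tables_after_marker(ET.fromstring(doc))
    print(table.find("tr/td").text)  # good stuff (printed for both documents)
```

The stricter [1][self::table] step is the design trade-off: it will not accidentally grab a table further down if something else sits between the label and the table, but it also returns nothing in that case.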
Dimitre Novatchev
  • Hi, thanks for this detailed response. The expression works well for the case where the "Replaces" text is inside an H4 tag, but not when it is just text (not surrounded by an H4 tag). I'm testing using Firebug in Firefox. – Edwardr Jun 26 '12 at 09:36
  • @Edwardr: I always test my answers by running the transformation -- with about ten different XSLT processors -- and this is also the current case. You have some problem running the transformation, or setting the source XML document, or both. Maybe you omitted the ':' or separated it with a space. – Dimitre Novatchev Jun 26 '12 at 11:57

No, because XPath requires well-formed XML to run on.

cf. this answer, which provides some additional info.

O. R. Mapper
  • Hi, you added the edit after the downvote; before that, your answer (which could have merely been a comment on my question) was rather terse and not particularly helpful. It's more helpful now that you have added some context via the link. Secondly, there are two working XPath-based solutions on this page, so I don't think it's reasonable for you to claim that XPath can't work on this HTML. – Edwardr Jun 28 '12 at 09:02
  • @Edwardr: Ah, well, the question was "Is it possible to use an XPath expression to wrangle this table out by searching for the replaces string and then asking for the next table element??", so my answer was "No, it's not possible." Indeed, I added the link quite a while later, unrelated to the -1, when I stumbled over the other answer and remembered this question :-) – O. R. Mapper Jun 28 '12 at 09:40
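Well-formedness matters for XML tools, but HTML parsers are built to accept markup like the question's unclosed <br> tags. Browsers build their DOM this way and evaluate XPath against that tree, which is why the Firebug tests mentioned above could run at all. As a minimal stdlib-only sketch of the same lookup with no XML step, Python's html.parser can scan the raw HTML for the marker text (bare or inside an h4) and collect the text of the next table. Unlike the XPath expressions, it simply takes the next table in document order rather than a sibling. The class and function names and the placeholder cells are hypothetical:

```python
from html.parser import HTMLParser

MARKER = "Replaces the following numbers:"

# The question's original, non-well-formed markup (bare <br> tags), with
# hypothetical placeholder cells instead of "...".
HTML = """<div id="details">
    Here is some text without a p tag. Oh, let's write some more.
    <br>
    <br>
    And some more.
    <table id="non-unique"><tr><td>other</td></tr></table>
    Replaces the following numbers:
    <table id="non-unique"><tr><td>good stuff</td></tr></table>
</div>"""


class TableAfterMarker(HTMLParser):
    """Hypothetical parser: collects the text of the first <table> that
    starts after `marker` appears in the document's text (bare or in an h4).

    html.parser tokenizes rather than validates, so unclosed void tags like
    <br> are reported as ordinary start tags and cause no error.
    """

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.seen_marker = False
        self.depth = 0      # <table> nesting depth inside the captured table
        self.done = False
        self.chunks = []    # text fragments found inside that table

    def handle_starttag(self, tag, attrs):
        if not self.done and tag == "table" and (self.seen_marker or self.depth):
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth and tag == "table":
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.done:
            return
        if self.depth:
            self.chunks.append(data)
        elif self.marker in data:
            self.seen_marker = True


def table_text_after(html, marker=MARKER):
    parser = TableAfterMarker(marker)
    parser.feed(html)
    parser.close()
    # Collapse whitespace, roughly like XPath normalize-space().
    return " ".join("".join(parser.chunks).split())


print(table_text_after(HTML))  # good stuff
```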