I have some HTML such as the following:

<html>
<body> 
... other html stuff ...
<form method="post" action="goSomewhere">
    <input type="hidden" value="something">
    <input type="hidden" value="something2">
<table>
    <tr><td><input type="checkbox" name="123">Stuff 1</td></tr>
    <tr><td><input type="checkbox" checked name="456">Stuff 2</td></tr>
    <tr><td><input type="checkbox" name="789">Stuff 3</td></tr>
</table>
</form>
</body>
</html>

I'm trying to select everything in the <form> except for the element with a particular label (its inner text, that is). Here's the query I'm using:

$query = "//form//td[not(normalize-space() = 'Stuff 2')]"; 

This successfully filters out that particular <td>, but the problem is that it then returns only <td> content. As you can see, there are other <input> elements that are not in the <table>, and I need those too.

Can anyone help with this query please? Thanks!

Kenny

1 Answer

You are looking for //form//td[not(normalize-space() = 'Stuff 2')]/input|//input[not(ancestor::table)].
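
To see what that expression selects, here is a minimal sketch that runs it through PHP's built-in `DOMDocument` and `DOMXPath` rather than the Symfony `Crawler` mentioned in the comments. The markup is the sample from the question, with the `<table>` and `<form>` closed as an assumption; the `$found` array and the fallback from `name` to `value` are only there for illustration. Note that the second branch, `//input[not(ancestor::table)]`, is not limited to the form, so on a larger page it would also match inputs elsewhere.

```php
<?php
// Sample markup from the question (closing </table> and </form> are assumed).
$html = <<<'HTML'
<html>
<body>
<form method="post" action="goSomewhere">
    <input type="hidden" value="something">
    <input type="hidden" value="something2">
<table>
    <tr><td><input type="checkbox" name="123">Stuff 1</td></tr>
    <tr><td><input type="checkbox" checked name="456">Stuff 2</td></tr>
    <tr><td><input type="checkbox" name="789">Stuff 3</td></tr>
</table>
</form>
</body>
</html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Union of: inputs inside every <td> except "Stuff 2",
// and inputs that have no <table> ancestor (the hidden ones).
$query = "//form//td[not(normalize-space() = 'Stuff 2')]/input"
       . " | //input[not(ancestor::table)]";

// Identify each match by its name, or by its value when it has no name.
$found = [];
foreach ($xpath->query($query) as $input) {
    $found[] = $input->getAttribute('name') ?: $input->getAttribute('value');
}

print_r($found);
```

With this sample, the result should contain the two hidden inputs and the checkboxes named 123 and 789, while the checkbox named 456 is left out.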

xabbuh
  • Thank you! That seemed to work! Do you have an idea how I can just get all the items in the `form` tag? For some reason the returned `input` nodes seem to be empty. I actually still need the `table/td` content as well. I tried changing it to `../form|//form[not(ancestor::form)]` but it doesn't seem to do anything. – Kenny Dec 30 '15 at 17:52
  • Not sure I understand. Can you explain which nodes you miss? – xabbuh Dec 30 '15 at 18:00
  • If I do a `print_r` of the crawler object (containing all the inputs), it prints empty objects. So basically I'm altering my original request: instead of getting all the inputs (minus the one labelled "Stuff 2"), I'd like to get all the content between `<form>` and `</form>`. – Kenny Dec 30 '15 at 18:03
  • You could use the `reduce()` method of the `Crawler` class which takes a callback that is used to filter nodes. – xabbuh Dec 30 '15 at 18:08