
I have been trying to scrape a table from the FAA's website (https://www.faa.gov/uas/legislative_programs/section_333/333_authorizations/) using the Dataminer and Scraper Chrome extensions. The source code of the table looks something like this:

<table id="auth_granted" class="striped">
 <caption class="visuallyHidden">Authorizations Granted Via Section 333 Exemptions</caption>
 <thead>
  <tr>
   <th scope="col">Grant Issued</th>
   <th scope="col">Petitioner</th>
   <th scope="col">Operation / Mission</th>
   <th scope="col">Authorizations <small>(includes both petition and grant of exemption documents)</small></th>
  </tr>
 </thead>
 <tbody>
  <tr>
   <td width="10%">9/25/2014</td>
   <td width="25%">Astraeus Aerial</td>
   <td width="35%">Closed-set filming</td>
   <td width="30%"><a href="http://www.regulations.gov/#!docketDetail;D=FAA-2014-0352">View Documents</a></td>
  </tr>

My problem is finding the correct XPath to select the table rows. I have been trying:

//*[@id="auth_granted"]/tbody/tr[1]/td[2]

but am having no luck. Does anyone have any thoughts? Suggestions would be greatly appreciated!

David Dehghan
Jmiz
    Can you elaborate on "having no luck"? Do you get an error? Do you get a wrong result? no result? – LarsH Jun 16 '15 at 01:32

3 Answers


In XPath, when you put a position in brackets after a node name, you only get the node at that position. So

//*[@id="auth_granted"]/tbody/tr[1]/td[2]

means you are going to the first tr and getting only its second td. To get every cell in every row, leave out the brackets:

//*[@id="auth_granted"]/tbody/tr/td

Also you can skip all the way down to the td tag by doing something like this:

//*[@id="auth_granted"]/tbody//td

But I would specify in the first segment that you're going to a table, and avoid the *, which matches any element. Whenever possible you should target specific nodes. So this is probably your best bet:

//table[@id="auth_granted"]/tbody//td
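To see the difference between the positional and the unrestricted path outside the browser, here is a minimal sketch using Python's standard-library xml.etree.ElementTree, which supports a limited XPath subset. It assumes the markup is well-formed XML (real pages often are not), and the second table row is a made-up placeholder added only so there is more than one row to select.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the table from the question. The second row is a
# hypothetical placeholder, not real FAA data.
SAMPLE = """
<div>
 <table id="auth_granted" class="striped">
  <tbody>
   <tr>
    <td>9/25/2014</td>
    <td>Astraeus Aerial</td>
    <td>Closed-set filming</td>
   </tr>
   <tr>
    <td>(placeholder date)</td>
    <td>(placeholder petitioner)</td>
    <td>(placeholder operation)</td>
   </tr>
  </tbody>
 </table>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# With positions: only the second cell of the first row.
single = [
    td.text
    for td in root.findall(".//table[@id='auth_granted']/tbody/tr[1]/td[2]")
]

# Without positions: every cell of every row, in document order.
all_cells = [
    td.text
    for td in root.findall(".//table[@id='auth_granted']/tbody/tr/td")
]

print(single)
print(len(all_cells))
```

ElementTree paths are evaluated relative to an element, hence the leading dot. A full XPath 1.0 engine would accept the expressions exactly as written in this answer.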
Ben Dehghan

The XPath expression works in the Chrome console.

So I'd say that the "Dataminer" Chrome extension (it does web scraping, not data mining ...) is simply defective. Give it a low rating, report a bug (and suggest a more appropriate name, too), and contact their support.

Or try using a programmer's tool instead of a web browser extension. If you have a code question, you have a chance of getting a better answer than "it should work, the tool you are using is broken", sorry.

Has QUIT--Anony-Mousse

All I needed was a simple IMPORTXML in my Google Sheet. This did what I was trying to do:

=IMPORTXML("https://www.faa.gov/uas/legislative_programs/section_333/333_authorizations/", "//tr")
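For anyone who wants the same row-by-row result in code, a rough equivalent of the "//tr" query can be sketched with Python's standard-library xml.etree.ElementTree. This is only an illustration: it runs on a trimmed, well-formed copy of the table from the question rather than the live page, and rows_as_lists is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the table markup from the question.
SAMPLE = """
<div>
 <table id="auth_granted" class="striped">
  <thead>
   <tr>
    <th scope="col">Grant Issued</th>
    <th scope="col">Petitioner</th>
    <th scope="col">Operation / Mission</th>
    <th scope="col">Authorizations</th>
   </tr>
  </thead>
  <tbody>
   <tr>
    <td>9/25/2014</td>
    <td>Astraeus Aerial</td>
    <td>Closed-set filming</td>
    <td><a href="http://www.regulations.gov/#!docketDetail;D=FAA-2014-0352">View Documents</a></td>
   </tr>
  </tbody>
 </table>
</div>
""".strip()


def rows_as_lists(markup):
    """Return one list of cell texts per tr element (hypothetical helper)."""
    root = ET.fromstring(markup)
    rows = []
    for tr in root.findall(".//tr"):
        # itertext() also collects text nested inside child tags such as <a>.
        cells = [
            "".join(cell.itertext()).strip()
            for cell in tr
            if cell.tag in ("th", "td")
        ]
        rows.append(cells)
    return rows


rows = rows_as_lists(SAMPLE)
for row in rows:
    print(row)
```

Like "//tr" in the formula, this picks up the header row as well as the data rows.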

Thanks!

Jmiz