I'm scraping a webpage with R, using the "RSelenium" and "XML" packages. In the following table, some rows contain a radio button. I need to know which rows (for instance, the first and the third) have a disabled radio button, so that I can skip those rows during scraping. What is the best way to do that? I can't figure out how to easily get the row numbers of the disabled inputs.

<table cellspacing="1" cellpadding="0" border="0" width="100%" id="table1">
<tbody><tr>
    <td width="36">&nbsp;</td>
    <td width="100"><b>Matricola Inps</b></td>
    <td width="150"><b>Denominazione</b></td>
    <td width="100"><b>Stato Adesione</b></td>
    <td width="120"><b>Note</b></td>
</tr>

<tr>

    <td align="center" width="36">

    <input type="radio" disabled="" id="sistema" name="unitaId" value="XXX">        
    </td>
    <td><font color="#C0C0C0">
            N/D
    </font>
    </td>
    <td>
        <font color="#C0C0C0">
            blablabla
   </font>
   </td>
    <td>
        <font color="#C0C0C0">              
   </font>
   </td>

   <td>     
    </td>
</tr>

<tr>
    <td align="center" width="36">      
    <input type="radio" id="sistema" name="unitaId" value="XXX">        
    </td>
    <td>
            N/D        
    </td>
    <td>            
            blablabla       
   </td>
    <td>                                   
   </td>
   <td>     
    </td>   
</tr>

<tr>    
    <td align="center" width="36">      
    <input type="radio" id="registra" name="unitaId" value="XXXX">      
    </td>
    <td>
            XXXXX

    </td>
    <td>            
            blabla       
   </td>
    <td>            
            Aderente       
   </td>
   <td>     
            Sede Principale&nbsp;                           
            Sede Legale&nbsp;                               
    </td>   
</tr>         
</tbody></table>

Thank you very much.

Gianluca78
  • If every row has an input field, you can get the row numbers with something like `which(unname(!is.na(sapply(xpathSApply(doc, "//table/tbody/tr/td/input[@type = 'radio']", xmlAttrs), "[", "disabled"))))`. – lukeA Mar 12 '15 at 09:09
  • Thank you. But it doesn't work, because I realized that not every row has an input... I have also edited the question accordingly. – Gianluca78 Mar 12 '15 at 15:29

1 Answer

I found the following solution: serialize each row to a string and keep the indexes of the rows whose markup contains "disabled".

nodesToString <- xpathSApply(doc, "//tr", saveXML) 
disabledIndexes <- which(grepl('disabled', nodesToString))
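
One caveat: grepl matches the word "disabled" anywhere in a row's markup, including cell text. Below is a minimal sketch of an alternative that tests for the attribute itself with XPath. It assumes the table keeps the id "table1"; the `html` string is only a trimmed-down stand-in for the real page, and `rows` and `hasDisabledRadio` are names I made up for illustration.

```r
library(XML)

# Trimmed-down stand-in for the page source (assumption: real rows look like these).
html <- '<table id="table1"><tbody>
<tr><td>&nbsp;</td><td><b>Matricola Inps</b></td></tr>
<tr><td><input type="radio" disabled="" id="sistema" name="unitaId" value="XXX"></td><td>N/D</td></tr>
<tr><td><input type="radio" id="registra" name="unitaId" value="XXXX"></td><td>XXXXX</td></tr>
<tr><td></td><td>row without an input</td></tr>
</tbody></table>'

doc <- htmlParse(html, asText = TRUE)

# All rows of the table, header included, in document order.
rows <- getNodeSet(doc, "//table[@id='table1']//tr")

# TRUE for rows that contain a radio input carrying a disabled attribute.
hasDisabledRadio <- vapply(
  rows,
  function(row) length(getNodeSet(row, ".//input[@type='radio' and @disabled]")) > 0,
  logical(1)
)

disabledIndexes <- which(hasDisabledRadio)
```

As with "//tr" above, the indexes count the header row, and rows without any input simply come out as FALSE. In the real script, `doc` would be the already parsed page source rather than the sample string.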

Maybe this will be useful to someone in the future...

Gianluca78