I am trying to extract data from this table at ESPNcricinfo.
Each row has the following format (data replaced by column headers):
<tr class="data1">
<td class="left" nowrap="nowrap"><a>Player Name</a> (Country)</td>
<td>Score</td>
<td>Minutes Played</td>
<td nowrap="nowrap">Balls Faced</td>
<td etc...
</tr>
I have used the following code in a Python script to capture the values in the table:
bats = content.xpath('//tr[@class="data1"]/td[1]/a')
cntry = content.xpath('//tr[@class="data1"]/td[1]/*')
run = content.xpath('//tr[@class="data1"]/td[2]')
mins = content.xpath('//tr[@class="data1"]/td[3]')
bf = content.xpath('//tr[@class="data1"]/td[4]')
The data is then written to a CSV file for storage.
All of the data is captured successfully apart from the player's country. The player name and country are stored inside the same <td> tag; however, the player name is also inside an <a> tag, allowing it to be captured easily. My problem is that the value captured for the player's country (the cntry variable above) is the player's name. I am sure that the code is incorrect, but I am not sure why.
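To make the problem reproducible without the full page, here is a minimal sketch. It rests on several assumptions: the sample row is made up from the format shown above, and the standard library's xml.etree.ElementTree stands in for the parser behind content.xpath (its findall accepts the subset of XPath used here, written relative to the root with a leading dot). It runs only the bats and cntry queries and prints the text of whatever each one selects.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample row built from the format above; the real page
# has more columns and real data.
SAMPLE = """
<table>
<tr class="data1">
<td class="left" nowrap="nowrap"><a>Player Name</a> (Country)</td>
<td>Score</td>
<td>Minutes Played</td>
<td nowrap="nowrap">Balls Faced</td>
</tr>
</table>
"""

# ElementTree is a stand-in here; the original script calls content.xpath(...).
content = ET.fromstring(SAMPLE)

# The same two queries as in the script, made relative to the root element.
bats = content.findall('.//tr[@class="data1"]/td[1]/a')
cntry = content.findall('.//tr[@class="data1"]/td[1]/*')

# Collect the tag and text of each selected element for comparison.
bats_text = [el.text for el in bats]
cntry_text = [el.text for el in cntry]
cntry_tags = [el.tag for el in cntry]

print(bats_text)
print(cntry_text)
print(cntry_tags)
```

In this sketch both queries end up selecting the same <a> element, which matches what I see in my CSV output: the country column contains the player's name.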