The Chinese website here mainly describes the information of one company. Since there are many pages containing similar contents, I decided to learn data crawler in Python.
Basic code
import requests
from bs4 import BeautifulSoup
page = requests.get('http://182.148.109.184/enterprise-
info!getCompanyInfo.action?companyid=1000356')
soup = BeautifulSoup(page.text, 'html.parser')
source_content = soup.find(class_='rightSide').find(class_='content register').find(class_='formestyle')
The information I want to collect
The figure was captured in Chrome inspect element page.
Maybe Chinese is not friendly here, I created an example here for better illustration.
<th> the variable name </th> => For example, "company name", "company location"
<td> the target data I want to save </td>
My question
Based on my basic code, the source_content
contain no information inside . The output file was shown like this:
Comparing fig1, 2, we can see that the information of longitude, latitude has gone.
How to get those data with Python? Any advice would be appreciated