I'm very new to Python, but with the help of a few online guides I've stitched together some code that logs me into cronometer.com (a health-tracking website/app similar to MyFitnessPal). Unfortunately, I'm having trouble actually scraping any data.
I have the following code (ignore the Hass/AppDaemon parts; I'm running this Python script in Home Assistant):
```python
import appdaemon.plugins.hass.hassapi as hass
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import requests


class Scraper(hass.Hass):

    def initialize(self):
        self.log("Scraper Initialized")
        self.get_values()

    def get_values(self, kwargs=None):
        # Fetch the login page and read the hidden anti-CSRF token from the form
        self.login_url = "https://cronometer.com/login/"
        self.r = requests.get(self.login_url)
        self.bs = BeautifulSoup(self.r.text, 'html.parser')
        self.csrf_token = self.bs.find('input', attrs={'name': 'anticsrf'})['value']

        self.url = "https://cronometer.com/"
        self.session = requests.Session()
        self.payload = {
            "username": "MY_USERNAME",
            "password": "MY_PASSWORD",
            "anticsrf": self.csrf_token
        }
        self.headers = {'referer': self.login_url, 'User-agent': 'Chrome'}
        self.sensorname = "sensor.scraper"
        self.friendly_name = "Fasting Status"

        # Log in, passing along the cookies from the login-page request
        try:
            s = self.session.post(self.login_url, data=self.payload, headers=self.headers, cookies=self.r.cookies)
        except requests.RequestException:
            self.log("Could not log in")
            return
        self.log(self.csrf_token)

        # Fetch the home page with the same session and parse it
        s = self.session.get(self.url)
        page = s.content
        soup = BeautifulSoup(page, "html.parser")

        # Test 1
        fasting1 = soup.select('#cronometerApp > div:nth-child(2) > div:nth-child(1) > div > table > tbody > tr > td:nth-child(1) > div > div:nth-child(8) > div > div.diary-item-title > div')
        self.log("TEST 1")
        self.log(fasting1)
        # Test 2
        fasting2 = soup.select('#cronometerApp > div:nth-child(2) > div:nth-child(1) > div > table > tbody > tr > td:nth-child(1) > div > div:nth-child(8) > div > div.diary-item-content > div.GJES3IWDERB')
        self.log("TEST 2")
        self.log(fasting2)
        # Test 3
        fasting3 = soup.select('#w-node-dd7aab6f-acfc-dfa1-2372-313b5d39fc2b-0dd15747 > div.div__mobile__features-text-1 > h5')
        self.log("TEST 3")
        self.log(fasting3)
        # Test 4
        fasting4 = soup.select('#cronometerApp > div:nth-child(2) > div:nth-child(1) > div > table > tbody > tr > td:nth-child(2) > div > div.GJES3IWDHFD > button:nth-child(1) > span')
        self.log("TEST 4")
        self.log(fasting4)
        # Test 5
        fasting5 = soup.select('#cronometerApp > div:nth-child(2) > div:nth-child(1) > div > table > tbody > tr > td:nth-child(2) > div > div.diary_side_box.GJES3IWDIQB > div.GJES3IWDKQB > div > div.GJES3IWDITE > table > tbody > tr > td > div:nth-child(1) > span')
        self.log("TEST 5")
        self.log(fasting5)

        self.set_state(self.sensorname, state="Test", attributes={"friendly_name": self.friendly_name})
```
From what I can tell, this code logs into cronometer.com without any issues. The problem, I think, is that the URL of my personal home page is the same as the URL of the website before logging in. So after using `session.post` to send my credentials, I use `session.get` to scrape data from my "profile", but it only returns the public cronometer.com page (what you see before logging in), not my own personal page at the same URL.
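One way to rule out the login itself is to fetch the `anticsrf` token and post the form through one `requests.Session`, so any cookies set by the login page stay in the session automatically instead of being copied over from a separate `requests.get`. The sketch below assumes the form fields are exactly the ones in the code above (`username`, `password`, `anticsrf`); `login` and `extract_csrf` are hypothetical helper names, and it uses the standard-library `html.parser` so it doesn't depend on BeautifulSoup.

```python
from html.parser import HTMLParser

# Login URL and form field names are taken from the code in the question.
LOGIN_URL = "https://cronometer.com/login/"


class _InputValueFinder(HTMLParser):
    """Records the value of the first <input> whose name matches field_name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag != "input" or self.value is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == self.field_name:
            self.value = attr_map.get("value")


def extract_csrf(html, field_name="anticsrf"):
    """Return the hidden anti-CSRF token from the login page HTML, or None."""
    finder = _InputValueFinder(field_name)
    finder.feed(html)
    return finder.value


def login(session, username, password):
    """GET the login page and POST the form through the *same* session.

    `session` is assumed to be a requests.Session (or anything with
    compatible .get/.post methods), so cookies from the GET are reused.
    """
    page = session.get(LOGIN_URL)
    token = extract_csrf(page.text)
    if token is None:
        raise RuntimeError("no 'anticsrf' input found on the login page")
    payload = {"username": username, "password": password, "anticsrf": token}
    return session.post(LOGIN_URL, data=payload, headers={"Referer": LOGIN_URL})
```

In practice you would create `session = requests.Session()`, call `login(session, "MY_USERNAME", "MY_PASSWORD")`, and keep using that same `session` for later requests; logging the returned response's `status_code` and `url` gives a first hint of whether the login was accepted. Even with a working login, though, the diary content can still be missing if the page builds it in the browser with JavaScript.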
One thing I did notice is that the URL changes slightly when I click the tabs at the top: clicking Diary changes it from cronometer.com to cronometer.com/#diary, clicking Trends changes it to cronometer.com/#trends, and so on. But requesting those specific URLs hasn't worked either.
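That matches how URL fragments work: everything after `#` is kept by the browser and is never sent to the server, and HTTP clients such as `requests` don't send it either. So `cronometer.com/#diary` and `cronometer.com/#trends` are, as far as the server can tell, the same request for `/`; it is JavaScript running in the page that reads the fragment and decides which tab to show. A small standard-library sketch (the `request_target` helper is hypothetical, for illustration) of what remains of each URL once the fragment is dropped:

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path (plus query string) that goes on the HTTP request line.

    The #fragment is deliberately ignored: browsers and HTTP clients
    never send it to the server.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


for url in ("https://cronometer.com/",
            "https://cronometer.com/#diary",
            "https://cronometer.com/#trends"):
    # Same request target each time; only the (unsent) fragment differs.
    print(url, "->", request_target(url), "| fragment:", urlsplit(url).fragment or None)
```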
Again, sorry for my lack of knowledge, but how can I get around this? I've looked at some online guides about Selenium, but so far I can't see how to use it here, since the problem isn't logging in (I don't think) but scraping the right page. Thanks in advance for your help.
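Before choosing between Selenium and another approach, it may help to check what `requests` actually received. If the ids and class names your CSS selectors rely on (such as `cronometerApp` or `diary-item-title`) are not in `response.text`, no selector can match them: the content is being built in the browser by JavaScript, and you would need either a real browser (Selenium) or the same background requests the page's JavaScript makes (visible in the browser dev tools' Network tab). Class names like `GJES3IWDERB` look machine-generated, which is common in JavaScript-built interfaces, though that is only an inference. The stdlib sketch below lists the element ids and `<script src>` values in a page; the sample HTML is made up for illustration and is not Cronometer's real markup.

```python
from html.parser import HTMLParser


class _PageSummary(HTMLParser):
    """Records element ids and <script src> values found in raw HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.script_srcs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id"):
            self.ids.add(attr_map["id"])
        if tag == "script" and attr_map.get("src"):
            self.script_srcs.append(attr_map["src"])


def summarize_html(html):
    """Parse raw HTML (e.g. response.text) and return a _PageSummary."""
    summary = _PageSummary()
    summary.feed(html)
    return summary


# Hypothetical app-shell page, NOT Cronometer's markup: an empty container
# whose contents would be filled in later by the referenced script.
shell = """<html><body>
<div id="cronometerApp"></div>
<script src="/app/app.nocache.js"></script>
</body></html>"""

summary = summarize_html(shell)
print("cronometerApp" in summary.ids)   # the container id is present...
print("diary-item-title" in shell)      # ...but the diary markup is not
print(summary.script_srcs)              # it would come from this script
```

With a real response you would run `summarize_html(session.get("https://cronometer.com/").text)` and compare the result with what the browser's element inspector shows.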