Currently doing web-scraping for the first time trying to grab and compile a list of completed Katas from my CodeWars profile. You can view the completed problems without being logged in but it does not display your solutions unless you have logged in to that specific account.
Here is an inspect preview of the page display when logged in and the relevant divs I'm trying to scrape:
The url for that page is https://www.codewars.com/users/User_Name/completed_solutions
with User_Name
replaced by an actual username.
The log-in page is: https://www.codewars.com/users/sign_in
I have attempted to get the divs with the class "list-item solutions" in two different ways now which I'll write:
#attempt 1
import requests
from bs4 import BeautifulSoup
login_url = "https://www.codewars.com/users/sign_in"
end_url = "https://www.codewars.com/users/Ash-Ozen/completed_solutions"
with requests.session() as sesh:
result = sesh.get(login_url)
soup = BeautifulSoup(result.content, "html.parser")
token = soup.find("input", {"name": "authenticity_token"})["value"]
payload = {
"user[email]": "ph@gmail.com",
"user[password]": "phpass>",
"authenticity_token": str(token),
}
result = sesh.post(login_url, data=payload) #this logs me in?
page = sesh.get(end_url) #This navigates me to the target page?
soup = BeautifulSoup(page.content, "html.parser")
print(soup.prettify()) # some debugging
# Examining the print statement shows that the "list-item solutions" is not
# there. Checking page.url shows the correct url(https://www.codewars.com/users/Ash-Ozen/completed_solutions).
solutions = soup.findAll("div", class_="list-item solutions")
# solutions yields an empty list.
and
#attempt 2
from robobrowser import RoboBrowser
from bs4 import BeautifulSoup
browser = RoboBrowser(history=True)
browser.open("https://www.codewars.com/users/sign_in")
form = browser.get_form()
form["user[email]"].value = "phmail@gmail.com"
form["user[password]"].value = "phpass"
browser.submit_form(form) #think robobrowser handles the crfs token for me?
browser.open("https://www.codewars.com/users/Ash-Ozen/completed_solutions")
r = browser.parsed()
soup = BeautifulSoup(str(r[0]), "html.parser")
solutions = soup.find_all("div", class_="list-item solutions")
print(solutions) # returns empty list
No idea how/what to debug from here to get it working.
Edit: My initial thoughts about what is going wrong is that, after performing either post I get redirected to the dashboard (behavior after logging in successfully) but it seems that when trying to get the final url I end up with the non-logged-in version of the page.