I have two Python scripts.
The first one (login) goes to the website, logs in through the login form, and stores the cookies in a JSON file for later use:
import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(slow_mo=50)
    context = browser.new_context(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36')
    page = context.new_page()
    page.goto('https://www.url.us/signin')
    try:
        # Wait for the login form, then fill it in and submit it
        page.wait_for_selector('#signInFormPage input[name="userName"]', state='visible')
        page.type('#signInFormPage input[name="userName"]', "aaa")
        page.type('#signInFormPage input[name="password"]', "aa")
        page.click('#userNamePasswordSignInButton')
        # Give the site time to finish the login before reading the cookies
        page.wait_for_timeout(3000)
        cookies = context.cookies()
        page.wait_for_timeout(10000)
        # Save the cookies for the second script; "with" closes the file
        with open('./cookies.json', 'w') as f:
            f.write(json.dumps(cookies))
    except Exception as e:
        print("Error in playwright script.", e)
    finally:
        # Clean up whether or not the login succeeded
        page.close()
        context.close()
        browser.close()
This script works fine. The second script loads the stored cookies from the file and prints the page source of other pages on the same website:
import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False, slow_mo=50)
    context = browser.new_context(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36')
    page = context.new_page()
    # Restore the cookies saved by the login script
    with open('./cookies.json') as cookie_file:
        cookies = json.load(cookie_file)
    context.add_cookies(cookies)
    page.goto('https://www.url.us/Product/10aaa')
    try:
        page.wait_for_timeout(6000)
        print(page.content())
    except Exception as e:
        print("Error in playwright script.", e)
    finally:
        page.close()
        context.close()
        browser.close()
This script works fine too.
The problem is that this website has an API that returns some information I want to extract, and that information is not available in the page source that a front-end user sees. When I put the API link in the second script, I receive an empty JSON page. Those API requests use a token value, but since I only use the cookies to get the page source, I don't have a token. I use these scripts because it was the only way to get through the Cloudflare protection on this website. Is there some way to, for example, combine the requests module with Playwright? Or are there any other suggestions for how I can get the JSON page using the cookies?
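To make the "requests plus Playwright" idea concrete, here is a rough sketch of what I have in mind: turn the cookie list saved by the login script into a `Cookie` header that another HTTP client could send. The helper name `cookie_header` and the sample cookie values are made up for illustration, and the domain matching is deliberately simple. I am also assuming that the same user agent has to be sent, since as far as I understand Cloudflare ties its clearance cookie to the browser that obtained it.

```python
import json

# Assumption: reuse the exact user agent from the Playwright scripts.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
)


def cookie_header(cookies, host):
    """Build a Cookie header value from Playwright-style cookie dicts.

    Only cookies whose domain matches `host` (exactly, or as a parent
    domain such as ".url.us" for "www.url.us") are included.
    """
    pairs = []
    for cookie in cookies:
        domain = cookie.get("domain", "").lstrip(".")
        if host == domain or host.endswith("." + domain):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


def load_cookie_header(path, host):
    """Read the JSON file written by the login script and build the header."""
    with open(path) as f:
        return cookie_header(json.load(f), host)


# Hypothetical sample in the shape that context.cookies() returns.
sample = [
    {"name": "session", "value": "abc", "domain": ".url.us", "path": "/"},
    {"name": "other", "value": "x", "domain": "example.com", "path": "/"},
]
headers = {
    "Cookie": cookie_header(sample, "www.url.us"),
    "User-Agent": USER_AGENT,
}
print(headers["Cookie"])  # session=abc
```

With the real file, `load_cookie_header('./cookies.json', 'www.url.us')` would give the header value, and `headers` could then be passed to something like `requests.get(api_url, headers=headers)`. I don't know whether Cloudflare would accept such a request, and it still would not include the token that the API requests use.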
Updated code using a persistent context:
Script 1:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # launch_persistent_context returns a browser context, not a browser;
    # cookies and other state are kept in the given user data directory
    context = p.chromium.launch_persistent_context(r'C:\Users\test\Downloads\pyyy', headless=False)
    page = context.new_page()
    page.goto('https://www.url.us/signin')
    try:
        page.wait_for_selector('#signInFormPage input[name="userName"]', state='visible')
        page.type('#signInFormPage input[name="userName"]', "aaaaa")
        page.type('#signInFormPage input[name="password"]', "aaaa")
        page.click('#userNamePasswordSignInButton')
        page.wait_for_timeout(3000)
    except Exception as e:
        print("Error in playwright script.", e)
    finally:
        page.close()
        context.close()
Script 2:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Reuse the same user data directory so the login state carries over
    context = p.chromium.launch_persistent_context(r'C:\Users\test\Downloads\pyyy', headless=False)
    page = context.new_page()
    page.goto('https://www.url.us/Product/aaa')
    try:
        page.wait_for_timeout(6000)
        print(page.content())
    except Exception as e:
        print("Error in playwright script.", e)
    finally:
        page.close()
        context.close()