I have two Python scripts.

Login: goes to the website, logs in through the login form, and stores the cookies in a JSON file for later use.

import json
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(slow_mo=50)
    context = browser.new_context(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36')
    page = context.new_page()
    page.goto('https://www.url.us/signin')
    try:
        page.wait_for_selector('#signInFormPage input[name="userName"]', state='visible')
        page.type('#signInFormPage input[name="userName"]', "aaa")
        page.type('#signInFormPage input[name="password"]', "aa")
        page.click('#userNamePasswordSignInButton')
        page.wait_for_timeout(3000)
        cookies = context.cookies()
        page.wait_for_timeout(10000)
        # Use a context manager so the file is flushed and closed
        with open('./cookies.json', 'w') as f:
            json.dump(cookies, f)
    except Exception as e:
        print(f"Error in playwright script: {e}")
    finally:
        # Always release browser resources, whether or not the login succeeded
        page.close()
        context.close()
        browser.close()

This script works well. The second script loads the stored cookies from the file and prints the page source of other pages on the same website:

import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False, slow_mo=50)
    context = browser.new_context(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36')
    page = context.new_page()
    # Load the cookies saved by the login script and close the file afterwards
    with open('./cookies.json') as cookie_file:
        cookies = json.load(cookie_file)
    context.add_cookies(cookies)
    page.goto('https://www.url.us/Product/10aaa')
    try:
        page.wait_for_timeout(6000)
        print(page.content())
    except Exception as e:
        print(f"Error in playwright script: {e}")
    finally:
        page.close()

This script works too.

The problem is that the website exposes an API with some information I want to extract, and that information is not available in the page source that a front-end user sees. When I put the API link in the second script, I receive an empty JSON page. Those API requests use a token value, but since I'm only using cookies to get the page source, I don't have a token. I use these scripts because it was the only way to get through the Cloudflare protection on this website.

Is there some way to, for example, combine the requests module with Playwright? Or are there any other suggestions for getting the JSON page using cookies?
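
To make the requests idea concrete, here is a rough sketch of how the cookies saved by the login script might be handed to requests. The helper names `cookies_for_host` and `fetch_json_with_cookies` are made up for illustration, and the sketch assumes cookies.json holds the list returned by `context.cookies()`. It only sends cookies and a user agent, so it does not supply the API token, and Cloudflare may still reject a client that is not a real browser.

```python
import json


def cookies_for_host(cookies, host):
    """Return {name: value} for the Playwright cookies whose domain matches host."""
    selected = {}
    for cookie in cookies:
        # Playwright stores domains such as ".url.us" or "www.url.us"
        domain = cookie.get("domain", "").lstrip(".")
        if domain and (host == domain or host.endswith("." + domain)):
            selected[cookie["name"]] = cookie["value"]
    return selected


def fetch_json_with_cookies(url, host, cookie_file="./cookies.json", user_agent=None):
    """Hypothetical helper: request url with the stored cookies and parse JSON."""
    import requests  # third-party package, imported lazily so the helper above works without it

    with open(cookie_file) as f:
        cookies = json.load(f)

    # Assumption: sending the same user agent as the browser session is more
    # likely to be accepted than the default requests user agent.
    headers = {"User-Agent": user_agent} if user_agent else {}
    response = requests.get(
        url,
        cookies=cookies_for_host(cookies, host),
        headers=headers,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```

A call would look like `fetch_json_with_cookies(api_url, 'www.url.us', user_agent=...)`, where `api_url` stands for the API link and the user agent is the same string passed to `new_context`.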

Updated code using a persistent context:

Script 1:

import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # launch_persistent_context returns a BrowserContext, not a Browser
    context = p.chromium.launch_persistent_context(r'C:\Users\test\Downloads\pyyy', headless=False)
    page = context.new_page()
    page.goto('https://www.url.us/signin')
    try:
        page.wait_for_selector('#signInFormPage input[name="userName"]', state='visible')
        page.type('#signInFormPage input[name="userName"]', "aaaaa")
        page.type('#signInFormPage input[name="password"]', "aaaa")
        page.click('#userNamePasswordSignInButton')
        page.wait_for_timeout(3000)
    except Exception as e:
        print(f"Error in playwright script: {e}")
    finally:
        page.close()
        # Closing a persistent context also closes the browser
        context.close()

Script 2:

import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # launch_persistent_context returns a BrowserContext, not a Browser
    context = p.chromium.launch_persistent_context(r'C:\Users\test\Downloads\pyyy', headless=False)
    page = context.new_page()
    page.goto('https://www.url.us/Product/aaa')
    try:
        page.wait_for_timeout(6000)
        print(page.content())
    except Exception as e:
        print(f"Error in playwright script: {e}")
    finally:
        page.close()
        # Closing a persistent context also closes the browser
        context.close()

1 Answer

Instead of saving and loading the cookies, I would launch a persistent context. A persistent context preserves its information in the user_data_dir you provide.
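
As a minimal sketch of how a persistent context could be combined with the API lookup from the question: rather than calling the API link directly, the script can let the logged-in page load and capture the API response that the page itself requests, so the page's own token is used. The function names, the `api_fragment` parameter, and the assumption that the product page triggers the API call on load are all hypothetical; `page.expect_response` and `response.json()` are Playwright calls.

```python
def url_contains(fragment):
    """Build a predicate that matches any response whose URL contains fragment."""
    def predicate(response):
        return fragment in response.url
    return predicate


def capture_api_json(user_data_dir, page_url, api_fragment, timeout_ms=30000):
    """Hypothetical helper: open page_url and return the JSON body of the
    first response whose URL contains api_fragment."""
    # Imported lazily so url_contains can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # The profile in user_data_dir is assumed to hold the logged-in session
        context = p.chromium.launch_persistent_context(user_data_dir, headless=False)
        try:
            page = context.new_page()
            # Start listening before navigating so the response is not missed
            with page.expect_response(url_contains(api_fragment), timeout=timeout_ms) as info:
                page.goto(page_url)
            return info.value.json()
        finally:
            context.close()


# Example call (paths and the "/api/" fragment are placeholders):
# data = capture_api_json(r'C:\Users\test\Downloads\pyyy',
#                         'https://www.url.us/Product/aaa', '/api/')
```

If the page never requests a matching URL, `expect_response` raises a timeout error, which is a hint that the API call is triggered by some later interaction instead of the page load.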

hardkoded
  • Please check the updated code in my question. I tried this, but the second script gives me the page as a non-logged-in user. Did I do something wrong? –  Apr 27 '21 at 11:58
  • Are you sure that `page.wait_for_timeout(3000)` is enough to get the cookie there? – hardkoded Apr 27 '21 at 12:01
  • I set it to 13000 and it's still the same. –  Apr 27 '21 at 12:04