
I am new to Pyppeteer (Python) and I am trying to work out how to do the following, in order:

  1. log into the page
  2. click a tag
  3. collect the data from the page that the tag leads to

The website is 'https://quotes.toscrape.com/login'

I think I have solved the first part, logging in. However, I am having difficulty with the second and third.

I would appreciate it if someone could guide me with Python examples. For example: clicking the tag 'inspirational' under the third quote (by Einstein) and collecting all the quotes from the 'inspirational' page.

import asyncio

import nest_asyncio
from pyppeteer import launch

# Lets run_until_complete() work inside an already running loop (e.g. Jupyter).
nest_asyncio.apply()

username = 'AAA'
password = 'BBB'

async def main():
    # browser = await launch(headless=False, args=['--user-agent=Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'])
    browser = await launch(headless=False)
    page = await browser.newPage()
    await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36')
    await page.goto('https://quotes.toscrape.com/login')

    # Fill in the login form.
    await page.waitForSelector('[id="username"]')
    await page.focus('[id="username"]')
    await page.keyboard.type(username)

    await page.waitForSelector('[id="password"]')
    await page.focus('[id="password"]')
    await page.keyboard.type(password)

    # Submit and wait for the page load that the click triggers.
    # asyncio.gather() is used because asyncio.wait() rejects bare coroutines on Python 3.11+.
    await asyncio.gather(
        page.waitForNavigation(),
        page.click('[type="submit"]'),
    )

asyncio.get_event_loop().run_until_complete(main())
Omerge

1 Answer

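Add this to the end of main(), after the login:

 # The click loads a new page, so wait for that navigation as well.
 await asyncio.gather(
     page.waitForNavigation(),
     page.click('span.tag-item:nth-child(3) > a:nth-child(1)'),
 )
 # JJeval is an alias of querySelectorAllEval(); it must be awaited.
 quotetext = await page.JJeval('.quote .text', '(nodes => nodes.map(n => n.innerText))')
 return quotetext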

I wrote this based on their docs here https://miyakogi.github.io/pyppeteer/reference.html#browser-class
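
For reference, here is a minimal sketch of the same click-and-collect step written as a reusable coroutine. The function name quotes_for_tag and its tag_link_selector parameter are made up for illustration; the pyppeteer calls are the ones already used in this thread (click, waitForNavigation, waitForSelector, JJeval). It assumes the quote text sits in '.quote .text' elements, as the snippets in this thread suggest.

```python
import asyncio


async def quotes_for_tag(page, tag_link_selector):
    """Click a tag link, wait for the new page, and return the quote texts.

    `page` is expected to be a pyppeteer Page; the function name and the
    `tag_link_selector` parameter are hypothetical, chosen for this sketch.
    """
    # Schedule the navigation wait together with the click, so the page
    # load triggered by the click is not missed.
    await asyncio.gather(
        page.waitForNavigation(),
        page.click(tag_link_selector),
    )
    # Make sure at least one quote has rendered on the tag page.
    await page.waitForSelector('.quote')
    # JJeval (alias of querySelectorAllEval) runs the JS function over all
    # matching nodes and returns the result to Python as a list.
    return await page.JJeval(
        '.quote .text',
        '(nodes => nodes.map(n => n.innerText))',
    )
```

Inside main(), after the login, it could be called as quotes = await quotes_for_tag(page, "a[href='/tag/inspirational/']"), and main() would then return quotes.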

Of course, JS is a more natural language for working with web pages, so for more complicated tasks I would use a JS-based web scraper.

Petr L.
  • I got an error on 'feedHandle' (name 'feedHandle' is not defined). How do I resolve that? – Omerge Aug 06 '21 at 06:57
  • @Omerge I am sorry, I forgot to rename the variable name. Try it now. – Petr L. Aug 06 '21 at 09:23
  • Hey, thanks a lot. Unfortunately, I still get an error: 'coroutine' object has no attribute 'JJeval'. I believe the problem starts at 'page.JJ' as well. Is there a module I need to install for 'JJ'? – Omerge Aug 06 '21 at 11:51
  • @Omerge page.JJ is a method from the pyppeteer module itself. – Petr L. Aug 06 '21 at 12:32
  • @Omerge How about now? If you want I can give you the JS code I started with. – Petr L. Aug 06 '21 at 19:50
  • Thanks, but no luck; I still get the same error. I have looked at the API documentation and I think your code makes sense. FYI, an 'await' is needed for the click (await page.click(...)). The traceback in main() points at this line: quotetext = quotelist.JJeval('.text', '(nodes => nodes.map(n => n.innerText))') and ends with AttributeError: 'coroutine' object has no attribute 'JJeval' – Omerge Aug 07 '21 at 02:49
  • The JS code once you're on the page: `var els = document.querySelectorAll("a[href='/tag/inspirational/']"); els[0].click(); let tests = document.querySelectorAll(".text"); for (const x of tests) { console.log(x.innerText) }` – Petr L. Aug 07 '21 at 09:37
  • You might use the JS code with addScriptTag in pyppeteer – Petr L. Aug 07 '21 at 09:39
  • @Omerge So what is the answer? You can help others by updating your post. – Petr L. Aug 07 '21 at 13:38
  • I have used this instead:

        quotes = []
        await page.waitForSelector('.quote')
        quoteElements = await page.querySelectorAll('.quote')
        for quoteElement in quoteElements:
            textElement = await quoteElement.querySelector('.text')
            text = await page.evaluate('el => el.textContent', textElement)
            quotes.append(text)
        return quotes

    – Omerge Aug 08 '21 at 00:23
  • Thanks. I have one more question, about the CSS in page.click('span.tag-item:nth-child(3) > a:nth-child(1)'). I tried playing around with nth-child(n) but I couldn't figure out the pattern it uses to locate elements. How does the logic work? For example, how would I get the tag 'classic' under the fourth quote, by Jane Austen? I hope I am not taking too much of your time. – Omerge Aug 08 '21 at 02:18
  • The easiest way is through the browser devtools. Select the element you need in the Inspector, right-click it, and copy its CSS selector. – Petr L. Aug 08 '21 at 09:52
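
On the nth-child question: `:nth-child(n)` matches an element that is the n-th child of its parent, counting every sibling regardless of tag or class, so the number depends on whatever else sits inside the same parent. An alternative that avoids counting siblings is to pick the quote by position in Python and then look for the tag by its text. The following is only a sketch: the helper name click_tag_under_quote is made up, and it assumes each quote is a '.quote' element whose tag links match 'a.tag', which should be checked in the devtools Inspector.

```python
import asyncio


async def click_tag_under_quote(page, quote_number, tag_name):
    """Click the tag named `tag_name` under the quote at `quote_number` (1-based).

    Hypothetical helper for illustration. Returns True if a matching tag
    link was clicked, False if the quote has no tag with that name.
    """
    quotes = await page.querySelectorAll('.quote')
    if not 1 <= quote_number <= len(quotes):
        raise IndexError(
            f'quote {quote_number} not found; page has {len(quotes)} quotes'
        )
    quote = quotes[quote_number - 1]
    # Search only inside that one quote ('a.tag' is assumed markup).
    for link in await quote.querySelectorAll('a.tag'):
        text = await page.evaluate('el => el.textContent', link)
        if text.strip().lower() == tag_name.lower():
            # The click loads the tag page, so wait for that navigation too.
            await asyncio.gather(page.waitForNavigation(), link.click())
            return True
    return False
```

For the example in the comment above, this would be called inside main() as await click_tag_under_quote(page, 4, 'classic').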