
I've written a script in Python, using pyppeteer, to scrape the names and phone numbers of different coffee shops from a webpage. Although my attempt below does the job, the script looks really messy. What is the ideal way to write for loops with the pyppeteer library?

What I've written so far:

import asyncio
from pyppeteer import launch

url = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA"

async def get_names(link):
    wb = await launch(headless=True)
    page = await wb.newPage()
    await page.goto(link)

    containers = await page.querySelectorAll('div.v-card')
    for container in containers:
        name = await container.querySelector('.business-name span')
        phone = await container.querySelector('.phones')
        post = await page.evaluate('(element) => element.textContent', name)
        postAno = await page.evaluate('(element) => element.textContent', phone)
        print(f'{post}--{postAno}')

    await wb.close()

asyncio.get_event_loop().run_until_complete(get_names(url))
robots.txt

2 Answers


I would do it like this:

import asyncio
from pyppeteer import launch

url = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA"

async def get_names(link):
    wb = await launch()
    page = await wb.newPage()
    await page.goto(link)

    containers = await page.querySelectorAll('div.v-card')
    for container in containers:
        name = await container.querySelectorEval('.business-name span','e => e.innerText')
        phone = await container.querySelectorEval('.phones','e => e.innerText')
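        print(name, phone)

    # Close the browser so the Chromium process does not linger after the script ends.
    await wb.close()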

asyncio.get_event_loop().run_until_complete(get_names(url))
SIM

Try this:

import asyncio
from pyppeteer import launch

url = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA"

async def get_names(link):
    wb = await launch(headless=True)
    page = await wb.newPage()
    await page.goto(link)

    # Collect all names and all phone numbers in one call each.
    names = await page.querySelectorAllEval(
        'div.v-card .business-name span',
        '(elements => elements.map(e => e.textContent))')
    phones = await page.querySelectorAllEval(
        'div.v-card .phones',
        '(elements => elements.map(e => e.textContent))')
    # Pair names and phones by position (assumes each card has exactly one of each).
    result = {name: phones[idx] for (idx, name) in enumerate(names)}
    print(result)
    await wb.close()

asyncio.get_event_loop().run_until_complete(get_names(url))

And read the documentation for querySelectorAllEval.
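
Pairing two separately collected lists by index can drift out of step if a card has no phone number, and a dict keyed by name keeps only one entry when two shops share a name. A minimal sketch of one way around that, assuming the same `div.v-card`, `.business-name span` and `.phones` selectors still match the page, is to let a single `page.evaluate` call walk the cards and return one record per card. The names `EXTRACT_JS` and `format_rows` are made up for this illustration, and the `'n/a'` placeholder is an arbitrary choice:

```python
# JavaScript evaluated inside the page: one object per card,
# with null for any field that is missing from that card.
EXTRACT_JS = """
() => Array.from(document.querySelectorAll('div.v-card')).map(card => {
    const name = card.querySelector('.business-name span');
    const phone = card.querySelector('.phones');
    return {
        name: name ? name.textContent.trim() : null,
        phone: phone ? phone.textContent.trim() : null,
    };
})
"""


def format_rows(rows):
    """Turn the list of dicts returned by the page into printable lines.

    Cards without a name are skipped; a missing phone is shown as 'n/a'.
    """
    return [
        f"{row['name']}--{row['phone'] or 'n/a'}"
        for row in rows
        if row.get('name')
    ]


async def get_names(link):
    # Imported here so format_rows can be used without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        await page.goto(link)
        rows = await page.evaluate(EXTRACT_JS)
    finally:
        # Close the browser even if navigation or evaluation fails.
        await browser.close()
    return format_rows(rows)
```

It can be driven the same way as the scripts above: import `asyncio` and print the result of `asyncio.get_event_loop().run_until_complete(get_names(url))`.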

SIM
Crazy
I'm getting this error when I execute your script, @Nick: `name = await container.querySelectorAllEval('.business-name span', AttributeError: 'ElementHandle' object has no attribute 'querySelectorAllEval'` – robots.txt Nov 26 '18 at 12:55
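
If the installed pyppeteer version does not provide the `*Eval` helpers on `ElementHandle`, as the AttributeError above suggests, one possible fallback is to rely only on `querySelector` and `page.evaluate`, the two calls the question's script already uses. The sketch below wraps them in a small helper that also returns `None` when a card lacks the element, instead of passing a missing handle to the page. The helper names `text_or_none` and `collect` are hypothetical, not part of pyppeteer:

```python
import asyncio


async def text_or_none(page, container, selector):
    """Return the trimmed text of the first match inside container, or None."""
    handle = await container.querySelector(selector)
    if handle is None:
        # No matching element in this card.
        return None
    text = await page.evaluate('(element) => element.textContent', handle)
    return text.strip() if text else None


async def collect(page):
    """Loop over the cards once and return a list of (name, phone) tuples."""
    results = []
    for container in await page.querySelectorAll('div.v-card'):
        name = await text_or_none(page, container, '.business-name span')
        phone = await text_or_none(page, container, '.phones')
        results.append((name, phone))
    return results
```

Inside `get_names`, the loop would then shrink to `for name, phone in await collect(page): print(f'{name}--{phone}')`.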