from selenium import webdriver

from bs4 import BeautifulSoup

import csv

chrome_path = r"C:\Users\chromedriver_win32\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)

driver.get('http://www.yell.com')

search = driver.find_element_by_id("search_keyword")

search.send_keys("plumbers")

place = driver.find_element_by_id("search_location")

place.send_keys("London")

driver.find_element_by_xpath("""//*[@id="searchBoxForm"]/fieldset/div[1]/div[3]/button""").click()

soup = BeautifulSoup(driver.page_source, 'html.parser')

for names in soup.find_all("span", {"class": "businessCapsule--name"}):
    print(names.text)

Output = soup.find_all("span", {"class": "businessCapsule--name"})

with open('comple16.csv', 'w') as csv_file:
    csv.register_dialect('custom', delimiter='\n', quoting=csv.QUOTE_NONE, escapechar='\\')
    writer = csv.writer(csv_file, 'custom')
    row = Output
    writer.writerow(row)

Currently the code writes the markup into the CSV file along with the text: class": "businessCapsule-- (scraped text)

I would like to write only the scraped text to the CSV file, without the tags.

Please help.

Eduards
Felly0
  • Can you give an example of the expected row in the CSV? – 0buz Feb 18 '20 at 18:31
  • Currently the output is as follows: PLUMBERS 4 LESS - Boiler Repair/Replacement, Plumbing & Heating. I would like it to be: PLUMBERS 4 LESS - Boiler Repair/Replacement, Plumbing & Heating. Basically only the text, without the HTML tag that BS4 used to find it. – Felly0 Feb 18 '20 at 18:34
  • Can you be more specific about what the issue is? What have you done to try to fix it? It's especially confusing since you write _I would like to only print the scraped text into the CSV file (without the tags)_, yet in the code you're clearly already extracting the text to print it to stdout, in `for names in soup.find_all("span", {"class": "businessCapsule--name"}): print(names.text)`. – AMC Feb 26 '20 at 19:38
  • Never mind the fact that, according to what you wrote, this is clearly a duplicate of https://stackoverflow.com/questions/23380171/using-beautifulsoup-to-extract-text-without-tags. – AMC Feb 26 '20 at 19:42

2 Answers

from selenium import webdriver

from bs4 import BeautifulSoup

import csv

chrome_path = r"C:\Users\chromedriver_win32\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)

driver.get('http://www.yell.com')

search = driver.find_element_by_id("search_keyword")

search.send_keys("plumbers")

place = driver.find_element_by_id("search_location")

place.send_keys("London")

driver.find_element_by_xpath("""//*[@id="searchBoxForm"]/fieldset/div[1]/div[3]/button""").click()

soup = BeautifulSoup(driver.page_source, 'html.parser')

# Collect only the text of each matching span, not the Tag objects themselves
Output = []
for names in soup.find_all("span", {"class": "businessCapsule--name"}):
    Output.append(names.text)

# With '\n' as the delimiter, writerow() puts each name on its own line
with open('comple16.csv', 'w') as csv_file:
    csv.register_dialect('custom', delimiter='\n', quoting=csv.QUOTE_NONE, escapechar='\\')
    writer = csv.writer(csv_file, 'custom')
    row = Output
    writer.writerow(row)
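
As an alternative to the custom dialect, here is a minimal sketch (not the answerer's code) that writes one name per row with the default csv.writer, which quotes any name containing a comma. The helper name write_names and the second sample name are hypothetical, and an in-memory buffer stands in for the CSV file.

```python
import csv
import io


def write_names(names, out):
    """Write one business name per CSV row to the open text stream `out`."""
    writer = csv.writer(out)
    for name in names:
        # Each row is a one-element list; names containing commas get quoted.
        writer.writerow([name])


# Sample names standing in for the scraped text (the second one is made up).
names = [
    "PLUMBERS 4 LESS - Boiler Repair/Replacement, Plumbing & Heating",
    "Acme Plumbing",
]

buffer = io.StringIO()
write_names(names, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open('comple16.csv', 'w', newline='') in place of the buffer.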
TonyBKK

After:

Output = soup.find_all("span", {"class": "businessCapsule--name"})

add:

Output = [row.text for row in Output]

This replaces each Tag object in Output with the text of its span, so only the text is written to the CSV file.
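
The difference between a Tag and its text can be seen without a browser. This is a small sketch that assumes the page markup looks like the hypothetical HTML string below; it uses get_text(strip=True), which also trims surrounding whitespace, in place of .text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for driver.page_source.
html = """
<div><span class="businessCapsule--name"> PLUMBERS 4 LESS </span></div>
<div><span class="businessCapsule--name">Acme Plumbing</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("span", {"class": "businessCapsule--name"})

# str(tag) keeps the markup; get_text(strip=True) returns only the text.
names = [tag.get_text(strip=True) for tag in tags]
print(names)
```

Passing the Tag objects straight to csv.writer converts them with str(), which is why the markup ended up in the file.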

Djordje