Write cleaned BS4 data to csv file

Question

from selenium import webdriver

from bs4 import BeautifulSoup

import csv

chrome_path = r"C:\Users\chromedriver_win32\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)

driver.get('http://www.yell.com')

search = driver.find_element_by_id("search_keyword")

search.send_keys("plumbers")

place = driver.find_element_by_id("search_location")

place.send_keys("London")

driver.find_element_by_xpath("""//*[@id="searchBoxForm"]/fieldset/div[1]/div[3]/button""").click()

soup = BeautifulSoup(driver.page_source, 'html.parser')

for names in soup.find_all("span", {"class": "businessCapsule--name"}):
    print(names.text)

Output = soup.find_all("span", {"class": "businessCapsule--name"})

with open('comple16.csv', 'w') as csv_file:
    csv.register_dialect('custom', delimiter='\n', quoting=csv.QUOTE_NONE, escapechar='\\')
    writer = csv.writer(csv_file, 'custom')
    row = Output
    writer.writerow(row)

Currently the code writes this to the CSV file: <span class="businessCapsule--name">(scraped text)</span>

I would like to write only the scraped text to the CSV file (without the tags).

Please help.

Currently the output is as follows: <span class="businessCapsule--name">PLUMBERS 4 LESS - Boiler Repair/Replacement, Plumbing & Heating</span>. I would like it to be: PLUMBERS 4 LESS - Boiler Repair/Replacement, Plumbing & Heating. Basically only the text, without the HTML tag which BS4 used to find it. — Felly0, Feb 18 '20 at 18:34
Can you be more specific about what the issue is? What have you done to try to fix it? It's especially confusing since you write _I would like to only print the scraped text into the CSV file (without the tags)_, yet in the code you're clearly already extracting the text to print it to stdout, in `for names in soup.find_all("span", {"class": "businessCapsule--name"}): print(names.text)`. — AMC, Feb 26 '20 at 19:38
Never mind the fact that, according to what you wrote, this is clearly a duplicate of https://stackoverflow.com/questions/23380171/using-beautifulsoup-to-extract-text-without-tags. — AMC, Feb 26 '20 at 19:42

TonyBKK · Answer 1 · 2020-02-18T19:03:41.337

from selenium import webdriver

from bs4 import BeautifulSoup

import csv

chrome_path = r"C:\Users\chromedriver_win32\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)

driver.get('http://www.yell.com')

search = driver.find_element_by_id("search_keyword")

search.send_keys("plumbers")

place = driver.find_element_by_id("search_location")

place.send_keys("London")

driver.find_element_by_xpath("""//*[@id="searchBoxForm"]/fieldset/div[1]/div[3]/button""").click()

soup = BeautifulSoup(driver.page_source, 'html.parser')

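# Collect only the text of each matching span, not the Tag objects themselves,
# so that no markup is written to the CSV file.
Output = []
for names in soup.find_all("span", {"class": "businessCapsule--name"}):
    Output.append(names.text)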

with open('comple16.csv', 'w') as csv_file:
    csv.register_dialect('custom', delimiter='\n', quoting=csv.QUOTE_NONE, escapechar='\\')
    writer = csv.writer(csv_file, 'custom')
    row = Output
    writer.writerow(row)
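
The CSV step above relies on a newline delimiter to put each name on its own line. As an alternative sketch (not the answerer's code), the same result can probably be reached with the default dialect by writing one single-column row per name. The sample names, the `write_names` helper and the output path are assumptions made for illustration only.

```python
import csv
import os
import tempfile

# Hypothetical sample data standing in for the scraped span texts.
names = [
    "PLUMBERS 4 LESS - Boiler Repair/Replacement, Plumbing & Heating",
    "Example Plumbing Ltd",
]

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), "names.csv")


def write_names(path, names):
    """Write each name as its own single-column CSV row."""
    # newline='' lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        # Fields containing commas are quoted automatically by the default dialect.
        writer.writerows([name] for name in names)


write_names(out_path, names)
```

Reading the file back with `csv.reader` should give one row per name, with the commas inside a name preserved rather than split into extra columns.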

It would help if you explained the difference you made to the code that fixed the problem. — ljetibo, Feb 18 '20 at 19:13

score 0 · Answer 2 · answered Feb 18 '20 at 18:51

After:

Output = soup.find_all("span", {"class": "businessCapsule--name"})

add:

Output = [row.text for row in Output]

in order to extract the text from each span element, so that only the text (not the tag) is written to the file.
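
To see the difference on a small case, here is a sketch that runs without Selenium. The inline HTML is assumed markup modeled on the class name in the question, not the real yell.com page, and the second business name is made up.

```python
from bs4 import BeautifulSoup

# Assumed sample markup; only the class name comes from the question.
html = """
<div><span class="businessCapsule--name">PLUMBERS 4 LESS - Boiler Repair/Replacement, Plumbing &amp; Heating</span></div>
<div><span class="businessCapsule--name"> Example Plumbing Ltd </span></div>
"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("span", {"class": "businessCapsule--name"})

# Converting a Tag to a string keeps its markup; csv.writer stringifies
# non-string fields, which is how the tags reach the file.
with_markup = [str(tag) for tag in tags]

# get_text(strip=True) returns only the text and trims surrounding whitespace.
names = [tag.get_text(strip=True) for tag in tags]
```

`tag.text` as used in the answer gives the same text without trimming whitespace; `get_text(strip=True)` is just a convenience if the page pads the names with spaces or newlines.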

Djordje
