
I have a column of hyperlinks in a dataframe and I need to extract the text from each page for sentiment analysis. I can read the text from a single link, but I'm unable to proceed further: what I'm looking for is to loop over the process and append the results to a file.

import urllib.request
import csv
from bs4 import BeautifulSoup

quote_page = 'https://www.sec.gov/Archives/edgar/data/3662/0000950170-98-000413.txt'
page = urllib.request.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
name_box = soup.get_text()
print(name_box)
with open('index1.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows([name_box])

Now, when I executed this, I got a CSV, but the text was not split into rows. What should I do, and how do I do this for each link in the dataframe?

Has QUIT--Anony-Mousse
Mudit Gupta

1 Answer


We can write all the text into a .txt file for analysis:

import urllib.request
import csv
from bs4 import BeautifulSoup

quote_page = 'https://www.sec.gov/Archives/edgar/data/3662/0000950170-98-000413.txt'
page = urllib.request.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
the_text = soup.get_text()  # the page's full plain text as one string

with open('myfile1.txt', 'w+') as f:
    f.write(the_text)

To write it to a CSV instead, where each line of the text becomes its own row:

# if you really want to write it as a csv
with open('index1.csv', 'a+') as f:
    mydoc = csv.writer(f)
    for i in the_text.split('\n'):
        mydoc.writerow([i])
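To cover the second part of the question, the same steps can be wrapped in functions and run over every hyperlink in a dataframe column. This is a minimal sketch, assuming your column is named 'url' (adjust the name to match your dataframe):

```python
import csv
import urllib.request

from bs4 import BeautifulSoup


def page_to_rows(html):
    # Extract the page's plain text and turn each line into a one-column CSV row.
    soup = BeautifulSoup(html, 'html.parser')
    return [[line] for line in soup.get_text().split('\n')]


def scrape_links(urls, out_path):
    # Fetch every hyperlink and append its text to a single CSV file.
    with open(out_path, 'a', newline='') as f:
        writer = csv.writer(f)
        for url in urls:
            with urllib.request.urlopen(url) as page:
                writer.writerows(page_to_rows(page.read()))


# e.g. scrape_links(df['url'], 'index1.csv'), where df['url'] is your column of links
```

Iterating over a dataframe column directly (`df['url']`) yields each cell value, so no explicit index handling is needed.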
innicoder