Response [400] when using a file as input for parsing in Python

Question

The request succeeds (Response [200]) when I type the account name into the code by hand, but when I read the account names from a file instead, every request returns Response [400].

This is the code:

import requests
from bs4 import BeautifulSoup

def people_spider():
    file = "D:\OneDrive\Documents\GPIP\Files\scraping\idtwitter.csv"
    dataset = open(file, "r")
    for account in dataset:
        href = 'https://twitter.com/' + account
        get_single_item_data(href)

def get_single_item_data(item_url):
    source_code = requests.get(item_url)
    print(source_code)
    print(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, features='html.parser')
    for item_name in soup.findAll('p', {'dir': 'ltr'}):
        print(item_name.string)


people_spider()

The result is:

<Response [400]>
https://twitter.com/mr_adhani

<Response [400]>
https://twitter.com/RahayuNarti

<Response [400]>
https://twitter.com/AllMicroJobs

<Response [400]>
https://twitter.com/adibambang05

<Response [400]>
https://twitter.com/NatasyaRD1

<Response [400]>
https://twitter.com/arumyuniadis

<Response [400]>
https://twitter.com/harusan_osk

<Response [400]>
https://twitter.com/LailyFauziana

<Response [400]>
https://twitter.com/Dovia_Liata707

<Response [400]>
https://twitter.com/hapzah_putry

I have also tried changing the file extension, but that made no difference.

Response 400 corresponds to a bad HTTP request. You may want to check whether you're creating the right request object. Also, when you iterate over a file like this, Python won't remove the lingering newline character from the `account` variable. – Kartik Anand Dec 26 '18 at 05:11
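
The lingering newline mentioned in the comment above can be made visible with `repr()`. The following is a minimal sketch, not the OP's code: `io.StringIO` stands in for the CSV file (an assumption for illustration), and the two account names are taken from the output shown in the question.

```python
import io

# io.StringIO stands in for the opened CSV file (an assumption for illustration).
dataset = io.StringIO("mr_adhani\nRahayuNarti\n")

raw_urls = []
clean_urls = []
for account in dataset:
    # Iterating over a file yields each line including its trailing "\n".
    raw_urls.append('https://twitter.com/' + account)
    # strip() removes leading and trailing whitespace, including the newline.
    clean_urls.append('https://twitter.com/' + account.strip())

print(repr(raw_urls[0]))    # 'https://twitter.com/mr_adhani\n'
print(repr(clean_urls[0]))  # 'https://twitter.com/mr_adhani'
```

The same newline would also account for the blank line that follows each URL in the question's output, since `print(item_url)` prints the embedded newline and then adds its own.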

Reza Torkaman Ahmadi · Answer 1 · 2018-12-26T05:54:24.230

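The problem is that you are not stripping the `account` variable. Each line read from the file still ends with a newline character, so the newline becomes part of the URL and the request is rejected with a 400.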

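def people_spider():
    # Raw string so the backslashes in the Windows path are taken literally.
    file = r"D:\OneDrive\Documents\GPIP\Files\scraping\idtwitter.csv"
    # The with block closes the file once the loop is done.
    with open(file, "r") as dataset:
        for account in dataset:
            # strip() removes the trailing newline that each line carries.
            href = 'https://twitter.com/' + account.strip()
            get_single_item_data(href)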
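
If the file may also contain blank lines, stripping alone would turn them into a request for the bare `https://twitter.com/` URL. A possible way to guard against that is sketched below; `build_profile_urls` is a hypothetical helper name that does not appear in the original code, and the sample lines are made up for illustration.

```python
def build_profile_urls(lines, base='https://twitter.com/'):
    """Return one URL per non-empty line, with surrounding whitespace removed."""
    urls = []
    for line in lines:
        account = line.strip()
        if account:  # skip lines that are empty after stripping
            urls.append(base + account)
    return urls


# Sample input: a normal line, a blank line, and a line with stray spaces and "\r\n".
sample_lines = ["mr_adhani\n", "\n", "  RahayuNarti \r\n"]
print(build_profile_urls(sample_lines))
```

Because an open file is itself an iterable of lines, the file object could be passed to this helper directly, and each returned URL handed to `get_single_item_data`.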

edited Dec 26 '18 at 05:54

answered Dec 26 '18 at 05:11

Reza Torkaman Ahmadi


The way OP is reading lines also works -> https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects – Kartik Anand Dec 26 '18 at 05:14
sorry, the problem still goes on. – Samudra Ajri Kifli Dec 26 '18 at 05:36
I updated my answer. It's because you are not stripping `account`. I checked it, and it's working. – Reza Torkaman Ahmadi Dec 26 '18 at 05:55
