I'm getting the error below when parsing the page fetched from the URL in my code. The code tries to scrape data from the website and store it in variables, but parsing fails and I can't figure out why.

Error:

'NoneType' object has no attribute 'text'

My code:

from bs4 import BeautifulSoup
from Database import Insert_Table
import re
import requests
import time
import datetime
import csv

head = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:104.0) Gecko/20100101 Firefox/104.0 Chrome/92.0.4515.159 Safari/537.36'}
today = datetime.date.today()
table_name = 'dataset'
table_key = ['model', 'mileage', 'age','color', 'accident', 'owners','price']
csv_file = open(f'results-{today.strftime("%y-%d-%m")}.csv', 'w')
csv.writer = csv.writer(csv_file)
csv.writer.writerow(table_key)

i = 1
while i <= 312:
    if i==100 or i==200 or i==300:
        time.sleep(300)
    else:
        pass

    response = requests.get(f'https://www.truecar.com/used-cars-for-sale/listings/location-irving-tx/',headers=head, params={'page': i}, timeout=30)
    if response.ok == True:
        print(response.url)
        soup = BeautifulSoup(response.text, 'lxml')
        for post in soup.find_all('div', attrs={"data-test": "cardContent"}):
            try:
                heading = post.find('div', class_="vehicle-card-top")
                v_model = heading.find('span', class_="vehicle-header-make-model text-truncate")
                v_model = v_model.text
                v_year = heading.find('span', class_="vehicle-card-year font-size-1")
                v_age = today.year - int(v_year.text)
                v_mileage = post.find('div', attrs={"data-test": "vehicleMileage"})
                mileage = re.match(r'(.+)\smiles', v_mileage.text)
                mileage = int(mileage.group(1).replace(',', ''))
                v_color = post.find('div', attrs={"data-test": "vehicleCardColors"})
                v_color = re.match(r'(.+)\sexterior', v_color.text)
                v_color = v_color.group(1)
                v_condition = post.find('div', attrs={"data-test": "vehicleCardCondition"})
                condition = re.match(r'(.+)\saccident[s]?.*(.+)\sOwner[s]?', v_condition.text)
                if condition.group(1) == 'No':
                    accident = 0
                else:
                    accident = int(condition.group(1))
                owners = int(condition.group(2))
                v_price = post.find('div', attrs={"data-test": "vehicleListingPriceAmount"})
                price = re.match(r'\$(.+)', v_price.text)
                price = int(price.group(1).replace(',', ''))
                values = [v_model.casefold(), mileage, v_age, v_color, accident, owners, price]
                csv.writer.writerow(values)
                Insert_Table(table_name, table_key, values)
            except Exception as err:
                print(err)
        i += 1
        time.sleep(1)
    else:
        time.sleep(60)

csv_file.close()

thanks

  • could you include the part of the stacktrace that contains the line number where the error was thrown in your script? – tom Sep 12 '22 at 12:22
  • Does this answer your question? [Why do I get AttributeError: 'NoneType' object has no attribute 'something'?](https://stackoverflow.com/questions/8949252/why-do-i-get-attributeerror-nonetype-object-has-no-attribute-something) – SitiSchu Sep 12 '22 at 12:24
  • @tom unfortunately it only gives the error above, but I think it comes from the line `soup = BeautifulSoup(response.text, 'lxml')` – amirreza es Sep 12 '22 at 20:07
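
Only the short message appears because the `except` block calls `print(err)`, which discards the traceback. Below is a minimal sketch, assuming the same try/except shape as in the question, of capturing the full traceback so the failing line is visible. `parse_post`, `describe_failure` and the empty dict passed in are hypothetical stand-ins, not part of the original script.

```python
import traceback


def parse_post(post):
    """Hypothetical stand-in for the parsing done inside the loop."""
    node = post.get("heading")  # returns None when the key is missing
    return node.text            # raises AttributeError if node is None


def describe_failure(post):
    """Return the full traceback text instead of only the short message."""
    try:
        parse_post(post)
    except Exception:
        # format_exc() includes the file name and line number of the failure
        return traceback.format_exc()
    return ""


report = describe_failure({})
print(report)
```

Inside the original loop, calling `traceback.print_exc()` in place of `print(err)` would have the same effect and should point at the exact `.text` access that fails.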

1 Answer

In my opinion, you have made the task more complicated than it needs to be: it is much easier to get the information you need through the site's API. Here is an example for the first four pages:

import pandas as pd
import requests


cars = []
for page in range(1, 5):
    url = f"https://www.truecar.com/abp/api/vehicles/used/listings?city=irving&collapse=true&fallback=true&include_incentives=true&include_seo[]=canonicalize_body_styles&include_seo[]=years&include_seo[]=inventory_summary&include_seo[]=faqs&include_targeted_incentives=true&new_or_used=u&page={page}&per_page=30&search_event=true&sort[]=best_match&sponsored=true&state=tx"
    response = requests.get(url)
    for vehicle in response.json()['listings']:
        car = {
            "Make": vehicle['vehicle']['make'],
            "Model": vehicle['vehicle']['model'],
            "Mileage": vehicle['vehicle']['mileage'],
            "Year": vehicle['vehicle']['year'],
            "Exterior Color": vehicle['vehicle']['exterior_color'],
            "Interior Color": vehicle['vehicle']['interior_color'],
            "Owners": vehicle['vehicle']['condition_history']['ownerCount'],
            "Accident": vehicle['vehicle']['condition_history']['accidentCount'],
            "Price": vehicle['vehicle']['list_price']
        }
        cars.append(car)
df = pd.DataFrame(cars)
print(df)

You can also save the DataFrame to CSV with df.to_csv('filename.csv').

OUTPUT:

           Make             Model  Mileage  ...  Owners Accident    Price
0          Ford             F-150   147652  ...       2        0  16990.0
1          Ford  Super Duty F-250   113347  ...       1        0  24990.0
2          Ford             F-150   177517  ...       2        0   9990.0
3    Volkswagen             Jetta    36830  ...       1        0  20642.0
4        Toyota           Corolla    35289  ...       1        0  21994.0
..          ...               ...      ...  ...     ...      ...      ...
115      Nissan            Sentra    36376  ...       1        0  20425.0
116    Chrysler               300   123812  ...       2        1  18998.0
117        Ford            Escape    83506  ...       1        0  11999.0
118        Ford             F-150   142299  ...       2        0  21795.0
119     Hyundai           Elantra    55537  ...       1        0  18297.0
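
The long query string in the request above can also be assembled from a dict, which makes the page number and the other parameters easier to read and change. This is only a sketch using the standard library's `urlencode`. The helper name `listing_url` is hypothetical, the parameters are a subset copied from the URL above, and whether the endpoint accepts this reduced set is an assumption that has not been verified.

```python
from urllib.parse import urlencode

BASE = "https://www.truecar.com/abp/api/vehicles/used/listings"


def listing_url(page, per_page=30):
    """Build the listings URL for one page from a readable parameter dict."""
    params = {
        "city": "irving",
        "state": "tx",
        "new_or_used": "u",
        "page": page,
        "per_page": per_page,
        # With doseq=True a list value becomes one repeated key per item.
        "sort[]": ["best_match"],
    }
    return f"{BASE}?{urlencode(params, doseq=True)}"


print(listing_url(2))
```

`requests.get` also accepts such a dict directly through its `params` argument, as the question's own code already does for `page`.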
Sergey K
  • thanks, but I can't really figure out what's wrong with my code :( – amirreza es Sep 12 '22 at 20:08
  • @amirrezaes you are trying to read `.text` from an object that is None. Perhaps the page you are requesting does not exist, or you are temporarily blocked because you are not using the API, or the request times out before the page loads. Your extraction logic is also fragile: if even one field cannot be found, the exception skips the whole car and you get no information about it at all. – Sergey K Sep 13 '22 at 06:09
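
The point in the last comment can be sketched as follows: look up each field separately and fall back to None, so one missing element does not discard the whole record. To stay self-contained, the sketch uses the standard library's `xml.etree.ElementTree` as a stand-in for BeautifulSoup. Its `find` also returns None when nothing matches, which produces the same `'NoneType' object has no attribute 'text'` error. The sample markup and the helper names `text_or_none` and `parse_price` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second card has no <price> element.
SAMPLE = """
<cards>
  <card><model>Ford F-150</model><price>$16,990</price></card>
  <card><model>Toyota Corolla</model></card>
</cards>
"""


def text_or_none(parent, tag):
    """Return the stripped text of the first matching child, or None."""
    node = parent.find(tag)  # None when nothing matches
    if node is None or node.text is None:
        return None
    return node.text.strip()


def parse_price(raw):
    """Convert '$16,990' to 16990; pass None through unchanged."""
    if raw is None:
        return None
    return int(raw.lstrip("$").replace(",", ""))


rows = []
for card in ET.fromstring(SAMPLE).findall("card"):
    rows.append({
        "model": text_or_none(card, "model"),
        "price": parse_price(text_or_none(card, "price")),
    })

print(rows)
```

With BeautifulSoup the same idea would apply to each `post.find(...)` result in the question's loop: check for None before touching `.text`, and check each `re.match(...)` result before calling `.group()`.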