The goal is to write a CSV file for each Invoice from a webpage. I'm trying to do this with a web scraper, mainly using Selenium.
Each Invoice has its own number, date, close_date, amount, and list of Records.
Each Record in an Invoice has its own id, description, storage, weight, price, and quantity.
I was able to print all of the data I need to the console, like so:
Going to: https://thewebsite/thing/my_account.whatever?is=checkout#invoices/429807/paid-invoices
Extracting...
------------------------------
ID: 30795 Description: YOGURT, BLUEBERRY, LOW FAT, DANNON
Storage: 35 Degree Cooler Weight: 110 Price: $0.00 Quantity: 22
------------------------------
ID: 86546 Description: SWEET POTATOES, P/L
Storage: 55 Degree Cooler Weight: 240 Price: $0.00 Quantity: 6
------------------------------
ID: 36446 Description: PINEAPPLE, FRESH, P/L
Storage: 55 Degree Cooler Weight: 560 Price: $0.00 Quantity: 20
I did this with these classes:
class myRecord(object):
    # the constructor must be spelled __init__ (two underscores on each side)
    def __init__(self, id="", description="", storage="", weight="", price="", quantity=""):
        self.id = id
        self.description = description
        self.storage = storage
        self.weight = weight
        self.price = price
        self.quantity = quantity

class myInvoice(object):
    # default arguments let myInvoice() be created empty and filled in later
    def __init__(self, number="", date="", close_date="", amount="", formatted_records_list=None):
        self.number = number
        self.date = date
        self.close_date = close_date
        self.amount = amount
        # give each invoice its own list instead of sharing a mutable default
        if formatted_records_list is None:
            self.formatted_records_list = []
        else:
            self.formatted_records_list = formatted_records_list
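For reference, here is a minimal sketch of how plain objects like these could be written to CSV with Python's standard `csv` module. It assumes one file per invoice containing one row per Record, with the invoice's number, date, close_date, and amount repeated on every row; that flat layout, the `write_invoice_csv` helper, and the field-name lists are hypothetical choices for illustration. The classes are repeated in compact form so the block runs on its own.

```python
import csv


# Compact copies of the classes above so this sketch is self-contained.
class myRecord(object):
    def __init__(self, id="", description="", storage="", weight="", price="", quantity=""):
        self.id = id
        self.description = description
        self.storage = storage
        self.weight = weight
        self.price = price
        self.quantity = quantity


class myInvoice(object):
    def __init__(self, number="", date="", close_date="", amount="", formatted_records_list=None):
        self.number = number
        self.date = date
        self.close_date = close_date
        self.amount = amount
        self.formatted_records_list = [] if formatted_records_list is None else formatted_records_list


# Hypothetical column order for the CSV header.
INVOICE_FIELDS = ["number", "date", "close_date", "amount"]
RECORD_FIELDS = ["id", "description", "storage", "weight", "price", "quantity"]


def write_invoice_csv(invoice, path):
    """Write one invoice to its own CSV: one row per record, invoice fields repeated."""
    invoice_part = {name: getattr(invoice, name) for name in INVOICE_FIELDS}
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=INVOICE_FIELDS + RECORD_FIELDS)
        writer.writeheader()
        for record in invoice.formatted_records_list:
            # vars() returns the record's instance attributes as a dict keyed by name
            writer.writerow({**invoice_part, **vars(record)})
```

`csv.DictWriter` maps each dict onto columns by key, and the `csv` module automatically quotes values such as `YOGURT, BLUEBERRY, LOW FAT, DANNON` that contain commas.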
I assigned values to each attribute from HTML elements like this (I'll just use the "number" attribute as an example):
from selenium.webdriver.common.by import By

invoices = []
invoice_number_list = browser.find_elements(By.CLASS_NAME, "tranid")
# invoice_link_list holds the URL of each invoice; zip() pairs each link with its number element
for invoice_link, number_element in zip(invoice_link_list, invoice_number_list):
    invoice = myInvoice()
    invoice.number = number_element.get_attribute('innerHTML')
    invoices.append(invoice)
From what I've seen online, it's not super obvious how to make a CSV file out of the objects I used.
I found this question: Writing list of objects to csv file
The answer there basically says I should use namedtuples, which, as I understand it, are lightweight tuple-based objects whose fields can be read by name but not reassigned. With those, I should have an easier time making CSV files. So I made this:
from collections import namedtuple

Record = namedtuple('Record', ['id', 'description', 'storage', 'weight', 'price', 'quantity'])
Invoice = namedtuple('Invoice', ['number', 'date', 'close_date', 'amount', 'Record_list'])
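One reason namedtuples are convenient here: each one is a real tuple, so `csv.writer.writerows()` accepts a list of them directly, and the class's `_fields` attribute supplies a header row. A minimal sketch for Records alone (the `write_records_csv` name and the file path are hypothetical):

```python
import csv
from collections import namedtuple

Record = namedtuple('Record', ['id', 'description', 'storage', 'weight', 'price', 'quantity'])


def write_records_csv(records, path):
    """Write Record namedtuples to `path`, with the field names as the header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(Record._fields)  # ('id', 'description', 'storage', ...)
        writer.writerows(records)        # each namedtuple is already a sequence of values
```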
Already alarm bells are going off. Can I have a list of namedtuples be an attribute for a namedtuple? I need one csv file per Invoice. Each invoice has only one number, date, close_date, and amount. However, it can have a ton of Records. My thought process is telling me I need to have a list of Records attached to each Invoice.
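A namedtuple field can hold any Python object, including a list of other namedtuples. The Invoice tuple itself is immutable (its fields can't be reassigned), but the list stored in `Record_list` is an ordinary list and can still be appended to. A minimal sketch, assuming the same field names as above; the `invoice_rows` and `write_invoice_csv` helpers, the one-row-per-Record layout, and the sample values are hypothetical:

```python
import csv
from collections import namedtuple

Record = namedtuple('Record', ['id', 'description', 'storage', 'weight', 'price', 'quantity'])
Invoice = namedtuple('Invoice', ['number', 'date', 'close_date', 'amount', 'Record_list'])


def invoice_rows(invoice):
    """Yield a header, then one row per Record: the invoice's four scalar fields plus the record's fields."""
    yield Invoice._fields[:4] + Record._fields
    for record in invoice.Record_list:
        yield invoice[:4] + record  # slicing a namedtuple gives a plain tuple


def write_invoice_csv(invoice, path):
    """Write one invoice to its own CSV file."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(invoice_rows(invoice))


# Hypothetical values; Record_list starts empty and is filled as records are scraped.
invoice = Invoice('INV-1', '2020-01-01', '2020-01-31', '$0.00', [])
invoice.Record_list.append(Record('36446', 'PINEAPPLE, FRESH, P/L', '55 Degree Cooler', '560', '$0.00', '20'))
```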
I tried assigning values to an Invoice namedtuple and had trouble:
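from selenium.webdriver.common.by import By

invoice_list = []
invoice_number_list = browser.find_elements(By.CLASS_NAME, "tranid")
for invoice_link, number_element in zip(invoice_link_list, invoice_number_list):
    # Invoice.number = number_element.get_attribute('innerHTML')  # doesn't work: Invoice is the class, and namedtuple fields can't be reassigned
    # extend() adds each of the five fields to invoice_list as a separate item
    invoice_list.extend(Invoice._make((number_element.get_attribute('innerHTML'), None, None, None, None)))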
The other Invoice values (date, close_date, and amount) go into indices 1, 2, and 3. I leave index 4 as None, since that's where the list of Records for the Invoice should go.
The extend() call ends up adding each field of the Invoice to my list as a separate item (the number string followed by four Nones) instead of adding the Invoice itself. That flat list might be useful for building a dictionary (which I might need if I make a CSV the hard way; I think with the right namedtuple it's almost as easy as saying "write this data to a CSV"), but I need to be able to attach a list of Records to each individual Invoice, and I can't do that once the Invoice has been split into loose values.
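The difference comes from `list.extend()` versus `list.append()`: `extend()` iterates over its argument and adds each element, and since a namedtuple is iterable, its five fields land in the list one by one, while `append()` adds the Invoice as a single item. Also, because namedtuple instances are immutable, fields are normally filled in either when the tuple is created or with `_replace()`, which returns a new copy with the given fields changed. A minimal sketch (the sample values are hypothetical):

```python
from collections import namedtuple

Invoice = namedtuple('Invoice', ['number', 'date', 'close_date', 'amount', 'Record_list'])

invoice = Invoice._make(('INV-1', None, None, None, None))

flattened = []
flattened.extend(invoice)  # adds 'INV-1', None, None, None, None as five separate items

invoice_list = []
invoice_list.append(invoice)  # adds the whole Invoice as one item

# _replace() does not modify `invoice`; it returns a new Invoice with these fields filled in.
completed = invoice._replace(date='2020-01-01', Record_list=[])
invoice_list[0] = completed
```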
Here is what I think my options are:
- make Invoice a regular object and make Record a regular object -> make CSVs out of these
- make Invoice a namedtuple and Record a namedtuple -> make CSVs out of these
- make one a namedtuple and the other a regular object -> make CSVs out of these
Nothing is glaringly obvious to me at the moment.
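As one illustration of the third option, here is a minimal sketch with a mutable regular class for Invoice (so its fields can be filled in step by step while scraping, as in the `myInvoice()` loop above) holding immutable Record namedtuples. The `write_csv` method name, the `FIELDS` tuple, and the one-row-per-Record layout are hypothetical choices, not something dictated by the page or by any library:

```python
import csv
from collections import namedtuple

Record = namedtuple('Record', ['id', 'description', 'storage', 'weight', 'price', 'quantity'])


class myInvoice(object):
    # Order of the invoice-level columns in the CSV header
    FIELDS = ('number', 'date', 'close_date', 'amount')

    def __init__(self, number="", date="", close_date="", amount=""):
        self.number = number
        self.date = date
        self.close_date = close_date
        self.amount = amount
        self.formatted_records_list = []  # Record namedtuples for this invoice

    def write_csv(self, path):
        """Write this invoice to its own CSV file, one row per Record."""
        invoice_part = tuple(getattr(self, name) for name in self.FIELDS)
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(self.FIELDS + Record._fields)
            for record in self.formatted_records_list:
                writer.writerow(invoice_part + record)
```

With this split, a scraping loop can set `invoice.date`, `invoice.amount`, and so on as they are found and append `Record(...)` tuples to `invoice.formatted_records_list`; each Record's field order then doubles as its column order.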
TL;DR: I'm trying to figure out how to write CSV files from scraped data. Should I stick with trying to make CSVs out of namedtuples, or try to figure out how to do it with object attributes instead? How would I do either?