How to calculate difference between rows of csv file with python

Question

I am new to Python. I would like to compute the difference between two rows of a CSV file when they have the same id. The CSV dataset is built from an SQL table export that has more than 3 million rows.

This is an example of what my time series dataset looks like:

DATE -  Product ID - PRICE 

26/08  - 1 -  4
26/08 - 2 - 3
27/08 - 1 - 5
27/08 - 2 - 3

For instance, I would like to calculate the difference between the price of the product with id 1 on 26/08 and the price of the same product on the next day (27/08), to estimate the price's variation over time. I wonder what the best way is to manipulate this data and do these calculations in Python, whether with Python's csv module or with SQL queries in the code. I have also heard of the Pandas library... Thanks for your help!

This question appears to be off-topic because Stack Overflow is not a code-writing service. — chrisaycock, Aug 26 '14 at 16:57
You could start by checking out the python csv module https://docs.python.org/2/library/csv.html — asdf, Aug 26 '14 at 16:57
Could you please review the few edits I've made to your question? At the same time, you should try to improve it by taking the various comments above into account (BTW, are the "empty lines" part of your data?) — Sylvain Leroux, Aug 26 '14 at 17:19

gkusner · Answer 1 · 2014-08-26T17:30:21.937

0

Try building a dictionary keyed by product id, then analyze each id after loading:

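import csv

dd = {}
# Python 3 form; the original post used open('prod.csv', 'rb') for Python 2
with open('prod.csv', newline='') as csvf:
    csvr = csv.reader(csvf, delimiter='-')
    for row in csvr:
        # skip blank lines and the header row
        if len(row) == 0 or row[0].startswith('DATE'):
            continue
        # map product id -> list of (date, price) tuples
        dd.setdefault(int(row[1]), []).append((row[0].strip(), int(row[2])))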

dd

{1: [('26/08', 4), ('27/08', 5)], 
 2: [('26/08', 3), ('27/08', 3)]}

This will make it pretty easy to do comparisons.
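As a rough sketch of that comparison step: assuming `dd` has the shape shown above and each product's list is already in date order, consecutive entries can be paired up and subtracted. The function name `price_changes` is hypothetical and not part of the original answer.

```python
def price_changes(prices_by_id):
    """Return {product_id: [(date, change_vs_previous_entry), ...]}."""
    changes = {}
    for product_id, history in prices_by_id.items():
        # Pair each (date, price) entry with the one that follows it.
        changes[product_id] = [
            (date, price - prev_price)
            for (_, prev_price), (date, price) in zip(history, history[1:])
        ]
    return changes


# Same structure as the dictionary built above (assumed sorted by date).
dd = {1: [('26/08', 4), ('27/08', 5)],
      2: [('26/08', 3), ('27/08', 3)]}

result = price_changes(dd)
print(result)  # {1: [('27/08', 1)], 2: [('27/08', 0)]}
```

A product with only one row simply gets an empty list, since there is no earlier price to compare against.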

edited Aug 26 '14 at 17:30

answered Aug 26 '14 at 17:20


This is a great tip. I tried it on a small chunk of my dataset and it worked. But when I try it on my whole dataset, which has 3 million rows, Python gets stuck; I think it is a memory issue. – Augustin Lafanechère Aug 27 '14 at 11:51
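If memory is indeed the bottleneck, one possible workaround is to stream the file and keep only the last price seen for each product instead of every row. The sketch below assumes the rows are already sorted by date, that the file uses the `-` delimiter from the sample, and that dates contain no `-` themselves; the name `stream_price_changes` is hypothetical, and an in-memory sample stands in for the real file.

```python
import csv
import io


def stream_price_changes(lines):
    """Yield (date, product_id, change) while storing one price per product."""
    last_price = {}
    for row in csv.reader(lines, delimiter='-'):
        # Skip blank lines and the header row.
        if len(row) < 3 or row[0].strip().upper().startswith('DATE'):
            continue
        date, product_id, price = row[0].strip(), int(row[1]), int(row[2])
        if product_id in last_price:
            yield date, product_id, price - last_price[product_id]
        # Remember only the most recent price for this product.
        last_price[product_id] = price


# Stand-in for the real file; with a file, pass open('prod.csv', newline='').
sample = io.StringIO(
    "DATE -  Product ID - PRICE \n"
    "\n"
    "26/08  - 1 -  4\n"
    "26/08 - 2 - 3\n"
    "27/08 - 1 - 5\n"
    "27/08 - 2 - 3\n"
)

changes = list(stream_price_changes(sample))
print(changes)  # [('27/08', 1, 1), ('27/08', 2, 0)]
```

With this approach the dictionary grows with the number of distinct product ids rather than the number of rows, and each result can be written out as soon as it is produced.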
