Python : big csv file import

Question

I'm currently trying, without success, to import a big CSV dataset with Python. Basically, I've got a big CSV file of stock quotations (one column per stock, and for each stock another column that contains its dividends). I'm using the csv module, but I can't get a np.array whose columns are the stock quotations: Python creates the np.array by rows, and I would like it by columns. How can I do this?

Thanks for your help!

Python's `csv` module uses an iterator to read the data one row at a time. If you're trying to store an enormous dataset in a numpy array and that's failing, you may simply not have enough RAM. — David Cain, Jun 22 '12 at 08:49
What does your data look like? What did you try? How is it failing? — Shawn Chin, Jun 22 '12 at 08:59
You can create the numpy array by rows and then transpose it afterwards (`myarray = myarray.T`), or, if you're initializing the array with `np.array(columns)`, you can change it to `np.array(zip(*columns))`. — Lauritz V. Thaulow, Jun 22 '12 at 09:00
Does it work with a small CSV dataset? If yes, you should consider David's comment. — dilip kumbham, Jun 22 '12 at 09:01
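A minimal sketch of the transpose approach from Lauritz V. Thaulow's comment, assuming the file is entirely numeric with a single header row (the sample data, its column names `AAA`, `AAA_div`, … and the `load_by_column` helper are hypothetical). The `csv` module yields one list per row, so the array is built row-wise and then transposed. In Python 3 `zip` returns an iterator, so it has to be wrapped in `list()` before its result is used as a sequence (for example before passing it to `np.array`).

```python
import csv
import io

import numpy as np

# Hypothetical sample: one price column plus one dividend column per stock
SAMPLE = """AAA,AAA_div,BBB,BBB_div
10.0,0.0,20.0,0.5
10.5,0.1,19.5,0.0
11.0,0.0,21.0,0.5
"""

def load_by_column(f):
    """Read a numeric CSV with a header row; return (header, array)
    where array[i] holds the values of the i-th column of the file."""
    reader = csv.reader(f)
    header = next(reader)                        # first line holds the column names
    rows = [[float(v) for v in row] for row in reader]
    by_row = np.array(rows)                      # shape: (n_rows, n_columns)
    return header, by_row.T                      # transposed: (n_columns, n_rows)

header, cols = load_by_column(io.StringIO(SAMPLE))
print(header[0], cols[0])

# Pure-Python alternative: zip(*rows) regroups the row lists into column tuples
rows = [[1, 2], [3, 4], [5, 6]]
print(list(zip(*rows)))     # [(1, 3, 5), (2, 4, 6)]
```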

score 2 · Answer 1 · answered Jun 22 '12 at 09:55

I would recommend using the Pandas library. It also lets you read big CSV files in smaller chunks. Here's an example from the docs:

Data:

year indiv zit xit
0 1977 A 1.2 0.60
1 1977 B 1.5 0.50
2 1977 C 1.7 0.80
3 1978 A 0.2 0.06
4 1978 B 0.7 0.20
5 1978 C 0.8 0.30
6 1978 D 0.9 0.50

Specify chunk size (you get an iterable):

import pandas as pd

# chunksize makes read_table return an iterator of DataFrames, 4 rows each
reader = pd.read_table('tmp.sv', sep='|', chunksize=4)

for chunk in reader:
    print(chunk)

Output:

year indiv zit xit
0 1977 A 1.2 0.60
1 1977 B 1.5 0.50
2 1977 C 1.7 0.80
3 1978 A 0.2 0.06
year indiv zit xit
0 1978 B 0.7 0.2
1 1978 C 0.8 0.3
2 1978 D 0.9 0.5

NB! If you need to further manipulate your stock data, Pandas is the best way to go anyway.
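A sketch of how this could look for the stock file in the question, assuming a header row with one column per price series and one per dividend series (the sample data and the names `AAA`, `AAA_div`, … are hypothetical; `io.StringIO` stands in for the real file path). Each chunk is a `DataFrame`, and once the pieces are combined, a single stock's prices can be pulled out as a NumPy array with `df["AAA"].to_numpy()`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file
SAMPLE = """AAA,AAA_div,BBB,BBB_div
10.0,0.0,20.0,0.5
10.5,0.1,19.5,0.0
11.0,0.0,21.0,0.5
"""

# chunksize makes read_csv return an iterator of DataFrames instead of one frame
chunks = pd.read_csv(io.StringIO(SAMPLE), chunksize=2)

pieces = []
for chunk in chunks:
    # each chunk is a DataFrame of at most 2 rows; a memory-constrained job
    # could filter or aggregate it here before keeping it
    pieces.append(chunk)
df = pd.concat(pieces, ignore_index=True)

# Column access is natural in pandas: one NumPy array per stock
aaa_prices = df["AAA"].to_numpy()
price_columns = [c for c in df.columns if not c.endswith("_div")]
prices = df[price_columns].to_numpy().T     # shape: (n_stocks, n_rows)
print(price_columns, prices.shape)          # ['AAA', 'BBB'] (2, 3)
```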

score 0 · Answer 2 · answered Jun 22 '12 at 09:00

I have written a small function that takes the path of a CSV file, reads it, and returns a list of dicts in one go; you can then loop through the list very easily:

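import csv

def read_csv_data(path):
    """
    Reads the CSV file at the given path and returns a list of dicts,
    one per row, mapping each column name to that row's value.
    """
    # newline='' is what the csv docs recommend when opening files for csv
    with open(path, newline='') as f:
        data = csv.reader(f)
        # Read the column names from the first line of the file
        fields = next(data)
        data_lines = []
        for row in data:
            items = dict(zip(fields, row))
            data_lines.append(items)
    return data_lines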
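For comparison, the standard library's `csv.DictReader` performs the same header-to-value mapping as `read_csv_data`. Since the question asks for the data by column, the sketch below (the sample data and the `columns_from_csv` name are hypothetical) also pivots the rows into one list per column. Note that `csv` returns every value as a string, so numeric columns have to be converted explicitly; here every cell is assumed to be numeric.

```python
import csv
import io

# Hypothetical sample data in the question's layout
SAMPLE = """AAA,AAA_div,BBB,BBB_div
10.0,0.0,20.0,0.5
10.5,0.1,19.5,0.0
"""

def columns_from_csv(f):
    """Return a dict mapping each header name to the list of its values,
    converted to float (assumes every cell is numeric)."""
    reader = csv.DictReader(f)              # first row supplies the field names
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:                      # each row is a dict: name -> string
        for name, value in row.items():
            columns[name].append(float(value))
    return columns

cols = columns_from_csv(io.StringIO(SAMPLE))
print(cols["BBB"])   # [20.0, 19.5]
```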

Maybe this will help you.

Regards

Thanks for your answers. I still can't get what I want, so I'm going to be more precise. Here is what my database looks like: — marino89, Jun 22 '12 at 09:29

score 0 · Answer 3 · edited Dec 27 '22 at 21:00

What you are looking for are the `ndarray.shape` attribute and the `ndarray.reshape` method.


Otherwise, you can simply read the data the way you already are and then transpose it:

x = x.transpose()

where x is an ndarray.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.transpose.html
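A short sketch, assuming the file is purely numeric with one header line (the sample below is hypothetical). `np.loadtxt` reads it row-wise, `.transpose()` (equivalent to `.T` for a 2-D array) turns it into one row per column, and the `unpack=True` parameter asks `loadtxt` to return the transposed array directly. Note that `reshape` is not a substitute for a transpose: reshaping a (3, 2) array to (2, 3) keeps the elements in their original row-major order instead of swapping rows and columns.

```python
import io

import numpy as np

# Hypothetical sample: one stock and its dividend column
SAMPLE = """AAA,AAA_div
10.0,0.0
10.5,0.1
11.0,0.0
"""

# Row-oriented read: shape (3, 2), one row per line of the file
x = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1)

# Transpose: shape (2, 3), x_t[0] is the AAA column
x_t = x.transpose()

# unpack=True makes loadtxt return the transposed array directly
y = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1, unpack=True)
print(np.array_equal(x_t, y))                  # True

# reshape only reinterprets the same row-major data; it does not swap axes
print(np.array_equal(x.reshape(2, 3), x_t))    # False
```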

Small details like these are usually covered in the docs; I would suggest reading them carefully first.
