I am trying to split a CSV file containing stock data of 1500+ companies. The first column contains dates and subsequent columns contain company data.
Goal 1: I'm trying to split the huge CSV file into smaller pieces. Let's say 30 companies per smaller file. To do this, I need to split the CSV by column number, not rows. I've been looking up code snippets but I haven't found anything that does this exactly. Also, each separate file would need to contain the first column, i.e. the dates.
Goal 2: I want to make the company name a column of its own, the date a column of its own and the indicators columns of their own. So, I can call the data for a company as a single record (row) in Django - I don't need all the dates, just the last day of every quarter. Right now, I'm having to filter the data by date and indicator and set that as an object to display in my frontend.
If you have questions, just ask.
EDIT:
Following is some code I patched together.
import os
import csv
from math import floor
from datetime import datetime
import re
class SimFinDataset:
def __init__(self, dataFilePath, csvDelimiter = "semicolon"):
self.numIndicators = None
self.numCompanies = 1
# load data
self.loadData(dataFilePath, csvDelimiter)
def loadData(self, filePath, delimiter):
numRow = 0
delimiterChar = ";" if delimiter == "semicolon" else ","
csvfile = open(filePath, 'rb')
reader = csv.reader(csvfile, delimiter=delimiterChar, quotechar='"')
header = next(reader)
row_count = sum(1 for _ in reader)
csvfile.seek(0)
for row in reader:
numRow += 1
if numRow > 1 and numRow != row_count and numRow != row_count-1:
# company id row
if numRow == 2:
rowLen = len(row)
idVal = None
for index, columnVal in enumerate(row):
if index > 0:
if idVal is not None and idVal != columnVal:
self.numCompanies += 1
if self.numIndicators is None:
self.numIndicators = index - 1
if index + 1 == rowLen:
if self.numIndicators is None:
self.numIndicators = index
idVal = columnVal
if numRow > 2 and self.numIndicators is None:
return
else:
filename = 1
with open(str(filename) + '.csv', 'wb') as csvfile:
if self.numCompanies % 30 == 0:
print ("im working")
spamwriter = csv.writer(csvfile, delimiter=';')
spamwriter.writerow(header)
spamwriter.writerow(row)
filename += 1
#print (self.numIndicators)
dataset = SimFinDataset('new-data.csv','semicolon')