I have huge .CSV files with more than 3000 columns each that need to be loaded into a database. Because the table limit is 1024 columns, I want to split each .CSV file into multiple files of 1024 columns or fewer.
Here is what I have tried so far, with the help of previous questions on this topic.
Python:
import csv
import os

source = 'H:\\Programs\\Exploratory Items\\Outdated\\2018\\csvs\\'

for root, dirs, filenames in os.walk(source):
    for f in filenames:
        fullpath = os.path.join(source, f)
        # One output file per block of 1000 columns
        output_specifications = (
            (fullpath[:-4] + 'pt1.csv', slice(1000)),
            (fullpath[:-4] + 'pt2.csv', slice(1000, 2000)),
            (fullpath[:-4] + 'pt3.csv', slice(2000, 3000)),
            (fullpath[:-4] + 'pt4.csv', slice(3000, 4000)),
            (fullpath[:-4] + 'pt5.csv', slice(4000, 5000)),
        )
        output_row_writers = [
            (
                csv.writer(open(file_name, 'w', newline=''),
                           quoting=csv.QUOTE_MINIMAL).writerow,
                selector,
            )
            for file_name, selector in output_specifications
        ]
        # Read the source file once, writing each row's column slice
        # to the corresponding part file
        with open(fullpath, newline='') as infile:
            reader = csv.reader(infile)
            for row in reader:
                for row_writer, selector in output_row_writers:
                    row_writer(row[selector])
The issue with the Python code above is that it takes forever to split and write these files, because (as I understand it) it writes the output row by row. That is not ideal for my case, as I have more than 200 .CSV files with 1000+ rows in each.
What I am trying now:
- The POSIX cut command, but I use Windows, so I will have to try this on an Ubuntu machine.
What I want to try in R:
I have code that converts all my SPSS (.sav) files to .csv, and it works efficiently, so I would like to extend it at this stage so that it also splits each file by column into multiple .csvs (a rough sketch of what I am aiming for is below, after the loop).
setwd("H:\\Programs\\2018")
getwd()
list.files()

files <- list.files(path = '.', pattern = '.sav')
library(foreign)

for (f in files) {  # iterate over the SPSS files
  data <- read.spss(f, to.data.frame = TRUE, use.value.labels = FALSE)
  write.csv(data, paste0(strsplit(f, split = '.', fixed = T)[[1]][1], '.csv'))
}
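Something along these lines is what I have in mind: after converting each .sav file, split the resulting data frame into blocks of at most 1000 columns and write each block to its own .csv instead of one wide file. This is only a rough, untested sketch; the helper name write_column_chunks, the chunk size of 1000, and the "_ptN" naming are placeholders for whatever the database actually needs (e.g. 1024 minus any key columns).

write_column_chunks <- function(data, base_name, chunk_size = 1000) {
  # Split the column indices 1..ncol(data) into blocks of at most chunk_size
  n_cols <- ncol(data)
  starts <- seq(1, n_cols, by = chunk_size)
  for (i in seq_along(starts)) {
    cols <- starts[i]:min(starts[i] + chunk_size - 1, n_cols)
    # Write this block of columns to its own file, e.g. "myfile_pt1.csv";
    # row.names = FALSE keeps the column count exact for the database load
    out_file <- paste0(base_name, '_pt', i, '.csv')
    write.csv(data[, cols, drop = FALSE], out_file, row.names = FALSE)
  }
}

for (f in files) {
  data <- read.spss(f, to.data.frame = TRUE, use.value.labels = FALSE)
  base_name <- strsplit(f, split = '.', fixed = T)[[1]][1]
  write_column_chunks(data, base_name, chunk_size = 1000)
}

If this works, I assume the same helper could also be pointed at the 200 .csv files I already have (read each one with read.csv and pass the data frame to it), which would replace the row-by-row Python approach above.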
Thank you
References: Python code ref