I have been trying to import a huge .csv file in chunks, applying filters as I go, but my code only reads part of the file (20 million rows out of 45 million). I also already tried data.table, but without success.
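For reference, this sketch is roughly what I was trying to do with data.table (not my exact code; the chunk size and output name are just placeholders):

library(data.table)

# Rough sketch of a chunked fread loop (not my exact code).
# skip/nrows make fread re-scan the file from the top on every call, so it slows down as it goes.
arquivo <- "cnpj_dados_cadastrais_pj.csv"
tam_chunk <- 500000
cabecalho <- names(fread(arquivo, sep = "#", nrows = 0))  # read only the header row
pulo <- 1                                                 # lines already consumed (the header)
repeat {
  dt <- fread(arquivo, sep = "#", header = FALSE, col.names = cabecalho,
              skip = pulo, nrows = tam_chunk)
  dt_filtrado <- dt[codigo_natureza_juridica %in% c("2143", "2330")]
  fwrite(dt_filtrado, "cnpj_filtrado_coop.csv", sep = "#", append = TRUE)  # append each filtered chunk
  pulo <- pulo + nrow(dt)
  if (nrow(dt) < tam_chunk) break  # last (partial) chunk reached; would fail if the row count were an exact multiple of tam_chunk
}

My actual attempt, with read.csv on an open connection, is below: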
library(dplyr)

# Open a connection so each read.csv call continues where the previous one stopped
arq_grande <- file("cnpj_dados_cadastrais_pj.csv", "r")
tam_chunk <- 5000
# Read the first 10 rows with the header to capture the column names
df1 <- read.csv(arq_grande, nrows = 10, header = TRUE, sep = "#", dec = ".")
for (i in 1:ncol(df1)) { df1[, i] <- iconv(df1[, i], from = "UTF-8", to = "latin1") }
# use %in% (not ==) so rows matching either code are kept
df_filtrado <- df1 %>% filter(codigo_natureza_juridica %in% c("2143", "2330")) %>% select(cnpj, everything())
write.table(df_filtrado, "/cnpj_dados_cadastrais_pj_filtrado_coop.csv", row.names = FALSE, sep = "#", dec = ".")
# Column names from the first read are reused below; the chunks have no header line
names(df1)
totalRows <- 0
repeat {
  # Read the next tam_chunk rows from the open connection
  df <- read.csv(arq_grande, header = FALSE, sep = "#", col.names = names(df1), nrows = tam_chunk)
  for (i in 1:ncol(df)) { df[, i] <- iconv(df[, i], from = "UTF-8", to = "latin1") }
  nRow <- nrow(df)
  totalRows <- totalRows + nRow
  cat("Read", nRow, "rows, total read", totalRows, "\n")
  if (nRow == 0)
    break
  df_filtrado <- df %>% filter(codigo_natureza_juridica %in% c("2143", "2330")) %>% select(cnpj, everything())
  write.table(df_filtrado, "/cnpj_dados_cadastrais_pj_filtrado_coop.csv", append = TRUE, col.names = FALSE, row.names = FALSE, sep = "#", dec = ".")
}
close(arq_grande)
I have looked at other examples here, but nothing worked. Sorry, I'm new to this kind of data.
I just want to read all the lines of my .csv.
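One thing I still need to do is confirm the raw line count of the file; I assume something along these lines would work without loading everything into memory at once (the block size is arbitrary):

# Count the physical lines of the file in blocks
con <- file("cnpj_dados_cadastrais_pj.csv", "r")
total_linhas <- 0
repeat {
  bloco <- readLines(con, n = 1000000)
  if (length(bloco) == 0) break
  total_linhas <- total_linhas + length(bloco)
}
close(con)
total_linhas  # expected: 45 million data rows plus the header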