I have a large data frame, and a for loop over it is taking too long to run. I've tried removing all the computations to time the bare loop, but the code is still slow. I'm new to R, but I think there should be a better way to write this loop. Any guidance would be appreciated.
My data frame has 2,772,807 observations of 6 variables.
Simplified code (still takes a long time):
library("readr")   # provides read_delim()
library("tictoc")  # provides tic() and toc()
tic()
dataFlights <- read_delim("U.S._DOT_O&D_Monthly_Traffic_Report.tsv",
                          "\t", escape_double = FALSE, trim_ws = TRUE)
# Add an empty 7th column to hold the result
dataFlights["Connections"] <- ""
pb <- txtProgressBar(min = 0, max = nrow(dataFlights), style = 3)
# Assign a constant to every row, one row at a time
for (row in 1:nrow(dataFlights)) {
  dataFlights[row, 7] <- 1
  setTxtProgressBar(pb, row)
}
close(pb)
toc()
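For comparison, the simplified loop only writes the same constant into every row, which R can do in a single whole-column assignment. The following is a minimal sketch on a toy data frame; the name `flights` and its `id` column are made up for illustration and are not part of the real data set.

```r
# Toy stand-in for dataFlights (hypothetical data, 5 rows)
flights <- data.frame(id = 1:5)

# One vectorised assignment fills the whole column; no loop, no progress bar
flights$Connections <- 1

print(flights)
```

The per-row version calls the data frame assignment method and updates the progress bar once for each of the ~2.7 million rows, so the loop overhead is likely to dominate even when no real computation is done.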
Original Code:
# Reads DOT public flight information for 2017 & 2018
# and computes the number of connections per route
# (Cp#1 or Cp#2) into a new column.
# Possible results: 0, 1, or 2 connections.
library("readr")   # provides read_delim()
library("tictoc")  # provides tic() and toc()
tic()
dataFlights <- read_delim("U.S._DOT_O&D_Monthly_Traffic_Report.tsv",
                          "\t", escape_double = FALSE, trim_ws = TRUE)
# Add an empty 7th column to hold the result
dataFlights["Connections"] <- ""
pb <- txtProgressBar(min = 0, max = nrow(dataFlights), style = 3)
for (row in 1:nrow(dataFlights)) {
  # Columns 2 and 3 hold the connection points; NA means no connection
  if (is.na(dataFlights[row, 2]) & is.na(dataFlights[row, 3])) {
    dataFlights[row, 7] <- 0  # both missing: direct route
  } else if (is.na(dataFlights[row, 2]) | is.na(dataFlights[row, 3])) {
    dataFlights[row, 7] <- 1  # exactly one missing
  } else {
    dataFlights[row, 7] <- 2  # both present
  }
  setTxtProgressBar(pb, row)
}
close(pb)
toc()
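The if/else chain above amounts to counting how many of the two connection columns are not NA, so it can be expressed without a loop. Here is a minimal sketch on toy data, assuming columns 2 and 3 hold the two connection points; the column names `Cp1` and `Cp2` and the airport codes are hypothetical placeholders, not taken from the real file.

```r
# Toy stand-in for dataFlights; Cp1 and Cp2 are hypothetical names
# for the connection-point columns (columns 2 and 3 in the original)
flights <- data.frame(
  Origin = c("JFK", "LAX", "ORD", "SEA"),
  Cp1    = c(NA, "ATL", NA, "DEN"),
  Cp2    = c(NA, NA, "DFW", "PHX"),
  stringsAsFactors = FALSE
)

# Each non-missing connection point counts as one connection:
# both NA -> 0, exactly one NA -> 1, neither NA -> 2
flights$Connections <- as.integer(!is.na(flights$Cp1)) +
  as.integer(!is.na(flights$Cp2))

print(flights)
```

Because `is.na()` works on a whole column at once, this computes every row in one pass. On the real data the same idea could be written with positions, for example `!is.na(dataFlights[[2]])`, if the column names are awkward to type.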