I replaced Ubuntu 20.04 with Fedora 37 on my laptop (clean install, 16 GB RAM) to follow my lab's standard. Curiously, readr
can no longer read a 6.7 GB CSV file on this setup: calling read_csv() on it crashes RStudio. The same call
worked on Ubuntu. What could explain this?
library(archive)

# Download the archive once; skip the download if the zip is already present
url <- "https://www.usitc.gov/data/gravity/itpd_e/itpd_e_r02.zip"
zip <- basename(url)
if (!file.exists(zip)) {
  try(download.file(url, zip, method = "wget", quiet = TRUE))
}

# Extract unless exactly one matching CSV is already in the working directory
if (length(list.files(getwd(), pattern = "ITPD_E_R02\\.csv")) != 1) {
  archive_extract(zip, dir = getwd())
}

# this will crash RStudio
# trade <- readr::read_csv("/ITPD_E_R02.csv")

# this won't
trade <- data.table::fread("/ITPD_E_R02.csv")
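To narrow down where read_csv() fails, a few variations may be worth trying. The sketch below is not a diagnosis, since nothing above establishes the cause. It only exercises readr options that change how a file is loaded: a partial read, an eager single-threaded read, a chunked read, and the first-edition parser. The temporary file and the names csv_path, head_rows, single_thread, row_counts and legacy are stand-ins so the sketch runs anywhere; the real CSV path would replace csv_path.

```r
library(readr)

# Hypothetical stand-in file (1000 rows) so the sketch runs without the 6.7 GB CSV
csv_path <- tempfile(fileext = ".csv")
write_csv(data.frame(id = 1:1000, value = rnorm(1000)), csv_path)

# 1. Read only the first rows: checks that parsing works at all
head_rows <- read_csv(csv_path, n_max = 100, show_col_types = FALSE)

# 2. Read eagerly on one thread: takes lazy reading and multithreading
#    out of the picture
single_thread <- read_csv(
  csv_path,
  lazy = FALSE,
  num_threads = 1,
  show_col_types = FALSE
)

# 3. Read in chunks: each chunk is summarised and then discarded, so the
#    whole file is never held as a single data frame
row_counts <- read_csv_chunked(
  csv_path,
  callback = DataFrameCallback$new(function(chunk, pos) {
    data.frame(start_row = pos, rows = nrow(chunk))
  }),
  chunk_size = 250,
  col_types = cols()
)

# 4. Use the first-edition (pre-2.0) parser instead of the default one
legacy <- with_edition(1, read_csv(csv_path, col_types = cols()))
```

If one of these variants reads the full file while the default call crashes, that would point at the option that differs, which is useful detail for a bug report.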
Free memory:
$ free -m
               total        used        free      shared  buff/cache   available
Mem:           15699        4332        1106        1032       10259        9957
Swap:           8191           9        8182
Session info:
sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 37 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libflexiblas.so.3.3

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] data.table_1.14.8 readr_2.1.4       archive_1.1.5

loaded via a namespace (and not attached):
 [1] fansi_1.0.4       tzdb_0.3.0        utf8_1.2.3        R6_2.5.1          lifecycle_1.0.3
 [6] magrittr_2.0.3    pillar_1.8.1      rlang_1.0.6       cli_3.6.0         rstudioapi_0.14
[11] ellipsis_0.3.2    vctrs_0.5.2       tools_4.2.2       glue_1.6.2        hms_1.1.2
[16] compiler_4.2.2    pkgconfig_2.0.3   CoprManager_0.5.0 tibble_3.1.8