I have a df with multiple variables, some of which are very long strings of up to 4500 characters. I would like to export this data frame as a .dta file.
I tried to save it using haven's write_dta() function, but I get the following error message: Error in write_dta_(data, normalizePath(path, mustWork = FALSE), version = stata_file_format(version), : Writing failure: A provided string value was longer than the available storage size of the specified column.
Here is an example of the issue:
library(haven)
longFun <- function(n) {
do.call(paste0, replicate(5000, sample(LETTERS, n, TRUE), FALSE))
}
longString <- data.frame(VeryveryveryveryveryveryveryveryveryveryVeryveryveryveryveryveryveryveryveryverylongname = longFun(1), stringsAsFactors = FALSE)
write_dta(longString, "tst.dta")
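To narrow down which columns trigger the error, a quick base R check along these lines may help. It is only a sketch: the 2045-byte limit for fixed-width Stata strings is my assumption about where the failure starts, and the names df, long_text, short_text and str_limit are made up for illustration.

```r
# Illustrative data: one 5000-character column and one short column
# (column names are hypothetical, not from my real dataset).
df <- data.frame(
  long_text  = paste0(sample(LETTERS, 5000, TRUE), collapse = ""),
  short_text = "abc",
  stringsAsFactors = FALSE
)

# Assumed maximum width of a fixed-width Stata string column (str1..str2045).
str_limit <- 2045L

# Longest value, in bytes, of every character column.
is_chr <- vapply(df, is.character, logical(1))
max_bytes <- vapply(
  df[is_chr],
  function(x) max(nchar(x, type = "bytes"), 0L),
  integer(1)
)

# Columns whose longest value exceeds the assumed limit.
too_long <- names(max_bytes)[max_bytes > str_limit]
print(too_long)
```

In my real data this would tell me how many variables are affected, which matters for judging whether a per-column workaround is practical.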
I am aware that write_dta has issues handling long strings (https://github.com/tidyverse/haven/issues/437) and that one possibility is to trim the strings (see the question "Error in write_dta : A provided string value was longer than the available storage size of the specified column"). However, it is essential for me to keep the full strings.
Is there any way to save variables with long strings as .dta files using R?
Edit:
I have tried the readstata13::save.dta13 option suggested by @jay.sf, but it has two issues: 1) It cannot handle variable names longer than 32 characters (it truncates them), whereas write_dta() manages them well. 2) It is significantly slower than write_dta(). Since I have to save a very large dataset, this is a relevant concern.
In sum, is there any other approach that would allow me to:
a) save a df with very long strings as .dta,
b) retain the original variable names (longer than 32 characters), and
c) do this reasonably fast?