The R package XLSX is converting entire column to string or boolean when one cell is not numeric

Question

I am using a shiney interface under R to read in a CSV file and load it into one sheet of an excel xlsm file. The file then allows user input and preforms calculations based on VBA macros.

The R xlsx package is working well for preserving the VBA and formatting in the original excel sheet. However some of the data is being converted to a different data type than intended. For example a cell containing the string "F" is causing the column containing it to be converted to type boolean, or a miss-entered number in one cell is causing the entire column to be converted to string.

Can this behavior be controlled so that, for example, cells with valid numbers are not converted to string type? Is there a work-around? Or can someone just help me to understand what is happening in the guts of the package to cause this effect so I can try to find a way around it?

Here are the calls in question:

#excelType() points to an excel xlsm template

data = read.csv("results.csv")
excelForm = loadWorkbook(excelType())
sheets = getSheets(excelForm)
addDataFrame(data, sheets[[1]], col.names = FALSE, row.names = FALSE, startRow=2, colStyle = NULL)
saveWorkbook(excelForm, "results.xlsm")

Thanks!

Not the answer you want, but that package [has not been updated in over three years](https://cran.r-project.org/web/packages/xlsx/index.html). Depending on your needs, I recommend [`openxlsx`](https://cran.r-project.org/web/packages/openxlsx/index.html) or [`readxl`](https://cran.r-project.org/web/packages/readxl/index.html). — r2evans, Oct 19 '17 at 23:54
Unfortunately no: if there are strings in a column, `xlsx` package (or any R package) will convert a column to a string type. You'll need to clean it up post import. See this general question about importing messy data from Excel to R: https://stackoverflow.com/questions/43242467/reading-excel-in-r-how-to-find-the-start-cell-in-messy-spreadsheets — lebelinoz, Oct 19 '17 at 23:55
In this case, the `F` is being interpreted not as a string but as the R-standard shortcut for `FALSE` (as you are likely aware). Typically the reading functions let you define the column classes manually, so if you know ahead of time, perhaps you can do that. — r2evans, Oct 19 '17 at 23:56
I've always used this function but yesterday, for the first time, I had the same issue... a column with 9 rows (e.g. 1,2,3,4,5,6,7,8,9), was converted to only one string "c(1,2,3,4,5,6,7,8,9)" in one row... couldn't solve that... I just saved as RData to solve latter — Thai, Oct 20 '17 at 00:56

score 0 · Answer 1 · answered Oct 22 '17 at 22:55

I hope this is the correct protocol for explaining the outcome which worked for me. I hope it will be of help to others if they end up doing something similar, though the solution is not very elegant!

I tried r2evans's suggestion of forcing column types I could not get that to work in this case. Using readxls gave the same problem, and also broke my VBA. Given lebelionz's comment suggesting that this is an R thing and not a package thing I followed his advice to deal with it after the fact. (I do not see how to credit a comment rather than an answer, but for the record this was very helpful, as were the others).

I therefore altered the program producing the CSV that was being loaded through R. I appended "::" to each cell produced, so that R saw all cells as strings, regardless of the original content. Thus "F" was stored as "::F", and therefore was not altered by R.

I added an autorun macro to the excel sheet thus created, so that when opened it automatically performed a global search and replace to remove the prefix "::" from the whole of the data. This forces Excel to choose a data type for each cell after it was restored, resulting in the types being detected cell by cell and in the correct format for my purposes.

It feels kludgy, but it works and is relatively transparent to the user. One hazard is that if the user data intentionally contained the string "::" it would be lost (I am confident this cannot arise in my particular application, but if someone would like to suggest a better prefix I would be interested). I still hope for an eventual solution rather than a work-around.

And here I thought it was only the movie industry that had to "fix it in post"!

The R package XLSX is converting entire column to string or boolean when one cell is not numeric

1 Answers1