0

How can I insert a blank column into an existing Excel file while retaining the data in that column?

I have time series data across multiple columns (one year per column) with summary data at the end (Min, Max, Percentiles).

I want to use R to insert a column that contains a new year's data, but the code I have currently pastes on top of the summary data.

Code has been generalized below:

# Add necessary packages
library(rJava)
library(xlsx)
library(xlsxjars)

# Import data. Assume 1 column of data
df <- read.csv("file.csv", header = TRUE)

# Create a workbook using library(xlsx) function
workbook <- loadWorkbook("existing_workbook.xlsx")

# Import sheet names from above workbook
sheets <- getSheets(workbook)

# Add the created dataframe into the workbook.
# Assume the workbook has 100 existing columns and data is to be added to column 99
addDataFrame(df, sheets$correct_sheet, startColumn = 99, row.names = FALSE)

#Save the created workbook
saveWorkbook(workbook, "R_Output.xlsx")

What this ultimately does is overwrite the data in column 99, which I would prefer be shifted one column to the right.

Is this possible?

AZaremba
  • 3
  • 2

1 Answers1

0

You can do this fairly simply. The idea is to make a copy of column 99 in column 100, then write df into column 99. Since you do not provide data, I make a small reproducible example. My existing_workbook.xlsx just has two columns. I will move column 2 to column 3, then add the data from file.csv into column 2. The updated database will have the new column "inserted" before column 2.

First some data

library(xlsx)

## Create some reproducible data for testing
write.csv(1:150, "file.csv", row.names=FALSE) 
df = data.frame(X = seq(2,300,2), Y=rep(LETTERS, length.out=150))
wb <- xlsx::createWorkbook()
sheet1 <- xlsx::createSheet(wb, sheetName='test')
addDataFrame(df, sheet1, col.names=TRUE, row.names=FALSE)
saveWorkbook(wb, 'existing_workbook.xlsx')

Now we have the files file.csv and existing_workbook.xlsx like in your code. My code starts out like yours to load the data.

## Your input statements
df <- read.csv("file.csv", header = TRUE)
workbook <- loadWorkbook("existing_workbook.xlsx")
sheets <- getSheets(workbook)

# Import sheet names from above workbook
sheets <- getSheets(workbook)

Now I copy column 2 into column 3 to preserve it.

## Move a copy of column 2 to column 3
columnToPreserve = readColumns(sheets$test, 2, 2, startRow=1)
addDataFrame(columnToPreserve, sheets$test, 
    startColumn = 3, row.names = FALSE)

Now we can write df into column 2 without losing the old column 2.

## Now we can write df into column 2
addDataFrame(df, sheets$test, startColumn = 2, row.names = FALSE)

## Save elsewhere to make sure it was right
saveWorkbook(workbook, 'updated_workbook.xlsx')

If you open up updated_workbook.xlsx in excel, you should see the inserted column.

G5W
  • 36,531
  • 10
  • 47
  • 80
  • Thanks G5W, I attempted this solution with my own data and kept receiving an error. I then attempted to use your code with the data you generated on the fly and receive the same error. See below: `Error in .jcall(Rintf, "[S", "readRowStrings", .jcast(sheet, "org/apache/poi/ss/usermodel/Sheet"), : java.lang.NoSuchFieldError: CREATE_NULL_AS_BLANK Error in readColumns(sheets$test, 2, 2, startRow = 1) : Cannot read the header. The header row doesn't have all requested cells non-blank!` – AZaremba Feb 25 '19 at 17:55
  • As a side note, over the weekend I discovered the XLConnect package and attempted this same method; copy the desired cells into a data frame, paste the new data, paste the data frame one column to the right. This works for all but one of the summary cells (Standard Deviation). If you care to read it, I posted my steps on the Github page for XLConnect [here](https://github.com/miraisolutions/xlconnect/issues/86) – AZaremba Feb 25 '19 at 18:06
  • Odd. It worked for me. I even ran the code starting from a fresh R session and it worked. Anyway, I am glad that you found a solution. – G5W Feb 25 '19 at 19:54