0

I have a excel file which has multiple sheets embedded in it. My main goal is to basically remove all rows which are appearing multiple times in a single sheet and have to do this for every sheet. I have written the code below but the code is only reading the first sheet and also giving ' ...' in first row and column. Can someone help me out where I might be going wrong. Thank you in advanced

**config_file_name <- '/RBIAPI3tables.xlsx'
config_xl <- paste(currentPath,config_file_name,sep="")
config_xl_sheets_name <- excel_sheets(path = config_xl) # An array of sheets is created. To access the array use config_xl_sheets[1] 
count_of_xl_sheets <- length(config_xl_sheets_name) 
# Read all sheets in the file as separate lists
list_all_sheets <- lapply(config_xl_sheets_name, function(x) read_excel(path = config_xl, sheet = x))
names (list_all_sheets) <- config_xl_sheets_name # Change the name of all the lists to excel file sheets name
count_of_list_all_sheets <- length(list_all_sheets) # to get the data frame of each list use list_all_sheets[[Config]]
# Create data frame for each sheet Assign the sheet name to the data frame
for (i in 1:count_of_list_all_sheets)
{
  assign(x= trimws(config_xl_sheets_name[i]), value = data.frame(list_all_sheets[[i]]))
  updateddata = unique(list_all_sheets[[i]])
}
write.xlsx(updateddata,"Unique3tables.xlsx",showNA = FALSE)**
Manu
  • 29
  • 6

1 Answers1

2

this is my approach

library(readxl)
library(data.table)
library(openxlsx)
file.to.read   <- "./testdata.xlsx"
sheets.to.read <- readxl::excel_sheets(file.to.read)

# read sheets from the file to a list and remove duplicate rows
L <- lapply(sheets.to.read, function(x) {
  data <- setDT(readxl::read_excel(file.to.read, sheet = x))
  #remove puplicates
  data[!duplicated(data), ]
  })

# create a new workbook
wb <- createWorkbook()
# create new worksheets an write to them
for (i in seq.int(L)) {
  addWorksheet(wb, sheets.to.read[i])
  writeData(wb, i, L[[i]] )
}
# write the workbook to disk
saveWorkbook(wb, "testdata_new.xlsx")
Wimpel
  • 26,031
  • 1
  • 20
  • 37
  • Hi. The code is working absolutely fine. I have 2 doubts. 1) In output the numbers are stored as text hence getting ! mark in each cell so have to go & click on convert into number. Can it be resolved within this code? 2. In excel I have page number at top right corner in each sheet in 1 cell however after the code is running that page number gets shifted to left most cell and rest of the cells automatically gets filled with numbers ..2 ..3 ...4 and so on. Can I remove entirely that row from my excel in code something by specifying that I want code to delete the very first row from each sheet? – Manu Nov 16 '21 at 13:25
  • probably... make sure to check all possible arguments in the help-file of the `read_excel()` function – Wimpel Nov 16 '21 at 13:28