I have five excel files with 2 sheets per file.
-file_2015: a, b
-file_2016: a, b
-file_2017: a, b
-file_2018: a, b
Both sheets a and b provide the same data over time. They vary in the number of columns because new indicators are added to newer files that do not exist in older ones.
The column names change slightly and I would like to consolidate this data by appending it together.
Examples of variable names that are similar:
-Metric Rating - Unit level, Percentage Earning 5 Credit in Third Year
-Metric Rating - 5 Credits in 3rd Year - Unit level
The other problems I am running into is the fact that the same indicators from different files have been named with the corresponding year. For example:
-Target For 2016 - Grad Rate, 6-Year
-Target For 2017 - 6-Year Grad Rate
Although they are the same data indicator, it is producing two columns when appending data.
Currently, I am using this function to read in all of the excel files and append each sheet data together as one dataframe.
df<-list.files(path = filepath_raw_data, full.names = TRUE, pattern = "*.xlsx") %>%
#Keetping the names of the filepath as a data columns
set_names(nm = (basename(.) %>% tools::file_path_sans_ext())) %>%
map_df(readxl::read_excel, sheet=sheet_name, col_types = "text", .id="file_name")
Is there any way I can use pattern matching to match similar columns names together and append the data together?