0

I have 16 large datasets of landcover variables around routes. Example dataset "Trial1":

RtNo     TYPE    CA      PLAND   NP      PD      LPI     TE 
2001     cls_11     996.57  6.4297  22  0.1419  6.3055  31080
2010     cls_11     56.34   0.3654  23  0.1492  0.1669  15480
18003    cls_11     141.12  0.9899  37  0.2596  0.1503  38700
18014    cls_11     797.58  5.3499  47  0.3153  1.3969  98310
2001     cls_21     1514.97 9.7744  592 3.8195  0.8443  761670
2010     cls_21     638.55  4.1414  95  0.6161  0.7489  463260
18003    cls_21     904.68  6.3463  612 4.2931  0.8769  549780
18014    cls_21     1189.89 7.9814  759 5.0911  0.4123  769650
2001     cls_22     732.33  4.7249  653 4.2131  0.7212  377430
2010     cls_22     32.31   0.2096  168 1.0896  0.0198  31470
18003    cls_22     275.85  1.9351  781 5.4787  0.0423  237390
18014    cls_22     469.44  3.1488  104 6.7345  0.1014  377580

I want to first select rows that meet a condition, for example, all rows in column "TYPE" that is cls_21. I know the following code does this work:

Trial21 <-subset(Trial1, TYPE==" cls_21 ")

(yes the invisible space before and after the categorical variable caused me a considerable headache). And there are several other ways of doing this as shown in [https://stackoverflow.com/questions/5391124/select-rows-of-a-matrix-that-meet-a-condition]

I get the following output (sorry this one has extra columns, but shouldn't affect my question):

    RtNo    TYPE    CA     PLAND     NP  PD    LPI     TE       ED      LSI
2   18003   cls_21  904.68  6.3463  612 4.2931  0.8769  549780  38.5668 46.1194
18  18014   cls_21  1189.89 7.9814  759 5.0911  0.4123  769650  51.6255 56.2522
34  2001    cls_21  1514.97 9.7744  592 3.8195  0.8443  761670  49.1418 49.3462
50  2010    cls_21  638.55  4.1414  95  0.6161  0.7489  463260  30.0457 46.0118
62  2020    cls_21  625.5   4.1165  180 1.1846  0.5064  384840  25.3268 38.6407
85  2021    cls_21  503.55  2.7926  214 1.1868  0.1178  348330  19.3175 38.9267

I want to rename the columns in this subset so they uniquely identify the class by adding "L21" at the back of existing column names, and I can do this using

library(data.table)
setnames(Trial21, old = c('CA', 'PLAND', 'NP', 'PD', 'LPI', 'TE', 'ED', 'LSI'), 
         new = c('CAL21', 'PLANDL21', 'NPL21', 'PDL21', 'LPIL21', 'TEL21', 'EDL21', 'LSIL21'))

I want help to develop a function or a loop that automates this process so I don't have to spend days repeating the same codes for 15 different classes and 16 datasets (240 times). Also, decrease the risk of errors. I may have to do the same for additional datasets. Any help to speed the process will be greatly appreciated.

Guphadi
  • 3
  • 3
  • 2
    It would be useful to know: How standardized is the format? I.e., are the columns to be renamed always in the same position? Will only the number change in the TYPE selector, or how flexible do you need the function to be? – ktiu Jun 01 '21 at 21:09
  • Yes, the columns will be in the same position across datasets and have the same name to start with. Only the numbers will change in the TYPE selector, all numbers are 2-digit, (range, 11-95). When renaming columns the new column names will have either "L" or "R" followed by the two-digit number associated with TYPE added at the end of the name. – Guphadi Jun 01 '21 at 21:31
  • I changed my answer acordingly. – ktiu Jun 01 '21 at 21:43

3 Answers3

0

You could do:

a <- split(df, df$TYPE)

b <- sapply(names(a), function(x)setNames(a[[x]],
              paste0(names(a[[x]]), sub(".*_", 'L', x))), simplify = FALSE)
Onyambu
  • 67,392
  • 3
  • 24
  • 53
0

Here is a start that should work for your example:

library(dplyr)

myfilter <- function(data, number) {
  data %>%
    filter(TYPE == sprintf(" cls_%s ") %>%
    rename_with(\(x) sprintf("%s%s", x, suffix), !1:2)
}

myfilter(example_data, 21)

Given a list of numbers (here: 21 to 31) you could then automatically use them to filter a single dataframe:

multifilter <- function(data) {
  purrr::map(21:31, \(i) myfilter(data, i))
}

multifilter(example_data)

Finally, given a list of dataframes, you can automatically apply the filters to them:

purrr::map(list_of_dataframes, multifilter)
ktiu
  • 2,606
  • 6
  • 20
0

You can use ls to get the variable names of the datasets, and manipulate them as you wish inside a loop and with get function, then create new datasets with assign.

sets = grep("Trial", ls(), value=TRUE) #Assuming every dataset has "Trial" in the name

for(i in sets){
  classes = unique(get(i)$TYPE)
  
  for(j in classes){
    number = gsub("(.+)([0-9]{2})( )", "\\2", j)#this might be an overly complicated way of getting just the number, you can look for better options if you want
    assign(paste0("Trial", number),
           subset(Trial1, TYPE==j) %>% rename_with(function(x){paste0(x, number)}))}}