
I would like to understand how to subset multiple columns of a data frame by comparing the first 5 letters of the column names with each other: columns whose names share the same first 5 letters should be subset together and stored in a new variable.

Here is a small example of the output I need.

Let's say the data frame is eatable:

fruits_area   fruits_production   vegetables_area   vegetable_production
12            100                 26                324
33            250                 40                580
660           510                 43                581

eatable <- data.frame(c(12,33,660),c(100,250,510),c(26,40,43),c(324,580,581))
names(eatable) <- c("fruits_area", "fruits_production", "vegetables_area",
          "vegetable_production")

I was trying to write a function that compares the column names in a loop and stores the subset of columns whose names share the same first 5 letters.

checkExpression <- function(dataset,str){
    dataset[grepl((str),names(dataset),ignore.case = TRUE)]
}

checkExpression(eatable,"your_string")

The function above correctly selects the columns that match a given string, but I am confused about how to match the column names in the dataset against each other.

Edit: I think regular expressions would work here.

Alan Moore
WaterRocket8236
  • Try with `substr` – akrun Oct 23 '16 at 12:48
  • Using `dplyr`, I believe you just want `select(eatable, starts_with("fruit"))`. – aichao Oct 23 '16 at 12:55
  • @aichao I tried that. Your suggestion is good, but what I want is to check the column names in the data frame against each other automatically and subset the ones that match. – WaterRocket8236 Oct 23 '16 at 12:57
  • Then you can use `grepl` as you did and use its result to `subset` your columns: `subset(dataset,select=colnames(dataset)[cols])`, where `cols` is the output of `grepl`. – aichao Oct 23 '16 at 13:04

3 Answers


You could try:

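# Unique 5-character prefixes of the column names
v <- unique(substr(names(eatable), 1, 5))
# One data frame per prefix; "^" anchors the match at the start of the name
lapply(v, function(x) eatable[grepl(paste0("^", x), names(eatable))])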

Or using map() + select() (select_() is deprecated in current versions of dplyr):

library(tidyverse)
map(v, ~ select(eatable, starts_with(.x)))

Which gives:

#[[1]]
#  fruits_area fruits_production
#1          12               100
#2          33               250
#3         660               510
#
#[[2]]
#  vegetables_area vegetable_production
#1              26                  324
#2              40                  580
#3              43                  581

Should you want to make it into a function:

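checkExpression <- function(df, l = 5) {
  # Unique prefixes made of the first l characters of each column name
  v <- unique(substr(names(df), 1, l))
  # One data frame per prefix, matching only at the start of the name
  lapply(v, function(x) df[grepl(paste0("^", x), names(df))])
}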

Then simply use:

checkExpression(eatable, 5)
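
If a named list is more convenient, one possible variation (a sketch, not part of the answer above) is base R's `split.default()`, which treats the data frame as a list of columns and splits it by a grouping vector. The helper name `splitByPrefix` and its argument `n` are hypothetical, chosen only for this illustration.

```r
# Rebuild the example data from the question
eatable <- data.frame(
  fruits_area          = c(12, 33, 660),
  fruits_production    = c(100, 250, 510),
  vegetables_area      = c(26, 40, 43),
  vegetable_production = c(324, 580, 581)
)

# Hypothetical helper: group columns by the first n characters of their names
splitByPrefix <- function(df, n = 5) {
  split.default(df, substr(names(df), 1, n))
}

groups <- splitByPrefix(eatable)
names(groups)         # the prefixes found in the column names
names(groups$fruit)   # columns whose names start with "fruit"
```

Each element of `groups` is a data frame, so `groups$veget` can be assigned to its own variable if needed.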
Steven Beaupré

I believe this may address your needs:

checkExpression <- function(dataset,str){
  cols <- grepl(paste0("^",str),colnames(dataset),ignore.case = TRUE)
  subset(dataset,select=colnames(dataset)[cols])
}

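Note the addition of "^" to the pattern used in grepl: it anchors the match at the start of the column name, so only names that begin with str are selected.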

Using your data:

checkExpression(eatable,"fruit")
##  fruits_area fruits_production
##1          12               100
##2          33               250
##3         660               510
checkExpression(eatable,"veget")
##  vegetables_area vegetable_production
##1              26                  324
##2              40                  580
##3              43                  581
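
To illustrate why the anchor matters, here is a small sketch with made-up column names (they are assumptions for this example, not taken from the question's data). It compares an unanchored pattern, an anchored one, and base R's `startsWith()`, which tests a literal prefix without any regular expression.

```r
# Hypothetical column names, chosen to show the effect of anchoring
cols <- c("fruits_area", "dried_fruits", "Fruits_total")

# Unanchored: matches "fruit" anywhere in the name
anywhere <- grepl("fruit", cols, ignore.case = TRUE)

# Anchored with "^": matches only names that begin with "fruit"
at_start <- grepl("^fruit", cols, ignore.case = TRUE)

# startsWith() is a literal, case-sensitive prefix test, so lower-case first
literal <- startsWith(tolower(cols), "fruit")

cols[anywhere]
cols[at_start]
cols[literal]
```

The unanchored pattern also picks up `dried_fruits`, while the anchored pattern and `startsWith()` select only the names that begin with the prefix.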
aichao

Your function does exactly what you want, but there was a small error:

checkExpression <- function(dataset,str){
  dataset[grepl((str),names(dataset),ignore.case = TRUE)]
}

Change the name of the object you are subsetting from obje to dataset.

checkExpression(eatable,"fr")
#  fruits_area fruits_production
#1          12               100
#2          33               250
#3         660               510

checkExpression(eatable,"veg")
#  vegetables_area vegetable_production
#1              26                  324
#2              40                  580
#3              43                  581
cimentadaj
  • Actually, I had already corrected it in my local R script and get output similar to yours; it was a typing mistake when writing the question. I have edited the question to fix it. Thank you for noticing. But my question starts after getting output like yours. :) – WaterRocket8236 Oct 23 '16 at 13:26
  • Sorry, I misunderstood the question. – cimentadaj Oct 23 '16 at 13:41