
I am new to this and I am stuck. I have a list of data frames that hold pressure, temperature and salinity. I want to subset all of them and keep only the temperature and salinity values where the pressure equals 5. Below is the structure of the list:

str(CT_STP)
List of 3
$ CT01_CTD1:'data.frame':      41 obs. of  3 variables:
  ..$ pressure   : num [1:41] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ temperature: num [1:41] 18.8 18.8 18.8 18.8 18.8 ...
  ..$ salinity   : num [1:41] 34.1 34.1 34.1 34.1 34.1 ...
 $ CT02_CTD1:'data.frame':      69 obs. of  3 variables:
  ..$ pressure   : num [1:69] 2 3 4 5 6 7 8 9 10 11 ...
  ..$ temperature: num [1:69] 18.7 18.7 18.7 18.7 18.7 ...
  ..$ salinity   : num [1:69] 34 34 34 34 34 ...
 $ CT03_CTD1:'data.frame':      79 obs. of  3 variables:
  ..$ pressure   : num [1:79] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ temperature: num [1:79] 18.3 18.3 18.3 18.3 18.3 ...
  ..$ salinity   : num [1:79] 33.9 33.9 33.9 33.9 33.9 ...

I want to subset all the data frames to get only the temperature and the salinity when the pressure is equal to five.

I've tried tons of things, including this:

PROF5<-lapply(CT_STP,subset(CT_STP, pressure==5,select="pressure","temperature","salinity"))

but nothing has worked so far. I have searched for answers here, but it's difficult to find specific ones as a newcomer.

1 Answer


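I created some sample data. subset() needs a data frame and a condition. lapply() needs a function as its second argument; in your attempt you passed the result of a subset() call on the whole list instead of a function. One way to fix this is to write an anonymous function: you write function(x), followed by the code you want R to run on each element. In your case, you want to loop through the list and apply subset() to each data frame, so x stands for one data frame at a time. Note also that select takes a single vector of columns, such as select = c(temperature, salinity), not several separate arguments. Hope this helps.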

df1 <- data.frame(pressure = 1:5,
                  temperature = 18:22,
                  salinity = c(34.1, 34.1, 34.1, 34.1, 34.1))

df2 <- data.frame(pressure = 1:5,
                  temperature = 18:22,
                  salinity = c(34.1, 34.1, 34.1, 34.1, 34.1))

mylist <- list(df1, df2)

[[1]]
  pressure temperature salinity
1        1          18     34.1
2        2          19     34.1
3        3          20     34.1
4        4          21     34.1
5        5          22     34.1

[[2]]
  pressure temperature salinity
1        1          18     34.1
2        2          19     34.1
3        3          20     34.1
4        4          21     34.1
5        5          22     34.1

lapply(mylist, function(x) subset(x, pressure == 5))

[[1]]
  pressure temperature salinity
5        5          22     34.1

[[2]]
  pressure temperature salinity
5        5          22     34.1
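Since the question asks for only the temperature and salinity columns, the call above can be extended with the select argument of subset(). The following is a minimal sketch on the same sample data; the list names CT01 and CT02 and the result name prof5 are made up for illustration.

```r
# Sample data, same shape as in the answer
df1 <- data.frame(pressure = 1:5,
                  temperature = 18:22,
                  salinity = c(34.1, 34.1, 34.1, 34.1, 34.1))
df2 <- data.frame(pressure = 1:5,
                  temperature = 18:22,
                  salinity = c(34.1, 34.1, 34.1, 34.1, 34.1))

# Hypothetical element names, standing in for CT01_CTD1 etc.
mylist <- list(CT01 = df1, CT02 = df2)

# Keep rows where pressure is 5, and only the two columns of interest.
# select takes one vector of column names, hence c(...).
prof5 <- lapply(mylist, function(x) {
  subset(x, pressure == 5, select = c(temperature, salinity))
})
```

Because lapply() preserves the names of the input list, each station's result can be read back with, for example, prof5$CT01.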

EDIT

Given @tospig's comment, you can also do the following.

lapply(mylist, function(x) x[x$pressure == 5, ])
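One difference between the two approaches is worth knowing: subset() drops rows where the condition is NA, while plain logical indexing with `[` returns an all-NA row for each of them. The sketch below assumes a hypothetical cast with one missing pressure reading (the data and the names cast, casts, naive and safe are invented for illustration) and wraps the condition in which() to avoid that.

```r
# Hypothetical cast with one missing pressure reading
cast <- data.frame(pressure = c(4, NA, 5),
                   temperature = c(18.8, 18.8, 18.7),
                   salinity = c(34.1, 34.1, 34.0))
casts <- list(CT01 = cast)

# Plain logical indexing: the NA in pressure yields an extra all-NA row
naive <- lapply(casts, function(x) x[x$pressure == 5, ])

# which() keeps only positions where the condition is TRUE;
# the character vector selects just the two columns of interest
safe <- lapply(casts, function(x) {
  x[which(x$pressure == 5), c("temperature", "salinity")]
})
```

If the pressure columns contain no missing values, both forms give the same rows.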
jazzurro
  • @CarlaBerghoff Pleasure. :) – jazzurro Feb 23 '15 at 01:09
  • I remember reading somewhere that you should avoid using `subset()` (but I can't remember where or why). An alternative using `[` is `lapply(mylist, function(x) x[x$pressure==5,])` – tospig Feb 23 '15 at 01:38
  • @tospig Thank you for your comment. I was not aware of the issue you mentioned. Given that, your suggestion is another way to go. I will add it to my answer. Thank you very much for your support. – jazzurro Feb 23 '15 at 01:44
  • You're welcome. A quick search shows [this blog post on 'pitfalls of the subset'](http://rob-barry.com/2014/01/08/Avoiding_the_Subset.html), which also links to a good article by Hadley Wickham. – tospig Feb 23 '15 at 01:46