Questions tagged [subset]

A subset consists of elements selected from a larger set, either by their position in that set or by some other feature, such as their value.

Definition:

From Wikipedia:

a set A is a subset of a set B, or equivalently B is a superset of A, if A is 'contained' inside B, that is, all elements of A are also elements of B.
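
As a minimal sketch of the definition above, the check "all elements of A are also elements of B" can be written out in Python. The sets A and B and their values are illustrative assumptions, not taken from any question on this page.

```python
# Two small example sets (values are illustrative only).
A = {1, 2}
B = {1, 2, 3}

# A is a subset of B when every element of A is also in B.
a_in_b = all(x in B for x in A)

# Python's built-in set type exposes the same check directly.
assert a_in_b == A.issubset(B) == (A <= B)

# Equivalently, B is a superset of A.
b_superset = B.issuperset(A)
```

The explicit all(...) form mirrors the wording of the definition; issubset and <= are the idiomatic spellings.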

Uses:

  • In R, subset is a function that selects a subset of elements from a vector, matrix, or data frame, given a logical expression (caution: subset drops rows where the expression evaluates to NA; see How to subset data in R without losing NA rows?). For programmatic use (as opposed to interactive use), however, it is better to use the [ (or [[) operators or the filter function from dplyr. substring is used to extract parts of character strings.
  • In , a subset of an array can be obtained with array[indices].
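
The two kinds of selection described in this tag (by position and by value) can be sketched in plain Python. The list values and the threshold below are hypothetical. Plain lists are used to keep the sketch dependency-free; they do not accept a list of indices inside the brackets, so the positional case is spelled as a comprehension, whereas array libraries such as NumPy do support the array[indices] form directly.

```python
values = [10, 25, 3, 42, 7]

# Subset by position: pick the elements at the given indices.
indices = [0, 3]
by_position = [values[i] for i in indices]

# Subset by value: keep the elements that satisfy a condition.
by_value = [v for v in values if v > 8]

# Slicing selects a contiguous run of positions.
first_three = values[:3]
```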
6799 questions
21 votes, 2 answers

R gotcha: logical-and operator for combining conditions is & not &&

Why doesn't subset() work with a logical and && operator combining two conditions? > subset(tt, (customer_id==177 && visit_date=="2010-08-26")) <0 rows> (or 0-length row.names) but they each work individually: > subset(tt, customer_id==177) >…
smci
21 votes, 7 answers

How to iteratively generate k elements subsets from a set of size n in java?

I'm working on a puzzle that involves analyzing all size k subsets and figuring out which one is optimal. I wrote a solution that works when the number of subsets is small, but it runs out of memory for larger problems. Now I'm trying to translate…
user550617
21 votes, 3 answers

Update subset of data.table based on join

I have two data tables, DT1 and DT2: set.seed(1) DT1<-data.table(id1=rep(1:3,2),id2=sample(letters,6), v1=rnorm(6), key="id2") DT1 ## id1 id2 v1 ## 1: 2 e 0.7383247 ## 2: 1 g 1.5952808 ## 3: 2 j 0.3295078 ## 4: 3 n…
dnlbrky
20 votes, 5 answers

Subsets in Prolog

I'm looking for a predicate that works as this: ?- subset([1,2,3], X). X = [] ; X = [1] ; X = [2] ; X = [3] ; X = [1, 2] ; X = [1, 2, 3] ; X = [2, 3] ; ... I've seen some subset implementations, but they all work when you want to check if one list…
arubox
20 votes, 2 answers

Using attributes of `ftable` for extracting data

I sometimes use the ftable function purely for its presentation of hierarchical categories. However, sometimes, when the table is large, I would like to further subset the table before using it. Let's say we're starting with: mytable <-…
A5C1D2H2I1M1N2O1R2T1
20 votes, 5 answers

Subset a file by row and column numbers

We want to subset a text file on rows and columns, where rows and columns numbers are read from a file. Excluding header (row 1) and rownames (col 1). inputFile.txt Tab delimited text file header 62 9 3 54 6 1 25 1 2 3 4 5 …
zx8754
20 votes, 3 answers

Why does dplyr's filter drop NA values from a factor variable?

When I use filter from the dplyr package to drop a level of a factor variable, filter also drops the NA values. Here's an example: library(dplyr) set.seed(919) (dat <- data.frame(var1 = factor(sample(c(1:3, NA), size = 10, replace = T)))) # …
Jake Fisher
20 votes, 3 answers

Subset data based on partial match of column names

I need to subset a df to include certain strings. Some of these are full column names, and the following works fine: testData[,c("FullColName1","FullColName2","FullColName3")] My problem is that I need to expand this to also include column names…
user3614783
19 votes, 4 answers

subsetting in data.table

I am trying to subset a data.table ( from the package data.table ) in R (not a data.frame). I have a 4 digit year as a key. I would like to subset by taking a series of years. For example, I want to pull all the records that are from 1999, 2000,…
exl
19 votes, 4 answers

Can I define an enum as a subset of another enum's cases?

Note: This is basically the same question as another one I've posted on Stackoverflow yesterday. However, I figured that I used a poor example in that question that didn't quite boil it down to the essence of what I had in mind. As all replies to…
Mischa
19 votes, 6 answers

R selecting all rows from a data frame that don't appear in another

I'm trying to solve a tricky R problem that I haven't been able to solve via Googling keywords. Specifically, I'm trying to take a subset of one data frame whose values don't appear in another. Here is an example: > test number fruit ID1 …
so13eit
18 votes, 3 answers

How to sort and filter data.frame in R?

I understand how to sort a data frame: df[order(df$Height),] and I understand how to filter (or subset) a data frame matching some predicate: df[df$Weight > 120,] but how do I sort and filter (as an example, order by Height and filter by Weight)?
User
18 votes, 1 answer

Select multiple rows conditioning on ID in R

I tried to select the rows based on their ID. For example, in a data frame called test, ID 201 has 6 rows of data, ID 202 has 6 rows of data too, and 203, 204..... etc. Now I only want to extract 201 and 202 from the dataset, so it should have 12…
Fred
18 votes, 4 answers

Filter data frame rows based on values in vector

What is the best way to filter rows from data frame when the values to be deleted are stored in a vector? In my case I have a column with dates and want to remove several dates. I know how to delete rows corresponding to one day, using !=,…
matt_k
18 votes, 6 answers

Why is [- subsetting (i.e. deletion) of columns not possible with names?

I fear greatly that this has been asked and will be downvoted, but I have not found the answer in the docs (?"["), and discovered that it is hard to search for. data(wines) # This is allowed: alcoholic <- wines[, 1] alcoholic <- wines[,…
a different ben