Questions tagged [subset]

A subset consists of those elements selected from a larger set of elements, by their position in the larger set or other features, such as their value.

Definition:

From Wikipedia:

a set A is a subset of a set B, or equivalently B is a superset of A, if A is 'contained' inside B, that is, all elements of A are also elements of B.
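The definition above translates directly into a check that every element of A is also an element of B. Below is a minimal Python sketch. Python is used only for illustration, and is_subset is a hypothetical helper name. Python's built-in set type already offers the same test through issubset() and the <= operator.

```python
def is_subset(a, b):
    """Return True if every element of a is also an element of b (A is a subset of B)."""
    b_set = set(b)  # set membership gives average O(1) lookups
    return all(x in b_set for x in a)  # vacuously True when a is empty


print(is_subset({1, 2}, {1, 2, 3}))  # True
print(is_subset({1, 4}, {1, 2, 3}))  # False
print({1, 2} <= {1, 2, 3})           # built-in equivalent: True
```

Following the definition, the empty set counts as a subset of every set, and so does a set compared with itself.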

Uses:

  • In R, subset is a function that selects a subset of elements from a vector, matrix, or data frame, given some logical expression (caution: subset drops the elements or rows for which the expression evaluates to NA; see How to subset data in R without losing NA rows?). However, for programmatic use (as opposed to interactive use), it is better to use the [ (or [[) operators or the filter function from dplyr. substring extracts substrings (subsets) of character strings.
  • In , a subset of an array can be obtained with array[indices].
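Both kinds of selection described above, by position and by value, can be sketched in Python with plain lists. This is an analogy for illustration, not the R API, and the variable names are made up.

```python
values = [5, None, 12, 7, 3]

# By position: pick the elements at the given indices (cf. array[indices]).
indices = [0, 2, 4]
by_position = [values[i] for i in indices]

# By value: keep the elements that satisfy a logical condition.
# None (a missing value) is skipped explicitly, similar to how R's subset()
# drops rows whose condition evaluates to NA.
by_value = [v for v in values if v is not None and v > 4]

print(by_position)  # [5, 12, 3]
print(by_value)     # [5, 12, 7]
```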
6799 questions
2 votes, 5 answers

Code unique observations based on condition

Say I have a data set:

car_manu owner
ford     1
toyota   1
ford     2
ford     3
ford     3
ford     3

I'd like to make a variable that says if they're a 'one car owner'; this would mean owner 2 is a one car owner. I know…
PDog
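For the question above, one way to flag one-car owners is to count the rows per owner and mark owners that appear exactly once. Below is a minimal Python sketch. It assumes each row is one car, and it uses the rows from the question's example.

```python
from collections import Counter

# Rows from the question's example: (car_manu, owner)
rows = [("ford", 1), ("toyota", 1), ("ford", 2),
        ("ford", 3), ("ford", 3), ("ford", 3)]

# Count how many rows (assumed to be cars) each owner has.
cars_per_owner = Counter(owner for _, owner in rows)

# Flag every row: True when its owner appears exactly once.
flagged = [(make, owner, cars_per_owner[owner] == 1) for make, owner in rows]

one_car_owners = sorted(o for o, n in cars_per_owner.items() if n == 1)
print(one_car_owners)  # [2]
```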
2 votes, 3 answers

subset dataset based on date comparison R

I have a dataset as shown below:

Col1  Col2 Col3       CutoffDate
12001 Yes  2008-08-15 2008-08-10
12001 Yes  2008-08-22 2008-08-10
12001 Yes  2008-08-10 2008-08-10
12001 Yes  …
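The excerpt above is cut off before it says which comparison is wanted. The following minimal Python sketch assumes, purely as an example, that rows are kept when Col3 is strictly later than CutoffDate. after_cutoff is a hypothetical helper name, and the rows are the three complete ones shown in the question.

```python
from datetime import date

# (Col1, Col2, Col3, CutoffDate): the complete rows shown in the question
rows = [
    (12001, "Yes", "2008-08-15", "2008-08-10"),
    (12001, "Yes", "2008-08-22", "2008-08-10"),
    (12001, "Yes", "2008-08-10", "2008-08-10"),
]


def after_cutoff(row):
    """True when Col3 is strictly later than CutoffDate (assumed rule)."""
    # Parse to date objects so the comparison does not depend on string format.
    col3, cutoff = date.fromisoformat(row[2]), date.fromisoformat(row[3])
    return col3 > cutoff


kept = [r for r in rows if after_cutoff(r)]
print([r[2] for r in kept])  # ['2008-08-15', '2008-08-22']
```

Changing > to >= would also keep rows that fall on the cutoff date itself.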
2 votes, 2 answers

R: Subsetting on increasing value to max excluding the decreasing

I have a number of trials where one variable increases to a max of interest then decreases back to a starting point. How would I go about just retaining the observations with the increasing values to max. Thanks. For example Trial A B C 1 2 4…
ksing
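For the question above, the rising part of a trial can be kept by slicing each trial's values up to and including the position of the maximum. Below is a minimal Python sketch. The trial data are hypothetical because the question's table is truncated, and rising_part is an illustrative name. If the maximum is tied, only the first occurrence is kept.

```python
def rising_part(values):
    """Keep the observations up to and including the first maximum."""
    peak = values.index(max(values))  # position of the first maximum
    return values[: peak + 1]


# Hypothetical trials: each rises to a maximum, then falls back.
trials = {"t1": [2, 4, 7, 9, 6, 3, 2], "t2": [1, 5, 8, 8, 4, 1]}
rising = {name: rising_part(v) for name, v in trials.items()}
print(rising)  # {'t1': [2, 4, 7, 9], 't2': [1, 5, 8]}
```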
2 votes, 1 answer

How to isolate/retrieve/count a subset of returned records in Rails

I'm writing a reports dashboard for a Rails app. The dashboard is for user data, and currently it's running multiple count and select queries to build the four or five reports on the page. I'm sure that there is a more efficient way to do this. …
Kevin Whitaker
2 votes, 2 answers

Conditional sum with output for all rows in r data.table

I have a coding issue that I think should be very easy. I have created a simplified dataset: DT <- data.table(Bank=rep(c("a","b","c"),4), Type=rep(c("Ass","Liab"),6), …
Tim_Utrecht
2 votes, 2 answers

Subsetting DataFrame in R by duplicate values for Year by lowest value for Rating

I have a data frame which looks like this:

> fitchRatings
    Country Month Year FitchLongTerm LongTermTransformed
1 Abu Dhabi     7 2007            AA                  22
2    Angola     5 2012           BB- …
Josh
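For the question above, one reading is that, among rows sharing the same Country and Year, only the row with the lowest LongTermTransformed value should be kept. Below is a minimal Python sketch of that reading. The grouping key (Country, Year) is an assumption, and every row except the Abu Dhabi one is hypothetical.

```python
# Rows: (Country, Year, LongTermTransformed); the Angola rows are hypothetical.
rows = [
    ("Angola", 2012, 12),
    ("Angola", 2012, 11),
    ("Abu Dhabi", 2007, 22),
]

lowest = {}
for country, year, rating in rows:
    key = (country, year)  # assumed grouping key
    # Keep the row with the smallest rating seen so far for this key.
    if key not in lowest or rating < lowest[key][2]:
        lowest[key] = (country, year, rating)

result = sorted(lowest.values())
print(result)  # [('Abu Dhabi', 2007, 22), ('Angola', 2012, 11)]
```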
2 votes, 3 answers

Subsetting with multiple conditions in very large data set

I have a matrix that is approximately 430 × 20,000. Each row is a person, and each column is a project they have worked on. Each cell has a value of either 0 (not involved), 1 (project head, only one per project) or 2 (project helper). I am trying…
AgeTex
2 votes, 3 answers

Subset Duplicated Values >10

I am looking at a data frame and trying to subset rows that have the same pressure value for more than 5 rows, or delete rows that do not have 5 duplicate pressure values...

File Turbidity Pressure
1    3.2       46
2    3.4       46
…
durosj
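For the question above, one approach is to count how often each pressure value occurs and keep the rows whose value reaches a threshold. Below is a minimal Python sketch. It assumes total occurrences are what matter (not consecutive runs), and it uses a threshold of 5. The rows and the name frequent_rows are hypothetical.

```python
from collections import Counter


def frequent_rows(rows, key, min_count=5):
    """Keep the rows whose key value occurs at least min_count times overall."""
    counts = Counter(key(r) for r in rows)
    return [r for r in rows if counts[key(r)] >= min_count]


# Hypothetical (File, Turbidity, Pressure) rows: 46 occurs 6 times, 47 twice.
rows = [(i, 3.0 + i / 10, 46) for i in range(1, 7)] + [(7, 3.9, 47), (8, 4.1, 47)]
kept = frequent_rows(rows, key=lambda r: r[2])
print(len(kept))  # 6
```

If only consecutive runs of equal pressure should count, itertools.groupby over the rows in their original order would be the natural replacement for Counter.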
2 votes, 0 answers

Use of a list of variable names in a for loop with .GlobalEnv and the `[[` operator

Questions like R variable names in loop, get, etc. are common among people coming to R from other languages. The standard answer is usually, as I gave in that example, that it's not possible to iterate through a list of variables in the global…
Nick Kennedy
2 votes, 3 answers

Array Addition, why start at 'i = 2'?

Using the Ruby language, have the function ArrayAdditionI(arr) take the array of numbers stored in arr and return the string true if any combination of numbers in the array can be added up to equal the largest number in the array, otherwise return…
Zarley
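The problem described above is a subset-sum check: remove one occurrence of the largest number, then test whether any combination of the remaining numbers adds up to it. Below is a minimal brute-force Python sketch, not a port of the Ruby code in the question. It returns a bool rather than the string "true", and array_addition is an illustrative name. The sketch tries combination sizes from 1 upward. A size-1 combination can only match when the largest value appears more than once, so starting the size loop at 2 changes the result only in that case.

```python
from itertools import combinations


def array_addition(arr):
    """True if some combination of the other numbers sums to the largest one."""
    rest = list(arr)
    largest = max(rest)
    rest.remove(largest)  # remove a single occurrence of the maximum only
    # Try every combination size, from single numbers up to all of them.
    for size in range(1, len(rest) + 1):
        for combo in combinations(rest, size):
            if sum(combo) == largest:
                return True
    return False


print(array_addition([4, 6, 23, 10, 1, 3]))  # True (4 + 6 + 10 + 3 == 23)
print(array_addition([5, 7, 16, 1, 2]))      # False
```

The search is exponential in the array length, which is fine for small challenge inputs.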
2 votes, 1 answer

R: Code works line by line but not implemented as a function

I have a data set (called 'data' here) which contains three important kinds of columns: A 'label' column, which corresponds to a list of institutions; a 'group' column that states to which group each institution belongs, and a series of 'measure'…
2 votes, 1 answer

Subset sequence data in fasta file based on IDs stored in listed data frames

I am trying to subset one FASTA file (containing multiple sequences) into several smaller ones based on IDs I stored in a list of data frames (and I have a FASTA called fastafile like this: fastafile <- dput(fastafile) structure(list(r1 =…
Moritz
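For the question above, the general idea is to parse the FASTA records once and then keep, for each group of IDs, only the records whose ID is in that group. Below is a minimal Python sketch rather than an R solution. It assumes the record ID is the first whitespace-separated token of each header line. The helper names, the sample sequences and the ID groups, which stand in for the question's list of data frames, are all hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {record ID: sequence}, in file order."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-separated header token.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)  # multi-line sequences joined below
    return {rid: "".join(parts) for rid, parts in records.items()}


def subset_fasta(records, ids):
    """Keep only the records whose ID appears in ids, preserving file order."""
    wanted = set(ids)
    return {rid: seq for rid, seq in records.items() if rid in wanted}


def to_fasta(records):
    """Serialise records back to FASTA text (header descriptions are dropped)."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records.items())


# Hypothetical input: three records and two groups of IDs.
fasta_text = ">r1 sample one\nACGT\nAC\n>r2\nGGGG\n>r3\nTT\n"
id_groups = {"group_a": ["r1", "r3"], "group_b": ["r2"]}

records = read_fasta(fasta_text)
subsets = {name: subset_fasta(records, ids) for name, ids in id_groups.items()}
print(to_fasta(subsets["group_a"]), end="")
```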
2 votes, 1 answer

Efficiently finding the count of column values for distinct rows in a dataframe in r

Suppose I have a data frame as:

id value
1  "hi"
1  "hi"
1  "hi again"
1  "hi again"
2  "hello"
2  "hi"

Now I want to get the count of each value for each of the distinct values in the id column. The output would be like id value …
Shiva
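The counting asked for above amounts to counting each distinct (id, value) pair. Below is a minimal Python sketch that uses collections.Counter on the data from the question. Python stands in here for an R data frame.

```python
from collections import Counter

# (id, value) rows from the question
rows = [(1, "hi"), (1, "hi"), (1, "hi again"), (1, "hi again"),
        (2, "hello"), (2, "hi")]

counts = Counter(rows)  # one count per distinct (id, value) pair
for (id_, value), n in sorted(counts.items()):
    print(id_, value, n)
# 1 hi 2
# 1 hi again 2
# 2 hello 1
# 2 hi 1
```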
2 votes, 3 answers

How to subset by distinct rows in a data frame or matrix?

Suppose I had the following matrix:

matrix(c(1,1,2,1,2,3,2,1,3,2,2,1),ncol=3)

Result:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    3    2
[3,]    2    2    2
[4,]    1    1    1

How can I filter/subset this matrix by whether or not each…
eyio
2 votes, 1 answer

Include column after grouping using datatable

My goal is to calculate a group % column by zip. I created the % column by zip, but keep losing my group ('cgrp') variable. How can I keep this in my end results? My data table script is giving me the below results: zip V1 1: 12007…
user3067851
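For the question above, the group label survives if the percentage is computed per (zip, cgrp) pair instead of per zip alone. Below is a minimal Python sketch. It assumes the percentage is a share of row counts within each zip, and the zip/cgrp rows are hypothetical.

```python
from collections import Counter

# Hypothetical rows: (zip, cgrp)
rows = [("12007", "A"), ("12007", "A"), ("12007", "B"), ("12008", "B")]

per_zip = Counter(z for z, _ in rows)  # rows per zip
per_group = Counter(rows)              # rows per (zip, cgrp)

# Percentage of each cgrp within its zip; cgrp stays part of the key.
pct = {(z, g): 100 * n / per_zip[z] for (z, g), n in per_group.items()}

for key in sorted(pct):
    print(*key, round(pct[key], 1))
# 12007 A 66.7
# 12007 B 33.3
# 12008 B 100.0
```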