Questions tagged [tapply]

tapply is a function in the R programming language for applying a function to subsets of a vector. The vector is broken into subsets, potentially of different lengths (a ragged array), based on the values of one or more other vectors. Each of these grouping vectors is either already a factor or is coerced to one by as.factor. The function is applied to each subset, and tapply then returns either an array or a list, depending on the output of the function.
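
A minimal sketch of the behaviour described above. The vectors `scores` and `group` are made-up illustration data, not taken from any question on this page. It shows a ragged split by a grouping vector, and how the shape of the result depends on what the applied function returns:

```r
# Hypothetical data: six values falling into three groups of unequal size
scores <- c(10, 20, 30, 40, 50, 60)
group  <- c("a", "b", "a", "c", "b", "a")  # character vector, coerced to a factor

# The function returns one value per group, so the result is a named 1-d array
means <- tapply(scores, group, mean)

# The function returns two values per group, so the result is a list-mode array
ranges <- tapply(scores, group, range)

print(means)
print(ranges[["a"]])
```

Elements can be pulled out by group name, for example `means[["b"]]` or `ranges[["a"]]`.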

354 questions
1
vote
1 answer

Summarising data in a list in R

I have numerous dataframes, all contained in a list called 1a1. The names in the list are the dates when data were collected, e.g. names(1a1) [1] "Jan4" "Jan5" "Jan6" "Jan7" "Jan8" "Jan9" "Jan10". All the dataframes in the list are in the same…
McMahok
  • 348
  • 2
  • 13
1
vote
1 answer

bind_tf_idf() error: in tapply(n, documents, sum) : arguments must have same length

I am trying to do bind_tf_idf() for the following df. My df has two documents/classes: Y or N. > test_2 # A tibble: 3,295 x 2 Class word 1 Y nature 2 Y great 3 Y are 4 Y present 5 N in 6…
1
vote
3 answers

To speed up the tapply function in R, or another function to convert data frame into a matrix

I need to convert a huge dataset into a matrix. The data structure looks like the data "x" below. When I use the function tapply to do it (see below), it does not work because of the memory limit for the huge dataset. I am wondering if there is another way to do…
Jian Zhang
  • 1,173
  • 5
  • 15
  • 20
1
vote
1 answer

Use tapply(dataframe , index, function) in R giving as argument to the function 2 columns

I would like to use the tapply() function on a dataframe, grouping the rows with the indexing. My problem is that the argument I would pass to the function is not a single column, but a pair of columns. This is because the 2 columns of the data frame…
Leo
  • 45
  • 7
1
vote
1 answer

Using tapply and cumsum function for multiple vectors in R

I have a data frame with four columns. country date pangolin_lineage n cum_country 1 Albania 2020-09-05 B.1.236 1 1 2 Algeria 2020-03-02 B.1 2 2 3 Algeria …
aholtz
  • 175
  • 6
1
vote
1 answer

Formatting colnames to be read by cbind

Say I have a list called df such that colnames(df) yields: "A" "B" "C" "D" "E" "F" I would like to aggregate data in the following way: aggregate(cbind(`C`,`D`,`E`,`F`)~A+B, data = df, FUN…
1
vote
1 answer

R tapply() does not work on data.frame due to improper length check

This is a bug report, not a question. The procedure to report bugs in R core appears complicated, and I don't want to be part of a mailing list. So I'm posting this here (as recommended by https://www.r-project.org/bugs.html.) Here it is: The…
jeanlain
  • 382
  • 1
  • 3
  • 13
1
vote
1 answer

Find cumulative value change and time since last change in panel data

I have panel data (small example of data below), and want to calculate both when a variable changes and the time since the last change. The end goal is to get two variables: cumulative change in any given year (i.e. the difference between the…
Sean Norton
  • 277
  • 1
  • 12
1
vote
2 answers

Create vectors from a contingency table

I have a contingency table of meteorological stations and frequency of occurrence. I used logical indexing to create separate vectors like below (b1:b5) from the table. However there has to be a simpler way, perhaps from the apply family. Can…
DAY
  • 91
  • 6
1
vote
2 answers

tapply with categorical variable

I am trying to use tapply() for some descriptive analysis, with the mtcars dataset in R. So the problem is: > table(mtcars$carb) 1 2 3 4 6 8 7 10 3 10 1 1 > tapply(mtcars$carb,list(mtcars$vs,mtcars$am),function(x){length(x)}) 0 1 0…
user11806155
  • 121
  • 5
1
vote
2 answers

Calculate to the first decimal using integers in R

I would like to calculate to the first decimal using integer numbers, but whatever I do, the result is rounded and does not give me the precision I want. I have spent a lot of time looking this up, and it seems like something a lot of beginners like me would have a…
owl
  • 1,841
  • 6
  • 20
  • 30
1
vote
1 answer

creating a dataframe of means of 5 randomly sampled observations

I'm currently reading "Practical Statistics for Data Scientists" and following along in R as they demonstrate some code. There is one chunk of code I'm particularly struggling to follow the logic of and was hoping someone could help. The code in…
cliftjc1
  • 43
  • 6
1
vote
2 answers

use function on multiple columns (variables) in r

I am trying to run tests of homogeneity of variance using the leveneTest function from the car package. I can run the test on a single variable like so (using the iris dataset as an…
becbot
  • 151
  • 1
  • 2
  • 9
1
vote
1 answer

Panel regression errors

I am trying to do a panel regression, where the dependent variable (stock returns for various companies) is regressed on 5 independent variables. Here is a reproducible example of a data frame of independent…
Julia
  • 241
  • 1
  • 8
1
vote
1 answer

How to use something like tapply but keeping other columns in R

Let's say I have this data frame: Date Value1 Value2 01-01 13.6 20 01-01 25.4 25 01-01 49.5 18 02-01 12.2 22 02-01 28.2 35 02-01 42.2 26 and I would like to keep only the lines in this table that have the…
RenaudG
  • 13
  • 3