How can I get the length of an arbitrary data frame when piped in using Tidyr?

Question

I have code like this:

bulk <- read_csv("data/food_bulk_raw.csv") %>% 
  mutate(Treatment = "bulk", Individual = seq_len(Timestamp))

Here seq_len() creates the sequence 1:length(Timestamp). It works because 'Timestamp' is a column of the data frame. But suppose I knew nothing about my data frame, for example because I am writing a function. How could I refer to the number of rows of the data frame without first saving it as an object, as I do below?

data002 <- read_csv("data/data002.csv")

data002 <- mutate(data002, New_Column = 1:nrow(data002))

Ronak Shah · Accepted Answer · 2019-07-02T04:27:51.113

You could use any of the following:

library(tidyverse)

# Option 1
read_csv("data/food_bulk_raw.csv") %>%
  mutate(Treatment = "bulk", Individual = seq_len(nrow(.)))

# Option 2
read_csv("data/food_bulk_raw.csv") %>%
  mutate(Treatment = "bulk", Individual = seq(nrow(.)))

# Option 3
read_csv("data/food_bulk_raw.csv") %>%
  mutate(Treatment = "bulk", Individual = sequence(nrow(.)))

None of these depends on any particular column; each uses nrow() to build the sequence.

Also, as @Marius commented, you can use n() instead of nrow(.). It returns the number of rows (in the current group, if the data is grouped). In all of the options above, nrow(.) can be replaced with n().

Apart from that, we can also use row_number():

read_csv("data/food_bulk_raw.csv") %>%
  mutate(Treatment = "bulk", Individual = row_number())

To demonstrate, here is a function:

df_sequence_func <- function(df) {
  df %>% mutate(Individual = seq_len(nrow(.)))
}

df_sequence_func(mtcars)

#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb Individual
#1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4          1
#2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4          2
#3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1          3
#4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1          4
#....

df_sequence_func(cars)

#   speed dist Individual
#1      4    2          1
#2      4   10          2
#3      7    4          3
#4      7   22          4
#5      8   16          5
#6      9   10          6
#....

It returns sequential row numbers regardless of which columns the data frame has or how many rows it contains.
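The same idea can be sketched with n() in place of nrow(.), which avoids relying on the `.` placeholder of the magrittr pipe. The helper name add_index is hypothetical and only for illustration; the sketch assumes dplyr is installed and uses the built-in mtcars data.

```r
library(dplyr)

# Hypothetical helper (the name is illustrative, not from the answer):
# adds a 1..n index column without referring to `.` or to any column.
add_index <- function(df) {
  df %>% mutate(Individual = seq_len(n()))
}

indexed <- add_index(mtcars)

# The new column matches the row count of the input.
identical(indexed$Individual, seq_len(nrow(mtcars)))

# Zero-row input: seq_len(0) is integer(0), so the result simply has no rows.
empty <- add_index(mtcars[0, ])
nrow(empty)
```

Because seq_len() returns an empty vector for a length of zero, this version should also behave sensibly on an empty data frame.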

You can also use `dplyr::n()` instead of `nrow(.)` – Marius, Jul 02 '19 at 04:20

akrun · Answer 2 · 2019-07-02T04:41:15.580


We can use data.table methods

library(data.table)
setDT(df)[, seq_len(.N)]

The file can also be read with fread, adding both columns by reference:

fread("data/food_bulk_raw.csv")[
  , c("Treatment", "Individual") := .("bulk", seq_len(.N))][]

Or in tidyverse

library(tidyverse)
rownames_to_column(data002, 'rn')

Or using n():

data002 %>%
  mutate(New_Column = seq_len(n()))

Or in base R

df$newcolumn <- seq(nrow(df))
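One edge case may be worth keeping in mind for the base R version: seq() and seq_len() differ when the data frame has zero rows. The following is a minimal sketch using a toy data frame (the data and the name `empty` are made up for illustration).

```r
# Toy data frame standing in for `df` (illustrative data only)
df <- data.frame(x = c("a", "b", "c"))
df$newcolumn <- seq_len(nrow(df))
df$newcolumn          # 1 2 3

# With zero rows the two functions behave differently.
empty <- df[0, ]
seq(nrow(empty))      # counts down from 1 to 0, giving: 1 0
seq_len(nrow(empty))  # integer(0)
```

For that reason seq_len(nrow(df)) is usually the safer choice inside a function that might receive an empty data frame.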

edited Jul 02 '19 at 04:41

answered Jul 02 '19 at 04:35

