0

I have about 30,000 rows of data with a Date column in date format. I would like to be able to count the number of rows by month/year and year, but when I aggregate with the below code, I get a vector within the data table for my results instead of a number.

Using the hyperlinked csv file, I have tried the aggregate function.

https://www.dropbox.com/s/a26t1gvbqaznjy0/myfiles.csv?dl=0

short.date <- strftime(myfiles$Date, "%Y/%m")
aggr.stat <- aggregate(myfiles$Date ~ short.date, FUN = count)

Below is a view of the aggr.stat data frame. There are two columns and the second one beginning with "c(" is the one where I'd like to see a count value.

1 1969/01 c(-365, -358, -351, -347, -346)

2 1969/02 c(-323, -320)

3 1969/03 c(-306, -292, -290)

4 1969/04 c(-275, -272, -271, -269, -261, -255)

5 1969/05 c(-245, -240, -231)

6 1969/06 c(-214, -211, -210, -205, -204, -201, -200, -194, -190, -186)

Rachel Rotz
  • 101
  • 9
  • Hi! Id love to help but can you expand on your expected output? Maybe provide an example of what the desired out come would be? Take a look at https://stackoverflow.com/questions/30544258/summing-rows-by-month-in-r – OctoCatKnows Jun 16 '19 at 13:20
  • Thanks - my expected output would be one column with the month/year labeled as Date and one column with the count of all the rows from that same month/year. – Rachel Rotz Jun 16 '19 at 13:53

2 Answers2

0

I'm not much into downloading any unknown file from the internet, so you'll have to adapt my proposed solution to your needs.

You can solve the problem with the help of data.table and lubridate.

Imagine your data has at least one column called dates of actual dates (it is, calling class(df$dates) will return at least Date or something similar (POSIXct, etc).

# load libraries
library(data.table)
library(lubridate)

# convert df to a data.table
setDT(df)

# count rows per month
df[, .N, by = .(monthDate = floor_date(dates, "month")]

.N counts the number of rows, by = groups the data. See ?data.table for further details.

PavoDive
  • 6,322
  • 2
  • 29
  • 55
0

Consider running everything from data frames. Specifically, add needed month/year column to data frame and then run aggregate using data argument (instead of running by separate vectors). Finally, there is no count() function in base R, use length instead:

# NEW COLUMN
myfiles$short.date <- strftime(myfiles$Date, "%Y/%m")

# AGGREGATE WITH SPECIFIED DATA
aggr.stat <- aggregate(Date ~ short.date, data = myfiles, FUN = length)
Parfait
  • 104,375
  • 17
  • 94
  • 125