Counting the NUM of rows that contains a certain year

Question

I have a data frame that contains a date column of this format (1990-02-28) i want to count the number of rows that contains the year 1996 (doesnt matter the month/day).

For Example:

DF

1. 1946-01-21   -0.7062
2. 1986-01-22   0.5029
3. 1923-01-23   0.5657
4. 1920-01-25   0.4723
5. 1996-01-26   -0.5384
6. 1996-01-27   0.717

the response would be 2 (for #5,#6)

Thanks

score 1 · Answer 1 · answered Nov 12 '17 at 11:08

1

library(lubridate)
library(dplyr)

dt = read.table(text = "
date value
19460121 -0.7062
19860122 0.5029
19230123 0.5657
19200125 0.4723
19960126 -0.5384
19960127 0.717
", header = T)

dt %>%
  mutate(date = ymd(date)) %>%     # make this a date column (if it's not already) 
  filter(year(date) == 1996) %>%   # filter rows where the year is 1996
  nrow()                           # count rows

answered Nov 12 '17 at 11:08

AntoniosK

15,991
2
19
32

i get this response: {Error in year(COMBINED$TrdDate) : could not find function "year"}, i tried also , sum(years(as.Date(COMBINED$TrdDate)) == "1996", na.rm = TRUE). which COMBINED is my DataFrame, and TrdDate is my date column, Thanks – Eliad Harell Nov 12 '17 at 11:15
Are you sure you're using the `lubridate` package? Make sure it is installed first using `install.packages("lubridate")`. – AntoniosK Nov 12 '17 at 11:19
I don't think that the `years` command you are using does what you think. – AntoniosK Nov 12 '17 at 11:22
still cant recognize the function "year". – Eliad Harell Nov 12 '17 at 11:26
There must be a conflict with other packages you're using. Try using the function in this way `lubridate::year`. – AntoniosK Nov 12 '17 at 11:28
for : filter(lubridate::year(COMBINED$TrdDate) == "1996")), the response : Error: unexpected ')' in "filter(lubridate::year(COMBINED$TrdDate) == "1996"))" – Eliad Harell Nov 12 '17 at 11:33

vaettchen · Answer 2 · 2017-11-12T11:38:21.960

0

In base R the statement

DF[ grepl( "1996", DF[ , 1 ] ), ]

would achieve your goal:

> DF[ grepl( "1996", DF[ , 1 ] ), ]
      date   value
5 19960126 -0.5384
6 19960127  0.7170

EDIT:

The number of rows can be found with

nrow( DF[ grepl( "1996", DF[ , 1 ] ), ] )

or using length() correctly:

length( DF[ grepl( "1996", DF[ , 1 ] ), 1 ] )

edited Nov 12 '17 at 11:38

answered Nov 12 '17 at 11:16

vaettchen

7,299
22
41

THanks for the response, by i need that the command will execute the number of rows only, without any more outputs. regard the example i gave, the output should be 2. – Eliad Harell Nov 12 '17 at 11:22
2

@EliadHarell, it would help if you learned _a little_ R when posting to SO. You really can't wrap `nrow()` around @vaettchen's answer? – hrbrmstr Nov 12 '17 at 11:25
`length()` is providing the column count @vaettchen, not the # rows – hrbrmstr Nov 12 '17 at 11:30

score 0 · Answer 3 · answered Nov 12 '17 at 11:22

0

Other way with base R:

df=read.table(text="
19460121 -0.7062
19860122 0.5029
19230123 0.5657
19200125 0.4723
19960126 -0.5384
19960127 0.717")

df$V1=as.character(df$V1)

table(format(as.Date(df$V1,"%Y%m%d"),"%Y"))
#1920 1923 1946 1986 1996 
#   1    1    1    1    2 

table(format(as.Date(df$V1,"%Y%m%d"),"%Y"))["1996"]
#1996 
#   2

answered Nov 12 '17 at 11:22

Robert

5,038
1
25
43

Thanks Robert, i made it. – Eliad Harell Nov 12 '17 at 11:27

score 0 · Answer 4 · answered Nov 12 '17 at 11:28

Column only operations are likely to be faster. Here are 3 of them:

read.table(stringsAsFactors=FALSE, header=FALSE, text="
19460121 -0.7062
19860122 0.5029
19230123 0.5657
19200125 0.4723
19960126 -0.5384
19960127 0.717") -> xdf

# base R
sum(grepl("^1996", xdf$V1))

# stringi one way
sum(!is.na(stringi::stri_match_first_regex(xdf$V1, "^1996")))

# stringi another way
sum(stringi::stri_count_regex(xdf$V1, "^1996"))

Counting the NUM of rows that contains a certain year

4 Answers4