First off, Stack Overflow keeps saying there are answers already, but I've been looking for 2.5 hours now and haven't found one that fits.

I'm attempting to view values from a dataframe with 940 rows. I would like to view the calories associated with the user IDs on the first and last dates of the trial.

            Id ActivityDay Calories
1   1503960366  2016-04-12     1985
2   1624580081  2016-04-12     1432
3   1644430081  2016-04-12     3199
4   1844505072  2016-04-12     2030
5   1927972279  2016-04-12     2220
6   2022484408  2016-04-12     2390
7   2026352035  2016-04-12     1459
8   2320127002  2016-04-12     2124
9   2347167796  2016-04-12     2344
10  2873212765  2016-04-12     1982
11  3372868164  2016-04-12     1788
12  3977333714  2016-04-12     1450
13  4020332650  2016-04-12     3654
14  4057192912  2016-04-12     2286
15  4319703577  2016-04-12     2115
16  4388161847  2016-04-12     2955
17  4445114986  2016-04-12     2113
18  4558609924  2016-04-12     1909
19  4702921684  2016-04-12     2947
20  5553957443  2016-04-12     2026
21  5577150313  2016-04-12     3405
22  6117666160  2016-04-12     1496
23  6290855005  2016-04-12     2560
24  6775888955  2016-04-12     1841
25  6962181067  2016-04-12     1994
26  7007744171  2016-04-12     2937
27  7086361926  2016-04-12     2772
28  8053475328  2016-04-12     3186
29  8253242879  2016-04-12     2044
30  8378563200  2016-04-12     3635
31  8583815059  2016-04-12     2650
32  8792009665  2016-04-12     2044
33  8877689391  2016-04-12     3921
34  1503960366  2016-04-13     1797
35  1624580081  2016-04-13     1411
36  1644430081  2016-04-13     2902
37  1844505072  2016-04-13     1860
38  1927972279  2016-04-13     2151
39  2022484408  2016-04-13     2601
40  2026352035  2016-04-13     1521
41  2320127002  2016-04-13     2003
42  2347167796  2016-04-13     2038
43  2873212765  2016-04-13     2004
44  3372868164  2016-04-13     2093
45  3977333714  2016-04-13     1495
46  4020332650  2016-04-13     1981
47  4057192912  2016-04-13     2306
48  4319703577  2016-04-13     2135
49  4388161847  2016-04-13     3092
50  4445114986  2016-04-13     2095
51  4558609924  2016-04-13     1722
52  4702921684  2016-04-13     2898

This is the sample data, omitting the other nearly 900 rows. I want to keep only the dates 2016-04-12 and 2016-05-12, which are the first and last days of the period the data was collected over. I'd like to see the IDs of the users and their calories from those 2 dates only.

I've tried about 50 different snippets; here is where I'm at right now:

Daily_Calories %>% 
  group_by(Id, Calories) %>%
  arrange(ActivityDay) %>% 
  as.data.frame()

I have not saved all the code I've tried, as I'm new and RStudio gets messy and unorganized quickly, and then I get a bit lost.

I've also tried:

Daily_Calories %>% 
  group_by(Id, Calories) %>%
  group_by(min(ActivityDay), max(ActivityDay)) %>% 
  arrange(ActivityDay) %>%
  as.data.frame()

and got this:

            Id ActivityDay Calories min(ActivityDay) max(ActivityDay)
1   1503960366  2016-04-12     1985       2016-04-12       2016-05-12
2   1624580081  2016-04-12     1432       2016-04-12       2016-05-12
3   1644430081  2016-04-12     3199       2016-04-12       2016-05-12
4   1844505072  2016-04-12     2030       2016-04-12       2016-05-12
5   1927972279  2016-04-12     2220       2016-04-12       2016-05-12
6   2022484408  2016-04-12     2390       2016-04-12       2016-05-12
7   2026352035  2016-04-12     1459       2016-04-12       2016-05-12
8   2320127002  2016-04-12     2124       2016-04-12       2016-05-12
9   2347167796  2016-04-12     2344       2016-04-12       2016-05-12
10  2873212765  2016-04-12     1982       2016-04-12       2016-05-12
11  3372868164  2016-04-12     1788       2016-04-12       2016-05-12
12  3977333714  2016-04-12     1450       2016-04-12       2016-05-12

and then tried this:

Daily_Calories %>% 
  group_by(Id, Calories) %>%
  arrange(ActivityDay) %>%
  summarise(min(ActivityDay), max(ActivityDay)) %>% 
  as.data.frame()

and got this:

            Id Calories min(ActivityDay) max(ActivityDay)
1   1503960366        0       2016-05-12       2016-05-12
2   1503960366     1728       2016-04-17       2016-04-17
3   1503960366     1740       2016-05-08       2016-05-08
4   1503960366     1745       2016-04-15       2016-04-15
5   1503960366     1775       2016-04-21       2016-04-21
6   1503960366     1776       2016-04-14       2016-04-14
7   1503960366     1783       2016-05-11       2016-05-11
8   1503960366     1786       2016-04-20       2016-04-20
9   1503960366     1788       2016-04-24       2016-04-24

I'm not looking for the minimum and maximum calories, simply the "minimum" and "maximum" dates, meaning 2016-04-12 and 2016-05-12. None of the three attempts above returned just those two dates, which tells me they are wrong. There are 33 users and 2 dates, so there should be 66 rows in the result.

I hope this is explained well enough, I'm trying to be better with my questions. I appreciate the time and help.

Almost forgot, I wasn't wanting to create a new dataframe, just see the results. That's why my code starts with just the dataframe. Does it make a difference? I'd prefer the results in the console for viewing. Cheers!

Phil
DCosta

1 Answer

If I understand you correctly, you want to keep all observations in the data frame where ActivityDay is either 2016-04-12 or 2016-05-12. Or do you want to view all values in the range between them?

The code below covers both cases:

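# df is your data frame (Daily_Calories in the question)
keeps <- as.Date(c("2016-04-12", "2016-05-12"))

# Keep only the rows that fall on exactly those two dates
df[as.Date(df$ActivityDay) %in% keeps, ]

# Keep every row in the date range between them (inclusive)
df[as.Date(df$ActivityDay) %in% seq(min(keeps), max(keeps), by = 1), ]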

Each line prints the matching rows to the console and leaves the original data frame unchanged.
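
Since the question uses dplyr pipes, here is a minimal sketch of the same "exact dates" filter in that style. It assumes dplyr is loaded and that ActivityDay is stored as (or converted to) a Date. `sample_calories` and `first_last` are hypothetical names, and the five rows are copied from the output quoted in the question as a stand-in for Daily_Calories. Taking `min()` and `max()` of the column avoids typing the two dates by hand.

```r
library(dplyr)

# Hypothetical stand-in for Daily_Calories, built from rows quoted in the question
sample_calories <- data.frame(
  Id          = c(1503960366, 1624580081, 1503960366, 1624580081, 1503960366),
  ActivityDay = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13",
                          "2016-04-13", "2016-05-12")),
  Calories    = c(1985, 1432, 1797, 1411, 0)
)

# No group_by(): min() and max() are taken over the whole column,
# so only rows on the first and last trial dates are kept
first_last <- sample_calories %>%
  filter(ActivityDay %in% c(min(ActivityDay), max(ActivityDay))) %>%
  arrange(ActivityDay, Id)

# Print to the console; the source data frame is not modified
first_last
```

Dropping the `first_last <-` assignment prints the result directly, which matches the "filtered view" you described. On the full data you may see fewer than 66 rows if some users have no record on the last date.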

I was unclear as to what your final data would look like - if I misunderstood, let me know and I will modify my answer. Good luck!

jpsmith
  • I wasn't trying to change the initial dataframe at all, just do some calculations so I could see the information pulled up in the RStudio console. Like a filtered view basically... – DCosta Apr 29 '22 at 19:56
  • I just tried this: ```Daily_Calories %>% filter(ActivityDay == "2016-04-12" && ActivityDay == "2016-05-12") %>% group_by(Id, Calories) %>% arrange(ActivityDay) %>% as.data.frame() ``` Still not working, this is what shows up: ```[1] Id ActivityDay Calories <0 rows> (or 0-length row.names)``` – DCosta Apr 29 '22 at 20:06
  • The above code in the answer doesn’t change the data frame it just prints it in the viewer – jpsmith Apr 29 '22 at 20:57
  • I tried your code from above and it gave me all the dates, not just the 2 dates. I have given up at this point. I appreciate the help. It's amazing to me how much code is out there, and at the same time, how much has not been done yet. This is the 3rd issue I've had that was unsolvable. Also, is it my understanding that there are multiple versions of R language? R and R Markdown are different correct? – DCosta May 03 '22 at 02:08
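
A likely reason the `filter()` attempt in the comments returned 0 rows: the condition asks for rows where ActivityDay equals both dates at once, which no single row can satisfy, and `&&` compares single values rather than working row by row. The sketch below uses a hypothetical three-date vector `days` to show the difference between "and", "or", and `%in%`. It also seems likely that "all the dates" came from the second line of the answer, which keeps the whole range from 2016-04-12 to 2016-05-12, and that range is the entire trial.

```r
# Hypothetical vector standing in for the ActivityDay column
days <- as.Date(c("2016-04-12", "2016-04-13", "2016-05-12"))

# "and": a date cannot equal two different dates, so nothing matches
both <- days == as.Date("2016-04-12") & days == as.Date("2016-05-12")

# "or": matches rows on either date
either <- days == as.Date("2016-04-12") | days == as.Date("2016-05-12")

# %in% expresses the same "either date" test more compactly
either_in <- days %in% as.Date(c("2016-04-12", "2016-05-12"))

both
either
either_in
```

Inside `filter()`, that means writing `ActivityDay == "2016-04-12" | ActivityDay == "2016-05-12"`, or the `%in%` form, instead of `&&`.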