How to extract the certain values from a column indexed by another column?

Question

I have the following data and I am trying to extract the final value of the cumsum column. For instance I would like to take row 8 from the dataframe with cumsum value 0.7810417 then do the same for 1115 with cumsum value 0.7282639 and 1116 ... 1117 etc.

structure(list(WEEK = c(1114L, 1114L, 1114L, 1114L, 1114L, 1114L, 
1114L, 1114L, 1115L, 1115L, 1115L, 1115L, 1115L, 1115L, 1115L, 
1116L, 1116L, 1116L, 1116L, 1116L, 1116L, 1117L, 1117L, 1117L, 
1117L, 1118L, 1118L, 1118L, 1118L, 1118L, 1118L, 1118L, 1119L, 
1119L, 1119L, 1119L, 1119L, 1119L, 1119L, 1119L, 1120L, 1120L, 
1120L, 1120L, 1120L, 1120L, 1120L, 1121L, 1121L, 1121L), price_per_ounce = c(0.124583333333333, 
0.0832638888888889, 0.100972222222222, 0.100972222222222, 0.10375, 
0.0832638888888889, 0.100972222222222, 0.0832638888888889, 0.124583333333333, 
0.0855555555555556, 0.109166666666667, 0.0832638888888889, 0.110972222222222, 
0.10375, 0.110972222222222, 0.124583333333333, 0.0841666666666667, 
0.110972222222222, 0.110972222222222, 0.0832638888888889, 0.110972222222222, 
0.0838888888888889, 0.110972222222222, 0.0832638888888889, 0.110972222222222, 
0.124583333333333, 0.0786111111111111, 0.110972222222222, 0.110972222222222, 
0.10375, 0.110972222222222, 0.0832638888888889, 0.124583333333333, 
0.0925, 0.110972222222222, 0.0832638888888889, 0.110972222222222, 
0.10375, 0.110972222222222, 0.0832638888888889, 0.124583333333333, 
0.0844444444444444, 0.110972222222222, 0.0832638888888889, 0.10375, 
0.110972222222222, 0.0832638888888889, 0.124583333333333, 0.0694444444444444, 
0.110972222222222), Total = c(0.124583333333333, 0.207847222222222, 
0.308819444444444, 0.409791666666667, 0.513541666666667, 0.596805555555556, 
0.697777777777778, 0.781041666666667, 0.124583333333333, 0.210138888888889, 
0.319305555555556, 0.402569444444444, 0.513541666666667, 0.617291666666667, 
0.728263888888889, 0.124583333333333, 0.20875, 0.319722222222222, 
0.430694444444444, 0.513958333333333, 0.624930555555556, 0.0838888888888889, 
0.194861111111111, 0.278125, 0.389097222222222, 0.124583333333333, 
0.203194444444444, 0.314166666666667, 0.425138888888889, 0.528888888888889, 
0.639861111111111, 0.723125, 0.124583333333333, 0.217083333333333, 
0.328055555555556, 0.411319444444444, 0.522291666666667, 0.626041666666667, 
0.737013888888889, 0.820277777777778, 0.124583333333333, 0.209027777777778, 
0.32, 0.403263888888889, 0.507013888888889, 0.617986111111111, 
0.70125, 0.124583333333333, 0.194027777777778, 0.305)), .Names = c("WEEK", 
"price_per_ounce", "Total"), class = c("data.table", "data.frame"
), row.names = c(NA, -50L), .internal.selfref = <pointer: 0x0000000008450788>)

EDIT: The results I obtain when I set the dataframe to be df <- head(df, 100)

structure(list(df = c(0.781041666666667, 0.728263888888889, 0.624930555555556, 
0.389097222222222, 0.723125, 0.820277777777778, 0.70125, 0.658611111111111, 
0.769583333333333, 0.759027777777778, 0.751666666666667, 0.741597222222222, 
0.519930555555556, 0.712152777777778)), .Names = "df", row.names = c(NA, 
-14L), class = "data.frame")

However when I run in on the whole dataframe I get the following.

structure(list(df = c(220.124649739256, 199.217289598068, 199.774511556463, 
206.738587849235, 205.766197136359, Inf, Inf, Inf, 205.103350187295, 
199.567357907284, 212.900103648094, 200.477169383407, Inf, 203.441435413023
)), .Names = "df", row.names = c(NA, 14L), class = "data.frame")

I am unsure why there are Infvalues now appearing, also the values change significantly, however all I did was skip this command df <- head(df, 100)

Here is the code I am applying (The same thing happens when I apply all the solutions below)

Price <- data %>%
  select(WEEK, price_per_ounce)

test <- transform(Price, Total = ave(price_per_ounce, WEEK, FUN = cumsum))

test <- head(test, 100)
#dput(test)

df <- test[, Total[.N], WEEK]$V1
df <- as.data.frame(df)
df <- head(df, 14)
dput(df)

Did you meant to extract the last value of 'cumsum' column for each 'wks' `df1[, cumsum[.N],wks]$V1` — akrun, Feb 05 '18 at 15:31
With your new data, `df1[, Total[.N], WEEK]$V1# [1] 0.7810417 0.7282639 0.6249306 0.3890972 0.7231250 0.8202778 0.7012500 0.3050000` — akrun, Feb 05 '18 at 16:00
You changed column names, what is the question or expected behavior now? — catastrophic-failure, Feb 05 '18 at 16:01
Thanks for the reply, the code works, but I am still running into `inf` problems when I run it on the whole dataframe. I have edited the original question to show the output I am getting. — user113156, Feb 05 '18 at 16:32
First, stop changing the question: you already have four answers that work exactly like you wanted. Second, check your data, you probably have `Inf` values in `df$Total` (btw `df` is a bad name, since it's a function in `stats` namespace) — catastrophic-failure, Feb 05 '18 at 18:03
It does not solve my problem, the code works great but only on a small section of the data and when I run it on all observations I get errors: `> all(is.finite(df$Total)) [1] TRUE` However here is a screen shot of the exact dataframe I ran the code on https://ibb.co/nNM0gH — user113156, Feb 05 '18 at 18:24
`max(df$Total) [1] -Inf Warning message: In max(df$Total) : no non-missing arguments to max; returning -Inf` — user113156, Feb 05 '18 at 18:53
Your data has issues, solve these first. Then any of the 4 methods presented will work. https://stackoverflow.com/questions/24282550/no-non-missing-arguments-warning-when-using-min-or-max-in-reshape2 — catastrophic-failure, Feb 05 '18 at 19:11

score 3 · Answer 1 · answered Feb 05 '18 at 15:42

3

Base R

aggregate(df$cumsum, by = list(df$wks), FUN = tail, n = 1)

answered Feb 05 '18 at 15:42

dondapati

829
6
18

Thanks it works, but I run into the same problems as I mentioned in the previous comment. – user113156 Feb 05 '18 at 15:58

score 3 · Answer 2 · answered Feb 05 '18 at 16:04

3

The OP's dataset is a data.table. An approach using data.table to get the last row of 'TOTAL for each 'WEEK' element and extract the 'TOTAL' would be

library(data.table)
df1[, Total[.N], WEEK]$V1
#[1] 0.7810417 0.7282639 0.6249306 0.3890972 0.7231250 0.8202778 0.7012500 0.3050000

If the intention is not to extract the column, then we can subset it and keep it as data.table

df1[, .(Total = Total[.N]),  WEEK]

answered Feb 05 '18 at 16:04

akrun

874,273
37
540
662

1

Hmm, `data.table` is more versatile than I thought, brings R much closer to Julia/Matlab indexing facilities [+1] – catastrophic-failure Feb 05 '18 at 16:07

score 2 · Answer 3 · answered Feb 05 '18 at 16:02

tapply(dat$Total, dat$WEEK, tail, 1)
#     1114      1115      1116      1117      1118      1119      1120      1121 
#0.7810417 0.7282639 0.6249306 0.3890972 0.7231250 0.8202778 0.7012500 0.3050000

Explanation

dat$Total is our atomic object, dat$WEEK our grouping variable, tail our function to be applied, 1 is the first argument passed to it, so we get the last value of dat$Total per index in dat$WEEK.

score 1 · Answer 4 · answered Feb 05 '18 at 15:37

1

library(dplyr)   
df %>% group_by(wks)%>%
filter(row_number()==n())

answered Feb 05 '18 at 15:37

Wyldsoul

1,468
10
16

Thanks for the answer, it works but when I apply it onto the whole datasize, I run into `inf` values. I increased the datasize to show this. What could be causing this? – user113156 Feb 05 '18 at 15:57
My guess is that your full data.table needs some cleaning. Maybe this post will help? https://stackoverflow.com/questions/12188509/cleaning-inf-values-from-an-r-dataframe – Wyldsoul Feb 05 '18 at 16:21

How to extract the certain values from a column indexed by another column?

4 Answers4

Explanation