After I've sorted a data table, can I select a value, unsort it, and then save that value and the 5 values below it to a dataframe?

Question

In my last question, I asked how to find a rolling mean, thank you for your help!

However, now that I have the lowest metabolic value of the rolling mean, my table is no longer in order. I want to flag the O2 value from that row, unsort it back into its original position, and average that 02 value with the next 4 values below it. Is this possible?

I'm just not sure how to set the flag to let R know that I want to keep using this value after I've unsorted the table

Alternatively, rather than unsorting and flagging my specified O2 value, would it be easier to just flag the timestamp for that O2 value, find that time in the original sheet, and then select the O2 from that row to do the average with the 5 values below it?

The value I am looking to get is the average V02 value from 13:36 to 14:01 which is: 0.738622117

Date       Time     kCal       VO2      VCO2      Lowest Average    
2020/08/11 13:36:00 0.1796796 0.6212131 0.5481290 1.290649
2020/08/11 13:41:00 0.1796833 0.6212261 0.5481405 1.412320
2020/08/11 18:06:00 0.2475342 0.8529993 0.7080062 1.540823
2020/08/11 13:46:00 0.1796903 0.6212505 0.5481620 1.551518
2020/08/11 18:01:00 0.3073857 1.0778390 0.9221587 1.580908
etc.

EDIT: From Second Solution:

> August11RMR6[,..I:=.I]

> setorder(August11RMR6, VO2_M_1, na.last=T)

> August11RMR6[..I%in%(..I[1]+(0:4)),]

Output:

              DateTime kcal_hr_M_1   VO2_M_1  VCO2_M_1 Sum_6period ..I
1: 2020/08/11 13:36:00   0.1796796 0.6212131 0.5481290    1.290649   1
2: 2020/08/11 13:41:00   0.1796833 0.6212261 0.5481405    1.412320   2
3: 2020/08/11 13:46:00   0.1796903 0.6212505 0.5481620    1.551518   4
4: 2020/08/11 18:06:00   0.2475342 0.8529993 0.7080062    1.540823   3
5: 2020/08/11 18:01:00   0.3073857 1.0778390 0.9221587    1.580908   5

From First Solution:

> August11RMR6[,..I:=.I]

> row.num <- as.numeric(August11RMR6[order(VO2_M_1)[1], "..I"])

> row.num
[1] 1

> August11RMR6[row.num+(0:4)]

https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame/50906379#50906379 You should be able to create a rank column based on O2 and then filter on that. — hannes101, Oct 17 '20 at 06:21
Please add data using `dput` and show the expected output for the same. Read about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). — Ronak Shah, Oct 17 '20 at 08:18
@hannes101 I see where you're coming from when you suggest a rank column, but my issue is the 3rd point on that list isn't one that I want. I want "13:36:00" and the 5 values after that in the original document (that I sorted). Sorting it smallest to largest gives me the first 3-4 points, but then the 5th is larger and gets put further down the list. What I'm looking for is R to flag that top VO2 value, search for it in the original list, and then select it and 5 values below it...not sure if that is possible, but thank you for your help either way! I appreciate it. — qsxsqsxs, Oct 17 '20 at 16:58
@RonakShah I added my expected answer and the data (hopefully in an acceptable format...if not, feel free to let me know and I can try to fix it) Thank you for your help, I didn't know the dput function existed, I'll make sure to use it in the future. — qsxsqsxs, Oct 17 '20 at 17:33
@AbdessabourMtk Done! I pasted the raw dput data and updated the question, if I should have formatted it differently, feel free to let me know and I can fix it! — qsxsqsxs, Oct 17 '20 at 19:19

Abdessabour Mtk · Accepted Answer · 2020-10-18T09:28:34.357

0

# save the actual row numbers to a new column
dfs[,..I:=.I]
# get the row number of the smallest value
row.num <- as.numeric(dfs[order(VO2_M_1)[1], "..I"])
row.num
[1] 1
# get the 4 values that follow the smallest one
dfs[row.num+(0:4)]
#>               DateTime kcal_hr_M_1   VO2_M_1  VCO2_M_1  roll_sum ..I
#> 1: 2020/08/11 13:36:00   0.1796796 0.6212131 0.5481290 0.9647027   1
#> 2: 2020/08/11 13:41:00   0.1796833 0.6212261 0.5481405 1.1109694   2
#> 3: 2020/08/11 13:46:00   0.1796903 0.6212505 0.5481620 1.2326371   3
#> 4: 2020/08/11 13:51:00   0.1797111 0.6213223 0.5482253 1.3718277   4
#> 5: 2020/08/11 13:56:00   0.2459384 0.8402357 0.7533433 1.5177674   5

EDIT

# save the actual row numbers to a new column
dfs[, ..I:=.I]
# set the order of the data.table
setorder(dfs, VO2_M_1, na.last=T)
# get the 4 values that follow the smallest one
dfs[..I%in%(..I[1]+(0:4)),]
#>               DateTime kcal_hr_M_1   VO2_M_1  VCO2_M_1  roll_sum ..I
#> 1: 2020/08/11 13:36:00   0.1796796 0.6212131 0.5481290 0.9647027   1
#> 2: 2020/08/11 13:41:00   0.1796833 0.6212261 0.5481405 1.1109694   2
#> 3: 2020/08/11 13:46:00   0.1796903 0.6212505 0.5481620 1.2326371   3
#> 4: 2020/08/11 13:51:00   0.1797111 0.6213223 0.5482253 1.3718277   4
#> 5: 2020/08/11 13:56:00   0.2459384 0.8402357 0.7533433 1.5177674   5

edited Oct 18 '20 at 09:28

answered Oct 17 '20 at 20:42

Abdessabour Mtk

3,895
2
14
21

I'm not getting the same result, what did you put for dfs? I might have an issue with "as.numeric(August11RMR6[order(VO2_M_1)[1], "..I"]) because it is not sorting – qsxsqsxs Oct 18 '20 at 05:28
Here is my output that did not sort: structure(list(DateTime = c("2020/08/11 13:36:00", "2020/08/11 13:41:00", "2020/08/11 18:06:00", "2020/08/11 13:46:00", "2020/08/11 18:01:00" ), kcal_hr_M_1 = c(0.1796796, 0.1796833, 0.2475342, 0.1796903, 0.3073857), VO2_M_1 = c(0.6212131, 0.6212261, 0.8529993, 0.6212505, 1.077839), VCO2_M_1 = c(0.548129, 0.5481405, 0.7080062, 0.548162, 0.9221587), Sum_6period = c(1.290649, 1.4123204, 1.540823, 1.551518, 1.5809082), ..I = 1:5), row.names = c(NA, -5L), class = c("data.table", "data.frame"), .internal.selfref = ) – qsxsqsxs Oct 18 '20 at 05:33
And here is what I inputted (without the <): >August11RMR6[,..I:=.I] >row.num <- as.numeric(August11RMR6[order(VO2_M_1)[1], "..I"]) >row.num >August11RMR6[row.num+(0:4)] – qsxsqsxs Oct 18 '20 at 05:35
you want the output to be sorted?? my code doesn't actually keep the sorted df. but this can be easily solved – Abdessabour Mtk Oct 18 '20 at 09:20
@qsxsqsxs check the edit. And I'm using the data you provided. Also this way is memory efficient as you only need one data.table to keep all the data. – Abdessabour Mtk Oct 18 '20 at 09:31
I'm sorry, I'm still not getting the same output as you, even with the same code. your 4th and 5th values in the table are from 13:51:00 and 13:56:00 which is exactly what I want, but on my end it displays 18:06:00 and 18:01:00 as my 4th and 5th values in the table Here is my code that I am inputting: > August11RMR6[,..I:=.I] > setorder(August11RMR6, VO2_M_1, na.last=T) > August11RMR6[..I%in%(..I[1]+(0:4)),] – qsxsqsxs Oct 18 '20 at 16:15
@qsxsqsxs can u add the output u get to the question? so I can debug it – Abdessabour Mtk Oct 18 '20 at 20:17
Done! I have added my outputs as edits from your first and second response. Thank you so much for your help, feel free to let me know if you need anything further/in a different format! – qsxsqsxs Oct 18 '20 at 20:40
@qsxsqsxs the data.table you're using is the same as you have given in the question? – Abdessabour Mtk Oct 18 '20 at 20:53
Yes, it's the same, I'm using "August11RMR6" which is what I extracted with dput. Could there be a package interfering? – qsxsqsxs Oct 18 '20 at 22:29
I've been getting this error the first time I try to send the 1st line of code, however it doesn't occur the second time I try: In `[.data.table`(August11RMR6, , `:=`(..I, .I)) : Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. – qsxsqsxs Oct 18 '20 at 22:33
convert the data.table to data.frame then reconvert to data.table. and check if the error reoccurs – Abdessabour Mtk Oct 19 '20 at 11:06
Hello, I tried that and the error no longer occurs, but I am still getting the same incorrect values. I used dfAugust11RMR6 <- as.data.frame(August11RMR6) NewDTAugust11RMR6 <- setDT(dfAugust11RMR6). Is this correct? – qsxsqsxs Oct 19 '20 at 16:04
`setDT` turns a data.frame to a data.table by reference ie the object provided becomes a data.table => dfAugust11RMR6 has became a data.table – Abdessabour Mtk Oct 19 '20 at 16:26
the ..I values are correct in the output. so The code is running correctly. the problem maybe is coming from the data. again are u using the same data you provided in the Q? – Abdessabour Mtk Oct 19 '20 at 16:28
I am using the same data that I provided in the Q...but I just noticed that our "I" values don't match up. For me, 13:46:00 has I value 4, 13:51:00 has I value 24 and 13:56:00 has I value 76, while for you, they are 3, 4, and 5. In my original data, I did arrange() with dplyr, I think that's what was causing the mistake! – qsxsqsxs Oct 19 '20 at 16:56
I am now getting the same answer as you, at least for the August 11 dataset! Thank you! – qsxsqsxs Oct 19 '20 at 16:57

After I've sorted a data table, can I select a value, unsort it, and then save that value and the 5 values below it to a dataframe?

1 Answers1

EDIT