0

Beginning with something like this:

SUBJECT  TIME   C_INTAKE
C02      0.00    0.0
C02      48.00   0.2
C02      49.31   0.4 
C02      51.20   0.4
C02      52.61   0.4
C02      55.61   0.6 
C02      82.77   0.9
C02      84.97   1.1
C02      86.39   1.1
.
.
C03      0.00     0.0

I'd like to:

  • make set time bins (e.g. 10, 20, 30, 40, 50, 60, 70, 80, 90, 100); (in reality I have 92 900-s bins)
  • record cumulative intake for each of those time bins
  • and start again for each subject
    SUBJECT  BINTIME C_INTAKE2  
    C02      0.00    0.0
    C02      10.00   0.0
    C02      20.00   0.0 
    C02      30.00   0.0
    C02      40.00   0.0
    C02      50.00   0.4 
    C02      60.00   0.6
    C02      70.00   0.6
    C02      80.00   0.6
    C02      90.00   1.1
    C02     100.00   1.1
    .
    .
    C03      0.00     0.0

Thank you for your advice

EDIT: Thank you for your input so far. In trying to adapt it to a larger data set, I've come across an error. I am brand new to this (but learning from this forum!) so I'm afraid I'm not able to troubleshoot it on my own.

import pandas as pd
import numpy as np

# Import data from url
url = "https://raw.githubusercontent.com/yadatree/AL/main/intake.csv"
intake=pd.read_csv(url)

# select data
subject = intake['SUBJECT']
time = intake['NTIME']
c_intake = intake['cintake']

df = pd.DataFrame({'subject': subject, 'time': time, 'c_intake': c_intake})

# Create bins
bins = list(range(0,82801, 92))

# calculate cumulative sum by subject and bin
df.groupby([df'subject, pd.cut(df.time, bins)]) \
    .agg({'c_intake': 'sum'}) \
    .groupby('subject') \
    .cumsum() \
    .reset_index()

Ouput:

File "<ipython-input-32-ffea10b96606>", line 19
    df.groupby([df'subject, pd.cut(df.time, bins)])     .agg({'c_intake': 'sum'})     .groupby('subject')     .cumsum()     .reset_index()
                                                              ^
SyntaxError: invalid syntax

yt3
  • 13
  • 2
  • 1
    Hey, could you please clarify what you mean by 'cumulative intake' for each bin. Your example output shows `0.4` for the bin `50-60` (presumably), while your input data shows `1.2` for that bin (`0.4 + 0.4 + 0.6`). – Ali Jun 26 '22 at 07:30
  • The solution probably involves using `pd.groupby` with `pd.cut`. – Ali Jun 26 '22 at 07:31
  • Thank you for the input. The C_INTAKE values are cumulative intake so I'd like to have an output with that running cumulative intake. In the example you pointed out, at the 50.0 time point, the cumulative intake is 0.4. If it would be easier for me to not calculate from a cumulative intake column, please let me know – yt3 Jun 26 '22 at 17:10
  • Got it. I've edited my answer to account for this. It's simply calculating the cumulative sum of the totals by bin and sample. – Ali Jun 26 '22 at 19:42
  • Line 19 contains incorrect syntax. You didn't copy and paste the `groupby` statement provided in my answer correctly. Could you please copy and paste it again as it is. – Ali Jun 27 '22 at 20:53

1 Answers1

1

Assuming you want to sum c_intake within a subject and bin group, the following solution should work.

import pandas as pd
import numpy as np

# Create sample data
subjects = [item for subject in ['C02', 'C03', 'C04'] for item in [subject] * 10]
time = np.random.randint(0,100, 30)
c_intake = np.random.rand(30)

df = pd.DataFrame({'subject': subjects, 'time': time, 'c_intake': c_intake})

# Create desired bins
bins = list(range(0,101, 10))

# calculate cumulative sum by subject and bin
df.groupby(['subject', pd.cut(df.time, bins)]) \
    .agg({'c_intake': 'sum'}) \
        .groupby('subject') \
        .cumsum() \
        .reset_index()
Ali
  • 328
  • 2
  • 7