I put "decimate" in the title, but I am not sure that is exactly the right word. Here is the full description of the problem. I have a dataframe that contains data from several subjects, and I want to analyze only data collected a given number of days apart; for example, only data collected every 4th day from a subject. The catch is that the subjects were recorded in parallel, so I can't simply take every 4th day across the whole dataframe. The decimation/downsampling has to be done separately for each subject. The two key columns in the dataframe are "subject" and "session_timestamp". The latter holds the date and time in this format: 2017-11-10 16:30:47. Is there a good way to accomplish this in Python?
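A minimal sketch of the simple "every 4th day" version on the real column names follows. It assumes `session_timestamp` parses with `pd.to_datetime` and that "day" means calendar date, so the time of day is ignored. Each subject's day count starts at that subject's own first session, which is what makes the selection per subject rather than global. The example rows and the helper columns `session_date` and `day` are hypothetical.

```python
import pandas as pd

# Hypothetical example rows using the column names from the question.
df = pd.DataFrame({
    "subject": ["A", "A", "A", "B", "B", "B"],
    "session_timestamp": [
        "2017-11-10 16:30:47", "2017-11-12 09:15:00", "2017-11-14 10:00:00",
        "2017-11-11 08:00:00", "2017-11-15 12:00:00", "2017-11-16 12:00:00",
    ],
})

# Parse the strings and drop the time of day so sessions compare by calendar date.
df["session_date"] = pd.to_datetime(df["session_timestamp"]).dt.normalize()

# Days elapsed since each subject's own first session (per subject, not global).
first_date = df.groupby("subject")["session_date"].transform("min")
df["day"] = (df["session_date"] - first_date).dt.days

# Rigid rule: keep sessions on day 0, 4, 8, ... of each subject's own timeline.
every_4th = df[df["day"] % 4 == 0]
print(every_4th[["subject", "session_timestamp", "day"]])
```

Because `transform("min")` broadcasts each subject's first date back onto that subject's rows, subjects that started on different calendar dates each get their own day 0.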
Edit: The first commenters asked for a more concrete example of the dataframe with some sample data. Here is a toy dataframe that is similar to mine and should be easy to work with. The code below creates a dataframe with 4 columns: subjectID, date, score1 and score2. Note that a subject can have more than one entry for a given date. These are neural recordings: each row of the dataframe represents one neuron, and we can record more than one neuron per subject on the same day.
import pandas as pd
import numpy as np
ab = pd.DataFrame()
ab["subjectID"] = np.random.randint(5, size=200)  # random "subject" IDs from 0 to 4
ab["date"] = np.random.randint(20, size=200)  # random "dates" (day codes) from 0 to 19
ab["score1"] = np.random.randint(200, size=200)  # simulates one measurement from one subject
ab["score2"] = np.random.randint(400, size=200)  # simulates a second measurement
What I want is to keep only the data (score1 and score2) that was collected at least 4 days apart within each subject. The simplest code would take the first day on which a subject has an entry and every 4th day after that. A better solution would take the first day, then the next available day that is more than 3 days later, then the next one more than 3 days after that, and so on. Not every subject has daily samples, so a rigid "every 4th day" rule would miss days that happen to fall off the 4-day grid. All data collected on the selected days should be kept. For example, if day 0 is a subject's first day, all of that subject's rows with day code 0 should be included.
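The "first day, then the next day at least 4 days later" rule can be written as a greedy scan over each subject's sorted unique days. Below is a minimal sketch on the toy dataframe, assuming `date` holds integer day codes; for real timestamps, a per-subject day offset computed with `pd.to_datetime` would play the same role. The helper names `select_spaced_days` and `filter_spaced` are hypothetical, and the toy data uses a seeded `np.random.default_rng` instead of `np.random.randint` only so that runs are reproducible.

```python
import numpy as np
import pandas as pd


def select_spaced_days(days, min_gap=4):
    """Greedy scan: keep the earliest day, then each later day that is at
    least `min_gap` days after the most recently kept day."""
    kept = []
    for d in sorted(set(days)):
        if not kept or d - kept[-1] >= min_gap:
            kept.append(d)
    return kept


def filter_spaced(df, subject_col="subjectID", day_col="date", min_gap=4):
    """Return every row recorded on a selected day, choosing days per subject."""
    allowed = {
        subj: set(select_spaced_days(grp[day_col], min_gap))
        for subj, grp in df.groupby(subject_col)
    }
    # Keep all rows (e.g. all neurons) from each subject's selected days.
    mask = np.array(
        [day in allowed[subj] for subj, day in zip(df[subject_col], df[day_col])],
        dtype=bool,
    )
    return df[mask]


# Toy data shaped like the question's example (seeded for reproducibility).
rng = np.random.default_rng(0)
ab = pd.DataFrame({
    "subjectID": rng.integers(5, size=200),
    "date": rng.integers(20, size=200),
    "score1": rng.integers(200, size=200),
    "score2": rng.integers(400, size=200),
})

result = filter_spaced(ab)
print(result.groupby("subjectID")["date"].unique())
```

For example, `select_spaced_days([0, 1, 2, 5, 6, 9, 10, 14])` keeps days 0, 5, 9 and 14: day 5 is the first day at least 4 days after day 0, day 9 the first at least 4 days after day 5, and so on. Since "more than 3 days later" is the same as "at least 4 days later" for integer day codes, `min_gap=4` matches the rule described above.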