
I have data with timestamps like this:

timestamps price
2021/11/8 9:00:00 63
2021/11/8 9:01:00 64
2021/11/8 9:02:00 65
2021/11/8 9:03:00 64
2021/11/13 10:02:00 58
2021/11/11 12:03:00 55

I can read these timestamps and convert them to a datetime type in Python like this:

df["timestamps"] = pd.to_datetime(df["timestamps"])

I need to analyze the data for each date in a for-loop.

I think I need to do it in two steps. First, find all dates and save them in a list (Date). Second, match every date in the list (Date) against the original data set to extract all prices for that date. Does anyone know how to do these two steps?
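For illustration, the two steps described above could be sketched as follows. This is only a minimal sketch under stated assumptions: the column names `timestamps` and `price` are taken from the sample data, the explicit `format` string is an assumption about how the timestamps are written, and the names `dates`, `prices_by_date`, and `grouped` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data mirroring the layout shown in the question.
df = pd.DataFrame({
    "timestamps": [
        "2021/11/8 9:00:00", "2021/11/8 9:01:00", "2021/11/8 9:02:00",
        "2021/11/8 9:03:00", "2021/11/13 10:02:00", "2021/11/11 12:03:00",
    ],
    "price": [63, 64, 65, 64, 58, 55],
})
# Assumed format; adjust it if the real data is written differently.
df["timestamps"] = pd.to_datetime(df["timestamps"], format="%Y/%m/%d %H:%M:%S")

# Step 1: collect the unique calendar dates (sorting is optional).
dates = sorted(df["timestamps"].dt.date.unique())

# Step 2: for each date, select the matching rows and extract their prices.
prices_by_date = {}
for day in dates:
    mask = df["timestamps"].dt.date == day
    prices_by_date[day] = df.loc[mask, "price"].tolist()

# Alternative: groupby splits the rows by date without a separate lookup per date.
grouped = {
    day: group["price"].tolist()
    for day, group in df.groupby(df["timestamps"].dt.date)
}
```

Neither variant needs the data to be sorted or the start and end dates to be known in advance. On a large data set the `groupby` variant is likely the better fit, since the explicit loop rescans the whole column once per date.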

Please note that: 1. This is a big data set, and the timestamps are not sorted; the dates do not increase regularly. 2. The data has no fixed period, and I don't know the start and end times in advance. Of course, I can sort the timestamps first to get the start and end times, but I still won't know how many dates there are. In other words, the dates are not contiguous.

Suppose I need to randomly choose 5 prices for each date and sort them by time, keeping only the date (without the hour, minute, and second). Expected output:

timestamps prices
2021/11/8 61
2021/11/8 63
2021/11/8 65
2021/11/8 61
2021/11/8 61
2021/11/11 62
2021/11/11 63
.... ...
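One possible way to produce output of this shape is sketched below. It assumes the column names `timestamps` and `price` from the sample data, uses made-up data with at least 5 rows per date, and fixes `random_state` only so that the example is reproducible. The names `sampled` and `result` are hypothetical.

```python
import pandas as pd

# Hypothetical data: three dates in unsorted order, six rows each (values made up).
df = pd.DataFrame({
    "timestamps": pd.to_datetime(
        [f"2021/11/{d} 9:0{m}:00" for d in (13, 8, 11) for m in range(6)],
        format="%Y/%m/%d %H:%M:%S",
    ),
    "price": range(60, 78),
})

# Draw 5 random rows per calendar date.
sampled = df.groupby(df["timestamps"].dt.date).sample(n=5, random_state=0)

# Sort by the full timestamp, then keep only the date part.
result = (
    sampled.sort_values("timestamps")
    .assign(timestamps=lambda d: d["timestamps"].dt.date)
    .reset_index(drop=True)
)
print(result)
```

If a date has fewer than 5 rows, `sample(n=5)` raises a `ValueError` unless `replace=True` is passed, so sparse dates would need separate handling. The dates print in ISO form (for example `2021-11-08`) rather than the `2021/11/8` style shown above.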
Lin.D
  • Welcome to Stack Overflow. Please show the DataFrame in a format that can be copied and pasted directly into a Python program, namely `print(df.head().to_dict('list'))`. Also, remember that the best solutions to pandas (and numpy) questions usually do not involve a for loop. – Steele Farnsworth Feb 15 '22 at 15:17
  • "First, find all dates" I assume this means "find all unique calendar days". You can achieve this with `df["Time"].dt.normalize().unique()`, which turns every timestamp into the timestamp for midnight on that respective day, and then retains only unique values, returning a numpy array. – Steele Farnsworth Feb 15 '22 at 15:23
  • "match every date from the list(Date) to the original data set to extract all prices." what if a day appears more than once? it sounds like this might be a roundabout way of rounding all the timestamps in the original dataframe down to midnight, without any other changes. – Steele Farnsworth Feb 15 '22 at 15:24
  • `df.groupby(df['timestamps'].dt.date).sample(5)` – mozway Feb 15 '22 at 15:24

0 Answers