I have data on when an owl is present in a nest box. In a previous question you helped me visualize when the owl is in the box:

[image: owl in box]

In addition I created a plot of the hours per day spent in the box with the code below (probably this can be done more efficiently):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# raw data indicating time spent in box (each row represents start and end time)
time = pd.DatetimeIndex(["2021-12-01 18:08","2021-12-01 18:11",
                         "2021-12-02 05:27","2021-12-02 05:29",
                         "2021-12-02 22:40","2021-12-02 22:43",
                         "2021-12-03 19:24","2021-12-03 19:27",
                         "2021-12-06 18:04","2021-12-06 18:06",
                         "2021-12-07 05:28","2021-12-07 05:30",
                         "2021-12-10 03:05","2021-12-10 03:10",
                         "2021-12-10 07:11","2021-12-10 07:13",
                         "2021-12-10 20:40","2021-12-10 20:41",
                         "2021-12-12 19:42","2021-12-12 19:45",
                         "2021-12-13 04:13","2021-12-13 04:17",
                         "2021-12-15 04:28","2021-12-15 04:30",
                         "2021-12-15 05:21","2021-12-15 05:25",
                         "2021-12-15 17:40","2021-12-15 17:44",
                         "2021-12-15 22:31","2021-12-15 22:37",
                         "2021-12-16 04:24","2021-12-16 04:28",
                         "2021-12-16 19:58","2021-12-16 20:09",
                         "2021-12-17 17:42","2021-12-17 18:04",
                         "2021-12-17 22:19","2021-12-17 22:26",
                         "2021-12-18 05:41","2021-12-18 05:44",
                         "2021-12-19 07:40","2021-12-19 16:55",
                         "2021-12-19 20:39","2021-12-19 20:52",
                         "2021-12-19 21:56","2021-12-19 23:17",
                         "2021-12-21 04:53","2021-12-21 04:59",
                         "2021-12-21 05:37","2021-12-21 05:39",
                         "2021-12-22 08:06","2021-12-22 17:22",
                         "2021-12-22 20:04","2021-12-22 21:24",
                         "2021-12-22 21:44","2021-12-22 22:47",
                         "2021-12-23 02:20","2021-12-23 06:17",
                         "2021-12-23 08:07","2021-12-23 16:54",
                         "2021-12-23 19:36","2021-12-23 23:59:59",
                         "2021-12-24 00:00","2021-12-24 00:28",
                         "2021-12-24 07:53","2021-12-24 17:00",
                             ])

# create dataframe with column indicating presence (1) or absence (0)
time_df = pd.DataFrame(data={'present':[1,0]*int(len(time)/2)}, index=time)

# calculate interval length and add to time_df
# (in minutes; astype('timedelta64[m]') is rejected by pandas 2.x)
time_df['interval'] = time_df.index.to_series().diff().dt.total_seconds() / 60

# add column with day to time_df
time_df['day'] = time.day

# select only intervals where owl is present
timeinbox = time_df.iloc[1::2, :]
# plain arrays, so interval[i] and day[i] below index by position
interval = timeinbox['interval'].to_numpy()
day = timeinbox['day'].to_numpy()

# sum multiple intervals per day
interval_tot = [interval[0]]
day_tot = [day[0]]
for i in range(1, len(day)):
    if day[i] == day[i-1]:
        interval_tot[-1] +=interval[i]
    else:
        day_tot.append(day[i])
        interval_tot.append(interval[i])

# recalculate to hours        
for i in range(len(interval_tot)):
    interval_tot[i] = interval_tot[i]/(60)

plt.figure(figsize=(15, 5))       
plt.grid(zorder=0)
plt.bar(day_tot, interval_tot, color='g', zorder=3) 

plt.xlim([1,31])
plt.xlabel('day in December')
plt.ylabel('hours per day in nest box')
plt.xticks(np.arange(1,32,1))  # upper bound is exclusive, so 32 includes day 31
plt.ylim([0, 24])

Now I would like to combine all the data in one plot by making a stacked bar chart, where each day is represented by a bar and each bar indicates, for each of the 24*60 minutes, whether the owl is present or not. Is this possible from the current data structure?

Derek O
Sander
  • Am I correct in understanding that, in the index time series data, the first column is the time the owl entered the nest box and the second column is the time it left? – r-beginners Dec 24 '21 at 09:32
  • Yes indeed! I noted down the entry time and exit time from camera recordings. Since the amount of data is still relatively small, I could still change to another structure if that makes handling the data in Pandas easier. – Sander Dec 24 '21 at 09:41

1 Answer

The data seems to have been created manually, so I changed its format to one row per stay, with a start time and an end time. For each stay I build a one-minute index running from the start time to one minute before the end time, and flag those minutes with 1. To add the minutes when the owl is absent, I build a continuous one-minute index from the first date to the day after the last date and reindex the data frame onto it, filling the gaps with 0. This is the data for the graph.

For the graph, I take the rows for each day and create a color list with red for present and green for absent. Then, in a bar graph, I stack bars of height 1, one per minute. It may be necessary to consider grouping the data into hourly units.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import timedelta
import io

data = '''
start_time,end_time
"2021-12-01 18:08","2021-12-01 18:11"
"2021-12-02 05:27","2021-12-02 05:29"
"2021-12-02 22:40","2021-12-02 22:43"
"2021-12-03 19:24","2021-12-03 19:27"
"2021-12-06 18:04","2021-12-06 18:06"
"2021-12-07 05:28","2021-12-07 05:30"
"2021-12-10 03:05","2021-12-10 03:10"
"2021-12-10 07:11","2021-12-10 07:13"
"2021-12-10 20:40","2021-12-10 20:41"
"2021-12-12 19:42","2021-12-12 19:45"
"2021-12-13 04:13","2021-12-13 04:17"
"2021-12-15 04:28","2021-12-15 04:30"
"2021-12-15 05:21","2021-12-15 05:25"
"2021-12-15 17:40","2021-12-15 17:44"
"2021-12-15 22:31","2021-12-15 22:37"
"2021-12-16 04:24","2021-12-16 04:28"
"2021-12-16 19:58","2021-12-16 20:09"
"2021-12-17 17:42","2021-12-17 18:04"
"2021-12-17 22:19","2021-12-17 22:26"
"2021-12-18 05:41","2021-12-18 05:44"
"2021-12-19 07:40","2021-12-19 16:55"
"2021-12-19 20:39","2021-12-19 20:52"
"2021-12-19 21:56","2021-12-19 23:17"
"2021-12-21 04:53","2021-12-21 04:59"
"2021-12-21 05:37","2021-12-21 05:39"
"2021-12-22 08:06","2021-12-22 17:22"
"2021-12-22 20:04","2021-12-22 21:24"
"2021-12-22 21:44","2021-12-22 22:47"
"2021-12-23 02:20","2021-12-23 06:17"
"2021-12-23 08:07","2021-12-23 16:54"
"2021-12-23 19:36","2021-12-24 00:00"
"2021-12-24 00:00","2021-12-24 00:28"
"2021-12-24 07:53","2021-12-24 17:00"
'''
df = pd.read_csv(io.StringIO(data), sep=',')
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])

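# build one frame per stay, then concatenate once
# (DataFrame.append was removed in pandas 2.0)
frames = []
for idx, row in df.iterrows():
    rng = pd.date_range(row['start_time'], row['end_time']-timedelta(minutes=1), freq='1min')
    frames.append(pd.DataFrame({'present':[1]*len(rng)}, index=rng))
time_df = pd.concat(frames)
# continuous 1-minute index over every observed day; drop the final
# timestamp (midnight after the last day) so no extra day appears
date_add = pd.date_range(time_df.index[0].date(), time_df.index[-1].date()+timedelta(days=1), freq='1min')[:-1]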
time_df = time_df.reindex(date_add, fill_value=0)
time_df['day'] = time_df.index.day

fig, ax = plt.subplots(figsize=(8,15))
ax.set_yticks(np.arange(0,1500,60))
ax.set_ylim(0,1440)
ax.set_xticks(np.arange(1,25,1))

days = time_df['day'].unique()
for d in days:
    day_df = time_df.query('day == @d')
    # red = owl present, green = owl absent
    colors = ['r' if p == 1 else 'g' for p in day_df['present']]
    # one bar of height 1 per minute; minute i spans [i, i+1)
    for i in range(len(day_df)):
        ax.bar(d, height=1, width=0.5, bottom=i, color=colors[i])

plt.show()
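
The hourly grouping mentioned above could be sketched roughly as follows. This is only an illustration, not part of the tested answer: the two stays are made-up values, and the names `stays`, `present`, `hourly` and `table` are assumptions chosen for the sketch. It counts the minutes present in each hour and pivots the result into a day-by-hour table.

```python
import pandas as pd

# Two made-up stays (start, end), only to illustrate the grouping.
stays = pd.DataFrame({
    "start_time": pd.to_datetime(["2021-12-01 18:08", "2021-12-02 22:40"]),
    "end_time": pd.to_datetime(["2021-12-01 19:11", "2021-12-02 22:43"]),
})

# One entry per minute spent in the box (the end minute itself is excluded).
parts = [
    pd.Series(1, index=pd.date_range(start, end, freq="1min")[:-1])
    for start, end in zip(stays["start_time"], stays["end_time"])
]
present = pd.concat(parts)

# Fill every other minute of the observed days with 0 (absent).
first_day = present.index[0].normalize()
last_day = present.index[-1].normalize()
full_index = pd.date_range(
    first_day, last_day + pd.Timedelta(days=1), freq="1min"
)[:-1]
present = present.reindex(full_index, fill_value=0)

# Minutes present in each clock hour.
hourly = present.resample("60min").sum()

# Rows are days, columns are hours 0..23, values are minutes present.
table = pd.DataFrame({
    "day": hourly.index.normalize(),
    "hour": hourly.index.hour,
    "minutes": hourly.to_numpy(),
}).pivot(index="day", columns="hour", values="minutes")

print(table.loc[pd.Timestamp("2021-12-01"), [18, 19]])
```

A table like this has 24 cells per day instead of 1440, so it could be drawn with far fewer bars, at the cost of losing per-minute detail.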

r-beginners
  • The result is indeed what I had in mind; thanks! My laptop takes about 15 minutes to create the plot, though (24 times 1440 individual bars have to be drawn). – Sander Dec 24 '21 at 22:33
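
One possible way to cut the drawing time is to draw one bar per stay rather than one bar per minute, working directly from the start and end times. The sketch below is an illustration under stated assumptions, not the answerer's method: `split_at_midnight` and `plot_stays` are hypothetical helpers, the first stay joins two rows of the data above into one stay that crosses midnight, and using the day of the month as the x position assumes all data fall within a single month.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Two stays; the first crosses midnight to show the splitting step.
stays = pd.DataFrame({
    "start_time": pd.to_datetime(["2021-12-23 19:36", "2021-12-24 07:53"]),
    "end_time": pd.to_datetime(["2021-12-24 00:28", "2021-12-24 17:00"]),
})


def split_at_midnight(stays):
    """Cut each stay into pieces so that no piece crosses midnight."""
    rows = []
    for start, end in zip(stays["start_time"], stays["end_time"]):
        while start < end:
            next_midnight = start.normalize() + pd.Timedelta(days=1)
            piece_end = min(end, next_midnight)
            rows.append((start.normalize(), start, piece_end))
            start = piece_end
    out = pd.DataFrame(rows, columns=["day", "start", "end"])
    # Minute of the day at which the piece starts, and its length in minutes.
    out["bottom"] = (out["start"] - out["day"]).dt.total_seconds() / 60
    out["height"] = (out["end"] - out["start"]).dt.total_seconds() / 60
    return out


def plot_stays(segments):
    """Green full-day bars (absent) with one red bar per stay on top."""
    days = segments["day"].dt.day  # assumes a single month
    all_days = list(range(int(days.min()), int(days.max()) + 1))
    fig, ax = plt.subplots(figsize=(8, 10))
    ax.bar(all_days, 1440, width=0.5, color="g", zorder=2)
    ax.bar(days, segments["height"], bottom=segments["bottom"],
           width=0.5, color="r", zorder=3)
    ax.set_ylim(0, 1440)
    ax.set_yticks(range(0, 1441, 60))
    ax.set_xticks(all_days)
    ax.set_xlabel("day in December")
    ax.set_ylabel("minute of day")
    return fig, ax


segments = split_at_midnight(stays)
fig, ax = plot_stays(segments)
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or stores the figure. With this layout the number of rectangles grows with the number of stays rather than with the number of minutes.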