0

I am very new to Django rest framework and stack overflow even, I was struggling with finding a title so please feel free to give a better alternative.

I have a set of jobs that get posted to a database. the jobs are now grouped per machine and per given hour. For now I am only displaying the total count of jobs per hour. My API end point looks like this now : Api end point

Here's where I am struggling: I need to get the duration of the job separated by hour. I have a Datetime when it started and a Datetime when it ended. I also have a total of the duration. For example: suppose a job starts at 11h15 and ends at 12h15. this would mean that for hour 11 the total duration is 45 minutes. For hour 12, the duration is 15 minutes.

I thought of getting the time passed between 11h15 and 12h00 and then the time passed between 12h15 and 13h00. But I am not exactly sure how i could do this. also note that some jobs go for more than an hour and some even more than a day. I am sure there is a specific way to do this in python using timedelta. I am just not experienced enough to understand this.

This is the current code for the API end point:

class MachineDurationViewSet(APIView):
authentication_classes = (SessionAuthentication, DataAppClientTokenAuthentication)

def get(self, request):
    # get names of all machines
    machine_names_queryset = Machine.objects.all()
    machine_names = []
    for obj in machine_names_queryset:
        machine_names.append(obj.name)
        
    machine_list = {}
    sum = []
    queryset = Job.objects.all()

    # per machine, get dur for a given hour,use names to filter
    for name in machine_names:
        qs_grouped_per_machine = queryset.filter(machine__name=name)
        queryset_machine_start = qs_grouped_per_machine.annotate(hour=ExtractHour('dt_start'))
        queryset_machine_end = qs_grouped_per_machine.annotate(hour=ExtractHour('dt_end'))
        for n in range(0,24):
            job_start = queryset_machine_start.filter(hour=n)
            for job in job_start:
                print(job.dt_start)
            sum.append(job_start.count())
            machine_list[name] = sum
        sum = []
    return Response({"machines": machine_list, "machine_count": len(machine_names)})
ben
  • 3
  • 3

1 Answers1

0

This one seems to do the job, explanation is in code. Hope you can extract it to a function and use in your context:

from datetime import datetime
from datetime import timedelta

start = datetime(2019, 2, 15, 0, 38, 42) #datetime.now(tz=None)
hours_added = timedelta(minutes = 150) # change to eg. 1500
end = datetime(2019, 2, 15, 0, 39, 1) #start + hours_added
diff = end - start

print("Start ", start)
print("End", end)

dates_spent = []
SECONDS_IN_HOUR = 3600
full_hours = int(diff.seconds / 3600) + 1 # get hours the task took rounded (ceiling)
for i in range(full_hours + 1): # add '1' because we want to include last 'touched' hour as well
    if i == 0:
        # store first datetime
        dates_spent.append(start)
    if i == full_hours or full_hours == 1: # job spanned only some minutes within same hour
        # store last datetime and ignore next hour
        dates_spent.append(end)
        break
    next_hh = start + timedelta(hours=i + 1)
    # truncate minutes as we need full hours
    next_hh = datetime(next_hh.year, next_hh.month, next_hh.day, next_hh.hour)
    dates_spent.append(next_hh)

print("***")
for d in dates_spent:
    print(d)
print("***")

minutes_as_array = [None]*24

for i in range(len(dates_spent) - 1):
    # get time difference between 'next' hour spent and 'current'
    minutes_in_hour_spent = dates_spent[i + 1] - dates_spent[i]
    print("minutes spent: {}, in hour: {}".format(minutes_in_hour_spent, dates_spent[i]))
    minutes_as_array[dates_spent[i].hour] = minutes_in_hour_spent.seconds / 60

print("===")
print(minutes_as_array)

For my test data it yields:

Start  2019-02-15 00:38:42
End 2019-02-15 00:39:01
***
2019-02-15 00:38:42
2019-02-15 00:39:01
***
minutes spent: 0:00:19, in hour: 2019-02-15 00:38:42
===
[0.31666666666666665, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

So, on index 11 we'll have ~25 minutes, then 60 minutes, then 60 minutes and ~4 minutes.

But this is only going to work for one day - if a job spans multiple days, you'd have to create more complex structure, eg 2d array where first dimension would be 'day' (eg. in a month), then an hour.

Marek Piotrowski
  • 2,988
  • 3
  • 11
  • 16
  • thank you, i am reading through the code, what is hrs_inbetween supposed to be?can it be just an array of 24 datetime values to represent each hour of the day?? – ben Aug 18 '20 at 07:25
  • Edited the code a bit, hope it's clearer now. Name was confusing, sorry. You could pack it into an array I believe - extract hour information from `dates_spent[i]` and use it as an index in an array; extract minutes from `minutes_in_hour_spent` and use it as a value. You'd have to create a 2d array (or similar 'structure'), though, because a task can span multiple days, right? So you can't be sure what day's hour are we talking about. – Marek Piotrowski Aug 18 '20 at 08:03
  • yes, i realized that my questiion was a bit stupid, i ran you code in a python console and i understand what it is doing. The problem is indeed that some jobs span for multiple days and i am not sure how to refractor this code to make it work – ben Aug 18 '20 at 09:31
  • Edit the code a bit, but you'd have to tailor it to your needs I believe. – Marek Piotrowski Aug 18 '20 at 09:37
  • I am also seeing that if i get a negative value instead of 0 minutes for a specific hour {'minutes spent: -1 day, 23:13:52.422956, hour: 2019-02-07 03:00:00+00:00'} – ben Aug 18 '20 at 09:57
  • Share the end and start dates for such cases, please. I never said you won't have to test this code/play with it a bit :) – Marek Piotrowski Aug 18 '20 at 10:05
  • of course, im just saying what my outcome is :) the start date: 2019-02-15 00:38:42.577716+00:00 the end date: 2019-02-15 00:39:01.577716+00:00 I've also added a conversion since i was getting naive datetimes : unawawre_next_hh = datetime(next_hh.year, next_hh.month, next_hh.day, next_hh.hour) next_hh = unawawre_next_hh.replace(tzinfo=pytz.UTC) In my IDE the hh variable shows as not being used as well. not sure what its supposed to do either – ben Aug 18 '20 at 10:13
  • Yup, hh has been unused, removed it. But your example is a corner case where a job does not even span a minute :) Extended the `if` where calculating hours on which job spanned accordingly. Added some prints as well. – Marek Piotrowski Aug 18 '20 at 10:39
  • hey, you really helped me a lot, thank you, i think it is working for me now – ben Aug 18 '20 at 13:26
  • Glad I could help. I believe you can upvote and accept the answer then. Thanks. – Marek Piotrowski Aug 18 '20 at 13:35