7

Recently, Microsoft announced the option to download attendance in MS teams to keep track of who is joining and leaving. But, to take attendance in a class team, it is not quite useful since it does not give a proper way to know how much time the participant is there in the meeting.

For example,

df = pd.DataFrame([["Organiser Name","Joined","03/08/2020, 16:30:41"],
["Organiser Name","Left","03/08/2020, 17:03:32"],
["Organiser Name","Joined","03/08/2020, 17:04:25"],
["Student 1","Joined before","03/08/2020, 16:30:41"],
["Student 1","Joined before","03/08/2020, 17:04:27"],
["Student 2","Joined before","03/08/2020, 16:30:41"],
["Student 2","Joined","03/08/2020, 17:04:27"],
["Student 3","Joined","03/08/2020, 16:31:47"],
["Student 3","Joined","03/08/2020, 17:04:27"],
["Student 3","Left","03/08/2020, 17:30:32"],
["Student 4","Joined","03/08/2020, 16:32:01"],
["Student 4","Left","03/08/2020, 16:37:20"],
["Student 4","Joined","03/08/2020, 16:39:27"],
["Student 4","Joined","03/08/2020, 17:04:27"],
["Student 4","Left","03/08/2020, 17:17:19"],
["Student 4","Joined","03/08/2020, 17:19:13"],
["Student 5","Joined","03/08/2020, 16:35:41"],
["Student 5","Left","03/08/2020, 16:36:46"],
["Student 6","Joined","03/08/2020, 16:38:01"],
["Student 6","Left","03/08/2020, 16:40:14"],
["Student 7","Joined","03/08/2020, 17:15:08"],
["Student 7","Left","03/08/2020, 17:15:44"],
["Student 7","Joined","03/08/2020, 17:15:48"],
["Student 7","Left","03/08/2020, 17:15:54"],
["Student 8","Joined","03/08/2020, 17:18:12"],
["Student 8","Left","03/08/2020, 17:19:59"]], columns = ["Full Name","User Action","Timestamp"])

This is an original meeting attendance list (only the names are replaced). Let's say suppose I end the meeting at 03/08/2020, 17:22:00 and I downloaded the attendance 2 minutes earlier,is there any way that I can think of summarizing this data using python? Like,

List of students and his/her duration in which he was there in the class

Then I can compare that with all the students and get the absentees list and as wells as will be able to decide whether to give attendance or not depends on how long the students are in the class.

I don't have any base code in mind since I am confused about the following:

  1. In between I miss some left time but there are two consecutive joined time
  2. When I leave the meeting due to some internet issues and rejoin, then there is no information about who left and joined between myself leaving and rejoining.

Anyone have sorted this problem?

Or any idea to summarize this data?

or any fresh idea to decide the eligibility of the attendance?

Thanks in advance.

Note: The solution I expect is using python, however, solution in any language or in excel also are welcome.

Let's try
  • 1,044
  • 9
  • 20
David
  • 524
  • 1
  • 7
  • 24
  • I can see some students never left, does this mean that they stayed until you ended the meeting? – Cmagelssen Aug 07 '20 at 14:21
  • @ChristianMagelssen yes...by the time of taking downloading the attendance, they will be in the call so they will never be marked as left. – David Aug 07 '20 at 17:14
  • 2
    There are "joined" lines for every student that's already there when you reconnect, see the 17:04:27 lines after rejoin at 17:04:25. Similar to the 'joined before 16:30:41' lines. If you don't leave for long, the tracking will still be OK-ish with a no-op join (or a delayed join). The major problem is folks who leave while you're disconnected. A probable scenario: "instructor left, I'm gone". You won't know if they left then or if they were still connected when you downloaded the data. If I read the data right, it can be coded, but won't be reliable. Might not be worth the trouble. – volante Aug 08 '20 at 04:30
  • @volante Yeah.. exactly.... thats the scenario... – David Aug 08 '20 at 13:02
  • My bad: you *would* know who left during instructor disconnect because they would not have a "redundant" join when the instructor reconnected. I'll have a go at it. – volante Aug 09 '20 at 06:34
  • if you organise the data in the columns `"person", "join_or_left", "time"` then you can groupby by person and calculate the time difference easily – Serg Nov 03 '21 at 03:58

5 Answers5

1

Definition of Attendance

I think the main problem here is a sensible definition of attendance. For the specific use case of teacher (= organizer) and students this is straightforward:

"Attendance is the period in which both teacher/organizer and students are present."

Rationale:

  • If students log on early, the lesson has not started yet. They aren't attending anything.
  • If students log off while the teacher is still logged on, they will miss something.
  • It doesn't matter whether students log off/on while the teacher is away, they don't miss anything. (Note: This is a major difference between virtual and real classroom.)

That is exactly how MS Teams thinks about attendance. From the MS Office website:

Keep in mind that you'll only be able to download the attendance report while the meeting is in progress and participants are still present. [...] If attendees join a meeting before the organizer, their join times will match the time that the organizer joined the meeting.

The data has one big problem: The end time of the meeting is missing. The organizer cannot leave, because then the report cannot be downloaded. But the organizer will always download after the end of the lesson. You'll have to provide the end date manually or use a sensible

Answer

With the definition we can center all time periods on the organizer. There is a lot going on in the code. I commented as best as I can.

    # neccessary inputs (not in original data)
    end_timestamp = pd.Timestamp('2020-03-08 17:40')
    organizer_name = 'Organiser Name'

    # Pivot User Action values to columns; we don't need 'Join before'
    df['Timestamp'] = pd.to_datetime(df['Timestamp'])
    df['User Action'] = df['User Action'].str.replace('Joined before', 'Joined')
    df = df.set_index(['Full Name', 'User Action'], append=True).unstack()
    df.columns = df.columns.get_level_values(1)

    # we can (always) shift the 'Left' dates due to underlying data structure
    df['Left'] = df['Left'].shift(-1)
    df = df.dropna(how='all')

    # organizer can only have one missing value: the end value
    mask_organizer = df.index.get_level_values('Full Name') == organizer_name
    df.loc[mask_organizer, 'Left'] = df.loc[mask_organizer, 'Left'].fillna(end_timestamp)
    replace_na_dates = list(df.loc[mask_organizer, 'Left'])

    def fill_missing_dates(join_date, left_date, replace_dates):
        if left_date is not pd.NaT:
            return left_date
        for date in replace_dates:
            if join_date < date:
                return date
        return replace_dates[-1]

    df['Left'] = df.apply(lambda x: fill_missing_dates(x['Joined'], x['Left'], replace_na_dates), axis=1)
    df['Attendance'] = df['Left'] - df['Joined']
    df = df.groupby(level=1).sum()

Output:

                Attendance
Full Name                
Organiser Name   01:08:26
Student 1        01:08:24
Student 2        01:08:24
Student 3        00:57:50
Student 4        01:03:03
Student 5        00:01:05
Student 6        00:02:13
Student 7        00:00:42
Student 8        00:01:47

You may notice that the organizer has two seconds more attendance than everyone else. I think that MS Teams logs the join of the organizer correctly, but it takes a little longer until it has feedback from all participants in the meeting. In other words: it is the time between "I am back" and "Now I can see you all".

above_c_level
  • 3,579
  • 3
  • 22
  • 37
0

I tried my best but I have little experience with TimeSeries, so maybe other people can complete it. Basically, you need to Timestamp your column so that Pandas understand that it is dealing with time and not 'string/object'. Then you need to pivot it. The final step is to calculate the time each student has been in the meeting. Hope this helps you get started.

import pandas as pd
df = pd.read_csv('vg.csv', delimiter=",", sep=",")
df.head()

df['Timestamp'] = pd.to_datetime(df.Timestamp)

df['Hour'] = df.Timestamp.dt.hour #converting it to hour
df['Minutes'] = df.Timestamp.dt.minute #converting it to minutes
df['Sek'] = df.Timestamp.dt.minute #converting it to sexcunds
df['Ended'] = "18:00:00"
df['Ended'] = pd.to_datetime(df.Ended)

df.rename(columns={'   Full Name': 'Person'}, inplace=True) #Sorry I wanted to change your Columns name

result = df.pivot_table(index='Person',
                         columns='User Action',
                         values=['Hour', 'Minutes', "Sek", "Ended"])
print(result)
#

enter image description here

Cmagelssen
  • 620
  • 5
  • 21
0

An essential point is allowing timestamp delta calculations by using to_datetime(). The attendance times could be simply accumulated in a dict over the participants:

import pandas as pd

df = pd.DataFrame(...)  # as given


def main():
    col_n, col_a, col_t = df.columns  # just for readability: name, action, time
    i_n, i_a, i_t = range(3)
    df[col_t] = pd.to_datetime(df[col_t])  # for calculating
    meeting_end, meeting_begin = max(df[col_t]), min(df[col_t])
    meeting_duration_secs = (meeting_end - meeting_begin).total_seconds()
    names = sorted(set(df[col_n]))
    attendance = {}  # summarize time deltas per name
    for name in names:
        arr = df[df[col_n] == name].values  # ndarray of current attendee slice
        assert arr[0][i_a].startswith("Joined")
        attendance[name] = 0.
        for i in range(len(arr) - 1):
            row_1, row_2 = arr[i], arr[i+1]
            if row_1[i_a].startswith("Joined") and row_2[i_a] == "Left":
                attended = row_2[i_t] - row_1[i_t]
                attendance[name] += attended.total_seconds()
        if arr[-1][i_a] != "Left":
            attended = meeting_end - arr[-1][i_t]
            attendance[name] += attended.total_seconds()

    name_len = len(max(attendance, key=lambda s: len(s)))
    for name in attendance:
        mins = round(attendance[name] / 60., 1)
        perc = round(100. * attendance[name] / meeting_duration_secs, 1)
        print(f"{name:<{name_len}} {mins:5} min {perc:5} %")


if __name__ == '__main__':
    main()

...not sure the logic covers all circumstances, like actions might have further names than ["Joined", "Joined before", "Left"], and whether it holds, the first participant entry always startswith("Joined")—maybe use meeting_begin then if not.

thoku
  • 1,120
  • 9
  • 27
0

This solution processes ordered events and tracks state (present or not) for whoever shows up.

Major assumptions are:

  • only time with the data owner present counts (e.g. "Organiser Name")
  • the end of the session is the last recorded event

Other notes are in the code.

Sanity check on the answer: Organizer joined at 16:30:41, maximum timestamp is 17:30:32, and organizer was gone for about a minute, so maximum possible duration is around 59 minutes, following the logic of @above_c_level for shared time together.

import pandas as pd


class MeetingMonitor:

    def __init__(self, df, owner):
        df.columns = ["who", "action", "timestamp"]
        df['action'] = df['action'].replace('Joined.*', 1, regex=True)
        df['action'] = df['action'].replace('Left.*', 0, regex=True)
        df['timestamp'] = pd.to_datetime(df['timestamp'])
        self.owner = owner
        self.min_join = df.loc[df['who'] == owner, 'timestamp'].min()
        df = df.sort_values('timestamp')
        self.df = df
        self.folks = {}

    def get_report(self):
        self.folks = {}
        self.df.apply(self.handle, axis=1)
        # no data on true end of session, so best guest is last event
        self.everybody_leaves(df['timestamp'].max())
        results = [(self.folks[folk]['who'], self.folks[folk]['duration'])
                for folk in self.folks.keys()]
        results = pd.DataFrame(results, columns=['who', 'duration'])
        results['slack'] = results.duration.max() - results.duration
        return results.sort_values('slack')

    def make_folk(self, event):
        folk = {
            'who': event['who'],
            'duration': pd.Timedelta(0),
            'state': 1,
            'in': max(event.timestamp, self.min_join)
        }
        self.folks[folk['who']] = folk

    def join(self, event):
        self.folks[event['who']]['state'] = 1
        self.folks[event['who']]['in'] = event.timestamp

    def leave(self, who, timestamp):
        if self.folks[who]['state'] == 0:  # everybody leaves
            return
        self.folks[who]['duration'] += timestamp - self.folks[who]['in']
        self.folks[who]['state'] = 0

    def everybody_leaves(self, timestamp):
        for folk in self.folks.keys():
            self.leave(folk, timestamp)

    def handle(self, event):
        if event.who not in self.folks:
            if event.action == 1:
                self.make_folk(event)
                return 1
            else:
                pass  # someone left who wasn't here ... ok
        elif event.action == self.folks[event.who]['state']:
            # this shouldn't happen, mostly because of "everybody leaves" below
            # asymmetric assumption for bad data here,
            #   biased in favor of double joiners *shrug*
            return 1
        elif event.action == 1:
            self.join(event)
            return 1
        elif event.action == 0:
            if event.who == self.owner:
                self.everybody_leaves(event.timestamp)
            else:
                self.leave(event.who, event.timestamp)
            return 1

        # https://waffleguppies.tumblr.com/post/50741279401/just-a-reminder-that-the-nuclear-tesuji-is-a
        raise ValueError("(ノಠ益ಠ)ノ彡" + str(event))


df = pd.DataFrame([["Organiser Name", "Joined", "03/08/2020, 16:30:41"],
                   ["Organiser Name", "Left", "03/08/2020, 17:03:32"],
                   ["Organiser Name", "Joined", "03/08/2020, 17:04:25"],
                   ["Student 1", "Joined before", "03/08/2020, 16:30:41"],
                   ["Student 1", "Joined before", "03/08/2020, 17:04:27"],
                   ["Student 2", "Joined before", "03/08/2020, 16:30:41"],
                   ["Student 2", "Joined", "03/08/2020, 17:04:27"],
                   ["Student 3", "Joined", "03/08/2020, 16:31:47"],
                   ["Student 3", "Joined", "03/08/2020, 17:04:27"],
                   ["Student 3", "Left", "03/08/2020, 17:30:32"],
                   ["Student 4", "Joined", "03/08/2020, 16:32:01"],
                   ["Student 4", "Left", "03/08/2020, 16:37:20"],
                   ["Student 4", "Joined", "03/08/2020, 16:39:27"],
                   ["Student 4", "Joined", "03/08/2020, 17:04:27"],
                   ["Student 4", "Left", "03/08/2020, 17:17:19"],
                   ["Student 4", "Joined", "03/08/2020, 17:19:13"],
                   ["Student 5", "Joined", "03/08/2020, 16:35:41"],
                   ["Student 5", "Left", "03/08/2020, 16:36:46"],
                   ["Student 6", "Joined", "03/08/2020, 16:38:01"],
                   ["Student 6", "Left", "03/08/2020, 16:40:14"],
                   ["Student 7", "Joined", "03/08/2020, 17:15:08"],
                   ["Student 7", "Left", "03/08/2020, 17:15:44"],
                   ["Student 7", "Joined", "03/08/2020, 17:15:48"],
                   ["Student 7", "Left", "03/08/2020, 17:15:54"],
                   ["Student 8", "Joined", "03/08/2020, 17:18:12"],
                   ["Student 8", "Left", "03/08/2020, 17:19:59"]], columns=["Full Name", "User Action", "Timestamp"])

# don't assume data will be nicely ordered, make user specify the owner
mm = MeetingMonitor(df, df.iloc[0, 0])
res = mm.get_report()
print(res)

Output:

              who        duration           slack
0  Organiser Name 0 days 00:58:58 0 days 00:00:00
1       Student 1 0 days 00:58:56 0 days 00:00:02
2       Student 2 0 days 00:58:56 0 days 00:00:02
3       Student 3 0 days 00:57:50 0 days 00:01:08
4       Student 4 0 days 00:53:35 0 days 00:05:23
6       Student 6 0 days 00:02:13 0 days 00:56:45
8       Student 8 0 days 00:01:47 0 days 00:57:11
5       Student 5 0 days 00:01:05 0 days 00:57:53
7       Student 7 0 days 00:00:42 0 days 00:58:16
volante
  • 154
  • 2
  • 8
0

I has the same need today and created a gist for it: https://gist.github.com/smarie/09057f2006fc31616ebd06d41e056ec3

It also generates a histogram plot:

enter image description here

smarie
  • 4,568
  • 24
  • 39