How to calculate the Precedence Matrix in Python?

Question

Precedence diagrams

The precedence matrix shows the flows from one activity to another in a rectangular format. It is a two-dimensional matrix whose cells record how often one activity (the antecedent) is directly followed by another (the consequent). The type argument controls which values it contains, such as absolute frequencies.

In R, it can be calculated with the bupaR package.

#Example

# Absolute Frequencies
patients %>%
    precedence_matrix(type = "absolute")

Output

## # A tibble: 13 x 3
##    antecedent            consequent                n
##    <fct>                 <fct>                 <int>
##  1 Triage and Assessment End                       2
##  2 Blood test            End                       1
##  3 Start                 Registration            500
##  4 Registration          Triage and Assessment   500
##  5 MRI SCAN              Discuss Results         236
##  6 Triage and Assessment Blood test              237
##  7 Blood test            MRI SCAN                236
##  8 Discuss Results       Check-out               492
##  9 X-Ray                 End                       2
## 10 Check-out             End                     491
## 11 X-Ray                 Discuss Results         259
## 12 Triage and Assessment X-Ray                   261
## 13 Discuss Results       End                       3

How can I get the precedence matrix in Python? Is there a package that computes it?

There may be a package I haven't heard of to do this, but you could also use an adjacency matrix where each edge is initialized to 0 and then incremented accordingly to indicate flow. — 0x263A, Aug 05 '21 at 16:15
"Adjacency matrix" is a broad term for a way of representing graphs, but NetworkX does have an implementation of an adjacency matrix (I'm unsure whether it would work out of the box for your purpose). Precedence diagrams are just a list of edges in a graph and the weights associated with them, so you could implement your own. — 0x263A, Aug 05 '21 at 16:51
I have a simple proof of concept; if you would like, I can post it as an answer. — 0x263A, Aug 05 '21 at 17:25
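
The adjacency-matrix idea from the comments could look roughly like the sketch below. It uses only the standard library; the function name adjacency_counts and the sample traces are made up for illustration and are not taken from any package.

```python
def adjacency_counts(traces):
    """Count how often each activity is directly followed by another."""
    # Collect activities in first-seen order so rows and columns are stable.
    activities = list(dict.fromkeys(a for trace in traces for a in trace))
    index = {a: i for i, a in enumerate(activities)}
    # Every edge starts at 0 ...
    matrix = [[0] * len(activities) for _ in activities]
    # ... and is incremented once per observed direct succession.
    for trace in traces:
        for origin, destination in zip(trace, trace[1:]):
            matrix[index[origin]][index[destination]] += 1
    return activities, matrix


# Hypothetical traces: each inner list is one case, in chronological order.
traces = [
    ["Registration", "Triage", "X-Ray", "Discuss Results"],
    ["Registration", "Triage", "Blood test", "MRI", "Discuss Results"],
    ["Registration", "Triage", "X-Ray", "Discuss Results"],
]
activities, matrix = adjacency_counts(traces)
for name, row in zip(activities, matrix):
    print(f"{name:16}", row)
```

Rows are antecedents and columns are consequents, both in the order the activities first appear.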

score 1 · Answer 1 · edited Aug 07 '21 at 04:11

Per OP's request in the comments, here is a minimalist proof of concept of a precedence matrix in Python using a graph structure.

class PrecedenceDiagram():
    def __init__(self, mode = 'relative'):
        # {<str> origin : { <str> destination: <int> frequency, ... } }
        self.graph = dict()
        self.flow_amount = 0
        self.mode = mode
    
    def update(self, origin, destination):
        '''
        increment frequency if origin exists and destination is in
        origin else instantiate origin/destination appropriately
        '''
        if origin in self.graph:
            if destination in self.graph[origin]:
                self.graph[origin][destination] += 1
            else:
                self.graph[origin][destination] = 1
        else:
            self.graph[origin] = dict()
            self.graph[origin][destination] = 1
        self.flow_amount += 1

    def display_precedence(self):
        '''
        display flow frequency
        '''
        print('O','D','f')
        for node, edges in self.graph.items():
            for edge, weight in edges.items():
                if self.mode == 'absolute':
                    print(node, edge, weight)
                elif self.mode == 'relative':
                    print(node, edge, weight/ self.flow_amount)
        print('-'*16)


pm = PrecedenceDiagram(mode='relative')
pm.update('a', 'b')
pm.update('b', 'c')
pm.update('a', 'b')
pm.update('a', 'b')
pm.update('a', 'd')
pm.update('a', 'e')
pm.update('e', 'a')
pm.update('a', 'n')
pm.update('a', 'b')
pm.update('a', 'b')
pm.update('a', 'b')
pm.display_precedence()
pm.mode = 'absolute'
pm.display_precedence()

How can I apply it to a dataframe? — Ailurophile, Aug 09 '21 at 11:41
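
One possible way to do this with pandas alone is sketched below. It assumes the event log has one row per event, and the column names case_id, activity and timestamp (as well as the helper name precedence_table) are assumptions made for the example. Within each case, every activity is paired with the next one and the pairs are counted.

```python
import pandas as pd

# Hypothetical event log: one row per event (column names are assumptions).
events = pd.DataFrame({
    "case_id": [1, 1, 1, 2, 2, 2, 2],
    "activity": ["a", "b", "c", "a", "b", "b", "c"],
    "timestamp": pd.to_datetime([
        "2021-01-01 09:00", "2021-01-01 10:00", "2021-01-01 11:00",
        "2021-01-02 09:00", "2021-01-02 10:00", "2021-01-02 11:00",
        "2021-01-02 12:00",
    ]),
})


def precedence_table(df, case="case_id", activity="activity", time="timestamp"):
    # Put the events of each case in chronological order.
    ordered = df.sort_values([case, time])
    pairs = pd.DataFrame({
        "antecedent": ordered[activity],
        # Next activity within the same case; NaN for the last event of a case.
        "consequent": ordered.groupby(case)[activity].shift(-1),
    }).dropna()
    counts = (pairs.groupby(["antecedent", "consequent"])
                   .size()
                   .reset_index(name="n"))
    # Relative frequency: share of all observed flows.
    counts["rel"] = counts["n"] / counts["n"].sum()
    return counts


table = precedence_table(events)
print(table)
```

Unlike the bupaR output in the question, this sketch does not add artificial Start and End activities.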

Ailurophile · Accepted Answer · 2021-08-16T16:10:15.587

Use pm4py's discover_dfg.

Example dataset

import pandas as pd
import pm4py

url = 'https://raw.githubusercontent.com/pm4py/pm4py-core/release/notebooks/data/running_example.csv'
df = pm4py.format_dataframe(
    pd.read_csv(url, sep=';'),
    case_id='case_id', activity_key='activity', timestamp_key='timestamp')

Converting data to log

from pm4py.objects.conversion.log import converter as log_converter
log = log_converter.apply(df)

discover_dfg(log) returns such a matrix (as a dict keyed by (antecedent, consequent) pairs) together with counters of the start and end activities. The first element is the matrix.

d = pm4py.discover_dfg(log)[0]

Data Wrangling

df = pd.DataFrame.from_dict(d, orient='index').reset_index()
df.rename(columns={"index" : "Antecedent,Consequent", 0 : "Count"}, inplace=True)
df['Antecedent'], df['Consequent'] = zip(*df["Antecedent,Consequent"])

Final Output

Antecedent	Consequent	Count
register request	examine thoroughly	1
examine thoroughly	check ticket	2
check ticket	decide	6
decide	reject request	3
register request	check ticket	2
check ticket	examine casually	2
examine casually	decide	2
decide	pay compensation	3
register request	examine casually	3
examine casually	check ticket	4
decide	reinitiate request	3
reinitiate request	examine thoroughly	1
check ticket	examine thoroughly	1
examine thoroughly	decide	1
reinitiate request	check ticket	1
reinitiate request	examine casually	1
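
If a rectangular layout is preferred over the long table, the counts can be pivoted. The sketch below assumes a dict keyed by (antecedent, consequent) tuples, as in the wrangling step above, and uses four pairs copied from the output rather than the full result.

```python
import pandas as pd

# A few (antecedent, consequent) -> count pairs copied from the output above.
dfg = {
    ("register request", "check ticket"): 2,
    ("register request", "examine casually"): 3,
    ("check ticket", "decide"): 6,
    ("examine casually", "decide"): 2,
}

# Tuple keys become a two-level index; unstack moves the consequents
# into columns and fills flows that never occur with 0.
long = pd.Series(dfg, name="Count").rename_axis(["Antecedent", "Consequent"])
matrix = long.unstack(fill_value=0)
print(matrix)
```

Each row is an antecedent and each column a consequent.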
