All,
I have been working on a synthetic (patient) data generator. Off-the-shelf solutions such as Synthea are sadly not usable for us because of our custom data models.
So I set out to create one myself. I ended up writing a YAML file that describes the flow of steps patients go through in their (lung cancer screening) hospital workflows.
What I ended up with is a tree structure in which every option has a certain probability, for example (unrealistic):
LunRADS_scores:
  [0]:
    probability: 10
    follow_up:
      screening_ct
  [1]:
    probability: 90
    follow_up:
      diagnostic_pet_scan:
        probability: 50
      tissue_biopsy:
        probability: 50
        follow_up:
          positive:
            probability: 10
            follow_up: Treatment
          negative:
            probability: 90
            follow_up:
              restart_screening_cycle
I am now writing a function that enters this tree and uses Python's random.choices to traverse it stochastically according to the probabilities. (In reality, events have dates, and the cycle continues from a start date until today, until death, or until another stopping event.)
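To make the idea concrete, here is a minimal sketch of such a traversal. It is not my actual code. It assumes the YAML has already been loaded into nested dicts of the shape shown, and the names `TREE` and `sample_path` are made up for illustration. Dates and stopping events are left out.

```python
import random

# Hypothetical in-memory version of the example tree above.
# Each option maps to {"probability": weight, "follow_up": subtree or step name}.
TREE = {
    "0": {"probability": 10, "follow_up": "screening_ct"},
    "1": {
        "probability": 90,
        "follow_up": {
            "diagnostic_pet_scan": {"probability": 50},
            "tissue_biopsy": {
                "probability": 50,
                "follow_up": {
                    "positive": {"probability": 10, "follow_up": "Treatment"},
                    "negative": {
                        "probability": 90,
                        "follow_up": "restart_screening_cycle",
                    },
                },
            },
        },
    },
}


def sample_path(options, rng=random):
    """Walk the tree once, picking one option per level by its weight."""
    path = []
    # Keep descending while the current node is a dict of weighted options.
    while isinstance(options, dict):
        names = list(options)
        weights = [options[name]["probability"] for name in names]
        # random.choices treats weights as relative; they need not sum to 100.
        choice = rng.choices(names, weights=weights, k=1)[0]
        path.append(choice)
        options = options[choice].get("follow_up")
    # A plain string follow-up is a terminal step such as "screening_ct".
    if options is not None:
        path.append(options)
    return path


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs give the same patients
    for _ in range(3):
        print(sample_path(TREE, rng))
```

Calling `sample_path` once per fictitious patient gives one sequence of events. Passing a seeded `random.Random` instance keeps the generated data reproducible.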
I just realized that this is a sort of reverse decision tree: instead of classifying an item using a probability tree, I generate many items (synthetic events for fictitious patients) according to one.
My question is: this must be a known problem, perhaps with a known module and a better way of defining the probability tree. It's just that I am a noob, and searching for "reverse decision tree" does not yield results I can do anything with. Any pointers are very welcome.