Reverse decision tree for generation of synthetic data

Question

All,

I have been working on a synthetic (patient) data generator. Off-the-shelf solutions such as Synthea are sadly not usable for us because of our custom data models.

So I set out to create one myself. I ended up implementing a YAML file which describes the flow of steps patients go through in their (lung cancer screening) hospital workflows.

What I ended up with was a tree structure in which every option has a certain probability, for example (unrealistic):

LunRADS_scores:
  [0]:
    probability: 10
    follow_up:
      screening_ct
  [1]:
    probability: 90
    follow_up:
      diagnostic_pet_scan:
        probability: 50
      tissue_biopsy:
        probability: 50
        follow_up:
          positive:
            probability: 10
            follow_up: Treatment
          negative:
            probability: 90
            follow_up:
              restart_screening_cycle

I am now writing a function that enters this tree and uses Python's random.choices to traverse it stochastically according to the probabilities. (In reality, events have dates, and the cycle continues from a start date until today, until death, or until another stopping event.)
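A minimal sketch of what such a traversal could look like, assuming the YAML has already been loaded into nested dicts. The dict layout, the key names `score_0`/`score_1` (standing in for `[0]`/`[1]`), and the function name `sample_path` are illustrative assumptions, not the actual implementation; dates and stopping events are left out.

```python
import random

# Hypothetical in-memory form of the tree: each option maps to its weight
# ("probability") and an optional "follow_up" (a nested dict or a final step).
TREE = {
    "score_0": {"probability": 10, "follow_up": "screening_ct"},
    "score_1": {
        "probability": 90,
        "follow_up": {
            "diagnostic_pet_scan": {"probability": 50},
            "tissue_biopsy": {
                "probability": 50,
                "follow_up": {
                    "positive": {"probability": 10, "follow_up": "Treatment"},
                    "negative": {
                        "probability": 90,
                        "follow_up": "restart_screening_cycle",
                    },
                },
            },
        },
    },
}


def sample_path(node, rng=random):
    """Walk the tree once and return the list of steps that were visited."""
    path = []
    while isinstance(node, dict):
        names = list(node)
        weights = [node[name]["probability"] for name in names]
        # random.choices treats the weights as relative, so they need not sum to 100.
        choice = rng.choices(names, weights=weights, k=1)[0]
        path.append(choice)
        node = node[choice].get("follow_up")
    if node is not None:
        path.append(node)  # terminal follow-up such as "Treatment"
    return path


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make a run reproducible
    for _ in range(5):
        print(" -> ".join(sample_path(TREE, rng)))
```

Each call produces one synthetic patient pathway; calling it in a loop yields as many fictitious patients as needed.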

I just realized that this is a sort of reverse decision tree: I don't classify an item using a probability tree; I generate many items (synthetic events for fictitious patients) according to a probability tree.

My question is: this must be a known problem, perhaps with a known module and a better way of defining the probability tree. It's just that I am a noob, and searching for "reverse decision tree" does not yield results I can do anything with. Any pointers are very welcome.

score 1 · Answer 1 · answered Aug 21 '22 at 17:10

I think this is better described as a Bayesian network. With such a structure you can infer useful statistics, such as the patient distribution given some constraints, using well-established inference algorithms.
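As a rough illustration of the kind of statistic meant here, the sketch below computes exact pathway probabilities for the example tree from the question by plain enumeration, then conditions on a constraint. This is not a general Bayesian-network inference algorithm (a dedicated library would be the usual choice for real networks); the dict layout, the key names `score_0`/`score_1`, and the function name `path_probabilities` are assumptions made for the example.

```python
from fractions import Fraction

# Hypothetical dict form of the question's example tree.
TREE = {
    "score_0": {"probability": 10, "follow_up": "screening_ct"},
    "score_1": {
        "probability": 90,
        "follow_up": {
            "diagnostic_pet_scan": {"probability": 50},
            "tissue_biopsy": {
                "probability": 50,
                "follow_up": {
                    "positive": {"probability": 10, "follow_up": "Treatment"},
                    "negative": {
                        "probability": 90,
                        "follow_up": "restart_screening_cycle",
                    },
                },
            },
        },
    },
}


def path_probabilities(node, prefix=(), prob=Fraction(1)):
    """Yield (path, probability) for every root-to-leaf path in the tree."""
    total = sum(child["probability"] for child in node.values())
    for name, child in node.items():
        p = prob * Fraction(child["probability"], total)
        follow_up = child.get("follow_up")
        if isinstance(follow_up, dict):
            yield from path_probabilities(follow_up, prefix + (name,), p)
        else:
            tail = (name,) if follow_up is None else (name, follow_up)
            yield prefix + tail, p


dist = dict(path_probabilities(TREE))

# Condition on a constraint: only patients whose pathway starts with "score_1".
given = {path: p for path, p in dist.items() if path[0] == "score_1"}
evidence = sum(given.values())
p_treatment = sum(p for path, p in given.items() if path[-1] == "Treatment") / evidence

if __name__ == "__main__":
    for path, p in dist.items():
        print(" -> ".join(path), p)
    print("P(Treatment | score_1) =", p_treatment)
```

Comparing these exact values with the frequencies in the generated data is also a cheap sanity check on the generator itself.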

Paul_0
