
My goal is to build a symptom recommendation system.

I have 3 columns of data in my Excel file:

  1. Patient id
  2. Symptoms
  3. Disease detection

Each patient id has one or more symptoms that lead to a disease detection. My goal is to find the most relevant symptoms given an input symptom.

I can't think of a plan given how limited the data is. One idea I have is to transform the data into a matrix with all symptoms as columns and all diseases as rows: for each disease, mark 1 under every symptom that occurs with it and 0 under all other symptoms. Will this approach work? Any ideas on how to design this system?
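To make the goal concrete, here is a minimal sketch of one possible approach: count how often two symptoms appear for the same patient, then rank the symptoms that co-occur with the input symptom. The rows, the symptom names, and the `recommend` helper are all made up for illustration; the real data would come from the Excel file.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical rows in the shape (patient id, symptom, disease detection).
rows = [
    (1, "fever", "flu"),
    (1, "cough", "flu"),
    (1, "headache", "flu"),
    (2, "fever", "flu"),
    (2, "cough", "flu"),
    (3, "rash", "measles"),
    (3, "fever", "measles"),
    (4, "cough", "bronchitis"),
    (4, "wheezing", "bronchitis"),
]

# Collect the set of symptoms reported for each patient.
symptoms_by_patient = defaultdict(set)
for patient_id, symptom, _disease in rows:
    symptoms_by_patient[patient_id].add(symptom)

# Count, for every ordered pair (a, b), how many patients have both.
cooccurrence = defaultdict(Counter)
for symptoms in symptoms_by_patient.values():
    for a, b in permutations(symptoms, 2):
        cooccurrence[a][b] += 1


def recommend(symptom, top_n=3):
    """Return up to top_n symptoms that most often co-occur with `symptom`."""
    counts = cooccurrence.get(symptom, Counter())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _count in ranked[:top_n]]


print(recommend("fever"))  # ['cough', 'headache', 'rash']
```

Raw co-occurrence counts favour symptoms that are common overall, so a normalised score (for example, dividing by how often each symptom appears) may be a better ranking on real data.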

Lalit

1 Answer


You could use the scikit-learn library to build a predictive model in which the symptoms are the features and the disease is the label. You can then analyse which symptoms contribute most to the disease.
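As a rough sketch of that idea, assuming the data is loaded into a pandas DataFrame with one row per (patient, symptom) pair and that each patient has exactly one disease, it could look like the following. The column names, the toy values, and the choice of `RandomForestClassifier` are assumptions for illustration, not something taken from the question.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data; in practice this might come from pd.read_excel(...).
df = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 3, 3, 4, 4],
    "symptom": ["fever", "cough", "headache", "fever", "cough",
                "rash", "fever", "cough", "wheezing"],
    "disease": ["flu", "flu", "flu", "flu", "flu",
                "measles", "measles", "bronchitis", "bronchitis"],
})

# Features: one row per patient, one 0/1 column per symptom.
X = pd.crosstab(df["patient_id"], df["symptom"]).clip(upper=1)

# Labels: one disease per patient, aligned with the rows of X.
y = df.groupby("patient_id")["disease"].first().loc[X.index]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importance of each symptom across all diseases.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

With only a handful of patients the importances are not meaningful; the point is only the shape of the pipeline.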

Alexander Caskie
  • For that, I should convert my symptom data into matrix form, right? – Lalit Jul 03 '20 at 15:36
  • scikit-learn works with NumPy arrays or pandas DataFrames, so you will need to get your data into one of those formats. – Alexander Caskie Jul 03 '20 at 15:39
  • I have to convert symptoms into 1 or 0 based on their influence on a disease, right? With just 2 columns of textual data, I can never build a model, right? – Lalit Jul 03 '20 at 15:42
  • Yes, scikit-learn only works with numbers, so you will need to convert your data to numbers, though scikit-learn can do this for you. I would imagine there are more than 2 symptoms, so encoding with 0 and 1 would miss a lot of these. What are the two columns? A list of symptoms? – Alexander Caskie Jul 03 '20 at 15:45
  • No, there are 3 columns: Patient id, Symptoms, Disease detection. Even if a disease has 5 symptoms, I can put 1 for each of those symptoms and 0 for the remaining symptoms in my dataset, by turning every distinct value of the symptom column into a column of the matrix, right? – Lalit Jul 03 '20 at 15:55
  • Yes, if I've understood you correctly that would work. – Alexander Caskie Jul 03 '20 at 15:57
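
The 0/1 encoding discussed in the comments can be produced by scikit-learn itself. A minimal sketch, assuming the symptoms have already been grouped into one list per patient (the dictionary below is made-up example data), uses `MultiLabelBinarizer`:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical symptom lists keyed by patient id.
symptoms_per_patient = {
    1: ["fever", "cough", "headache"],
    2: ["fever", "cough"],
    3: ["rash", "fever"],
}

mlb = MultiLabelBinarizer()
# Each row is a patient; each column is a symptom (1 = present, 0 = absent).
matrix = mlb.fit_transform(list(symptoms_per_patient.values()))

print(list(mlb.classes_))  # column order: ['cough', 'fever', 'headache', 'rash']
print(matrix.tolist())     # [[1, 1, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1]]
```

Every distinct symptom gets its own column, so any number of symptoms per patient or per disease fits into the same matrix.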