I have a dataset with dozens of health-related variables. Some are quantitative (such as 'Body Mass Index') and some are qualitative (such as isDrinking, which takes the value 0 (no) or 1 (yes) in answer to the question "does the individual ever drink alcohol?").

I am trying to reduce all these variables into one "Global Health Index" that quantifies how good a given individual's health is.

How can I do that?

I already tried Principal Component Analysis, but the index I got was not very robust (a small change in one of the many variables led to a large change in an individual's index), and the explained variance ratio was not great.
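To make the PCA setup concrete, here is a minimal sketch of the kind of index I mean. It uses NumPy on synthetic stand-in data, so the variables, their values, and the `project` helper are all hypothetical and not my real dataset or code. It standardizes the columns, takes the first principal component as the index, and keeps the fitted parameters (means, standard deviations, component weights) so that a new individual can be scored without re-fitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 individuals, 3 health variables.
n = 200
bmi = rng.normal(25.0, 4.0, size=n)                          # quantitative
systolic = 90.0 + 1.2 * bmi + rng.normal(0.0, 8.0, size=n)   # quantitative, correlated with BMI
is_drinking = rng.integers(0, 2, size=n).astype(float)       # qualitative, coded 0/1
X = np.column_stack([bmi, systolic, is_drinking])

# Standardize each column so that no variable dominates because of its units.
mean = X.mean(axis=0)
std = X.std(axis=0)
Z = (X - mean) / std

# PCA via singular value decomposition of the standardized data.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

# The first principal axis gives the weights of the one-dimensional index.
first_axis = Vt[0]
index = Z @ first_axis


def project(new_rows):
    """Score new individuals with the stored parameters, without re-fitting."""
    new_rows = np.atleast_2d(np.asarray(new_rows, dtype=float))
    return ((new_rows - mean) / std) @ first_axis


print("explained variance ratio:", explained_variance_ratio)
print("index of a new individual:", project([30.0, 130.0, 1.0]))
```

Note that the sign of a principal component is arbitrary, so whether a high score means "good" or "bad" health has to be decided by inspecting the weights in `first_axis`.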

I thought of trying t-SNE (t-distributed Stochastic Neighbor Embedding), but as I only discovered this method a few days ago, I am not sure whether my data are well suited to it. I tried it anyway and came up with an index. In PCA, the explained variance ratio helps us judge whether the principal component is a "good" or "bad" index, but how do I evaluate the performance of t-SNE?

Also, once t-SNE is done, how would I get the "parameters" of the fitted model, so that when new data arrive I can reduce them without re-fitting on the whole dataset?

Thanks!

Noomkwah

0 Answers