
I hope everyone is doing well.

I need some help with generative models.

So I'm working on a project where the main task is to build a binary classification model. The dataset contains 300,000 samples and 100 features, and the two classes are heavily imbalanced: the majority class is much larger than the minority class. To handle this, I'm using a VAE (variational autoencoder). I first train the VAE on the minority class only, then use its decoder to generate new (synthetic) samples similar to the minority class, and finally concatenate this new data with the training set to get a balanced training set.
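The balancing step described above (generate minority rows, then concatenate them with the training set) can be sketched as follows. This is a minimal sketch under assumptions: `sample_minority` is a hypothetical callable standing in for the trained VAE decoder, and `make_gaussian_sampler` is only a placeholder (a multivariate Gaussian fitted to the minority rows, not a VAE) so the example runs without a deep-learning framework.

```python
import numpy as np


def balance_with_synthetic(X_train, y_train, sample_minority, minority_label=1, rng=None):
    """Append synthetic minority rows until both classes have equal counts.

    `sample_minority(n)` is a hypothetical callable (e.g. a wrapper around the
    trained VAE decoder) that returns an (n, n_features) array of generated rows.
    """
    X_train = np.asarray(X_train)
    y_train = np.asarray(y_train)
    n_minority = int(np.sum(y_train == minority_label))
    n_majority = len(y_train) - n_minority
    n_needed = n_majority - n_minority
    if n_needed <= 0:
        return X_train, y_train  # already balanced (or minority is larger)

    X_syn = np.asarray(sample_minority(n_needed))
    if X_syn.shape != (n_needed, X_train.shape[1]):
        raise ValueError("generator returned an array of the wrong shape")
    y_syn = np.full(n_needed, minority_label, dtype=y_train.dtype)

    # Concatenate real and synthetic rows, then shuffle so they are interleaved.
    X_bal = np.concatenate([X_train, X_syn], axis=0)
    y_bal = np.concatenate([y_train, y_syn], axis=0)
    perm = np.random.default_rng(rng).permutation(len(y_bal))
    return X_bal[perm], y_bal[perm]


def make_gaussian_sampler(X_minority, seed=0):
    """Placeholder generator (NOT a VAE): samples from a Gaussian fitted to the minority rows."""
    mu = X_minority.mean(axis=0)
    cov = np.cov(X_minority, rowvar=False)
    rng = np.random.default_rng(seed)
    return lambda n: rng.multivariate_normal(mu, cov, size=n)
```

Synthetic rows should be added to the training split only; the test set should keep its real class distribution so the evaluation reflects real data.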

My question is: is there any way to evaluate generative models like a VAE? In other words, is there a way to know whether the generated data is similar to the real data?

I have read that there are some metrics for evaluating generated data, like the Inception Score and the Fréchet Inception Distance (FID), but I have only seen them used on image data.

I want to know whether I can use them on my dataset too.
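FID is the Fréchet distance between two Gaussians fitted to Inception-network embeddings of real and generated images: ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The formula itself is not image-specific, so one possible adaptation for tabular data (an assumption, not an established standard) is to fit the Gaussians directly to the standardized feature vectors. A minimal numpy-only sketch, with hypothetical function names:

```python
import numpy as np


def _sqrt_psd(mat):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative eigenvalues from rounding
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(real, fake):
    """Fréchet distance between Gaussians fitted to two (n_samples, n_features) arrays."""
    real = np.asarray(real, dtype=float)
    fake = np.asarray(fake, dtype=float)
    mu_r, mu_f = real.mean(axis=0), fake.mean(axis=0)
    cov_r = np.cov(real, rowvar=False)
    cov_f = np.cov(fake, rowvar=False)

    # Tr((cov_r cov_f)^{1/2}) equals the sum of square roots of the eigenvalues of
    # cov_r^{1/2} cov_f cov_r^{1/2}, which is symmetric PSD and numerically safer.
    sqrt_r = _sqrt_psd(cov_r)
    inner = sqrt_r @ cov_f @ sqrt_r
    inner = (inner + inner.T) / 2.0  # remove asymmetry introduced by rounding
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_covmean)
```

Lower values mean the two distributions are closer, but there is no absolute threshold. One way to get a reference point is the distance between two random halves of the real minority data. Since the distance depends on feature scale, standardize all features (fitted on the real data) first, and keep in mind it only compares means and covariances.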

Thanks in advance

1 Answer


I believe your data is not image data, since you say there are 100 features. What I suggest is checking the similarity between the synthesized feature vectors and the original ones (those belonging to the minority class), and keeping only the synthetic samples that reach a certain similarity. Cosine similarity would be useful for this.
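A minimal sketch of the cosine-similarity filter suggested above, assuming each synthetic row is scored by its highest cosine similarity to any real minority row. The `threshold=0.9` default and the function names are hypothetical choices, not values from the answer. The loop processes synthetic rows in batches so the full pairwise matrix never has to fit in memory:

```python
import numpy as np


def max_cosine_similarity(synthetic, real, batch_size=1024):
    """For each synthetic row, the highest cosine similarity to any real row."""
    eps = 1e-12  # guards against division by zero for all-zero rows
    syn = np.asarray(synthetic, dtype=float)
    ref = np.asarray(real, dtype=float)
    syn = syn / np.maximum(np.linalg.norm(syn, axis=1, keepdims=True), eps)
    ref = ref / np.maximum(np.linalg.norm(ref, axis=1, keepdims=True), eps)

    out = np.empty(len(syn))
    for start in range(0, len(syn), batch_size):
        block = syn[start:start + batch_size]
        # (batch, n_real) matrix of cosine similarities; keep the best match per row
        out[start:start + batch_size] = (block @ ref.T).max(axis=1)
    return out


def filter_by_cosine(synthetic, real, threshold=0.9):
    """Keep synthetic rows whose best cosine similarity reaches `threshold`."""
    synthetic = np.asarray(synthetic, dtype=float)
    sims = max_cosine_similarity(synthetic, real)
    return synthetic[sims >= threshold], sims
```

Cosine similarity ignores vector length, so it is usually more meaningful after standardizing the features. Similarities extremely close to 1.0 may also indicate that the VAE is reproducing training rows rather than generating new ones.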

It would also be very useful to look at a scatter plot of the synthesized samples together with the original ones to see whether they are close to each other. t-SNE would be useful at this point, since it projects the 100-dimensional feature vectors down to 2D for plotting.
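A minimal sketch of the t-SNE check, assuming scikit-learn's `TSNE` and matplotlib are available. The function names, the subsample size `max_per_group`, and the output filename are hypothetical. Each group is subsampled because t-SNE scales poorly to hundreds of thousands of rows:

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_real_vs_synthetic(real, synthetic, max_per_group=2000, seed=0):
    """Embed real and synthetic rows jointly in 2D; labels are 0 = real, 1 = synthetic."""
    rng = np.random.default_rng(seed)
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    # Subsample each group so t-SNE stays tractable.
    if len(real) > max_per_group:
        real = real[rng.choice(len(real), max_per_group, replace=False)]
    if len(synthetic) > max_per_group:
        synthetic = synthetic[rng.choice(len(synthetic), max_per_group, replace=False)]

    X = np.vstack([real, synthetic])
    labels = np.concatenate([np.zeros(len(real), dtype=int), np.ones(len(synthetic), dtype=int)])
    # Perplexity must be smaller than the number of samples.
    perplexity = min(30.0, len(X) - 1.0)
    emb = TSNE(n_components=2, perplexity=perplexity, random_state=seed).fit_transform(X)
    return emb, labels


def plot_embedding(emb, labels, path="tsne_real_vs_synthetic.png"):
    """Scatter plot of the 2D embedding, colored by real vs synthetic."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 6))
    for value, name in [(0, "real minority"), (1, "synthetic")]:
        mask = labels == value
        ax.scatter(emb[mask, 0], emb[mask, 1], s=5, alpha=0.5, label=name)
    ax.legend()
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

t-SNE preserves local neighborhoods better than global distances, so well-mixed point clouds are encouraging but not proof that the distributions match. Clearly separated clusters, on the other hand, are a strong sign that the generated data differs from the real data.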

ai-py