Questions tagged [dimensionality-reduction]

In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction.

422 questions
4 votes, 1 answer

Dimensionality Reduction using Self Organizing Maps

I have been working with Self-Organizing Maps (SOM) for the past few months, but I still have some confusion about the dimensionality reduction part. Can you suggest any simple method to understand the real working of SOMs on any real world…
Pooja • 59 • 3 • 8
4 votes, 1 answer

Reducing dimensionality on training data with PCA in Matlab

This is a follow-up question to PCA Dimensionality Reduction. In order to classify the new 10-dimensional test data, do I have to reduce the training data down to 10 dimensions as well? I tried: X = bsxfun(@minus, trainingData, mean(trainingData,1));…
user3094936 • 263 • 6 • 12
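
The short answer is yes: the test data must be centered with the training mean and projected with the training eigenvectors. A sketch of that workflow in Python/scikit-learn rather than Matlab (the steps map one-to-one onto the bsxfun centering in the question):

    # Fit PCA on the training data only, then reuse the learned mean and
    # components to project the test data into the same 10-D space.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 50))       # toy training data
    X_test = rng.normal(size=(20, 50))         # toy test data

    pca = PCA(n_components=10)
    X_train_10d = pca.fit_transform(X_train)   # learns mean_ and components_
    X_test_10d = pca.transform(X_test)         # reuses them on the test set

    # Equivalent by hand: center the test data with the *training* mean.
    assert np.allclose(X_test_10d, (X_test - pca.mean_) @ pca.components_.T)
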
4 votes, 2 answers

How to evaluate dimensionality reduction technique?

I have a dataset of NxM data in binary form. I apply a variety of dimensionality reduction techniques to it and plot the first two dimensions. This is how I get an intuition of whether the technique is suitable for my dataset or not. Is there a more…
JustCurious • 1,848 • 3 • 30 • 57
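
Beyond eyeballing the first two dimensions, one quantitative check is scikit-learn's trustworthiness score, which measures how well each point's local neighborhood survives the reduction; for linear methods the retained variance is another. A sketch on toy binary data:

    # Quantitative checks: neighborhood preservation and retained variance.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import trustworthiness

    rng = np.random.default_rng(0)
    X = (rng.random((300, 40)) > 0.5).astype(float)   # toy N x M binary data

    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X)

    # Closer to 1.0 means local neighborhoods are preserved in the embedding.
    print("trustworthiness:", trustworthiness(X, X_2d, n_neighbors=5))
    print("retained variance:", pca.explained_variance_ratio_.sum())
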
4 votes, 2 answers

Reduce string length by removing contiguous duplicates

I have an R dataframe with 2 fields: ID WORD 1 AAAAABBBBB 2 ABCAAABBBDDD 3 ... I'd like to simplify the words with repeating letters by keeping only the letter and not the duplicates in a repetition:…
Joe • 75 • 3
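
The usual trick is a backreference regex that collapses any run of one repeated character to a single occurrence. A sketch in Python; the same pattern carries over to R as gsub("(.)\\1+", "\\1", df$WORD):

    # Collapse runs of repeated characters: "(.)\1+" matches a character
    # followed by one or more copies of itself; replace the run with "\1".
    import re

    def collapse_runs(word: str) -> str:
        return re.sub(r"(.)\1+", r"\1", word)

    print(collapse_runs("AAAAABBBBB"))     # -> AB
    print(collapse_runs("ABCAAABBBDDD"))   # -> ABCABD
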
4 votes, 1 answer

Dimension reduction methods for images

I am trying to reduce the dimensions of a set of images using the Matlab Toolbox for Dimensionality Reduction. The problem is that I know very little about dimension reduction, so I am trying each method by trial and error, passing the data set to the function. I…
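
A common starting point, regardless of toolbox, is to flatten each image into a row vector and run PCA on the stack (the "eigenfaces" approach). A sketch in Python/scikit-learn with toy images standing in for the real set:

    # Flatten images to vectors, then keep the top principal components.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    images = rng.random((100, 32, 32))       # toy stack of 100 32x32 images
    X = images.reshape(len(images), -1)      # (100, 1024): one row per image

    pca = PCA(n_components=20)
    X_low = pca.fit_transform(X)             # (100, 20) reduced representation
    print("kept variance:", pca.explained_variance_ratio_.sum())
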
3 votes, 1 answer

After reducing the dimensionality of a dataset, I am getting negative feature values

I used a dimensionality reduction method (discussion here: Random projection algorithm pseudo code) on a large dataset. After reducing the dimension from 1000 to 50, I get my new dataset where each sample looks like: [ 1751. -360. -2069. ..., 
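
Negative values are expected here: random projection multiplies the data by a matrix whose entries are drawn from a zero-mean distribution, so projected coordinates can be negative even when every input feature is non-negative. A sketch with scikit-learn's GaussianRandomProjection:

    # Random projection computes Y = X @ R with zero-mean random entries in R,
    # so negative outputs are normal even for non-negative inputs.
    import numpy as np
    from sklearn.random_projection import GaussianRandomProjection

    rng = np.random.default_rng(0)
    X = rng.random((500, 1000))              # strictly non-negative toy data

    rp = GaussianRandomProjection(n_components=50, random_state=0)
    Y = rp.fit_transform(X)                  # (500, 50)
    print("min projected value:", Y.min())   # typically well below zero
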
3 votes, 0 answers

Lost information during dimensionality reduction using umap

I am working with the Spotify tracks dataset and trying to understand how the columns danceability, liveness and energy affect popularity (using discrete popularity: -1, 0, 1). I want to do dimensionality reduction from three columns to two. Here's the…
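
For reference, a minimal 3-to-2 reduction with the umap-learn package, using stand-ins for the three audio columns; some loss is inherent, since UMAP preserves neighborhood structure rather than the raw column values:

    # 3 -> 2 reduction with umap-learn (pip install umap-learn).
    import numpy as np
    import umap

    rng = np.random.default_rng(0)
    X = rng.random((1000, 3))                       # danceability, liveness, energy
    popularity = rng.choice([-1, 0, 1], size=1000)  # discrete label, e.g. for coloring

    reducer = umap.UMAP(n_components=2, n_neighbors=15, random_state=42)
    embedding = reducer.fit_transform(X)            # (1000, 2)
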
3 votes, 1 answer

Which dimensionality reduction technique works well for BERT sentence embeddings?

I'm trying to cluster hundreds of text documents so that each cluster represents a distinct topic. Instead of using topic modeling (which I know I could do too), I want to follow a two-step approach: Create document embeddings with…
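
UMAP is a common choice for this step (it is what the BERTopic pipeline uses between embedding and clustering), often with the cosine metric, since sentence embeddings are usually compared by angle. A sketch assuming the sentence-transformers and umap-learn packages and the all-MiniLM-L6-v2 model:

    # Two-step pipeline: sentence embeddings -> UMAP -> k-means.
    import umap
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    docs = [f"toy document about topic {i % 3}" for i in range(60)]
    model = SentenceTransformer("all-MiniLM-L6-v2")    # 384-d embeddings
    emb = model.encode(docs)

    # Reduce 384 -> 5 dimensions before clustering.
    reduced = umap.UMAP(n_components=5, metric="cosine").fit_transform(emb)
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(reduced)
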
3 votes, 1 answer

How to evaluate the autoencoder used for dimensionality reduction

I am using an autoencoder as a dimensionality reduction technique to use the learned representation as the low dimensional features that can be used for further analysis. The code snippet: # Note: implementation --> based on keras encoding_dim =…
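
One concrete check is reconstruction error on held-out data, ideally against a PCA baseline with the same bottleneck width; if the autoencoder cannot beat linear PCA, the extra capacity is not buying anything. A sketch in Keras with assumed sizes, not the question's exact snippet:

    # Compare autoencoder reconstruction MSE against a PCA baseline.
    import numpy as np
    from sklearn.decomposition import PCA
    from tensorflow import keras

    rng = np.random.default_rng(0)
    X_train, X_test = rng.random((800, 100)), rng.random((200, 100))
    encoding_dim = 16

    inp = keras.Input(shape=(100,))
    code = keras.layers.Dense(encoding_dim, activation="relu")(inp)
    out = keras.layers.Dense(100, activation="sigmoid")(code)
    autoencoder = keras.Model(inp, out)
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X_train, X_train, epochs=20, batch_size=32, verbose=0)

    ae_mse = autoencoder.evaluate(X_test, X_test, verbose=0)
    pca = PCA(n_components=encoding_dim).fit(X_train)
    pca_mse = np.mean((X_test - pca.inverse_transform(pca.transform(X_test))) ** 2)
    print(f"autoencoder MSE {ae_mse:.4f} vs PCA baseline MSE {pca_mse:.4f}")
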
3 votes, 2 answers

Projecting multiple clusters for 2D data using the highest eigenvalues from FLD

I have 4 matrices of size 5x5, where the five rows (5xn) are datapoints and the columns (nx5) are the features, as follows: datapoint_1_class_A = np.asarray([(216, 236, 235, 230, 229), (237, 192, 191, 193, 199), (218, 189, 191, 192, 193), (201,…
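
For comparison, scikit-learn packages Fisher's linear discriminant as LinearDiscriminantAnalysis; with 4 classes there are at most 3 discriminant directions, so a 2-D plot uses the two with the largest eigenvalues. A sketch on toy data of the same shape:

    # Project 4 classes of 5-feature points onto the top 2 Fisher discriminants.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    # Toy stand-in: 4 classes x 5 datapoints x 5 features.
    X = np.vstack([rng.normal(loc=10 * c, size=(5, 5)) for c in range(4)])
    y = np.repeat(np.arange(4), 5)

    lda = LinearDiscriminantAnalysis(n_components=2)   # at most n_classes - 1
    X_2d = lda.fit_transform(X, y)                     # (20, 2) for plotting
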
3 votes, 0 answers

t-SNE using the earth mover's distance metric

I am trying to use t-SNE with the Wasserstein distance instead of Euclidean. Here is part of my code: from sklearn.manifold import TSNE from scipy.stats import wasserstein_distance tsne = TSNE(n_components=2,perplexity=40, n_iter=1000,…
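
TSNE will not accept wasserstein_distance by name, but it does accept metric='precomputed', so one workable route is to build the pairwise distance matrix first; note that init must then be 'random', because PCA initialization is unavailable for precomputed distances. A sketch:

    # t-SNE on a precomputed pairwise Wasserstein distance matrix.
    import numpy as np
    from scipy.stats import wasserstein_distance
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    X = rng.random((100, 20))      # each row treated as an empirical 1-D sample

    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = wasserstein_distance(X[i], X[j])

    tsne = TSNE(n_components=2, perplexity=40, metric="precomputed", init="random")
    emb = tsne.fit_transform(D)    # (100, 2)
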
3 votes, 0 answers

Negative eigenvalues in PCA

I have a matrix x (1000×25) that contains random floats in the interval (-5,5), with nFeatures=25 and nPoints=1000. I'm using this code to find the eigenvalues of the covariance matrix, but I'm getting negative eigenvalues. #centering the data for i in…
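
A covariance matrix is positive semi-definite, so its true eigenvalues are all non-negative; tiny negative values usually come from floating-point error or from calling np.linalg.eig, which ignores symmetry. np.linalg.eigh is the symmetric-aware routine. A sketch of the clean path:

    # Eigenvalues of a covariance matrix via eigh; any residual negatives
    # would be float noise on the order of 1e-15, not real eigenvalues.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-5, 5, size=(1000, 25))   # nPoints x nFeatures

    x_centered = x - x.mean(axis=0)           # center each feature column
    cov = (x_centered.T @ x_centered) / (len(x) - 1)

    eigvals = np.linalg.eigh(cov)[0]          # ascending, symmetric-aware
    print("smallest eigenvalue:", eigvals.min())
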
3 votes, 1 answer

Sklearn PCA, how to restore mean in lower dimension?

This question concerns how to de-center and "restore" the data in a lower dimension after performing PCA. I'm doing a simple principal component analysis with sklearn. As I understand it, the implementation should take care of (1) centering the data…
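
sklearn's PCA keeps the training mean in pca.mean_, and inverse_transform adds it back automatically: restoring a reduced point to the original space is X_low @ pca.components_ + pca.mean_. A sketch verifying the two routes agree (with the default whiten=False):

    # inverse_transform maps back to the original space and re-adds the mean.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(loc=5.0, size=(200, 10))   # data with a clearly nonzero mean

    pca = PCA(n_components=3)
    X_low = pca.fit_transform(X)              # centered, then projected

    X_restored = pca.inverse_transform(X_low)
    assert np.allclose(X_restored, X_low @ pca.components_ + pca.mean_)
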
3 votes, 1 answer

Deciding on the kernel parameter in Kernel PCA

I am new to machine learning and I am trying to do unsupervised learning with k-means clustering (even though I have read that k-means may not work very well with categorical data). I encoded my categorical variables and tried to apply kernel PCA since I have…
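
There is no closed-form rule for choosing the kernel; a common approach is to treat the kernel (and gamma) as hyperparameters and compare candidates, for example by reconstruction error in the input space, which KernelPCA supports via fit_inverse_transform=True. A sketch:

    # Compare candidate kernels by input-space reconstruction error.
    import numpy as np
    from sklearn.decomposition import KernelPCA

    rng = np.random.default_rng(0)
    X = rng.random((200, 8))                  # toy stand-in for encoded features

    for kernel in ["linear", "rbf", "poly", "cosine"]:
        kpca = KernelPCA(n_components=2, kernel=kernel,
                         fit_inverse_transform=True)
        X_low = kpca.fit_transform(X)
        err = np.mean((X - kpca.inverse_transform(X_low)) ** 2)
        print(f"{kernel:>6}: reconstruction MSE = {err:.4f}")
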
3 votes, 2 answers

Reducing input dimensions for a deep learning model

I am following a course on deep learning and I have a model built with Keras. After data preprocessing and encoding of categorical data, I get an array of shape (12500,) as the input to the model. This input makes the model training process slower…
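
Two common options: reduce the input before the network with PCA (or TruncatedSVD, if the encoded data is a sparse one-hot matrix), or let a narrow first Dense layer do the reduction inside the model. A sketch of the first option with sizes matching the question:

    # Shrink a 12500-d one-hot input with TruncatedSVD before training;
    # TruncatedSVD accepts sparse matrices, unlike plain PCA.
    from scipy import sparse
    from sklearn.decomposition import TruncatedSVD

    X = sparse.random(1000, 12500, density=0.001, format="csr", random_state=0)

    svd = TruncatedSVD(n_components=256, random_state=0)
    X_small = svd.fit_transform(X)            # (1000, 256) dense array
    # X_small can now be fed to the Keras model in place of the raw input.
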