In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction.
Questions tagged [dimensionality-reduction]
422 questions
-1
votes
1 answer
Autoencoder for dimensionality reduction in DL4J
I'm trying to write an autoencoder for dimensionality reduction in DL4J, but all the autoencoder examples I can find for DL4J are for outlier…

code monkey
-1
votes
1 answer
Using an autoencoder for dimensionality reduction
I have a data set that contains a large number of vectors, each with 21300 values. Naturally, I want to reduce the dimension of each vector, i.e. compress the vectors.
My data set is not split into training and testing datasets because I want all the…
-1
votes
1 answer
How do you know if your dataset suffers from high-dimensionality problems?
There seem to be many techniques for reducing dimensionality (PCA, SVD, etc.) in order to escape the curse of dimensionality. But how do you know that your dataset in fact suffers from high-dimensionality problems? Is there a best practice, like…
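One informal way to see the effect the question is asking about is distance concentration: as the number of dimensions grows, the nearest and farthest neighbours of a point become almost equally far away. The sketch below is only an illustration on synthetic uniform data, not a best practice taken from the thread; the function name `relative_contrast` and all parameter values are made up for this example, and real data with structure may behave quite differently.

```python
import math
import random

def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for the distances from one random
    query point to n_points uniform random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as the dimension grows, which is one symptom
# of the curse of dimensionality for distance-based methods.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(relative_contrast(200, n_dims), 3))
```

Running the same measurement on a sample of your own feature vectors (assuming a Euclidean metric is appropriate for them) gives a rough hint of whether nearest-neighbour style methods can still discriminate between points.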

endorphinus
-1
votes
1 answer
Feature Selection for gene expression data
Can someone please suggest which feature selection techniques I should use for gene classification?

meki kachi
-1
votes
1 answer
Measuring plots of data with PCA or t-SNE and Matplotlib
My goal is to find out if I can manipulate and measure data from a PCA or t-SNE plot in Python. I want to know if there is a way I can find the distances of points from the center of a cluster.
I think there is a way but I'm not too sure.
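A minimal sketch of the measurement described above, assuming the projected points of one cluster are already available as 2-D coordinates (the `cluster` list below is hypothetical data, and the helper names are made up for illustration). It takes the cluster center to be the mean of the points and measures Euclidean distance to it.

```python
import math

def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def distances_to_centroid(points):
    """Return the Euclidean distance of every point to the centroid."""
    center = centroid(points)
    return [math.dist(p, center) for p in points]

# Hypothetical 2-D coordinates, e.g. the rows of a PCA projection
# that belong to a single cluster.
cluster = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center = centroid(cluster)
dists = distances_to_centroid(cluster)
print(center, dists)
```

Distances measured this way are easier to interpret for PCA, which is a linear projection, than for t-SNE, which distorts distances (especially between clusters), so t-SNE measurements should be read with caution.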

Jerome Ariola
-1
votes
1 answer
Dimensionality reduction - Pyspark
My objective is to find visual similarity between various Double Byte characters when written in a particular font. For instance,
I want to ascertain whether 伊 looks more similar to 達 or more similar to 市. This exercise has to be done for 13,108…

nOObda
-1
votes
2 answers
PCA vs averaging columns
I have a dataframe with 300 float type columns and 1 integer column which is the dependent variable. The 300 columns are of 3 kinds:
1. Kind A: columns 1 to 100
2. Kind B: columns 101 to 200
3. Kind C: columns 201 to 300
I want to reduce the number of…

sougata saha
-1
votes
1 answer
Unsupervised learning: reducing dimensionality / clustering
I am trying to understand how I can split my data into clusters using unsupervised learning, for example with the k-means method.
I have 20 columns of data. How can they be projected onto a 2D surface without losing necessary information from 18…

renataleb
-1
votes
2 answers
Deciding on a clustering algorithm for a dataset containing both categorical and numerical variables
I am a newbie in machine learning and am trying to do a segmentation with clustering algorithms. However, since my dataset has both categorical variables (such as gender, marital status, preferred social media platform, etc.) as well as numerical…

Beg
-1
votes
1 answer
R: PCA ggplot Error "arguments imply differing number of rows"
I have a dataset:
https://docs.google.com/spreadsheets/d/1ZgyRQ2uTw-MjjkJgWCIiZ1vpnxKmF3o15a5awndttgo/edit?usp=sharing
that I'm trying to apply PCA to in order to produce a graph based on the graph provided in this…

lydias
-1
votes
1 answer
Dimension Reduction for Clustering in R (PCA and other methods)
Let me preface this:
I have looked extensively into this matter and I've found several intriguing possibilities to look into (such as this and this). I've also looked into principal component analysis and I've seen some sources that claim it's a poor…

BlueRhapsody
-1
votes
1 answer
Principal Component Analysis being too slow (MLPY Python)
I am using the PCAFast method from the MLPY API in Python (http://mlpy.sourceforge.net/docs/3.2/dim_red.html).
The method is executed pretty fast when it learns a feature matrix generated as follows:
x = np.random.rand(100, 100)
Sample output of…

obelix
-2
votes
1 answer
How to use output of TruncatedSVD() as input to neural network?
I have a dataset (containing sentences) on which I need to perform vectorization and then dimensionality reduction with TruncatedSVD() to reduce the number of features to 100.
Then I want to use that SVD output as input to a neural network. But I cannot…

BISHAL Adhikari
-2
votes
1 answer
Error when using pca to reduce dimensionality
from sklearn.decomposition import PCA
pca =PCA(n_components =2)
X_PCA =PCA.fit(data_x)
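Judging only from the three lines shown, one likely cause of the error is that `fit` is called on the class `PCA` rather than on the instance `pca`, so the data array ends up in the place of `self`. In addition, `fit` returns the fitted estimator itself rather than the reduced data. A minimal sketch of the usual pattern follows; the random `data_x` here is a hypothetical stand-in for the asker's data, whose shape is not given in the excerpt.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for data_x: 50 samples with 5 features each.
rng = np.random.default_rng(0)
data_x = rng.normal(size=(50, 5))

pca = PCA(n_components=2)          # create an estimator instance
X_pca = pca.fit_transform(data_x)  # fit on the instance, then project the data

print(X_pca.shape)                 # (50, 2)
```

`pca.fit(data_x)` followed by `pca.transform(data_x)` would give the same projection; `fit_transform` simply combines the two steps.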
-2
votes
1 answer
Remove single occurrences of words in CountVectorizer
I am using CountVectorizer() to create a term-frequency matrix. I want to delete from the vocabulary all of the terms that have a frequency of two or less.
Then I use TfidfTransformer() to create a tf*idf matrix
vectorizer=CountVectorizer()
X…
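The filtering step described above can be sketched in plain Python, assuming "frequency" means the total count of a term across the whole corpus. The function name `frequent_vocabulary`, the whitespace tokenization, and the sample documents are all assumptions made for illustration; `CountVectorizer` tokenizes differently by default.

```python
from collections import Counter

def frequent_vocabulary(documents, min_count=3):
    """Return the sorted terms whose total corpus count is at least min_count.

    Tokenization is a plain lowercase whitespace split, which is a
    simplification of what a real vectorizer does.
    """
    counts = Counter(
        token for doc in documents for token in doc.lower().split()
    )
    return sorted(term for term, n in counts.items() if n >= min_count)

# Hypothetical corpus: "the" and "cat" occur 3 times, everything else less.
docs = ["the cat sat", "the cat ran", "the dog sat", "a cat slept"]
vocab = frequent_vocabulary(docs)
print(vocab)  # ['cat', 'the']
```

A list built this way can be passed to `CountVectorizer` through its `vocabulary` parameter. `CountVectorizer` also has a `min_df` parameter, but that one thresholds on document frequency (the number of documents containing a term), which is not the same as total term count.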

rootware