Questions tagged [dimensionality-reduction]

In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction.
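
The split between feature selection and feature extraction can be illustrated with a minimal sketch in plain Python. It assumes toy data and two hypothetical helpers, `select_features` and `extract_feature`, which are not from any library. Selection keeps a subset of the original columns (here, the ones with the highest variance), while extraction builds a new variable from all columns (here, the projection onto the first principal direction, found by power iteration).

```python
import random
from statistics import pvariance


def select_features(rows, k):
    """Feature selection: keep the k original columns with the highest variance."""
    n_cols = len(rows[0])
    variances = [pvariance([r[j] for r in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[r[j] for j in keep] for r in rows]


def extract_feature(rows, iters=200):
    """Feature extraction: project the data onto its first principal direction."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Population covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    projected = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projected


# Toy data (an assumption for illustration): column 1 is roughly twice
# column 0, and column 2 is low-variance noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    data.append([x, 2.0 * x + rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.01)])

kept_columns, selected = select_features(data, k=2)
direction, projected = extract_feature(data)
```

Selection returns two of the three original columns unchanged, so the result stays interpretable in the original units. Extraction returns a single new variable that mixes all columns, which usually compresses correlated data better but is harder to read.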

422 questions
4
votes
1 answer

How to use the 'sphereize data' option with PCA in TensorFlow

I have used PCA with the 'Sphereize data' option on the following page successfully: https://projector.tensorflow.org/ I wonder how to run the same computation locally using the TensorFlow API. I found the PCA documentation in the API documentation,…
Ozgur
  • 994
  • 7
  • 13
4
votes
0 answers

Dimensionality reduction on repeated measures: PCA? MFA? (FactoMineR)

I have a repeated measures sample, where each participant was asked to complete a sleep survey over the course of 5 years (baseline through year 4 of follow-up). Each survey item is fairly correlated (e.g. when you go to bed correlates with your…
Jess G
  • 188
  • 9
4
votes
1 answer

Implement CVAE for a single image

I have a multi-dimensional, hyper-spectral image (channels, width, height = 15, 2500, 2500). I want to compress its 15 channel dimensions into 5 channels. So, the output would be (channels, width, height = 5, 2500, 2500). One simple way to do this is to…
4
votes
0 answers

I got this error n_components=1000 must be between 0 and min(n_samples, n_features)=2 with svd_solver='full'

I want to reduce the dimension of a matrix of size (5,3844) using PCA dimensionality reduction. I got this error: n_components=1000 must be between 0 and min(n_samples, n_features)=2 with svd_solver='full'. I have been trying for more than 5 weeks to find how…
Eda
  • 565
  • 1
  • 7
  • 18
4
votes
1 answer

Reducing size of Facebook's fastText

I am building a machine learning model which will process documents and extract some key information from them. For this, I need to use word embeddings for OCRed output. I have several different options for the embedding (Google's word2vec, Stanford's,…
ironhide012
  • 85
  • 1
  • 2
  • 7
4
votes
0 answers

t-SNE plot has almost a full overlap but classifier cross-validation score is high?

I am attempting a binary classification task on a dataset of around 20000 samples and 40 features. I have manually curated the dataset - each feature is a topic and the value for that feature and that sample is the sentiment associated with the…
4
votes
0 answers

t-SNE Choosing the Number of Dimensions

I am using t-SNE for exploratory data analysis. I am using this instead of PCA because PCA is linear and t-SNE is non-linear. It's really straightforward to know how many dimensions are required to capture the necessary variance with PCA. How do…
4
votes
1 answer

Is it possible to use scikit TSNE on a large sparse matrix?

The scikit documentation explains fit_transform can only be used for dense matrices, but I have a sparse matrix in csr format which I want to perform tsne on. The documentation says to use the fit method for sparse matrices, but this doesn't return…
PyRsquared
  • 6,970
  • 11
  • 50
  • 86
4
votes
1 answer

How to keep User ID using Rtsne package

I want to use t-SNE to visualize users' variables, but I want to be able to join the data to the users' social information. Unfortunately, the output of Rtsne doesn't seem to return data with the user ID. The data looks like this: client_id…
MathLal
  • 382
  • 3
  • 12
4
votes
1 answer

Split X into test/train before pre-processing and dimension reduction or after? Machine Learning

I have been completing Microsoft's course DAT210X - Programming with Python for Data Science. When creating SVC models for Machine Learning we are encouraged to split out the dataset X into test and train sets, using train_test_split from sci-kit…
QHarr
  • 83,427
  • 12
  • 54
  • 101
4
votes
1 answer

How to use Python's feature agglomeration for dimensionality reduction?

I searched for ways to implement dimensionality reduction in Python and this is the result that I got: http://scikit-learn.org/stable/modules/unsupervised_reduction.html. The last method shown on that website was feature agglomeration. I clicked on…
4
votes
1 answer

Bring Word2Vec models efficiently into Production Service

This is kind of a long shot, but I am hoping that someone has been in a similar situation, as I am looking for some advice on how to efficiently bring a set of large word2vec models into a production environment. We have a range of trained w2v models…
4
votes
1 answer

How to use eigenvectors obtained through PCA to reproject my data?

I am using PCA on 100 images. My training data is a 442368x100 double matrix, where 442368 is the number of features and 100 is the number of images. Here is my code for finding the eigenvectors. [ rows, cols] =…
Rafay Zia Mir
  • 2,116
  • 6
  • 23
  • 49
4
votes
1 answer

Independent component analysis (ICA) in Python

Is there any package available in Python to perform Independent Component Analysis (ICA)? Please provide some pointers and links so that I can get started with it in Python.
Nitin
  • 2,572
  • 5
  • 21
  • 28
4
votes
1 answer

How to deal with different sizes of sentences when giving them as input to a Neural Network?

I am giving a sentence as input to a tree-structured neural network, where the leaf nodes will be the word vectors of the words in the sentence. That tree will be a binarized constituency (see the binary vs n-ary branching section) parse tree. I am…
Azrael
  • 690
  • 7
  • 12