Pre-requisite Info:
raw_df is a pandas DataFrame with 16 columns ('eeg_channel_1', 'eeg_channel_2', ... 'eeg_channel_16') and 1140 rows (indexed from 125 to 1264), representing 1140 samples of EEG data (sample rate: 125 Hz).
I based this code on the example on this page of the pycaret website: https://pycaret.readthedocs.io/en/latest/api/clustering.html
Environment:
python(3.9.1)
pycaret(2.2.3)
scikit-learn(0.24.2)
scikit-plot(0.3.7)
Code I am running:
from pycaret.clustering import *
pca_data = setup(data=raw_df) # simplified setup function
# pca_data = setup(data=raw_df, pca=True, pca_method='linear', pca_components=2) # real setup function
My Goal: I am trying to perform Principal Component Analysis on a DataFrame containing time-series data from a 16-channel EEG recording (only 2 channels were recording normal data in this set). I want to plot the first two principal components of the data so I can see whether one of them correlates strongly with a 10 Hz sine wave (i.e. alpha wave detection). I know that more channels will be important for extracting more principal components, but I just want to get a working PCA proof of concept, then iterate.
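To make the goal concrete, here is a rough sketch of the kind of result I am after. It uses scikit-learn's `PCA` directly instead of pycaret, so it is not the pycaret code path that fails. The `raw_df` built here is a synthetic stand-in with the same shape and index as my data; the noise level, the amplitudes, and the choice of which two channels carry the 10 Hz rhythm are assumptions made up for illustration, not values from my recording.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

FS = 125          # sample rate in Hz (from the description above)
N_SAMPLES = 1140  # number of rows in raw_df

# Synthetic stand-in for raw_df (assumption: NOT the real recording).
# 16 noisy channels; the first two also carry a 10 Hz rhythm.
rng = np.random.default_rng(0)
t = np.arange(N_SAMPLES) / FS
alpha = np.sin(2 * np.pi * 10 * t)
data = rng.normal(scale=0.5, size=(N_SAMPLES, 16))
data[:, 0] += 3 * alpha
data[:, 1] += 2 * alpha
columns = [f"eeg_channel_{i}" for i in range(1, 17)]
raw_df = pd.DataFrame(data, columns=columns, index=range(125, 125 + N_SAMPLES))

# Project onto the first two principal components.
# PCA centres each column internally before the decomposition.
pca = PCA(n_components=2)
components = pca.fit_transform(raw_df.to_numpy())

# Pearson correlation of each component with a 10 Hz sine reference.
# The sign of a principal component is arbitrary, and this simple check
# is sensitive to the phase of the reference.
reference = np.sin(2 * np.pi * 10 * t)
correlations = [np.corrcoef(components[:, k], reference)[0, 1] for k in range(2)]

print("explained variance ratio:", pca.explained_variance_ratio_)
print("correlation with 10 Hz sine:", correlations)
```

On real data the phase of the alpha rhythm is unknown, so correlating against both a sine and a cosine reference (or comparing spectra) would probably be more robust than the single sine used here.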
Issue: When I run the above code (simplified and real version) I get an error that says:
"ValueError: Setting a random_state has no effect since shuffle is False. You should leave random_state to its default (None), or set shuffle=True."
I haven't used the `random_state` parameter anywhere in my code. I have also tried adding (1) `shuffle=True`, (2) `random_state=None`, and (3) `session_id=None` to the `setup()` parameters, but when I do this I get the following error messages, respectively:
- `setup() got an unexpected keyword argument 'shuffle'`
- `setup() got an unexpected keyword argument 'random_state'`
- `Setting a random_state has no effect since shuffle is False. You should leave random_state to its default (None), or set shuffle=True.`
If someone could help me understand how to properly run the setup function for plotting feature clusters in my EEG data, that would be very helpful. If there is a better/simpler way to extract and plot principal components in this data, that would be equally helpful.