I am trying my hand at scikit-learn. I have a very simple dataset of timestamps and gas concentrations in ppm. When I try to fit `KMeans` on the readings, I get the error below.

Error:

    ValueError: Expected 2D array, got 1D array instead:
    array=[396.4 394.  395.8 395.3 404.2 400.6 397.7 401.5 394.7 398.9 402.5 394.6
     401.2 401.  399.  398.5 401.3 401.7 406.5 395.9 401.2 399.8 398.2 401.9
     405.4 396.1 402.8 404.4 402.5 400.9 402.8 397.8 399.7 398.4 403.4 401.4
     393.1].
    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
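
To make the two suggestions in the message concrete, here is a small sketch that uses a handful of the readings above. The variable names are illustrative and not part of my script:

```python
import numpy as np

# A few of the readings from the error message, as a 1D array.
readings = np.array([396.4, 394.0, 395.8, 395.3, 404.2])
print(readings.shape)        # (5,)   1D: the shape that fit() rejects

# One column, one row per sample: "a single feature".
single_feature = readings.reshape(-1, 1)
print(single_feature.shape)  # (5, 1)

# One row holding every value: "a single sample".
single_sample = readings.reshape(1, -1)
print(single_sample.shape)   # (1, 5)
```

Since each reading is a separate observation of one quantity, the `(5, 1)` layout is the one that matches this dataset.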

code:

    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans

    data = pd.read_csv(r"myfilepath.csv")
    print(data.shape)

    kmeans = KMeans(n_clusters = 2, random_state = 0)
    X = data['reading']
    kmeans.fit(X)
    #clusters = kmeans.fit_predict(data)
    print(kmeans.cluster_centers_.shape)
  • The function you want to use requires a 2D array such as `[[1, 2], [2, 3]]` and you provide a 1D array instead. – Eskapp Feb 11 '21 at 00:55
  • The message comes with some very clear *suggestions*; did you try them? And where exactly does it happen? Please edit & update your question with the full error trace. – desertnaut Feb 11 '21 at 00:59
  • [Scikit-learn: How to run KMeans on a one-dimensional array?](https://stackoverflow.com/questions/28416408/scikit-learn-how-to-run-kmeans-on-a-one-dimensional-array) - which just uses `reshape(-1,1)`, as advised in the error message if your data has a single feature (like here). – desertnaut Feb 11 '21 at 01:04
  • Only moderators can delete comments, but users can flag them when inappropriate. Have not flagged anything here, but if you left something like [this](https://stackoverflow.com/questions/66055429/weka-building-a-model-to-identify-outliers#comment117096821_66055429), probably it was flagged by someone else as unfriendly/unkind (or possibly worse - i didn't see it so I cannot know what you wrote) – desertnaut Feb 16 '21 at 22:37

1 Answer

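I did some more digging and found that converting my DataFrame to a NumPy array fixed my problem. Calling `to_numpy()` on the whole DataFrame returns a 2D array with one row per sample, which is the shape `KMeans.fit` expects. The negative-index slice `data[:-1]` in the code below only drops the last row; it does not change the number of dimensions.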

updated code:

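    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans

    # CHANGES: to_numpy() on the whole DataFrame gives a 2D array of shape
    # (n_rows, n_columns). This assumes every column in the CSV is numeric.
    data = pd.read_csv(r"myfilepath.csv").to_numpy()

    print(data.shape)

    kmeans = KMeans(n_clusters = 2, random_state = 0)

    # CHANGES: this slice drops the last row; the array stays 2D.
    X = data[:-1]

    kmeans.fit(X)
    #clusters = kmeans.fit_predict(data)
    print(kmeans.cluster_centers_.shape)

    # Plot the first two columns, colored by the assigned cluster label.
    plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, s=50, cmap='viridis')
    centers = kmeans.cluster_centers_
    plt.scatter(centers[:, 0], centers[:, 1], s=200, alpha=0.5)
    plt.show()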
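
If only the `reading` column should be clustered, the `reshape(-1, 1)` route from the error message and the comments is another option. Below is a minimal sketch, not a drop-in replacement. It assumes a small in-memory DataFrame standing in for the CSV (the values are a subset of those in the error message), and the explicit `n_init=10` is my addition to keep the behavior the same across scikit-learn versions:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Stand-in for the CSV: the "reading" column name comes from the question,
# the values are a subset of those shown in the error message.
data = pd.DataFrame(
    {"reading": [396.4, 394.0, 395.8, 395.3, 404.2, 400.6, 397.7, 401.5]}
)

# Double brackets select a one-column DataFrame, so the array is (n_samples, 1).
X = data[["reading"]].to_numpy()
# Equivalent alternative: data["reading"].to_numpy().reshape(-1, 1)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(X.shape)                        # (8, 1)
print(kmeans.cluster_centers_.shape)  # (2, 1)
```

With a single feature there is no second column to plot, so a scatter of `X[:, 0]` against `X[:, 1]` would not apply here. Plotting the readings against their row index, colored by `labels`, is one way to inspect the result.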