Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
-2
votes
1 answer
Pandas dataframe to array for further use
I've got a DataFrame loaded from a CSV of sales KPIs (quantity, article number and the corresponding date).
I need to split the DataFrame into multiple frames, each containing the data for one article number (e.g. frame1 = 123, frame2 = 345 and so…

Seppl98
- 21
- 4
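One way to approach the split described in this question is a `groupby` on the article number, collecting each group into a dictionary. This is only a minimal sketch: the column names `article_number`, `quantity` and `date` and the sample values are assumptions, since the excerpt does not show the real CSV.

```python
import pandas as pd

# Hypothetical sales data; column names are assumed, not taken from the question.
sales = pd.DataFrame({
    "article_number": [123, 345, 123, 345, 678],
    "quantity": [4, 1, 2, 7, 3],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03"]
    ),
})

# One DataFrame per article number, keyed by that number.
frames = {
    article: group.reset_index(drop=True)
    for article, group in sales.groupby("article_number")
}

# Convert a single article's rows to a NumPy array for further use.
array_123 = frames[123][["quantity"]].to_numpy()
```

A dictionary keyed by article number tends to be easier to work with than separately named variables such as `frame1` and `frame2`, because the number of articles does not need to be known in advance.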
-2
votes
1 answer
scikit learn on jupyter notebook
I tried to run something in a Jupyter notebook and I got this problem.
Does anyone know how to solve it?

Dor Lasri
- 7
- 1
-2
votes
1 answer
could not convert string to float: 'cd9f3b1a-2eb8-4cdb-86d1-5d4c2740b1dc'
I am new to data science and Python, and I am trying to use KMeans from sklearn.
I have information about calls, and I want to find centroids. I can do it for one phone number, but not for 10. When I used a for loop I got the error "could not…

John
- 1
- 3
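This error usually appears when a column of strings (an ID, for instance) reaches an estimator that expects numbers. The sketch below fits one KMeans model per phone number and keeps only numeric columns as features. The column names `call_id`, `phone`, `duration` and `hour`, the data, and the choice of two clusters are all assumptions, because the excerpt does not show the real data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical call records; column names and values are assumptions.
calls = pd.DataFrame({
    "call_id": ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"],
    "phone": ["111", "111", "111", "111", "222", "222", "222", "222"],
    "duration": [10.0, 12.0, 95.0, 100.0, 5.0, 7.0, 60.0, 64.0],
    "hour": [9, 10, 21, 22, 8, 9, 20, 21],
})

centroids = {}
for phone, group in calls.groupby("phone"):
    # Keep only numeric feature columns; string IDs cannot be cast to float.
    features = group.select_dtypes(include="number")
    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
    centroids[phone] = model.cluster_centers_
```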
-2
votes
1 answer
How can we convert categorical data in a column into numbered data?
Let's take an example. Suppose my table values are:
subjects
english
mathematics
science
english
science
How can I convert this string data into numbered data, as shown in the table below?
subjects
1
2
3
1
3

Sunil Sharma
- 249
- 3
- 8
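For the table in this question, `pandas.factorize` is one option that reproduces the numbering shown. A minimal sketch follows; the new column name `subjects_code` is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"subjects": ["english", "mathematics", "science", "english", "science"]}
)

# factorize assigns codes 0, 1, 2, ... in order of first appearance;
# adding 1 gives the 1-based numbering shown in the question.
codes, labels = pd.factorize(df["subjects"])
df["subjects_code"] = codes + 1
```

The `labels` value keeps the mapping back to the original strings, so the encoding can be reversed later.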
-2
votes
1 answer
How to create a Supervised dataset?
I want to create a dataset with 300 features whose instances are combinations of 0 and 1 (boolean). I have to specify the 1s using some ids. How can I do it with Python?
For example, in one instance the columns 4, 45, 213, 6, 48 should be 1 and…

najmath
- 261
- 1
- 3
- 16
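A binary feature matrix like the one described can be built with NumPy by starting from zeros and setting the listed columns to 1. This sketch assumes the ids are zero-based column indices; the first id list comes from the question and the second is made up for illustration.

```python
import numpy as np

n_features = 300

# Each inner list holds the column ids that should be 1 for that instance.
# The first list uses the ids from the question; the second is hypothetical.
active_ids = [
    [4, 45, 213, 6, 48],
    [0, 299, 17],
]

X = np.zeros((len(active_ids), n_features), dtype=np.int8)
for row, ids in enumerate(active_ids):
    # Integer-list indexing sets all listed columns of this row at once.
    X[row, ids] = 1
```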
-2
votes
1 answer
sklearn.model_selection 'KFold' object is not iterable
I have a problem with the following code:
# simulate splitting a dataset of 25 observations into 5 folds
from sklearn.model_selection import KFold
kf = KFold(n_splits=5, random_state=None, shuffle=False)
# print the contents of each…

Kirill Poznyak
- 11
- 1
- 1
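In `sklearn.model_selection`, a `KFold` object is a splitter rather than an iterable, so the loop has to go over `kf.split(X)`. The sketch below completes the idea from the question under the assumption that the dataset is a simple array of 25 observations.

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in for a dataset of 25 observations.
X = np.arange(25)

kf = KFold(n_splits=5, shuffle=False)

# Iterate over kf.split(X), which yields (train indices, test indices) pairs.
folds = []
for iteration, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    folds.append((train_idx, test_idx))
    print(iteration, train_idx, test_idx)
```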
-2
votes
1 answer
How to add a line as Index in a table by using pandas?
I have a question about using pandas.
I have a table like this :
0 A B C D
1 S D F G
......
and every element of the first line is the index of its column.
But I want to add a line at the top of the table, and I want the new line to be the index of the…
-2
votes
1 answer
How to split data?
Let's say I have 1010 rows in my data frame. I want to split them using train_test_split so that the first 1000 rows go to the training data and the next 10 rows go to the test data.
# Natural Language Processing
# Importing the libraries
import…
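An ordered split like the one described is possible with `train_test_split` by turning shuffling off and passing an integer `test_size`. The frame below is a stand-in with assumed column names `x` and `y`, since the question's data is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in frame with 1010 rows; the real data is not shown in the question.
df = pd.DataFrame({"x": np.arange(1010), "y": np.arange(1010) % 2})

# shuffle=False keeps row order, and an integer test_size is an absolute count,
# so the last 10 rows become the test set.
train, test = train_test_split(df, test_size=10, shuffle=False)

# Plain positional slicing gives the same result.
train_alt, test_alt = df.iloc[:1000], df.iloc[1000:]
```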
-2
votes
1 answer
Indexing a CSV running into inconsistent number of samples for logistic regression
I'm currently indexing a CSV with values below and running into the error:
ValueError: Found input variables with inconsistent numbers of
samples: [1, 514]
It is treating the data as 1 row with 514 columns, which suggests that I have called a specific…

1K20EK303
- 1
- 3
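The message `[1, 514]` reports one sample in the features against 514 labels. A common cause is a feature array shaped `(1, 514)` instead of `(514, 1)`, although the excerpt does not confirm that this is what happened here. The sketch below uses made-up data to show both shapes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: 514 feature values and 514 labels.
values = rng.normal(size=514)
labels = (values > 0).astype(int)

# Shape (1, 514) is read as 1 sample with 514 features, which does not
# match 514 labels and raises the "inconsistent numbers of samples" error.
X_wrong = values.reshape(1, -1)

# Shape (514, 1) is 514 samples with 1 feature each, matching the labels.
X = values.reshape(-1, 1)

model = LogisticRegression().fit(X, labels)
```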
-3
votes
1 answer
sklearn "linear" unresolved reference
I am trying to learn how to use sklearn, TF and pandas within PyCharm. I was able to successfully import the above-mentioned libraries and test the code to make sure they are functioning by printing the accuracy after train and test. All of the other…

Chup91
- 66
- 7
-3
votes
1 answer
I have a value error that I am not understanding. How can I fix it?
I am getting a ValueError: shape mismatch: objects cannot be broadcast to single shape.
The error occurs when I run the following code:
plt.bar(range(1, 14), pca.explained_variance_ratio_, alpha=0.5,
... align='center')
Traceback (most recent call…

Karina Naranjo
- 5
- 4
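A shape mismatch in `plt.bar` generally means the x positions and the bar heights have different lengths. Here `range(1, 14)` has 13 entries, so the error would follow if `pca.explained_variance_ratio_` has any other length. The sketch below derives the positions from the ratios instead; the random data with 8 features is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))  # hypothetical data with 8 features

pca = PCA().fit(X)
ratios = pca.explained_variance_ratio_

# Derive the x positions from the data instead of hard-coding range(1, 14),
# so both arguments to plt.bar always have the same length.
positions = np.arange(1, len(ratios) + 1)

# With matplotlib imported as plt, the call from the question would become:
# plt.bar(positions, ratios, alpha=0.5, align='center')
```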
-3
votes
1 answer
The result of `ColumnTransformer.fit_transform()` only contains the later transformer's result
There are 2 transformers in ColumnTransformer. But the result of ColumnTransformer.fit_transform() only contains the later transformer's result:
pos_time
array([[1.24100000e+03, 6.27000000e+02, 1.56279701e+09],
[1.27100000e+03, 6.90000000e+02,…

master_db
- 1
- 1
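As a reference point, `ColumnTransformer` concatenates the outputs of all its transformers column-wise, in the order they are listed. The sketch below shows that expected behavior with two scalers. The column names `x`, `y` and `t`, the transformer names and the values are assumptions, and the excerpt does not show enough to say what went wrong in the asker's code.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical frame; column names are assumptions.
df = pd.DataFrame({
    "x": [1241.0, 1271.0, 1300.0],
    "y": [627.0, 690.0, 700.0],
    "t": [1.0, 2.0, 3.0],
})

# Two transformers, each applied to its own list of columns.
ct = ColumnTransformer([
    ("pos", StandardScaler(), ["x", "y"]),
    ("time", MinMaxScaler(), ["t"]),
])

# Output columns: standardized x, standardized y, min-max scaled t.
out = ct.fit_transform(df)
```

If the output has fewer columns than expected, comparing `out.shape` with the column lists passed to each transformer is a quick first check.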
-3
votes
1 answer
MultiLabelBinarizer mixes up data when inverse transforming
I am using sklearn's MultiLabelBinarizer() to encode multiple columns that I use to train my machine learning model.
After using it I noticed it was mixing up my data when it inverse transforms it. I created a test set of random values where…

Ethan Kulla
- 333
- 2
- 9
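One behavior that can look like mixed-up data is that `MultiLabelBinarizer` stores its classes in sorted order and `inverse_transform` returns labels in that order, so the original order within each sample is not preserved. Whether this explains the asker's result is not clear from the excerpt; the sample labels below are made up.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical label sets, deliberately not in alphabetical order.
samples = [("b", "a"), ("c",), ("c", "a")]

mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(samples)

# classes_ is sorted, and inverse_transform returns labels in that order,
# so each decoded tuple holds the same labels but possibly reordered.
decoded = mlb.inverse_transform(encoded)
```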
-3
votes
1 answer
How to order categorical string features in order of severity?
If one of the features in my data set is a score stored as a categorical string, like:
Score
X1c
X3a
X1a
X2b
X4
X1a
X1b
X4
Where X1a is the weakest followed by X1b, X1c, X2a, X2b ...X4 with X4 being the strongest, how can I encode it to integers such…

bloodynri
- 543
- 1
- 6
- 14
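An ordered `pandas.Categorical` is one way to turn such scores into integers that respect severity. The sketch below assumes that severity order matches alphabetical order of the codes, which holds for the values listed in the question; the full scale is elided there, so only the observed values are ranked.

```python
import pandas as pd

scores = pd.Series(
    ["X1c", "X3a", "X1a", "X2b", "X4", "X1a", "X1b", "X4"], name="Score"
)

# Assumption: severity order equals alphabetical order of the codes,
# which holds for the values listed in the question.
order = sorted(scores.unique())

# An ordered categorical maps each code to its rank (0 = weakest observed).
encoded = pd.Categorical(scores, categories=order, ordered=True).codes
```

To reserve integer slots for scores that do not occur in the data, the complete severity list could be passed as `categories` instead of the sorted observed values.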
-4
votes
2 answers
How to remove rows in a column with certain value in Excel file with Python
I have data like this:
I want to remove the rows whose value in the user ID_2 column has more or fewer than 5 digits.

Ridwan K
- 3
- 1
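A length check on the string form of the ID is one simple way to keep only 5-digit values. In this sketch an in-memory frame stands in for the Excel sheet, which would normally be loaded with `pd.read_excel`; only the column name `user ID_2` comes from the question, and the values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; values are assumptions.
df = pd.DataFrame({"user ID_2": [12345, 678, 1234567, 54321, 99999]})

# Keep rows whose ID has exactly 5 characters when written as a string.
id_length = df["user ID_2"].astype(str).str.len()
filtered = df[id_length == 5].reset_index(drop=True)
```

This assumes the IDs are stored as integers or as clean digit strings; values with decimals or surrounding whitespace would need to be normalized first.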