Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames

1336 questions
-2
votes
1 answer

Pandas dataframe to array for further use

I've got a dataframe that contains a CSV of sales KPIs (quantity, article number and the corresponding date). I need to split the dataframe into multiple dataframes, each containing the data for one article number (e.g. frame1 = 123, frame2 = 345 and so…
Seppl98
  • 21
  • 4
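
One possible way to do the split described in this question is `DataFrame.groupby`, collecting each group into a dictionary keyed by article number. This is only a sketch: the column names `article_number`, `quantity` and `date`, and all sample values, are assumptions made for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumed for illustration.
df = pd.DataFrame({
    "article_number": [123, 345, 123, 345, 123],
    "quantity": [5, 2, 7, 1, 3],
    "date": pd.to_datetime([
        "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03",
    ]),
})

# Build one sub-frame per article number, keyed by that number.
frames = {
    article: group.reset_index(drop=True)
    for article, group in df.groupby("article_number")
}

# Each sub-frame can then be converted to a NumPy array for further use.
array_123 = frames[123][["quantity"]].to_numpy()
print(array_123.ravel())
```

A dictionary avoids creating numbered variables such as `frame1` and `frame2` by hand, and it scales to any number of article numbers.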
-2
votes
1 answer

scikit-learn on Jupyter Notebook

I tried to run something in Jupyter Notebook and got this problem. Does anyone know how to solve it?
-2
votes
1 answer

could not convert string to float: 'cd9f3b1a-2eb8-4cdb-86d1-5d4c2740b1dc'

I am new to data science and Python, and I am trying to use KMeans from sklearn. I have information about calls, and I want to find centroids. I can do it for one phone number, but not for 10. When I used a for-loop I got the error "could not…
John
  • 1
  • 3
-2
votes
1 answer

How can we convert categorical data in a column into numbered data?

Let's take an example. Suppose my table values are: subjects english mathematics science english science. How can I convert these string data into numbered data as shown in the table below? subjects 1 2 3 1 3
Sunil Sharma
  • 249
  • 3
  • 8
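
A minimal sketch of one way to get the numbering shown in this question, assuming the data lives in a pandas column named `subjects` and that numbers should be assigned in order of first appearance. `pandas.factorize` returns zero-based codes, so 1 is added to match the example.

```python
import pandas as pd

# The values from the question, in a single column.
df = pd.DataFrame(
    {"subjects": ["english", "mathematics", "science", "english", "science"]}
)

# factorize assigns 0, 1, 2, ... in order of first appearance.
codes, uniques = pd.factorize(df["subjects"])

# Shift to 1-based numbering to match the example in the question.
df["subjects_num"] = codes + 1

print(df["subjects_num"].tolist())
print(list(uniques))
```

If the numbers feed a scikit-learn model, `sklearn.preprocessing.OrdinalEncoder` or `OneHotEncoder` may be a better fit, since plain integers imply an ordering between subjects.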
-2
votes
1 answer

How to create a Supervised dataset?

I want to create a dataset with 300 features and instances which are combinations of 0 or 1 (boolean). I have to specify the 1s using some IDs. How can I do it with Python? E.g. one instance should be like: the columns 4,45,213,6,48 should be 1 and…
najmath
  • 261
  • 1
  • 3
  • 16
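
One way to build such a matrix, sketched with NumPy. The assumptions here are that the ids are zero-based column indices and that every other column should be 0; the second id list is made up purely to show more than one instance.

```python
import numpy as np

n_features = 300

# For each instance, the ids of the columns that should be 1.
# The first list comes from the question; the second is hypothetical.
active_ids = [
    [4, 45, 213, 6, 48],
    [0, 299],
]

# Start from all zeros, then switch on the listed columns row by row.
X = np.zeros((len(active_ids), n_features), dtype=np.int8)
for row, ids in enumerate(active_ids):
    X[row, ids] = 1

print(X.shape, int(X[0].sum()))
```

For very large and sparse data, a `scipy.sparse` matrix might be more memory-efficient than a dense array.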
-2
votes
1 answer

sklearn.model_selection 'KFold' object is not iterable

I have a problem with the following code. This is the code: # simulate splitting a dataset of 25 observations into 5 folds from sklearn.model_selection import KFold kf = KFold(n_splits=5, random_state=None, shuffle=False) # print the contents of each…
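
The excerpt is cut off, so the cause can only be guessed at. A common source of this error is looping over the `KFold` object itself; in `sklearn.model_selection`, the folds come from calling `kf.split(...)` on the data. A sketch under that assumption, with `X` as a stand-in for the 25 observations:

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in data: 25 observations, as in the question's comment.
X = np.arange(25)

kf = KFold(n_splits=5, shuffle=False)

# Iterate over kf.split(X), not over kf itself.
folds = list(kf.split(X))
for i, (train_idx, test_idx) in enumerate(folds, start=1):
    print(i, train_idx, test_idx)
```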
-2
votes
1 answer

How to add a line as the index of a table using pandas?

I have a question about using pandas. I have a table like this: 0 A B C D 1 S D F G ...... and every element of the first line is the index of its column. But I want to add a line at the top of the table, and I want the new line to be the index of the…
-2
votes
1 answer

How to split data?

Let's say I have 1010 rows in my data frame. Now I want to split them using train_test_split so that the first 1000 rows go to the train data and the next 10 rows go to the test data. # Natural Language Processing # Importing the libraries import…
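
A small sketch of an ordered split like the one described. `train_test_split` accepts an integer `test_size` (an absolute row count) and `shuffle=False`, which keeps the original row order. The frame below is a placeholder with a single made-up column `value`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder frame with 1010 rows.
df = pd.DataFrame({"value": range(1010)})

# Without shuffling, the last 10 rows become the test set.
train, test = train_test_split(df, test_size=10, shuffle=False)

print(len(train), len(test))
```

Plain slicing with `df.iloc[:1000]` and `df.iloc[1000:]` gives the same result without scikit-learn.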
-2
votes
1 answer

Indexing a CSV running into inconsistent number of samples for logistic regression

I'm currently indexing a CSV with the values below and running into the error: ValueError: Found input variables with inconsistent numbers of samples: [1, 514]. It's treating the data as 1 row with 514 columns, which suggests that I have called a specific…
-3
votes
1 answer

sklearn "linear" unresolved reference

I am trying to learn how to use sklearn, TF, and pandas within PyCharm. I was able to successfully import the above-mentioned libraries and test the code to make sure they are functioning by printing the accuracy after train and test. All of the other…
Chup91
  • 66
  • 7
-3
votes
1 answer

I have a ValueError that I don't understand. How can I fix it?

I am getting a ValueError: shape mismatch: objects cannot be broadcast to single shape. The error occurs when I run the following code: plt.bar(range(1, 14), pca.explained_variance_ratio_, alpha=0.5, ... align='center') Traceback (most recent call…
-3
votes
1 answer

The result of `ColumnTransformer.fit_transform()` only contains the latter transformer's result

There are 2 transformers in ColumnTransformer. But the result of ColumnTransformer.fit_transform() only contains the latter transformer's result: pos_time array([[1.24100000e+03, 6.27000000e+02, 1.56279701e+09], [1.27100000e+03, 6.90000000e+02,…
master_db
  • 1
  • 1
-3
votes
1 answer

MultiLabelBinarizer mixes up data when inverse transforming

I am using sklearn's MultiLabelBinarizer() on multiple columns that I use to train my machine learning model. After using it I noticed it was mixing up my data when it inverse transforms it. I created a test set of random values where…
-3
votes
1 answer

How to order categorical string features in order of severity?

If one of the features for my data set is a score that is a categorical string like: Score X1c X3a X1a X2b X4 X1a X1b X4, where X1a is the weakest, followed by X1b, X1c, X2a, X2b ...X4, with X4 being the strongest, how can I encode it to integers such…
bloodynri
  • 543
  • 1
  • 6
  • 14
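
One way to encode such scores is an ordered `pandas.Categorical`, which maps each level to its position in an explicit ordering. The question abbreviates the full list of levels, so the `order` list below is an assumption that only covers levels visible in the excerpt; it would need to be extended with any levels that are missing.

```python
import pandas as pd

# The example values from the question.
scores = pd.Series(
    ["X1c", "X3a", "X1a", "X2b", "X4", "X1a", "X1b", "X4"], name="Score"
)

# Assumed ordering, weakest to strongest; incomplete by construction.
order = ["X1a", "X1b", "X1c", "X2a", "X2b", "X3a", "X4"]

# Codes are the positions in `order`; values not in `order` become -1.
cat = pd.Categorical(scores, categories=order, ordered=True)
encoded = pd.Series(cat.codes, name="Score_encoded")

print(encoded.tolist())
```

Inside a scikit-learn pipeline, `OrdinalEncoder(categories=[order])` would be a comparable option.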
-4
votes
2 answers

How to remove rows in a column with certain value in Excel file with Python

I have data like this: I want to remove the rows where the value in the user ID_2 column has more or fewer than 5 digits
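
A rough sketch of the filtering step, assuming the IDs are whole numbers. Only the column name `user ID_2` comes from the question; the sample values are invented.

```python
import pandas as pd

# Hypothetical data; only the column name comes from the question.
df = pd.DataFrame({"user ID_2": [12345, 678, 1234567, 54321]})

# Keep rows whose ID, written as a string, has exactly 5 characters.
mask = df["user ID_2"].astype(str).str.len() == 5
filtered = df[mask].reset_index(drop=True)

print(filtered["user ID_2"].tolist())
```

With a real workbook, the frame would typically be loaded with `pd.read_excel` and written back with `DataFrame.to_excel`. IDs stored as floats or with leading zeros would need extra handling before the length check.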