Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
0
votes
0 answers
Inconsistent SpectralEmbedding results from python
I am trying to translate some code from Python into Scala. Specifically, I want to reproduce the results generated by SpectralEmbedding. At the moment I am investigating the LaplacianEigenmap from https://haifengl.github.io/smile/
The…

roblovelock
- 1,971
- 2
- 23
- 41
0
votes
2 answers
Sklearn syntax error?
I am learning from this site (https://pythonprogramming.net/training-testing-machine-learning-tutorial/), part 4. This is my code (copied, with Quandl spelled with a lowercase q, as is correct in newer versions, and model_selection instead of cross_validation for…

econ
- 495
- 3
- 6
- 16
0
votes
2 answers
Using similarities.cosine (with dataset) of SurPRISE package python
Briefing:
I'm working with the MovieLens 100k dataset for movie recommendation. So far I've done the following:
Sorting of values
df_sorted_values = df.sort_values(['UserID', 'MovieID'])
print type(df_sorted_values)
Printing Matrix with NaN values
df_matrix…

T3J45
- 717
- 3
- 12
- 32
0
votes
0 answers
Python - Is groupby allowed in Random forest or gradient booster regressors?
I have a dataset like the below:
state year response
1 MA 1 -0.2038564714
2 MA 2 -1.9344440707
3 MA 3 -0.3105101158
4 MA 4 -0.4222270032
5 MA 5 0.6818296904
6 MA 6 1.0094961857
7 …

Doubt Dhanabalu
- 457
- 4
- 8
- 18
0
votes
1 answer
Dataframe applying function to rows with specific condition
Here is a sample from my dataframe:
id DPT_DATE TRANCHE_NO TRAIN_NO J_X RES_HOLD_IND
0 2017-04-01 330.0 1234.0 -1.0 100.0
1 2017-04-01 330.0 1234.0 0.0 80.0
2 2017-04-02 331.0 1235.0…

Oussama Jabri
- 674
- 1
- 7
- 18
0
votes
1 answer
Make a scatterplot from sklearn PCA result for python
I am trying to display a scatterplot of a dataset that I reduced to two dimensions with the PCA function from sklearn. My data is returned as follows:
array([[ -3.18592855e+04, -2.13479310e+00],
[ -3.29633003e+04, 1.40801796e+01],
[…

hY8vVpf3tyR57Xib
- 3,574
- 8
- 41
- 86
0
votes
0 answers
Can someone explain this: TypeError: MinMaxScaler(copy=True, feature_range=(0, 1)) is not JSON serializable
I'm getting the error
TypeError: MinMaxScaler(copy=True, feature_range=(0, 1)) is not JSON serializable
I've googled it but found no results relating to the error. I must be the first in history to get it.
I'm trying to scale the data from an ad…

Nick Duddy
- 910
- 6
- 20
- 36
0
votes
0 answers
How to convert yyMMDDHHMMSS to a valid date using pandas function to_datetime() without providing format?
I'm new to Python and pandas. I'm using the to_datetime function to convert 1605121950805 into a real date format. If I use:
dates=pd.to_datetime(1605121950805, infer_datetime_format=True,yearfirst=True)
I get: 1970-01-01 00:26:45.121950805
If I use…
0
votes
1 answer
Finding number of clicks per unit of time
Ad-Slot Id Click Time Click IP
0 208878 2017-03-23 18:30:00.059 2405:204:c:3868:f27d:db2c:e2a9:c90c
1 195915 2017-03-23 18:30:00.107 2405:204:4183:6939:d3c2:bf40:ed47:3a6d
2 129192 2017-03-23 18:30:00.309 …

Seema Mudgil
- 365
- 1
- 7
- 15
0
votes
1 answer
Convert word Python Pandas Data Frame into Zero One Data Frame
Input
userID col1 col2 col3 col4 col5 col6 col7 col8 col9
1 Java c c++ php python perl html hadoop nodejs
2 nodejs c# c++ oops css html angular java php
3 php…

Sanjeev singh
- 93
- 13
0
votes
2 answers
How to set pip for python in windows 8
I am a beginner in Python programming. I have downloaded and installed Python 3.6.1. I have also tried to run it with the IDLE file, and the command prompt comes up. I have also added the path to the environment variables in Windows, but the…

Mandrek
- 1,159
- 6
- 25
- 55
0
votes
0 answers
Importing multiple CSV files into dictionary in Python
I am trying to import multiple CSV files into a dictionary so that I can access each CSV file and send it to another function. I have this code:
def readfile():
    path = '~/CSVFiles'
    df = pd.DataFrame()
    csvFiles = glob.glob(path + "/*.csv")
    …
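A minimal sketch of one way to approach the goal described in this excerpt, assuming each CSV should become one dictionary entry keyed by its file name. The helper name `read_csv_folder` is hypothetical and is not taken from the question. One detail worth noting: `glob.glob` does not expand `~`, so the sketch calls `os.path.expanduser` first.

```python
import glob
import os

import pandas as pd


def read_csv_folder(path):
    """Return {file name without extension: DataFrame} for every CSV in path.

    Hypothetical helper for illustration; not the asker's original function.
    """
    folder = os.path.expanduser(path)  # glob does not expand '~' on its own
    frames = {}
    for csv_path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # Use the bare file name (no directory, no extension) as the key.
        name = os.path.splitext(os.path.basename(csv_path))[0]
        frames[name] = pd.read_csv(csv_path)
    return frames
```

Each DataFrame can then be passed on individually, for example by looping over `read_csv_folder('~/CSVFiles').items()`.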

RZA KHK
- 35
- 4
0
votes
0 answers
Pandas : Cannot join text columns
I want to join all the text columns of my dataframe, so that I can fit this into a CountVectorizer.
def populate_distance_metrics(in_df, col_list, prim_col):
    vect_data=in_df[col_list[0]].map(str)
    print (type(vect_data))
    for col,idx in…
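A minimal sketch of joining several columns into one string per row, assuming the goal is a single text Series that could be handed to a vectorizer. The helper name `join_text_columns`, the separator, and the sample data are assumptions for illustration only; the asker's truncated loop is not reproduced here.

```python
import pandas as pd


def join_text_columns(in_df, col_list, sep=" "):
    """Concatenate the listed columns row by row into one string Series."""
    # astype(str) turns every cell into text; note that missing values
    # become the literal string "nan" unless they are filled beforehand.
    return in_df[col_list].astype(str).apply(sep.join, axis=1)


# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"title": ["red", "blue"], "body": ["apple", 7]})
joined = join_text_columns(df, ["title", "body"])
```

The result keeps the original index, so it can be assigned back as a new column if needed.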

AbtPst
- 7,778
- 17
- 91
- 172
0
votes
1 answer
How to simplify deletion of multiple columns?
I want to delete multiple columns from my dataset. These columns are in random positions and I have their names.
For the moment I delete them as follows.
import pandas as pd
data = pd.read_csv('data.cvs')
del data['021']
del data['hg1']
del…
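A minimal sketch of removing several named columns in one call, assuming the names are already collected in a list. Only the column names `'021'` and `'hg1'` come from the excerpt; the in-memory DataFrame and the `keep_me` column are hypothetical stand-ins for the asker's CSV.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; only '021' and 'hg1'
# are column names that appear in the question.
data = pd.DataFrame({"021": [1, 2], "hg1": [3, 4], "keep_me": [5, 6]})

# Names of the columns to remove, in any order.
to_drop = ["021", "hg1"]

# drop returns a new DataFrame; errors="ignore" skips names that are absent.
data = data.drop(columns=to_drop, errors="ignore")
```

Leaving out `errors="ignore"` makes pandas raise a `KeyError` for an unknown name, which can be preferable when a misspelled column should not pass silently.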

Javi
- 385
- 1
- 3
- 9
0
votes
0 answers
Error with fitting a pipeline for clustering both text and numeric data
I'm having some difficulties with fitting a pipeline on a data set for the purpose of K-means clustering using scikit-learn. For illustration purposes, let's say I have a certain DataFrame like the following (only much larger):
df =…

Uri T
- 9
- 2