Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames

1336 questions
0
votes
0 answers

Inconsistent SpectralEmbedding results from python

I am trying to translate some code from Python into Scala. I am specifically looking to generate the same results as generated by SpectralEmbedding. At the moment I am investigating the LaplacianEigenmap from https://haifengl.github.io/smile/ The…
roblovelock
  • 1,971
  • 2
  • 23
  • 41
0
votes
2 answers

Sklearn syntax error?

I am learning from this site (https://pythonprogramming.net/training-testing-machine-learning-tutorial/), part 4. This is my code (copied, with Quandl having a lowercase q, as is correct in the newer version, and model_selection instead of cross_validation for…
econ
  • 495
  • 3
  • 6
  • 16
0
votes
2 answers

Using similarities.cosine (with dataset) of SurPRISE package python

Briefing: I'm working on the MovieLens 100k dataset for movie recommendation. So far I've done the following. Sorting of values: df_sorted_values = df.sort_values(['UserID', 'MovieID']) print type(df_sorted_values) Printing matrix with NaN values: df_matrix…
T3J45
  • 717
  • 3
  • 12
  • 32
0
votes
0 answers

Python - Is groupby allowed in Random forest or gradient booster regressors?

I have a dataset like the below: state year response 1 MA 1 -0.2038564714 2 MA 2 -1.9344440707 3 MA 3 -0.3105101158 4 MA 4 -0.4222270032 5 MA 5 0.6818296904 6 MA 6 1.0094961857 7 …
0
votes
1 answer

Dataframe applying function to rows with specific condition

Here is a sample from my dataframe: id DPT_DATE TRANCHE_NO TRAIN_NO J_X RES_HOLD_IND 0 2017-04-01 330.0 1234.0 -1.0 100.0 1 2017-04-01 330.0 1234.0 0.0 80.0 2 2017-04-02 331.0 1235.0…
Oussama Jabri
  • 674
  • 1
  • 7
  • 18
0
votes
1 answer

Make a scatterplot from sklearn PCA result for python

I am trying to display a scatterplot of a dataset that I made two-dimensional with the PCA function from sklearn. My data is returned as follows: array([[ -3.18592855e+04, -2.13479310e+00], [ -3.29633003e+04, 1.40801796e+01], […
hY8vVpf3tyR57Xib
  • 3,574
  • 8
  • 41
  • 86
0
votes
0 answers

Can someone explain this: TypeError: MinMaxScaler(copy=True, feature_range=(0, 1)) is not JSON serializable

I'm getting the error TypeError: MinMaxScaler(copy=True, feature_range=(0, 1)) is not JSON serializable. I've googled it but found no results relating to the error. I must be the first in history to get it. I'm trying to scale the data from an ad…
Nick Duddy
  • 910
  • 6
  • 20
  • 36
0
votes
0 answers

How to convert yyMMDDHHMMSS to a valid date using pandas function to_datetime() without providing format?

So I'm new to python and pandas. I'm using to_datetime function to convert 1605121950805 into a real date format. If I use: dates=pd.to_datetime(1605121950805, infer_datetime_format=True,yearfirst=True) I get: 1970-01-01 00:26:45.121950805 If I use…
0
votes
1 answer

Finding number of clicks per unit of time

Ad-Slot Id Click Time Click IP 0 208878 2017-03-23 18:30:00.059 2405:204:c:3868:f27d:db2c:e2a9:c90c 1 195915 2017-03-23 18:30:00.107 2405:204:4183:6939:d3c2:bf40:ed47:3a6d 2 129192 2017-03-23 18:30:00.309 …
Seema Mudgil
  • 365
  • 1
  • 7
  • 15
0
votes
1 answer

Convert word Python Pandas Data Frame into Zero One Data Frame

Input userID col1 col2 col3 col4 col5 col6 col7 col8 col9 1 Java c c++ php python perl html hadoop nodejs 2 nodejs c# c++ oops css html angular java php 3 php…
0
votes
2 answers

How to set up pip for Python in Windows 8

I am a beginner at Python programming. I have downloaded Python 3.6.1 and installed it. I have also tried running it with the IDLE file, and the command prompt comes up. I have also set the path in the environment variables in Windows, but the…
Mandrek
  • 1,159
  • 6
  • 25
  • 55
0
votes
0 answers

Importing multiple CSV files into dictionary in Python

I am trying to import multiple CSV files into a dictionary so that I can access each CSV file and send it to another function. I have this code: def readfile(): path = '~/CSVFiles' df = pd.DataFrame() csvFiles = glob.glob(path + "/*.csv") …
RZA KHK
  • 35
  • 4
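
A minimal sketch of one way to approach the question above, not the asker's code: it builds a dictionary that maps each file name (without extension) to its own DataFrame. The function name `read_csv_dir` and the choice of dictionary keys are assumptions made for illustration. One detail visible in the excerpt is worth noting: `glob.glob` does not expand `~`, so the sketch passes the path through `os.path.expanduser` first.

```python
import glob
import os

import pandas as pd


def read_csv_dir(path):
    """Return {file name without extension: DataFrame} for every *.csv in path.

    Hypothetical helper for illustration; the name and key scheme are assumptions.
    """
    # glob does not expand "~", so resolve it to the home directory explicitly.
    pattern = os.path.join(os.path.expanduser(path), "*.csv")
    frames = {}
    for csv_path in sorted(glob.glob(pattern)):
        name = os.path.splitext(os.path.basename(csv_path))[0]
        frames[name] = pd.read_csv(csv_path)
    return frames
```

Each DataFrame can then be passed on individually, for example by looping over `read_csv_dir('~/CSVFiles').items()`.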
0
votes
0 answers

Pandas : Cannot join text columns

I want to join all the text columns of my dataframe, so that I can fit this into a CountVectorizer. def populate_distance_metrics(in_df, col_list, prim_col): vect_data=in_df[col_list[0]].map(str) print (type(vect_data)) for col,idx in…
AbtPst
  • 7,778
  • 17
  • 91
  • 172
0
votes
1 answer

How to simplify deletion of multiple columns?

I want to delete multiple columns from my dataset. These columns are in random positions and I have their names. For the moment I delete them as follows. import pandas as pd data = pd.read_csv('data.cvs') del data['021'] del data['hg1'] del…
Javi
  • 385
  • 1
  • 3
  • 9
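
A short sketch of one common alternative to repeated `del` statements, assuming the column names are already collected in a list: `DataFrame.drop` accepts a list through its `columns` parameter and returns a new frame. The small DataFrame below is a made-up stand-in for the asker's CSV; only the names `'021'` and `'hg1'` come from the excerpt.

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question; "keep_me" is an invented column.
data = pd.DataFrame({"021": [1, 2], "hg1": [3, 4], "keep_me": [5, 6]})

# Names of the columns to delete, in any order.
cols_to_drop = ["021", "hg1"]

# drop() returns a new DataFrame and leaves `data` unchanged.
trimmed = data.drop(columns=cols_to_drop)
print(list(trimmed.columns))
```

If some of the listed names might be absent from the frame, passing `errors="ignore"` to `drop` skips them instead of raising a `KeyError`.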
0
votes
0 answers

Error with fitting a pipeline for clustering both text and numeric data

I'm having some difficulties with fitting a pipeline on a data set for the purpose of K-means clustering using scikit-learn. For illustration purposes, let's say I have a certain DataFrame like the following (only much larger): df =…
Uri T
  • 9
  • 2