Questions tagged [sklearn-pandas]

Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames

1336 questions
3
votes
1 answer

getting top words from the tf-idf sparse matrix (highest tf-idf value)

I have a list of size 208 (208 arrays of sentences), that looks like: all_words = [["this is a sentence ... "] , [" another one hello bob this is alice ... "] , ["..."] ...] I want to get the words with the highest tf-idf values. I created a…
sheldonzy
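A minimal sketch of one way to rank words by tf-idf, assuming scikit-learn's `TfidfVectorizer` is used and that "highest" means the largest score a word reaches in any document. The two sample documents and the `top_n` value are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the 208 documents in the question.
docs = ["this is a sentence", "another one hello bob this is alice"]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)

# Build a column-index -> term lookup from the fitted vocabulary.
terms = np.empty(len(vectorizer.vocabulary_), dtype=object)
for term, col in vectorizer.vocabulary_.items():
    terms[col] = term

# Largest tf-idf value each term reaches across all documents.
max_per_term = tfidf.max(axis=0).toarray().ravel()

top_n = 3  # arbitrary choice
top_idx = np.argsort(max_per_term)[::-1][:top_n]
top_words = [(terms[i], max_per_term[i]) for i in top_idx]
```

Ranking by the mean or the sum over documents instead of the maximum is an equally reasonable reading of the question.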
3
votes
1 answer

Move data from a column to seven days in advance - pandas Dataframe

I have a pandas DataFrame with 2 columns. One of them is the index, in date format, and the other is a rate R (a number between 0 and 1). How can I add another column to the DataFrame that contains the rate R from one week earlier? So…
Jorge Garcia
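A minimal sketch of the "rate from one week earlier" column, assuming the index is a `DatetimeIndex`. The sample data and the new column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: 10 consecutive days with a rate R between 0 and 1.
idx = pd.date_range("2020-01-01", periods=10, freq="D")
df = pd.DataFrame({"R": [i / 10 for i in range(10)]}, index=idx)

# With exactly one row per day, shifting by 7 rows aligns each day with
# the value from one week earlier (the first 7 rows become NaN).
df["R_week_before"] = df["R"].shift(7)

# If some days may be missing, look the value up by date instead of by position.
lookup_dates = df.index - pd.Timedelta(days=7)
df["R_week_before_by_date"] = df["R"].reindex(lookup_dates).to_numpy()
```

The positional `shift(7)` is only correct when there are no gaps in the dates; the date-based lookup does not rely on that.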
3
votes
1 answer

Pandas error: “pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)”

First of all, hello everyone. I am halfway through Python Programming for Finance - Creating targets for machine learning labels, and I have a csv with some historical stock data that I'm reading into pandas: def process_data_for_labels(ticker): …
Gabe H. Coud
3
votes
2 answers

Initial visualization of datasets in Scikit - head() command

In considering the potential equivalence of Python to R for data processing, I am working on the basics. In particular, when loading a database, such as Iris in R, the simple command head() produces a beautiful printout on the screen: head(iris) …
Antoni Parellada
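A rough pandas counterpart to R's `head(iris)`, sketched under the assumption that the data comes from scikit-learn's bundled `load_iris` loader. The `species` column name is chosen here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Wrap the raw array in a DataFrame so the columns carry their names.
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["species"] = [iris.target_names[i] for i in iris.target]

# head() returns the first 5 rows by default; head(n) returns the first n.
preview = iris_df.head()
print(preview)
```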
3
votes
1 answer

Get pandas Series from CSV

I am totally new to machine learning. I am currently playing with MNIST, using RandomForestClassifier. I use sklearn and pandas. I have a training CSV data set. import pandas as pd import numpy as np from sklearn import…
3
votes
2 answers

Is there a way to save the preprocessing objects in scikit-learn?

I am building a neural net with the purpose of making predictions on new data in the future. I first preprocess the training data using sklearn.preprocessing, then train the model, then make some predictions, then close the program. In the future,…
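A minimal sketch of persisting a fitted preprocessing object, assuming a `StandardScaler` and the standard-library `pickle` module. The sample arrays are made up, and the bytes round-trip stands in for writing to and reading from a file.

```python
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data; each column has a different scale.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(X_train)

# Serialize the fitted scaler. pickle.dump(scaler, file) works the same way
# when the object should be stored on disk between program runs.
blob = pickle.dumps(scaler)

# Later: restore it and apply the same scaling to new data.
restored = pickle.loads(blob)
X_new = np.array([[2.0, 20.0]])
scaled = restored.transform(X_new)
```

Pickled estimators are generally only safe to load with the same scikit-learn version that created them, so it helps to record the version alongside the file.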
3
votes
1 answer

How can I read and transform a 7z file into CSV using pandas (Python)?

I have 7z files that I want to transform into CSV using pandas to preprocess the data. I have Python 2.7. I tried this: import pandas as pd data = pd.read_csv('train_2011_2012_2013.7z.002', header = None) print data I got this…
heisen
3
votes
1 answer

unorderable types error when importing sklearn

I installed numpy (1.12.0b1) and SciPy (0.18) on Windows. I also installed scikit-learn. When I wrote "import sklearn" in the Python console, it gave an error like this: if np_version < (1, 12, 0): TypeError: unorderable types: str() < int() What will…
user2621675
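A plausible explanation for this error, sketched under the assumption that the installed version string was split on dots and non-numeric pieces were kept as strings. The `parse_version` helper below is hypothetical and only illustrates that assumption.

```python
def parse_version(version_string):
    """Hypothetical parser: split on dots, keep non-numeric parts as strings."""
    parts = []
    for piece in version_string.split("."):
        try:
            parts.append(int(piece))
        except ValueError:
            parts.append(piece)  # e.g. "0b1" from a beta release
    return tuple(parts)


np_version = parse_version("1.12.0b1")  # beta release string from the question

try:
    np_version < (1, 12, 0)
    comparison_failed = False
except TypeError:
    # Python 3 refuses to order a str ("0b1") against an int (0).
    comparison_failed = True
```

If that is the cause, installing a final (non-beta) numpy release would avoid the string component altogether.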
3
votes
1 answer

Python sklearn poly regression

I've been stuck on this issue for two days now. I have some data points that I put in a scatter plot and get this: Which is nice, but now I also want to add a regression line, so I had a look at this example from sklearn and changed the code to…
user3079834
3
votes
1 answer

iPython (python 2) - ImportError: No module named model_selection

iPython Notebook Python 2 Complaining about this line: from sklearn.model_selection import train_test_split Why isn't model selection working?
user1072337
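The `sklearn.model_selection` module was added in scikit-learn 0.18, and older releases exposed `train_test_split` from `sklearn.cross_validation`. A minimal sketch of an import that tolerates both, assuming upgrading scikit-learn is not an option (the sample `X` and `y` are made up):

```python
try:
    # Available in scikit-learn 0.18 and later.
    from sklearn.model_selection import train_test_split
except ImportError:
    # Older releases kept the function here (the module was removed in 0.20).
    from sklearn.cross_validation import train_test_split

# Hypothetical toy data to show the call.
X = [[0], [1], [2], [3]]
y = [0, 1, 0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
```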
3
votes
1 answer

Scikit learn split train test for series

I have data that includes dates in sorted order. I would like to split the data into a train set and a test set. However, I must split it so that the test set is newer than the train set. Please look at the given example: Let's…
Aviade
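A minimal sketch of a chronological split, assuming the data lives in a DataFrame with a date column. The column names, the sample data, and the 80/20 ratio are all illustrative choices.

```python
import pandas as pd

# Hypothetical data: one observation per day, deliberately shuffled.
df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=10, freq="D"),
    "value": range(10),
}).sample(frac=1, random_state=0)

# Sort by date, then cut at a position so that every test row is newer
# than every training row.
df = df.sort_values("date")
split_at = int(len(df) * 0.8)  # 80% train / 20% test, an arbitrary choice
train, test = df.iloc[:split_at], df.iloc[split_at:]
```

On data that is already sorted, scikit-learn's `train_test_split` with `shuffle=False` gives the same kind of positional split.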
3
votes
2 answers

How to keep one single column as a dataframe

I have a dataframe with 20 columns and one index. Its shape is something like (100, 20). I want to slice the 3rd column from this dataframe, but want to keep the result as a dataframe of (100,1). If I do v = df['col3'], I get a Series (which I do…
Prana
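A short sketch of the difference between selecting with a single label and with a list of labels. The frame below is a hypothetical stand-in with the (100, 20) shape described in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 20 columns named col1 .. col20.
df = pd.DataFrame(np.zeros((100, 20)), columns=[f"col{i}" for i in range(1, 21)])

as_series = df["col3"]              # single label -> Series, shape (100,)
as_frame = df[["col3"]]             # list of labels -> DataFrame, shape (100, 1)
also_frame = df["col3"].to_frame()  # convert an existing Series to a DataFrame
```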
3
votes
3 answers

Python, Roc curves and ggplot?

I followed a tutorial for displaying ROC curves and the corresponding AUC. I have never used the ggplot library, so I cannot see where my error is. Here is the code: from sklearn import metrics import pandas as pd from ggplot…
ElenaPhys
3
votes
1 answer

DICT() and MATPLOTLIB?

I created a dictionary to match the feature importances of a Decision Tree in sklearn with the corresponding feature names in my df. Here is the code: importances = clf.feature_importances_ feature_names = ['age','BP','chol','maxh', …
ElenaPhys
3
votes
1 answer

Invalid literal for Float error in Python

I am trying to perform linear regression in Python using the sklearn library. This is the code I used to train and fit the model; I get the error when I run the predict function call. train, test = train_test_split(h1,…
goutam