Python module providing a bridge between Scikit-Learn’s Machine Learning methods and pandas-style DataFrames
Questions tagged [sklearn-pandas]
1336 questions
3 votes · 1 answer
getting top words from the tf-idf sparse matrix (highest tf-idf value)
I have a list of size 208 (208 arrays of sentences), that looks like:
all_words = [["this is a sentence ... "] , [" another one hello bob this is alice ... "] , ["..."] ...]
I want to get the words with the highest tf-idf values.
I created a…

sheldonzy
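For the tf-idf question above, one possible approach is sketched below. It assumes the documents are plain strings; the `docs` list and the `top_terms` helper are hypothetical and not taken from the question. It uses scikit-learn's `TfidfVectorizer` and builds the index-to-term lookup from the fitted `vocabulary_` mapping.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in data: one plain string per document.
docs = [
    "this is a sentence about cats",
    "another one hello bob this is alice",
    "cats and dogs and cats",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)

# Build an index -> term lookup from the fitted vocabulary.
terms = np.empty(len(vectorizer.vocabulary_), dtype=object)
for term, idx in vectorizer.vocabulary_.items():
    terms[idx] = term


def top_terms(row, k=3):
    """Return up to k (term, score) pairs with the highest tf-idf in one document."""
    scores = tfidf[row].toarray().ravel()
    order = np.argsort(scores)[::-1][:k]
    return [(terms[i], float(scores[i])) for i in order if scores[i] > 0]


for row in range(len(docs)):
    print(row, top_terms(row))
```

If the input is a list of one-element lists, as in the question, it would first need flattening to a list of strings before calling `fit_transform`.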
3 votes · 1 answer
Move data from a column seven days forward - pandas DataFrame
I have a pandas DataFrame with 2 columns. One of them is the index in date format and the other is a rate R (a number between 0 and 1). How can I add another column to the DataFrame that contains the rate R from one week earlier?
So…

Jorge Garcia
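For the one-week-lag question above, a minimal sketch with `Series.shift` might look like this. The sample dates, the values, and the new column names are made up for illustration; the positional variant assumes exactly one row per day, while the `freq` variant shifts by calendar time and tolerates gaps.

```python
import pandas as pd

# Hypothetical daily data: a date index and a rate R between 0 and 1.
idx = pd.date_range("2021-01-01", periods=10, freq="D")
df = pd.DataFrame(
    {"R": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00]},
    index=idx,
)

# Positional shift: valid only if there is exactly one row per day.
df["R_week_before"] = df["R"].shift(7)

# Time-based shift: move the index forward by 7 days, then let pandas
# align the result back onto the original index during assignment.
df["R_week_before_by_date"] = df["R"].shift(freq="7D")

print(df)
```

The first seven rows have no earlier observation to draw from, so both new columns hold NaN there.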
3 votes · 1 answer
Pandas error: “pandas.libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)”
First of all, hello everyone. I am halfway through Python Programming for Finance - Creating targets for machine learning labels, and I have a csv with some historical stock data that I'm reading into pandas:
def process_data_for_labels(ticker):
…

Gabe H. Coud
3 votes · 2 answers
Initial visualization of datasets in Scikit - head() command
In considering the potential equivalence of Python to R for data processing, I am working on the basics. In particular, when loading a database, such as Iris in R, the simple command head() produces a beautiful printout on the screen:
head(iris)
…

Antoni Parellada
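For the `head()` question above, a rough Python counterpart to R's `head(iris)` is to wrap the scikit-learn dataset in a pandas DataFrame. This is only a sketch; the `species` column name is an assumption chosen to mirror the R dataset.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Wrap the raw NumPy arrays in a DataFrame so head() prints a labelled table.
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the integer targets back to their species names.
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

print(df.head())
```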
3 votes · 1 answer
Get pandas Series from csv
I am totally new to machine learning and am currently playing with MNIST, using RandomForestClassifier.
I use sklearn and pandas.
I have a training CSV data set.
import pandas as pd
import numpy as np
from sklearn import…

user2724028
3 votes · 2 answers
Is there a way to save the preprocessing objects in scikit-learn?
I am building a neural net with the purpose of making predictions on new data in the future. I first preprocess the training data using sklearn.preprocessing, then train the model, then make some predictions, then close the program. In the future,…

user1367204
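For the question above about saving preprocessing objects, one common approach is to serialize the fitted transformer with Python's `pickle` module and load it in a later session. The sketch below assumes a `StandardScaler`; the file name and the sample arrays are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data; fit the scaler once at training time.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(X_train)

# Persist the fitted scaler to disk (hypothetical file name).
path = os.path.join(tempfile.gettempdir(), "scaler.pkl")
with open(path, "wb") as f:
    pickle.dump(scaler, f)

# Later, in a new session: load it and apply the same transformation.
with open(path, "rb") as f:
    restored = pickle.load(f)

X_new = np.array([[2.5, 25.0]])
print(restored.transform(X_new))
```

A pickled object is generally only safe to load with the same scikit-learn version that created it, and only from a trusted source.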
3 votes · 1 answer
How can I read and transform a 7z file into csv using Pandas (python)?
I have 7z files that I want to convert to CSV using pandas so that I can preprocess the data. I have Python 2.7.
I tried this:
import pandas as pd
data = pd.read_csv('train_2011_2012_2013.7z.002', header = None)
print data
I got this…

heisen
3 votes · 1 answer
unorderable types error when importing sklearn
I installed NumPy (1.12.0b1) and SciPy (0.18) on Windows, along with scikit-learn. When I run "import sklearn" in the Python console, it gives an error like this:
if np_version < (1, 12, 0):
TypeError: unorderable types: str() < int()
What will…
user2621675
3 votes · 1 answer
Python sklearn poly regression
I've been stuck on this issue for two days now. I have some data points that I put in a scatter plot and get this:
Which is nice, but now I also want to add a regression line, so I had a look at this example from sklearn and changed the code to…

user3079834
3 votes · 1 answer
iPython (python 2) - ImportError: No module named model_selection
iPython Notebook
Python 2
Complaining about this line:
from sklearn.model_selection import train_test_split
Why isn't model selection working?

user1072337
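For the `model_selection` import error above, a likely cause is an older scikit-learn install: the `sklearn.model_selection` module was added in version 0.18, and before that `train_test_split` lived in `sklearn.cross_validation`. A hedged sketch that checks the version and falls back on old installs:

```python
import sklearn

# Print the installed version; model_selection needs 0.18 or newer.
print(sklearn.__version__)

try:
    from sklearn.model_selection import train_test_split
except ImportError:
    # Fallback for releases older than 0.18.
    from sklearn.cross_validation import train_test_split

# Quick check that the imported function works on toy data.
train, test = train_test_split(list(range(10)), test_size=0.3, random_state=0)
print(len(train), len(test))
```

Upgrading scikit-learn is usually the cleaner fix; the fallback import is only a stopgap.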
3 votes · 1 answer
Scikit learn split train test for series
I have data that includes dates in sorted order.
I would like to split it into a train set and a test set.
However, I must split the data so that the test set is newer than the train set.
Please look at the given example:
Let's…

Aviade
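For the time-ordered split above, one simple option is to sort by date and cut at a fixed position, so every test row is newer than every train row. The column names, the sample data, and the 80/20 ratio below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: one observation per day.
df = pd.DataFrame(
    {
        "date": pd.date_range("2020-01-01", periods=10, freq="D"),
        "value": range(10),
    }
)

# Make sure rows are in chronological order before cutting.
df = df.sort_values("date").reset_index(drop=True)

# The first 80% of rows become the train set, the most recent 20% the test set.
cutoff = int(len(df) * 0.8)
train = df.iloc[:cutoff]
test = df.iloc[cutoff:]

print(train["date"].max(), test["date"].min())
```

Calling scikit-learn's `train_test_split` with `shuffle=False` on already sorted data should give an equivalent split.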
3 votes · 2 answers
How to keep one single column as a dataframe
I have a DataFrame with 20 columns and one index.
Its shape is something like (100, 20).
I want to slice the 3rd column from this dataframe, but want to keep the result as a dataframe of (100,1).
If I do a v = df['col3'], I get a Series (which I do…

Prana
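For the single-column question above, selecting with a list of column names (double brackets) keeps the result two-dimensional. A short sketch on made-up data with the same (100, 20) shape:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 100 rows and columns col1..col20.
df = pd.DataFrame(
    np.arange(2000).reshape(100, 20),
    columns=[f"col{i}" for i in range(1, 21)],
)

as_series = df["col3"]              # Series, shape (100,)
as_frame = df[["col3"]]             # DataFrame, shape (100, 1)
also_frame = df["col3"].to_frame()  # DataFrame built from the Series

print(as_series.shape, as_frame.shape, also_frame.shape)
```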
3 votes · 3 answers
Python, Roc curves and ggplot?
I followed a tutorial for displaying ROC curves and the corresponding AUC. I have never used the ggplot library, so I cannot work out where my error is. Here is the code:
from sklearn import metrics
import pandas as pd
from ggplot…

ElenaPhys
3 votes · 1 answer
DICT() and MATPLOTLIB?
I created a dictionary to match the feature importances of a decision tree in sklearn with the corresponding feature names in my df. Here is the code:
importances = clf.feature_importances_
feature_names = ['age','BP','chol','maxh',
…

ElenaPhys
3 votes · 1 answer
Invalid literal for Float error in Python
I am trying to perform linear regression in Python using the sklearn library.
This is the code I used to train and fit the model. I get the error when I call the predict function.
train, test = train_test_split(h1,…

goutam