I'm working on a very basic machine learning project using baseball data from 1871 to 2015. I want to hold out a specific set of years to test my predictions on. I'm using the dfply package and its mask function to select a single year, but I need to select more than one year. How can I go about this?

Thank you in advance.

I've tried to use "or" and "|" as well as adding () and [].

import pandas as pd
import numpy as np
import sklearn
from sklearn import linear_model
from sklearn.utils import shuffle
import matplotlib.pyplot as pyplot
import pickle
from matplotlib import style
from dfply import *
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = pd.read_csv("team.csv")

data_test = (data >> mask(X.year == 1997))

I want X.year to cover every year from 1997 through 2015.

desertnaut
Yaz229
    Question has nothing to do with `machine-learning` - kindly do not spam irrelevant tags (removed & replaced with `pandas`). – desertnaut Oct 02 '19 at 13:56

1 Answer

Assuming your pandas.DataFrame has a column named year, plain boolean indexing selects a single year:

data_test = data[data.year == 1997]
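
To cover the whole 1997-2015 range the question asks about, the same boolean indexing can take a range condition instead of a single equality. The following is a minimal sketch, not a tested drop-in for the original script: the small DataFrame is a hypothetical stand-in for team.csv, and only the year column is assumed to match the real data. One likely reason the `|` attempts failed is that `|` and `&` bind more tightly than `==` in Python, so each comparison needs its own parentheses.

```python
import pandas as pd

# Hypothetical stand-in for team.csv; only the `year` column matters here.
data = pd.DataFrame({
    "year": [1871, 1996, 1997, 2005, 2015],
    "team": ["A", "B", "C", "D", "E"],
})

# Inclusive range: keeps rows with 1997 <= year <= 2015.
in_test_range = data["year"].between(1997, 2015)
data_test = data[in_test_range]

# Equivalent form with explicit comparisons. Each comparison is wrapped in
# parentheses because & and | bind more tightly than >= or ==.
data_test_alt = data[(data["year"] >= 1997) & (data["year"] <= 2015)]

# An arbitrary, non-contiguous set of years.
data_some = data[data["year"].isin([1997, 2005])]

# The remaining rows can serve as the training set.
data_train = data[~in_test_range]
```

The same parenthesized expression should also be usable inside dfply's mask, for example mask((X.year >= 1997) & (X.year <= 2015)), though that variant is untested here.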
AnsFourtyTwo