I'm working on a very basic machine learning project using baseball data from 1871-2015. I want to hold out a specific set of years to test my predictions on. I'm using the dfply package and its mask function to select a single year, but I need to select more than one year. How can I go about this?
Thank you in advance.
I've tried combining conditions with "or" and "|", and also wrapping them in () and [], but none of these gave me what I want. My code so far:
import pandas as pd
import numpy as np
import sklearn
from sklearn import linear_model
from sklearn.utils import shuffle
import matplotlib.pyplot as pyplot
import pickle
from matplotlib import style
from dfply import *
import statsmodels.api as sm
import statsmodels.formula.api as smf
data = pd.read_csv("team.csv")
data_test = (data >> mask(X.year == 1997))
I want data_test to contain every year from 1997 through 2015, not just 1997.
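To make the goal concrete, here is a minimal sketch of the result I'm after, written in plain pandas rather than dfply. The small frame is made-up stand-in data (not my real team.csv), and the "wins" column is only an assumption for illustration; only the "year" column matters.

```python
import pandas as pd

# Made-up stand-in for team.csv; the "wins" column is hypothetical.
data = pd.DataFrame({
    "year": [1871, 1996, 1997, 2005, 2015],
    "wins": [20, 88, 92, 95, 90],
})

# Boolean mask for the held-out years.
# Series.between is inclusive on both ends by default.
is_test_year = data["year"].between(1997, 2015)

# Rows from 1997 through 2015 form the test set.
data_test = data[is_test_year]

# All remaining rows can be used for training.
data_train = data[~is_test_year]

print(data_test["year"].tolist())
print(data_train["year"].tolist())
```

As far as I can tell from the dfply README, mask also accepts several conditions and keeps rows where all of them hold, so something like mask(X.year >= 1997, X.year <= 2015) may be the dfply equivalent, but I have not been able to confirm that.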