I was wondering how I would find estimated values based on several different categories. Two of the columns are categorical, another column contains two strings of interest, and the last contains numeric values. I have a CSV file called sports.csv:
import pandas as pd
import numpy as np
#loading the data into data frame
df = pd.read_csv('sports.csv')
I'm trying to find a suggested price for a Gym that has both Baseball and Basketball and an enrollment from 240 to 260, given that it is in Region 4 and of Type 1.
Region  Type  enroll  estimates  price  Gym
2       1     377     0.43       40     Football|Baseball|Hockey|Running|Basketball|Swimming|Cycling|Volleyball|Tennis|Ballet
4       2     100     0.26       37     Baseball|Tennis
4       1     347     0.65       61     Basketball|Baseball|Ballet
4       1     264     0.17       12     Swimming|Ballet|Cycling|Basketball|Volleyball|Hockey|Running|Tennis|Baseball|Football
1       1     286     0.74       78     Swimming|Basketball
0       1     210     0.13       29     Baseball|Tennis|Ballet|Cycling|Basketball|Football|Volleyball|Swimming
0       1     263     0.91       31     Tennis
2       2     271     0.39       54     Tennis|Football|Ballet|Cycling|Running|Swimming|Baseball|Basketball|Volleyball
3       3     247     0.51       33     Baseball|Hockey|Swimming|Cycling
0       1     109     0.12       17     Football|Hockey|Volleyball
I don't know how to piece everything together. I apologize if the syntax is incorrect; I'm just beginning Python. So far I have:
#group 4th region and type 1 together where enrollment is in between 240 and 260
group = df[df['Region'] == 4] df[df['Type'] == 1] df[240>=df['Enrollment'] <=260 ]
#split by pipe chars to find gyms that contain both Baseball and Basketball
df['Gym'] = df['Gym'].str.split('|')
df['Gym'] = df['Gym'].str.contains('Baseball'& 'Basketball')
price = df.loc[df['Gym'], 'Price']
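For reference, the "contains both sports" check on its own might look like the minimal sketch below. It is not a definitive fix. It assumes the Gym values are pipe-separated strings as in the sample table, and the names `gyms`, `required` and `has_both` are made up for illustration. Note that `'Baseball' & 'Basketball'` raises a TypeError, because `&` is not defined for two plain strings, so the two sports have to be tested separately or as a set.

```python
import pandas as pd

# Three Gym strings copied from the sample table above.
gyms = pd.Series([
    "Baseball|Tennis",
    "Basketball|Baseball|Ballet",
    "Swimming|Basketball",
])

# Split each string on the pipe so every row holds a list of sport names.
sports_lists = gyms.str.split("|")

# True only where the row's list contains every required sport.
required = {"Baseball", "Basketball"}
has_both = sports_lists.apply(lambda sports: required.issubset(sports))

print(has_both.tolist())  # [False, True, False]
```

Splitting first means each sport name is compared exactly rather than as a substring. The resulting boolean Series can then be used as a row mask.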
Should I do a groupby instead? If so, how would I include the conditions Type == 1, Region == 4, and enrollment from 240 to 260?
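To make the goal concrete, here is a rough sketch of the whole filter written with boolean masks instead of a groupby. Several things in it are assumptions: the column names are taken from the table header (`enroll`, `price`) rather than from the attempt above (`Enrollment`, `Price`), the helper `suggested_price` is hypothetical, and using the mean of the matching prices as the "suggested price" is only one possible reading of the question.

```python
import io
import pandas as pd

# A few rows copied from the sample table; the real data would come from
# pd.read_csv('sports.csv'). Column names follow the table header, which is
# an assumption about the actual CSV.
sample = """Region Type enroll estimates price Gym
2 1 377 0.43 40 Football|Baseball|Hockey|Running|Basketball|Swimming|Cycling|Volleyball|Tennis|Ballet
4 2 100 0.26 37 Baseball|Tennis
4 1 347 0.65 61 Basketball|Baseball|Ballet
4 1 264 0.17 12 Swimming|Ballet|Cycling|Basketball|Volleyball|Hockey|Running|Tennis|Baseball|Football
3 3 247 0.51 33 Baseball|Hockey|Swimming|Cycling
"""
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")


def suggested_price(df, region, gym_type, low, high):
    """Mean price of rows matching region, type, enrollment range and both sports.

    Hypothetical helper; treating the mean as the suggested price is an assumption.
    """
    mask = (
        (df["Region"] == region)
        & (df["Type"] == gym_type)
        & df["enroll"].between(low, high)  # inclusive on both ends
        # Plain substring tests on the unsplit Gym strings.
        & df["Gym"].str.contains("Baseball", regex=False)
        & df["Gym"].str.contains("Basketball", regex=False)
    )
    return df.loc[mask, "price"].mean()


# No sample row is in Region 4, Type 1 with enrollment 240-260, so this is NaN.
print(suggested_price(df, 4, 1, 240, 260))
# Widening the range to 240-270 picks up the row with enrollment 264.
print(suggested_price(df, 4, 1, 240, 270))  # 12.0
```

Each comparison is wrapped in parentheses and joined with `&`, which combines the conditions row by row. A groupby is better suited to summarising every Region/Type combination at once; for a single combination, a mask like this is usually enough.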