Let's say I have two very large CSV files:
- The first file represents restaurants and has six attributes: restaurant_id, name, star_rating, city, zone, closed
- The second file represents the categories of the restaurants and has two attributes: restaurant_id and category
What I want to do is add a column called zone_categories_intersection to my features that gives, for each restaurant, the number of restaurants in the same area (zone) that share at least one category with the restaurant in question.
Since this is the first time I have used the pandas library, I am not yet fluent at manipulating tables. I did the following to compute the number of restaurants in the zone of each restaurant and add it to my features:
restaurants['nb_restaurants_zone'] = restaurants.groupby('zone')['zone'].transform('size')
restaurants.head()
features = restaurants[['restaurant_id', 'moyenne_etoiles', 'ville', 'zone', 'ferme', 'nb_restaurants_zone']].copy()
features.head()
#edit
merged = restaurants.merge(categories, on='restaurant_id')
merged.head()
I thought about loading the category.csv file, merging it with the restaurants on restaurant_id so that each category is mapped to its restaurant, and then finding a way to apply the second condition (sharing at least one category with the restaurant in question), but I don't really know how to do any of those things.
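To make the goal concrete, here is a minimal sketch of the merge-based idea on toy data. It is only one possible approach, not a tested solution for the real files. The assumptions are labeled: the column names follow the attribute list above, the helper name zone_categories_intersection is hypothetical, and the restaurant itself is assumed not to count toward its own total. The sketch merges restaurants with categories, self-joins the result on zone and category to get pairs of restaurants sharing a category in the same zone, drops duplicate pairs (two restaurants sharing several categories count once), and counts the partners per restaurant.

```python
import pandas as pd

# Toy data (hypothetical); column names follow the attribute list in the question.
restaurants = pd.DataFrame({
    "restaurant_id": [1, 2, 3, 4, 5],
    "zone": ["A", "A", "A", "B", "B"],
})
categories = pd.DataFrame({
    "restaurant_id": [1, 1, 2, 2, 3, 4, 5],
    "category": ["pizza", "italian", "pizza", "italian", "sushi", "pizza", "burger"],
})


def zone_categories_intersection(restaurants, categories):
    """Hypothetical helper: for each restaurant, count the OTHER restaurants
    in the same zone that share at least one category with it."""
    # One row per (restaurant, category), carrying the restaurant's zone.
    merged = restaurants[["restaurant_id", "zone"]].merge(categories, on="restaurant_id")

    # Self-join: pair up restaurants with the same zone AND the same category.
    pairs = merged.merge(merged, on=["zone", "category"], suffixes=("", "_other"))

    # Assumption: a restaurant does not count itself.
    pairs = pairs[pairs["restaurant_id"] != pairs["restaurant_id_other"]]

    # Two restaurants sharing several categories must be counted only once.
    pairs = pairs.drop_duplicates(["restaurant_id", "restaurant_id_other"])

    counts = pairs.groupby("restaurant_id").size()

    # Restaurants with no matching partner get 0.
    return restaurants["restaurant_id"].map(counts).fillna(0).astype(int)


features = restaurants.copy()
features["zone_categories_intersection"] = zone_categories_intersection(restaurants, categories)
print(features)
```

On the toy data, restaurants 1 and 2 share pizza and italian in zone A and each get a count of 1, while restaurant 4 gets 0 because the only other pizza places are in a different zone. One caveat for very large files: the self-join produces one row per pair of restaurants within each (zone, category) group, so it can grow quadratically when a category is very common in a zone.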
Thank you