
We preprocessed the Yelp dataset and added a category and subcategory for each restaurant. Our data now contains the columns business_id, name, review_count, stars (rating received), nearby_school, category, subcategory, is_vegetarian, latitude, longitude.

The column descriptions are at this link: https://www.yelp.com/academic_dataset

Example row:

__EmsZiRXiUmljbfpOqZig,Awful Arthur's Seafood Co,11,2.5,Virginia Tech,Restaurant,Seafood,no,37.2283389,-80.4142281

We want to understand which type of cuisine (e.g. Seafood, Chinese, American, Indian) is most popular near each school. We are new to data analysis. Can someone suggest how to go about this?

Brian Tompsett - 汤莱恩
PSH
The data had multiple categories per business, so I split them into category and subcategory, as in the example above. I plan to cluster the data around each school to remove outliers, then multiply the review count by the rating to get a score for every row. Next I would separate out the subset near each school and average the score for every (category, subcategory) pair within each subset. The pair with the greatest average would be the most popular type of restaurant. Would this be the correct way to proceed? – PSH Nov 25 '13 at 19:03
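The scoring steps in this comment can be sketched roughly as follows. This is only an illustration under assumptions: the column names are taken from the question, every row after the first is made-up sample data, the helper names `popularity_by_school` and `most_popular` are hypothetical, and the outlier-removal (clustering) step is left out.

```python
import csv
import io
from collections import defaultdict

# Column order as listed in the question.
COLUMNS = ["business_id", "name", "review_count", "stars", "nearby_school",
           "category", "subcategory", "is_vegetarian", "latitude", "longitude"]

# First row is the example from the question; the other rows are
# hypothetical and exist only to make the sketch runnable.
SAMPLE = """\
__EmsZiRXiUmljbfpOqZig,Awful Arthur's Seafood Co,11,2.5,Virginia Tech,Restaurant,Seafood,no,37.2283389,-80.4142281
id_a,Example Wok,40,4.0,Virginia Tech,Restaurant,Chinese,no,37.23,-80.41
id_b,Example Noodles,10,3.0,Virginia Tech,Restaurant,Chinese,no,37.23,-80.42
"""


def popularity_by_school(rows):
    """Average review_count * stars per (school, category, subcategory)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, row count]
    for row in rows:
        key = (row["nearby_school"], row["category"], row["subcategory"])
        totals[key][0] += int(row["review_count"]) * float(row["stars"])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def most_popular(averages):
    """For each school, keep the (category, subcategory) pair with the top average."""
    best = {}
    for (school, category, subcategory), avg in averages.items():
        if school not in best or avg > best[school][1]:
            best[school] = ((category, subcategory), avg)
    return best


rows = list(csv.DictReader(io.StringIO(SAMPLE), fieldnames=COLUMNS))
averages = popularity_by_school(rows)
winners = most_popular(averages)
print(winners)
```

One thing to keep in mind with this score: because it multiplies by review count, a single heavily reviewed restaurant can dominate the average for its cuisine, so it may be worth comparing against a plain mean of the ratings.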

1 Answer


As a very simple analysis, you could index all businesses by their nearby school, and then, for each school, rank the businesses by cuisine and stars received.
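A minimal sketch of that idea, using only the standard library. The `(school, cuisine, stars)` tuples are hypothetical sample data, `rank_cuisines` is a made-up helper name, and ranking by the mean star rating is just one possible reading of "rank by cuisine and stars":

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (nearby_school, subcategory, stars) tuples for illustration.
BUSINESSES = [
    ("Virginia Tech", "Seafood", 2.5),
    ("Virginia Tech", "Chinese", 4.0),
    ("Virginia Tech", "Chinese", 3.0),
    ("Example University", "Indian", 4.5),
]


def rank_cuisines(businesses):
    """Index ratings by school, then rank each school's cuisines by mean stars."""
    index = defaultdict(lambda: defaultdict(list))  # school -> cuisine -> [stars]
    for school, cuisine, stars in businesses:
        index[school][cuisine].append(stars)

    ranking = {}
    for school, cuisines in index.items():
        ranking[school] = sorted(
            ((cuisine, mean(ratings)) for cuisine, ratings in cuisines.items()),
            key=lambda pair: pair[1],
            reverse=True,  # highest mean rating first
        )
    return ranking


ranking = rank_cuisines(BUSINESSES)
for school, cuisines in ranking.items():
    print(school, cuisines)
```

Once the per-school rankings exist, comparing them side by side is a straightforward way to look for cuisines that rank highly at several schools.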

You may well find patterns that are common across universities.

viper