I am using the IMDB database to find the actor or actress who had the highest rating and appeared in the most movies in a given year. My plan is to join the actors dataset with the ratings dataset, filter by year, and then sort the result by rating and movie count.
joinedActorRating = JOIN ratings by movie, actors BY movie;
actorRating = FOREACH joinedActorRating GENERATE *;
actorsYear = FILTER actorRating BY(year MATCHES '2000');
groupedYear = GROUP actorsYear BY (year,rating,firstName,lastName);
aggregatedYear = FOREACH groupedYear GENERATE group, COUNT (actorsYear) AS movieCount;
unaggregatedYear = FOREACH aggregatedYear GENERATE FLATTEN(group) AS (year,rating,firstName,lastName), movieCount;
sortRating = ORDER unaggregatedYear BY rating ASC, movieCount ASC;
dump sortRating;
The compiler reports an "Invalid field projection" error on the second line, but I am not sure how to access the year field after joining the two datasets. Does anyone know how to fix this?
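For reference, after a JOIN Pig keeps the fields of both inputs and prefixes their names with the source relation alias and `::`; a name that exists in both inputs (such as `movie` here) must be written with that prefix. The following is only a minimal sketch of how the fields might be projected under that rule. The schemas in the comments are assumptions, since the LOAD statements are not shown, and `year` may in fact live in the other relation.

```pig
-- Assumed (hypothetical) schemas; the real LOAD statements are not shown:
--   ratings: (movie, year, rating)
--   actors:  (movie, firstName, lastName)
joinedActorRating = JOIN ratings BY movie, actors BY movie;

-- Print the joined schema to see the exact field names Pig exposes.
DESCRIBE joinedActorRating;

-- Project each field with its relation prefix and give it a short name.
actorRating = FOREACH joinedActorRating GENERATE
    ratings::year      AS year,
    ratings::rating    AS rating,
    actors::firstName  AS firstName,
    actors::lastName   AS lastName;
```

Running DESCRIBE first shows which relation each field actually comes from, so the prefixes above can be adjusted to match the real schemas.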