0

I am using the IMDB database to find the actor/actress with the highest rating and was in the most movies in a given year. I am trying to join the actors dataset with their ratings. Then filter the year and sort the data based on highest rating and movie count.

joinedActorRating = JOIN ratings by movie, actors BY movie;
actorRating = FOREACH joinedActorRating GENERATE *;
actorsYear = FILTER actorRating BY(year MATCHES '2000');
groupedYear = GROUP actorsYear BY (year,rating,firstName,lastName);
aggregatedYear = FOREACH groupedYear GENERATE group, COUNT (actorsYear) AS movieCount;
unaggregatedYear = FOREACH aggregatedYear GENERATE FLATTEN(group) AS (year,rating,firstName,lastName);
sortRating = ORDER unaggregatedYear BY rating ASC, count ASC;
dump sortRating; 

The compiler says that the second line is an "Invalid field projection" but I am not sure how to access the year field after joining the two datasets. Does anyone know how to fix this?

ekad
  • 14,436
  • 26
  • 44
  • 46
K8E
  • 13
  • 1
  • 3

1 Answers1

0

After your join, you need to project the fields you want through to your current relation.

joinedActorRating = JOIN ratings by movie, actors BY movie;
actorRating = FOREACH joinedActorRating GENERATE ratings::movie as movie
    , ratings::rank as rank, ratings::year as year, actors::firstName as firstName
    , actors::lastName as lastName;

I'm not sure which columns are in which table (other than movie is in both) because you didn't include the two tables, so I just guessed. You can modify the projections as needed.

o-90
  • 17,045
  • 10
  • 39
  • 63