I am trying to see if a movie is the same between two pages, and to do so I would like to compare the Actors as one of the criteria. However, actors are often listed differently on different pages. For example:

Previously, I was doing a very rough match on:

if actors_from_site_1[0] == actors_from_site_2[0]:

But, as you can see from the above case, this isn't a good technique. What would be a better technique to see if the actors from one film match the others?

ArtOfWarfare
David542

3 Answers

2

You could check the length of a set intersection of the two sets of actors.

if len(set(actors_from_site_1).intersection(set(actors_from_site_2))):

or you could do something like:

if any(actor in actors_from_site_1 for actor in actors_from_site_2):
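
For illustration, here is a minimal sketch of both checks run on two made-up lists (the names and list contents are hypothetical, not taken from the question). Note that both approaches rely on exact string equality, so they only work if the two sites spell each name identically.

```python
# Hypothetical sample data; the real lists would come from the two pages.
actors_from_site_1 = ["Jane Doe", "John Smith", "Alex Roe"]
actors_from_site_2 = ["John Smith", "Pat Lee", "Jane Doe"]

# Set intersection: the names that appear in both lists, regardless of order.
shared = set(actors_from_site_1).intersection(actors_from_site_2)

# any(): stops at the first shared name instead of building the full intersection.
has_shared = any(actor in actors_from_site_1 for actor in actors_from_site_2)

print(sorted(shared))
print(has_shared)
```

The intersection is more useful if you also want to know how many actors overlap; `any` is enough if a single shared name counts as a match.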
Broseph
1

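If all the lists are comma-separated actor names, split them on the commas, strip surrounding whitespace, lowercase the names, and get the intersection:

# strip() removes the space that usually follows each comma
actors_from_site_1 = {name.strip() for name in actors_from_site_1.lower().split(',')}
actors_from_site_2 = {name.strip() for name in actors_from_site_2.lower().split(',')}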

common_actors = actors_from_site_1 & actors_from_site_2
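
One possible way to turn the intersection into a yes/no decision is to measure how much of the shorter cast list is shared. The sketch below is only an illustration: the helper names `normalize` and `actor_overlap`, the sample strings, and the 0.5 threshold are all assumptions, not part of the original answer.

```python
def normalize(names):
    """Split a comma-separated string into a set of lowercase, stripped names."""
    return {name.strip().lower() for name in names.split(",") if name.strip()}


def actor_overlap(names_1, names_2):
    """Return the fraction of the shorter cast list that also appears in the other."""
    set_1, set_2 = normalize(names_1), normalize(names_2)
    if not set_1 or not set_2:
        return 0.0  # nothing to compare
    return len(set_1 & set_2) / min(len(set_1), len(set_2))


# Hypothetical input strings with inconsistent case and spacing.
site_1 = "Jane Doe, John Smith, Alex Roe"
site_2 = "john smith,Jane Doe,Sam Poe,Pat Lee"

overlap = actor_overlap(site_1, site_2)
same_cast = overlap >= 0.5  # threshold is an arbitrary assumption; tune it on real data
```

Dividing by the shorter list keeps the score high when one site simply lists more of the cast than the other.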
Lgiro
  • @MattDMo: If the intersection contains a large enough set of actors, you can consider them the same... or you can use that information combined with other information to decide if the movies are the same or not... – ArtOfWarfare Apr 01 '15 at 01:52
  • @ArtOfWarfare correct. I wrote my comment when the answer was [not really anything](http://stackoverflow.com/revisions/29381487/1). – MattDMo Apr 01 '15 at 15:43
  • @MattDMo: Ah, I forgot that answers posted so soon after a question is asked tend to go through substantial revisions shortly after getting posted. – ArtOfWarfare Apr 01 '15 at 17:54
1

Try:

similaractors = []
for actor in actors_from_site_1:
    if actor in actors_from_site_2:
        similaractors.append(actor)

After the loop, similaractors is a list of every actor that appears in both lists. Call len(similaractors) to get the number of shared actors, or print(similaractors) and work with it like any other list.
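
The same loop can also be written as a list comprehension. This is a sketch with made-up sample lists; the `lookup` set is an addition not in the original answer, used so that each membership test does not rescan the second list.

```python
# Hypothetical sample data standing in for the scraped lists.
actors_from_site_1 = ["Jane Doe", "John Smith", "Alex Roe"]
actors_from_site_2 = ["John Smith", "Pat Lee", "Jane Doe"]

# Converting the second list to a set makes each "in" check fast.
lookup = set(actors_from_site_2)

# Keeps the order of actors_from_site_1, unlike a plain set intersection.
similaractors = [actor for actor in actors_from_site_1 if actor in lookup]

print(similaractors)
print(len(similaractors))
```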

Luke Taylor