
I have a lot of movie data from IMDB, and I'm in the middle of cleaning it up so that 1 row = 1 movie, as the database often has multiple records for a single film.

I've restructured the data so that what was a single 'Country' variable, with multiple cases for a single film, is now a set of 29 country columns. A single film may have up to 29 countries affiliated with it (most have just 1 or 2).

I plan to run some simple descriptive statistics and expected frequencies, and perhaps look for correlations with other variables such as genre.

Is it possible to have SPSS treat all 29 variables as a single variable? It doesn't matter which of the country variables a country is present in, just that it is present in one of them. For example, I might want to find all Indian films: SPSS would check, for each row, whether 'India' is in any one of the country variables and return the row if it is present in any of them.

Is this possible, or do I need to manually give SPSS a list of OR conditions whenever I run a query?

For descriptive stats this is exactly what multiple response sets are for. For querying rows later on, you can use the `ANY` function. – Andy W May 12 '15 at 11:28



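There are two types of multiple response sets: multiple dichotomy, in which each variable is a yes/no flag for one category (for example, one variable per country), and multiple category, in which each variable holds one value from an arbitrary list of categories (for example, up to 29 country codes per film). See the MRSETS command for details.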
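As a rough sketch of the two layouts: the multiple category set below assumes numeric country codes stored in f1 TO f29 (the names used later in this answer), and the dichotomy set assumes hypothetical 0/1 flag variables named usa TO india that sit next to each other in the file. The set names and labels are illustrative only.

```spss
* Multiple category set: each of f1 TO f29 holds a country code.
MRSETS
  /MCGROUP NAME=$countries LABEL='Production countries'
    VARIABLES=f1 TO f29.

* Multiple dichotomy set: one 0/1 flag per country, counting the value 1.
* The variable names usa TO india are hypothetical.
MRSETS
  /MDGROUP NAME=$countries_md LABEL='Production countries'
    CATEGORYLABELS=VARLABELS VARIABLES=usa TO india VALUE=1.
```

You only need the definition that matches how your data are laid out. Multiple response set names must begin with a dollar sign.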

Once defined, these sets can be used in CTABLES for your tabular statistics (counts, percentages and so on), and in charts built with the Chart Builder or the GGRAPH command.
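For example, a minimal CTABLES sketch, assuming a multiple response set named $countries has already been defined with MRSETS and that genre is a hypothetical categorical variable in the file:

```spss
* Films per country (counts and column percentages) within each genre.
* genre is a hypothetical variable name; [C] forces it to be treated as categorical.
CTABLES
  /TABLE $countries [COUNT, COLPCT.COUNT] BY genre [C].
```

Because a film can list several countries, the category counts for the set can add up to more than the number of films.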

Don't confuse the sets created by MRSETS with the groups defined in the older MULTIPLE RESPONSE procedure, which is still available. MRSETS definitions persist with the data (they are saved in the .sav file) and are used only by CTABLES and GGRAPH.
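To illustrate the difference, a sketch assuming numeric country codes from 1 to 250 in f1 TO f29; the code range, the group name ctry and the file name movies.sav are hypothetical:

```spss
* Older procedure: the group exists only for this one command.
MULTIPLE RESPONSE
  GROUPS=ctry 'Production countries' (f1 TO f29 (1,250))
  /FREQUENCIES=ctry.

* MRSETS: the set is stored in the data dictionary and saved with the file.
MRSETS
  /MCGROUP NAME=$countries LABEL='Production countries'
    VARIABLES=f1 TO f29.
MRSETS /DISPLAY NAME=ALL.
SAVE OUTFILE='movies.sav'.
```

The MULTIPLE RESPONSE group has to be respecified every time you run that procedure, whereas $countries will still be defined the next time movies.sav is opened.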

With the ANY function, as Andy said above, you work with the individual variables rather than the set, but you can use TO to abbreviate the variable list. So, for example, you could write

COMPUTE FILM7 = ANY(7, f1 TO f29).

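if you have multiple category (MC) variables, that is, each of f1 to f29 holds a country code and 7 is the code for the country you want. With the multiple dichotomy (MD) structure, each country has its own yes/no variable, so you would simply check that one variable, say f7 in this example.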
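Applied to the India example in the question, a sketch assuming the countries are stored as text in hypothetical string variables country1 TO country29 that sit next to each other in the file, plus a hypothetical title variable. String comparisons in SPSS are case-sensitive, so 'India' must match the spelling in the data exactly.

```spss
* Flag every film that lists India in any of the 29 country variables.
COMPUTE india = ANY('India', country1 TO country29).
FREQUENCIES VARIABLES=india.

* List only the Indian films; TEMPORARY limits SELECT IF to the next procedure.
TEMPORARY.
SELECT IF india = 1.
LIST VARIABLES=title.
```

Without TEMPORARY, SELECT IF permanently drops the non-matching cases from the active dataset; FILTER BY india is a non-destructive alternative that hides them until FILTER OFF.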

eli-k
JKP