
HoMER 2021: Miskell’s International Orientation Index

Introduction

The Cinema Context in RDF project was presented during the HoMER Network Annual Conference (24-29 May 2021). In a pre-recorded video (see below), Thunnis van Oort and Leon van Wissen explain how this project came about and highlight some use cases of working with the Linked Data of Cinema Context, also beyond the dataset itself (e.g. by bridging the dataset to Wikidata).

More information on the session: https://homernetwork.org/discussion-session-20/

International orientation index

One of these case studies replicates economic film historian Peter Miskell’s analysis of what he calls the ‘international orientation index’. He proposes this index (Miskell & Li, 2014) in order to investigate the relative success of Hollywood productions abroad in the post-war reconstruction period. Miskell states that American productions with a relatively high proportion of non-American creative talent and non-American content (based on the origin of story characters and the narrative location of a film) fared better at non-American box offices.

Miskell analyses a total of 665 films for which he could find sufficient (financial) data. He constructs an ‘international orientation score’ on the basis of:

  • nationality of leading actors, directors, screenwriters, and leading characters
  • film’s setting
  • national provenance of the source text

Replicating this with Cinema Context RDF data

We can check whether Miskell’s observation holds true for the Dutch film market by analysing film and programming data on Hollywood productions in Cinema Context. What is missing in our dataset is information on the (total) revenue of a film, though this value can be approximated by its number of screenings, under the assumption that a film with more screenings generates a higher revenue.

To approximate the variables Miskell used to arrive at a score in his index, we can use the information that is available for films in Wikidata. Instead of assigning a score of 0, 1 or 2 to a criterion, we give each criterion a relative score that indicates its extent of ‘internationalisation’ (or ‘inverse Americanness’). Every criterion gets a score from 0.0 to 1.0, calculated by counting the American persons or locations involved, dividing that count by the total for the criterion, and subtracting the result from 1. A score of 0.0 thus means a criterion is entirely American, and 1.0 that it is entirely non-American. In total, we gathered information on:

| Criterion | Wikidata property path |
|---|---|
| Director nationality | wdt:P57/wdt:P27 |
| Screenwriter nationality | wdt:P58/wdt:P27 |
| Cast nationality | wdt:P161/wdt:P27 |
| Narrative location | wdt:P840/(wdt:P17\|wdt:P131) |
| Shooting location | wdt:P915/(wdt:P17\|wdt:P131) |
| Author (source) nationality | ((wdt:P144/wdt:P50)\|wdt:P1877)/wdt:P27 |
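
The per-criterion calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `non_american_ratio` and its arguments are assumptions and are not taken from the project's notebook. It returns `None` when there are no values for a criterion, so that missing data stays distinguishable from a score of 0.0.

```python
from typing import Optional


def non_american_ratio(american: int, total: int) -> Optional[float]:
    """Return 1 - american/total, or None when no values are available."""
    if total == 0:
        return None  # no data for this criterion
    if not 0 <= american <= total:
        raise ValueError("american must be between 0 and total")
    return 1 - american / total


# One director with two countries of citizenship, one of them the United States
print(non_american_ratio(1, 2))  # 0.5
```

Returning `None` rather than 0.0 for an empty criterion is a design choice of this sketch; it keeps a film without data from being counted as fully American.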

Query

Info

In theory it is possible to do this in one single federated query, though the computing power and execution time this requires far exceed what the endpoints of Cinema Context and Wikidata offer. Fetching the necessary information is therefore done in three steps. A Jupyter notebook file with the steps can be found here: https://gitlab.com/uvacreate/cinema-context/cinema-context-rdf/-/snippets/2145834

Steps in getting the data

  1. We first get all film information from Cinema Context and ask for:

    • CC id
    • Title
    • Country of origin
    • IMDB id
    • Number of screenings
    • The earliest and latest screening date
    SPARQL Query
    # endpoint: https://data.create.humanities.uva.nl/sparql
    
    PREFIX sem: <http://semanticweb.cs.vu.nl/2009/11/sem/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX schema: <http://schema.org/>
    
    SELECT ?cc ?title ?countryOfOrigin ?imdb (COUNT(?screening) AS ?screenings) (MIN(?date) AS ?first_screening) (MAX(?date) AS ?last_screening) WHERE {
    GRAPH <https://data.create.humanities.uva.nl/id/cinemacontext/> {
        ?cc a schema:Movie ;
            schema:name ?title ;
            schema:sameAs ?imdb ;
            schema:countryOfOrigin/schema:name ?countryOfOrigin .
    
        FILTER(LANG(?countryOfOrigin) = 'en')
    
        ?screening schema:workPresented ?cc .
    
        ?program schema:subEvent ?screening ;
                sem:hasEarliestBeginTimeStamp ?date .
    }
    } GROUP BY ?cc ?countryOfOrigin ?title ?imdb
    
  2. Then, we use the Cinema Context Wikidata linking property (wdt:P8296) to join Cinema Context with Wikidata. An alternative here would be to use the IMDB id as a shared identifier.

    SPARQL Query
    # endpoint https://query.wikidata.org/sparql
    
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX cc: <http://www.cinemacontext.nl/id/>
    
    SELECT ?wd ?cc WHERE {
    
        ?wd wdt:P8296 ?ccid . # cc
    
        BIND(URI(CONCAT("http://www.cinemacontext.nl/id/", ?ccid)) AS ?cc)
    
    }
    
  3. Finally, for every film for which we have screening information, we get the scores for each criterion from Wikidata. In total, the following is returned:

    • Film title
    • Film origin country name
    • Ratios for:
      • Director nationality
      • Screenwriter nationality
      • Cast nationality
      • Narrative location
      • Shooting location
      • Author (source) nationality
    SPARQL Query

    This example query is given for wd:Q561208 [=Anna Karenina]. Substitute this reference in every subquery with the respective Wikidata identifier.

    # endpoint https://query.wikidata.org/sparql
    
    SELECT * WHERE {
    
    {
        wd:Q561208 rdfs:label ?label .
        FILTER(LANG(?label) = 'en')
    
        OPTIONAL { 
            wd:Q561208 wdt:P495/rdfs:label ?countryLabel .
            FILTER(LANG(?countryLabel) = 'en')
        }
    
    }
    
    # Director nationality
    {
        SELECT (1 - (?americans / ?total) AS ?director_ratio) WHERE {
        {
            SELECT (COUNT(?american_director) AS ?americans) WHERE {
            wd:Q561208 wdt:P57 ?american_director .
            ?american_director wdt:P27 wd:Q30 .
            }
        }
        {
            SELECT (COUNT(?origin_director) AS ?total) WHERE {
            wd:Q561208 wdt:P57/wdt:P27 ?origin_director .      
            }
        } 
      }
    }
    
    # Screenwriter nationality
    {
        SELECT (1 - (?americans / ?total) AS ?screenwriter_ratio) WHERE {
        {
            SELECT (COUNT(?american_screenwriter) AS ?americans) WHERE {
            wd:Q561208 wdt:P58 ?american_screenwriter .
            ?american_screenwriter wdt:P27 wd:Q30 .
            }
        }
        {
            SELECT (COUNT(?origin_screenwriter) AS ?total) WHERE {
            wd:Q561208 wdt:P58/wdt:P27 ?origin_screenwriter .      
            }
        }
      } 
    }
    
    # Cast nationality
    {
        SELECT (1 - (?americans / ?total) AS ?cast_ratio) WHERE {
        {
            SELECT (COUNT(?american_actor) AS ?americans) WHERE {
            wd:Q561208 wdt:P161 ?american_actor .
            ?american_actor wdt:P27 wd:Q30 .
            }
        }
        {
            SELECT (COUNT(?origin_actor) AS ?total) WHERE {
            wd:Q561208 wdt:P161/wdt:P27 ?origin_actor .      
            }
        }
      } 
    }
    
    # Narrative location
    {
        SELECT (1 - (?in_america / ?total) AS ?narrative_ratio) WHERE {
        {
            SELECT (COUNT(DISTINCT ?american_location) AS ?in_america) WHERE {
            wd:Q561208 wdt:P840 ?american_location .
            ?american_location wdt:P17|wdt:P131 wd:Q30 .
            }
        }
        {
            SELECT (COUNT(DISTINCT ?location) AS ?total) WHERE {
            wd:Q561208 wdt:P840 ?location .      
            }
        }
      } 
    }
    
    # Shooting location
    {
        SELECT (1 - (?in_america / ?total) AS ?shooting_ratio) WHERE {
        {
            SELECT (COUNT(DISTINCT ?location) AS ?in_america) WHERE {
            wd:Q561208 wdt:P915 ?location .
            ?location wdt:P17|wdt:P131 wd:Q30 .
            }
        }
        {
            SELECT (COUNT(DISTINCT ?location) AS ?total) WHERE {
            wd:Q561208 wdt:P915 ?location .      
            }
        }
      } 
    }
    
    # Author (source) nationality
    {
        SELECT (1 - (?americans / ?total) AS ?source_ratio) WHERE {
        {
            SELECT (COUNT(DISTINCT ?american_author) AS ?americans) WHERE {
            wd:Q561208 (wdt:P144/wdt:P50)|wdt:P1877 ?american_author .
            ?american_author wdt:P27 wd:Q30 .
            }
        }
        {
            SELECT (COUNT(DISTINCT ?author) AS ?total) WHERE {
            wd:Q561208 (wdt:P144/wdt:P50)|wdt:P1877 ?author .      
            }
        }
      } 
    }
    
    }
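
The three steps can be tied together in a few lines of Python. The sketch below is an assumption about how one might do this, not the code from the notebook linked above: `join_on_cc` and `build_film_query` are hypothetical helper names, the rows are assumed to be already-parsed SPARQL results, and `QUERY_TEMPLATE` is a shortened stand-in for the full step 3 query.

```python
from typing import Dict, List

# Shortened stand-in for the step 3 query (assumption); in practice this
# would hold the full query text given above.
QUERY_TEMPLATE = """
SELECT (COUNT(?director) AS ?total) WHERE {
    wd:Q561208 wdt:P57 ?director .
}
"""
EXAMPLE_QID = "Q561208"  # identifier used in the example query


def join_on_cc(films: List[Dict[str, str]], links: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Attach the Wikidata URI (step 2) to each Cinema Context film (step 1)."""
    wd_by_cc = {row["cc"]: row["wd"] for row in links}
    return [
        {**film, "wd": wd_by_cc[film["cc"]]}
        for film in films
        if film["cc"] in wd_by_cc  # films without a Wikidata link are dropped
    ]


def build_film_query(wikidata_uri: str, template: str = QUERY_TEMPLATE) -> str:
    """Replace the example identifier in every subquery with the film's own."""
    qid = wikidata_uri.rsplit("/", 1)[-1]
    # The trailing space avoids touching longer identifiers with the same prefix.
    return template.replace(f"wd:{EXAMPLE_QID} ", f"wd:{qid} ")


# Rows shaped like the output of steps 1 and 2, using the Casablanca ids
films = [{"cc": "http://www.cinemacontext.nl/id/F020802", "title": "Casablanca"}]
links = [{"cc": "http://www.cinemacontext.nl/id/F020802",
          "wd": "http://www.wikidata.org/entity/Q132689"}]

for film in join_on_cc(films, links):
    print(build_film_query(film["wd"]))
```

Joining on the full Cinema Context URI works because step 2 already rebuilds that URI from the identifier stored in Wikidata.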
    

Individual results (excerpt)

| Category/criterion | Anna Karenina (1935) | Casablanca (1942) | Key Largo (1948) |
|---|---|---|---|
| CC id | F001809 | F020802 | F015663 |
| Wikidata id | Q561208 | Q132689 | Q830773 |
| Screenings | 37 | 28 | No data available |
| Director nationality | 0 | 0.5 [2] | 0 |
| Screenwriter nationality | 0.5 | 0 | 0 |
| Cast nationality | 0.53 | 0.58 | 0.125 |
| Narrative location | 1 | 1 | 0 |
| Shooting location | No data available [1] | 0 | No data available |
| Author (source) nationality | 1 | 0 | 0 |
| Miskell’s International Orientation | 12 | 7 | 1 |
| Total | 3.03 | 2.08 | 0.125 |

Info

A spreadsheet with all results combined (N=8836, of which 5495 USA productions) can be downloaded here: https://gitlab.com/uvacreate/cinema-context/cinema-context-rdf/-/snippets/2127334. Keep in mind that this data is a snapshot of the information that was on Wikidata at the time; it may since have been changed by user additions or deletions.

Calculating the score

The score of ‘internationalness’ is calculated by summing the ratios and dividing the sum by the number of criteria (columns) for which data was available. The summed ratio is thereby expressed relative to its potential maximum. Applied to the three examples above, this gives:

| Category/criterion | Anna Karenina (1935) | Casablanca (1942) | Key Largo (1948) |
|---|---|---|---|
| Miskell’s International Orientation | 12 | 7 | 1 |
| Total sum of ratios | 3.03 | 2.08 | 0.125 |
| Columns with available data | 5 | 6 | 5 |
| Score | 0.61 | 0.35 | 0.03 |
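
As a minimal sketch of this calculation, assuming the six ratios of a film are held in a dictionary with `None` for criteria without data (the function name `internationalness` and the dictionary keys are illustrative, not taken from the notebook):

```python
from typing import Dict, Optional


def internationalness(ratios: Dict[str, Optional[float]]) -> Optional[float]:
    """Average the ratios over the criteria for which data is available."""
    available = [value for value in ratios.values() if value is not None]
    if not available:
        return None  # no criterion has data, so no score can be given
    return sum(available) / len(available)


# Ratios from the excerpt of individual results
anna_karenina = {
    "director": 0.0, "screenwriter": 0.5, "cast": 0.53,
    "narrative": 1.0, "shooting": None, "source": 1.0,
}
casablanca = {
    "director": 0.5, "screenwriter": 0.0, "cast": 0.58,
    "narrative": 1.0, "shooting": 0.0, "source": 0.0,
}

print(round(internationalness(anna_karenina), 2))  # 0.61
print(round(internationalness(casablanca), 2))     # 0.35
```

Because the missing shooting location is skipped rather than counted as 0, Anna Karenina's sum of 3.03 is divided by five criteria instead of six.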

Results

Calculating a simple correlation between the number of screenings and the score, using Pearson’s r, gives the following outcome:

| Number of variables | Rows | Pearson’s r |
|---|---|---|
| 1 | 5441 | 0.125 |
| 2 | 4725 | 0.138 |
| 3 | 3418 | 0.130 |
| 4 | 1340 | 0.137 |
| 5 | 274 | 0.089 |
| 6 | 25 | -0.097 |

The ‘Rows’ column indicates for how many films data was available for at least that number of the six criteria. For only 25 of the 5495 USA-produced films were all six criteria filled. It is therefore better to look at the correlations based on a slice of the data for which we had at least 4 or 5 variables.
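
The slicing and correlation can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's code: Pearson's r is written out by hand to keep the example dependency-free, the helper `correlation_for_slice` is hypothetical, and the three films in `toy_films` are made-up values, not project data.

```python
import math
from typing import List, Optional, Sequence, Tuple

Film = Tuple[int, List[Optional[float]]]  # (screenings, ratios per criterion)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient for two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlation_for_slice(films: List[Film], min_criteria: int) -> Tuple[int, float]:
    """Return (rows, r) for films with at least `min_criteria` ratios available."""
    screenings, scores = [], []
    for count, ratios in films:
        available = [r for r in ratios if r is not None]
        if len(available) >= min_criteria:
            screenings.append(count)
            scores.append(sum(available) / len(available))
    return len(screenings), pearson_r(screenings, scores)


# Made-up films, only to show the call
toy_films: List[Film] = [(10, [0.0, None]), (20, [0.5, 0.5]), (30, [1.0, 1.0])]
rows, r = correlation_for_slice(toy_films, min_criteria=1)
print(rows, r)
```

Raising `min_criteria` shrinks the slice, which is why the number of rows in the table drops as the number of variables goes up.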

Looking at these numbers, we find a slight positive correlation (r between roughly 0.09 and 0.14 for one to five variables) between the number of screenings a film had in Dutch cinemas, based on the available programming data in Cinema Context, and its ‘internationalness’, based on data found in Wikidata. The negative value for six variables rests on only 25 films. This very weak correlation suggests that films with a higher international orientation tended to do somewhat better in the Dutch film market than more American-oriented films.

Conclusion

This preliminary research is merely meant as a proof of concept and should be complemented with more data on revenue, programmes and, for instance, cinema capacity. It also relies heavily on the availability of data in Wikidata for the properties that we used to match Miskell’s scoring system. Although the scores resemble those found by Miskell et al., we cannot compare them directly, and most likely there is too much distortion or bias in the data to make this a valid comparison. What we can conclude is that there is a weak indication that the international orientation of Hollywood productions is related to their popularity in the Netherlands.

We of course invite anyone to complement, improve, or contest these results. All the code is available, and all the data is open!

References


  1. This means that this information is not entered in Wikidata, or is not known in Cinema Context. We do compensate for this lack of information in working with the score. 

  2. NB: Though there is only one director (M. Curtiz), there are two countries of citizenship involved (Hungary and United States) in this way of querying. Therefore, the score is 0.5. 


Last update: July 6, 2021