Cinema Context RDF documentation
SEMANTICS 2021: Paper on Cinema Context as Linked Open Data
Check out our accompanying paper:
- Leon van Wissen, Thunnis van Oort, Julia Noordegraaf, and Ivan Kisjes: Cinema Context as Linked Open Data: Converting an online Dutch film culture dataset to RDF. Joint Proceedings of the Semantics co-located events: Poster&Demo track and Workshop on Ontology-Driven Conceptual Modelling of Digital Twins co-located with Semantics 2021, Amsterdam, The Netherlands, September 6-9, 2021, CEUR-WS.org, online http://ceur-ws.org/Vol-2941/paper10.pdf
Cinema Context (CC) is an online MySQL database containing places, persons and companies involved in more than 100,000 film screenings since 1895. CC provides insight into the ‘DNA’ of Dutch film and cinema culture and is praised by film historians worldwide. The data model is used in various international projects, such as European Cinema Audiences and the Cinecos/Cinema Belgica project. Technical management of CC rests with the Digital Production Centre (DPC) of the University Library of Amsterdam; the editors are located at the University of Amsterdam, within the CREATE digital humanities programme (editor-in-chief Julia Noordegraaf, editor-in-chief Thunnis van Oort and programmer Ivan Kisjes). A dump of the SQL database is deposited at DANS.
Cinema Context as Linked Open Data
CC as Linked Data offers opportunities for broadening and renewing historical and cultural research that are less feasible with the current SQL format. It allows data from the database to be linked to a range of external data about buildings, persons, heritage objects and locations. One example is recent research that relates demographic data about Amsterdam to the locations of cinemas in order to learn more about historical film audiences. In the Digital Humanities and Cultural Heritage communities there is a need to query CC data in connection with such external data via a SPARQL endpoint.
DANS has financed a ‘Klein dataproject’ (small data project) that has allowed the conversion of the SQL database into Linked Open Data. To this end, Menno den Engelse has created export scripts. These scripts have not only converted the data into high-quality RDF, but can also be used for regular updates. Moreover, they are a starting point for converting to RDF the international databases that were set up according to the CC data model.
The project is a collaboration between the CC editorial staff, the DPC, and Islands of Meaning, a one-man company owned by Menno den Engelse, programmer and data maker specialising in RDF. The selection of appropriate vocabularies and thesauri required close collaboration between data specialists and domain experts. Moreover, the collaboration has functioned as de facto training in working with RDF; at the end of the project, the export scripts can be executed and adapted by CC’s own administrators and editors.
The project was divided into the following phases:
Modelling data: Determine which thesauri and vocabularies to use, balancing a practical, workable model that developers and researchers can work with against precision and complexity. Determine the URI strategy: existing CC permalinks can be used.
Writing export scripts: The export scripts retrieve data from the MySQL database and, using the vocabularies chosen above, write RDF statements to Turtle files.
Quality analysis SPARQL queries: Via a number of queries we test the RDF data for usability and completeness.
Making connections: Linked Data deserves its fifth star when linked to resources elsewhere. This already happens occasionally (film titles linked to the Internet Movie Database), but more entities can be linked to external identifiers. We tackled a number of quick wins. To facilitate linking to Wikidata, a Wikidata Cinema Context ID property has been created.
Demonstration application: To test whether we have indeed created a practical and workable model, we created a small application that shows all Amsterdam cinemas and the films shown there within the Amsterdam Time Machine.
Modifications to the MySQL database and Cinema Context Editor: The Cinema Context Editor was developed in 2018, with funding from the Amsterdam University Fund, to help CC’s editorial staff correct and enter data. Converting the CC database to Linked Data has required some adjustments to the original MySQL database and the Cinema Context Editor.
Publish: The data will be publicly available via a SPARQL endpoint at https://data.create.humanities.uva.nl/, and also deposited at DANS. The export scripts are available in an online repository on GitLab. The entire conversion process is described on this documentation page (https://uvacreate.gitlab.io/cinema-context/cinema-context-rdf/).
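To give an impression of what the export phase involves, the sketch below reads rows from a relational table and writes them out as Turtle. It is a minimal illustration under stated assumptions, not the project's actual export scripts. The `venue` table and its columns, the `https://example.org/cc/` base URI, the sample rows and the choice of schema.org terms are all hypothetical, and an in-memory SQLite database stands in for the MySQL database so that the example is self-contained.

```python
import sqlite3

# Hypothetical base URI for illustration only; the real export reuses CC permalinks.
BASE = "https://example.org/cc/"
PREFIXES = "@prefix schema: <https://schema.org/> .\n"


def escape_literal(value):
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def make_demo_db():
    """Create an in-memory stand-in for the source database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE venue (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO venue (id, name, city) VALUES (?, ?, ?)",
        [(1, 'Cinema "Voorbeeld"', "Amsterdam"), (2, "Test Theater", "Utrecht")],
    )
    conn.commit()
    return conn


def build_turtle(conn):
    """Read all venues and serialise them as Turtle text."""
    blocks = [PREFIXES]
    query = "SELECT id, name, city FROM venue ORDER BY id"
    for venue_id, name, city in conn.execute(query):
        # One subject per row, with the vocabulary terms assumed above.
        blocks.append(
            f"<{BASE}venue/{venue_id}> a schema:MovieTheater ;\n"
            f'    schema:name "{escape_literal(name)}" ;\n'
            f'    schema:address "{escape_literal(city)}" .\n'
        )
    return "\n".join(blocks)


def export_to_file(conn, path):
    """Write the Turtle serialisation to a file, one file per export run."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_turtle(conn))


if __name__ == "__main__":
    print(build_turtle(make_demo_db()))
```

Writing the statements by hand keeps the sketch free of dependencies; a real export would more likely rely on an RDF library for serialisation and escaping, and would connect to MySQL rather than SQLite.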