Maud Ehrmann
Digital Humanities Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)
https://people.epfl.ch/maud.ehrmann
maud.ehrmann@epfl.ch
Title: Media Archives Across Borders – The Impresso Projects
Abstract: The availability of newspaper and radio archives in machine-readable formats has improved preservation and accessibility, and opened up new opportunities for automatic processing and exploration. In the case of newspapers in particular, text and image processing techniques are now being used to enrich collections with semantic annotations, enabling deeper content exploration. Despite these advances, current digital portals still fall short of meeting the needs of historical research. Exploration frameworks remain fragmented, confined to digital archive silos with country-based institutional portals, and digital media silos, where enrichments and exploration capabilities are typically limited to a single language and media type. Moreover, these portals often offer only passive exploration of static collections, whereas historical research requires iterative comparison and association of multiple objects of study. As digital tools increasingly shape all phases of historical research, historians are also calling for new methods and tools to critically analyse data, tools, and interfaces.
Impresso - Media Monitoring of the Past is an interdisciplinary research project that aims to pioneer new approaches to exploring media archives for historical research. In its first phase (2017-2020), the project developed a scalable infrastructure for Swiss and Luxembourg newspapers, featuring a powerful search interface that helped popularise the use of text mining-based enrichment for the retrieval and exploration of newspaper articles - now almost a standard practice. The second phase, starting in 2023, broadens the scope and envisions a comprehensive connection between media archives, aiming to enable the joint exploration of historical newspaper and radio content across temporal, linguistic, and national boundaries in order to support data-driven historical research in transmedia and transnational perspectives.
This talk will introduce Impresso 2 and review the evolution from the first to the second project. We will discuss the specific challenges of connecting newspaper and radio archives from legal, processing, historical, and design perspectives, our efforts to adapt text mining and exploration tools to historical material derived from different modalities, and our approach to conducting comparative and data-driven historical research using semantically enriched sources, accessible through both graphical and API-based interfaces.
Bio:
Maud Ehrmann is a research scientist and lecturer at the EPFL Digital Humanities Laboratory in Lausanne. She holds a PhD in Computational Linguistics from the University of Paris 7 Denis Diderot, and her research interests span natural language processing and digital humanities, with a special focus on historical document processing, information extraction and knowledge representation. With backgrounds in both NLP and the humanities, she particularly enjoys working in interdisciplinary contexts and often acts as an intermediary between computer scientists, humanities scholars, engineers and representatives of cultural heritage institutions. In recent years, she has focused on content mining of historical newspapers with, among others, the Impresso - Media Monitoring of the Past projects and the HIPE evaluation campaigns.