Programme – DH & RSE Summer School 2025

Day 1: Mon 30 June

King’s Digital Lab, King’s College London

Leads: Neil Jakeman, Mary Chester-Kadwell

The Software Development Life Cycle

Understanding the Software Development Life Cycle (SDLC) is an essential skill for managing digital humanities (DH) projects effectively, or indeed any digital project. The SDLC provides a structured framework that guides a project from initial conception through development to deployment and maintenance. By adopting a systematic approach, researchers and research software engineers can ensure that project objectives are clearly defined, resources are allocated efficiently, and potential risks are identified and mitigated early. This methodology enhances the quality and sustainability of digital outputs and fosters better collaboration within interdisciplinary teams. Incorporating SDLC practices into DH projects helps produce more robust, maintainable, and impactful research outcomes.

Morning: Talks

Introduction to RSE in digital humanities
Arianna Ciula, Director & Senior Research Software Analyst, King’s Digital Lab

Introduction to the Software Development Life Cycle
Neil Jakeman, Senior Research Software Analyst

In this presentation, we will examine the rationale for establishing a Software Development Life Cycle (SDLC) by looking at past projects and common challenges. We will consider the consequences of inadequate planning and the benefits of a transparent, collaborative approach to developing research software in partnership with academic colleagues. We will also see that while project processes are essential, robust project management practices also depend on institutional support for RSE careers and on recognition of RSE contributions.

Case studies of projects at King’s Digital Lab
Speakers TBC

Afternoon: Workshop

Software Development Life Cycle planning scenario – requirements elicitation
Neil Jakeman, Senior Research Software Analyst
Mary Chester-Kadwell, Senior Research Software Engineer

Building on the principles and processes outlined in the morning session, we will break into small groups, each given a hypothetical digital humanities project whose requirements it must scope. Each group will play the role of an RSE team with limited resources, planning the project approach with the academic project lead. The teams will need to interview the academic lead to establish important factors such as research priorities, target audiences, project timeframes, project resources, established standards, and sustainability needs. This will be a playful simulation intended to provoke discussion and underline the challenges of managing competing priorities.

Day 2: Tue 1 July

Cambridge Digital Humanities, University of Cambridge

Lead: Jonathan Blaney

Digital Humanities at the Command Line

Command line proficiency is a foundational skill for research software engineering. The command line is a versatile interface that allows users to interact directly with their computer, often providing capabilities beyond those of graphical interfaces. It can simplify tasks such as managing large collections of files, processing textual data, or automating repetitive actions. It also provides access to advanced tools for analysis, version control, and data processing, and enables the use of high-performance computing (HPC) resources for tackling large-scale or computationally intensive research questions. Using the command line fosters efficiency, opens up new possibilities for working with data, and equips users with skills that support collaboration and reproducibility in digital humanities projects. We hope to demonstrate that command line skills and confidence are useful in a wide variety of scenarios.

Morning: Talks

Humanities RSE work at Cambridge
Estara Arrant, Leverhulme Early Career Research Fellow, University of Cambridge
Mike Hawkins, Senior Developer, Cambridge University Library

In this session we will learn about a variety of approaches to and applications of RSE in digital humanities at the University of Cambridge. These range from the individual scholar, working alone or as the only technical person on a project and developing a single piece of software, to the software developer in a team, working on multiple projects and systems at once. The discussion will compare and contrast these different experiences of RSE in digital humanities, and you will have the opportunity to ask the speakers about their projects, challenges and career trajectories.

The command line: principles and best practice
Jonathan Blaney, Digital Humanities Research Software Engineer
Ryan Heuser, Assistant Professor in Digital Humanities

This session will introduce the command line to those who haven’t used it before, and will also discuss best practice for keeping command line work as repeatable and well documented as possible. Accordingly, we will focus on commands that embody those principles rather than offering obscure incantations to the uninitiated. We will also discuss the important question of when not to use the command line, a topic to which we will return at the end of the day.

Afternoon: Workshops

Command line group challenge: Building a complex pipeline
Jonathan Blaney, Digital Humanities Research Software Engineer
Ryan Heuser, Assistant Professor in Digital Humanities

In this session, participants will work in teams, using the command line to build up a series of commands step by step to solve a data problem. If the command line is new to you, this will consolidate your understanding; if you are already experienced, it will challenge you not just to solve the problem but to explain your solution to others. To help narrow down the choice, we will provide a list of commands you might consider.

Distant reading with the command line and Python
Ryan Heuser, Assistant Professor in Digital Humanities
Jonathan Blaney, Digital Humanities Research Software Engineer

For the final session we will work in Colab, demonstrating how the command line can be used in tandem with Python, playing to the strengths of each. We will use the command line to obtain texts and prepare them for analysis, and then draw on Python and its libraries for the text analysis itself.
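As a rough illustration of the Python half of such a workflow, the sketch below counts the most frequent words in a text, one of the simplest distant-reading measures. It assumes the text has already been downloaded and cleaned at the command line and uses only the Python standard library; the helper name `top_words` and its crude letters-only tokenisation are hypothetical choices for illustration, not the workshop's actual materials.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 10, stopwords: frozenset = frozenset()) -> list:
    """Return the n most frequent words in `text` as (word, count) pairs.

    Hypothetical helper for illustration: tokenisation is deliberately crude
    (lower-cased runs of letters, so "don't" becomes "don" and "t").
    """
    # [^\W\d_] matches any Unicode letter: a word character that is not a digit or underscore
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(token for token in tokens if token not in stopwords)
    # Words with equal counts are returned in the order they were first encountered
    return counts.most_common(n)


if __name__ == "__main__":
    # In the workshop scenario, the text would instead be read from a file prepared at the command line
    sample = "The cat sat on the mat. The mat was flat."
    print(top_words(sample, n=3, stopwords=frozenset({"the", "on", "was"})))
    # prints [('mat', 2), ('cat', 1), ('sat', 1)]
```

For a real corpus, the regular expression and the hand-written stopword set would usually be replaced by a proper tokeniser and a curated stopword list from a text-analysis library.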

Day 3: Wed 2 July

e-Research, King’s College London

Leads: Neil Jakeman, James Graham

High Performance Computing for Digital Humanities

High-performance computing (HPC) provides researchers and research software engineers with the computational power needed to handle large datasets, perform complex analyses, and process extensive collections of texts or images. In digital humanities, HPC is particularly useful for tasks such as natural language processing, machine learning, or working with digital archives that exceed the capacity of standard personal computers. Learning to use HPC resources is increasingly important for handling computationally intensive methods, collaborating on large-scale projects, and efficiently addressing research questions that require significant processing capabilities.

Morning: Talks

Case studies using HPC in digital humanities research
Neil Jakeman, Senior Research Software Analyst, King’s Digital Lab
Daniel Chávez Heras, Lecturer in Digital Culture and Creative Computing, King’s College London

How do we recognise the requirement for additional computing power in a digital research setting? What barriers to entry typically need to be overcome? How can we, as RSEs, empower our colleagues and communicate the benefits and limitations?

We will hear from a colleague in the Department of Digital Humanities at KCL about their particular use case for HPC and how access to these resources has enabled their research to scale, and we will also look at other recent use cases. The growing prevalence of artificial intelligence, and its widening range of applications, also make HPC resources increasingly important. If we want to use the latest and most powerful machine learning models while also having reliable, secure and replicable computing environments, we will need to become conversant in the use of these resources.

Principles of High Performance Computing for digital humanists
e-Research Training Team

High-performance computing is potentially a powerful tool, but how does it work, and when is it useful for humanities research? This session explores the core principles behind HPC, including how CPUs and GPUs handle tasks differently, when large-scale computing might be needed, how job scheduling and resource management on a cluster differ from working on a laptop, and when it is worth optimising code for better performance. We’ll discuss when digital humanists may benefit from these methods, and when a laptop is enough. Throughout, the focus is on understanding the key ideas that make HPC valuable for digital humanities research.

Afternoon: Workshop

Introduction to High Performance Computing with CREATE HPC
e-Research Training Team

This workshop introduces the practical aspects of using high-performance computers, giving a basic overview of the tools available and how to use them. Building on the principles covered in the morning session, it will show how work with CPUs, GPUs and large datasets can be managed in practice. Participants will learn how to access HPC resources, navigate the command-line environment, and submit and manage jobs on computing clusters. The session will also cover key tasks such as finding and loading the necessary software and working with virtual environments. No prior experience with HPC is required: this workshop aims to make advanced computing accessible to everyone exploring computational methods in the humanities.

Day 4: Thu 3 July

Centre for Data, Culture and Society, University of Edinburgh

Lead: Lucia Michelin

Responsible Digital Research

Responsible digital research adopts ethical and sustainable methodologies that minimise environmental impact whilst maintaining data integrity and promoting long-term sustainability. Following a preliminary overview of how humanities perspectives apply to computational and digital research, this session will explore ‘good enough’ practices in data acquisition, the use of High-Performance Computing (HPC), and overarching strategies across the research project lifecycle. We will consider the ethical and legal implications of web scraping social media content and explore optimisation techniques for HPC, fostering a dialogue about the humanities’ role in advancing more sustainable digital research practices.

Morning: Talks

Introduction to the EFI/CDCS ecosystem and the humanities approach to digital and computational research
Lisa Otty, Acting Director, Centre for Data, Culture & Society

Responsible Data Collection
Lucia Michelin, Digital Skills Training Manager
Jessica Witte, Digital Research Analyst

Estimating carbon emissions from high-performance computing
Andrew Turner, EPCC, University of Edinburgh

In this presentation, I will describe how we have enabled users to estimate carbon emissions from the use of the UK National Supercomputing Service, ARCHER2. I will also cover what strategies are available to reduce carbon emissions from use of high performance computing (spoiler alert: you need to do less computing!).
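In its simplest form, the arithmetic behind such estimates is the energy consumed multiplied by the carbon intensity of the electricity that supplied it. The sketch below illustrates that general calculation under stated assumptions: it covers only operational electricity use (not the embodied carbon of manufacturing the hardware), and the per-node power draw, power usage effectiveness (PUE) and grid carbon intensity are placeholder parameters supplied by the caller. They are not ARCHER2 figures, and this is not necessarily the method EPCC uses. It does make the talk's spoiler concrete: the estimate scales linearly with node-hours, so halving the computing halves the emissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobUsage:
    """Resources consumed by one batch job (hypothetical record for illustration)."""
    nodes: int    # number of compute nodes allocated
    hours: float  # wall-clock runtime in hours


def estimate_emissions_kg(
    job: JobUsage,
    node_power_kw: float,        # assumed average power draw per node, in kW
    pue: float,                  # assumed power usage effectiveness of the data centre (>= 1)
    intensity_g_per_kwh: float,  # assumed grid carbon intensity, in gCO2e per kWh
) -> float:
    """Estimate operational emissions in kg CO2e for a single job."""
    if job.nodes < 0 or job.hours < 0 or node_power_kw < 0 or intensity_g_per_kwh < 0:
        raise ValueError("usage, power and intensity must be non-negative")
    if pue < 1:
        raise ValueError("PUE cannot be less than 1")
    # Energy drawn from the grid: node-hours x power per node, scaled up by PUE for cooling and overheads
    energy_kwh = job.nodes * job.hours * node_power_kw * pue
    # Multiply by grams of CO2e per kWh, then convert grams to kilograms
    return energy_kwh * intensity_g_per_kwh / 1000.0


if __name__ == "__main__":
    # Placeholder numbers only: 4 nodes for 10 hours at 0.5 kW per node, PUE 1.1, 200 gCO2e/kWh
    job = JobUsage(nodes=4, hours=10.0)
    print(f"{estimate_emissions_kg(job, 0.5, 1.1, 200.0):.1f} kg CO2e")
    # prints 4.4 kg CO2e
```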

Minimal computing and digital publishing
Christopher Ohge, Senior Lecturer in Digital Approaches to Literature, School of Advanced Study, University of London

Afternoon: Workshop

Greening the software development life cycle
Lucia Michelin, Digital Skills Training Manager

In this session we will build on the outcomes of Monday’s workshop on Software Development Life Cycle planning and Wednesday’s workshop on High Performance Computing. We will refine the existing scenarios through a sustainability-focused framework, adapting our approaches so that environmental considerations are integrated into software development and deployment. Using Green DiSC and the Digital Sustainability Card Game, both developed by the Digital Humanities Climate Coalition, participants will engage with key sustainability principles, assess real-world scenarios, and explore ways to integrate greener practices into their own projects.

Day 5: Fri 4 July (optional)

Digital Skills in Arts and Humanities (DISKAH)

Morning: Workshop (2 hours)

Building your digital research skills with Programming Historian
Instructor/Trainer: Anisa Hawes, Publishing Manager, Programming Historian

Programming Historian in English, en español, en français, and em português are four Diamond Open Access journals, published in four languages, of article-length lessons on digital techniques and workflows. Our lessons support students and academics in learning effective research methods, and help educators to teach tomorrow’s researchers. This workshop will introduce you to the journals, explaining how our lessons can support your next steps in learning the practical skills you’ll need to work with data. We’ll highlight lessons that could help you develop the skills you’ll need during your research, discuss troubleshooting strategies for overcoming obstacles, and think together about the value of peer-to-peer support and of building a community around you as you work. This training can be attended as a standalone session, but will be followed up later in the year by a webinar that focuses on how to approach writing about technical methods and guides researchers in developing a lesson proposal for Programming Historian.