02 - Implementing Reproducible Environmental Data Science with Open Science: Lessons from the 1st Climate Informatics Reproducibility Challenge

Presented by: Andrew McDonald

Abstract

As data science, machine learning, and artificial intelligence revolutionise the earth and climate sciences, it becomes increasingly important to facilitate reproducibility and replicability across studies in order to advance state-of-the-art techniques, uncover knowledge gaps, and prevent unnecessary duplication of work. Inspired by the Machine Learning Reproducibility Challenge series, we convened the inaugural Climate Informatics Reproducibility Challenge in June 2023, immediately following the 12th Annual Conference on Climate Informatics held in May 2023. The challenge garnered approximately 30 sign-ups from around the world and split participants among 7 teams, each attempting to reproduce a paper published in Environmental Data Science, a journal of Cambridge University Press. Teams were tasked with creating notebook-based repositories that reimplemented the workflow of a published paper, both to verify and validate the findings of the study and to pinpoint potential improvements in data- and code-sharing best practices. Over the course of the challenge, organisers convened a lecture series from experts in open science, held office hours for 1:1 support, facilitated networking events to connect teams with one another over shared research interests, and recruited peer reviewers to share constructive feedback with teams on their submissions. Three teams submitted complete notebook repositories, which are currently in the process of being published in the Environmental Data Science book, an open-source initiative that seeks to promote the principles of findable, accessible, interoperable, and reusable (FAIR) code and data across the earth and climate sciences.