We are thrilled to announce the launch of the Lamont-Doherty Earth Observatory Climate Data Science Lab. A $2.3M grant from the Gordon and Betty Moore Foundation, to principal investigator Ryan Abernathey, will enable us to tackle some of the most difficult scientific problems and data challenges in Climate Data Science. We are extremely grateful to the Moore Foundation for their generous support and for their progressive vision for data-driven science.
Motivation and Goals
Oceanographers and climate scientists have access to a wealth of data–from instruments, satellite observations, and numerical simulations–to help confront the intellectual and societal challenges posed by climate change. But our ability to observe and simulate the climate system has outstripped our ability to actually understand the resulting data. In terms of software, as a field we have focused nearly all our efforts on the computational problem of simulation itself (ocean and climate models), and comparatively little effort on the data analysis side. Currently, we lack software and infrastructure commensurate with the scale of the data and the complexity of our scientific questions. Consequently, many students and postdocs are toiling over low-level data processing challenges, rather than thinking about big-picture scientific questions.
The goal of the Climate Data Science Lab is to leverage our group’s expertise at the intersection of climate science and open source software to develop a new generation of powerful, scalable, and sustainable computational tools for climate data science (a new term we are embracing henceforth). Hopefully, this effort will drive leaps in productivity across the entire field, leading to exciting new discoveries in oceanography and climate science.
The Climate Data Science Lab builds on the work of the Pangeo Project. Pangeo was founded to improve coordination and community within the open source geoscience community. The Pangeo collaboration has led to dramatic improvements in the integration and scalability of netCDF, Xarray, Dask, Zarr and related libraries, on both traditional High Performance Computing systems and cloud platforms. The focus of the Climate Data Science Lab is the development of a flexible yet performant layer of high-level software tools for oceanography and climate-specific operations that interoperate with the rest of the Pangeo tools. We will also work on improving the integration between these tools and machine learning libraries such as TensorFlow and PyTorch.
Some of the scientific problems we plan to explore within the Climate Data Science Lab are:
- Cross-scale energy transfers in high-resolution models and observations
- Air-sea exchange in high-resolution models and observations
- Machine learning for ocean remote sensing and sub-grid parameterization
Education and outreach will also be an important aspect of the lab’s activities. The tools developed within the lab, and within Pangeo more generally, will only have a broad impact if the next generation (and the current generation!) of researchers can easily learn to use them. Lab members will contribute to the development of tutorials and workshops, hold open office hours, and participate in online forums to help diffuse emerging best practices into the broader climate science community.
CDS Lab Structure
The CDS Lab will follow a structure that has proved successful so far for the Pangeo project: bringing together software-literate scientists with science-literate software developers. The scientists will define scientific problems which push the boundaries of the currently available software tools. The engineers will identify and resolve the technological roadblocks, making contributions wherever deemed necessary.
With the Climate Data Science Lab, we are hoping to pioneer a new sort of scientific working environment focused on collaboration, rather than the traditional model of individual researchers working mostly in isolation. We plan to cultivate the following principles:
- Tough scientific challenges are best tackled by a team with diverse backgrounds, skills, and experiences.
- Developing reusable tools (e.g. software) is a valuable contribution to scientific research.
- Open data, open source software, and computational reproducibility are integral to every scientific project.
- When it comes to publications, quality is more important than quantity.
There are several open positions within the CDS Lab:
- Software Engineer, Numerical Algorithms and Scientific Python (apply online)
- Software Engineer, Cloud Computing and Data DevOps (apply online)
- Associate Research Scientist, Physical Oceanography (apply online)
- 2 x Postdocs, Physical Oceanography (email [email protected] for more information on postdoc positions)
More information about the initiative is available here.
— Ryan Abernathey, Associate Professor, Earth & Environmental Sciences, Columbia University