Géraldine Van der Auwera is a talented researcher from Brussels. She lives in the USA, where she worked for many years at the Broad Institute (co-created by MIT and Harvard University). She is the author of one of the very first books devoted to the alliance between the Cloud and genomics, and helped set up the "Terra" platform. She has been lending her expertise to the Global Alliance for Genomics and Health (GA4GH) and the 101 Genomes Foundation since 2020. Ludivine interviewed her during one of her rare visits to Brussels to try and shed some light on Géraldine's unique expertise and what she brings to 101 Genomes.
Ludivine: Hello Géraldine, would you like to introduce yourself?
Géraldine: I'm a microbiologist by training, and switched to human genomics after a post-doctorate in microbial genetics. I worked for ten years at the Broad Institute of MIT and Harvard, a genomics research institute in Boston, USA. There, I was mainly responsible for the technical and scientific support of researchers using certain bioinformatics tools made available to the scientific community by the Broad Institute. I am currently a freelance consultant in bioinformatics and scientific communication. On the side, I am co-director of the Large-Scale Genomics Workstream of the Global Alliance for Genomics and Health (GA4GH), an international organization that develops technical standards and regulatory frameworks to promote the responsible sharing of genomic data.
L.: Can you explain your role in the 101 Genomes Foundation and why you decided to get involved in 2020?
G.: You and Romain contacted me with questions about the Broad Institute's large-scale genomic studies, and about the migration to the cloud of the genomic data analysis and sharing systems that play a key role in implementing such studies. I was both touched by your family history and impressed by your approach to creating the Foundation, which made me want to lend you a hand. I play an advisory role, mainly in the development of the cloud infrastructure that supports the scientific aims of the project.
L.: You wrote "Genomics in the Cloud". It's one of the first books devoted to using the Cloud to preserve and study the genome. Can you tell us a little more about this book?
G.: The book Genomics in the Cloud, published by O'Reilly Media in 2020, offers both a theoretical and practical introduction to the analysis methods used in human genomics, focusing primarily on data processing and variant identification from sequencing data, using cloud infrastructure.
My co-author, Brian O'Connor, and I designed this book based on our shared experience at the intersection of genomics and computer technology. This is a highly interdisciplinary field that brings together both biology and medicine specialists, who typically have very little computer science training in their background, and technologists, programmers, and other computer infrastructure professionals, who find themselves dealing with particularly complex scientific concepts and vocabularies.
Our book therefore helps readers upgrade their technical and scientific knowledge through practical exercises that require very few prerequisites, with the aim of making human genomics more accessible.
L.: You've put a lot of work into developing "Terra". Why do you think 101 Genomes is a good candidate to join "Terra on Azure"?
G.: One of the major difficulties faced by rare disease research associations is the fragmentation of genomic data sources. Many studies are based on relatively small numbers of patients, too small to use the large-scale analysis techniques that are needed to examine complex genetic mechanisms in a statistically robust way.
The solution to this problem is to federate data from multiple studies. The Terra platform has been designed to enable such data federation within an open-source scientific ecosystem that promotes scientific collaboration while protecting data security and ownership.
101 Genomes is an excellent example of a project that can benefit from such a platform to achieve its scientific goals without having to take on the development and operation of a complete infrastructure. Having already set up a data lake on Azure, the Microsoft cloud, the F101G will soon be able to connect its data lake to Terra on Azure, allowing research teams to analyze this data via Terra in a collaborative manner. As other groups migrate their data to this data federation ecosystem, these analyses will gain statistical power and push forward our understanding of the biological mechanisms involved.
L.: Is there anything else you'd like to add?
G.: I think it's important to remember that human genomics ultimately comes back to [...]. This has ethical implications, but also very practical ones: we can only achieve a sufficient understanding of the human genome if we have a sufficient representation of populations around the world. This is why it is essential to work towards substantive international collaboration, as the Global Alliance for Genomics and Health does, for example. It is an effort that requires the participation of stakeholders from all walks of life: researchers, physicians, technologists, forensic scientists, politicians, as well as patient organizations and even the general public.