The Genomic Cloud of 101 Genomes has been operational since June 2022. This bioinformatics biobank (Bio-Biobank) is probably already one of the largest whole-genome sequencing (WGS) databases of Marfan syndrome patients in the world, if not the largest. And it's all set to keep growing! This groundbreaking resource will allow bioinformatics researchers to better understand Marfan syndrome and can serve as a pilot for research into other rare diseases. Colby T. Ford is the architect of our Genomic Cloud. On the occasion of the recent release of his book "Genomics in the Azure Cloud", which was inspired by this experience, he talks about the work he has done with us and answers Ludivine's questions.
Ludivine: Hello Colby, can you introduce yourself?
Colby: My name is Colby T. Ford, Ph.D. and I am a genomics scientist and cloud solution architect. I own Tuple, a Microsoft and Databricks partner consulting firm that specializes in building cloud genomics solutions for life science organizations. Outside of Tuple, I'm an avid researcher in human genomics and infectious diseases where I've contributed to topics spanning oncology, immunology, malaria, SARS-CoV-2, and more. I'm a Microsoft Certified Trainer and the author of "Genomics in the Azure Cloud", published by O'Reilly Media in 2022.
L.: Can you briefly explain your role as the cloud architect for the F101G?
C.: My role as a consultant for F101G centered around working with the founders of F101G to understand the goals of their genomics cloud platform (for Marfan syndrome as a starting point). We began by creating a genomics data lake to house all of the -omics and phenotype data for study participants. I then collaborated with other individuals on the team to build out data pipelines to collect data from sequencing vendors. We also built out some computational services for analyzing and visualizing the data from the data lake. This included bioinformatics pipelines and logic to scalably query variant data and a DICOM viewer application to view imaging data (X-rays and MRIs) from patients. Finally, we worked closely with a security consultant and achieved ISO 27001:2013 compliance on the entire cloud architecture.
L.: What do you think about your collaboration with F101G and about the F101G project?
C.: The F101G project as a whole was an interesting challenge with a very impactful research goal. Being from the United States, I found that the regulations and rules around patient data differ between the US and Europe, which was nice to learn about.
While I had worked on other rare diseases in the past, Marfan syndrome was not one of them. I always enjoy learning a new biological use case, disease, drug target, etc. on different client projects.
Also, the collaboration with F101G has been quite unique in that we've been able to collaborate both scientifically on the study of the disease and technically on the design of the cloud architecture. I love the drive that the F101G team has in transforming research in Marfan syndrome and beyond with an innovative, cloud-first approach.
L.: You recently published a book called "Genomics in the Azure Cloud". Can you tell us more about it?
C.: The book provides a foundational set of considerations for building a cloud-based architecture for genomics workloads. I wrote it because I noticed that there wasn't much content or many examples for enterprise-scale genomics, though there's plenty for finance, retail, and other industries. In the book, we cover data platform services such as data lakes and data warehouses, and then we spend time learning about computational services that can help to automate and scale the processing of bioinformatics data. The book is written for scientists who want to learn how to do their work better in Azure and also for cloud architects who want to learn more about solutions for -omics workloads.
L.: Is there anything you would like to add?
C.: I truly believe that the work we have done with F101G will be revolutionary for Marfan syndrome research. In addition, the cloud computing architecture and resources we have put in place will easily extend to other rare diseases in the future. It will be amazing to see how the Azure cloud helps provide scalable information in disease research over time!