By Mark Dietrich
What does a finalist for Time's Person of the Year, a knighthood and a global race have to do with advanced research computing infrastructure in Canada? In June, Dr Fabiola Gianotti received an honorary doctorate from McGill University for her work on the Higgs boson. A subatomic physicist from Milan, she was also a finalist for Time's Person of the Year in 2012. Also in June, a scientist who predicted the Higgs boson and another who helped to find it were knighted.
The elusive particle was finally detected in 2012 at CERN's particle accelerator near Geneva, Switzerland, by two giant experiments, ATLAS and CMS, which rely on data centres around the world. This remarkable discovery, worthy of documentaries and global attention, would not have been possible without advanced research computing.
Canada's contribution to this effort involved not only dozens of scientists collaborating with researchers like Dr Gianotti, but also digital infrastructure, such as the data centre at TRIUMF, the Vancouver laboratory that is one of the world's leading centres for subatomic physics.
To analyze its data in Canada, the Canadian ATLAS team, which made key contributions to the search for the Higgs boson, uses four data centres operated by Compute Canada. Together, TRIUMF and Compute Canada enabled 40 faculty members and 150 Canadian researchers to take part in this international endeavour.
Today, ATLAS stores more than 120 petabytes (PB) of data worldwide and is expected to store about 500 PB by 2020. (This is a massive amount of data: a single petabyte of music files could be played continuously, without repetition, for 2,000 years.)
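The music comparison can be sanity-checked with a little arithmetic. This is a minimal sketch that assumes decimal units (1 PB = 10^15 bytes) and MP3 files encoded at a constant 128 kbit/s; neither assumption comes from the article, and a different bitrate changes the answer proportionally.

```python
# Rough check of the "one petabyte of music" comparison.
# Assumptions (not from the article): decimal units (1 PB = 10**15 bytes)
# and audio encoded at a constant 128 kbit/s.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average (Julian) year


def playback_years(num_bytes: float, bitrate_bps: float) -> float:
    """Years of continuous playback for num_bytes of audio at bitrate_bps."""
    seconds = num_bytes * 8 / bitrate_bps  # bytes -> bits -> seconds of audio
    return seconds / SECONDS_PER_YEAR


one_petabyte = 10**15
print(f"{playback_years(one_petabyte, 128_000):,.0f} years of music")
```

Under these assumptions one petabyte plays for just under 2,000 years, consistent with the figure quoted above.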
Advances in medical imaging and genetic sequencing are driving growth in bioinformatics and computational biology, and this growth generates "Big Data" on a very large scale. Genomic and proteomic databases are doubling in size every year, and other types of data are doubling every three to four months.
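To see how different these rates are, a doubling time can be converted into a growth factor per year. This minimal sketch assumes each doubling time stays constant over the whole year.

```python
# Convert a doubling time into an annual growth factor: if data doubles
# every `doubling_months` months, after 12 months it has grown by
# 2 ** (12 / doubling_months). Assumes a constant doubling time.


def annual_growth_factor(doubling_months: float) -> float:
    return 2 ** (12 / doubling_months)


for label, months in [
    ("genomic/proteomic databases (12-month doubling)", 12),
    ("other data (4-month doubling)", 4),
    ("other data (3-month doubling)", 3),
]:
    print(f"{label}: x{annual_growth_factor(months):.0f} per year")
```

Doubling every year means 2x growth per year, while doubling every three to four months compounds to 8x to 16x per year.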
The cost of storing this much information keeps rising, and moving such large files across even the fastest networks can take weeks. Yet we must solve storage and data movement, because they are essential to understanding disease, discovering new drugs and realizing the promise of personalized medicine.
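A back-of-the-envelope estimate shows how transfers can stretch into weeks. The sketch assumes decimal units and a hypothetical `efficiency` parameter (the fraction of the nominal link rate actually sustained, 50% by default); both are illustrative assumptions, not measured values.

```python
# Back-of-the-envelope transfer time for a large dataset over a network link.
# Assumptions (illustrative, not from the article): decimal units, and a
# sustained fraction `efficiency` of the nominal link rate, since long
# transfers rarely saturate a link for days on end.


def transfer_days(dataset_bytes: float, link_gbps: float,
                  efficiency: float = 0.5) -> float:
    """Days needed to move dataset_bytes over a link of link_gbps (Gbit/s)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    effective_bps = link_gbps * 1e9 * efficiency
    seconds = dataset_bytes * 8 / effective_bps
    return seconds / 86_400  # seconds per day


one_petabyte = 10**15
for gbps in (10, 100):
    days = transfer_days(one_petabyte, gbps)
    print(f"1 PB over {gbps} Gbit/s at 50% efficiency: {days:.1f} days")
```

With these assumptions, 1 PB takes about 18.5 days (nearly three weeks) on a 10 Gbit/s link and just under two days at 100 Gbit/s.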
Digital infrastructure is now a critical ingredient of the most transformative science, not just a "nice to have" tool. It is key to unlocking the mysteries of the human brain, provides the computing power needed to design next-generation aircraft, and makes it possible to predict natural disasters, saving lives and property.
Today's most spectacular global science efforts depend not only on our excellent scientists but also on our digital infrastructure. These resources allow Canada to participate in such efforts and to attract and retain the best scientific minds.
Excellent scientists and digital infrastructure are both essential to maintaining a leadership position in science. Digital infrastructure is also directly linked to our ability to innovate in mining and energy, aerospace, drug development, medical technologies and clean technologies. Industrial R&D is now powered by computer modeling, simulation, visualization and design.
If broadband Internet access is an accepted productivity factor for our economy, the availability of advanced research computing resources is a comparable factor for science and innovation. The best scientists go where the best advanced research computing resources are. Looking at the advanced research computing capacity available to the average researcher (based on the 2012 TOP500 list and UNESCO researcher statistics), we can see how Canada ranks (see chart).
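The per-researcher metric can be sketched as total TOP500 capacity per country divided by that country's researcher headcount. The article does not describe its exact method; this sketch assumes each TOP500 entry is reduced to a (country, Rmax in TFlop/s) pair, and the function names and sample inputs are hypothetical placeholders, not real TOP500 or UNESCO figures.

```python
from collections import defaultdict

# Sketch of a "compute capacity per researcher" metric.
# Assumptions: TOP500 entries reduced to (country, Rmax in TFlop/s) pairs;
# researcher headcounts supplied separately (e.g. from UNESCO statistics).
# Whether the original analysis used Rmax or Rpeak is not stated.


def capacity_per_researcher(systems, researchers):
    """Return {country: TFlop/s per researcher}.

    systems: iterable of (country, rmax_tflops) pairs.
    researchers: {country: number of researchers}; countries missing here
    (or with zero researchers) are skipped.
    """
    totals = defaultdict(float)
    for country, rmax in systems:
        totals[country] += rmax
    return {
        country: totals[country] / count  # 0.0 if the country has no systems
        for country, count in researchers.items()
        if count > 0
    }


def rank(per_researcher):
    """Sort countries from most to least capacity per researcher."""
    return sorted(per_researcher.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical placeholder inputs, NOT real TOP500 or UNESCO figures.
systems = [("Country A", 500.0), ("Country A", 250.0), ("Country B", 900.0)]
researchers = {"Country A": 1_500, "Country B": 6_000}
for country, value in rank(capacity_per_researcher(systems, researchers)):
    print(f"{country}: {value:.3f} TFlop/s per researcher")
```

Normalizing by headcount is what lets a country with fewer, smaller systems still rank well if its research community is small.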
In these countries, the number of researchers is not growing dramatically; however, their compute capacity has grown by roughly 80% annually since 2010. Because 80% annual growth comes close to doubling capacity every year, this suggests that we need to double our digital infrastructure capability every year to attract and retain the best researchers, develop new products and participate in global transformative science initiatives.
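The step from "80% a year" to "double every year" follows from compound growth: at a steady annual rate r, capacity doubles every ln 2 / ln(1 + r) years. This minimal sketch assumes growth compounds annually at exactly 80%.

```python
import math

# Compound growth at a steady annual rate.
# Assumption: growth compounds once per year at exactly the rate given.


def doubling_time_years(annual_rate: float) -> float:
    """Years for capacity to double when it grows by annual_rate per year."""
    return math.log(2) / math.log(1 + annual_rate)


def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplication of capacity after `years` of compounding."""
    return (1 + annual_rate) ** years


print(f"doubling time at 80%/yr: {doubling_time_years(0.80):.2f} years")
print(f"growth over 4 years at 80%/yr: x{growth_factor(0.80, 4):.1f}")
```

At 80% a year, capacity doubles roughly every 14 months and grows about tenfold over four years, which is why keeping pace means close to doubling every year.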
Compute Canada operates and maintains Canada's advanced research computing infrastructure in partnership with our regional organizations, research-intensive institutions and, for networking, CANARIE. This unique, made-in-Canada model makes these resources accessible on nearly every campus in the country, and it lets us optimize usage across the entire national platform to get the most out of existing funding.
More than 80% of our capacity is allocated through competition and peer review, ensuring that Canada's excellent researchers have the resources they need to succeed while still giving all researchers access to Compute Canada services.
Today, 55% of our demand is served from outside the researchers' home institutions, and 19% from outside their region. Our goal is for researchers not to care which systems they use; ideally, they would not even need to know (but we're not there yet).
Compute Canada is also about sharing expertise. Digital scholarship is the new reality for many disciplines, and researchers in those disciplines need the guidance and support of Compute Canada experts. Often the best way to accelerate research is to create sophisticated online tools that combine the ease of use of the Web, services for finding the right research data, and the power of our largest systems.
Two world-leading examples of such tools have come out of Canadian research: CBRAIN in neuroscience and GenAP in genomics. Both platforms, funded by CANARIE, can amplify impact in their own fields, and in related fields, by orders of magnitude. This is where world-leading scientists come to play.
The storage of research data is our next challenge. The government has signalled its interest in big data and open data, and the Canada Foundation for Innovation's (CFI) cyberinfrastructure initiative shows a preference for "data-intensive" proposals. But we haven't yet figured out how much data needs to be stored, or how much it will cost to keep it for five years, let alone forever.
To better understand the growth of data and the need for enhanced advanced research computing, Compute Canada launched "Sustainable Planning for Advanced Research Computing" (SPARC). This initiative will help researchers and institutions translate their own scientific trajectories over the next five to 10 years into requirements for advanced research computing infrastructure. We will be working with the research disciplines, institutions, the granting councils and key organizations such as CANARIE and Research Data Canada to build a comprehensive forecast of Canada's requirements for advanced research computing infrastructure.
This forecast will play a fundamental role in maintaining Canada's leadership in science and innovation.
Compute Canada is committed to ensuring that Canadian investments in digital infrastructure keep pace with the demands of transformative science. The race is on. Our leading industrial and academic researchers must be able to compete globally, and we need to give them the tools they need to excel.
Mark Dietrich is President and CEO of Compute Canada. Mark leads a national staff and a team of more than 150 experts. Compute Canada and its six regional partners deliver advanced computing services across Canada.