Canada's HPC clusters facing operating cash crunch, urgent equipment renewal

Guest Contributor
September 1, 2010

Two centres face cutbacks or closure

Canadian high performance computing (HPC) is facing an urgent funding crisis as aging equipment and insufficient operating support threaten at least two Ontario-based centres with downsizing or closure. With more than $250 million invested in HPC equipment since 2006, seven HPC clusters spread across Canada are experiencing a cash crunch that could derail current and planned research projects if skilled technicians and high-profile researchers depart for more conducive jurisdictions.

HPC in Canada has only recently been treated as a cohesive whole, with the successful application for Canada Foundation for Innovation (CFI) funding in 2006 under its now-moribund National Platforms Fund. Since that time, the federal and provincial governments, institutions and others have invested in HPC while the community prepares for further expansion and upgrading. But the delay of a planned 2009 CFI competition, and uncertainty over whether the agency will hold a competition that's suitable for HPC, have led to growing pressure for new equipment and operational support at least approaching levels provided in other countries.

"We are grossly underfunded compared to other jurisdictions where they often spend as much on operations and maintenance as they do on infrastructure," says Susan Baldwin, executive director of Compute Canada, a national body that coordinates and promotes HPC. "The technology is also advancing so rapidly that you're putting researchers at a disadvantage if you don't offer them the best equipment to work with."

While it's likely HPC will be eligible for the next major CFI competition, it's not known whether there will be a National Platforms Fund competition. Privately, HPC officials have been told that there will be no set-aside for HPC, meaning that the HPC research community must compete with others and possibly amongst themselves. The CFI and Industry Canada concluded a contribution agreement earlier this year but to date it has not been released (R$, July 6 & 19/10).

The CFI award of $78 million in 2006 provided $60 million for equipment and $18 million for limited operations support. Combined with matching funds and $10 million in salary support for technical analysts from the Natural Sciences and Engineering Research Council, the total investment was $178 million. While welcome, it was insufficient to support the community's complete long-range plan, which called for $76 million annually in public funding, ramping up to $96 million by 2012 (R$, January 24/06 & January 18/07). Only mid-range regional facilities were supported and some in Ontario voluntarily passed on funding to allow for the creation of HPC clusters in other regions.

The latter decision has contributed to a capital and operating crisis at two Ontario-based clusters — Queen's Univ-based High Performance Computing Virtual Laboratory (HPCVL) and the Shared Hierarchical Academic Research Computing NETwork (SHARCNET) at the Univ of Western Ontario.

"In retrospect it was a mistake … The equipment at SHARCNET is five or six years old, which in the HPC world is like driving around in a Model-T," says Dr Paul Maxim, board chair for SHARCNET and associate VP research at Wilfrid Laurier Univ, adding that the delay of the CFI competition is "a huge problem". "There's no national funding mechanism set up for renewal. The CFI is not set up for this."

Crisis at SHARCNET

The problem at SHARCNET is compounded by the lack of operational funding hampering all HPC clusters. Discussions with the Ontario Ministry of Research and Innovation (MRI) have generated a sympathetic response but no commitment of funding relief. If the Ontario-based clusters are not included in next year's provincial Budget, drastic measures may be taken.

"If by December 30th we don't know where the operating funding is coming from, we will have to start scaling back and even shutting down the systems. We are running out of money," says Maxim. "We need to convince MRI to provide base operational funding as a necessary component of maintaining our research infrastructure. About $4-5 million is needed which is a small investment given the return … If we could get a signal from the Ontario government that the next Budget would have a line item for HPC, we could limp along for four to six months."

While clusters across the country negotiate with their respective provincial governments for additional support, all eyes remain focused on the CFI and its (still unannounced) intention to extend its role into support for major science facilities. In a letter to institutional presidents, former CFI president and CEO Dr Eliot Phillipson stated that the foundation's new funding agreement with the government includes a framework for major science facilities and "predictability in funding" for an unspecified number.

"Until the call (for proposals) is made and details are spelled out nothing is guaranteed. We're waiting like everyone else," says Maxim. "If it's a competitive round, other clusters could compete for those funds. We don't have a coordinated list of national priorities but SHARCNET and HPCVL are number one priorities nationally."

The funding crunch comes at a time when HPC is expanding internationally, with countries such as India and China making major investments. Maxim says it's a good time to be in HPC as the overall pie increases with expansions in backbone and bandwidth opening up the potential for international collaborative projects.

Compute Canada is the logical conduit for collaboration with other countries, but the organization is also challenged by inadequate funding. In its submission to the CFI as part of its mid-term review, Compute Canada outlined its dilemma bluntly:

"Compute Canada's current financial model is arguably its greatest weakness and poses a significant challenge going forward ... (It) does not have funding to undertake initiatives such as middleware development, establish an awards program, stimulate the use of HPC by non-traditional disciplines (and) hold science fairs at the undergraduate level in order to train highly qualified personnel," states the submission. "A larger permanent staff is required ... We are at a critical international disadvantage because the ratio of personnel dollars to infrastructure dollars is too low."

"Industry Canada acknowledges the importance of HPC," says Baldwin. "But the value is going to decrease if we don't sustain the infrastructure and the people."

R$