SLAC expands and centralizes IT infrastructure

A computing facility at the Department of Energy’s SLAC National Accelerator Laboratory is doubling in size, preparing the lab for new scientific endeavors that promise to revolutionize our understanding of the world on scales from the atomic to the cosmic, but that also require handling unprecedented streams of data.

When SLAC’s superconducting X-ray laser turns on, for example, it will eventually produce data at a blistering rate of one terabyte per second. And the world’s largest digital camera for astronomy, which the lab is building for the Vera C. Rubin Observatory, will eventually capture a whopping 20 terabytes of data every night.

“The new IT infrastructure will be up to these challenges and more,” said Amedeo Perazzo, who heads the Controls and Data Systems division within the lab’s Technology Innovation Directorate. “We are embracing some of the latest and greatest technologies to build computing capabilities for all of SLAC for years to come.”

The Stanford University-led construction adds a second building to the existing Stanford Research Computing Facility (SRCF). SLAC will become one of the main tenants of SRCF-II, a modern data center designed to operate 24 hours a day, 7 days a week, without service interruptions and with safeguards for data integrity. SRCF-II will double the lab’s current data center capacity, for a total of 6 megawatts of power.

“Computing is a core competency for a science-driven organization like SLAC,” said Adeyemi Adesanya, head of the Scientific Computing Systems department within Perazzo’s division. “I am delighted to see our vision of an integrated computing facility become a reality. It is a necessity for analyzing data at scale and will also pave the way for new initiatives.”

A hub for SLAC’s big data

Adesanya’s team is preparing to set up the hardware for the SLAC Shared Science Data Facility (S3DF), which will find its home within SRCF-II. It will become a computing hub for all the data-intensive experiments performed at the lab.

First, it will benefit future users of LCLS-II, the upgrade to the Linac Coherent Light Source (LCLS) X-ray laser that will produce 8,000 times more pulses per second than the first-generation machine. Researchers hope to use LCLS-II to gain new insights into the atomic processes that are central to some of the most pressing challenges of our time, including the chemistry of clean energy technologies, the molecular design of drugs, and the development of materials and quantum devices.

But with new capabilities come tough computational challenges, said Jana Thayer, director of the LCLS Data Systems division. “To get the best scientific results and make the most of their time at LCLS-II, users will need quick feedback, within minutes, on the quality of their data,” she said. “To do that with an X-ray laser that produces thousands of times more data per second than its predecessor, we need the petaflops of computing power that S3DF will provide.”

Another challenge researchers will have to deal with is that LCLS-II will accumulate too much data to store it all. The new data facility will run an innovative data reduction pipeline that removes unnecessary data before the rest is saved for analysis.

Another computationally demanding technique that will benefit from the new infrastructure is cryogenic electron microscopy (cryo-EM) of biomolecules, such as proteins, RNA or virus particles. In this method, scientists take pictures of how an electron beam interacts with a sample containing the biomolecules. Sometimes they need to analyze millions of images to reconstruct the three-dimensional molecular structure in near-atomic detail. The researchers also hope to visualize molecular components in cells, not just biochemically purified molecules, at high resolution in the future.

The complex image reconstruction process requires a lot of CPU and GPU power and relies on elaborate machine learning algorithms. Doing these calculations in the S3DF will open up new opportunities, said Wah Chiu, head of the Stanford-SLAC Cryo-EM Center.

“I really hope that S3DF will become an intellectual center for computation, where experts come together to write code that allows us to visualize increasingly complex biological systems,” Chiu said. “There is great potential to discover new structural states of molecules and organelles in normal and pathological cells at SLAC.”

In fact, everyone at the lab will be able to use the available computing resources. Other potential “customers” include SLAC’s instrument for ultrafast electron diffraction (MeV-UED), the Stanford Synchrotron Radiation Lightsource (SSRL), the lab-wide machine learning initiative and applications in accelerator science. In total, S3DF will be able to support 80% of SLAC’s computing needs; the remaining 20%, the most demanding scientific computing, will be performed at external supercomputing facilities.

Multiple services under one roof

SRCF-II will host two other major data facilities.

One of them is the Rubin Observatory’s US Data Facility (USDF). In a few years, the observatory will begin taking images of the southern night sky from a mountaintop in Chile using its 3,200-megapixel camera built by SLAC. For the Legacy Survey of Space and Time (LSST), it will take two images every 37 seconds for 10 years. The resulting data could hold answers to some of the biggest questions about our universe, including what exactly is accelerating its expansion, but that data will be contained in a 60-petabyte catalog that researchers will have to sift through. The resulting image archive will reach about 300 petabytes, dominating the storage usage in SRCF-II. The USDF, along with two other centers in the UK and France, will be responsible for producing the huge data catalog.

A third data center will serve the user community of SLAC’s first-generation X-ray laser. The existing IT infrastructure for LCLS data analysis will gradually move to SRCF-II and grow into a much larger system there.

Although each data center has specific needs in terms of technical specifications, they are all based on a core of shared services: data must always be transferred, stored, analyzed and managed. In close collaboration with Stanford, Rubin Observatory, LCLS and other partners, Perazzo and Adesanya’s teams are setting up all three systems.

For Adesanya, this unified approach, including a cost model that will help pay for future upgrades and growth, is a dream come true. “Historically, computing at SLAC was highly distributed, with each facility having its own specialized system,” he said. “The new, more centralized approach will help spur new initiatives across the lab, such as machine learning. And by breaking down silos and converging into one integrated data facility, we’re building something that’s more capable than the sum of everything we’ve had before.”

The construction of SRCF-II is a Stanford project. Much of the S3DF infrastructure is funded by the Department of Energy’s Office of Science. LCLS and SSRL are Office of Science user facilities. The Rubin Observatory is a joint initiative of the National Science Foundation (NSF) and the Office of Science. Its primary mission is to conduct the Legacy Survey of Space and Time, providing an unprecedented data set for scientific research supported by both agencies. Rubin is jointly operated by NSF’s NOIRLab and SLAC. NOIRLab is managed for NSF by the Association of Universities for Research in Astronomy, and SLAC is operated for DOE by Stanford University. The Stanford-SLAC Cryo-EM Center (S2C2) is supported by the National Institutes of Health (NIH) Common Fund Transformative High-Resolution Cryo-Electron Microscopy program.

SLAC is a vibrant, multi-program lab exploring how the universe works on the largest, smallest, and fastest scales and inventing powerful tools used by scientists around the world. With research spanning particle physics, astrophysics and cosmology, materials, chemistry, life and energy sciences, and scientific computing, we help solve real-world problems and advance the nation’s interests.

SLAC is operated by Stanford University for the US Department of Energy’s Office of Science. The Office of Science is the largest single supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time.

