When the major upgrade to the Advanced Photon Source (APS) at the U.S. Department of Energy’s (DOE) Argonne National Laboratory is completed later this year, experiments at the powerful X-ray light source are expected to generate between 100 and 200 petabytes, or 100 to 200 million gigabytes, of scientific data per year.
“An exabyte of data is equivalent to streaming 1.5 million videos every day for a year,” said Nicholas Schwarz, a computer scientist at Argonne who leads scientific software and data management at the APS. “But we want to do a lot more than just move a lot of data. For the X-ray experiments performed at the APS, we want to use advanced computational tools to look at each and every pixel in each and every image, analyze the data in near real-time, and use the results to make decisions about the next experiment.”
“To process all this data quickly, we need a lot of computing resources, from supercomputers and data storage to analysis software, to the networking fabric that connects all those resources,” he added.
The growing deluge of scientific data is not unique to light sources. Telescopes, particle accelerators, fusion research facilities, remote sensors and other scientific instruments also produce large amounts of data. And as their capabilities improve over time, the data generation rates will only continue to grow.
“The ability of the scientific community to process, analyze, store and share these large data sets is critical to gaining insights that will lead to new discoveries,” said Michael E. Papka, Argonne’s associate laboratory director for Computing, Environment and Life Sciences. Papka is also director of the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility at Argonne, and is a professor of computer science at the University of Illinois at Chicago.
Argonne’s Nexus effort plays a critical role in advancing DOE’s vision of building an Integrated Research Infrastructure (IRI). The development of an IRI would accelerate data-intensive research by seamlessly integrating DOE’s experimental facilities with its world-class supercomputers, artificial intelligence (AI) and data resources.
For more than a decade, Argonne has been working to develop tools and strategies to connect its powerful computing resources to large-scale experiments. The pairing of ALCF supercomputers with the APS has been central to the lab’s IRI-related research, but the work has also included collaborations with the DIII-D National Fusion Facility in California and the Large Hadron Collider at CERN in Switzerland. DIII-D is a DOE Office of Science user facility.
“We’ve been partnering with experimental facilities for several years now to help them use our supercomputing resources to process huge amounts of data more quickly,” Papka said. “With the launch of Nexus, we have a vehicle to coordinate all of our research and collaborations in this space to align with DOE’s broader efforts to lead the new era of integrated science.”
Argonne’s ongoing work has led to the creation of tools for managing computational workflows and the development of new capabilities for on-demand computing, giving the lab valuable experience to support the DOE IRI initiative. Globus and the ALCF Community Data Co-Op (ACDC) are critical resources in enabling the IRI vision. Globus, a research automation platform created by researchers at Argonne and the University of Chicago, is used to manage high-speed data transfers, computing workflows, data collection and other tasks for experiments. ACDC provides large-scale data storage capabilities, offering a portal that makes it easy to share data with external collaborators across the globe.
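Much of that automation can be scripted against Globus’s publicly documented Python SDK (globus_sdk). The sketch below is a minimal illustration of moving a detector dataset from a beamline endpoint to ALCF storage; it is not the actual APS workflow, and the client ID, endpoint UUIDs and paths are placeholders.

```python
import globus_sdk

# Placeholder values -- real endpoint UUIDs and paths depend on the facility setup.
CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"
SOURCE_ENDPOINT = "aps-beamline-endpoint-uuid"
DEST_ENDPOINT = "alcf-storage-endpoint-uuid"

# Authenticate interactively as a "native app" and obtain a transfer token.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow()
print("Log in at:", auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input("Auth code: ").strip())
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Build a transfer task that copies a detector output directory to remote storage.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)
tdata = globus_sdk.TransferData(
    tc, SOURCE_ENDPOINT, DEST_ENDPOINT, label="APS scan to ALCF"
)
tdata.add_item("/data/scan_0001/", "/projects/myproject/scan_0001/", recursive=True)

task_id = tc.submit_transfer(tdata)["task_id"]
tc.task_wait(task_id, timeout=3600)  # block until the transfer completes
print("Transfer finished:", task_id)
```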
The ALCF’s upcoming Aurora exascale supercomputer will also complement the lab’s IRI efforts, offering a significant boost in computing power and advanced capabilities for AI and data analysis.
Streamlining science
The IRI will not only enable experiments to analyze vast amounts of data, but it will also allow them to process large datasets quickly for rapid results. This is crucial as experiment-time analysis often plays a key role in shaping subsequent experiments.
For the Argonne-DIII-D collaboration, researchers demonstrated how the close integration of ALCF supercomputers could benefit a fast-paced experimental setup. Their work centered on a fusion experiment that used a series of plasma pulses, or shots, to study the behavior of plasmas under controlled conditions. The shots were occurring every 20 minutes, but the data analysis required more than 20 minutes using their local computing resources, so the results were not available in time to inform the ensuing shot. DIII-D researchers teamed up with the ALCF to explore how they could leverage supercomputers to speed up the analysis process.
“Every time they fired a shot, we kicked off a job at the ALCF. It would retrieve the data from DIII-D, run the analysis, and send the results back to them in time to calibrate the next shot,” said Thomas Uram, a computer scientist at Argonne and IRI lead at the ALCF. “Because we had more computing power than DIII-D had locally, we could analyze their data faster and at 16 times the resolution of their in-house systems. Not only did they get the results before the next shot, they also received substantially higher resolution analyses to improve the precision of their configuration.”
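The pattern Uram describes is essentially a between-shot loop: fetch the shot data, analyze it remotely, and return results before the next shot begins. The sketch below illustrates that control flow only; the function names and the 20-minute window bookkeeping are hypothetical stand-ins, not DIII-D’s or ALCF’s actual software.

```python
import time

SHOT_INTERVAL_S = 20 * 60  # shots arrived roughly every 20 minutes

def fetch_shot_data(shot_id):
    """Hypothetical: pull the diagnostic data for one shot from the facility."""
    return {"shot": shot_id, "signals": []}

def run_analysis(data):
    """Hypothetical: launch the analysis on remote HPC resources and wait for results."""
    return {"shot": data["shot"], "profile": "reconstructed-equilibrium"}

def send_results(results):
    """Hypothetical: return results to the experiment control room."""
    print(f"Shot {results['shot']}: results delivered before the next shot")

def between_shot_loop(shot_ids):
    # For each shot: fetch data, analyze it remotely, and return results in time
    # to inform how the next shot is configured.
    for shot_id in shot_ids:
        start = time.monotonic()
        results = run_analysis(fetch_shot_data(shot_id))
        send_results(results)
        elapsed = time.monotonic() - start
        assert elapsed < SHOT_INTERVAL_S, "analysis missed the between-shot window"

if __name__ == "__main__":
    between_shot_loop(range(190001, 190004))
```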
Many experiments conducted at the APS will also benefit from near real-time data analysis, including battery research, fault exploration and drug development.
“By getting analysis results in seconds rather than hours, days or even weeks, scientists can get real-time feedback on their experiments as they unfold,” Schwarz said. “Researchers will be able to use this information to steer an experiment and zoom in on a particular area to watch critical processes, such as the molecular changes that occur during a battery’s charge and discharge cycles, as they occur.”
A fully realized IRI would also benefit those conducting the research. Scientists currently spend a lot of time and effort managing data when conducting an experiment. This includes tasks such as storing, transferring, validating and sharing data before it can be used to generate new insights.
“The IRI vision is to automate many of these tedious data management tasks so researchers can focus on the science,” Uram said. “This would greatly speed up the scientific process, freeing up scientists and giving them more time to formulate hypotheses while experiments are underway.”
HPC On-Demand
Getting immediate access to DOE’s supercomputers for data analysis requires a change in how computing facilities operate. Each facility has its own policies and processes in place for managing machines, creating user accounts, handling data and other tasks.
“If a researcher is set up at one computing facility but wants to use a supercomputer at another facility, they have to go through a similar set of steps at that site,” Uram said. “And that takes time.”
Once a project is set up, researchers submit their “jobs” to a queue, where they wait their turn to run on the supercomputer. While the traditional queuing system helps optimize the use of a facility’s supercomputers, it does not provide the fast response times required for the IRI.
To ease the burden on end users, the IRI will require the implementation of a uniform way for experimental teams to quickly access DOE supercomputing resources.
To this end, Argonne has developed and demonstrated strategies to overcome challenges related to job scheduling and user accounts. The co-location of the APS and the ALCF on the Argonne campus provides an ideal environment to test and demonstrate such capabilities. When the ALCF launched the Polaris supercomputer in 2022, four of the system’s racks were dedicated to advancing integration efforts with experimental facilities.
In the case of user accounts, the existing process can be cumbersome for experiments involving multiple team members who want to use computing facilities for data processing. The Argonne team tested the concept of using “service accounts” that provide secure access for a particular experiment, instead of requiring each and every team member to have an active account.
“This is important because many experiments require a team of people to collect data and take on analysis tasks over a period of a few days or a week,” Uram said. “We need a way to support the experiment regardless of who is using the instruments that day.”
To address the job scheduling problem, the Argonne team reserved a portion of Polaris nodes to run with “on-demand” and “preemptable” queues. This approach allows urgent tasks to take priority on the dedicated nodes.
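On a system that uses a PBS-style scheduler, as Polaris does, routing an urgent analysis job to such a queue can be scripted. The example below is only a sketch under stated assumptions: the queue name, account, job script and resource selection strings are illustrative and depend on facility policy, not a documented recipe.

```python
import subprocess

def submit_urgent_job(script_path, queue="demand", account="MyProject",
                      nodes=2, walltime="00:30:00"):
    """Submit a job to a high-priority PBS queue.

    Queue, account and selection values are illustrative placeholders;
    actual queue policies and required options are facility-specific.
    """
    cmd = [
        "qsub",
        "-q", queue,                          # e.g. an on-demand queue on reserved nodes
        "-A", account,                        # project allocation to charge
        "-l", f"select={nodes}:system=polaris",
        "-l", f"walltime={walltime}",
        script_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()              # PBS prints the new job ID

if __name__ == "__main__":
    job_id = submit_urgent_job("analyze_aps_scan.sh")  # hypothetical job script
    print("Submitted urgent analysis job:", job_id)
```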
The team successfully completed end-to-end tests of the service accounts and the on-demand and preemptable queues on Polaris using data generated by an APS experiment. The runs were fully automated, with no humans in the loop.
“This capability is truly exciting for the experimental integration efforts here at Argonne, but there is much work ahead to develop workable solutions that can be used across all DOE experimental and computing facilities,” Papka said.
Putting it all together
While Argonne and its fellow national laboratories have been running projects for several years to demonstrate the promise of an integrated paradigm, DOE’s Advanced Scientific Computing Research (ASCR) program made it a more formal initiative in 2020 with the launch of the IRI Task Force. Composed of members from several national laboratories, including Argonne’s Schwarz, Uram, Jini Ramprakash and Corey Adams, the task force studied the opportunities, risks and challenges posed by such integration.
In 2022, ASCR launched the IRI Blueprint Activity to create a framework for implementing the IRI. The blueprint team, which included Schwarz and Ramprakash, released a report that describes a path forward from the lab’s individual partnerships and demonstrations to a broader long-term strategy that will work across the DOE ecosystem. Over the past year, the blueprint activities have started to formalize with the introduction of IRI testbed resources and environments. Now in place at each of the DOE computing facilities, the testbeds facilitate research to explore and refine IRI ideas in collaboration with teams from DOE experimental facilities.
“With the launch of the Nexus effort here at Argonne, we will continue to leverage our collective knowledge, experience and resources for DOE and the broader scientific community to enable and evolve this new paradigm across a wide range of research areas, scientific instruments and user facilities,” Uram said.
Source: Jim Collins, Argonne