Brookhaven Lab’s Scientific Data and Computing Center Reaches 100 Petabytes of Recorded Data

Total reflects 17 years of experimental physics data collected by scientists to understand the fundamental nature of matter and the basic forces that shape our universe

(Back row) Ognian Novakov, Christopher Pinkenburg, Jérôme Lauret, Eric Lançon, (front row) Tim Chou, David Yu, Guangwei Che, and Shigeki Misawa at Brookhaven Lab’s Scientific Data and Computing Center, which houses the Oracle StorageTek tape storage system where experimental data are recorded.

Imagine storing approximately 1,300 years’ worth of HDTV video, nearly six million movies, or the entire written works of humankind in all languages since the start of recorded history, twice over. Each of these quantities is equivalent to 100 petabytes of data: the amount of data now recorded by the Relativistic Heavy Ion Collider (RHIC) and ATLAS Computing Facility (RACF) Mass Storage Service, part of the Scientific Data and Computing Center (SDCC) at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory. One petabyte is defined here as 1024⁵ bytes, or 1,125,899,906,842,624 bytes, of data.
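The byte count above follows from the 1024-based definition. The short Python check below confirms it and compares it with the decimal (SI) petabyte of 1000⁵ bytes. In IEC notation the 1024-based unit is formally called a pebibyte (PiB). The variable names are illustrative only.

```python
# 1024-based ("binary") petabyte, as defined in the article.
BYTES_PER_PB_BINARY = 1024 ** 5
# 1000-based (SI, decimal) petabyte, for comparison.
BYTES_PER_PB_DECIMAL = 1000 ** 5

archive_pb = 100  # the 100-petabyte milestone
total_binary = archive_pb * BYTES_PER_PB_BINARY
total_decimal = archive_pb * BYTES_PER_PB_DECIMAL

print(f"1 PB (binary)   = {BYTES_PER_PB_BINARY:,} bytes")
print(f"100 PB (binary) = {total_binary:,} bytes")
# How much larger the binary total is than the decimal one (about 12.6 percent).
print(f"binary/decimal ratio = {total_binary / total_decimal:.4f}")
```

The gap between the two conventions grows with each prefix step, which is why it is worth stating which one a storage figure uses.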

“This is a major milestone for SDCC, as it reflects nearly two decades of scientific research for the RHIC nuclear physics and ATLAS particle physics experiments, including the contributions of thousands of scientists and engineers,” said Brookhaven Lab technology architect David Yu, who leads the SDCC’s Mass Storage Group.

SDCC is at the core of a global computing network connecting more than 2,500 researchers around the world with data from the STAR and PHENIX experiments at RHIC—a DOE Office of Science User Facility at Brookhaven—and the ATLAS experiment at the Large Hadron Collider (LHC) in Europe. In these particle collision experiments, scientists recreate conditions that existed just after the Big Bang, with the goal of understanding the fundamental forces of nature—gravitational, electromagnetic, strong nuclear, and weak nuclear—and the basic structure of matter, energy, space, and time.

Big Data Revolution

The RHIC and ATLAS experiments are part of the big data revolution. These experiments involve collecting extremely large datasets that reduce statistical uncertainty to make high-precision measurements and search for extremely rare processes and particles.

For example, only one Higgs boson (an elementary particle whose energy field is thought to give mass to all the other elementary particles) is produced for every billion proton-proton collisions at the LHC. Moreover, once produced, the Higgs boson almost immediately decays into other particles. Detecting the particle is therefore a rare event, with around one trillion collisions required to detect a single instance. When scientists first discovered the Higgs boson at the LHC in 2012, they observed about 20 instances, recording and analyzing more than 300 trillion collisions to confirm the particle’s discovery.

At the end of 2016, the ATLAS collaboration released its first measurement of the mass of the W boson (another elementary particle that, together with the Z boson, is responsible for the weak nuclear force). This measurement, which is based on a sample of 15 million W boson candidates collected at the LHC in 2011, has a relative precision of 240 parts per million (ppm). That result matches the best single-experiment measurement, announced in 2007 by the Collider Detector at Fermilab collaboration and based on several years’ worth of collected data. A highly precise measurement is important because a deviation from the mass predicted by the Standard Model could point to new physics. Larger data samples are required to reach the level of precision (80 ppm) that scientists need to test this model significantly.
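To give the ppm figures a physical scale, the sketch below converts them into an absolute uncertainty. It assumes a W boson mass of roughly 80,400 MeV, a value not stated in the article. It also estimates how much more data would be needed if the measurement were limited purely by statistics, where uncertainty shrinks as 1/√N. That scaling is an idealization; real measurements also carry systematic uncertainties that do not shrink this way.

```python
# Assumption: approximate W boson mass in MeV (not given in the article).
W_MASS_MEV = 80_400.0


def ppm_to_mev(relative_ppm: float, mass_mev: float = W_MASS_MEV) -> float:
    """Convert a relative precision in parts per million to an absolute uncertainty in MeV."""
    return relative_ppm * 1e-6 * mass_mev


def data_factor_if_statistics_limited(current_ppm: float, target_ppm: float) -> float:
    """Idealized estimate: if uncertainty scales as 1/sqrt(N), N must grow by (current/target)**2."""
    return (current_ppm / target_ppm) ** 2


current_mev = ppm_to_mev(240)  # precision of the 2016 ATLAS result
target_mev = ppm_to_mev(80)    # precision scientists are aiming for
factor = data_factor_if_statistics_limited(240, 80)

print(f"240 ppm ~ {current_mev:.1f} MeV, 80 ppm ~ {target_mev:.1f} MeV")
print(f"statistics-only data factor: {factor:.1f}x")
```

Under these assumptions, 240 ppm corresponds to about 19 MeV and 80 ppm to about 6 MeV, and a purely statistical improvement would need roughly nine times as much data.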

The volume of data collected by these experiments will grow significantly in the near future as new accelerator programs deliver higher-intensity beams. The LHC will be upgraded to increase its luminosity (rate of collisions) by a factor of 10. This High-Luminosity LHC, which should be operational by 2025, will provide a unique opportunity for particle physicists to look for new and unexpected phenomena within the exabytes (one exabyte equals 1000 petabytes) of data that will be collected.

Data archiving is the first step in making the results from such experiments available. Thousands of physicists then need to calibrate and analyze the archived data and compare the data to simulations. To this end, computational scientists, computer scientists, and mathematicians in Brookhaven Lab’s Computational Science Initiative, which encompasses SDCC, are developing programming tools, numerical models, and data-mining algorithms. Part of SDCC’s mission is to provide computing and networking resources in support of these activities.

A Data Storage, Computing, and Networking Infrastructure

Housed inside SDCC are more than 60,000 computing cores, 250 computer racks, and tape libraries capable of holding up to 90,000 magnetic storage tape cartridges that are used to store, process, analyze, and distribute the experimental data. The facility provides approximately 90 percent of the computing capacity for analyzing data from the STAR and PHENIX experiments, and serves as the largest of the 12 Tier 1 computing centers worldwide that support the ATLAS experiment. As a Tier 1 center, SDCC contributes nearly 23 percent of the total computing and storage capacity for the ATLAS experiment and delivers approximately 200 terabytes of data (picture 62 million photos) per day to more than 100 data centers globally.
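The daily export figure translates into a sustained network rate. The back-of-the-envelope sketch below assumes decimal terabytes (10¹² bytes) and an even spread over 24 hours. The article specifies neither, and real transfers are burstier than this average.

```python
# Back-of-the-envelope conversion of the quoted daily ATLAS export volume.
# Assumption: "terabyte" means the decimal unit (10**12 bytes).
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10 ** 12
GB = 10 ** 9
MB = 10 ** 6

daily_bytes = 200 * TB        # ~200 TB delivered per day (article figure)
photo_count = 62_000_000      # the article's "62 million photos" comparison

# Average rate if the transfer were spread evenly over 24 hours.
avg_gb_per_s = daily_bytes / SECONDS_PER_DAY / GB
avg_gbit_per_s = avg_gb_per_s * 8

# Implied size of one "photo" in the comparison.
mb_per_photo = daily_bytes / photo_count / MB

print(f"average rate: {avg_gb_per_s:.2f} GB/s ({avg_gbit_per_s:.1f} Gbit/s)")
print(f"implied photo size: {mb_per_photo:.1f} MB")
```

On these assumptions, the export averages a little over 2.3 GB/s (about 18.5 Gbit/s), and the photo comparison implies roughly 3.2 MB per photo.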

At SDCC, the High Performance Storage System (HPSS) has been providing mass storage services to the RHIC and LHC experiments since 1997 and 2006, respectively. This data archiving and retrieval software, developed by IBM and several DOE national laboratories, manages petabytes of data on disk and in robot-controlled tape libraries. Contained within the libraries are magnetic tape cartridges that encode the data and tape drives that read and write the data. Robotic arms load the cartridges into the drives and unload them upon request.

Inside one of the automated tape libraries at the Scientific Data and Computing Center (SDCC), Eric Lançon, director of SDCC, holds a magnetic tape cartridge. When scientists need data, a robotic arm (the piece of equipment in front of Lançon) retrieves the relevant cartridges from their slots and loads them into drives in the back of the library.

When ranked by the volume of data stored in a single HPSS, Brookhaven’s system is the second largest in the nation and the fourth largest in the world. Currently, the RACF operates nine Oracle robotic tape libraries that constitute the largest Oracle tape storage system in the New York tri-state area. Contained within this system are nearly 70,000 active cartridges with capacities ranging from 800 gigabytes to 8.5 terabytes, and more than 100 tape drives. As the volume of scientific data to be stored increases, more libraries, tapes, and drives can be added accordingly. In 2006, this scalability was exercised when HPSS was expanded to accommodate data from the ATLAS experiment at LHC.

“The HPSS system was deployed in the late 1990s, when the RHIC accelerator was coming online. It allowed data from RHIC experiments to be transmitted via network to the data center for storage, a relatively new idea at the time,” said Shigeki Misawa, manager of Mass Storage and General Services at Brookhaven Lab. Misawa played a key role in the initial evaluation and configuration of HPSS, and has guided the system through significant changes in hardware (network equipment, storage systems, and servers) and operational requirements (tape drive read/write rate, magnetic tape cartridge capacity, and data transfer speed). “Prior to this system, data was recorded on magnetic tape at the experiment and physically moved to the data center,” he continued.

Over the years, SDCC’s HPSS has been augmented with a suite of optimization and monitoring tools developed at Brookhaven Lab. One of these tools is David Yu’s scheduling software that optimizes the retrieval of massive amounts of data from tape storage. Another, developed by Jérôme Lauret, software and computing project leader for the STAR experiment, is software for organizing multiple user requests to retrieve data more efficiently.
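The article does not describe how these Brookhaven tools work internally. A common idea behind tape-retrieval optimization is to batch pending requests by cartridge, so each tape is mounted once, and to read files in their on-tape order to avoid back-and-forth seeks. The Python sketch below illustrates that generic idea only. It is not the SDCC software. `FileRequest`, `plan_retrieval`, and the tape IDs and positions are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRequest:
    path: str      # hypothetical logical file name requested by a user
    tape_id: str   # hypothetical: cartridge that holds the file
    position: int  # hypothetical: file's sequential position on that cartridge


def plan_retrieval(requests):
    """Return (tape_id, [paths in on-tape order]) batches.

    Each cartridge appears once (one mount), files on it are read in increasing
    position (no backward seeks), and cartridges with the most pending files go first.
    """
    by_tape = defaultdict(set)
    for req in requests:
        # Identical requests from different users collapse into a single read.
        by_tape[req.tape_id].add(req)

    batches = []
    for tape_id, reqs in by_tape.items():
        ordered = sorted(reqs, key=lambda r: r.position)
        batches.append((tape_id, [r.path for r in ordered]))

    # Most files per mount first; tape_id breaks ties deterministically.
    batches.sort(key=lambda b: (-len(b[1]), b[0]))
    return batches


# Usage: four user requests, one of them a duplicate.
reqs = [
    FileRequest("/star/run1/a.daq", "T002", 40),
    FileRequest("/star/run1/b.daq", "T001", 7),
    FileRequest("/star/run1/c.daq", "T002", 3),
    FileRequest("/star/run1/a.daq", "T002", 40),
]
plan = plan_retrieval(reqs)
print(plan)
```

Here the duplicate request is dropped, and cartridge `T002` is served first with its two files in position order. A production scheduler would also weigh drive availability, request age, and fairness across users, which this sketch ignores.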

Engineers in the Mass Storage Group—including Tim Chou, Guangwei Che, and Ognian Novakov—have created other software tools customized for Brookhaven Lab’s computing environment to enhance data management and operation abilities and to improve the effectiveness of equipment usage.

STAR experiment scientists have demonstrated the capabilities of SDCC’s enhanced HPSS, retrieving more than 4,000 files per hour (a rate of 6,000 gigabytes per hour) while using a third of HPSS resources. On the data archiving side, HPSS can store data in excess of five gigabytes per second.
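The retrieval figures can be restated per second for comparison with the archiving rate. The sketch below uses only the article's numbers. Because "more than 4,000 files" is a lower bound, the average file size it derives is approximate.

```python
# Unit conversion for the STAR retrieval test quoted in the article.
SECONDS_PER_HOUR = 3600

files_per_hour = 4_000    # "more than 4,000 files per hour" (lower bound)
gb_per_hour = 6_000       # "6,000 gigabytes per hour"
archive_gb_per_s = 5      # archiving rate, "in excess of five gigabytes per second"

retrieval_gb_per_s = gb_per_hour / SECONDS_PER_HOUR
avg_file_gb = gb_per_hour / files_per_hour
archive_to_retrieval = archive_gb_per_s / retrieval_gb_per_s

print(f"retrieval: {retrieval_gb_per_s:.2f} GB/s, average file ~ {avg_file_gb:.1f} GB")
print(f"archive rate is about {archive_to_retrieval:.1f}x the measured retrieval rate")
```

The retrieval test ran on only a third of HPSS resources, so the two rates are not a like-for-like capacity comparison.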

As demand for mass data storage spreads across Brookhaven, access to HPSS is being extended to other research groups. In the future, SDCC is expected to provide centralized mass storage services to multi-experiment facilities, such as the Center for Functional Nanomaterials and the National Synchrotron Light Source II—two more DOE Office of Science User Facilities at Brookhaven.

“The tape library system of SDCC is a clear asset for Brookhaven’s current and upcoming big data science programs,” said SDCC Director Eric Lançon. “Our expertise in the field of data archiving is acknowledged worldwide.”

Source: BNL


