NVIDIA SC23 Special Address


Welcome to Supercomputing 2023. I'm Ian Buck, vice president of the HPC and Hyperscale Datacenter business here at NVIDIA. Computing stands as the bedrock of modern civilization, fueling scientific discovery, powering industrial automation, and birthing artificial intelligence.

For two decades, CPU performance surged, growing 10,000-fold. Yet today that trajectory has stagnated, and incremental gains in instruction-level parallelism and frequency bring disproportionate costs and power increases. This marks a turning point in computing: an opportunity to forge new pathways, to revolutionize and transcend these limits, and to invite fresh innovation into computing. NVIDIA's accelerated computing is a multi-domain acceleration platform with full-stack optimization of a wide range of science and industrial applications.

Our dedication to architectural compatibility has created an installed base of hundreds of millions of GPUs for researchers and developers. And NVIDIA's rich ecosystem connects computer makers and cloud service providers to nearly every domain of science and industry. The results of accelerated computing are spectacular. Accelerated workloads can see an order-of-magnitude reduction in system cost and energy used.

For example, Siemens teamed up with Mercedes to analyze the aerodynamics and related acoustics of their new electric EQE vehicle. These simulations traditionally take weeks on CPU clusters. Using the latest NVIDIA Hopper H100 GPUs, however, they can get the work done with fewer systems, without sacrificing productivity, and save money: Hopper lets them reduce costs by 3x and energy consumption by 4x. Moving forward, GPU acceleration also allows them to push performance further, delivering faster design iterations and expanding design options on a cost- and energy-efficient platform. Accelerated computing is a full-stack problem.

It starts with world-class processors, but it also requires investment in domain-specific libraries and working closely with application developers to get the most out of the hardware. Computing today is not just about the silicon or the server. Today, the entire data center is the new unit of compute, and NVIDIA continues to innovate in our software and hardware solutions to scale applications across the entire data center. And of course, AI is the latest tool for accelerating computing. AI is software that writes software, and AI is transforming HPC.

It’s providing an order of magnitude further acceleration for many scientific domains. As we saw with the Siemens-Mercedes example, accelerated computing is sustainable computing. Whether you're a scientist working on climate modeling, an engineer designing new products, or a data analyst trying to make sense of large datasets, NVIDIA’s solutions can help you do your job better and more efficiently. By harnessing the power of accelerated computing and generative AI, together we can drive innovations across industries while reducing our impact on the environment.

Accelerated computing is a full-stack problem that requires coordinated innovation at every layer of the stack. It starts with amazing hardware: our CPUs, GPUs, and DPUs are integrated into hardware platforms from the edge to on-premises to the cloud. NVIDIA Mellanox networking is the nervous system of the accelerated data center, connecting it so that the data center can act as the new unit of compute. With SDKs, frameworks, and platforms, we aim to provide researchers the technology to build the software the world needs for the next discovery.

Holoscan is our edge computing and AI platform that captures and analyzes streaming data from medical devices and scientific instruments. NeMo is NVIDIA's open-source AI framework for training giant AI models at supercomputing scale, used by researchers at NVIDIA and elsewhere to develop the world's next foundation models. The NVIDIA HPC SDK provides all of NVIDIA's GPU-optimized libraries and tools for developing applications in standard programming languages.

NVIDIA Omniverse provides an open, collaborative environment to co-locate multiple heterogeneous types of data and visualize it all. And the world of quantum computing is exploding, with cuQuantum and CUDA Quantum helping bring quantum computing and its applications closer to reality. In fact, supercomputers today are at the center of quantum computing research.

They act as time machines, allowing us to simulate future QPUs and unravel their potential and functionality. Accelerated computing and AI also play pivotal roles in improving the quantum computers of today, allowing researchers to better control, calibrate, and correct errors to achieve the best possible performance. As future supercomputers incorporate quantum processors, NVIDIA is leading the way with an open programming model called CUDA Quantum to facilitate this integration, steering toward the era of quantum-accelerated supercomputing. As a result, new breakthroughs in quantum computing are happening every day.
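
To make the programming model concrete, here is a minimal CUDA Quantum (CUDA-Q) Python sketch: a small entangling kernel sampled on the GPU-accelerated state-vector simulator. The qubit count and shot count are illustrative choices, and a real hybrid workflow would wrap a kernel like this inside a classical optimization loop.

```python
import cudaq

@cudaq.kernel
def ghz(qubit_count: int):
    # Prepare a GHZ-style entangled state: H on the first qubit,
    # then a chain of CNOTs, then measure every qubit.
    qubits = cudaq.qvector(qubit_count)
    h(qubits[0])
    for i in range(qubit_count - 1):
        x.ctrl(qubits[i], qubits[i + 1])
    mz(qubits)

cudaq.set_target("nvidia")  # GPU-accelerated state-vector simulator backend
print(cudaq.sample(ghz, 25, shots_count=1000))
```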

BASF researchers pioneered a new hybrid quantum-classical method for simulating chemical catalysts that can shield humans against harmful metals. Leveraging CUDA Quantum, BASF devised an innovative algorithm for accurately simulating the electronic structure of the chemical catalyst FeNTA, using up to 60 qubits on NVIDIA's own Eos supercomputer. At Stony Brook and BNL, researchers are simulating particle physics, replacing experiments that today take years and billions of dollars on particle accelerators at a fraction of the cost. HPE's research into quantum phase transitions aims to understand condensed-matter systems and design novel materials.

Using CUDA Quantum on Perlmutter, they accurately simulated a phase transition in the transverse-field Ising model, the largest simulation of its kind. These advancements highlight how our platform will play a pivotal role in unlocking the potential of quantum-accelerated supercomputing for science. Exciting advancements continue in this realm, with NVIDIA's cuQuantum, CUDA Quantum, and DGX Quantum fostering strong partnerships. Notably, HPE, Dell, Lenovo, SandboxAQ, and Terra Quantum have all embraced NVIDIA's quantum computing platform, offering services and contributing software solutions. All major CSPs and nearly 90% of the leading quantum computing frameworks now leverage cuQuantum acceleration, and 80% of all QPU providers are integrating CUDA Quantum, joining a diverse roster that includes Quantinuum, IonQ, IQM, and more, all benefiting from building on NVIDIA's quantum platform.

In just two years, NVIDIA's quantum computing platform has amassed over 120 partners, and customer engagement is also on the rise. Over 90% of the top 50 quantum startups are leveraging NVIDIA GPUs for their development, and there are over 70,000 monthly downloads of our quantum SDKs from users across the globe. Our platform is also accelerating Israel's quantum computing ecosystem.

Today we're announcing that NVIDIA will be partnering with Classiq on the Quantum Center for Life Sciences at the Tel Aviv Sourasky Medical Center. This partnership integrates DGX H100 and Classiq's software using CUDA Quantum. We are also announcing that the Israeli National Quantum Center, built by Quantum Machines, will feature the world's first deployment of DGX Quantum connecting two distinct QPU modalities:

a superconducting QPU from QuantWare and a photonic QPU from ORCA Computing, both powered by CUDA Quantum. These strategic partnerships are revolutionizing Israel's research and driving new advancements in quantum technology for diverse scientific fields. Next, we'll hear from Kimberly Powell about how researchers are harnessing generative AI for scientific breakthroughs. Thanks, Ian. Generative AI is propelling scientific research across diverse domains, from climate and weather prediction to astronomy and disease research.

Generative AI's impact in the fields of chemistry and biology is already tremendous, and we are just getting started. Groundbreaking work at the University of California, Berkeley built a chemistry assistant that can navigate complex information around metal-organic frameworks, or MOFs. The chemistry assistant mined over 26,000 synthesis data points across 800 MOFs and used this data to build an AI model that can predict crystallization outcomes and a chatbot that answers questions on chemical reactions and synthesis procedures.

Researchers at Northwestern University developed DNABERT, a groundbreaking model that provides a comprehensive understanding of genomic DNA sequences. Building on DNABERT, InstaDeep, NVIDIA, and the Technical University of Munich developed the Nucleotide Transformer. It is five times larger and trained on data from multiple species, both of which contributed to better performance in understanding the regulatory code of the genome and predicting when and how genes are expressed across cell types and organisms. Researchers from Argonne National Laboratory, NVIDIA, and the University of Chicago leveraged generative AI to address the challenge of identifying new COVID-19 variants. By pre-training on over 110 million genomes and then fine-tuning the model with data from 1.5 million SARS-CoV-2 genomes,

the resulting model, GenSLMs, can accurately identify variants of concern. Recently, when the researchers looked back at the nucleotide sequences generated by GenSLMs, they discovered that specific characteristics of the AI-generated sequences closely matched the real-world Eris and Pirola subvariants that have been prevalent this year, even though the AI was trained only on COVID-19 virus genomes from 2020.
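
As a hedged illustration of how genomic language models like these are typically used downstream, the sketch below embeds a DNA sequence with a publicly released Nucleotide Transformer checkpoint through Hugging Face transformers. The model identifier is an example checkpoint, not necessarily the exact model described above, and the pooling step is just one common choice.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Example checkpoint from InstaDeep's public releases; substitute the one you use.
model_id = "InstaDeepAI/nucleotide-transformer-500m-human-ref"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).eval()

sequence = "ATGGCGTACGTTAGCAGGTCA" * 8          # toy DNA sequence
inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Mean-pool the final hidden states into one embedding for downstream tasks
# such as predicting regulatory activity or variant effects.
embedding = out.hidden_states[-1].mean(dim=1)
print(embedding.shape)
```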

Now available on NGC, NVIDIA's hub for accelerated software, is an interactive visualization of eight different COVID variants that shows how the AI model tracks mutations across various proteins of the viral genome. The visualization helps researchers understand how different parts of the genome co-evolve, highlighting which snippets of the genome are likely to be seen in a given variant, all of which can help reveal new virus vulnerabilities and new forms of resistance. The future is brimming with possibilities as generative AI continues to redefine the landscape of scientific exploration. The predictive power of these models is proving accurate and is taking us beyond what we have observed experimentally. NVIDIA has developed a broad range of solutions providing cutting-edge generative AI technologies for scientific researchers. NVIDIA Modulus, an open-source framework, merges physics-based causality with observed data, enabling real-time predictions for engineers.

Through generative AI using diffusion models, it enhances engineering simulations and data fidelity for responsive designs. The framework supports large-scale digital twin models spanning various physics domains, offering improved workflows and cutting-edge AI approaches for enhanced performance.
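
To illustrate the core idea of merging physics-based causality with observed data, here is a minimal PyTorch-style sketch of a physics-informed loss, not Modulus's actual API: a small network is fit to sparse observations while also being penalized for violating a simple PDE, the 1-D heat equation u_t = alpha * u_xx.

```python
import torch

# Network maps (x, t) -> u(x, t); data and collocation points below are stand-ins.
model = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
alpha = 0.1

def physics_residual(xt: torch.Tensor) -> torch.Tensor:
    """Residual of u_t - alpha * u_xx, computed with autograd."""
    xt = xt.requires_grad_(True)
    u = model(xt)
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = grads[:, :1], grads[:, 1:]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, :1]
    return u_t - alpha * u_xx

x_obs, u_obs = torch.rand(128, 2), torch.rand(128, 1)   # stand-in observations
x_col = torch.rand(512, 2)                               # collocation points for the PDE term
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    data_loss = torch.mean((model(x_obs) - u_obs) ** 2)
    physics_loss = torch.mean(physics_residual(x_col) ** 2)
    (data_loss + physics_loss).backward()
    opt.step()
```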

Today, we're announcing the Science and Engineering Teaching Kit, developed in collaboration with Brown University, which includes the Modulus framework. It will help teach the next generation of researchers the powerful fusion of AI and physics. NVIDIA NeMo is a multi-modal framework at the core of the large language model revolution. It's an end-to-end, containerized framework designed for efficient data collection, large-scale model training, and industry-standard benchmark evaluation, featuring state-of-the-art latency and throughput for large language model training and inference across diverse GPU cluster configurations. It provides advanced parallelization strategies on NVIDIA's accelerated computing infrastructure, offering high customization, an open-source modular approach, and formalized product support, ensuring stable releases and the latest research innovations.
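
As a hedged sketch of what those parallelization strategies look like in practice, the snippet below lays out Megatron-style parallelism settings of the kind a NeMo training configuration exposes. The parameter names follow common Megatron/NeMo conventions, but the layout and values are illustrative assumptions rather than a tested recipe.

```python
# Illustrative only: how a large model can be sliced across a GPU cluster.
cluster = {
    "num_nodes": 128,
    "devices_per_node": 8,                # 1,024 GPUs in total
    "precision": "bf16",
}
parallelism = {
    "tensor_model_parallel_size": 8,      # split each layer across the 8 GPUs of a node
    "pipeline_model_parallel_size": 16,   # split the layer stack into 16 pipeline stages
    "micro_batch_size": 1,
    "global_batch_size": 2048,
}

# Whatever is left over forms the data-parallel dimension.
gpus = cluster["num_nodes"] * cluster["devices_per_node"]
data_parallel_size = gpus // (parallelism["tensor_model_parallel_size"]
                              * parallelism["pipeline_model_parallel_size"])
print(f"{data_parallel_size} data-parallel replicas")   # 8
```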

BioNeMo builds upon NeMo and is NVIDIA's platform for building and deploying generative AI and foundation models for computational biology and drug discovery applications. BioNeMo provides drug discovery researchers and developers with the speed and scale to build, customize, and integrate state-of-the-art AI applications across the entire discovery workflow, from disease research and target discovery to optimizing drug candidates for efficacy, safety, and manufacturability. Now, I'll turn it back over to Ian to tell you more about AI supercomputing. Thanks, Kimberly. All of these amazing AI breakthroughs require world-class AI factories to build, deploy, and maintain scientific foundational AI models at scale.

MLPerf is the leading industry-standard benchmark designed to provide unbiased evaluations of the training and inference performance of hardware, software, and services. In this last round of MLPerf, the NVIDIA Eos AI supercomputer tripled in size, scaling to over 10,000 Hopper GPUs connected via NVIDIA Quantum-2 InfiniBand. This new and improved Eos system allowed us to deliver record-shattering large language model performance, setting six new performance records. This round also marked the introduction of text-to-image generative AI, allowing NVIDIA's platforms to set a new industry standard for Stable Diffusion training. We also worked closely with Microsoft Azure to build an AI supercomputer in the cloud that is nearly identical to our own Eos system.

Azure powers intelligence services like Copilot and ChatGPT with large language models. To create and train improved versions of these LLMs, supercomputers with massive computational capabilities are required. Together, we set a new scale record for large language model training. Using Microsoft's ND H100 v5 cloud instances equipped with over 10,000 Hopper GPUs interconnected with InfiniBand networking, we were able to triple the previous scale in just six months. This resulted in a 3x improvement in the time to train the GPT-3 175-billion-parameter model compared to the previous MLPerf round. Model sizes are rapidly expanding, demanding increased computational power.
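
For a rough sense of why that scale matters, here is a back-of-the-envelope estimate using the common approximation of 6 FLOPs per parameter per token. The token budget and sustained throughput are assumptions for illustration, not MLPerf-measured figures, and the MLPerf benchmark itself trains on a reduced token budget.

```python
# Back-of-the-envelope estimate of full GPT-3-class pretraining time.
params = 175e9                      # GPT-3 parameter count
tokens = 300e9                      # assumed training-token budget
flops_needed = 6 * params * tokens  # ~3.2e23 FLOPs

gpus = 10_000
sustained_per_gpu = 4e14            # assume ~400 TFLOPS sustained per Hopper GPU
cluster_flops = gpus * sustained_per_gpu

seconds = flops_needed / cluster_flops
print(f"~{seconds / 86_400:.1f} days")   # on the order of a day at this scale
```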

NVIDIA has consistently spearheaded the adoption and integration of cutting-edge memory standards over time. With the introduction of Volta, Ampere, and Hopper, NVIDIA led the way in pioneering GPUs integrated with the latest HBM2, HBM2e, and HBM3 memory technologies. The Hopper architecture was designed to be forward-looking and supports not just HBM2e and HBM3 but also HBM3e, ensuring readiness for the future.

This approach enables us to swiftly introduce more advanced products to the market and enhances our agility in optimizing compute performance. The H200 stands as the world's premier GPU featuring HBM3e memory, marking a significant milestone in our pursuit of maximizing computational capabilities. The H200 offers a remarkable leap in memory performance, boasting 4.8 TBps of memory bandwidth, a substantial 1.4x increase compared to the H100 GPU. The H200 also significantly expands memory capacity by nearly 1.8x, reaching a total of 141 GB per GPU.
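
Why that bandwidth matters: LLM token generation is typically memory-bound, because every generated token must stream the model weights from HBM at least once. A rough upper-bound calculation, with an assumed model size and ignoring KV-cache traffic and batching, looks like this:

```python
# Roofline-style upper bound on single-stream decode speed:
# tokens/s <= memory bandwidth / bytes of weights streamed per token.
params = 70e9                          # assume a 70B-parameter model
weight_bytes = params * 2              # FP16/BF16 weights: 140 GB, fits in H200's 141 GB

for name, bandwidth in [("H100", 3.35e12), ("H200", 4.8e12)]:   # bytes per second
    print(f"{name}: <= {bandwidth / weight_bytes:.0f} tokens/s per GPU")
```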

The integration of faster and more extensive HBM memory serves to accelerate performance across computationally demanding tasks, including generative AI models and HPC applications, while optimizing GPU utilization and efficiency. Moreover, the HGX H200 is seamlessly compatible with HGX H100 systems, allowing our partners to support the H200 with the same server systems designed for the H100, eliminating the need for redesign. Our relentless pursuit of energy-efficient performance improvements through hardware and software innovation remains a key focus. For LLM performance, we continue to optimize the software stack to extract better results from our GPUs.

The H100 is now 11x more performant than the A100 on GPT-3 inference, and we're not stopping there: the H200, measured today, is 18x more performant than the A100 on the same GPT-3 workload. This is just the beginning, and we're actively enhancing software optimizations for Hopper, promising continued performance gains for both the H100 and H200 in the coming months. Our roadmap will continue to drive innovation, pushing the boundaries of performance and efficiency. Our leading OEM and CSP partners are working to make HGX H200 systems available everywhere.

Even applications that have adopted accelerated computing often have large portions that remain CPU-limited, either because the cost of communication to the GPU is too high or because refactoring the vast amount of code still running on CPUs hasn't been taken on. The Grace Hopper Superchip offers a first-of-its-kind NVLink chip-to-chip interconnect so that both the CPU and the GPU have coherent access to 624 GB of high-speed memory. This capability bridges the gap for legacy CPU applications and makes accelerating HPC in ISO-standard languages truly possible. Grace Hopper is nearly 2x more energy efficient than x86-plus-H100 configurations.

Depending on the needs of the workload, Grace Hopper can dynamically share power between the CPU and GPU to optimize application performance, making it an excellent choice for energy-efficient HPC centers. The Grace Hopper GH200 Superchip is designed to provide incredible compute capability for the most demanding generative AI and HPC applications, such as AI chatbots, vector databases, graph neural networks, and scientific simulations. Retrieval-augmented generation, or RAG, leverages external files or documents to enhance generative AI model accuracy; this technique is 11x more energy efficient than fine-tuning methods. A single GH200 using TensorRT-LLM is 100x faster than a dual-socket x86 CPU system. The GH200 platform also enables low-latency coupling of quantum computers for error correction and future quantum-accelerated supercomputers.
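
To show what retrieval-augmented generation means in practice, here is a minimal, self-contained sketch of the retrieve-then-prompt pattern. The hashing embedding is a toy stand-in for a real embedding model, and none of this reflects TensorRT-LLM's API.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words hashing embedding (stand-in for a real embedding model)."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def rag_prompt(question: str, documents: list[str], top_k: int = 2) -> str:
    """Retrieve the most relevant documents and prepend them to the LLM prompt."""
    sims = np.array([embed(d) @ embed(question) for d in documents])
    context = "\n".join(documents[i] for i in np.argsort(sims)[::-1][:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = ["GH200 couples a Grace CPU with a Hopper GPU over NVLink-C2C.",
        "HBM3e raises memory bandwidth to 4.8 TBps on H200.",
        "MLPerf is an industry-standard AI benchmark suite."]
print(rag_prompt("What memory bandwidth does H200 provide?", docs))
```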

The quantum Fourier transform operation harnesses the GH200's high-bandwidth memory and compute capabilities to achieve a speedup of more than 90x compared to dual-socket x86 systems. MILC, a popular quantum chromodynamics application used for studying particle physics, delivered 40x more performance on a single Grace Hopper Superchip than on state-of-the-art dual-CPU servers. Lastly, ICON, like many climate and weather applications, sped up 8x over non-accelerated systems with Grace Hopper. We have partners and customers excited to get their hands on GH200 to take advantage of its transformational performance and energy efficiency. Today we've announced that Grace Hopper is coming to dozens of OEM system partners, including Dell Technologies, Eviden, HPE, Lenovo, Quanta, and Supermicro. Today, Lambda and Vultr also announced early access to GH200-powered cloud instances.

CoreWeave announced plans to make GH200 available starting in Q1 2024. Grace Hopper early-access systems have been purchased by over 50 global enterprises and organizations, including NASA Ames Research Center and TotalEnergies. The GH200 Superchip will also be accessible via NVIDIA LaunchPad next month, providing early access to NVIDIA's GH200 hardware and software online.

With the introduction of Grace Hopper, a new wave of supercomputers, AI supercomputers, is emerging. If we look back over time, we can see the explosive growth of AI. In 2017, the first GPU-accelerated AI supercomputers, TSUBAME 3.0 and Piz Daint, came online. Sierra and Summit, powered by a combined 45,000 NVIDIA V100 GPUs, were the first exascale AI supercomputers, delivering a combined peak of seven exaflops of AI.

With the release of the Ampere A100 GPU, AI and deep learning were clearly recognized as tools for science that would revolutionize scientific computing. Systems like Perlmutter, Leonardo, and the JUWELS Booster added another 20 exaflops of AI performance for the scientific community. Today, Grace Hopper is powering the next wave of exaflop AI systems around the globe.

New systems are being built as we speak, including the Alps system at CSCS, Venado at Los Alamos National Laboratory, Vista at TACC, Isambard-AI at Bristol, and Jupiter at Jülich. By the end of 2024, an additional 200 exaflops of AI, all powered by Grace Hopper, will be brought online for the supercomputing community to enjoy. Let's take a closer look at some of the systems coming online next year that will define this new class of Grace Hopper exascale AI supercomputers. First, we'll get an update on the Alps system from Thomas Schulthess, the director of the Swiss National Supercomputing Centre. The Alps infrastructure that we started to install in 2020 is expected to be available for research early next year.

We've already stood up the first Grace Hopper-based system with HPE at the Chippewa Falls facility, and testing has commenced. Now we are looking forward to the next extension of Alps with thousands of Grace Hopper Superchips. I am very confident that Alps will make major contributions to scientific advancement. Foundational models will be leveraged and trained to support verticals in weather and climate modeling, medicine, robotics, and many more.

While developing Alps, we have been collaborating with MeteoSwiss, ECMWF, and scientists from ETH Zurich's EXCLAIM and NVIDIA's Earth-2 projects to create an infrastructure that will push the envelope in all dimensions of big data analytics and extreme-scale computing. We are also working very closely with HPE and Los Alamos National Laboratory to deliver the Venado system, the first GH200 AI supercomputer to be deployed in the United States. Today, we're also announcing that TACC, the Texas Advanced Computing Center, has selected NVIDIA Grace CPUs and Grace Hopper Superchips to power its Vista system.

Dan Stanzione, Executive Director at TACC, will tell us more about Vista. Vista is the latest in our long line of National Science Foundation-funded open science systems. It is roughly half NVIDIA Grace Hopper nodes, with a CPU and a GPU tightly integrated in each node, and roughly half NVIDIA Grace-Grace nodes, with two Arm-based CPUs in a single node for our CPU-only users. We're using the latest generation of InfiniBand technology to link the nodes together, at 200 Gbps between the CPU nodes and 400 Gbps between the GPU nodes, so we have a pretty tightly integrated solution here: InfiniBand, CPU, GPU, all running a common software stack, with very low-latency integration between all the components.

There are also really impressive energy-efficiency gains when we look across our stacks. Given the maturity of the GPU software stack and the wide variety of tools out there, proven in the scientific computing space with CUDA over the last 15 or 20 years, and now with the proliferation of AI tools, it makes sense to take a step in this direction and let our user base experience it. Vista we really see as a bridge between our current Frontera system and our next really large-scale deployment, currently codenamed Horizon, which will follow it.

We hope that Horizon is really a direct follow-on to Vista with the next generation of GPUs, Arm CPUs, and InfiniBand from NVIDIA. So this is really the stepping stone to move users from the kinds of systems we've done in the past to this new, tightly coupled Grace Arm CPU and Hopper GPU combination. And we're looking to scale that out by a factor of ten or fifteen from what we're deploying with Vista when we deploy Horizon in a couple of years, Congress permitting. We look forward to working closely with Dell Technologies and the team at TACC to bring Vista online next year.

In September, Thomas Lippert, the head of the Jülich Supercomputing Centre, announced that its next supercomputer, Jupiter, will be built on technologies from Eviden, NVIDIA, ParTec, and SiPearl. Jupiter will be the world's most powerful AI supercomputer, powered by nearly 24,000 GH200 Superchips, all interconnected via NVIDIA InfiniBand. Jupiter is a 90-exaflop AI supercomputer, 45x more than Jülich's previous JUWELS Booster system, and it will deliver 1 exaflop of performance on HPC applications as well. Next, we'll hear from Kristel Michielsen, who leads Jülich's research group on Quantum Information Processing. She'll tell us more about how Jupiter will be used to power scientific innovation.

Jupiter is a new class of supercomputer, a system designed for AI and simulation. It's the world's strongest compute booster, and it is advancing research into foundational models such as physics-ML, large language, and diffusion models. It will revolutionize scientific research across climate, materials, drug discovery, and quantum computing. Jupiter's architecture also allows for the seamless integration of quantum algorithms alongside parallel HPC algorithms, and this is mandatory for effective quantum-HPC hybrid simulations.

The Jupiter system will be capable of simulating up to 50 perfect qubits, which we also call ideal qubits, using state-vector simulations. Collaboration is really at the heart of our strategy to explore Jupiter's extraordinary performance capabilities in full. Together with NVIDIA, Eviden, and ParTec, we are embedded in a perfectly complementary partnership, and we see this as crucial to the success of our very ambitious project.
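
As rough context for that 50-qubit figure, a full state-vector simulation stores 2^n complex amplitudes, so memory capacity rather than raw compute is the binding constraint. A quick estimate, assuming single-precision complex amplitudes and roughly 600 GB of usable memory per GH200 (both assumptions for illustration):

```python
bytes_per_amplitude = 8          # complex64 amplitudes
usable_per_superchip = 600e9     # assumed usable bytes per GH200

for n in (40, 45, 50):
    total = (2 ** n) * bytes_per_amplitude
    chips = total / usable_per_superchip
    print(f"{n} qubits: {total / 1e12:,.0f} TB "
          f"(~{chips:,.0f} GH200 superchips just to hold the state)")
```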

Jupiter will feature a new quad NVIDIA GH200 Superchip configuration. The quad GH200 features an innovative node architecture with a total of 288 Arm Neoverse cores, capable of achieving 16 petaflops of AI performance, along with 2.3 TBps of high-speed memory access. Each GH200 in the four-way system is connected via the high-speed NVIDIA NVLink chip-to-chip interconnect, providing a fully coherent architecture. We are thrilled to announce that NVIDIA will partner with Eviden to deliver a GH200-powered AI supercomputer based on Eviden's BullSequana XH3000 liquid-cooled architecture. Today we're also announcing that the HPE Cray EX2500 system with the same quad GH200 architecture will power many of the first AI supercomputers coming online next year.

Today, we've covered a lot of exciting news. Continued full-stack innovation in accelerated computing is paving the path to sustainable computing. We announced the NVIDIA HGX H200, the world's leading AI computing platform. The combination of faster and larger HBM memory accelerates the performance of computationally intensive workloads, like generative AI and HPC applications.

And it's not just about the hardware. NVIDIA is delivering software solutions to power the workloads of the modern supercomputer. This includes solutions like cuQuantum and CUDA Quantum, the foundation of today's quantum computing ecosystem. NVIDIA Modulus, NeMo, and BioNeMo are enabling researchers to develop and deploy foundational AI models for science. These hardware and software innovations are creating a new class of AI supercomputers.

The new Grace Hopper AI supercomputers coming online next year will deliver an additional 200 exaflops of AI performance for this community. As we navigate this journey, these advancements provide not just amazing technology but a more sustainable and impactful future. Thank you for joining us today, and I look forward to seeing all of you at Supercomputing 2023.

2023-11-14
