Amazon EC2 P5 Instances

Highest performance GPU-based instances for deep learning and HPC applications

Why Amazon EC2 P5 Instances?

Amazon Elastic Compute Cloud (Amazon EC2) P5 instances, powered by NVIDIA H100 Tensor Core GPUs, and P5e and P5en instances, powered by NVIDIA H200 Tensor Core GPUs, deliver the highest performance in Amazon EC2 for deep learning (DL) and high performance computing (HPC) applications. They help you accelerate your time to solution by up to 4x compared to previous-generation GPU-based EC2 instances, and reduce the cost of training machine learning (ML) models by up to 40%. These instances help you iterate on your solutions at a faster pace and get to market more quickly. You can use P5, P5e, and P5en instances for training and deploying increasingly complex large language models (LLMs) and diffusion models powering the most demanding generative artificial intelligence (AI) applications. These applications include question answering, code generation, video and image generation, and speech recognition. You can also use these instances to deploy demanding HPC applications at scale for pharmaceutical discovery, seismic analysis, weather forecasting, and financial modeling.

To deliver these performance improvements and cost savings, P5 and P5e instances complement NVIDIA H100 and H200 Tensor Core GPUs with 2x higher CPU performance, 2x higher system memory, and 4x higher local storage compared to previous-generation GPU-based instances. P5en instances pair NVIDIA H200 Tensor Core GPUs with high-performance Intel Sapphire Rapids CPUs, enabling Gen5 PCIe between CPU and GPU. P5en instances provide up to 2x the bandwidth between CPU and GPU and lower network latency compared to P5e and P5 instances, thereby improving distributed training performance. P5 and P5e instances provide up to 3,200 Gbps of networking using second-generation Elastic Fabric Adapter (EFA). P5en instances, which use third-generation EFA with Nitro v5, show up to 35% improvement in latency compared to P5 instances, which use the previous generation of EFA and Nitro. This helps improve collective communications performance for distributed workloads such as deep learning, generative AI, real-time data processing, and HPC applications. To deliver large-scale compute at low latency, these instances are deployed in Amazon EC2 UltraClusters that enable scaling up to 20,000 H100 or H200 GPUs interconnected with a petabit-scale nonblocking network. P5, P5e, and P5en instances in EC2 UltraClusters can deliver up to 20 exaflops of aggregate compute capability, performance equivalent to a supercomputer.
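
The UltraCluster figures quoted above imply some simple sizing arithmetic. The Python sketch below derives how many 8-GPU instances make up a 20,000-GPU cluster and the average share of the aggregate compute per GPU. The function and constant names are illustrative assumptions, and the numeric precision behind the 20 exaflops figure is not stated here, so the per-GPU value is only an average implied by the quoted numbers.

```python
GPUS_PER_INSTANCE = 8        # up to 8 GPUs per P5, P5e, or P5en instance
MAX_CLUSTER_GPUS = 20_000    # quoted UltraCluster scale
AGGREGATE_EXAFLOPS = 20      # quoted aggregate compute (precision not stated)


def instances_for_gpus(gpu_count, gpus_per_instance=GPUS_PER_INSTANCE):
    """Return the number of instances needed to supply gpu_count GPUs."""
    # Ceiling division: a partially used instance still counts as one instance.
    return -(-gpu_count // gpus_per_instance)


def implied_petaflops_per_gpu(exaflops=AGGREGATE_EXAFLOPS, gpus=MAX_CLUSTER_GPUS):
    """Average petaflops per GPU implied by the aggregate figure."""
    return exaflops * 1000 / gpus  # 1 exaflop = 1,000 petaflops


if __name__ == "__main__":
    print(instances_for_gpus(MAX_CLUSTER_GPUS))   # 2500
    print(implied_petaflops_per_gpu())            # 1.0
```

In other words, a full-scale cluster corresponds to roughly 2,500 instances, and the aggregate figure works out to about 1 petaflop per GPU on average.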

Benefits

P5, P5e, and P5en instances can train ultra-large generative AI models at scale and deliver up to 4x the performance of previous-generation GPU-based EC2 instances.

P5, P5e, and P5en instances reduce training times and time to solution from weeks to just a few days. This helps you iterate at a faster pace and get to market more quickly.

P5, P5e, and P5en instances deliver up to 40% savings on DL training and HPC infrastructure costs compared to previous-generation GPU-based EC2 instances.

P5, P5e, and P5en instances provide up to 3,200 Gbps of EFA networking. These instances are deployed in EC2 UltraClusters and deliver up to 20 exaflops of aggregate compute capability.

Features

P5 instances provide up to 8 NVIDIA H100 GPUs with a total of up to 640 GB HBM3 GPU memory per instance. P5e and P5en instances provide up to 8 NVIDIA H200 GPUs with a total of up to 1,128 GB HBM3e GPU memory per instance. All three instance types support up to 900 GB/s of NVSwitch GPU interconnect (a total of 3.6 TB/s bisection bandwidth in each instance), so each GPU can communicate with every other GPU in the same instance with single-hop latency.
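
The per-instance totals above can be broken down per GPU. The sketch below divides the total HBM by the GPU count and shows one common way to arrive at the 3.6 TB/s bisection figure: split the 8 GPUs into two halves of 4 and let each GPU in one half send at its full 900 GB/s to the other half. This reading of the bisection number is an assumption, as are the function names.

```python
GPUS_PER_INSTANCE = 8
NVSWITCH_GB_PER_S = 900   # per-GPU NVSwitch interconnect bandwidth

# Total GPU memory per instance, in GB, as quoted in the text.
TOTAL_HBM_GB = {"P5 (H100)": 640, "P5e/P5en (H200)": 1128}


def per_gpu_memory_gb(total_gb, gpus=GPUS_PER_INSTANCE):
    """Memory available on each GPU when the total is split evenly."""
    return total_gb / gpus


def bisection_bandwidth_tb_per_s(gpus=GPUS_PER_INSTANCE, per_gpu=NVSWITCH_GB_PER_S):
    """Bandwidth across a cut that splits the GPUs into two equal halves."""
    return (gpus // 2) * per_gpu / 1000  # GB/s -> TB/s


if __name__ == "__main__":
    for name, total in TOTAL_HBM_GB.items():
        print(name, per_gpu_memory_gb(total), "GB per GPU")
    print(bisection_bandwidth_tb_per_s(), "TB/s")
```

Under these assumptions each H100 contributes 80 GB and each H200 contributes 141 GB of GPU memory.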

NVIDIA H100 and H200 GPUs have a new transformer engine that intelligently manages and dynamically chooses between FP8 and 16-bit calculations. This feature helps deliver faster DL training on LLMs compared to previous-generation A100 GPUs. For HPC workloads, NVIDIA H100 and H200 GPUs have new DPX instructions that further accelerate dynamic programming algorithms compared to A100 GPUs.

P5, P5e, and P5en instances deliver up to 3,200 Gbps of EFA networking. EFA is also coupled with NVIDIA GPUDirect RDMA to enable low-latency GPU-to-GPU communication between servers with operating system bypass.
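
Network bandwidth is quoted in gigabits per second, while GPU memory and NVSwitch bandwidth are quoted in gigabytes. The sketch below converts between the two and estimates an idealized transfer time at line rate. It ignores protocol overhead, congestion, and collective-algorithm effects, so real transfers will be slower; the function names are illustrative.

```python
def gbps_to_gb_per_s(gbps):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def ideal_transfer_seconds(payload_gb, link_gbps):
    """Lower bound on the time to move payload_gb over a link at full line rate."""
    return payload_gb / gbps_to_gb_per_s(link_gbps)


if __name__ == "__main__":
    efa_gbps = 3200  # quoted EFA networking bandwidth per instance
    print(gbps_to_gb_per_s(efa_gbps))             # 400.0 GB/s
    # Time to move the full 640 GB of P5 GPU memory off the instance, best case.
    print(ideal_transfer_seconds(640, efa_gbps))  # 1.6 seconds
```

At 400 GB/s, the instance-level network bandwidth is a bit under half of the 900 GB/s that a single GPU gets over NVSwitch inside the instance.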

P5, P5e, and P5en instances support Amazon FSx for Lustre file systems, so you can access data at the hundreds of GB/s of throughput and millions of IOPS required for large-scale DL and HPC workloads. Each instance also supports up to 30 TB of local NVMe SSD storage for fast access to large datasets. You can also use virtually unlimited cost-effective storage with Amazon Simple Storage Service (Amazon S3).

Customer testimonials

Here are some examples of how customers and partners have achieved their business goals with Amazon EC2 P4 instances.

  • Anthropic

    Anthropic builds reliable, interpretable, and steerable AI systems that will have many opportunities to create value commercially and for public benefit.

    At Anthropic, we are working to build reliable, interpretable, and steerable AI systems. While the large general AI systems of today can have significant benefits, they can also be unpredictable, unreliable, and opaque. Our goal is to make progress on these issues and deploy systems that people find useful. Our organization is one of the few in the world that is building foundational models in DL research. These models are highly complex, and to develop and train these cutting-edge models, we need to distribute them efficiently across large clusters of GPUs. We are using Amazon EC2 P4 instances extensively today, and we are excited about the launch of P5 instances. We expect them to deliver substantial price-performance benefits over P4d instances, and they'll be available at the massive scale required for building next-generation LLMs and related products.

    Tom Brown, Cofounder, Anthropic
  • Cohere

    Cohere, a leading pioneer in language AI, empowers every developer and enterprise to build incredible products with world-leading natural language processing (NLP) technology while keeping their data private and secure.

    Cohere leads the charge in helping every enterprise harness the power of language AI to explore, generate, search for, and act upon information in a natural and intuitive manner, deploying across multiple cloud platforms in the data environment that works best for each customer. NVIDIA H100-powered Amazon EC2 P5 instances will unleash the ability of businesses to create, grow, and scale faster with its computing power combined with Cohere's state-of-the-art LLM and generative AI capabilities.

    Aidan Gomez, CEO, Cohere
  • Hugging Face

    Hugging Face is on a mission to democratize good ML.

    As the fastest-growing open-source community for ML, we now provide over 150,000 pretrained models and 25,000 datasets on our platform for NLP, computer vision, biology, reinforcement learning, and more. With significant advances in LLMs and generative AI, we're working with AWS to build and contribute the open-source models of tomorrow. We're looking forward to using Amazon EC2 P5 instances via Amazon SageMaker at scale in UltraClusters with EFA to accelerate the delivery of new foundation AI models for everyone.

    Julien Chaumond, CTO and Cofounder, Hugging Face

Product details

| Instance Size | vCPUs | Instance Memory (TiB) | GPU | GPU Memory | Network Bandwidth | GPUDirect RDMA | GPU Peer to Peer | Instance Storage (TB) | EBS Bandwidth (Gbps) |
|---|---|---|---|---|---|---|---|---|---|
| p5.48xlarge | 192 | 2 | 8 H100 | 640 GB HBM3 | 3200 Gbps EFA | Yes | 900 GB/s NVSwitch | 8 x 3.84 NVMe SSD | 80 |
| p5e.48xlarge | 192 | 2 | 8 H200 | 1128 GB HBM3e | 3200 Gbps EFA | Yes | 900 GB/s NVSwitch | 8 x 3.84 NVMe SSD | 80 |
| p5en.48xlarge | 192 | 2 | 8 H200 | 1128 GB HBM3e | 3200 Gbps EFA | Yes | 900 GB/s NVSwitch | 8 x 3.84 NVMe SSD | 100 |
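
For capacity planning, the product details above can be captured as plain records. The sketch below is one possible representation, with hypothetical class and function names; the values are copied from the product details. It derives the total local NVMe capacity per instance and filters instance sizes by a minimum GPU memory requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    name: str
    gpu_model: str
    gpu_count: int
    gpu_memory_gb: int     # total GPU memory per instance
    nvme_drives: int
    nvme_drive_tb: float   # capacity of each local NVMe SSD
    ebs_gbps: int

    @property
    def local_storage_tb(self):
        """Total local NVMe capacity across all drives."""
        return self.nvme_drives * self.nvme_drive_tb


SPECS = [
    InstanceSpec("p5.48xlarge", "H100", 8, 640, 8, 3.84, 80),
    InstanceSpec("p5e.48xlarge", "H200", 8, 1128, 8, 3.84, 80),
    InstanceSpec("p5en.48xlarge", "H200", 8, 1128, 8, 3.84, 100),
]


def with_min_gpu_memory(specs, min_gb):
    """Names of the instance sizes whose total GPU memory is at least min_gb."""
    return [s.name for s in specs if s.gpu_memory_gb >= min_gb]


if __name__ == "__main__":
    for spec in SPECS:
        print(spec.name, round(spec.local_storage_tb, 2), "TB local NVMe")
    print(with_min_gpu_memory(SPECS, 1000))
```

Eight 3.84 TB drives give 30.72 TB per instance, which corresponds to the "up to 30 TB" of local NVMe storage listed under Features.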

Getting started with ML use cases

Amazon SageMaker is a fully managed service for building, training, and deploying ML models. With SageMaker HyperPod, you can more easily scale to tens, hundreds, or thousands of GPUs to train a model quickly at any scale without worrying about setting up and managing resilient training clusters.

AWS Deep Learning AMIs (DLAMI) provide ML practitioners and researchers with the infrastructure and tools to accelerate DL in the cloud, at any scale. Deep Learning Containers are Docker images preinstalled with DL frameworks that streamline the deployment of custom ML environments by letting you skip the complicated process of building and optimizing your environments from scratch.

If you prefer to manage your own containerized workloads through container orchestration services, you can deploy P5, P5e, and P5en instances with Amazon EKS or Amazon ECS.

Getting started with HPC use cases

P5, P5e, and P5en instances are an ideal platform to run engineering simulations, computational finance, seismic analysis, molecular modeling, genomics, rendering, and other GPU-based HPC workloads. HPC applications often require high network performance, fast storage, large amounts of memory, high compute capabilities, or all of the above. All three instance types support EFA, which enables HPC applications using the Message Passing Interface (MPI) to scale to thousands of GPUs. AWS Batch and AWS ParallelCluster help HPC developers quickly build and scale distributed HPC applications.
