AWS Trainium Customers

See how customers are using AWS Trainium to build, train, and fine-tune deep learning models.
  • Anthropic

    At Anthropic, millions of people rely on Claude daily for their work. We're announcing two major advances with AWS: First, a new "latency-optimized mode" for Claude 3.5 Haiku which runs 60% faster on Trainium2 via Amazon Bedrock. And second, Project Rainier—a new cluster with hundreds of thousands of Trainium2 chips delivering hundreds of exaflops, which is over five times the size of our previous cluster. Project Rainier will help power both our research and our next generation of scaling. For our customers, this means more intelligence, lower prices, and faster speeds. We're not just building faster AI, we're building trustworthy AI that scales.

    Tom Brown, Chief Compute Officer at Anthropic
  • Databricks

    Databricks’ Mosaic AI enables organizations to build and deploy quality Agent Systems. It is built natively on top of the data lakehouse, enabling customers to easily and securely customize their models with enterprise data and deliver more accurate and domain-specific outputs. Thanks to Trainium's high performance and cost-effectiveness, customers can scale model training on Mosaic AI at a low cost. Trainium2’s availability will be a major benefit to Databricks and its customers as demand for Mosaic AI continues to scale across all customer segments and around the world. Databricks, one of the largest data and AI companies in the world, plans to use Trn2 to deliver better results and lower TCO by up to 30% for its customers.

    Naveen Rao, VP of Generative AI at Databricks
  • poolside

    At poolside, we are set on building a world where AI will drive the majority of economically valuable work and scientific progress. We believe that software development will be the first major capability in neural networks to reach human-level intelligence, because it's the domain where we can best combine Search and Learning approaches. To enable that, we're building foundation models, an API, and an Assistant to bring the power of generative AI to developers' hands (or keyboards). A key enabler of this technology is the infrastructure we use to build and run our products. With AWS Trainium2, our customers will be able to scale their usage of poolside at a price-performance ratio unlike other AI accelerators. In addition, we plan to train future models on Trainium2 UltraServers, with expected savings of 40% compared to EC2 P5 instances.

    Eiso Kant, CTO & Co-founder, poolside
  • Itaú Unibanco

    Itaú Unibanco's purpose is to improve people's relationship with money, creating a positive impact on their lives while expanding their opportunities for transformation. At Itaú Unibanco, we believe that each customer is unique, and we focus on meeting their needs through intuitive digital journeys that leverage the power of AI to constantly adapt to their consumer habits.

    We have tested AWS Trainium and Inferentia across various tasks, ranging from standard inference to fine-tuned applications. The performance of these AI chips has enabled us to achieve significant milestones in our research and development. For both batch and online inference tasks, we have seen a 7x improvement in throughput compared to GPUs. This enhanced performance is driving the expansion of more use cases across the organization. The latest generation of Trainium2 chips unlocks groundbreaking features for GenAI and opens the door for innovation at Itaú.

    Vitor Azeka, Head of Data Science at Itaú Unibanco
  • NinjaTech AI

    Ninja is an All-In-One AI Agent for Unlimited Productivity: one simple subscription, unlimited access to the world’s best AI models, along with top AI skills such as writing, coding, brainstorming, image generation, and online research. Ninja is an agentic platform and offers “SuperAgent,” which uses a mixture-of-agents approach with world-class accuracy comparable to (and in some categories beating) frontier foundation models. Ninja’s agentic technology demands the highest-performance accelerators to deliver the unique real-time experiences our customers expect.

    We are extremely excited for the launch of AWS Trn2 because we believe it’ll offer the best cost-per-token performance and the fastest speeds currently possible for our core model, Ninja LLM, which is based on Llama 3.1 405B. It’s amazing to see Trn2’s low latency coupled with competitive pricing and on-demand availability; we couldn’t be more excited about Trn2’s arrival!

    Babak Pahlavan, Founder & CEO, NinjaTech AI
  • Ricoh

    The RICOH machine learning team develops workplace solutions and digital transformation services designed to manage and optimize the flow of information across our enterprise solutions.

    The migration to Trn1 instances was easy and straightforward. We were able to pretrain our 13B-parameter LLM in just 8 days, utilizing a cluster of 4,096 Trainium chips! After the success we saw with our smaller model, we fine-tuned a new, larger LLM based on Llama-3-Swallow-70B, and leveraging Trainium we were able to reduce our training costs by 50% and improve energy efficiency by 25% compared to using the latest GPU machines in AWS. We are excited to leverage Trainium2, the latest generation of AWS AI chips, to continue to provide our customers with the best performance at the lowest cost.

    Yoshiaki Umetsu, Director, Digital Technology Development Center, Ricoh
  • PyTorch

    What I liked most about the AWS Neuron NxD Inference library is how seamlessly it integrates with PyTorch models. NxD's approach is straightforward and user-friendly. Our team was able to onboard Hugging Face PyTorch models with minimal code changes in a short time frame. Enabling advanced features like Continuous Batching and Speculative Decoding was straightforward. This ease of use enhances developer productivity, allowing teams to focus more on innovation and less on integration challenges.

    Hamid Shojanazeri, PyTorch Partner Engineering Lead, Meta
  • Refact.ai

    Refact.ai offers comprehensive AI tools, such as code auto-completion powered by Retrieval-Augmented Generation (RAG) for more accurate suggestions, and context-aware chat using both proprietary and open-source models.

    Customers have seen up to 20% higher performance and 1.5x more tokens per dollar with EC2 Inf2 instances compared to EC2 G5 instances. Refact.ai’s fine-tuning capabilities further enhance our customers’ ability to understand and adapt to their organizations’ unique codebase and environment. We are also excited to offer the capabilities of Trainium2, which will bring even faster, more efficient processing to our workflows. This advanced technology will enable our customers to accelerate their software development process by boosting developer productivity while maintaining strict security standards for their codebase.

    Oleg Klimov, CEO & Founder, Refact.ai
  • Karakuri Inc.

    KARAKURI builds AI tools to improve the efficiency of web-based customer support and simplify customer experiences. These tools include AI chatbots equipped with generative AI functions, an FAQ centralization tool, and an email response tool, all of which improve the efficiency and quality of customer support. Utilizing AWS Trainium, we succeeded in training KARAKURI LM 8x7B Chat v0.1. For startups like ourselves, it is essential to optimize the time and cost required to build and train LLMs. With the support of AWS Trainium and the AWS team, we were able to develop a practical-level LLM in a short period of time. Also, by adopting AWS Inferentia, we were able to build a fast and cost-effective inference service. We're energized about Trainium2 because it will revolutionize our training process, reducing our training time by 2x and driving efficiency to new heights!

    Tomofumi Nakayama, Co-Founder, Karakuri Inc.
  • Stockmark Inc.

    With the mission of “reinventing the mechanism of value creation and advancing humanity,” Stockmark helps many companies create and build innovative businesses by providing cutting-edge natural language processing technology. Stockmark’s new data analysis and gathering service, Anews, and SAT, a data structuring service that dramatically improves generative AI use by organizing all forms of information stored in an organization, required us to rethink how we built and deployed models to support these products. With 256 Trainium accelerators, we developed and released stockmark-13b, a large language model with 13 billion parameters, pre-trained from scratch on a Japanese corpus of 220B tokens. Trn1 instances helped us reduce our training costs by 20%. Leveraging Trainium, we successfully developed an LLM that can answer business-critical questions for professionals with unprecedented accuracy and speed. This achievement is particularly noteworthy given the widespread challenge companies face in securing adequate computational resources for model development. With the impressive speed and cost reduction of Trn1 instances, we are excited to see the additional benefits that Trainium2 will bring to our workflows and customers.

    Kosuke Arima, CTO and Co-founder, Stockmark Inc.
  • Brave

    Brave is an independent browser and search engine dedicated to prioritizing user privacy and security. With over 70 million users, we deliver industry-leading protections that make the Web safer and more user-friendly. Unlike other platforms that have shifted away from user-centric approaches, Brave remains committed to putting privacy, security, and convenience first. Key features include blocking harmful scripts and trackers, AI-assisted page summaries powered by LLMs, built-in VPN services, and more. We continually strive to enhance the speed and cost-efficiency of our search services and AI models. To support this, we’re excited to leverage the latest capabilities of AWS AI chips, including Trainium2, to improve user experience as we scale to handle billions of search queries monthly.

    Subu Sathyanarayana, VP of Engineering, Brave Software
  • Anyscale

    Anyscale is the company behind Ray, an AI compute engine that fuels ML and generative AI initiatives for enterprises. With Anyscale's unified AI platform driven by RayTurbo, customers see up to 4.5x faster data processing, 10x lower-cost batch inference with LLMs, 5x faster scaling, 12x faster iteration, and cost savings of 50% for online model inference by optimizing utilization of resources.

    At Anyscale, we’re committed to empowering enterprises with the best tools to scale AI workloads efficiently and cost-effectively. With native support for AWS Trainium and Inferentia chips, powered by our RayTurbo runtime, our customers have access to high-performing, cost-effective options for model training and serving. We are now excited to join forces with AWS on Trainium2, unlocking new opportunities for our customers to innovate rapidly and deliver high-performing, transformative AI experiences at scale.

    Robert Nishihara, Co-founder, Anyscale
  • Datadog

    Datadog, the observability and security platform for cloud applications, provides AWS Trainium and Inferentia Monitoring for customers to optimize model performance, improve efficiency, and reduce costs. Datadog’s integration provides full visibility into ML operations and underlying chip performance, enabling proactive issue resolution and seamless infrastructure scaling. We're excited to extend our partnership with AWS for the AWS Trainium2 launch, which helps users cut AI infrastructure costs by up to 50% and boost model training and deployment performance.

    Yrieix Garnier, VP of Product, Datadog
  • Hugging Face

    Hugging Face is the leading open platform for AI builders, with over 2 million models, datasets, and AI applications shared by a community of more than 5 million researchers, data scientists, machine learning engineers, and software developers. We have been collaborating with AWS over the last couple of years, making it easier for developers to experience the performance and cost benefits of AWS Inferentia and Trainium through the Optimum Neuron open-source library, integrated in Hugging Face Inference Endpoints, and now optimized within our new HUGS self-deployment service, available on the AWS Marketplace. With the launch of Trainium2, our users will have access to even higher performance to develop and deploy models faster.

    Jeff Boudier, Head of Product, Hugging Face
  • Lightning AI

    Lightning AI, the creator of PyTorch Lightning and Lightning Studios, offers the most intuitive, all-in-one AI development platform for enterprise-grade AI. Lightning provides full-code, low-code, and no-code tools to build agents, AI applications, and generative AI solutions, Lightning fast. Designed for flexibility, it runs seamlessly on your cloud or ours, leveraging the expertise and support of a 3M+ strong developer community.

    Lightning now natively supports AWS AI chips, Trainium and Inferentia, which are integrated across Lightning Studios and our open-source tools like PyTorch Lightning, Fabric, and LitServe. This gives users the seamless ability to pretrain, fine-tune, and deploy at scale with zero switching overhead, together with the performance and cost benefits of AWS AI chips, including the latest generation of Trainium2 chips, which deliver higher performance at lower cost.

    Luca Antiga, CTO, Lightning AI
  • Domino Data Lab

    Domino orchestrates all data science artifacts, including infrastructure, data, and services on AWS across environments, complementing Amazon SageMaker with governance and collaboration capabilities to support enterprise data science teams. Domino is available via AWS Marketplace as SaaS or self-managed.

    Leading enterprises must balance technical complexity, costs, and governance, mastering expansive AI options for a competitive advantage. At Domino, we're committed to giving customers access to cutting-edge technologies. With compute as a bottleneck for so much groundbreaking innovation, we're proud to give customers access to Trainium2 so they can train and deploy models with higher performance, lower cost, and better energy efficiency.

    Nick Elprin, CEO and Co-Founder, Domino Data Lab
  • Scale AI

    Scale is accelerating the development of AI applications. With Scale Gen AI solutions, we help enterprises accelerate generative AI adoption and increase ROI by generating high-quality data and providing technology solutions that allow our customers to build, deploy, and evaluate the best AI tools and applications. Earlier this year, Scale partnered with AWS to be their first model customization and evaluation partner. As we help our customers accelerate their AI roadmaps to build Gen AI solutions, we will offer AWS Trainium and Inferentia to reduce training and deployment costs for their open-source models. We are excited to see AWS Trainium2 bring even greater cost savings.

    Vijay Karunamurthy, Field CTO, Scale AI
  • Money Forward, Inc.

    Money Forward, Inc. serves businesses and individuals with an open and fair financial platform.

    We launched a large-scale AI chatbot service on Amazon EC2 Inf1 instances and reduced our inference latency by 97% over comparable GPU-based instances, while also reducing costs. As we keep fine-tuning tailored NLP models periodically, reducing model training times and costs is also important. Based on our experience with the successful migration of inference workloads to Inf1 instances and our initial work on AWS Trainium-based EC2 Trn1 instances, we expect Trn1 instances to provide additional value in improving end-to-end ML performance and cost.

    Takuya Nakade, CTO, Money Forward, Inc.
  • Mimecast

    At Mimecast, we process around 1.4 billion emails every day and analyze them for potential risk. It’s a crucial task, and it’s vital we deliver safe emails, free of risk and without delay. Our customers span more than 100 countries, and on average, each organization uses 4.9 Mimecast services. The platform includes advanced email security, collaboration security, email archiving, DMARC, insider risk protection, and security awareness with a human-centric approach. We don’t want to sacrifice accuracy, so we built our models in house to achieve precision and recall levels well above 90%. Based on these requirements, Inferentia2 instances were the most appropriate way forward. Inferentia2’s exceptional efficiency allows us to achieve remarkable latency, delivering real-time experiences for our customers. AWS AI chips combined with SageMaker make it very easy to scale horizontally to meet our real-time demand, and we use a custom scheduled scaling policy to scale up to hundreds of instances at peak hours with nearly zero latency overhead.

    Felix Laumann, Director of Data Science, Mimecast
  • JAX (Google)

    AWS Neuron is designed to make it easy to use popular frameworks like JAX with Trainium while minimizing code changes and avoiding lock-in to vendor-specific solutions. Google and AWS are collaborating to enable customers to get started with Trn2 instances quickly, using JAX for large-scale training and inference through its native OpenXLA integration. With this broad collaboration and now the availability of Trainium2, Google expects to see increased adoption of JAX, a significant milestone for the entire ML community.

    Bill Jia, VP of Engineering, Google
  • Watashiha

    Watashiha offers an innovative and interactive AI chatbot service, “OGIRI AI,” which incorporates humor to provide a funny answer to a question on the spot.

    We use large language models to incorporate humor and offer a more relevant and conversational experience to our customers on our AI services. This requires us to pre-train and fine-tune these models frequently. We pre-trained a GPT-based Japanese model on an EC2 trn1.32xlarge instance, leveraging tensor and data parallelism. The training was completed within 28 days at a 33% cost reduction over our previous GPU-based infrastructure. As our models rapidly continue to grow in complexity, we are looking forward to Trn1n instances, which have double the network bandwidth of Trn1, to speed up the training of larger models.

    Yohei Kobashi, CTO, Watashiha, K.K.
  • Amazon

    Amazon’s product search engine indexes billions of products, serves billions of customer queries daily, and is one of the most heavily used services in the world.

    We are training large language models (LLMs) that are multi-modal (text + image), multilingual, multi-locale, pre-trained on multiple tasks, and span multiple entities (products, queries, brands, reviews, etc.) to improve the customer shopping experience. Trn1 instances provide a more sustainable way to train LLMs by delivering the best performance per watt compared to other accelerated machine-learning solutions, and they offer us high performance at the lowest cost. We plan to explore the new configurable FP8 datatype and hardware-accelerated stochastic rounding to further increase our training efficiency and development velocity.

    Trishul Chilimbi, VP, Amazon Search