Pick the right ML services and frameworks to support your work
Purpose | Help determine which AWS ML services are the best fit for your needs. |
Last updated | May 3, 2024 |
Covered services |
Introduction
At its most basic, machine learning (ML) gives digital tools and services the ability to learn from data, identify patterns, make predictions, and then act on those predictions. Almost all artificial intelligence (AI) systems today are created using ML. ML uses large amounts of data to create and validate decision logic. This decision logic forms the basis of the AI model.
Scenarios where AWS machine learning services may be applied include:
- Specific use cases — AWS machine learning services can support your AI-powered use cases with a broad range of pre-built algorithms, models, and solutions for common use cases and industries. You have a choice of 23 pre-trained services, including Amazon Personalize, Amazon Kendra, and Amazon Monitron.
- Customizing and scaling machine learning — Amazon SageMaker AI is designed to help you build, train, and deploy ML models for any use case. You can build your own or access open source foundation models on AWS through Amazon SageMaker AI and Amazon Bedrock.
- Accessing specialized infrastructure — Use the ML frameworks and infrastructure provided by AWS when you require even greater flexibility and control over your machine learning workflows and are willing to manage the underlying infrastructure and resources yourself.
This decision guide will help you ask the right questions, evaluate your criteria and business problem, and determine which services are the best fit for your needs.
In this 7-minute video excerpt, Rajneesh Singh, general manager of the Amazon SageMaker AI Low-Code/No-Code team at AWS, explains how machine learning can address business problems.
Understand
As organizations continue to adopt AI and ML technologies, understanding and choosing among AWS ML services remains an ongoing challenge.
AWS provides a range of ML services designed to help organizations build, train, and deploy ML models more quickly and easily. These services can be used to solve a wide range of business problems, such as customer churn prediction, fraud detection, and image and speech recognition.

Before diving deeper into AWS ML services, let's look at the relationship between AI and ML.
- At a high level, artificial intelligence describes any system that can replicate tasks that previously required human intelligence. Most AI use cases seek a probabilistic outcome—making a prediction or decision with a high degree of certainty, similar to human judgment.
- Almost all AI systems today are created using machine learning. ML uses large amounts of data to create and validate decision logic, which is known as a model.
- Classification AI is a subset of ML that recognizes patterns in order to identify something. Predictive AI is a subset of ML that predicts future trends based on statistical patterns and historical data.
- Finally, generative AI is a subset of deep learning that can create new content and ideas, like conversations, stories, images, videos, and music. Generative AI is powered by very large models, pretrained on vast corpora of data, called foundation models (FMs). Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs for building and scaling generative AI applications. Amazon Q Developer and Amazon Q Business are generative AI-powered assistants for specific use cases.
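The point made above—that ML derives decision logic (a model) from data and validates it—can be sketched with a deliberately tiny, framework-free example: learning a single threshold from labeled examples and checking it against held-out data. All names and numbers here are illustrative and not tied to any AWS service.

```python
# Minimal illustration: "learn" decision logic (a model) from data.
# The model is a one-parameter classifier: predict 1 if x >= threshold, else 0.

def train_threshold(examples):
    """Pick the threshold that best separates labeled points.

    examples: list of (feature, label) pairs with label in {0, 1}.
    """
    candidates = sorted(x for x, _ in examples)
    best_t, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum(1 for x, y in examples if (x >= t) == (y == 1))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

def predict(threshold, x):
    return 1 if x >= threshold else 0

# Training data: small values labeled 0, large values labeled 1.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
model = train_threshold(train)          # the learned "decision logic"

# Validation on held-out data the model never saw during training.
holdout = [(2.5, 0), (7.5, 1)]
accuracy = sum(predict(model, x) == y for x, y in holdout) / len(holdout)
```

Real ML models have millions or billions of parameters rather than one, but the lifecycle is the same: fit decision logic to training data, then validate it on data held out from training.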
This guide is designed primarily to cover services in the Classification AI and Predictive AI machine learning categories.
In addition, AWS offers specialized, accelerated hardware for high-performance ML training and inference.
- Amazon EC2 P5 instances are equipped with NVIDIA H100 Tensor Core GPUs, which are well suited for both training and inference tasks in machine learning. Amazon EC2 G5 instances feature up to 8 NVIDIA A10G Tensor Core GPUs and second-generation AMD EPYC processors, for a wide range of graphics-intensive and machine learning use cases.
- AWS Trainium is the second-generation ML accelerator that AWS has purpose-built for deep learning (DL) training of 100B+ parameter models.
- AWS Inferentia2-based Amazon EC2 Inf2 instances are designed to deliver high performance at the lowest cost in Amazon EC2 for your DL and generative AI inference applications.
Consider
When solving a business problem with AWS ML services, considering several key criteria can help ensure success. The following section outlines some of the key criteria to consider when choosing an ML service.
Problem definition
The first step in the ML lifecycle is to frame the business problem. Understanding the problem you are trying to solve is essential for choosing the right AWS ML service, as different services are designed to address different problems. It is also important to determine whether ML is the best fit for your business problem.
Once you have determined that ML is the best fit, you can start by choosing from a range of purpose-built AWS AI services (in areas such as speech, vision, and documents).
Amazon SageMaker AI provides fully managed infrastructure if you need to build and train your own models. For cases where you require highly customized and specialized ML models, AWS offers an array of advanced ML frameworks and infrastructure choices. AWS also offers a broad set of popular foundation models for building new applications with generative AI.
Choose
Now that you know the criteria by which you will be evaluating your ML service options, you are ready to choose which AWS ML service is right for your organizational needs. The following table highlights which ML services are optimized for which circumstances. Use it to help determine the AWS ML service that is the best fit for your use case.
Categories | When would you use it? | What is it optimized for? | Related AI/ML services or environments |
---|---|---|---|
Specific use cases: These artificial intelligence services are intended to meet specific needs, including personalization, forecasting, anomaly detection, speech transcription, and others. Because they are delivered as services, they can be embedded into applications without requiring any ML expertise. | Use the AI services provided by AWS when you require specific, pre-built functionality integrated into your applications, without the need for extensive customization or machine learning expertise. | These services are designed to be easy to use and do not require much coding, configuration, or ML expertise. | |
ML services: These services can be used to develop customized machine learning models or workflows that go beyond the pre-built functionality offered by the core AI services. | Use these services when you need more customized machine learning models or workflows than the pre-built functionality of the core AI services provides. | Optimized for building and training custom machine learning models, large-scale training on multiple instances or GPU clusters, greater control over model deployment, real-time inference, and building end-to-end workflows. | |
Infrastructure: To deploy machine learning in production, you need cost-effective infrastructure, which AWS enables with AWS-built silicon. | Use when you want to achieve the lowest cost for training models and need to run inference in the cloud. | Optimized for supporting the cost-effective deployment of machine learning. | |
Tools and associated services: These tools and associated services are designed to help you ease the deployment of machine learning. | Use these services and tools to accelerate deep learning in the cloud; they provide Amazon machine images, Docker images, and entity resolution. | Optimized for helping you accelerate deep learning in the cloud. | |
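The decision flow captured in the table above can be expressed as a small helper function. The category names follow the table; the function itself, its parameters, and the coarse yes/no inputs are hypothetical illustrations, not part of any AWS API.

```python
# Hypothetical helper mirroring the decision table above.
# Function name and boolean inputs are illustrative only.

def recommend_category(needs_prebuilt: bool,
                       needs_custom_models: bool,
                       cost_sensitive_training: bool) -> str:
    """Map coarse requirements to the table's four service categories."""
    if needs_prebuilt and not needs_custom_models:
        # Pre-built functionality, no ML expertise required.
        return "Specific use cases (AI services)"
    if needs_custom_models:
        if cost_sensitive_training:
            # Lowest-cost training and inference on AWS-built silicon.
            return "Infrastructure"
        # Build, train, and deploy custom models and workflows.
        return "ML services"
    # Everything else: accelerators and deployment aids.
    return "Tools and associated services"

choice = recommend_category(needs_prebuilt=False,
                            needs_custom_models=True,
                            cost_sensitive_training=False)
```

In practice the criteria are not binary, but making the branching explicit can help teams agree on which column of the table they are actually in before comparing individual services.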
Use
Now that you have a clear understanding of the criteria you need to apply in choosing an AWS ML service, you can select which AWS AI/ML service(s) are optimized for your business needs.
To learn more about the service(s) you have chosen, we have provided three sets of pathways that explore how each service works. The first set provides in-depth documentation, hands-on tutorials, and resources to get started with Amazon Comprehend, Amazon Textract, Amazon Translate, Amazon Lex, Amazon Polly, Amazon Rekognition, and Amazon Transcribe.
- Get started with Amazon Comprehend: Use the Amazon Comprehend console to create and run an asynchronous entity detection job.
- Analyze insights in text with Amazon Comprehend: Learn how to use Amazon Comprehend to analyze and derive insights from text.
- Amazon Comprehend Pricing: Explore information on Amazon Comprehend pricing and examples.
The second set of AWS AI/ML service pathways provides in-depth documentation, hands-on tutorials, and resources to get started with the services in the Amazon SageMaker AI family.
- How Amazon SageMaker AI works: Explore the overview of machine learning and how SageMaker AI works.
- Getting started with Amazon SageMaker AI: Learn how to join an Amazon SageMaker AI Domain, giving you access to Amazon SageMaker AI Studio and RStudio on SageMaker AI.
- Use Apache Spark with Amazon SageMaker AI: Learn how to use Apache Spark for preprocessing data and SageMaker AI for model training and hosting.
- Use Docker containers to build models: Explore how Amazon SageMaker AI makes extensive use of Docker containers for build and runtime tasks. Learn how to deploy the pre-built Docker images for its built-in algorithms and the supported deep learning frameworks used for training and inference.
- Machine learning frameworks and languages: Learn how to get started with SageMaker AI using the Amazon SageMaker AI Python SDK.
The third set of AWS AI/ML service pathways provides in-depth documentation, hands-on tutorials, and resources to get started with AWS Trainium, AWS Inferentia, and Amazon Titan.
- Scaling distributed training with AWS Trainium and Amazon EKS: Learn how you can benefit from the general availability of Amazon EC2 Trn1 instances powered by AWS Trainium—a purpose-built ML accelerator optimized to provide a high-performance, cost-effective, and massively scalable platform for training deep learning models in the cloud.
- Overview of AWS Trainium: Learn about AWS Trainium, the second-generation machine learning (ML) accelerator that AWS purpose built for deep learning training of 100B+ parameter models. Each Amazon Elastic Compute Cloud (EC2) Trn1 instance deploys up to 16 AWS Trainium accelerators to deliver a high-performance, low-cost solution for deep learning (DL) training in the cloud.
- Recommended Trainium Instances: Explore how AWS Trainium instances are designed to provide high performance and cost efficiency for deep learning model inference workloads.
Explore
- Architecture diagrams: These reference architecture diagrams show examples of AWS AI and ML services in use.
- Whitepapers: Explore whitepapers to help you get started and learn best practices in choosing and using AI/ML services.
- AWS Solutions: Explore vetted solutions and architectural guidance for common use cases for AI and ML services.
Resources
Foundation models
Supported foundation models include:
Using Amazon Bedrock, you can experiment with a variety of foundation models and privately customize them with your data.
Use case or industry-specific services
Associated blog posts