In the past I've worked with SageMaker Deployment through Jupyter Notebooks and Python scripts. This is completely fine, but oftentimes, within the scope of a larger application, you need to be able to define your SageMaker resources alongside the rest of your infrastructure in a central template. This brings in the idea of Infrastructure as Code, which in turn brings in AWS CloudFormation. When it comes to productionizing applications, it's essential to capture your resources in a central template; it becomes really difficult to maintain or manage isolated processes spread across notebooks and individual scripts.
In this article we'll take a look at how you can use CloudFormation to define your SageMaker resources and create a Real-Time Endpoint. Utilizing this template you should be able to extrapolate and create CloudFormation templates for other SageMaker Inference options such as Multi-Model Endpoints, Serverless Inference, and Asynchronous Inference.
NOTE: For those of you new to AWS, make sure you create an account at the following link if you want to follow along. Make sure to also have the AWS CLI installed to work with the example. This article assumes basic knowledge of CloudFormation; take a look at this article here if you need a starting guide. It also assumes an intermediate understanding of SageMaker Deployment; I would suggest following this article to understand Deployment/Inference more in depth. We will be using the same model here and mapping it over to CloudFormation.
Setup
Before we can get to building our CloudFormation template, we need to understand what we need for our SageMaker endpoint. For this use case we'll deploy a pre-trained Sklearn model onto a SageMaker Real-Time Endpoint. Utilizing the following script, we can quickly train a Linear Regression model and produce a model data artifact.
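A minimal sketch of such a training script is shown below. Note that the synthetic dataset generated here is just a stand-in for your own training data; the important output is the serialized `model.joblib` file.

```python
# Train a toy Linear Regression model and serialize it with joblib.
# The synthetic dataset is a placeholder -- swap in your real training data.
import joblib
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Generate a small synthetic regression dataset so the script is self-contained.
X, y = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)

model = LinearRegression()
model.fit(X, y)

# Serialize the fitted model; this is the artifact SageMaker will load.
joblib.dump(model, "model.joblib")
```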
After running this script you should end up with a model.joblib file, which contains the serialized model necessary for deployment. Generally with SageMaker Inference you also need an inference script that controls your pre/post-processing with custom code.
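For the managed Sklearn container, the inference script implements a handful of named hooks that the container invokes by convention. A minimal sketch follows; the JSON payload format is an assumption here, so adjust the (de)serialization to match your own request shape.

```python
# inference.py -- handler hooks for the SageMaker Sklearn serving container.
import json
import os

import joblib


def model_fn(model_dir):
    # Called once at startup to deserialize the model from the unpacked tarball.
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def input_fn(request_body, content_type="application/json"):
    # Deserialize the incoming request payload (assumed JSON here).
    if content_type == "application/json":
        return json.loads(request_body)
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(input_data, model):
    # Run inference with the deserialized model.
    return model.predict(input_data)


def output_fn(prediction, accept="application/json"):
    # Serialize the prediction back to the caller.
    return json.dumps(prediction.tolist())
```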
SageMaker Inference expects this model data and inference script to be packaged into a tarball, so we run the following script to convert these resources to the expected format and upload it to an S3 Bucket.
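A packaging sketch is shown below. The handler script lives under a code/ directory inside the tarball, which is the layout the managed container expects; the bucket name in the upload step is a placeholder.

```python
# Bundle the model artifact and inference script into the tarball
# layout SageMaker expects (handler script under code/).
import os
import tarfile

# Stand-ins so this snippet runs on its own; in practice these are the
# real model.joblib and inference.py produced in the previous steps.
for fname in ("model.joblib", "inference.py"):
    if not os.path.exists(fname):
        open(fname, "a").close()

with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model.joblib")
    tar.add("inference.py", arcname="code/inference.py")

# Upload to S3 (bucket name is a placeholder -- use your own):
# import boto3
# boto3.client("s3").upload_file("model.tar.gz", "your-bucket", "model.tar.gz")
```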
Now that we have our model artifact we can proceed to working with CloudFormation.
Defining CloudFormation Parameters
A CloudFormation template is a YAML or JSON file in which you define all of your infrastructure. CloudFormation Parameters allow you to inject custom values into your templates, and you can then reference these parameters as you define your resources. For this case we provide default values, but you can override them via the CLI if you wish. For SageMaker Endpoints, the following are the parameters we have to define (note that you can name these anything you want; just make sure to reference them consistently):
- RoleARN: The SageMaker Execution Role that you grant permissions to. Replace the default value with the ARN of the IAM role you have defined for your SageMaker resources.
- ImageURI: The container image URI, retrieved either from the existing Deep Learning Containers AWS provides or from ECR if you did a BYOC (Bring Your Own Container) deployment. For this example we have an Sklearn model, so we retrieve the appropriate version of that managed container.
- ModelData: The S3 URI of the tarball containing the model artifact and inference script we packaged together and uploaded.
- InstanceType & InstanceCount: The hardware you are provisioning behind your endpoint. For Serverless Inference, change these to the appropriate parameters (Memory Size & Concurrency).
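The parameters above can be sketched in the template like this. All defaults are placeholders: substitute your own role ARN and bucket, and look up the managed Sklearn image URI for your region and framework version (the one shown is the us-east-1 image).

```yaml
Parameters:
  RoleARN:
    Type: String
    Default: arn:aws:iam::111122223333:role/sagemaker-execution-role  # placeholder
  ImageURI:
    Type: String
    # us-east-1 managed Sklearn container; adjust region/version as needed
    Default: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3
  ModelData:
    Type: String
    Default: s3://your-bucket/model.tar.gz  # placeholder
  InstanceType:
    Type: String
    Default: ml.m5.xlarge
  InstanceCount:
    Type: Number
    Default: 1
```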
We now have the parameters necessary for deploying a SageMaker Real-Time Endpoint; next, let's focus on defining our resources.
CloudFormation Resources & Deployment
To deploy a SageMaker Endpoint, there are three main entities that go hand in hand: the SageMaker Model, the SageMaker Endpoint Configuration, and the SageMaker Endpoint itself. The SageMaker Model entity defines the model data and container image that we are using for deployment, and it's the first resource we create.
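In YAML this first resource looks roughly like the following (the logical name SageMakerModel is my own choice; the parameter names match the ones defined above):

```yaml
Resources:
  SageMakerModel:
    Type: AWS::SageMaker::Model
    Properties:
      ExecutionRoleArn: !Ref RoleARN
      PrimaryContainer:
        Image: !Ref ImageURI
        ModelDataUrl: !Ref ModelData
```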
Notice that we are referencing the ImageURI and ModelData parameters that we defined. Next we do much the same with the Endpoint Configuration, where we define the instance configuration behind our endpoint.
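A sketch of that Endpoint Configuration, referencing the Model resource by its logical name:

```yaml
  SageMakerEndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    Properties:
      ProductionVariants:
        - ModelName: !GetAtt SageMakerModel.ModelName
          VariantName: AllTraffic
          InstanceType: !Ref InstanceType
          InitialInstanceCount: !Ref InstanceCount
          InitialVariantWeight: 1.0
```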
Finally, we point to this resource while defining the last piece, the SageMaker Endpoint itself.
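The Endpoint only needs to reference the Endpoint Configuration:

```yaml
  SageMakerEndpoint:
    Type: AWS::SageMaker::Endpoint
    Properties:
      EndpointConfigName: !GetAtt SageMakerEndpointConfig.EndpointConfigName
```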
Using the AWS CLI, we can deploy this CloudFormation stack by pointing it at our YAML file.
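The stack name and template filename below are placeholders; use whatever you named your file:

```
aws cloudformation create-stack \
  --stack-name sagemaker-endpoint-stack \
  --template-body file://template.yaml
```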
We can verify this in the Console, and after a few minutes you should see all three resources created.


Additional Resources & Conclusion
You can find the entire code for the example at the link above. AWS CloudFormation is an extremely powerful tool that makes it really easy to capture your AWS resources in a central template. Without Infrastructure as Code it becomes really difficult to iterate in the software lifecycle and that applies to ML services such as SageMaker as well. I hope this article was useful for those interested in SageMaker, CloudFormation, and AWS in general.
If you enjoyed this article feel free to connect with me on LinkedIn and subscribe to my Medium Newsletter. If you're new to Medium, sign up using my Membership Referral.