Streamlining Data Science & ML: AWS Deep Learning Containers Unify SageMaker Experience

Written by Casey Jones

Published on June 9, 2023

As data scientists’ toolsets continue to expand, a consistent, reproducible environment for managing dependencies, maintaining security, and supporting collaboration becomes increasingly important. Amazon Web Services (AWS) Deep Learning Containers address this need by offering pre-built Docker images for various frameworks. The focus has since shifted toward a unified experience for developers: the newly introduced SageMaker open-source distribution aims to provide a seamless experience for machine learning (ML) practitioners of all expertise levels.

Announced at JupyterCon 2023, the SageMaker open-source distribution bundles popular data science, ML, and visualization packages and libraries such as TensorFlow, PyTorch, scikit-learn, pandas, and Matplotlib. The distribution is available for download from the Amazon ECR Public Gallery.

The SageMaker open-source distribution is best understood through an example. Consider training an image classification model with PyTorch on the KMNIST dataset. Because the same distribution can run both locally and in SageMaker, the move from local experimentation to production jobs is designed to require little rework.

Before starting, a few prerequisites need to be met: an active AWS account with administrator permissions, an environment with the AWS CLI and Docker installed, and an existing SageMaker domain.

To set up the local environment, developers can run the open-source distribution directly on their own machines. Running the image with Docker from a terminal starts JupyterLab, ready for experimentation. On machines with GPU support, developers can select the latest-gpu tag, or point ECR_IMAGE_ID at another image tag from the gallery if necessary.
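The local setup described above can be sketched as a small Python helper that assembles the `docker run` command. This is a minimal sketch under stated assumptions, not the commands from the original walkthrough: the repository path, the `latest-cpu` default tag, the container home directory, the `sample-notebooks` folder name, and the helper name `build_jupyterlab_command` are all illustrative and should be checked against the Amazon ECR Public Gallery listing.

```python
import os
import shlex

# Assumed public repository for the distribution; verify it in the
# Amazon ECR Public Gallery before relying on it.
IMAGE_REPO = "public.ecr.aws/sagemaker/sagemaker-distribution"

# Assumed home directory of the default user inside the container.
CONTAINER_HOME = "/home/sagemaker-user"


def build_jupyterlab_command(tag="latest-cpu", port=8888,
                             notebooks_dir="sample-notebooks"):
    """Return the argument list for starting JupyterLab from the image.

    `tag` selects the image variant (the article mentions latest-gpu),
    `port` is the host port mapped to JupyterLab's port 8888, and
    `notebooks_dir` is a local folder mounted into the container.
    """
    host_dir = os.path.abspath(notebooks_dir)
    mount_target = f"{CONTAINER_HOME}/{os.path.basename(host_dir)}"

    command = ["docker", "run", "-it", "-p", f"{port}:8888",
               "-v", f"{host_dir}:{mount_target}"]

    # Docker exposes host GPUs only when asked to; this also needs the
    # NVIDIA container runtime installed on the host.
    if tag.endswith("gpu"):
        command += ["--gpus", "all"]

    command += [f"{IMAGE_REPO}:{tag}",
                "jupyter-lab", "--no-browser", "--ip=0.0.0.0"]
    return command


if __name__ == "__main__":
    # Print the command for review instead of executing it directly.
    print(shlex.join(build_jupyterlab_command(tag="latest-gpu")))
```

Building the command as a list keeps each argument separate, so it can be printed for inspection or passed to `subprocess.run` without shell quoting problems.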

After setting up the local environment, developers can move to SageMaker Studio, an end-to-end integrated development environment designed specifically for ML projects. A compute instance can be launched from within SageMaker Studio using the Studio Launcher. With the example repository cloned, developers can open the notebook and begin transferring their work from the local environment to SageMaker Studio.

Once the notebook environment is set up, developers can dive into data preparation, move on to defining and training their neural network, and finally, test their model’s performance by examining training and test loss.
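The prepare, train, and evaluate sequence above might look roughly like the following PyTorch sketch. It is not the notebook from the example repository: to stay self-contained it uses random tensors shaped like KMNIST images (28x28 grayscale, 10 classes) instead of downloading the dataset, and the network, hyperparameters, and function names are illustrative assumptions. In a real run the synthetic data would be swapped for the actual dataset, for instance via `torchvision.datasets.KMNIST`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # make the sketch repeatable


def synthetic_split(n_samples):
    """Random stand-in data with KMNIST's shape: 1x28x28 images, 10 classes."""
    images = torch.rand(n_samples, 1, 28, 28)
    labels = torch.randint(0, 10, (n_samples,))
    return TensorDataset(images, labels)


def build_model():
    """A small fully connected classifier (illustrative architecture)."""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 64),
        nn.ReLU(),
        nn.Linear(64, 10),
    )


def run_epoch(model, loader, loss_fn, optimizer=None):
    """Run one pass over `loader`; update weights only if an optimizer is given.

    Returns the average loss per sample.
    """
    training = optimizer is not None
    model.train(training)
    total_loss, n_seen = 0.0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            loss = loss_fn(model(images), labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * labels.size(0)
            n_seen += labels.size(0)
    return total_loss / n_seen


def train(epochs=3):
    """Train on synthetic data and return (train_loss, test_loss) per epoch."""
    train_loader = DataLoader(synthetic_split(256), batch_size=32, shuffle=True)
    test_loader = DataLoader(synthetic_split(64), batch_size=32)
    model = build_model()
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    history = []
    for _ in range(epochs):
        train_loss = run_epoch(model, train_loader, loss_fn, optimizer)
        test_loss = run_epoch(model, test_loader, loss_fn)
        history.append((train_loss, test_loss))
    return history


if __name__ == "__main__":
    for epoch, (train_loss, test_loss) in enumerate(train(), start=1):
        print(f"epoch {epoch}: train loss {train_loss:.4f}, test loss {test_loss:.4f}")
```

Tracking both losses per epoch is what makes the final check possible: a training loss that keeps falling while the test loss stalls or rises suggests overfitting. With random labels, as here, the test loss is not expected to improve.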

One standout capability of this workflow is scheduling notebooks as jobs on the platform. This allows developers to train models at specific intervals or in response to particular events. Job statuses and logs make it simple to monitor the progress of these jobs.

In summary, the SageMaker open-source distribution lets data scientists and ML developers experiment in local environments and then promote that work to jobs on the SageMaker platform with minimal friction. By adopting this unified experience, developers gain a consistent and reproducible environment for building data science and machine learning solutions.