Revolutionizing Retail: Crafting High-Quality Clothing Datasets with FiftyOne & Amazon SageMaker Ground Truth


Building High-Quality Labeled Datasets for the Retail Industry

A high-quality labeled dataset is essential for a retail company that wants to improve customer experience, for example through a mobile app that offers personalized clothing recommendations. This article walks through repurposing an existing dataset, Fashion200K, by cleaning, preprocessing, and pre-labeling it. It relies on two tools: FiftyOne, an open-source toolkit from Voxel51, and Amazon SageMaker Ground Truth, a managed data labeling service.

Solution Overview

Amazon SageMaker Ground Truth is a managed data labeling service for producing high-quality labeled datasets, which in turn support more accurate and reliable machine learning models. FiftyOne, from Voxel51, is an open-source toolkit for curating, visualizing, and evaluating computer vision datasets. In this workflow, FiftyOne handles curation and review, and Ground Truth handles the labeling.

Process Steps

  1. Visualize the dataset in FiftyOne

Before any cleaning or preprocessing, load the dataset into FiftyOne and inspect it. Reviewing its structure, image quality, and existing labels makes it easier to see what needs to be fixed or removed.
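As a minimal sketch of this step, the snippet below counts images per top-level folder as a quick sanity check and then loads the directory into FiftyOne. It assumes the Fashion200K images have already been downloaded locally and that FiftyOne is installed; the function names and the dataset name `fashion200k-raw` are illustrative, not part of the original workflow.

```python
from collections import Counter
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_by_folder(root):
    """Count image files per first-level subfolder of `root`."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            relative = path.relative_to(root)
            # Images stored directly under `root` are grouped under "."
            key = relative.parts[0] if len(relative.parts) > 1 else "."
            counts[key] += 1
    return dict(counts)


def open_in_fiftyone(images_dir, name="fashion200k-raw"):
    """Load a directory of images into FiftyOne and launch the App."""
    # Imported lazily so the helper above works without FiftyOne installed
    import fiftyone as fo

    dataset = fo.Dataset.from_dir(
        dataset_dir=str(images_dir),
        dataset_type=fo.types.ImageDirectory,
        name=name,  # hypothetical dataset name
    )
    dataset.compute_metadata()  # fills in image width, height, and file size
    session = fo.launch_app(dataset)
    return dataset, session
```

Calling `open_in_fiftyone("/path/to/images")` opens the App in a browser, where samples can be browsed and sorted by the metadata fields computed above.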

  2. Clean the dataset with filtering and image deduplication in FiftyOne

A clean dataset is fundamental to training high-performing models. FiftyOne’s filtering and image deduplication capabilities remove redundant and irrelevant images, leaving a smaller, more focused dataset to label.
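The sketch below illustrates one simple form of each operation: a filter for undersized images and removal of byte-identical copies by content hash. The hashing approach is a stand-in chosen for clarity and is not necessarily the deduplication method the article has in mind; it will not catch resized or re-encoded near-duplicates, which need an embedding-based comparison. The 224-pixel threshold and all function names are assumptions.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_indices(filepaths):
    """Return the positions of files whose bytes match an earlier file."""
    seen = set()
    duplicates = []
    for index, path in enumerate(filepaths):
        digest = file_digest(path)
        if digest in seen:
            duplicates.append(index)
        else:
            seen.add(digest)
    return duplicates


def drop_exact_duplicates(dataset):
    """Delete byte-identical duplicate samples from a FiftyOne dataset."""
    ids = dataset.values("id")
    filepaths = dataset.values("filepath")
    duplicate_ids = [ids[i] for i in find_duplicate_indices(filepaths)]
    dataset.delete_samples(duplicate_ids)
    return len(duplicate_ids)


def small_images_view(dataset, min_side=224):
    """View of images narrower or shorter than `min_side` pixels.

    Assumes `dataset.compute_metadata()` has already been run.
    """
    from fiftyone import ViewField as F

    too_small = (F("metadata.width") < min_side) | (F("metadata.height") < min_side)
    return dataset.match(too_small)
```

Reviewing `small_images_view(dataset)` in the App before deleting anything is a cautious way to confirm that the threshold is not discarding usable images.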

  3. Pre-label the cleaned data with zero-shot classification in FiftyOne

To reduce labeling effort, pre-label the cleaned data with zero-shot classification in FiftyOne. This produces an initial set of labels without any task-specific training. Because zero-shot predictions can be wrong, they serve as a starting point for the labeling that follows rather than as final labels.
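To make the idea concrete, the sketch below first shows the decision rule behind zero-shot classification in plain Python: each candidate class gets a similarity score against the image, and a softmax turns the scores into a label and a confidence. It then shows how the same step might look with the CLIP model from the FiftyOne Model Zoo. The class list, the prompt, and the field name `clip_prediction` are assumptions for illustration, and the article does not state which zero-shot model was used.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(similarities):
    """Return (label, confidence) for a mapping of class name to score."""
    labels = list(similarities)
    probabilities = softmax([similarities[label] for label in labels])
    best = max(range(len(labels)), key=probabilities.__getitem__)
    return labels[best], probabilities[best]


def prelabel_with_clip(dataset, classes, label_field="clip_prediction"):
    """Store a zero-shot CLIP classification on every sample."""
    import fiftyone.zoo as foz  # requires FiftyOne and PyTorch

    model = foz.load_zoo_model(
        "clip-vit-base32-torch",
        text_prompt="A photo of a",
        classes=classes,  # e.g. ["dress", "jacket", "pants", "skirt", "top"]
    )
    dataset.apply_model(model, label_field=label_field)
    return dataset
```

For example, `pick_label({"dress": 2.0, "jacket": 0.0})` selects `"dress"` with a confidence of about 0.88, which shows why a low confidence is a useful signal that an image needs closer human attention.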

  4. Label the smaller curated dataset with Amazon SageMaker Ground Truth

Next, the curated subset is sent to Amazon SageMaker Ground Truth for labeling. Because the dataset has already been cleaned and reduced, the labeling effort is spent only on images worth keeping.
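Ground Truth reads its input from a manifest file in JSON Lines format, where each line points to one object in Amazon S3 through a `source-ref` key. The sketch below builds such a manifest from local file paths. It assumes POSIX-style paths and that the images were uploaded to S3 with the same relative layout; the bucket name, prefix, and function names are hypothetical. Uploading the files and creating the labeling job itself are not shown.

```python
import json
from pathlib import PurePosixPath


def local_path_to_s3_uri(filepath, local_root, bucket, prefix=""):
    """Map a local file path to the S3 URI it was uploaded to.

    Assumes the upload kept the path relative to `local_root`.
    """
    relative = PurePosixPath(filepath).relative_to(local_root).as_posix()
    key = f"{prefix.strip('/')}/{relative}" if prefix.strip("/") else relative
    return f"s3://{bucket}/{key}"


def build_input_manifest(s3_uris):
    """Return manifest text with one `source-ref` JSON object per line."""
    return "".join(json.dumps({"source-ref": uri}) + "\n" for uri in s3_uris)


def write_input_manifest(s3_uris, manifest_path):
    """Write the manifest to a local file, ready to be uploaded to S3."""
    with open(manifest_path, "w", encoding="utf-8") as handle:
        handle.write(build_input_manifest(s3_uris))
```

With a FiftyOne dataset, the local paths would typically come from `dataset.values("filepath")`, so the manifest contains only the samples that survived cleaning.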

  5. Inject labeled results from Ground Truth into FiftyOne and review labeled results in FiftyOne

Finally, review the labeled results to verify their quality and reliability. Importing the Ground Truth labels back into FiftyOne makes it possible to inspect them alongside the images and to check the dataset for integrity and consistency.
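A sketch of this round trip is shown below, assuming an image classification labeling job. Ground Truth writes an output manifest in which each line repeats the `source-ref` and adds a `<label-attribute-name>-metadata` object containing the chosen `class-name`. The parser reads that file, and the second function attaches the labels to FiftyOne samples. The label attribute name, the `s3_uri` sample field used for matching, and the other field names are all assumptions.

```python
import json


def parse_output_manifest(manifest_text, label_attribute):
    """Map each `source-ref` to (class_name, confidence).

    Lines without a `class-name` in their metadata are skipped.
    """
    labels = {}
    for line in manifest_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        metadata = record.get(f"{label_attribute}-metadata") or {}
        if "class-name" not in metadata:
            continue
        labels[record["source-ref"]] = (
            metadata["class-name"],
            metadata.get("confidence"),
        )
    return labels


def attach_ground_truth(dataset, labels_by_uri, uri_field="s3_uri",
                        label_field="ground_truth"):
    """Store parsed labels on matching samples and return how many matched.

    Assumes every sample has a (hypothetical) `uri_field` holding its S3 URI.
    """
    import fiftyone as fo

    attached = 0
    for sample in dataset:
        match = labels_by_uri.get(sample[uri_field])
        if match is None:
            continue
        class_name, confidence = match
        sample[label_field] = fo.Classification(label=class_name, confidence=confidence)
        sample.save()
        attached += 1
    return attached


def disagreements_view(dataset, predicted_field="clip_prediction",
                       label_field="ground_truth"):
    """View of samples where the pre-label and the final label differ."""
    from fiftyone import ViewField as F

    return dataset.match(F(f"{predicted_field}.label") != F(f"{label_field}.label"))
```

Opening `disagreements_view(dataset)` in the App focuses the review on the samples most likely to contain a labeling mistake or an ambiguous image.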

Use Case Overview

Consider a retail company building a mobile app that delivers personalized clothing recommendations. To do this, the company must accurately identify articles of clothing so that it can match them to customer preferences. Starting from an existing dataset such as Fashion200K saves time and resources compared with collecting and labeling images from scratch.


FiftyOne and Amazon SageMaker Ground Truth complement each other when building high-quality labeled datasets for the retail industry. By following the steps outlined above, retail companies can build the datasets needed to offer customers personalized clothing recommendations through mobile apps. The same workflow also applies beyond retail, across a wide range of industries and use cases.

Casey Jones
10 months ago


