Dataset Support

Supported Datasets

Currently, the project supports the following datasets:

Image Datasets

MNIST

The MNIST dataset is a large database of 28x28 grayscale images of handwritten digits (0 to 9) that is commonly used for training and benchmarking image processing systems. It includes 60,000 training images and 10,000 test images.
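The loaders hide the raw files, but it can help to know how the original MNIST distribution stores them. Image files use the IDX format: a 16-byte big-endian header (magic number 0x00000803, image count, rows, columns) followed by one unsigned byte per pixel. The sketch below parses only that header. It is independent of deltaml and is not how `MnistDataset` is implemented; `IdxImageHeader` and `parse_idx_image_header` are hypothetical names used for illustration.

```rust
/// Header of an MNIST IDX image file (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct IdxImageHeader {
    pub count: u32,
    pub rows: u32,
    pub cols: u32,
}

/// Reads a big-endian u32 starting at `offset`, or None if fewer than 4 bytes remain.
fn read_u32_be(bytes: &[u8], offset: usize) -> Option<u32> {
    let chunk = bytes.get(offset..offset + 4)?;
    Some(u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Parses the 16-byte header of an IDX image file.
/// Returns None if the buffer is too short or the magic number is not
/// 0x00000803 (unsigned-byte data with 3 dimensions).
pub fn parse_idx_image_header(bytes: &[u8]) -> Option<IdxImageHeader> {
    if read_u32_be(bytes, 0)? != 0x0000_0803 {
        return None;
    }
    Some(IdxImageHeader {
        count: read_u32_be(bytes, 4)?,
        rows: read_u32_be(bytes, 8)?,
        cols: read_u32_be(bytes, 12)?,
    })
}
```

For the MNIST training images this header should report 60,000 images of 28 x 28 pixels, so the pixel data that follows is 60,000 x 784 bytes.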

CIFAR-10

The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class, split into 50,000 training images and 10,000 test images. It is a standard benchmark for image classification in machine learning and computer vision.
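For reference, the binary version of CIFAR-10 published by its authors stores each image as a fixed-size record: 1 label byte followed by 3,072 pixel bytes (1,024 red, then 1,024 green, then 1,024 blue, each plane in row-major order). The sketch below splits such a buffer into records and reads one pixel. It is not deltaml's loader; `RECORD_LEN`, `CifarRecord`, and `records` are hypothetical names, and reading the file from disk is left out.

```rust
/// Size of one record in the CIFAR-10 binary files: 1 label byte + 32*32*3 pixel bytes.
pub const RECORD_LEN: usize = 1 + 32 * 32 * 3; // 3073

/// One decoded CIFAR-10 record (hypothetical type, for illustration only).
pub struct CifarRecord<'a> {
    pub label: u8,
    /// 3072 bytes: 1024 red, then 1024 green, then 1024 blue, each plane row-major.
    pub pixels: &'a [u8],
}

impl<'a> CifarRecord<'a> {
    /// Returns the (r, g, b) value of the pixel at (row, col) of the 32x32 image.
    pub fn rgb(&self, row: usize, col: usize) -> (u8, u8, u8) {
        let i = row * 32 + col;
        (self.pixels[i], self.pixels[1024 + i], self.pixels[2048 + i])
    }
}

/// Splits the contents of a binary batch file into records.
/// Any trailing partial record is ignored by `chunks_exact`.
pub fn records(bytes: &[u8]) -> impl Iterator<Item = CifarRecord<'_>> {
    bytes.chunks_exact(RECORD_LEN).map(|rec| CifarRecord {
        label: rec[0],
        pixels: &rec[1..],
    })
}
```

Because every record has the same length, no header or delimiter parsing is needed; each batch file of 10,000 images is simply 10,000 x 3,073 bytes.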

ImageNetV2

The ImageNetV2 dataset is a test set of 10,000 new images across the 1,000 ImageNet classes (10 images per class), designed to mirror the structure and labeling of the original ImageNet dataset. Because the class definitions stay the same while the images are new, it is used to evaluate how well models trained on ImageNet generalize and how robust they are. ImageNetV2 is widely used for benchmarking large-scale image classification models.

Custom Datasets

Custom datasets are not supported yet, but we plan to add support for them in the initial alpha release.

Usage Examples

For examples of using these datasets in your own projects, see the examples/image_classification directory. It contains example scripts that load and preprocess the MNIST and CIFAR-10 datasets, and that train and evaluate models on them.
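A common preprocessing step for both MNIST and CIFAR-10 is rescaling 8-bit pixel intensities (0 to 255) into a small floating-point range before training. The sketch below shows that min-max scaling on a plain byte slice. It is a generic illustration, not the preprocessing API used in the example scripts; `scale_pixels` is a hypothetical helper.

```rust
/// Linearly maps 8-bit pixel intensities into [min, max] as f32
/// (hypothetical helper, for illustration only).
/// With min = 0.0 and max = 1.0, 0 maps to 0.0 and 255 maps to 1.0.
pub fn scale_pixels(pixels: &[u8], min: f32, max: f32) -> Vec<f32> {
    pixels
        .iter()
        .map(|&p| min + (p as f32 / 255.0) * (max - min))
        .collect()
}
```

Scaling to [0, 1] is the usual default; passing `-1.0` and `1.0` instead centers the values around zero, which some models train on more easily.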

Image Datasets

Example: Loading the MNIST Dataset

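// Brings the `DatasetOps` trait into scope so its loading functions can be called
use deltaml::common::DatasetOps;
use deltaml::data::MnistDataset;

// `#[tokio::main]` needs the tokio crate with its `macros` and `rt-multi-thread` features
#[tokio::main]
async fn main() {
    // Load the training split (60,000 images); declared `mut` so it can be modified later
    let mut train_data = MnistDataset::load_train().await;
    // Load the test split (10,000 images)
    let test_data = MnistDataset::load_test().await;
}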