Getting Started with Delta with Classical ML
Adding Delta to Your Project
To add the Delta library to your Rust project, you need to include it in your
Cargo.toml
file. Follow these steps:
- Open your project’s
Cargo.toml
file. - Add the following line under
[dependencies]
:
[dependencies]deltaml = "0.1.0"
Currently, we have published Delta to deltaml, but note that this is still experimental in alpha stage so things might break in the upcoming iterations.
Linear Regression with California Housing Dataset
This example shows how to build, train, and evaluate a Linear Regression model to predict house prices using the California Housing Dataset.
use deltaml::{ algorithms::LinearRegression, data::{CsvHeadersLoader, load_data}, losses::MSE, optimizers::BatchGradientDescent, scalers::StandardScaler,};
#[tokio::main]async fn main() -> Result<(), Box<dyn std::error::Error>> { // Load the training and test datasets let (x_train, y_train) = load_data::<CsvHeadersLoader, _>("../train_data.csv") .expect("Failed to load train_data.csv"); let (x_test, y_test) = load_data::<CsvHeadersLoader, _>("../test_data.csv").expect("Failed to load test_data.csv");
// Instantiate the model with a builder pattern let mut model = LinearRegression::new() .optimizer(BatchGradientDescent) .loss_function(MSE) .scaler(StandardScaler::new()) .normalize(true) .build();
// Train the model let learning_rate = 0.01; let epochs = 1000; model.fit(&x_train, &y_train, learning_rate, epochs)?;
// Make predictions on test data let predictions = model.predict(&x_test)?;
println!("Predictions for new data: {:?}", predictions);
// Evaluate the model using Mean Squared Error let test_loss = model.calculate_loss(&predictions, &y_test)?; println!("Test Loss after training: {:.6}", test_loss);
Ok(())}
Step-by-Step Explanation
1. Import Dependencies
use deltaml::{ algorithms::LinearRegression, data::{CsvHeadersLoader, load_data}, losses::MSE, optimizers::BatchGradientDescent, scalers::StandardScaler,};
LinearRegression
: The algorithm for regression tasks, predicting continuous values.CsvHeadersLoader
: A utility to load CSV data, assuming the first row contains headers.load_data
: A function to read features (x
) and targets (y
) from a CSV file.MSE
: Mean Squared Error loss function, used to evaluate the model’s performance.BatchGradientDescent
: The optimization algorithm to minimize the loss during training.StandardScaler
: A preprocessing tool to standardize features by removing the mean and scaling to unit variance.
2. Load Data
let (x_train, y_train) = load_data::<CsvHeadersLoader, _>("../train_data.csv") .expect("Failed to load train_data.csv");let (x_test, y_test) = load_data::<CsvHeadersLoader, _>("../test_data.csv") .expect("Failed to load test_data.csv");
- The
load_data
function reads the California Housing Dataset from CSV files, returning feature matrices (x_train
,x_test
) asArray2<f64>
and target vectors (y_train
,y_test
) asArray1<f64>
. CsvHeadersLoader
assumes the CSV has headers (e.g., column names) and skips them, loading numerical data. If you have a dataset without headers, you can useCsvLoader
instead.- The dataset should be pre-split into training and test sets, with features (e.g., house age, rooms) and a target (e.g., house price).
3. Instantiate the Model
let mut model = LinearRegression::new() .optimizer(BatchGradientDescent) .loss_function(MSE) .scaler(StandardScaler::new()) .normalize(true) .build();
LinearRegression::new()
creates a builder for configuring the model..optimizer(BatchGradientDescent)
sets the optimization algorithm to Batch Gradient Descent, which updates model weights using the entire training set..loss_function(MSE)
specifies Mean Squared Error as the loss to minimize during training..scaler(StandardScaler::new())
applies feature standardization, ensuring features have zero mean and unit variance..normalize(true)
enables normalization, which is typically used with scaling to preprocess data..build()
constructs theLinearRegression
model with the specified configuration.
4. Train the Model
let learning_rate = 0.01;let epochs = 1000;model.fit(&x_train, &y_train, learning_rate, epochs)?;
fit
trains the model on the training data (x_train
,y_train
).learning_rate = 0.01
controls the step size for weight updates during optimization.epochs = 1000
specifies the number of iterations over the training data.- The
?
operator propagates any errors (e.g., dimension mismatches, invalid data).
5. Make Predictions
let predictions = model.predict(&x_test)?;
predict
generates predictions for the test features (x_test
), returning anArray1<f64>
of predicted house prices.- The model applies the learned weights and any preprocessing (e.g., scaling) to the test data.
6. Evaluate the Model
let test_loss = model.calculate_loss(&predictions, &y_test)?;println!("Test Loss after training: {:.6}", test_loss);
calculate_loss
computes the Mean Squared Error between the predictions and actual test targets (y_test
).- The loss quantifies the model’s performance, with lower values indicating better predictions.
- The result is printed with six decimal places for readability.
Expected Output:
- Predictions: A vector of predicted house prices (e.g.,
[206855.816, 123456.789, ...]
). - Test Loss: A single value indicating the model’s error on the test set.