YSI Predictor — Yield Sooting Index Model

📌 Overview

This repository contains a machine learning model for predicting the Yield Sooting Index (YSI) of single-component fuel molecules directly from their SMILES representation.

YSI is a measure of a fuel's tendency to form soot, used in combustion science.

Lower YSI → lower sooting tendency (cleaner combustion).
This makes YSI highly relevant for diesel replacement fuels, biofuels, and oxygenated fuels.

This model supports:

molecular design and optimization
genetic algorithms and mutation-based molecule generators (e.g., CReM)
Pareto optimization (cetane number, CN, vs. YSI)
rapid candidate screening
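For the Pareto use case, each candidate is scored on two objectives: cetane number (CN, higher is better) and YSI (lower is better). The sketch below is a minimal, hypothetical non-dominated filter; the Candidate type, the helper names, and the example values are illustrative placeholders, not outputs of this model or measured data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record: a label plus two predicted properties
    name: str
    cn: float   # cetane number (maximize)
    ysi: float  # yield sooting index (minimize)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a.cn >= b.cn and a.ysi <= b.ysi
    strictly_better = a.cn > b.cn or a.ysi < b.ysi
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Return the candidates not dominated by any other candidate (simple O(n^2) scan)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands if o is not c)]


if __name__ == "__main__":
    # Placeholder values for illustration only
    pool = [
        Candidate("A", cn=60.0, ysi=10.0),
        Candidate("B", cn=50.0, ysi=5.0),
        Candidate("C", cn=45.0, ysi=12.0),
    ]
    for c in pareto_front(pool):
        print(c.name, c.cn, c.ysi)
```

Here C is dominated by A (higher CN and lower YSI), while A and B trade off against each other, so both stay on the front. The quadratic scan is fine for screening-sized pools; larger populations in a genetic algorithm would typically use a dedicated non-dominated sorting routine.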

🧠 How It Works

The prediction pipeline uses:

RDKit — molecule parsing
Mordred — 2D/3D molecular descriptors
FeatureSelector — dimensionality reduction
Tree-based regression model trained on experimental YSI values

Prediction flow:

Input SMILES → RDKit Molecule
Mordred descriptors generated
Feature selection applied
YSI predicted using trained regressor

Two model artifacts are included:

model.joblib       # trained regressor
selector.joblib    # feature selector used during training

🧬 Training Data

The model was trained on a curated dataset of experimentally measured YSI values covering a diverse set of fuel molecule structures, including:

linear alkanes
branched alkanes
cyclic hydrocarbons
aromatics
oxygenated species (ethers, esters)

YSI range in the dataset: approximately 3 to 80

📊 Performance

Performance was evaluated on both training and held-out test sets.

⭐ Training Performance

Metric	Score
RMSE	6.9661
MAE	4.0581
R²	0.9309

🧭 Test Performance

Metric	Score
RMSE	5.9667
MAE	3.8324
R²	0.9440
MAPE	18.38%

A test R² of 0.9440 (RMSE ≈ 5.97, MAE ≈ 3.83 on the YSI scale) indicates strong predictive accuracy on held-out molecules.
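For reference, the four metrics in the tables follow their standard definitions. The sketch below is a minimal pure-Python version for checking numbers by hand; it is not the script used to produce the reported results. MAPE is returned as a percentage, as in the test table, and is undefined if any true YSI value is zero.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute RMSE, MAE, R^2 and MAPE (%) using their standard definitions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # fails if all y_true are identical
    # Mean absolute percentage error; every y_true must be non-zero
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "MAPE": mape}
```

With these definitions, the generalization check below is simply the test RMSE minus the train RMSE.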

📉 Generalization Check

Metric	Value
Train RMSE	6.9661
Test RMSE	5.9667
Δ (Test − Train)	−0.9994

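➡️ The negative Δ (test RMSE lower than train RMSE) shows no sign of overfitting. The slightly better test score most plausibly reflects the composition of the held-out split rather than a genuine gain in accuracy.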

🚀 Usage

Below is a minimal example showing how to use the model in Python.

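Descriptors must be computed with the same featurization used during training (featurize_df from the shared_features module), and the fitted selector must be applied before calling the regressor.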

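import joblib
from rdkit import Chem
# FeatureSelector may need to be importable for joblib to unpickle selector.joblib
from shared_features import featurize_df, FeatureSelector

# Load model & selector (paths are relative to the current working directory)
model = joblib.load("model.joblib")
selector = joblib.load("selector.joblib")

def predict_ysi(smiles: str) -> float:
    # Reject SMILES that RDKit cannot parse before computing descriptors
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    df = featurize_df([smiles])    # descriptors, computed as during training
    X = selector.transform(df)     # keep only the features selected during training
    y = model.predict(X)           # one prediction per input row
    return float(y[0])

print(predict_ysi("CCCCCCC"))  # n-heptane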
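For rapid candidate screening, a predictor such as predict_ysi above can be wrapped to rank a batch of SMILES by predicted YSI. The sketch below is hypothetical (rank_by_ysi is not part of this repository). It takes the predictor as an argument so it can be tried with any callable, and it assumes the predictor raises ValueError for inputs it cannot handle, collecting those instead of aborting the whole batch.

```python
from typing import Callable, Iterable, List, Tuple


def rank_by_ysi(
    smiles_list: Iterable[str],
    predict: Callable[[str], float],
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Predict YSI for each SMILES and return (ranked results, rejected inputs).

    Results are sorted by ascending YSI, i.e. lowest predicted sooting tendency first.
    Inputs for which predict raises ValueError are collected in the second list.
    """
    scored: List[Tuple[str, float]] = []
    rejected: List[str] = []
    for smi in smiles_list:
        try:
            scored.append((smi, predict(smi)))
        except ValueError:
            rejected.append(smi)
    scored.sort(key=lambda pair: pair[1])
    return scored, rejected
```

Example call: ranked, rejected = rank_by_ysi(["CCCCCCC", "c1ccccc1", "not_a_smiles"], predict_ysi). Each call to the predictor featurizes one molecule; for large libraries, featurizing the whole batch at once would likely be faster, provided featurize_df accepts a list of SMILES as the usage example suggests.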


SalZa2004/YSI_Predictor