Agentic Disease Spread CatBoost Regressor Model for Pollutant effects with Beta

Model Description

This is a CatBoost regressor trained on tabular data generated by the Agent-based Implementations for Infectious Disease Transmission Models simulator. CatBoost (Categorical Boosting) is a gradient boosting library developed by Yandex that handles categorical features natively, without extensive preprocessing.

Model type: Gradient Boosting Decision Trees
Task: Regression
License: MIT
Repository: https://github.com/AlekseiAgarkov/AgenticInfectiousDiseaseTransmissionModels

Intended Uses & Limitations

Intended Use

Regression analysis on structured/tabular data from agent-based disease spread simulations
Scenarios with pollutant effects

Limitations

Primarily designed for assessing pollutant effects
Not suitable for unstructured data (images, text, audio)

How to Use

Installation

pip install catboost

Basic Usage

import pickle
import pandas as pd
from catboost import CatBoostRegressor

# Load the model
with open('catboost_model.pkl', 'rb') as f:
    model = pickle.load(f)

# Prepare your data as a pandas DataFrame.
# Column names and order must match the training data.
# value0 ... value5 are placeholders: replace them with your own numeric inputs.
data = pd.DataFrame({
    'beta': [value0],
    'initially_infected': [value1],
    'lowest_immunity': [value2],
    'highest_immunity': [value3],
    'mask_beta_penalty': [value4],
    'pollutant_immunity_reduction': [value5]
})

# Make prediction
prediction = model.predict(data)

Using with CatBoost directly

from catboost import CatBoostRegressor

# Load saved model
model = CatBoostRegressor()
model.load_model('catboost_model.cbm')

# Make predictions (`data` is the DataFrame built in Basic Usage)
predictions = model.predict(data)
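
Because predictions depend on the features arriving in the same order as during training, a small helper can build the input row and reject missing or unexpected names. This is a minimal sketch, not part of the released code: `FEATURE_ORDER` and `build_feature_row` are hypothetical names, and the order is assumed to follow the feature list in this card.

```python
# Assumed feature order, copied from the feature list in this card.
FEATURE_ORDER = [
    "beta",
    "initially_infected",
    "lowest_immunity",
    "highest_immunity",
    "mask_beta_penalty",
    "pollutant_immunity_reduction",
]


def build_feature_row(values):
    """Return feature values as a list ordered like FEATURE_ORDER.

    Raises ValueError when a feature is missing or an unknown name is given.
    """
    missing = [name for name in FEATURE_ORDER if name not in values]
    extra = [name for name in values if name not in FEATURE_ORDER]
    if missing or extra:
        raise ValueError(
            f"missing features: {missing}; unexpected features: {extra}"
        )
    return [values[name] for name in FEATURE_ORDER]
```

The resulting row can be passed as `model.predict([row])`, since CatBoost accepts a two-dimensional list as well as a DataFrame.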

Training Procedure

Training Data

Data details:

Source: https://raw.githubusercontent.com/AlekseiAgarkov/MIFIML-2-Sem1-M25-525-Project-Practice/refs/heads/main/data/sim_data_metrics_20251214.csv
Features:
- beta: float - infectivity coefficient (beta)
- initially_infected: int - number of initially infected agents
- lowest_immunity: float - lowest possible immunity in simulation
- highest_immunity: float - highest possible immunity in simulation
- mask_beta_penalty: float - beta reduction coefficient for a mask worn at contact
- pollutant_immunity_reduction: float - immunity reduction coefficient for pollutant
Target variable: 'infected_90d'
Samples: 2000
Preprocessing: None
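
The data details above can be turned into a feature matrix and a target vector with the standard library alone. The sketch below assumes the CSV has a header row whose column names match the feature names and the target name listed in this card; `split_features_target` and `load_rows` are hypothetical helpers, and any additional columns in the file are ignored.

```python
import csv
import io
import urllib.request

# Column names as listed in this card (assumed to match the CSV header).
FEATURES = [
    "beta",
    "initially_infected",
    "lowest_immunity",
    "highest_immunity",
    "mask_beta_penalty",
    "pollutant_immunity_reduction",
]
TARGET = "infected_90d"


def split_features_target(rows):
    """Split an iterable of CSV row dicts into a feature matrix X and a target list y."""
    X, y = [], []
    for row in rows:
        X.append([float(row[name]) for name in FEATURES])
        y.append(float(row[TARGET]))
    return X, y


def load_rows(url):
    """Download a CSV file and return its rows as dictionaries (needs network access)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

Calling `split_features_target(load_rows(url))` with the source URL above should yield one row per sample, provided the header assumption holds.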

Training Hyperparameters

iterations: 10000
learning_rate: 0.025
depth: 5
loss_function: 'RMSE'
cat_features: None
verbose: False
early_stopping_rounds: 500
random_seed: 42
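
The hyperparameters above map directly onto `CatBoostRegressor` constructor arguments. The following is a sketch of how training could look, not the author's actual script: the `train` function and the use of a separate validation set for early stopping are assumptions, since this card does not describe how the data was split.

```python
# Hyperparameters as listed in this card.
PARAMS = {
    "iterations": 10000,
    "learning_rate": 0.025,
    "depth": 5,
    "loss_function": "RMSE",
    "cat_features": None,
    "verbose": False,
    "early_stopping_rounds": 500,
    "random_seed": 42,
}


def train(X_train, y_train, X_val, y_val, params=PARAMS):
    """Fit a CatBoostRegressor, using the validation set for early stopping."""
    # Imported here so the parameter dictionary can be inspected
    # without CatBoost installed.
    from catboost import CatBoostRegressor

    model = CatBoostRegressor(**params)
    # Assumption: the validation set is passed as eval_set; the card does
    # not state how the split was made.
    model.fit(X_train, y_train, eval_set=(X_val, y_val))
    return model
```

When an evaluation set is supplied, CatBoost by default keeps only the trees up to the best iteration, which may explain why a final model can hold far fewer trees than `iterations`.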

Evaluation Results

Metric	Value
Train RMSE	476.41
Validation RMSE	535.55
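
RMSE is expressed in the units of the target, so the values above can be read as a typical error in `infected_90d`. For readers who want to recompute the metric on their own predictions, here is a minimal sketch; the `rmse` helper is illustrative and is not taken from the repository.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Scores reported in this card; the ratio shows how much larger the
# validation error is than the training error.
train_rmse, validation_rmse = 476.41, 535.55
print(f"validation / train RMSE ratio: {validation_rmse / train_rmse:.2f}")
```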

Feature Information

Feature Name	Type	Description	Importance
beta	Numeric	infectivity coefficient (`beta`)	80.79
initially_infected	Numeric	number of initially infected agents	17.94
lowest_immunity	Numeric	lowest possible immunity in simulation	0.17
highest_immunity	Numeric	highest possible immunity in simulation	0.42
mask_beta_penalty	Numeric	beta reduction coefficient for a mask worn at contact	0.53
pollutant_immunity_reduction	Numeric	immunity reduction coefficient for pollutant	0.15
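
The importance values in the table sum to 100, so each one can be read as a share of the total. The sketch below ranks the features and accumulates those shares; the numbers are copied from the table, and `cumulative_importance` is a hypothetical helper. On a loaded model, comparable values would normally come from `model.get_feature_importance()`.

```python
# Importance values copied from the table in this card.
IMPORTANCE = {
    "beta": 80.79,
    "initially_infected": 17.94,
    "lowest_immunity": 0.17,
    "highest_immunity": 0.42,
    "mask_beta_penalty": 0.53,
    "pollutant_immunity_reduction": 0.15,
}


def cumulative_importance(importance):
    """Return (feature, importance, running total) tuples, most important first."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    result, running = [], 0.0
    for name, value in ranked:
        running += value
        result.append((name, value, round(running, 2)))
    return result


for name, value, running in cumulative_importance(IMPORTANCE):
    print(f"{name:30s} {value:6.2f} {running:7.2f}")
```

Read this way, `beta` and `initially_infected` together account for roughly 98.7 of the 100 importance points.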

Model Architecture

Algorithm: Gradient Boosting on Decision Trees
Number of trees: 188
Tree depth: 5
Learning rate: 0.025
Loss function: RMSE
Feature importance type: default (PredictionValuesChange for CatBoost regression models)

Model Card Authors

Aleksei Agarkov / MEPhI

Model Card Contact

[email protected]

Disclaimer

This model is provided "as is" without warranty of any kind. Users should evaluate the model's suitability for their specific use case and perform appropriate testing before deployment in production environments.
