Agentic Disease Spread CatBoost Regressor Model for Pollutant effects with Beta
Model Description
This is a CatBoost Regressor model trained for regression on tabular data generated by the Agent-based Implementations for Infectious Disease Transmission Models simulator. CatBoost (Categorical Boosting) is a gradient boosting library developed by Yandex that handles categorical features natively, without extensive preprocessing.
- Model type: Gradient Boosting Decision Trees
- Task: Regression
- License: MIT
- Repository: https://github.com/AlekseiAgarkov/AgenticInfectiousDiseaseTransmissionModels
Intended Uses & Limitations
Intended Use
- Regression analysis on structured/tabular data from agentic disease spread simulations
- Scenarios with pollutant effects
Limitations
- Primarily designed for assessing pollutant effects
- Not suitable for unstructured data (images, text, audio)
How to Use
Installation
pip install catboost
Basic Usage
import pickle
import pandas as pd
from catboost import CatBoostRegressor
# Load the model
with open('catboost_model.pkl', 'rb') as f:
    model = pickle.load(f)
# Prepare your data as a pandas DataFrame.
# Column names must match the training features; replace value0..value5 with your own numbers.
data = pd.DataFrame({
    'beta': [value0],
    'initially_infected': [value1],
    'lowest_immunity': [value2],
    'highest_immunity': [value3],
    'mask_beta_penalty': [value4],
    'pollutant_immunity_reduction': [value5]
})
# Make prediction
prediction = model.predict(data)
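Because the model expects the same features it was trained on, it can help to check the column names before calling `predict`. The following is a minimal sketch, not part of CatBoost: `EXPECTED_FEATURES` and `check_columns` are hypothetical names, and the column order is assumed to be the order in which the features are listed in this card.

```python
# Feature names in the order listed under "Features" in this card.
# Assumption: the model was trained with the columns in this order.
EXPECTED_FEATURES = [
    "beta",
    "initially_infected",
    "lowest_immunity",
    "highest_immunity",
    "mask_beta_penalty",
    "pollutant_immunity_reduction",
]


def check_columns(columns):
    """Return EXPECTED_FEATURES if `columns` holds exactly those names.

    Raises ValueError listing missing or unexpected names otherwise.
    `check_columns` is a hypothetical helper, not part of CatBoost.
    """
    present = set(columns)
    missing = [name for name in EXPECTED_FEATURES if name not in present]
    unexpected = sorted(present - set(EXPECTED_FEATURES))
    if missing or unexpected:
        raise ValueError(
            f"missing columns: {missing}; unexpected columns: {unexpected}"
        )
    return list(EXPECTED_FEATURES)
```

With a pandas DataFrame, `data = data[check_columns(data.columns)]` would both validate the names and put the columns in the assumed order.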
Using with CatBoost directly
from catboost import CatBoostRegressor
# Load saved model
model = CatBoostRegressor()
model.load_model('catboost_model.cbm')
# Make predictions
predictions = model.predict(data)
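For the pollutant scenarios this card targets, one possible workflow is to hold every other feature fixed and vary only `pollutant_immunity_reduction`. The sketch below only builds the input rows; `pollutant_sweep` is a hypothetical helper, and the numbers in `base` are illustrative placeholders rather than values from the training data.

```python
def pollutant_sweep(base_scenario, reductions):
    """Build one input row per pollutant_immunity_reduction value.

    `base_scenario` is a dict keyed by the six feature names of this card;
    every other feature is held fixed at its base value.
    """
    rows = []
    for reduction in reductions:
        row = dict(base_scenario)  # copy so the base scenario is not mutated
        row["pollutant_immunity_reduction"] = reduction
        rows.append(row)
    return rows


# Illustrative numbers only; they are not taken from the training data.
base = {
    "beta": 0.05,
    "initially_infected": 10,
    "lowest_immunity": 0.1,
    "highest_immunity": 0.9,
    "mask_beta_penalty": 0.5,
    "pollutant_immunity_reduction": 0.0,
}
rows = pollutant_sweep(base, [0.0, 0.1, 0.2, 0.3])
```

`pd.DataFrame(rows)` can then be passed to `model.predict`. The card does not state the parameter ranges covered by the simulations, so predictions for values outside those ranges should be treated with caution.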
Training Procedure
Training Data
Data details:
- Source: https://raw.githubusercontent.com/AlekseiAgarkov/MIFIML-2-Sem1-M25-525-Project-Practice/refs/heads/main/data/sim_data_metrics_20251214.csv
- Features:
  - beta: float - infectivity coefficient (beta)
  - initially_infected: int - number of initially infected agents
  - lowest_immunity: float - lowest possible immunity in simulation
  - highest_immunity: float - highest possible immunity in simulation
  - mask_beta_penalty: float - beta reduction coefficient for a mask worn during contact
  - pollutant_immunity_reduction: float - immunity reduction coefficient for pollutant
- Target variable: 'infected_90d'
- Samples: 2000
- Preprocessing: None
Training Hyperparameters
iterations: 10000
learning_rate: 0.025
depth: 5
loss_function: 'RMSE'
cat_features: None
verbose: False
early_stopping_rounds: 500
random_seed: 42
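The hyperparameters above map directly onto `CatBoostRegressor` constructor arguments. The sketch below shows one way they might be passed; `PARAMS` and `train` are hypothetical names, and the function takes data that is already split because this card does not say how the train and validation sets were formed.

```python
# Hyperparameters exactly as listed in this card.
PARAMS = {
    "iterations": 10000,
    "learning_rate": 0.025,
    "depth": 5,
    "loss_function": "RMSE",
    "cat_features": None,
    "verbose": False,
    "early_stopping_rounds": 500,
    "random_seed": 42,
}


def train(X_train, y_train, X_val, y_val):
    """Fit a CatBoostRegressor with PARAMS, using (X_val, y_val) for early stopping.

    Hypothetical helper: how the data was split is not stated in this card.
    """
    # Imported here so PARAMS can be inspected without catboost installed.
    from catboost import CatBoostRegressor

    model = CatBoostRegressor(**PARAMS)
    model.fit(X_train, y_train, eval_set=(X_val, y_val))
    return model
```

With an evaluation set and early stopping, CatBoost by default keeps only the trees up to the best validation iteration, which would be consistent with the 188 trees reported under Model Architecture despite `iterations: 10000`. The fitted model's `tree_count_` attribute reports the actual number.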
Evaluation Results
| Metric | Value |
|---|---|
| Train RMSE | 476.41 |
| Validation RMSE | 535.55 |
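RMSE is expressed in the units of the target, `infected_90d`. As a reminder of how the metric is computed, here is a small standard-library sketch; `rmse` is a hypothetical helper, not the evaluation code used for this model. It also computes the relative gap between the two reported values, which comes to roughly 12%.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Values reported in the table above.
train_rmse = 476.41
validation_rmse = 535.55
relative_gap = (validation_rmse - train_rmse) / train_rmse
```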
Feature Information
| Feature Name | Type | Description | Importance |
|---|---|---|---|
| beta | Numeric | infectivity coefficient (beta) | 80.79 |
| initially_infected | Numeric | number of initially infected agents | 17.94 |
| lowest_immunity | Numeric | lowest possible immunity in simulation | 0.17 |
| highest_immunity | Numeric | highest possible immunity in simulation | 0.42 |
| mask_beta_penalty | Numeric | beta reduction coefficient for a mask worn during contact | 0.53 |
| pollutant_immunity_reduction | Numeric | immunity reduction coefficient for pollutant | 0.15 |
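To read the importance column, it can help to rank the features and accumulate their shares. The sketch below uses only the numbers from the table; it assumes they come from CatBoost's default importance type, whose values are normalized to sum to 100. On a loaded model, `model.get_feature_importance()` returns the corresponding values.

```python
# Importance values copied from the table above.
IMPORTANCE = {
    "beta": 80.79,
    "initially_infected": 17.94,
    "lowest_immunity": 0.17,
    "highest_immunity": 0.42,
    "mask_beta_penalty": 0.53,
    "pollutant_immunity_reduction": 0.15,
}

# Sort from most to least important.
ranked = sorted(IMPORTANCE.items(), key=lambda item: item[1], reverse=True)
total = sum(IMPORTANCE.values())

# Print each feature with its running cumulative share.
cumulative = 0.0
for name, value in ranked:
    cumulative += value
    print(f"{name:<30} {value:6.2f}  cumulative {cumulative:6.2f}")
```

In this ranking `beta` and `initially_infected` come first and `pollutant_immunity_reduction` comes last, which is worth keeping in mind when using the model to study pollutant effects.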
Model Architecture
- Algorithm: Gradient Boosting on Decision Trees
- Number of trees: 188
- Tree depth: 5
- Learning rate: 0.025
- Loss function: RMSE
- Feature importance type: default
Model Card Authors
Aleksei Agarkov / MEPhI
Model Card Contact
Disclaimer
This model is provided "as is" without warranty of any kind. Users should evaluate the model's suitability for their specific use case and perform appropriate testing before deployment in production environments.