Linked paper: ReZero is All You Need: Fast Convergence at Large Depth (arXiv:2003.04887)
A Convolutional Neural Network (CNN) trained on ImageNet-1k with PReLU activation.
Repository: github.com/chrisjob1021/model-examples
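PReLU (Parametric ReLU) replaces ReLU's fixed zero slope for negative inputs with a learnable coefficient, f(x) = max(0, x) + a · min(0, x). As a minimal sketch of the activation itself in PyTorch (illustrative only, not the repository's code):

```python
import torch
import torch.nn as nn

# PReLU: f(x) = max(0, x) + a * min(0, x), where the slope `a` is learned during training.
# num_parameters=64 gives one slope per channel; num_parameters=1 shares a single slope.
prelu = nn.PReLU(num_parameters=64, init=0.25)

x = torch.randn(1, 64, 56, 56)
y = prelu(x)  # same shape as x; negative activations are scaled by the learned slopes
```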
This is a ResNet-style CNN architecture; see the cnn/ directory in the repository for the training scripts and model implementation.

ImageNet-1k classification results compared with reference models:

| Model | Top-1 | Top-5 | Parameters | Year | Notes |
|---|---|---|---|---|---|
| AlexNet | 57.0% | 80.3% | 60M | 2012 | First deep CNN |
| VGG-16 | 71.5% | 90.1% | 138M | 2014 | Deep with small filters |
| ResNet-50 | 76.0% | 93.0% | 25M | 2015 | Baseline |
| ResNet-152 | 78.3% | 94.3% | 60M | 2015 | Deeper variant |
| Inception-v3 | 78.0% | 93.9% | 24M | 2015 | Multi-scale |
| This model | 78.01% | 93.89% | ~23M | 2025 | PReLU |
Key achievement: +2.01 percentage points Top-1 accuracy over the ResNet-50 baseline
```python
import torch
from torchvision import transforms
from PIL import Image

from prelu_cnn import CNN

# Load the model
model = CNN.from_pretrained(
    "your-username/cnn-prelu-imagenet",
    use_prelu=True,
    num_classes=1000,
)

# Prepare image
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("path/to/image.jpg")
input_tensor = transform(image).unsqueeze(0)

# Get prediction
model.eval()
with torch.no_grad():
    output = model(input_tensor)

probabilities = torch.nn.functional.softmax(output[0], dim=0)
top5_prob, top5_catid = torch.topk(probabilities, 5)
```
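To turn the predicted indices into human-readable labels, a class-index file is needed. The snippet below assumes a hypothetical `imagenet_classes.txt` with one ImageNet class name per line in index order; it is not part of the repository and is shown only as a usage sketch:

```python
# Hypothetical labels file: one ImageNet class name per line, in index order.
with open("imagenet_classes.txt") as f:
    categories = [line.strip() for line in f]

# Print the top-5 predictions with their probabilities.
for prob, catid in zip(top5_prob, top5_catid):
    print(f"{categories[catid]}: {prob.item():.4f}")
```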
This model was trained on ImageNet-1k. The architecture is summarized below:
```
CNN(
  conv1:   ConvAct(3 → 64, 7×7, stride=2) + MaxPool(3×3, stride=2)
           input 224×224 → 112×112 → 56×56
  conv2_x: 3× BottleneckBlock(64 → 64 → 256)
           56×56 (no downsampling)
  conv3_x: 4× BottleneckBlock(256 → 128 → 512)
           56×56 → 28×28 (first block stride=2)
  conv4_x: 6× BottleneckBlock(512 → 256 → 1024)
           28×28 → 14×14 (first block stride=2)
  conv5_x: 3× BottleneckBlock(1024 → 512 → 2048)
           14×14 → 7×7 (first block stride=2)
  avgpool: AdaptiveAvgPool2d(1×1)
           7×7 → 1×1
  fc:      Linear(2048 → 1000)
)
```

Total layers: 50 (1 + 3×3 + 4×3 + 6×3 + 3×3 = 49 conv + 1 fc)
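For reference, here is a hedged sketch of what one BottleneckBlock from the summary above could look like with PReLU activations. This is an illustrative reconstruction of a standard ResNet-style bottleneck (1×1 reduce, 3×3, 1×1 expand, projection shortcut), not the repository's exact code; see the cnn/ directory for the real implementation:

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """Illustrative ResNet-style bottleneck with PReLU: 1x1 reduce, 3x3, 1x1 expand."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, out_ch, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_ch)
        self.act1 = nn.PReLU(mid_ch)
        self.act2 = nn.PReLU(mid_ch)
        self.act3 = nn.PReLU(out_ch)
        # Projection shortcut when the shape changes (e.g. first block of conv3_x..conv5_x).
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = self.act1(self.bn1(self.conv1(x)))
        out = self.act2(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.act3(out + self.shortcut(x))

# e.g. first block of conv3_x: 256 -> 128 -> 512, stride 2 (56x56 -> 28x28)
x = torch.randn(1, 256, 56, 56)
print(BottleneckBlock(256, 128, 512, stride=2)(x).shape)  # torch.Size([1, 512, 28, 28])
```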
If you use this model, please cite:

```bibtex
@misc{cnn_prelu_imagenet,
  title={cnn-prelu-imagenet: CNN with PReLU for ImageNet Classification},
  year={2025},
  publisher={HuggingFace Hub},
}
```
This model uses PReLU activation. Please also cite the original paper:

```bibtex
@inproceedings{he2015delving,
  title={Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification},
  author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={1026--1034},
  year={2015}
}
```
This model is released under the MIT License.