Update README.md

README.md (CHANGED)
**Previous version (removed):**

tags:
- image-restoration
- image-enhancement
- denoising
- comfyui
- pytorch
---
# JPG Artifacts & Noise Remover

This project is a personal experiment created out of curiosity. The main part of the code was generated by an AI assistant; my role was to set the goal, prepare the data, run the training, and evaluate the results. The model is trained to remove artifacts (JPEG compression, noise) from images, and it shows good results.

## Examples

### Training Details

The model was trained on a dataset of approximately 30,000 high-quality images, primarily anime-style art. Instead of using pre-degraded images, the training process generated (degraded, clean) image pairs on the fly.

* **Architecture**: The network is a `UNetRestorer` built from `ResidualBlock`s for deep feature extraction. The deeper encoder levels use the Convolutional Block Attention Module (CBAM) to emphasize important features. The model has a final residual connection: it learns to predict the difference (`clean - degraded`) rather than the entire clean image.
* **Degradation Process**: Each clean image patch was subjected to a sequence of randomly ordered degradations:
  * **JPEG Compression**: A random quality level between 5 and 85.
  * **Gaussian Noise**: A standard deviation randomly selected from the range [0.0, 7.0].
  * **Identity Mapping**: With 20% probability (`--clean-prob 0.2`), the input image was left clean (not degraded). This encourages the model to preserve details when no artifacts are present.
* **Training Procedure**:
  * **Optimizer**: AdamW with a learning rate of `2e-4` and weight decay of `1e-4`.
  * **Learning Rate Scheduler**: Cosine annealing with a linear warmup phase of 2000 steps.
  * **Batch & Patch Size**: Batch size 12 with 320x320 pixel patches.
  * **Loss Function**: A multi-component loss balancing pixel accuracy, structural integrity, and perceptual quality:
    * **Primary Loss**: A weighted sum of `0.7 * CharbonnierLoss` (a smooth L1 variant) and `0.3 * MixL1SSIM`. The `MixL1SSIM` component was itself weighted with `alpha=0.9`, combining an L1 term and a structural similarity term (`0.9*L1 + 0.1*(1-SSIM)`).
    * **Edge Loss**: `GradientLoss` with a weight of 0.15 (`--edge-loss-w 0.15`) to penalize blurry edges and promote sharpness.
    * **High-Frequency Error Norm (HFEN)**: `HFENLoss` with a weight of 0.12 (`--hfen-w 0.12`) to better preserve fine textures and details.
    * **Identity Loss**: For the 20% of samples where the input was clean, an additional L1 loss with a weight of 0.5 (`--id-loss-w 0.5`) between the model's output and its input. This forces the network to act as an identity function on high-quality images, preventing it from introducing blur or altering details.
  * **Techniques**: Training was accelerated using Automatic Mixed Precision (AMP) with the `bfloat16` data type. An Exponential Moving Average (EMA) of the model weights (`decay=0.999`) was maintained to produce a more stable, better-generalizing model for inference; a minimal EMA sketch follows below.
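
The EMA update itself is simple. Below is a minimal, self-contained sketch of maintaining such a weight average in PyTorch; the dummy `nn.Conv2d` stands in for the actual `UNetRestorer`, and the real training script may handle buffers and initialization differently:

```python
import copy

import torch
import torch.nn as nn

@torch.no_grad()
def update_ema(ema_model: nn.Module, model: nn.Module, decay: float = 0.999):
    # ema = decay * ema + (1 - decay) * current weights
    for e, p in zip(ema_model.parameters(), model.parameters()):
        e.mul_(decay).add_(p, alpha=1.0 - decay)

model = nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the real UNetRestorer
ema_model = copy.deepcopy(model)       # frozen copy holding the running average

# Call once after every optimizer step; use ema_model for inference.
update_ema(ema_model, model, decay=0.999)
```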

### Limitations and Potential Issues

* The model was trained on a dataset consisting primarily of anime-style art. Results on photos, line art, or text may be suboptimal.
* At very high noise levels, or with artifacts beyond the training range, the model may hallucinate details or over-smooth the image.
* The model might interpret very fine, low-contrast textures (e.g., fabric, sand) as noise and smooth them out. In such cases, use the `blend` parameter in the node to mix back some of the original detail.
* The model does not correct other types of degradation, such as motion blur, chromatic aberration, or optical flaws.

**New version:**

tags:
- image-restoration
- image-enhancement
- denoising
- deblurring
- comfyui
- pytorch
- nafnet
---

# JPG Artifacts & Noise Remover (V2)

This is a lightweight model designed to remove JPEG compression artifacts, digital noise, and slight blur from images. It is ideally suited for anime/illustration art, but it works reasonably well on photos too.

**Version 2 Update:**

* **New Architecture:** Switched to a NAFNet-based UNet (state-of-the-art blocks for restoration).
* **Larger Dataset:** Trained on ~40,000 high-quality images from Danbooru2024.
* **Improved Training:** Added perceptual loss, HFEN, and blur degradation handling.

> **Which version to pick?** V2 generally restores better, but it is also more cautious and may decide an image is already clean. If V2 feels too conservative for a specific image, try the V1 weights (`best_ema_15E.safetensors`) and keep the result you like more.

## Examples
## How to use in ComfyUI

The model is designed to work with the **[JPG & Noise Remover ComfyUI Node](https://github.com/SnJake/SnJake_JPG_Artifacts_Noise_Cleaner)**.

1. **Install the Node:** Follow the instructions in the [GitHub repository](https://github.com/SnJake/SnJake_JPG_Artifacts_Noise_Cleaner).
2. **Download Weights:** Download the `.pt` or `.safetensors` V2 file from this repository.
3. **Place Weights:** Put the file in `ComfyUI/models/artifacts_remover/`.
4. **Select Model:** Select the new weight file in the node settings and ensure `base_ch` is set to **64**.
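
If you want to sanity-check a downloaded weight file (for example, to confirm the tensor shapes match `base_ch=64`), you can list its contents with the `safetensors` library. A small sketch, assuming the file sits in the current directory; the file name is a placeholder and the key names depend on how the training code named its modules:

```python
from safetensors import safe_open

# Print every tensor name and shape stored in the checkpoint.
with safe_open("remover_v2.safetensors", framework="pt") as f:
    for key in f.keys():
        print(key, tuple(f.get_slice(key).get_shape()))
```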

## Training Details (V2)

The goal was to create a restorer that not only removes noise but also preserves fine structural detail, without "plastic" smoothing.

### Dataset

Trained on **40,000 images** from the [Danbooru2024](https://huggingface.co/datasets/deepghs/danbooru2024) dataset (anime/illustration style).

### Architecture: NAFNet-based UNet

The model uses a U-Net structure but replaces the standard convolutional blocks with **NAFBlocks** (Nonlinear Activation Free Blocks); a minimal sketch of such a block follows the list.

* **SimpleGate:** Replaces complex activation functions with an element-wise multiplication of two channel halves.
* **LayerNorm2d:** Stabilizes training.
* **Simplified Channel Attention (SCA):** Captures global context efficiently.
* **Base Channels:** 64
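
A minimal PyTorch sketch of the ideas above, not the repository's exact implementation; in particular, `GroupNorm(1, ch)` is used here as a simple stand-in for `LayerNorm2d`:

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    """Split the channels in half and multiply the halves element-wise."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b

class NAFBlockSketch(nn.Module):
    """Simplified NAFBlock: norm -> 1x1 conv -> depthwise 3x3 -> SimpleGate
    -> Simplified Channel Attention -> 1x1 conv, wrapped in a residual."""
    def __init__(self, ch: int):
        super().__init__()
        self.norm = nn.GroupNorm(1, ch)  # stand-in for LayerNorm2d
        self.pw1 = nn.Conv2d(ch, ch * 2, kernel_size=1)
        self.dw = nn.Conv2d(ch * 2, ch * 2, kernel_size=3, padding=1, groups=ch * 2)
        self.gate = SimpleGate()         # ch*2 channels in, ch out
        self.sca = nn.Sequential(        # Simplified Channel Attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch, kernel_size=1),
        )
        self.pw2 = nn.Conv2d(ch, ch, kernel_size=1)

    def forward(self, x):
        y = self.gate(self.dw(self.pw1(self.norm(x))))
        y = y * self.sca(y)              # re-weight channels by global context
        return x + self.pw2(y)           # residual connection

x = torch.randn(1, 64, 32, 32)           # base_ch = 64
print(NAFBlockSketch(64)(x).shape)       # torch.Size([1, 64, 32, 32])
```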

### Degradation Pipeline

The model is trained on pairs generated on the fly. The degradation pipeline is more aggressive than in V1 (a hypothetical implementation sketch follows the list):

1. **Blur:** Random downscale-upscale (probability 50%, scale factor down to 0.85x) to simulate soft blur.
2. **JPEG Compression:** Quality 10-85.
3. **Gaussian Noise:** Standard deviation 0-20.0.
4. **Identity:** 2% of images are left clean, to teach the model not to alter good images.
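
A minimal sketch of such a pipeline with PIL and NumPy, assuming the steps run in the order listed; the exact sampling details in the training code may differ:

```python
import io
import random

import numpy as np
from PIL import Image

def degrade(img: Image.Image) -> Image.Image:
    img = img.convert("RGB")
    if random.random() < 0.02:                      # 4. identity: leave clean
        return img
    w, h = img.size
    if random.random() < 0.5:                       # 1. soft blur
        s = random.uniform(0.85, 1.0)               #    scale factor down to 0.85x
        img = img.resize((max(1, round(w * s)), max(1, round(h * s))), Image.BICUBIC)
        img = img.resize((w, h), Image.BICUBIC)
    buf = io.BytesIO()                              # 2. JPEG round-trip, quality 10-85
    img.save(buf, format="JPEG", quality=random.randint(10, 85))
    img = Image.open(buf).convert("RGB")
    arr = np.asarray(img).astype(np.float32)        # 3. Gaussian noise, sigma 0-20
    arr += np.random.normal(0.0, random.uniform(0.0, 20.0), arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```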

### Loss Function

A composite loss function was used to balance pixel accuracy and perceptual quality (a sketch of how the terms combine follows the list):

* **Pixel Loss:** Charbonnier loss (weight 0.7) + MixL1SSIM (weight 0.3).
* **Perceptual Loss (VGG19):** Weight 0.05. Helps generate realistic textures.
* **HFEN (High-Frequency Error Norm):** Weight 0.1. Enforces edge reconstruction.
* **Gradient/Edge Loss:** Weight 0.2.
* **Identity Loss:** Weight 0.02 (applied to clean images).
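
A sketch of the weighting. The Charbonnier and gradient terms are written out; `mix_l1_ssim`, `perceptual_vgg19`, and `hfen` stand for the corresponding loss modules in the training code, whose exact definitions are not shown here:

```python
import torch

def charbonnier(pred, target, eps=1e-3):
    # Smooth L1 variant: sqrt(diff^2 + eps^2), averaged over all elements.
    return torch.sqrt((pred - target) ** 2 + eps * eps).mean()

def gradient_loss(pred, target):
    # L1 distance between horizontal/vertical image gradients (edge loss).
    dh = lambda t: t[..., :, 1:] - t[..., :, :-1]
    dv = lambda t: t[..., 1:, :] - t[..., :-1, :]
    return ((dh(pred) - dh(target)).abs().mean()
            + (dv(pred) - dv(target)).abs().mean())

def total_loss(pred, target, mix_l1_ssim, perceptual_vgg19, hfen, is_clean=False):
    loss = (0.7 * charbonnier(pred, target)
            + 0.3 * mix_l1_ssim(pred, target)
            + 0.05 * perceptual_vgg19(pred, target)
            + 0.1 * hfen(pred, target)
            + 0.2 * gradient_loss(pred, target))
    if is_clean:  # identity loss: for clean inputs, target equals the input
        loss = loss + 0.02 * (pred - target).abs().mean()
    return loss
```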

### Training Config

* **Optimizer:** AdamW (`lr=2e-4`, `wd=1e-4`).
* **Scheduler:** Cosine annealing (2000 warmup steps).
* **Precision:** BFloat16 (AMP).
* **Patch Size:** 288x288.
* **Batch Size:** 4, accumulated to an effective batch of 8 (see the loop sketch below).
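
How this config might be wired up in PyTorch, as a sketch: `model`, `loader`, and `loss_fn` are placeholders, and the warmup-plus-cosine schedule is approximated with `SequentialLR` (the actual training script may implement it differently):

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

def train(model: nn.Module, loader, loss_fn, total_steps: int, accum: int = 2):
    opt = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-4)
    sched = SequentialLR(
        opt,
        [LinearLR(opt, start_factor=1e-3, total_iters=2000),  # linear warmup
         CosineAnnealingLR(opt, T_max=total_steps - 2000)],   # cosine decay
        milestones=[2000],
    )
    opt.zero_grad()
    for i, (lq, gt) in enumerate(loader):          # batches of 4 patches, 288x288
        with torch.autocast("cuda", dtype=torch.bfloat16):
            loss = loss_fn(model(lq), gt) / accum  # 2 steps of 4 -> effective 8
        loss.backward()
        if (i + 1) % accum == 0:
            opt.step()
            opt.zero_grad()
            sched.step()
```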

## Limitations

* Primarily trained on anime/2D art.
* May struggle with extremely heavy motion blur, since training used mostly slight downscale blur.