lc2004 committed (verified)
Commit a21b31d · Parent(s): 2d1a898

Upload README.md

Files changed (1): README.md (+147, −6)

---
language:
- en
- zh
license: mit
tags:
- tokenizer
- time-series
- bitcoin
- btc
- cryptocurrency
- numeric-encoding
---

# BTCUSDT 1-Hour Tokenizer

## Tokenizer Description

This is a specialized tokenizer for **time-series cryptocurrency data encoding**, tailored to BTCUSDT (Bitcoin/USDT) 1-hour candlestick data. It converts numerical trading data (OHLCV: Open, High, Low, Close, Volume) into token representations suitable for transformer-based models.

### Tokenizer Details

- **Type**: Numeric Time-Series Tokenizer
- **Vocabulary Size**: Model-specific
- **Input Format**: BTCUSDT candlestick data (OHLCV)
- **Output**: Token sequences for model inference
- **Framework**: Hugging Face Transformers compatible

## Purpose

This tokenizer preprocesses historical BTCUSDT 1-hour trading data before it is fed into the fine-tuned prediction model (a formatting sketch follows the list below). It handles:

- **Price normalization**: Converts raw price values to a standardized token space
- **Volume encoding**: Encodes trading volume information
- **Temporal sequences**: Preserves time-series relationships in the data
- **Model compatibility**: Ensures the input format expected by the BTCUSDT 1h fine-tuned model

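The exact preprocessing pipeline is not published in this card, so the snippet below is only a minimal sketch of how a raw OHLCV candle could be rendered into the text format used by the usage examples that follow. The `format_candle` helper and its string layout are illustrative assumptions, not part of the tokenizer's API.

```python
# Illustrative sketch only: format_candle and the exact string layout are
# assumptions mirroring the example string in "Tokenizing BTCUSDT Data" below,
# not a documented preprocessing step of this tokenizer.

def format_candle(open_, high, low, close, volume, symbol="BTCUSDT", interval="1h"):
    """Render one OHLCV candle as a single text line for tokenization."""
    return (
        f"{symbol} {interval}: "
        f"Open={open_}, High={high}, Low={low}, Close={close}, Volume={volume}"
    )

# Values taken from the tokenization example below
print(format_candle(45230.5, 45600.2, 45100.3, 45450.8, 2345.67))
# BTCUSDT 1h: Open=45230.5, High=45600.2, Low=45100.3, Close=45450.8, Volume=2345.67
```
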
## How to Use

### Installation

```bash
pip install transformers torch
```

### Loading the Tokenizer

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-huggingface-username/BTCUSDT-1h-tokenizer")
```

### Tokenizing BTCUSDT Data

```python
# Example: Tokenize BTCUSDT candlestick data
candlestick_data = "BTCUSDT 1h: Open=45230.5, High=45600.2, Low=45100.3, Close=45450.8, Volume=2345.67"

tokens = tokenizer.encode(candlestick_data, return_tensors="pt")
print(tokens)

# Decode tokens back to readable format
decoded = tokenizer.decode(tokens[0])
print(decoded)
```

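Several candles can also be encoded in one batch with the standard Transformers call. This is a minimal sketch: the second candle's values are made up for illustration, and whether this tokenizer ships a padding token is not documented here, so the snippet falls back to reusing the EOS token when no pad token is defined.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-huggingface-username/BTCUSDT-1h-tokenizer")

candles = [
    "BTCUSDT 1h: Open=45230.5, High=45600.2, Low=45100.3, Close=45450.8, Volume=2345.67",
    # Illustrative values only, not real market data
    "BTCUSDT 1h: Open=45450.8, High=45820.0, Low=45390.1, Close=45710.4, Volume=1980.12",
]

# Assumption: if no pad token is configured, reuse the EOS token for padding
if tokenizer.pad_token is None and tokenizer.eos_token is not None:
    tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(candles, padding=True, return_tensors="pt")
print(batch["input_ids"].shape)  # (num_candles, max_sequence_length)
```
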
### Integration with Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-huggingface-username/BTCUSDT-1h-tokenizer")
model = AutoModelForCausalLM.from_pretrained("your-huggingface-username/BTCUSDT-1h-finetuned")

# Prepare data
historical_data = "OHLCV data here..."
tokens = tokenizer.encode(historical_data, return_tensors="pt")

# Get predictions
outputs = model.generate(tokens, max_length=50)
predictions = tokenizer.decode(outputs[0])
```

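The integration example above can be wrapped with standard PyTorch inference housekeeping. The sketch below adds device placement, eval mode, and `torch.no_grad()`; the repository ids are the same placeholders used throughout this card, and `max_new_tokens=50` is an arbitrary illustrative choice.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder repository ids from this card
tokenizer = AutoTokenizer.from_pretrained("your-huggingface-username/BTCUSDT-1h-tokenizer")
model = AutoModelForCausalLM.from_pretrained("your-huggingface-username/BTCUSDT-1h-finetuned")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()  # disable dropout for inference

prompt = "BTCUSDT 1h: Open=45230.5, High=45600.2, Low=45100.3, Close=45450.8, Volume=2345.67"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():  # no gradients are needed for generation
    outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
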
## Technical Specifications

- **Compatible with**: BTCUSDT 1-Hour Fine-tuned Model
- **Data Format**: Open, High, Low, Close, Volume (OHLCV)
- **Time Granularity**: 1-hour candlesticks
- **Supported Operations**: Encoding, decoding, tokenization (illustrated below)
- **Framework**: PyTorch / TensorFlow compatible

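The three supported operations map onto standard Transformers tokenizer methods. The sketch below shows them side by side using the example candle string from earlier in this card.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-huggingface-username/BTCUSDT-1h-tokenizer")

text = "BTCUSDT 1h: Open=45230.5, High=45600.2, Low=45100.3, Close=45450.8, Volume=2345.67"

string_tokens = tokenizer.tokenize(text)  # tokenization: text -> token strings
token_ids = tokenizer.encode(text)        # encoding: text -> integer ids
round_trip = tokenizer.decode(token_ids)  # decoding: ids -> text

print(string_tokens[:10])
print(token_ids[:10])
print(round_trip)
```
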
## Training Data

- **Dataset**: BTCUSDT 1-hour historical candles
- **Source**: Cryptocurrency exchange data
- **Time Coverage**: Historical trading data up to October 2025
- **Data Points**: Thousands of 1-hour candles

## Limitations

- **Specialized for BTCUSDT**: Not recommended for other cryptocurrency pairs or timeframes
- **1-Hour Granularity**: Designed specifically for 1-hour candlestick data
- **Numeric Focus**: Optimized for the OHLCV data format
- **Normalization**: Assumes price ranges similar to historical BTCUSDT data

## Usage Notes

⚠️ **Important**:
- This tokenizer should be used **exclusively with the BTCUSDT 1h fine-tuned model**
- Do not use this tokenizer with other models or datasets
- Ensure your input data follows the OHLCV format (a minimal check is sketched below)
- Maintain consistent data normalization across datasets

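The OHLCV requirement above can be enforced with a small sanity check before tokenization. The helper below is hypothetical and not part of this repository; it only illustrates the kind of validation the note asks for.

```python
# Hypothetical helper, not part of this tokenizer's API: it only illustrates
# the "ensure your input data follows the OHLCV format" note above.

REQUIRED_FIELDS = ("open", "high", "low", "close", "volume")

def check_ohlcv(candle: dict) -> None:
    """Raise if a candle is missing an OHLCV field or holds a non-numeric value."""
    for field in REQUIRED_FIELDS:
        if field not in candle:
            raise ValueError(f"missing OHLCV field: {field}")
        if not isinstance(candle[field], (int, float)):
            raise ValueError(f"non-numeric value for {field}: {candle[field]!r}")

check_ohlcv({"open": 45230.5, "high": 45600.2, "low": 45100.3, "close": 45450.8, "volume": 2345.67})
```
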
## Related Models

- **Fine-tuned Model**: [BTCUSDT 1h Fine-tuned Model](https://huggingface.co/your-huggingface-username/BTCUSDT-1h-finetuned)
- **Base Model**: [Kronos](https://huggingface.co/antonop/Kronos-1B-MSN)

## License

This tokenizer is released under the **MIT License**.

## Citation

If you use this tokenizer, please cite:

```bibtex
@misc{btcusdt_tokenizer_2025,
  title={BTCUSDT 1-Hour Tokenizer},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/your-username/BTCUSDT-1h-tokenizer}}
}
```

## Acknowledgments

- Base framework: [Hugging Face Transformers](https://huggingface.co/transformers/)
- Compatible with: [BTCUSDT 1h Fine-tuned Model](https://huggingface.co/your-huggingface-username/BTCUSDT-1h-finetuned)

## Contact & Support

For questions:
- GitHub: [https://github.com/Liucong-JunZi/Kronos-Btc-finetune](https://github.com/Liucong-JunZi/Kronos-Btc-finetune)

---

**Last Updated**: October 20, 2025