# NSQL-Llama-2-70B
## Model Description
NSQL is a family of autoregressive open-source large foundation models (FMs) designed specifically for SQL generation tasks.
This repository introduces a new member of the NSQL family, NSQL-Llama-2-70B. It is based on Meta's original [Llama-2 70B model](https://huggingface.co/meta-llama/Llama-2-70b), further pre-trained on a dataset of general SQL queries, and then fine-tuned on a dataset of text-to-SQL pairs.
### Basic Information
<!-- Provide the basic links for the model. -->
- **Blog Post**: [Link](TBA)
- **Discord**: [Link](TBA)
- **HF Hosting**: [Chat with me!](TBA)
## Training Data
The general SQL queries are the SQL subset of [The Stack](https://huggingface.co/datasets/bigcode/the-stack), containing 1M training samples. The labeled text-to-SQL pairs come from the [NSText2SQL](https://huggingface.co/datasets/NumbersStation/NSText2SQL) dataset.
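Both datasets are hosted on the Hugging Face Hub and can be inspected with the `datasets` library. The sketch below is illustrative only: the `data_dir` path for The Stack's SQL subset is an assumption based on that dataset's per-language layout, and access to The Stack may require accepting its terms on the Hub.
```python
from datasets import load_dataset

# Illustrative only: inspect the two training datasets referenced above.
# The Stack is organized per language; "data/sql" is assumed to be the SQL subset.
stack_sql = load_dataset("bigcode/the-stack", data_dir="data/sql", split="train")

# Labeled text-to-SQL pairs used for fine-tuning.
nstext2sql = load_dataset("NumbersStation/NSText2SQL", split="train")

print(stack_sql)
print(nstext2sql[0])
```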
## Evaluation Data
We evaluate our models on three text-to-SQL benchmarks: Spider, Bird, and text2sql.
## Training Procedure
NSQL was trained using cross-entropy loss to maximize the likelihood of sequential inputs. For fine-tuning on text-to-SQL pairs, we only compute the loss over the SQL portion of each pair, as sketched below. The model was trained on SambaNova's in-house Reconfigurable Dataflow Unit (RDU), leveraging data and model parallelism. We pre-trained for 2 epochs and fine-tuned for 10 epochs.
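A minimal sketch of the SQL-only loss masking, assuming the `tokenizer` and `model` from the How to Use section below and a hypothetical prompt/SQL pair; this is not the actual RDU training code, and token-boundary handling is simplified.
```python
# Hypothetical text-to-SQL pair for illustration.
prompt = "CREATE TABLE stadium (stadium_id number, name text, capacity number)\n\n-- how many stadiums are there?\n\n"
sql = "SELECT COUNT(*) FROM stadium"

# Tokenize the full sequence, and the prompt alone to find the prompt length.
full_ids = tokenizer(prompt + sql, return_tensors="pt").input_ids
prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]

# Labels set to -100 are ignored by the cross-entropy loss, so only the
# SQL tokens contribute to the fine-tuning objective.
labels = full_ids.clone()
labels[:, :prompt_len] = -100

outputs = model(input_ids=full_ids, labels=labels)
loss = outputs.loss  # cross-entropy over the SQL portion only
```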
### Hyperparameters
**Continuous pre-training on the Stack-SQL dataset**
- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Epochs: 2
- Global batch size: 256
- Batch tokens: 256 * 4096 = 1,048,576 tokens
- Learning rate: 1e-5
- Learning rate scheduler: Fixed
- Warmup steps: 0
- Weight decay: 0.1
**Fine-tuning on the NSText2SQL dataset** (an illustrative optimizer/scheduler setup is sketched after this list)
- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Epochs: 10
- Global batch size: 64
- Batch tokens: 64 * 4096 = 262,144 tokens
- Learning rate: 1e-5
- Learning rate scheduler: Cosine schedule with warmup
- Warmup steps: 0
- End learning ratio: 0.1
- Weight decay: 0.1
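The following is a minimal PyTorch sketch of how the fine-tuning optimizer and schedule above could be expressed. It assumes a `model` variable like the one loaded in the How to Use section, uses a placeholder step count, and interprets "End learning ratio: 0.1" as decaying to 10% of the peak learning rate; the actual training ran on SambaNova RDUs, not this code.
```python
import torch

# Illustrative only: the actual run used SambaNova RDUs, not this PyTorch setup.
peak_lr = 1e-5
num_training_steps = 1000  # placeholder; depends on dataset size, batch size, and epochs

optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr, weight_decay=0.1)

# Cosine decay from peak_lr down to 0.1 * peak_lr (warmup steps = 0, so no warmup phase).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_training_steps, eta_min=0.1 * peak_lr
)
```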
## Intended Use and Limitations
The model was designed for text-to-SQL generation tasks given a table schema and a natural language question. It works best with the prompt format defined below and when generating `SELECT` queries.
## How to Use
Example 1:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/nsql-Llama-2-70B")
# bfloat16 halves the memory footprint relative to float32; a 70B model still
# needs on the order of 140 GB of memory to load in this precision.
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/nsql-Llama-2-70B", torch_dtype=torch.bfloat16)
```
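A generation step could continue from here. The sketch below assumes the schema-then-question prompt style used by other NSQL models; treat the exact template as an assumption rather than the definitive format for this checkpoint.
```python
# Illustrative continuation of Example 1: the prompt template below is assumed,
# not confirmed for this specific checkpoint.
text = """CREATE TABLE stadium (
    stadium_id number,
    name text,
    capacity number
)

-- Using valid SQLite, answer the following questions for the tables provided above.

-- how many stadiums are there?

SELECT"""

input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```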