| --- |
| license: other |
| base_model: deepseek-ai/deepseek-coder-1.3b-base |
| tags: |
| - axolotl |
| - generated_from_trainer |
| model-index: |
| - name: deepseek-coder-1.3b-typescript |
| results: [] |
| datasets: |
| - bigcode/the-stack-dedup |
| widget: |
| - text: "class Person {\n constructor(public name:" |
| example_title: "class" |
| - text: "function quickSort" |
| example_title: "function" |
| --- |
| |
| <p align="center"> |
| <img width="1000px" alt="CodeGPT: DeepSeek Coder - Typescript" src="codegpt-deepseek-typescript.png?raw=true"> |
| </p> |
| <p align="center"><a href="https://codegpt.co/">[CodeGPT.co]</a> | <a href="https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript">[🦙 Ollama]</a> | <a href="https://discord.gg/fKyyJX5pne">[Discord]</a> | <a href="https://marketplace.visualstudio.com/items?itemName=DanielSanMedium.dscodegpt">[VSCode Extension]</a> </p> |
| <hr> |
|
|
| [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl) |
| <details><summary>See axolotl config</summary> |
|
|
| axolotl version: `0.3.0` |
| ```yaml |
| base_model: deepseek-ai/deepseek-coder-1.3b-base |
| model_type: AutoModelForCausalLM |
| trust_remote_code: true |
| load_in_8bit: false |
| load_in_4bit: false |
| strict: false |
| |
| |
| datasets: |
| - path: CodeGPTPlus/typescript-0-500000-seq1024 |
| type: completion |
| field: text |
| |
| |
| val_set_size: 0.001 |
| output_dir: ./fft-out |
| |
| sequence_len: 1024 |
| |
| adapter: |
| lora_model_dir: |
| lora_r: |
| lora_alpha: |
| lora_dropout: |
| lora_target_linear: |
| lora_fan_in_fan_out: |
| lora_modules_to_save: |
| |
| wandb_project: deepseek_1.3_fft |
| wandb_entity: |
| wandb_watch: |
| wandb_name: aws_a10g |
| wandb_log_model: end |
| |
| |
| gradient_accumulation_steps: 2 |
| micro_batch_size: 20 |
| num_epochs: 1 |
| optimizer: adamw_bnb_8bit |
| adam_beta1: 0.9 |
| adam_beta2: 0.999 |
| adam_epsilon: 0.000001 |
| max_grad_norm: 1.0 |
| weight_decay: 0.1 |
| lr_scheduler: cosine |
| learning_rate: 0.00002 |
| train_on_inputs: false |
| group_by_length: false |
| bf16: true |
| fp16: false |
| tf32: false |
| gradient_checkpointing: true |
| early_stopping_patience: |
| resume_from_checkpoint: |
| local_rank: |
| logging_steps: 1 |
| xformers_attention: |
| flash_attention: true |
| |
| loss_watchdog_threshold: 5.0 |
| loss_watchdog_patience: 3 |
| |
| hub_model_id: CodeGPTPlus/deepseek_coder_1.3b_typescript |
| hub_strategy: every_save |
| warmup_ratio: 0.01 |
| evals_per_epoch: 20 |
| saves_per_epoch: 3 |
| debug: |
| deepspeed: |
| |
| fsdp: |
| fsdp_config: |
| special_tokens: |
| bos_token: "<|begin▁of▁sentence|>" |
| eos_token: "<|end▁of▁sentence|>" |
| pad_token: "<|end▁of▁sentence|>" |
| ``` |
|
|
| </details><br> |
|
|
| # deepseek-coder-1.3b-typescript |
|
|
| CodeGPTPlus/deepseek-coder-1.3b-typescript is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base), built by the CodeGPT team to generate TypeScript code. It was fine-tuned specifically on TypeScript, using a dataset of 0.5B tokens, with the aim of producing precise and efficient solutions in this language. |
|
|
| Like the base model, it uses a 16K window size and an additional fill-in-the-middle task to support project-level code completion. |
|
|
| The model is intended for developers looking for a code generator specialized in TypeScript, backed by the expertise of the CodeGPT team. |
|
|
| It achieves the following results on the evaluation set: |
| - Loss: 0.7681 |
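
For intuition, a cross-entropy loss can be converted to a perplexity with `exp(loss)`. The sketch below assumes the reported value is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card; the variable names are illustrative.

```python
import math

# Evaluation loss reported above.
eval_loss = 0.7681

# Assumption: the loss is a mean per-token cross-entropy measured in nats,
# so the perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity: {perplexity:.2f}")
```

Under that assumption the perplexity is roughly 2.16, meaning the model is on average about as uncertain as a choice between two equally likely tokens.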
|
|
| **Model Developers** CodeGPT Team |
|
|
| **Variations** 1.3B |
|
|
| **Input** Models input text only. |
|
|
| **Output** Models generate text only. |
|
|
| ## How to Use |
| This model is for completion purposes only. Here are some examples of how to use it. |
|
|
| ### Running the model on a GPU |
| ```python |
| from transformers import AutoTokenizer, AutoModelForCausalLM |
| |
| # Load the tokenizer and the model weights, then move the model to the GPU. |
| tokenizer = AutoTokenizer.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript", |
|                                           trust_remote_code=True) |
| model = AutoModelForCausalLM.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript", |
|                                              trust_remote_code=True).cuda() |
| |
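| # Fill-in-the-middle prompt: the model generates the code that belongs at the hole marker. |
| # The sentinel tokens use the fullwidth bar "｜" (U+FF5C), as in the DeepSeek Coder tokenizer. |
| input_text = """<｜fim▁begin｜>function quickSort(arr: number[]): number[] { |
|   if (arr.length <= 1) { |
|     return arr; |
|   } |
|   const pivot = arr[0]; |
|   const left = []; |
|   const right = []; |
| <｜fim▁hole｜> |
|   return [...quickSort(left), pivot, ...quickSort(right)]; |
| }<｜fim▁end｜>""" |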
| |
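| # Tokenize the prompt and move the tensors to the same device as the model. |
| inputs = tokenizer(input_text, return_tensors="pt").to(model.device) |
| # max_length counts the prompt tokens as well as the generated ones. |
| outputs = model.generate(**inputs, max_length=256) |
| # Decode the full sequence (prompt plus completion) back to text. |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |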
| ``` |
|
|
| ### Running with Ollama |
| **Model:** https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript |
|
|
| ```ollama run codegpt/deepseek-coder-1.3b-typescript``` |
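
Once the model has been pulled, it can also be queried from code. The following is a minimal sketch that assumes an Ollama server is running locally on its default port (11434) and uses Ollama's `/api/generate` endpoint; the helper names `build_payload` and `complete` are illustrative and not part of this model card.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "codegpt/deepseek-coder-1.3b-typescript"


def build_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build the JSON body for a single, non-streamed completion request."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"num_predict": max_tokens},  # cap the number of generated tokens
    }


def complete(prompt: str, max_tokens: int = 128) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_payload(prompt, max_tokens)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `print(complete("function quickSort"))` should print the model's continuation of the prompt.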
|
|
| ### Running with Ollama and CodeGPT Autocomplete in VSCode |
|
|
| **Documentation:** https://docs.codegpt.co/docs/tutorial-features/code_autocompletion |
| |
| Select "Ollama - codegpt/deepseek-coder-1.3b-typescript" in the autocomplete model selector. |
| |
| Then write code or a comment in the VSCode editor, and the model will suggest completions through CodeGPT's autocomplete. |
| |
| <img width="1000px" alt="CodeGPT autocomplete with Ollama in VSCode" src="ollama_autocomplete_codegpt.gif"> |
| |
| ### Fill In the Middle (FIM) |
| ```typescript |
| <｜fim▁begin｜>function quickSort(arr: number[]): number[] { |
|   if (arr.length <= 1) { |
|     return arr; |
|   } |
|   const pivot = arr[0]; |
|   const left = []; |
|   const right = []; |
| <｜fim▁hole｜> |
|   return [...quickSort(left), pivot, ...quickSort(right)]; |
| }<｜fim▁end｜> |
| ``` |
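
A FIM prompt consists of the code before the gap, the hole marker, and the code after the gap, wrapped in the begin and end markers. The sketch below shows one way to assemble such a prompt and splice a completion back into the file. `build_fim_prompt` and `splice_completion` are hypothetical helper names, and the sentinel strings (written with the fullwidth bar `｜` and the `▁` character used by the DeepSeek Coder tokenizer) should be checked against the tokenizer you actually load.

```python
# Sentinel strings as used by the DeepSeek Coder tokenizer (note the fullwidth "｜").
# Assumption: the fine-tuned model keeps the base tokenizer's FIM tokens.
FIM_BEGIN = "<｜fim▁begin｜>"
FIM_HOLE = "<｜fim▁hole｜>"
FIM_END = "<｜fim▁end｜>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the gap in the FIM sentinels."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"


def splice_completion(prefix: str, middle: str, suffix: str) -> str:
    """Insert the generated middle section back between prefix and suffix."""
    return prefix + middle + suffix


prefix = "function add(a: number, b: number): number {\n"
suffix = "\n}"
prompt = build_fim_prompt(prefix, suffix)

# `middle` stands in for text returned by the model; it is hard-coded here.
middle = "  return a + b;"
print(splice_completion(prefix, middle, suffix))
```

Keeping the prefix and suffix as separate strings makes it straightforward to reinsert the generated text at the cursor position, which is what an editor integration needs.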
| |
| ## Training procedure |
| |
| ### Training hyperparameters |
| |
| The following hyperparameters were used during training: |
| - learning_rate: 2e-05 |
| - train_batch_size: 20 |
| - eval_batch_size: 20 |
| - seed: 42 |
| - gradient_accumulation_steps: 2 |
| - total_train_batch_size: 40 |
| - optimizer: adamw_bnb_8bit (8-bit AdamW) with betas=(0.9,0.999) and epsilon=1e-06 |
| - lr_scheduler_type: cosine |
| - lr_scheduler_warmup_steps: 261 |
| - num_epochs: 1 |
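
The batch-size figures in this list are related by simple arithmetic. As a sketch, assuming training ran on a single device (the card does not state the device count, but one device is consistent with the numbers above) and using the `sequence_len` of 1024 from the axolotl config:

```python
train_batch_size = 20            # per-device micro batch size
gradient_accumulation_steps = 2  # micro batches accumulated per optimizer step
num_devices = 1                  # assumption: not stated in the card
sequence_len = 1024              # from the axolotl config

# Sequences that contribute to each optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Upper bound on tokens per update; shorter sequences reduce the real figure.
max_tokens_per_step = total_train_batch_size * sequence_len

print(total_train_batch_size, max_tokens_per_step)
```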
| |
| ### Training results |
| |
| | Training Loss | Epoch | Step | Validation Loss | |
| |:-------------:|:-----:|:-----:|:---------------:| |
| | 1.0745 | 0.0 | 1 | 0.8681 | |
| | 1.2267 | 0.05 | 1308 | 0.8130 | |
| | 1.1594 | 0.1 | 2616 | 0.8018 | |
| | 0.7674 | 0.15 | 3924 | 0.7942 | |
| | 0.6443 | 0.2 | 5232 | 0.7889 | |
| | 0.9155 | 0.25 | 6540 | 0.7847 | |
| | 0.7501 | 0.3 | 7848 | 0.7819 | |
| | 0.8835 | 0.35 | 9156 | 0.7792 | |
| | 0.7261 | 0.4 | 10464 | 0.7769 | |
| | 0.9746 | 0.45 | 11772 | 0.7748 | |
| | 0.6884 | 0.5 | 13080 | 0.7734 | |
| | 0.6104 | 0.55 | 14388 | 0.7722 | |
| | 0.8876 | 0.6 | 15696 | 0.7710 | |
| | 0.9567 | 0.65 | 17004 | 0.7703 | |
| | 0.6915 | 0.7 | 18312 | 0.7696 | |
| | 0.8874 | 0.75 | 19620 | 0.7691 | |
| | 0.6124 | 0.8 | 20928 | 0.7686 | |
| | 0.8147 | 0.85 | 22236 | 0.7684 | |
| | 0.8021 | 0.9 | 23544 | 0.7683 | |
| | 0.8665 | 0.95 | 24852 | 0.7681 | |
| |
| |
| ### Framework versions |
| |
| - Transformers 4.37.0.dev0 |
| - Pytorch 2.0.1+cu118 |
| - Datasets 2.16.1 |
| - Tokenizers 0.15.0 |