Hugging Face
calculatortamer
1 follower · 1 following
AI & ML interests
None yet
Recent Activity
reacted to nyuuzyou's post with 🔥 about 11 hours ago
🇨🇳 Gitee Code Dataset - The Missing Piece of the Stack
https://huggingface.co/datasets/nyuuzyou/gitee-code

Gitee is not included in the Software Heritage archive, meaning it is currently missing from datasets like The Stack. This release fills that massive gap, serving as the largest Chinese code dataset and one of the largest code corpuses overall.

- 819,472,785 files from 3,105,923 repositories
- 536 GB compressed Parquet storage
- 554 programming languages
- Extensive quality filtering: Removed vendor code, artifacts, and generated files
- Rich Chinese language understanding: High volume of Chinese comments and docs

Huge thanks to Hugging Face for the storage grant that made hosting this (and all my other datasets) possible! I have also already dropped several other new code datasets and rolled out QoL improvements for older ones. I will be dropping posts on those throughout the week.
reacted to nyuuzyou's post with 🤯 about 11 hours ago
upvoted an article about 2 months ago
Continuous batching from first principles
Organizations
None yet
calculatortamer's models (1)
calculatortamer/Llama-3-5B-Sheard-Q4_K_M-GGUF
Text Generation • 6B • Updated Jun 8, 2024 • 2