How to deploy and fine-tune DeepSeek models on AWS
…I have been trying to deploy deepseek-ai/DeepSeek-R1-Distill-Qwen-32B on inferentia with a context window higher than 4096 (let's say MAX_TOTAL_TOKENS=8192 ), but it seems…
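For context on the question above, here is a minimal sketch of serving the model with the TGI Neuron container on an Inferentia2 host. The image tag, port mapping, and `MAX_INPUT_TOKENS` value are illustrative assumptions; `MAX_TOTAL_TOKENS=8192` and the model id are taken from the thread. Note that Neuron-compiled model artifacts are exported with a fixed sequence length, which is one common reason a 4096-token default cannot simply be raised at serve time.

```shell
# Hedged sketch, not a verified recipe: the container image tag and port
# are assumptions. MAX_TOTAL_TOKENS=8192 is the value from the thread.
docker run -p 8080:80 \
  -e HF_MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  -e MAX_INPUT_TOKENS=4096 \
  -e MAX_TOTAL_TOKENS=8192 \
  ghcr.io/huggingface/neuronx-tgi:latest
```

If the model was pre-compiled for a 4096 sequence length, it would need to be re-exported for Neuron with the larger length before these settings take effect.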
…https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py Is it possible to contribute to this project? · Yes, you can look at https://huggingface.co/open-r1 and https…
…According to DeepSeek's paper, DeepSeek-Distill-Qwen-7B's performance in MATH-500 and AIME24 is 92.8 and 55.5 respectively, which seems to be very different from the values…
…I tested out the new DeepSeek-R1-Distill-Llama-70B-Uncensored-v2-Unbiased model yesterday. It was a very crude test, but I was quite impressed. I'm a newb over here…
…Otherwise, use the reasoning-format flag and pass the DeepSeek value to get pure tokens. Now I can use llama.cpp all the time. A big thank you to the devs. Is there currently…
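For readers looking for the flag mentioned above, a minimal sketch of a llama.cpp server invocation; the GGUF filename and port are placeholders, while `--reasoning-format deepseek` is the flag the comment refers to:

```shell
# Hedged sketch: serve a local GGUF with reasoning content split out.
# The model filename below is a placeholder, not a real download path.
llama-server \
  -m ./DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf \
  --reasoning-format deepseek \
  --port 8080
```

With this setting, the server separates the model's reasoning from the final answer rather than interleaving `<think>` tokens into the response text.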
…blog post, check it out! This podcast is generated via ngxson/kokoro-podcast-generator , using DeepSeek-R1 and Kokoro-TTS . Thx to all. Great work!!! I have a question about concurrency…
…you chose APO over GRPO, so I dove into comparing approaches across SmolLM3, Tulu3, and DeepSeek-R1. Ended up creating a visual guide to help navigate the post-training landscape on 🤗…