How to deploy and fine-tune DeepSeek models on AWS
…I have been trying to deploy deepseek-ai/DeepSeek-R1-Distill-Qwen-32B on inferentia with a context window higher than 4096 (let's say MAX_TOTAL_TOKENS=8192 ), but it seems…
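For context on the question above, here is a minimal sketch of serving the model with the TGI Neuron container on an Inferentia2 host. The image tag, port mapping, and `MAX_INPUT_TOKENS` value are illustrative assumptions; `MAX_TOTAL_TOKENS=8192` and the model id are taken from the thread. Note that Neuron-compiled model artifacts are exported with a fixed sequence length, which is one common reason a 4096-token default cannot simply be raised at serve time.

```shell
# Hedged sketch, not a verified recipe: the container image tag and port
# are assumptions. MAX_TOTAL_TOKENS=8192 is the value from the thread.
docker run -p 8080:80 \
  -e HF_MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  -e MAX_INPUT_TOKENS=4096 \
  -e MAX_TOTAL_TOKENS=8192 \
  ghcr.io/huggingface/neuronx-tgi:latest
```

If the model was pre-compiled for a 4096 sequence length, it would need to be re-exported for Neuron with the larger length before these settings take effect.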
…https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py Is it possible to contribute to this project? · Yes, you can look at https://huggingface.co/open-r1 and https…
…According to DeepSeek's paper, DeepSeek-Distill-Qwen-7B's performance in MATH-500 and AIME24 is 92.8 and 55.5 respectively, which seems to be very different from the values…
…I tested out the new DeepSeek-R1-Distill-Llama-70B-Uncensored-v2-Unbiased model yesterday. It was a very crude test, but I was quite impressed. I'm a newb over here…
…Otherwise, use the reasoning-format flag and pass the DeepSeek value to get pure tokens. Now I can use llama.cpp all the time. A big thank you to the devs. Is there currently…
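For readers looking for the flag mentioned above, a minimal sketch of a llama.cpp server invocation; the GGUF filename and port are placeholders, while `--reasoning-format deepseek` is the flag the comment refers to:

```shell
# Hedged sketch: serve a local GGUF with reasoning content split out.
# The model filename below is a placeholder, not a real download path.
llama-server \
  -m ./DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf \
  --reasoning-format deepseek \
  --port 8080
```

With this setting, the server separates the model's reasoning from the final answer rather than interleaving `<think>` tokens into the response text.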
…blog post, check it out! This podcast is generated via ngxson/kokoro-podcast-generator , using DeepSeek-R1 and Kokoro-TTS . Thx to all. Great work!!! I have a question about concurrency…
…you chose APO over GRPO, so I dove into comparing approaches across SmolLM3, Tulu3, and DeepSeek-R1. Ended up creating a visual guide to help navigate the post-training landscape on 🤗…