Open-R1: a fully open reproduction of DeepSeek-R1
…https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py Is it possible to contribute to this project? · Yes, you can look at https://huggingface.co/open-r1 and https…
…I have been trying to deploy deepseek-ai/DeepSeek-R1-Distill-Qwen-32B on Inferentia with a context window larger than 4096 (let's say MAX_TOTAL_TOKENS=8192), but it seems…
…I tested out the new DeepSeek-R1-Distill-Llama-70B-Uncensored-v2-Unbiased model yesterday. It was a very crude test, but I was quite impressed. I'm a newb over here…
…According to DeepSeek's paper, DeepSeek-Distill-Qwen-7B's performance in MATH-500 and AIME24 is 92.8 and 55.5 respectively, which seems to be very different from the values…