Showing top 116 results for "AI reasoning math"

Paper page - Large Language Models Explore by Latent Distilling

…Empirical results show that ESamp significantly boosts the Pass@k efficiency of reasoning models, showing superior or comparable performance to strong stochastic and heuristic baselines. Notably, ESamp achieves robust generalization across mathematics…

Apr 29, 2026

Anthropic says pressure can push Claude into cheating and blackmail

…Yes, we’ve heard of previous tests where AI models cheated or resorted to blackmail when faced with stressful situations, but reasons behind the “misaligned” AI behavior often remained a mystery. In…

Apr 3, 2026 · By Ben Patterson

Paper page - PAAC: Privacy-Aware Agentic Device-Cloud Collaboration

…maintaining accuracy in distributed language model agents. AI-generated summary: Large language model (LLM) agents face a structural tension: cloud agents provide strong reasoning but expose user data, while on-device agents…

May 13, 2026

Google announces open Gemma 4 model with Apache 2.0 license

…Advanced Reasoning: Capable of multi-step planning and deep logic, Gemma 4 demonstrates significant improvements in math and instruction-following benchmarks that require it. Agentic Workflows: Native support for function-calling, structured…

Apr 2, 2026 · Abner Li

Discussions and forums

Hacker News · u/amenn · 5d ago

Yon – a topos-oriented language with a content-addressed lattice heap

Hello everyone. In the last two years I spent, as a dev, part of my free time stretching the limits of my knowledge. Not being a mathematician myself, I discovered that formalizing concepts in mathematical language could…

Hacker News · u/dabockster · Mar 24, 2026

Tell HN: Llamacpp now supports unified system RAM offloading on Linux

I'm a big fan of on-device AI inference for a million reasons, especially its potential to significantly reduce or even potentially eliminate the need for massive AI data center projects in the United States. But so far,…

r/LocalLLaMA · u/OttoRenner · 2w ago

Stop traumatizing AI into loops and turn hallucinations into an honest "I don't know!" by being NICE to them (Proof of Concept, Research, I don't want to sell anything)

!UPDATE! (20.05.2026) WE HAVE NEW NUMBERS FROM 1,500+ TESTS. IT'S WORKING! Check my update post https://www.reddit.com/r/LocalLLaMA/s/AyNOehjkYT or go straight to my GitHub https://github.com/OttoRenner/Gentle-Codi…

Hacker News · u/aaronestrada · 5d ago

Show HN: I created a RAW to HDRI stacker in (mostly) Common Lisp

This is an upgrade of a tool I created 15 years ago in Python to learn OOP and solve some inadequacies in the HDR stacking tools I could find at the time. The problem was, none of them were really "batch friendly". None …

Hacker News · u/zambelli · 3w ago

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments. I built Forge, an open-source reliability layer for self-hosted LLM tool-calling. What it does: - Adds domain-and-tool-agnostic guardrails (retry nudges, step e…

Paper page - Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

…AI-generated summary: Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the…

May 8, 2026

I tested Google’s upcoming Gemini Nano 4 — its faster, smarter AI isn’t what I expected

…models perform on tasks you might reasonably run on an on-device AI model. Nothing huge or multi-step. Instead, I focused on logic, math, and text-summary prompts to see how…

Apr 11, 2026 · Robert Triggs

OpenAI releases GPT-5.5 Instant, a new default model for ChatGPT | TechCrunch

…2 in the AIME 2025 math test, compared to 65.4 for the older model. It also outperformed its predecessor on the MMMU-Pro multimodal reasoning benchmark, with a score of 76…

May 5, 2026 · Ivan Mehta

4 ways Gemini transforms my Android Auto experience for truly complex tasks

Chandraveer Mathur · Mar 31, 2026, 11:30 AM EDT · A seasoned mechanical design engineer turned tech reporter and reviewer, Chandraveer brings more than four years of consumer tech journalism experience to the…

Mar 31, 2026 · Chandraveer Mathur

Paper page - Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

…RL Post-Training for General-Purpose and Efficient Reasoning (2026); Towards Understanding the Robustness of Sparse Autoencoders (2026); Unified Data Selection for LLM Reasoning (2026)…

May 28, 2026

Paper page - Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

…AI-generated summary: Reinforcement learning with verifiable rewards (RLVR) for Large Reasoning Models hinges on baseline estimation for variance reduction, but existing approaches pay a heavy price: PPO requires a policy-model…