Paper page - HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness
…if you push RLVR to grow both breadth and depth, I’d want to see how compute scales and whether there’s a sweet spot where extra trajectories stop paying off. This…
…recursively expands a single user click by (1) identifying reliable same-type locations through non-parametric gating of multi-scale encoder features, and (2) selecting the most spatially distant reliable point as…
…every generation step sees the same positional structure regardless of how far generation has progressed, and the state transition is identical at every chunk. Together, these properties introduce a recurrence into the…
…By constructing controlled tokenizer variants, we identify three key properties of a diffusion-friendly latent manifold: coherent spatial structure, local manifold continuity, and global manifold semantics. We find that these properties are…
…Subsequently, we introduce a critic agent to evaluate the generator's outputs, identify samples that deviate from the planned instructions, and refine the instructions for regeneration. To implement this pipeline, we construct…
…As a potential solution, we present a case study demonstrating that leveraging identified error patterns to preemptively detect and correct recognition errors, while requiring only minimal human intervention (e.g., routing 3…
…Yurim Jeon. Abstract: CIPER is a unified cross-view geo-localization framework that simultaneously performs city-scale retrieval and precise 3-DoF pose estimation using a shared transformer encoder and two-way…
…We further identify that the conventional MTP training objectives (cross-entropy or KL) are suboptimal in such settings, and therefore we propose a novel end-to-end TV loss that directly optimizes…
…This framework labels every turn along two independent axes (internal reasoning and visible output), yielding four operationally defined failure cells: robust alignment, alignment faking, overt jailbreak, and a distinct failure mode we…
…To prevent adversarial collapse caused by increasingly extreme task generation and over-specialized skill accumulation, we further introduce a Cross-time Replay mechanism that identifies the skill set achieving the best balance…