Search: executive messaging

Paper page - Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

…Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choice constrains post-training to optimizing functional correctness over…

May 4, 2026

Paper page - Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

…AI-generated summary: We propose X-WAM, a Unified 4D World Model that unifies real-time robotic action execution and high-fidelity 4D world synthesis (video + 3D reconstruction) in a single framework…

May 1, 2026

Paper page - BraveGuard: From Open-World Threats to Safer Computer-Use Agents

…This shift creates safety risks that are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step execution traces whose individual actions appear locally benign…

Jun 4, 2026

Paper page - AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

…Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities yet introduce broad new safety risk sources. Meanwhile, advanced frontier AI…

May 29, 2026

Paper page - ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations

…Jie Zhu, … Abstract: ESC-Skills is a skill-centric framework that discovers and self-evolves executable emotional support skills through intervention units and multi-profile refinement to improve interpretability and dialogue outcomes…

May 28, 2026

Paper page - Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

…For grading, Claw-Eval-Live records execution traces, audit logs, service state, and post-run workspace artifacts, using deterministic checks when evidence is sufficient and structured LLM judging only for semantic dimensions…

May 1, 2026

Paper page - 3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

…Yipeng Gao, … Abstract: Vision-language models are evaluated for procedural 3D modeling tasks through a benchmark and ranking platform that assess their ability to translate text and images into executable 3D code…

Jun 2, 2026

Paper page - GEM: Generative Supervision Helps Embodied Intelligence

…However, a significant gap remains between the high-level semantic focus of standard text-guided pre-training paradigms and the low-level spatial and physical knowledge critical for execution in embodied environments…

May 28, 2026

Paper page - PhoneWorld: Scaling Phone-Use Agent Environments

…Zhengyang Tang, …, Shangpin Peng, Zheng Ruan, … Abstract: PhoneWorld is a pipeline that transforms real GUI trajectories and screenshots into controllable mobile environments, executable tasks, and automated verifiers, enabling scalable creation of phone…