📊 raki research
📄 arXiv · 130 papers

📄 arXiv daily ingest: industry paradigm signals

cs.AI · cs.LG · cs.CV · cs.RO · q-bio · physics.optics · econ.GN and other categories related to the top-down chain. New paradigms are detected automatically and injected into the marker prompt.

📅 2026-05-18 eess.SY econ.GN

Residential Battery Pooling Under Backup Commitments

Residential batteries increasingly serve two roles: they can earn money by arbitraging wholesale prices and providing grid services, and they provide backup power during outages. This dual use creates a basic tradeoff between earning market value and preserving outage readiness. Coordination across …
Jerry Anunrojwong, Baosen Zhang · 📄 PDF
📅 2026-05-18 Economics

Engagement vs. Commitment: The Economic Trade-Offs of Polarizing News Content

Content that drives engagement need not be the same content that drives willingness to pay. We study how polarizing content affects engagement (time on site) and commitment (subscriptions and retention) on a major news platform. We measure article-level polarization with deep-learning classifiers an…
Shunyao Yan, Klaus M. Miller · 📄 PDF
📅 2026-05-18 Optics·CPO

Crosstalk-free Chiral Anomaly Bulk States in Photonic Crystals

Ultracompact cladding-free waveguide arrays with zero inter-channel spacing and negligible crosstalk open a new avenue for high-density integrated photonic circuits. However, existing cladding-free waveguide arrays typically rely on conventional trivial bulk modes, making them highly susceptible to …
Guochao Wei, Yingfeng Qi, Kang Du, Wei Zhu, Zhenzhen Liu, Junjun Xiao, Shengxiang Wang, Zhen Gao · 📄 PDF
📅 2026-05-18 Optics·CPO quant-ph

A Wafer-Scale Heterogeneous III-V-on-Silicon Nitride Quantum Photonic Platform

Heterogeneous integration of gain and strongly nonlinear materials with ultra-low-loss silicon nitride (SiN) photonics offers a route to scalable quantum circuits, but concurrent wafer-scale manufacturability, low interlayer loss, and high performance have been challenging to realize. Here we demons…
Lillian Thiel, Boqiang Shen, Jasper R. Venneberg, Melissa A. Guidry, Nic Arnaud, Adam Slater, Lucas Wang, Xuefeng Li, Jo… · 📄 PDF
📅 2026-05-18 quant-ph physics.optics

Optical Neural Networks from Coherent Transient Dynamics in Waveguide QED

Optical neural networks promise ultrafast, low-energy information processing by performing computation directly with photons. Current implementations, however, are largely restricted to steady-state operation and rely on high-latency electro-optical conversion for nonlinear activation. To address th…
Jiande Cao, Yexiong Zeng, Franco Nori, Ze-Liang Xiang · 📄 PDF
📅 2026-05-18 Optics·CPO

Amplification of Weak Forces via Parametric Interactions and Non-Markovian Effects in Cavity Optomechanics

Weak force amplification describes the process of amplifying a faint low-frequency signal by means of an additional high-frequency modulation, which plays a vital role in quantum sensing and high-precision measurement. However, the potential enhancement of weak-force amplification in non-Markovian e…
Y. F. Li, Ze Wang, W. Y. Hu, Yan-Hui Zhou, Cheng Shang, H. Z. Shen · 📄 PDF
📅 2026-05-18 eess.SP physics.optics

A Computationally Efficient Reciprocal Effective Roughness Model for Diffuse Scattering

Ray-tracing (RT) has become central to site-specific electromagnetic propagation modeling in dynamic complex environments. Yet its computational burden grows sharply as high-fidelity digital twins of these environments scale to millions of facets whose material parameters must be continuously update…
Giacomo Melloni, Enrico M. Vitucci, Vittorio Degli Esposti, Samuel Berweger, Jack Chuang, Camillo Gentile, Nada Golmie · 📄 PDF
📅 2026-05-18 Optics·CPO

Optoelectronic Chromatic Dispersion in a Single Photodiode for Machine-Learning-Based Computational Spectroscopy

Spectroscopy requires high-precision wavelength discrimination but typically requires bulky, alignment-sensitive instrumentation. To address this, we present a compact computational spectrometer built from a single germanium PN photodiode. The system exploits optoelectronic chromatic dispersion (OED…
Endalamaw Ewnu Kassa, Ziv Glasser, Uttama K. Saint, Roi Yozevitch, Shmuel Sternklar · 📄 PDF
📅 2026-05-18 Optics·CPO

The thin line for optical neural networks towards broad practical relevance

Optical neural networks promise unmatched efficiency, bandwidth, and latency, critical benefits as demand for neural network hardware surges. However, their practical value for general-purpose acceleration or specialized applications must be proven under application-realistic conditions. We discuss …
Anas Skalli, Daniel Brunner · 📄 PDF
📅 2026-05-18 Optics·CPO quant-ph

Energy-Resolved Eigenmode Spectroscopy of 1-D and 2-D Non-Hermitian Skin Effects

Non-Hermitian lattices can host the non-Hermitian skin effect, a boundary-induced collapse of all bulk eigenstates into exponentially localized edge modes. This effect underlies anomalous bulk-boundary correspondence and remarkable enhancements in non-Hermitian sensing, yet direct energy-resolved ac…
Rohith Srikanth, Sashank Kaushik Sridhar, Avik Dutt · 📄 PDF
📅 2026-05-18 cond-mat.mes-hall physics.optics quant-ph

Strong nanomechanical Duffing nonlinearity and interactions induced through cavity optomechanics

Nonlinearity is a key resource in both classical and quantum signal processing. Nonlinear nanomechanical elements have found applications ranging from sensing to computing, while networks of nonlinear resonators, as well as nonlinearly coupled networks of linear resonators, constitute promising plat…
Jesse J. Slim, Ewold Verhagen · 📄 PDF
📅 2026-05-18 Optics·CPO

Self-healing of the Montgomery pattern

Self-healing -- the ability of a structured beam to reconstruct its transverse profile after partial obstruction -- has been demonstrated for diffraction-free beams, where the recovery distance varies continuously with obstruction size. Here, we investigate self-healing in the Montgomery pattern, a …
Athena Xu, Oscar de Vries, Alfonso Palmieri, Murat Yessenov, Ayman F. Abouraddy, Federico Capasso · 📄 PDF
📅 2026-05-18 quant-ph cond-mat.mtrl-sci physics.optics

Quantum Emitters at Telecommunication Wavelengths based on Carbon Defects in Transition Metal Dichalcogenides

Low-dimensional materials have emerged as promising hosts for quantum emitters, whose emission typically arises from either strain-induced band bending or defect-induced two-level systems. Among these materials, transition metal dichalcogenide (TMD) monolayers have attracted particular attention; ho…
Chanaprom Cholsuk, Sujin Suwanna, Tobias Vogl · 📄 PDF
📅 2026-05-18 Optics·CPO

Comparative study of second harmonic generation at 1030 nm in BiBO and LBO crystals using a 100 W-class picosecond laser

We present a systematic experimental comparison of single-pass second-harmonic generation (SHG) in bismuth triborate (BiBO) and lithium triborate (LBO) nonlinear crystals, driven by a 1.3 ps, 91 kHz laser at 1030 nm with up to 57 W of average input power. Both crystals yielded 32 W of second harmoni…
Huzefa Aliasger, Šimon Šatra, Ondřej Novák, Jiří Mužík, Michal Jelínek, Martin Smrž, Tomáš Mocek · 📄 PDF
📅 2026-05-18 Optics·CPO eess.IV

Using a Digital Twin for Fringe Projection Profilometry Optimisation

Fringe projection profilometry (FPP) is a widely used technique for measuring object surface form and three-dimensional (3D) geometry, capable of delivering high-precision, high-resolution measurements when paired with suitable cameras and projectors. However, in practical deployments, identifying p…
D. Weston, X. Kong, G. S. D. Gordon, S. Piano · 📄 PDF
📅 2026-05-18 cond-mat.mtrl-sci cond-mat.other physics.atom-ph physics.comp-ph

Electronic mechanism of sub-100-fs demagnetization induced by a femtosecond light pulse

A quantitative understanding of the processes that trigger light-induced demagnetization on ultrashort timescales is crucial for achieving an ultrafast, radiation-controlled magnetic response in materials. This milestone is essential for developing next-generation magnetic storage devices and ultraf…
Konrad J. Kapcia, Victor Tkachenko, Flavio Capotondi, Alexander Lichtenstein, Serguei Molodtsov, Przemysław Piekarz, Bea… · 📄 PDF
📅 2026-05-18 Optics·CPO

From order to chaos in a chip-scale Kerr parametric oscillator

Integrated photonics has enabled a wide class of chip-scale light sources and quantum technologies. Within this field, microresonator-based degenerate optical parametric oscillators (DOPOs) have gained prominence. Above a critical power threshold, these systems undergo spontaneous symmetry breaking …
Luca O. Trinchão, Juan Diego Mazo-Vásquez, Miguel Nienstedt, Luiz Peres, Julius T. Gohsrich, Eduardo S. Gonçalves, Alekh… · 📄 PDF
📅 2026-05-18 quant-ph physics.optics

Detecting nonclassicality in randomly-displaced copies of a squeezed state

We address a fundamental question: Can one determine whether a received signal is squeezed when each copy arrives with a different displacement/amplitude? We introduce an interaction Hamiltonian that converts quadrature squeezing into number squeezing. Using this conversion, we test whether the copi…
Mehmet Emre Tasgin · 📄 PDF
📅 2026-05-18 Bio·Genomics cs.LG

PACE: Geometry-Aware Bridge Transport for Single-Cell Trajectory Inference

Single-cell trajectory inference from destructive time-course snapshots is fundamentally ill-posed: neither cross-time cell correspondences nor continuous trajectories are observed, so the snapshot distributions alone do not uniquely determine the underlying dynamics. Existing optimal transport and …
Chenglei Yu, Chuanrui Wang, Bangyan Liao, Tailin Wu · 📄 PDF
📅 2026-05-18 ML cs.AI q-bio.QM

DCFold: Efficient Protein Structure Generation with Single Forward Pass

AlphaFold3 introduces a diffusion-based architecture that elevates protein structure prediction to all-atom resolution with improved accuracy. This state-of-the-art performance has established AlphaFold3 as a foundation model for diverse generation and design tasks. However, its iterative design sub…
Zhe Zhang, Yuanning Feng, Yuxuan Song, Keyue Qiu, Hao Zhou, Wei-Ying Ma · 📄 PDF
📅 2026-05-18 ML q-bio.BM q-bio.QM

Protein Fold Classification at Scale: Benchmarking and Pretraining

Classifying protein topology is essential for deciphering biological function, but progress is held back by the lack of large-scale benchmarks that avoid duplicates and by models that do not scale well. We introduce TEDBench, a large-scale, non-redundant benchmark for protein fold classification con…
Dexiong Chen, Andrei Manolache, Mathias Niepert, Karsten Borgwardt · 📄 PDF
📅 2026-05-18 q-bio.PE q-bio.QM

Incorporating vaccine effects into epidemiological models: common pitfalls and solutions

Incorporating vaccination into mathematical models appears deceptively simple: models integrate vaccine-derived protections, such as reduced susceptibility to infection, using parameters informed by empirical estimates of vaccine efficacy or effectiveness (VE). In practice, however, empirical VE est…
Casey E. Middleton, Oliver Eales, James M. McCaw, Freya M. Shearer · 📄 PDF
📅 2026-05-18 stat.ME q-bio.QM

OSSMM: An Open-Source Sleep Monitor and Modulator

We present the Open-Source Sleep Monitor and Modulator (OSSMM), an open-source hardware and software platform for accessible sleep research. The OSSMM comprises a small wearable headband built from 3D prints and affordable commercial-off-the-shelf (COTS) components at a material cost under 40 euros,…
Jonny Giordano, Fergal Stapleton, Gabriel Palma, Barak A. Pearlmutter · 📄 PDF
📅 2026-05-18 quant-ph cs.ET

Adaptive Clifford+T Decomposition of Large Toffoli Gates with One Clean Ancilla

Multi-controlled Toffoli gates are fundamental building blocks in quantum computation, with applications in quantum arithmetic, simulation, and search algorithms. In fault-tolerant architectures, their realization is constrained by the high cost of non-Clifford resources, particularly in terms of T-…
Abhoy Kole, Majd Assaad, Till Schnittka, Rolf Drechsler · 📄 PDF
📅 2026-05-18 cs.HC cs.AI cs.CY cs.ET

The Hidden Cost of Contextual Sycophancy: an AI Literacy Intervention in Human-AI Collaboration

Large Language Models (LLMs) are increasingly used in educational settings as interactive tools for collaboration. However, their tendency toward sycophancy, aligning with user beliefs even when incorrect, raises concerns for learning and decision-making, especially for less knowledgeable users. Thi…
Cansu Koyuturk, Sabrina Guidotti, Dimitri Ognibene · 📄 PDF
📅 2026-05-18 Architecture

ROA-Based Subharmonic Injection Locking for Oscillator-Based Ising Machines

This paper introduces on-chip integrated rotary traveling wave oscillators (RTWOs) organized into rotary oscillator array (ROA) bricks as an external perturbation to induce subharmonic injection locking (SHIL) in oscillator-based Ising machines (OIMs). The implementation of SHILs on chip is challeng…
Nicholas Sica, Baris Taskin · 📄 PDF
📅 2026-05-18 Architecture

CPPL: A Circuit Prompt Programming Language

Large language models (LLMs) have shown promise in register-transfer level (RTL) design automation, but direct RTL generation remains difficult to validate, optimize, and integrate with compiler-based hardware design flows. Hardware compiler infrastructures such as CIRCT provide typed intermediate r…
Shuo Yin, Yihe Wang, Lancheng Zou, Xufeng Yao, Tinghuan Chen, Chen Bai, Zhengrong Wang, Tsung-Yi Ho, Bei Yu · 📄 PDF
📅 2026-05-18 cs.NI cs.AR

Enabling Agile Ambient IoT Networking via a Parameterized Hybrid Radio

The emergence of Ambient IoT signals a paradigm shift toward massive batteryless networking. However, the absence of an agile physical layer substrate remains a fundamental barrier to research and standardization. Current testbeds are hindered by decoupled radio paths, high static power, and cumbers…
Jiazhen Lei, Fengyuan Zhu, Tianze Cao, Yuxin Sha, Linling Zhong, Wenhui Li, Bingbing Wang, Zeming Yang, Jinyang Sun, Yib… · 📄 PDF
📅 2026-05-18 cs.DC cs.AR

iHAC: A Hybrid Cluster Architecture for Enhanced Performance and Resilience

Uninterrupted system availability is a critical requirement for enterprise operations, yet traditional high-availability clusters suffer from limitations such as single points of failure and inefficient resource allocation. This paper introduces and evaluates the Integrated High Availability Cluster…
Siddique Abubakr Muntaka, Edward Danso Ansong, Benjamin Yankson, Oliver Kornyo, Faiza Hussein, Mohammed Nadhir Muntaka, … · 📄 PDF
📅 2026-05-18 Architecture cs.AI

Building Reliable Arithmetic Multipliers Under NBTI Aging and Process Variations

Hardware aging poses a significant challenge for integrated circuits (ICs), leading to performance degradation and eventual failure. In this work, we focus on the aging of arithmetic multipliers, which are a cornerstone of modern computing systems including in CPUs, GPUs, and FPGAs, as well as AI ac…
Masoud Heidary, Biresh Kumar Joardar · 📄 PDF
📅 2026-05-18 Robotics eess.SY

Active Defense Against False Data Injection Attacks in Robotic Manipulators

Robotic systems are vulnerable to False Data Injection Attacks (FDIAs), where adversaries corrupt sensor signals to gain malicious control. Feedback linearization exposes robotic systems to integrator vulnerability, making them susceptible to stealthy attacks that can cause significant deviations in…
Gabriele Gualandi, Carl Mikael Larsson, Alessandro V. Papadopoulos · 📄 PDF
📅 2026-05-18 eess.IV cs.CV cs.RO

See Silhouettes in Motion with Neuromorphic Vision

Quasi-bimodal objects, such as text, road signs, and barcodes, play a basic yet vital role in daily visual communication. By boiling these down to clear silhouettes, binarization uses a minimal language to convey essential vision cues for maximum downstream efficiency. The catch is that frame-based …
Pei Zhang, Shijie Lin, Zhou Ge, Jinpeng Chen, Wei Pu · 📄 PDF
📅 2026-05-18 Robotics

Scenario Generation in Roundabouts with Adjustable Interaction Intensity

Roundabouts, characterized by frequent merging and yielding interactions, remain a safety-critical corner case for the development and testing of intelligent driving functions. However, extracting sufficient near-critical scenarios from naturalistic data is inefficient. Most existing scenario genera…
Li Li, Till Temmen, Tobias Brinkmann, Björn Krautwig, Markus Eisenbarth, Jakob Andert · 📄 PDF
📅 2026-05-18 Robotics cs.AI

Confidence-Gated Robot Autonomy: When Does Uncertainty Actually Help?

Robotic systems often use predictive uncertainty to decide whether to act autonomously or defer to a fallback policy. In threshold-gated autonomy, uncertainty matters mainly through its ability to rank likely errors. Standard metrics such as expected calibration error and AUROC do not directly test …
Johannes A. Gaus, Jhon P. F. Charaja, Daniel Haeufle · 📄 PDF
📅 2026-05-18 Robotics

FUSE: A Framework for Unified State Estimation in Robotic SLAM Systems

Tightly coupled SLAM formulations under mixed-rate sensing often bind temporal processing, local geometric association, estimator formulation, and map-update policy into method-specific designs. Such binding makes it difficult to vary one design choice without re-engineering the rest of the state-es…
Wei Wu, Honglin Chen, Wenhan Cao, Yao Lyu, Jiangtao Li, Tao Zhang, Shengbo Eben Li · 📄 PDF
📅 2026-05-18 Robotics

Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations

Robustness is a critical requirement for deploying autonomous driving systems in the real world. Existing robustness benchmarks for autonomous driving have made important progress in studying the effects of image-level corruptions, such as adverse weather or camera degradation, on perception modules…
Zhiyuan Zhang, Zhenghao Jin, Yanlun Peng, Xianda Guo, Haoran Liu, Shaofeng Zhang, Xingjun Ma, Zuxuan Wu, Junchi Yan, Xia… · 📄 PDF
📅 2026-05-18 Robotics

4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving

We present 4DLidarOpen, a large-scale open multi-modal dataset for autonomous driving, centered on 4D frequency-modulated continuous-wave (FMCW) Lidar sensing. Unlike conventional time-of-flight Lidar datasets that mainly provide geometric measurements, 4DLidarOpen includes point-wise radial velocit…
Kane Qian, Xin Zhao, Yining Shi, Rujun Yan, Zhengqing Pan, Kaojin Zhu, Mengmeng Yang, Kai Sun, Diange Yang, Kun Jiang · 📄 PDF
📅 2026-05-18 AI cs.CV cs.RO

TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

In real home deployments, household agents must often operate from a complete household scene and a situated household request, rather than from a clean task specification. Such requests require agents to identify task-relevant entities, recover intended task conditions, and resolve ordering constra…
ZhiYuan Feng, Yu Deng, Ruichuan An, Zhenhua Liu, Qixiu Li, Keming Wu, Zhiying Du, Weijie Wang, Haoxiao Wang, Shuang Chen… · 📄 PDF
📅 2026-05-18 Robotics cs.AI cs.CV

Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation

Commonly available prior information, such as BIM models, floor plans, and remote sensing images, can provide valuable geometric and semantic context for autonomous robotic systems. In this paper, we treat observations from fixed external RGB cameras as Common Prior Maps (CPMs): wide-field views of …
Giorgia Modi, Davide Buoso, Giuseppe Averta, Daniele De Martini · 📄 PDF
📅 2026-05-18 Robotics cs.AI cs.CV

RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots

Current approaches to 3D scene graph generation rely on dedicated depth sensors, such as LiDAR or RGB-D cameras, for metric 3D reconstruction. This limits deployment to specialized robotic platforms and excludes settings where only RGB cameras are available, such as fixed external infrastructure. Ex…
Giorgia Modi, Davide Buoso, Giuseppe Averta, Daniele De Martini · 📄 PDF
📅 2026-05-18 Vision cs.RO

StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

It is infeasible to encompass all possible disturbances within the training dataset. This raises a critical question regarding the robustness of Vision-Language-Action (VLA) models when encountering unseen real-world visual disturbances, particularly under imperfect visual conditions. In this work, …
Yiyang Fu, Chubin Zhang, Shukai Gong, Yufan Deng, Kaiwei Sun, Qiyang Min, Qibin Hou, Yansong Tang, Jianan Wang, Daquan Z… · 📄 PDF
📅 2026-05-18 Robotics

Assessing Localization Technologies for Pedestrian Collision Avoidance

Robust pedestrian safety is crucial to the next-generation of intelligent transportation systems. Such systems rely on active pedestrian localization and predictive collision alerts. Pedestrian localization can be supported by Ultra-Wideband technology and Bluetooth 6.0, which offer high-precision r…
Joshua Varughese, Joseba Gorospe, Novel Certad, Cristina Olaverri-Monreal · 📄 PDF
📅 2026-05-18 ML cs.AI cs.CV cs.RO

PH-Dreamer: A Physics-Driven World Model via Port-Hamiltonian Generative Dynamics

World models built on recurrent state space architectures enable efficient latent imagination, yet remain physically unstructured, producing dynamics that violate conservation and dissipative principles. We introduce a unified Port-Hamiltonian framework that remedies this through three synergistic m…
Xueyu Luan, Chenwei Shi · 📄 PDF
📅 2026-05-18 Robotics cs.LG math.DS math.OC

Dynamic robotic cloth folding with efficient Koopman operator-based model predictive control

Robotic cloth folding is a challenging task, particularly when considering dynamic folding tasks, which aim at folding cloth by fast motions that leverage its dynamics. When subject to such fast motions, the complexity of cloth dynamics hinders both system identification and planning of folding traj…
Edoardo Caldarelli, Franco Coltraro, Adrià Colomé, Lorenzo Rosasco, Carme Torras · 📄 PDF
📅 2026-05-18 Robotics cs.AI

Towards Ubiquitous Mapping and Localization for Dynamic Indoor Environments

We present UbiSLAM, an innovative solution for real-time mapping and localization in dynamic indoor environments. By deploying a network of fixed RGB-D cameras strategically throughout the workspace, UbiSLAM addresses limitations commonly encountered in traditional SLAM systems, such as sensitivity …
Halim Djerroud, Nico Steyn, Olivier Rabreau, Patrick Bonnin, Abderraouf Benali · 📄 PDF
📅 2026-05-18 cond-mat.mes-hall cond-mat.mtrl-sci cs.AI cs.RO

Qumus: Realization of An Embodied AI Quantum Material Experimentalist

While modern Large Language Models (LLMs) and agentic artificial intelligence (AI) have demonstrated transformative capabilities in digital domains, the realization of embodied AI capable of real-world scientific discovery remains a difficult frontier. The advancements are hindered by the inherent c…
Lihan Shi, Zhaoyi Joy Zheng, Xinzhe Juan, Yimin Wang, Ming Yin, Mayank Sengupta, Kristina Wolinski, Yanyu Jia, Jingzhi S… · 📄 PDF
📅 2026-05-18 Robotics cs.CY

REBAR: Reference Ethical Benchmark for Autonomy Readiness

As autonomous systems grow more advanced, objective metrics to evaluate their ethical and legal compliance are critical for informing end users of their limitations and ensuring accountability of those who misuse them. Current ethical embodied AI frameworks remain mostly qualitative, focusing on sys…
Jonathan Diller, David Barnes, Rebekah Bogdanoff, Rhett Collier, Roddy Collins, Keith Fieldhouse, Yonatan Gefen, Cameron… · 📄 PDF
📅 2026-05-18 Robotics eess.SY

REACT: Environment-Adaptive Architecture for Continuous Formation Navigation of Wheeled Mobile Robots

Formation control of wheeled mobile robots (WMRs) has been extensively studied due to its broad applications in fields such as logistics transportation, environmental monitoring, and search and rescue. However, most existing works mainly focus on tracking predefined formations, which limits their ad…
Jianghong Dong, Yifeng Zhang, Jiawei Wang, Mengchi Cai, Keqiang Li, Guillaume Sartoretti · 📄 PDF
📅 2026-05-18 Robotics

Bidirectional Optical sensors for Actuation Tracking (BOAT) in soft lattice systems

The growing adoption of lattice-based structures in soft robotics creates a need for advanced sensing solutions capable of monitoring their global deformation, particularly compression and extension. In this work, we address this challenge by introducing a novel optical sensor based on two patterned…
Petr Trunin, Carolina Gay, Anderson Brazil Nardin, Trevor Exley, Diana Cafiso, Lucia Beccai · 📄 PDF
📅 2026-05-18 Robotics

Geometry-Aware Surrogate for Real-Time Hydrodynamics Estimation of Autonomous Ground Vehicles in Amphibious Environments

Autonomous ground vehicles operating in shallow water or flood-prone terrains require dynamic models that account for hydrodynamic forces. However, the simulation and planning tools currently available either lack the physical fidelity or are too computationally expensive to run in real time. This w…
Ammar Waheed, Luke Gallantree, Zohaib Hasnain · 📄 PDF
📅 2026-05-18 Robotics cs.AI

Key-Gram: Extensible World Knowledge for Embodied Manipulation

Embodied control increasingly requires models to follow compositional language instructions while reasoning over dynamic visual states. However, current vision-language-action policies and world-action models often couple linguistic knowledge with visual computation in a shared backbone or condition…
Jingjing Fan, Siyuan Li, Botao Ren, Zhidong Deng · 📄 PDF
📅 2026-05-18 cs.CR cs.AI cs.RO

Not What You Asked For: Typographic Attacks in Household Robot Manipulation

Open-vocabulary embodied AI agents increasingly rely on vision-language models such as CLIP for object perception and task grounding. However, the shared embedding space that enables this flexibility introduces a structural vulnerability to typographic attacks, where printed text in a physical scene…
Ali Iranmanesh, Peng Liu · 📄 PDF
📅 2026-05-18 Robotics

Data-Driven Dynamic Modeling of a Tendon-Actuated Continuum Robot

Developing dynamic models for tendon-driven continuum robots is challenging due to their nonlinear, high-dimensional, and friction-dominated dynamics. This paper presents a comparative study of data-driven system identification methods, including N4SID, ARX, and SINDYc, for modeling a tendon-actuate…
Harald Minde Hansen, Bjørn Kåre Sæbø, Kristin Y. Pettersen, Jan Tommy Gravdahl, Mario Di Castro · 📄 PDF
📅 2026-05-18 Robotics

Dexora: Open-source VLA for High-DoF Bimanual Dexterity

Vision-Language-Action (VLA) models have recently become a central direction in embodied AI, but current systems are restricted to either dual-gripper control or single-arm dexterous hand manipulation. While low-dimensional gripper control can often be handled with simpler methods, high-dimensional …
Zongzheng Zhang, Jingrui Pang, Zhuo Yang, Kun Li, Minwen Liao, Saining Zhang, Guoxuan Chi, Jinbang Guo, Huan-ang Gao, Mo… · 📄 PDF
📅 2026-05-18 Vision cs.AI

StableHand: Quality-Aware Flow Matching for World-Space Dual-Hand Motion Estimation from Egocentric Video

Recovering world space 4D motion of two interacting hands from egocentric video is a fundamental capability for supervising robot policy learning, where wrist trajectories track the end-effector and finger articulations specify the grasp pose. Two major challenges arise in this setting: hands freque…
Huajian Zeng, Chaohua Yao, Yuantai Zhang, Jiaqi Yang, Rolandos Alexandros Potamias, Xingxing Zuo · 📄 PDF
📅 2026-05-18 Vision

OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

Omni-proactive streaming video understanding, i.e., autonomously deciding when to speak and what to say from continuous audio-visual streams, is an emerging capability of omni-modal large language models. Existing benchmarks fall short in three key aspects: they rely primarily on visual signals, ado…
Ruixiang Zhao, Jie Yang, Zijie Xin, Tianyi Wang, Fengyun Rao, Jing LYU, Xirong Li · 📄 PDF
📅 2026-05-18 Vision

Resolving Representation Ambiguity in Feedforward Novel View Synthesis Transformer via Semantic-Spatial Decoupling

Transformer-based models have advanced feedforward novel view synthesis (NVS). Current architectures such as GS-LRM and LVSM mix semantic information (e.g., RGB) and spatial information (e.g., Plücker rays) into a shared feature space. Since Plücker rays naturally carry lattice-like spatial structur…
Yihang Wu, Yihang Sun, Shaofeng Zhang, Zuxuan Wu, Junchi Yan, Xiaosong Jia, Yu-gang Jiang · 📄 PDF
📅 2026-05-18 Vision

Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models

Modern interactive video world models have achieved impressive visual fidelity, yet lack fine-grained multi-entity control and cross-entity, cross-world generalization. We trace this gap to the action interface: standard control protocols (e.g. animation IDs, device inputs, scene-level captions) bin…
Shangwen Zhu, Qianyu Peng, Zhao Pu, Zhilei Shu, Xiangrui Ke, Zhaohu Xing, Zizhao Tong, Zeqing Wang, Xinyu Cui, Huangji W… · 📄 PDF
📅 2026-05-18 Vision

Starve to Perceive: Taming Lazy Perception in VLMs with Constrained Visual Bandwidth

Vision-Language Models (VLMs) deployed as situated agents in high-resolution visual environments require active perception -- the ability to dynamically decide where to look through operations like zooming, cropping, and panning. However, current training paradigms produce models that mimic the surf…
Yuhuan Wu, Cong Wei, Fangzhen Lin, Wenhu Chen, Haozhe Wang · 📄 PDF
📅 2026-05-18 Vision

Dance Across Shifts: Forward-Facilitation Continual Test-Time Adaptation through Dynamic Style Bridging

Continual Test-Time Adaptation (CTTA) aims to empower perception systems to handle dynamic distribution shifts encountered after deployment. Existing methods predominantly follow a backward-alignment paradigm, which rigidly aligns incoming data with supervisory surrogates derived from the source dom…
Zhilin Zhu, Yabin Wang, Zhiheng Ma, Yaguang Song, Yaowei Wang, Xiaopeng Hong · 📄 PDF
📅 2026-05-18 Vision cs.AI cs.LG

CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic

Vision-language models (VLMs) have shown remarkable ability in aligning visual and textual representations, enabling a wide range of multimodal applications. However, their large-scale training data inevitably raises concerns about privacy, copyright, and undesirable content, creating a strong need …
Shen Lin, Junhao Dong, Rongjie Chen, Xiaoyu Zhang, Li Xu, Xiaofeng Chen · 📄 PDF
📅 2026-05-18 Robotics cs.AI cs.CV

ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics

Most existing vision-language manipulation research targets rigid robotic arms, whose fixed morphology limits adaptability in cluttered or confined spaces. Soft robotic arms offer an appealing alternative due to their deformability, but confront challenges such as unreliable proprioception and distr…
Ziyu Wei, Luting Wang, Chen Gao, Li Wen, Si Liu · 📄 PDF
📅 2026-05-18 Vision cs.AI

CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark

Spatial intelligence requires multimodal large language models (MLLMs) to move beyond single-view perception and reason consistently about objects, visibility, geometry, and interactions across multiple viewpoints. However, progress in cross-view reasoning remains limited by three major gaps: the sc…
Wei Wang, Yuqian Yuan, Tianwei Lin, Wenqiao Zhang, Siliang Tang, Jun Xiao, Yueting Zhuang · 📄 PDF
📅 2026-05-18 Vision

SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents

Long-horizon multimodal agents in open-world games must stay goal-directed across many low-level interactions under tight token and latency budgets. Existing approaches often trade off costly per-step reasoning against reactive execution that can drift, repeat failures, and recover poorly. Our key i…
Wencan Jiang, Jiangning Zhang, Jianbiao Mei, Jinzhuo Liu, Yu Yang, Xiaobin Hu, Zhucun Xue, Yong Liu, Dacheng Tao · 📄 PDF
📅 2026-05-18 Vision

Leveraging Latent Visual Reasoning in Silence

Latent visual reasoning involves visual evidence more directly in multimodal reasoning by inserting continuous latent tokens before textual generation. However, the necessity of these latent tokens at inference remains ambiguous. We show that replacing latent tokens with random noise or removing the…
Dongyao Zhu, Zhen Wang, Xi Xiao, Han Jiang, Saeed Vahidian, Wei-Lun Chao, Tanya Berger-Wolf, Yu Su, Raju Vatsavai, Jiany… · 📄 PDF
📅 2026-05-18 Vision

Articulation in Prime: Primitive-Based Articulated Object Understanding from a Single Casual Video

Retrieving the 3D kinematics of articulated objects from monocular video is a fundamental challenge in computer vision. Existing methods rely on complex video setups or cues such as long-term point tracking or wide-baseline matching, but are frequently brittle under severe occlusions, rapid camera e…
Arslan Artykov, Tom Ravaud, Nicolás Violante-Grezzi, Vincent Lepetit · 📄 PDF
📅 2026-05-18 Vision

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

Recent GUI agents have made substantial progress in visual grounding and action prediction, yet they remain brittle in long-horizon tasks that require maintaining task state across many interface transitions. Existing agents typically rely on raw history replay or text-only memory, which either over…
Ziyun Zeng, Hang Hua, Bocheng Zou, Mu Cai, Rogerio Feris, Jiebo Luo · 📄 PDF
📅 2026-05-18 Vision

CMAG: Concept-Scaffolded Retrieval for Marketplace Avatar Generation

Metaverse platforms rely on creator-driven marketplaces where avatars are assembled from discrete, taxonomy-labeled 3D assets (e.g., tops, bottoms, shoes, accessories) under strict category and topology constraints. While users increasingly expect free-form text control, text-only retrieval is britt…
Rajeev Goel, Jason Ding, Phani Harish Wajjala, Pavan Turaga, Tejaswi Gowda, Krishna C. Garikipati · 📄 PDF
📅 2026-05-18 Vision

A Large-Scale Study on the Accuracy vs Cost Trade-offs of Training and Evaluation Settings in Fine-Grained Image Recognition

Prior work on fine-grained image recognition (FGIR) has established the importance of the backbone selection, but has neglected the accuracy-vs-cost trade-offs under different training and evaluation settings. In this work we conduct a large-scale study with over 2000 experiments across 6 training a…
Edwin Arkel Rios, Augusto Christian Surya, Oswin Gosal, Fernando Mikael, Mary Madeline Nicole, Kisoon Jang, Bo-Cheng Lai… · 📄 PDF
📅 2026-05-18 Vision

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

Diffusion models have been widely studied for removing unsafe content learned during pre-training. Existing methods require expensive supervised data, either unsafe-text paired with safe-image groundtruth or negative/positive image pairs, making them impractical to scale. Furthermore, offline reinfo…
Komal Kumar, Ankan Deria, Abhishek Basu, Fahad Shamshad, Hisham Cholakkal, Karthik Nandakumar · 📄 PDF
📅 2026-05-18 Robotics cs.CV

Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction

The ability to navigate and interact with complex environments is central to real-world embodied agents, yet navigation in unseen environments remains challenging due to "experiential amnesia," where existing trajectory-driven or reactive policies fail to synthesize generalizable strategies from pas…
Nga Teng Chan, Yi Zhang, Yechi Liu, Renwen Cui, Fanhu Zeng, Zeyuan Ding, Xiancong Ren, Zhang Zhang, Qifeng Chen, Jian Li… · 📄 PDF
📅 2026-05-18 Vision

Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory

Autoregressive video generation has improved rapidly in visual fidelity and interactivity, but it still suffers from long-term inconsistency and memory degradation. Most existing solutions either compress historical frames using predefined strategies or retrieve keyframes based on coarse implicit at…
Jinzhuo Liu, Jiangning Zhang, Wencan Jiang, Yabiao Wang, Dingkang Liang, Zhucun Xue, Ran Yi, Yong Liu · 📄 PDF
📅 2026-05-18 Vision

EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos

Egocentric memory is widely used in embodied intelligence, but it may be insufficient for comprehensive spatial-temporal reasoning. Inspired by human recall from both field and observer perspectives, we introduce EgoExoMem, the first benchmark for cross-view memory reasoning over synchronized egocen…
Ruiping Liu, Junwei Zheng, Yufan Chen, Di Wen, Shaofang Quan, Chengzhi Wu, Jiaming Zhang, Kailun Yang, Kunyu Peng, Raine… · 📄 PDF
📅 2026-05-18 Vision

Spectral Progressive Diffusion for Efficient Image and Video Generation

Diffusion models have been shown to implicitly generate visual content autoregressively in the frequency domain, where low-frequency components are generated earlier in the denoising process while high-frequency details emerge only in later timesteps. This structure offers a natural opportunity for …
Howard Xiao, Brian Chao, Lior Yariv, Gordon Wetzstein · 📄 PDF
📅 2026-05-18 Vision cs.DC

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

We present LongLive-2.0, an NVFP4-based parallel infrastructure throughout the full training and inference workflow of long video generation, addressing speed and memory bottlenecks. For training, we introduce sequence-parallel autoregressive (AR) training, instantiated as Balanced SP, which co-desi…
Yukang Chen, Luozhou Wang, Wei Huang, Shuai Yang, Bohan Zhang, Yicheng Xiao, Ruihang Chu, Weian Mao, Qixin Hu, Shaoteng … · 📄 PDF
📅 2026-05-18 Vision

Aurora: Unified Video Editing with a Tool-Using Agent

Recent video editing models have converged on a unified conditioning design: a single diffusion transformer jointly consumes text, source video, and reference images, and one set of weights covers replacement, removal, style transfer, and reference-driven insertion. The design is flexible, but it as…
Yongsheng Yu, Ziyun Zeng, Zhiyuan Xiao, Zhenghong Zhou, Hang Hua, Wei Xiong, Jiebo Luo · 📄 PDF
📅 2026-05-18 cs.SD cs.CV

WavFlow: Audio Generation in Waveform Space

Modern audio generation predominantly relies on latent-space compression, introducing additional complexity and potential information loss. In this work, we challenge this paradigm with WavFlow, a framework that generates high-fidelity audio directly in raw waveform space without intermediate repres…
Feiyan Zhou, Luyuan Wang, Shoufa Chen, Zhe Wang, Zhiheng Liu, Yuren Cong, Xiaohui Zhang, Fanny Yang, Belinda Zeng · 📄 PDF
📅 2026-05-18 cs.CR cs.LG

Learning to Look Benign: Targeted Evasion of Malware Detectors via API Import Injection

Machine learning-based malware detectors are widely deployed in antivirus and endpoint detection systems, yet their reliance on static features makes them vulnerable to adversarial manipulation. This paper investigates whether a malware sample can be intentionally misclassified as a specific benign …
Juozas Dautartas, Olga Kurasova, Juozapas Rokas Čypas, Viktor Medvedev · 📄 PDF
📅 2026-05-18 ML

Efficient and Noise-Tolerant PAC Learning of Multiclass Linear Classifiers

Noise-tolerant PAC learning of linear models has been of central interest in the machine learning community since the last century. In recent years, many computationally-efficient algorithms have been proposed for the problem of learning linear threshold functions under multiple noise models. Yet, when…
Rita Adhikari, Shiwei Zeng · 📄 PDF
📅 2026-05-18 Vision cs.LG

Better Together: Evaluating the Complementarity of Earth Embedding Models

Earth embedding models transform Earth observation data into embeddings uniquely tied to locations on the Earth's surface. These models are typically evaluated in isolation, comparing the downstream task performance across different Earth embeddings. However, spatially aligned embeddings can natural…
Thijs L van der Plas, Jacob JW Bakermans, Vishal Nedungadi, Gabrielė Tijūnaitytė, Marc Rußwurm, Ioannis N Athanasiadis · 📄 PDF
📅 2026-05-18 cond-mat.quant-gas cs.LG physics.atom-ph quant-ph

Can machine learning for quantum-gas experiments be explainable?

Virtually all aspects of many-body atomic physics are challenging: experiments are technically demanding, datasets have become enormous, and the memory and CPU requirements for classical simulation of generic quantum systems often scale exponentially with system size. Machine learning (ML) methods a…
I. B. Spielman and J. P. Zwolak · 📄 PDF
📅 2026-05-18 ML q-bio.QM

Learning Normal Representations for Blood Biomarkers

Blood-based biomarkers underpin clinical diagnosis and management, yet their interpretation relies largely on fixed population reference intervals that ignore stable, intra-patient variability. As such, population-based interpretation can mask meaningful deviation from an individual's baseline, risk…
Aashna P. Shah, Michelle M. Li, Yash Lal, Seffi Cohen, Liat F. Antwarg, Morgan Sanchez, James A. Diao, Chirag J. Patel, … · 📄 PDF
📅 2026-05-18 cs.CL cs.LG

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlenecked by two challenges: the lack of scalable, robust execution environments and the scarcity of realistic training data that captures implicit human reasoning. Existing approaches depend on costly re…
Minrui Xu, Zilin Wang, Mengyi DENG, Zhiwei Li, Zhicheng Yang, Xiao Zhu, Yinhong Liu, Boyu Zhu, Baiyu Huang, Chao Chen, H… · 📄 PDF
📅 2026-05-18 ML cs.CL

General Preference Reinforcement Learning

Post-training has split large language model (LLM) alignment into two largely disconnected tracks. Online reinforcement learning (RL) with verifiable rewards drives emergent reasoning on math and code but depends on a programmatic verifier that cannot reach open-ended tasks, while preference optimiz…
Muhammad Umer, Muhammad Ahmed Mohsin, Ahsan Bilal, Arslan Chaudhry, Andreas Haupt, Sanmi Koyejo, Emily Fox, John M. Ciof… · 📄 PDF
📅 2026-05-18 Vision cs.GR cs.LG

PIXLRelight: Controllable Relighting via Intrinsic Conditioning

We present PIXLRelight, a feed-forward approach for physically controllable single-image relighting. Existing methods either provide limited lighting control (e.g. through text or environment maps), accumulate errors when chaining inverse and forward rendering, or require costly per-image optimizati…
Miguel Farinha, Ronald Clark · 📄 PDF
📅 2026-05-18 stat.ML cs.LG math.NA math.PR

SURGE: Approximation-free Training Free Particle Filter for Diffusion Surrogate

Diffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting mixture of experts, to improve sample quality on task-specific objectives. However, most existing techniques require repeated score or gradient evaluations, introducing bias, high compu…
Lifu Wei, Yinuo Ren, Naichen Shi, Yiping Lu · 📄 PDF
📅 2026-05-18 cs.DC cs.LG

A Readiness-Driven Runtime for Pipeline-Parallel Training under Runtime Variability

Pipeline parallelism is a key technique for scaling large-model training, but modern workloads exhibit runtime variability in computation and communication. Existing pipeline systems typically consume static, profiled, or adaptively generated schedules as pre-committed execution orders. When realize…
Ruitao Liu, Xinyang Tian, Shuo Chen, Tingrui Zhang, Guang Yang, Alan Zhao, Wei Xu · 📄 PDF
📅 2026-05-18 AI physics.comp-ph

SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

Large Language Models (LLMs) are increasingly deployed as scientific AI assistants, and a growing body of benchmarks evaluates their capabilities across knowledge retrieval, reasoning, code generation, and tool use. These evaluations, however, typically assume the scientific problem is already wel…
Nithin Somasekharan, Youssef Hassan, Shiyao Lin, Gihan Panapitiya, Patrick Emami, Anurag Acharya, Sameera Horawalavithan… · 📄 PDF
📅 2026-05-18 ML cs.AI

Position: Weight Space Should Be a First-Class Generative AI Modality

Neural network checkpoints have quietly become a large-scale data resource: millions of trained weight vectors now exist, each encoding task-, domain-, and architecture-specific knowledge. This position paper argues that model checkpoints should be treated as a first-class data modality, and that ge…
Zhangyang Wang, Peihao Wang, Kai Wang · 📄 PDF
📅 2026-05-18 ML cs.AI

Data Presentation Over Architecture: Resampling Strategies for Credit Risk Prediction with Tabular Foundation Models

Credit default prediction is a tabular learning problem with severe class imbalance, heterogeneous features, and tight latency budgets. Tabular Foundation Models (TFMs) approach this problem through in-context learning, which makes their predictions sensitive to how the context window is built. We b…
Aditya Tanna, Mitul Solanki, Mohamed Bouadi, Nassim Bouarour, Pratinav Seth, Vinay Kumar Sankarapu · 📄 PDF
📅 2026-05-18 ML cs.AI cs.CL

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Mixture-of-Experts (MoE) scales language models efficiently through sparse expert activation, and its dynamic variant further reduces computation by adjusting the activated experts in an input-dependent manner. Existing dynamic MoE methods usually rely on pre-training from scratch or task-specific a…
Xingtai Lv, Li Sheng, Kaiyan Zhang, Yichen You, Siyan Gao, Xueheng Luo, Yuxin Zuo, Yuchen Fan, Junlin Yang, Ganqu Cui, B… · 📄 PDF
📅 2026-05-18 ML cs.AI cs.CL

An Assessment of Human vs. Model Uncertainty in Soft-Label Learning and Calibration

Central to human-aligned AI is understanding the benefits of human-elicited labels over synthetic alternatives. While human soft-labels improve calibration by capturing uncertainty, prior studies conflate these benefits with the implicit correction of mislabeled data (mode shifts), obscuring true ef…
Maja Pavlovic, Silviu Paun, Massimo Poesio · 📄 PDF
📅 2026-05-18 ML cs.AI

Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees

A fraud scorer needs to answer in under 2 ms. The best tabular foundation models (TFMs) take 151-1,275 ms on GPU. We close this gap by distilling the TFM offline into an XGBoost or CatBoost student that runs natively on CPU. The central obstacle is specific to in-context learning (ICL) teachers: the…
Aditya Tanna, Nassim Bouarour, Mohamed Bouadi, Vinay Kumar Sankarapu, Pratinav Seth · 📄 PDF
📅 2026-05-18 stat.ML cs.AI cs.LG stat.ME

Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning

Federated Learning is a leading framework for training ML and AI models collaboratively across numerous user devices or databases. We study the trade-offs among estimation accuracy, privacy constraints, and communication cost for differentially private (DP) federated M estimation. The two standard m…
Arnab Auddy, Xiangni Peng, Subhadeep Paul · 📄 PDF
📅 2026-05-18 ML cs.AI

KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture

Time Series Foundation Models (TSFMs) have demonstrated notable success in general-purpose forecasting tasks; however, their adaptation to specialized classification problems remains constrained by the computational bottleneck of standard attention and the systematic omission of classical statistica…
Luis Balderas, José Alberto Rodríguez, Miguel Lastra, Antonio Arauzo-Azofra, José M. Benítez · 📄 PDF
📅 2026-05-18 AI

AI for Auto-Research: Roadmap & User Guide

AI-assisted research is crossing a threshold: fully automated systems can now generate research papers for as little as $15, while long-horizon agents can execute experiments, draft manuscripts, and simulate critique with minimal human input. Yet this productivity frontier exposes a deeper integrity…
Lingdong Kong, Xian Sun, Wei Chow, Linfeng Li, Kevin Qinghong Lin, Xuan Billy Zhang, Song Wang, Rong Li, Qing Wu, Wei Ga… · 📄 PDF
📅 2026-05-18 AI cs.CL cs.LG

GIM: Evaluating models via tasks that integrate multiple cognitive domains

As LLM benchmarks saturate, the evaluation community has pursued two strategies to increase difficulty: escalating knowledge demands (GPQA, HLE) or removing knowledge entirely in favor of abstract reasoning (ARC-AGI). The first conflates memorization with capability; the second divorces reasoning fr…
Rohit Patel, Alexandre Rezende, Steven McClain · 📄 PDF
📅 2026-05-18 AI

Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment

This position paper argues that enforcing LLM agent safety within a single abstraction layer is not merely suboptimal but categorically insufficient for deployed LLM agents -- a structural consequence of how agent execution works, not a contingent limitation of current systems. The three dimensions …
S. Bensalem, Y. Dong, M. Franzle, X. Huang, J. Kroger, D. Nickovic, A. Nouri, R. Roy, C. Wu · 📄 PDF
📅 2026-05-18 AI

Efficient Lookahead Encoding and Abstracted Width for Learning General Policies in Classical Planning

Generalized planning aims to learn policies that generalize across collections of instances within a classical planning domain. Recent Graph Neural Network (GNN) approaches have learned nearly perfect policies for several domains. This work improves on the recently published idea of Iterated Width (…
Michael Aichmüller, Simon Ståhlberg, Martin Funkquist, Hector Geffner · 📄 PDF
📅 2026-05-18 ML cs.AI

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

Offline reinforcement learning struggles with distributional shift and constrained performance due to static dataset limitations, while online RL demands prohibitive environment interactions. The recent advent of hybrid offline-to-online methods bridges these domains but suffers from distribution dr…
Qisai Liu, Zhanhong Jiang, Joshua Russell Waite, Aditya Balu, Cody Fleming, Soumik Sarkar · 📄 PDF
📅 2026-05-18 Vision cs.AI

Lance: Unified Multimodal Modeling by Multi-Task Synergy

We present Lance, a lightweight native unified model supporting multimodal understanding, generation, and editing for both images and videos. Rather than relying on model capacity scaling or text-image-dominant designs, Lance explores a practical paradigm for unified multimodal modeling via collabor…
Fengyi Fu, Mengqi Huang, Shaojin Wu, Yunsheng Jiang, Yufei Huo, Hao Li, Yinghang Song, Fei Ding, Jianzhu Guo, Qian He, Z… · 📄 PDF
📅 2026-05-18 AI cs.LG

Learning Quantifiable Visual Explanations Without Ground-Truth

Explainable AI (XAI) techniques are increasingly important for the validation and responsible use of modern deep learning models, but are difficult to evaluate due to the lack of good ground-truth to compare against. We propose a framework that serves as a quantifiable metric for the quality of XAI …
Amritpal Singh, Andrey Barsky, Mohamed Ali Souibgui, Ernest Valveny, Dimosthenis Karatzas · 📄 PDF
📅 2026-05-18 cs.SE cs.AI

Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents

Legacy systems concentrate business rules, architectural decisions, and operational exceptions that often remain implicit in code, data, configuration, and maintenance practices. At the same time, language-model-based coding agents depend on reliable context, correctness criteria, and behavioral con…
Sanderson Oliveira de Macedo, Ronaldo Martins da Costa · 📄 PDF
📅 2026-05-18 AI math.OC

Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

Optimization models developed by operations research (OR) experts are often deployed as decision-support systems in industrial settings. However, real-world environments are dynamic, with evolving business rules, previously overlooked constraints, and unforeseen perturbations. In such contexts, end …
Tinghan Ye, Arnaud Deza, Ved Mohan, El Mehdi Er Raqabi, Pascal Van Hentenryck · 📄 PDF
📅 2026-05-18 AI

SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents

As LLM agents are increasingly built around reusable skills, a central challenge is no longer only whether agents can use provided skills, but whether they can generate correct, reusable, and executable skills from repositories and documents. Existing benchmarks primarily evaluate the efficacy of gi…
Yifan Zhou, Zhentao Zhang, Ziming Cheng, Shuo Zhang, Qizhen Lan, Zhangquan Chen, Zhi Yang, Qianyu Xu, Ronghao Chen, Huaca… · 📄 PDF
📅 2026-05-18 ML cs.AI

Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap

Tabular foundation models (TFMs) now match or beat tuned gradient-boosted trees on a growing fraction of tabular tasks, but no single TFM wins on every dataset. Ensembling is the go-to fix here, and it works less well than expected. Six modern TFMs form a near-redundant pool: their mean pairwise Q-s…
Aditya Tanna, Yash Desai, Pratinav Seth, Mohamed Bouadi, Nassim Bouarour, Vinay Kumar Sankarapu · 📄 PDF
📅 2026-05-18 cs.DC cs.AI cs.PL

PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications

Compound AI applications, which compose calls to ML models using a general-purpose programming language like Python, are widely used for a variety of user-facing tasks, from software engineering to enterprise automation, making their end-to-end latency a critical bottleneck. In contrast to tradition…
Stephen Mell, David Mell, Konstantinos Kallas, Steve Zdancewic, Osbert Bastani · 📄 PDF
📅 2026-05-18 ML cs.AI

Distilling Tabular Foundation Models for Structured Health Data

Tabular foundation models (TFMs) achieve strong performance on health datasets, but their inference cost and infrastructure requirements limit practical use. We study whether their predictive behavior can be transferred to lightweight tabular models through knowledge distillation. Since in-context T…
Aditya Tanna, Nassim Bouarour, Mohamed Bouadi, Vinay Kumar Sankarapu, Pratinav Seth · 📄 PDF
📅 2026-05-18 Vision cs.AI

Semantic Generative Tuning for Unified Multimodal Models

Unified multimodal models (UMMs) strive to consolidate visual understanding and visual generation within a single architecture. However, prevailing training paradigms independently optimize understanding via sparse text signals and generation through dense pixel objectives. Such a decoupled strategy…
Songsong Yu, Yuxin Chen, Ying Shan, Yanwei Li · 📄 PDF
📅 2026-05-18 Robotics cs.AI

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

Evaluating embodied systems on real dexterous hardware requires more than isolated primitive skills: an agent must perceive a changing tabletop scene, choose a context-appropriate action, execute it with a dexterous hand, and leave the scene usable for later decisions. We introduce DexHoldem, a real…
Feng Chen, Tianzhe Chu, Li Sun, Pei Zhou, Zhuxiu Xu, Shenghua Gao, Yuexiang Zhai, Yanchao Yang, Yi Ma · 📄 PDF
📅 2026-05-18 cs.CL cs.AI cs.LG

Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency

While scaling laws govern aggregate large language model performance, no scaling law has linked factual recall to both model size and training-data composition. We evaluated 38 models on over 8,900 scholarly references evaluated by an automated reference verification system. Recall quality follows a…
Matthew L. Smith, Jonathan P. Shock, Samuel T. Segun, Iyiola E. Olatunji, Tegawendé F. Bissyandé · 📄 PDF
📅 2026-05-18 AI

What Does the AI Doctor Value? Auditing Pluralism in the Clinical Ethics of Language Models

Medicine is inherently pluralistic. Principles such as autonomy, beneficence, nonmaleficence, and justice routinely conflict, and such ethical dilemmas often sharply divide reasonable physicians. Good clinical practice navigates these tensions in concert with each patient's values rather than imposi…
Payal Chandak, Victoria Alkin, David Wu, Maya Dagan, Taposh Dutta Roy, Maria Clara Saad Menezes, Ayush Noori, Nirali Som… · 📄 PDF
📅 2026-05-18 Vision cs.AI cs.CL cs.LG

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, where answers often depend on small but decisive evidence in the full image. We observe a regional-to-global perception gap: the same MLLM answers fine-grained questions more accurately when conditioned o…
Qianhao Yuan, Jie Lou, Xing Yu, Hongyu Lin, Le Sun, Xianpei Han, Yaojie Lu · 📄 PDF
📅 2026-05-18 AI

Actionable World Representation

Inspired by the emergent behaviors in large language models that generalized human intelligence, the research community is pursuing similar emergent capabilities within world models, with an emphasis on modeling the physical world. Within the scope of physical world model, objects are the fundamental…
Kunqi Xu, Jitao Li, Jianglong Ye, Tianshu Tang, Isabella Liu, Sifei Liu, Xueyan Zou · 📄 PDF
📅 2026-05-18 Vision cs.AI cs.CL cs.LG

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations, and reason about how observations vary as a function of action. Rather than passively processing what is seen, they actively uncover what is unseen - occluded structure, dynamics, containment, and func…
Yining Hong, Jiageng Liu, Han Yin, Manling Li, Leonidas Guibas, Li Fei-Fei, Jiajun Wu, Yejin Choi · 📄 PDF
📅 2026-05-18 cs.CL cs.AI

Code as Agent Harness

Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is no longer only a target output. It increasingly serves as an operational substra…
Xuying Ning, Katherine Tieu, Dongqi Fu, Tianxin Wei, Zihao Li, Yuanchen Bei, Jiaru Zou, Mengting Ai, Zhining Liu, Ting-W… · 📄 PDF
📅 2026-05-18 cs.CL cs.AI cs.LG

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention

Current hierarchical attention methods, such as NSA and InfLLMv2, select the top-k relevant key-value (KV) blocks based on coarse attention scores and subsequently apply fine-grained softmax attention on the selected tokens. However, the top-k operation assumes the number of relevant tokens for any …
Yuxiang Huang, Nuno M. T. Gonçalves, Federico Alvetreti, Lei Li, Xu Han, Edoardo M. Ponti, André F. T. Martins, Marcos V… · 📄 PDF