Table of Contents
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models
- Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
- SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
- SWE-Exp: Experience-Driven Software Issue Resolution
- Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
- VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
- 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
- Viser: Imperative, Web-based 3D Visualization in Python
- UserBench: An Interactive Gym Environment for User-Centric Agents
- Three-loop banana integrals with four unequal masses
- On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
- Unveiling Super Experts in Mixture-of-Experts Large Language Models
- Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling
- Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Github Link: https://github.com/safety-research/persona_vectors
Paper Link: https://www.alphaxiv.org/abs/2507.21509
The paper introduces persona vectors, which are identified as linear directions within a language model’s activation space, representing distinct character traits like “evil,” “sycophancy,” or “propensity to hallucinate.” An automated pipeline generates these vectors from natural language descriptions, enabling the monitoring of personality shifts during a model’s deployment and prediction of behavioral changes during its training. The research demonstrates that these vectors can be used to mitigate undesirable persona shifts through “steering” interventions—either by subtracting the persona vector after finetuning or by applying preventative steering during the finetuning process itself. Additionally, the study shows that analyzing training data through the lens of persona vectors can help flag problematic datasets or individual samples that might induce unintended persona shifts, even those that traditional filtering methods miss.
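As an illustration of the steering intervention, here is a minimal sketch assuming a HuggingFace-style decoder: a forward hook shifts the residual stream along a persona direction. The layer path, coefficient, and the vector itself are placeholder assumptions, not the paper's exact recipe.

```python
import torch

def add_persona_steering_hook(model, layer_idx, persona_vec, alpha=-1.0):
    """Shift the residual stream along a persona direction at one layer.

    alpha < 0 suppresses the trait ("subtracting the persona vector");
    alpha > 0 amplifies it. `persona_vec` and `layer_idx` are placeholders
    that would come from the paper's extraction pipeline.
    """
    v = persona_vec / persona_vec.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * v.to(hidden.device, hidden.dtype)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered

    # The layer container path varies by architecture
    # (e.g. model.model.layers for LLaMA-style models); adjust as needed.
    return model.model.layers[layer_idx].register_forward_hook(hook)
```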
Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
Github Link: https://github.com/ForeverFancy/gvfdiffusion
Paper Link: https://arxiv.org/abs/2507.23785
The paper from the University of Science and Technology of China and Microsoft Research Asia introduces Gaussian Variation Field Diffusion (GVFDiffusion), a framework that enables high-fidelity 4D content generation from a single video input. This is achieved by creating a canonical 3D Gaussian Splatting representation and generating its temporal variations via a compact latent diffusion model, resulting in significantly faster generation times and improved quality compared to prior methods.
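The core mechanism can be sketched simply: a canonical set of Gaussians is deformed per frame by a variation field sampled from the diffusion model. The tensor layouts and the 3+4 translation/rotation delta split below are illustrative assumptions, not the paper's exact parameterization.

```python
import torch

def animate_canonical_gaussians(canon_xyz, canon_rot, variation_field):
    """Apply a Gaussian Variation Field to a canonical 3DGS model.

    canon_xyz: (N, 3) canonical Gaussian centers
    canon_rot: (N, 4) canonical rotations as quaternions
    variation_field: (T, N, 7) per-frame deltas sampled from the latent
        diffusion model (3 translation + 4 rotation; illustrative layout).
    Returns per-frame Gaussian parameters ready for splatting.
    """
    frames = []
    for delta in variation_field:          # iterate over T frames
        xyz_t = canon_xyz + delta[:, :3]   # displace centers
        rot_t = canon_rot + delta[:, 3:]   # perturb rotations...
        rot_t = rot_t / rot_t.norm(dim=-1, keepdim=True)  # ...and renormalize
        frames.append((xyz_t, rot_t))
    return frames
```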
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
Researchers from Shanghai Jiao Tong University and Huawei developed SWE-Debate, a multi-agent debate framework leveraging graph-guided localization to resolve software issues. The system achieved a 41.4% Pass@1 success rate on the SWE-Bench-Verified dataset and 81.67% file-level localization accuracy on the SWE-Bench-Lite dataset.
SWE-Exp: Experience-Driven Software Issue Resolution
SWE-Exp introduces an experience-enhanced framework that enables Large Language Model agents to learn from past software issue resolution attempts, achieving a Pass@1 score of 41.6% on the SWE-bench-Verified dataset. It systematically captures and reuses knowledge via a multi-faceted experience bank and a dual-agent architecture, transforming agents from memoryless explorers into strategic, experience-driven problem solvers.
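A minimal sketch of the experience-reuse idea, assuming a single similarity-based retrieval facet; the data fields and the `embed` function are placeholders, and the paper's multi-faceted bank is richer than this.

```python
import math
from dataclasses import dataclass

@dataclass
class Experience:
    issue_summary: str
    strategy: str   # e.g. "reproduce, localize, patch, verify"
    outcome: str    # what worked or failed, and why

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-9)

def retrieve_experiences(bank, new_issue, embed, k=3):
    """Rank stored experiences by similarity to the new issue.

    `embed` is any text-embedding function (placeholder). Retrieved
    strategies and outcomes are then injected into the agent's context.
    """
    q = embed(new_issue)
    scored = [(cosine(q, embed(e.issue_summary)), e) for e in bank]
    return [e for _, e in sorted(scored, key=lambda s: -s[0])[:k]]
```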
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
Github Link: https://github.com/tiiuae/falcon-h1
Paper Link: https://www.alphaxiv.org/abs/2507.22448
The Falcon LLM Team at the Technology Innovation Institute introduces Falcon-H1, a series of hybrid-head language models that integrate Transformer attention with Mamba-2 SSMs, achieving strong performance across various tasks while demonstrating enhanced parameter and training efficiency. The models set new benchmarks for efficiency and capability, particularly in reasoning-intensive domains and long-context processing, often matching or exceeding larger models.
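A schematic of the hybrid-head idea, running an attention path and an SSM path in parallel within one block; the stand-in `ssm_mixer`, the concatenate-and-project fusion, and the dimensions are illustrative assumptions rather than Falcon-H1's actual design.

```python
import torch
import torch.nn as nn

class HybridHeadBlock(nn.Module):
    """Parallel attention + SSM sequence mixing (schematic only)."""

    def __init__(self, d_model, n_heads, ssm_mixer):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ssm = ssm_mixer  # stand-in for a Mamba-2 mixer, (B, T, d) -> (B, T, d)
        self.proj = nn.Linear(2 * d_model, d_model)

    def forward(self, x):
        h = self.norm(x)
        a, _ = self.attn(h, h, h, need_weights=False)  # attention path
        s = self.ssm(h)                                # SSM path
        return x + self.proj(torch.cat([a, s], dim=-1))
```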
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
Github Link: https://github.com/alibaba-damo-academy/VL-Cogito
Paper Link: https://www.alphaxiv.org/abs/2507.22607
Researchers from Alibaba’s DAMO Academy and Fudan University developed VL-Cogito, a multimodal large language model, using a Progressive Curriculum Reinforcement Learning framework. This approach systematically enhances the model’s ability to perform complex multimodal reasoning and adaptively adjust its reasoning length, achieving competitive performance across diverse benchmarks in mathematics, science, and general understanding.
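One way to make "progressive curriculum" concrete is difficulty-weighted sampling whose target ramps from easy to hard over training; the linear schedule and Gaussian weighting below are illustrative assumptions, not the paper's exact formulation.

```python
import math
import random

def curriculum_weight(difficulty, target, sharpness=4.0):
    """Soft weight peaking at the current target difficulty."""
    return math.exp(-sharpness * (difficulty - target) ** 2)

def sample_batch(pool, step, total_steps, batch_size=8):
    """pool: list of (prompt, difficulty) pairs with difficulty in [0, 1].

    The target difficulty increases linearly over training, so early RL
    batches favour easy problems and later ones favour hard problems.
    """
    target = step / total_steps
    weights = [curriculum_weight(d, target) for _, d in pool]
    return random.choices(pool, weights=weights, k=batch_size)
```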
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
Github Link: https://github.com/AIGeeksGroup/3D-R1
Paper Link: https://www.alphaxiv.org/abs/2507.23478
A generalist 3D Vision-Language Model, 3D-R1, combines cold-start initialization with reinforcement learning and dynamic view selection to enhance reasoning for unified scene understanding. It achieves state-of-the-art performance across seven distinct 3D vision-language tasks, demonstrating an average improvement of 10% over prior methods.
Viser: Imperative, Web-based 3D Visualization in Python
Project Link: https://viser.studio/main/
Paper Link: https://www.alphaxiv.org/abs/2507.22885
Viser is a Python library that offers imperative, web-based 3D visualization, addressing the need for a versatile tool that bridges the gap between lightweight and domain-specific visualization solutions. The system provides comprehensive 3D scene and 2D GUI primitives, supports real-time data streaming, and has been adopted across various computer vision and robotics research areas, including as a foundational component for neural rendering frameworks.
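Because viser's API is imperative, a short example conveys the programming model well. This sketch follows the library's documented scene/GUI API, though exact signatures can shift between versions.

```python
import time
import numpy as np
import viser

server = viser.ViserServer()  # serves a web client, by default on localhost:8080

# Add a random point cloud to the 3D scene.
points = np.random.uniform(-1.0, 1.0, size=(5000, 3)).astype(np.float32)
colors = ((points + 1.0) / 2.0 * 255).astype(np.uint8)
cloud = server.scene.add_point_cloud(
    "/cloud", points=points, colors=colors, point_size=0.01
)

# Add a 2D GUI control and wire it to the scene object.
slider = server.gui.add_slider(
    "Point size", min=0.001, max=0.05, step=0.001, initial_value=0.01
)

@slider.on_update
def _(_event) -> None:
    cloud.point_size = slider.value  # imperative update, pushed to the browser

while True:
    time.sleep(1.0)
```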
UserBench: An Interactive Gym Environment for User-Centric Agents
Github Link: https://github.com/SalesforceAIResearch/UserBench
Paper Link: https://www.alphaxiv.org/abs/2507.22034
UserBench is an interactive Gym environment and benchmark that evaluates LLM-based agents on their ability to understand and align with user needs, particularly when instructions are underspecified, incremental, or indirect, using travel planning scenarios. Evaluations with leading LLMs revealed that models struggle significantly with user preference elicitation and making optimal, user-aligned decisions, even while demonstrating competence in tool use.
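A toy illustration of the central difficulty, eliciting preferences from a simulated user who only reveals them when asked; the class below is a hypothetical stand-in, not UserBench's actual API.

```python
class SimulatedUser:
    """Toy stand-in for a UserBench-style simulated user: preferences are
    latent and surface only when the agent asks the right question."""

    def __init__(self):
        self.latent = {"budget": "under $800", "airline": "prefers nonstop"}
        self.revealed = {}

    def answer(self, question):
        for key, pref in self.latent.items():
            if key in question.lower():
                self.revealed[key] = pref  # preference elicited
                return pref
        return "I'm not sure, you decide."

user = SimulatedUser()
print(user.answer("What is your budget for this trip?"))  # "under $800"
print(user.answer("Window or aisle seat?"))  # latent prefs not covered
```

An agent that never asks about budget or airline can still book a trip, but it will be scored against preferences it never uncovered, which is exactly the failure mode the benchmark measures.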
Three-loop banana integrals with four unequal masses
Researchers at the Bethe Center, Universität Bonn, constructed the first complete system of canonical differential equations for the master integrals of the three-loop banana diagram with four unequal masses in D = 2 − 2ε dimensions. The work shows that these integrals can be expressed in terms of known K3 periods and only two new fundamental iterated integrals, enabling fully analytic results for precision calculations.
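Schematically, "canonical" means the differential equations are ε-factorized, so the master integrals are iterated integrals over the one-forms in the connection (standard form; notation chosen here, not copied from the paper):

```latex
% Canonical (epsilon-factorized) form: epsilon appears only as an
% overall prefactor of the connection matrix,
\mathrm{d}\,\vec{I}(x;\varepsilon) = \varepsilon\, \mathrm{d}A(x)\, \vec{I}(x;\varepsilon),
% so the solution is a path-ordered exponential, i.e. iterated
% integrals over the one-forms contained in dA:
\vec{I}(x;\varepsilon) = \mathbb{P}\exp\!\left(\varepsilon \int_{\gamma} \mathrm{d}A\right) \vec{I}_{0}(\varepsilon).
```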
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
The approach is reminiscent of second-order flow matching.
Github Link: https://github.com/gmongaras/On-the-Expressiveness-of-Softmax-Attention-A-Recurrent-Neural-Network-Perspective
Paper Link: https://www.alphaxiv.org/abs/2507.23632
This preprint by Gabriel Mongaras and Eric C. Larson of Southern Methodist University investigates the expressiveness of softmax attention in transformer architectures. The authors derive a recurrent form of softmax attention via a Taylor series expansion, showing that linear attention can be understood as a first-order approximation of its more complex counterpart, and they evaluate this formulation empirically against standard softmax and several linear attention methods. The research also reinterprets the softmax denominator as either a gate or a norm; experiments indicate that a vector norm most accurately replicates softmax's behavior. The overall aim is to explain why softmax attention consistently outperforms linear attention on downstream tasks.
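The first-order claim can be made concrete with the standard Taylor argument (notation chosen here, not copied from the paper):

```latex
% Causal softmax attention at step t:
o_t = \frac{\sum_{i \le t} \exp(q_t^{\top} k_i)\, v_i}{\sum_{i \le t} \exp(q_t^{\top} k_i)}.
% Expanding the kernel to first order, \exp(x) \approx 1 + x, gives
o_t \approx \frac{V_t + S_t\, q_t}{\,t + z_t^{\top} q_t\,},
\qquad
V_t = \sum_{i \le t} v_i, \quad
S_t = \sum_{i \le t} v_i k_i^{\top}, \quad
z_t = \sum_{i \le t} k_i,
% and all three states admit RNN-style recurrent updates:
V_t = V_{t-1} + v_t, \qquad
S_t = S_{t-1} + v_t k_t^{\top}, \qquad
z_t = z_{t-1} + k_t.
```

Truncating at first order yields exactly a (normalized) linear-attention recurrence, while the discarded higher-order terms are what the full softmax kernel retains.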
Unveiling Super Experts in Mixture-of-Experts Large Language Models
Github Link: https://github.com/ZunhaiSu/Super-Experts-Profilling
Paper Link: https://www.alphaxiv.org/abs/2507.23279
Researchers from Tsinghua University and Meituan identified “Super Experts” (SEs) in Mixture-of-Experts Large Language Models, a tiny subset of experts (less than 0.5%) that are mechanistically responsible for inducing massive activations and crucial attention sinks. Removing these SEs leads to a catastrophic collapse in model performance, particularly for reasoning tasks, highlighting their indispensable role in maintaining core LLM capabilities.
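A sketch of the kind of profiling this implies: collect each expert's outputs over a calibration set and flag extreme-magnitude outliers. The max-vs-median criterion and the threshold are illustrative assumptions; the paper's actual procedure may differ.

```python
import torch

def flag_super_experts(expert_outputs, ratio=50.0):
    """Flag candidate Super Experts by outlier output magnitudes.

    expert_outputs: dict mapping (layer, expert_id) -> tensor of that
    expert's output activations collected over a small calibration set.
    Experts whose peak activation dwarfs the median peak are candidates.
    """
    peaks = {k: acts.abs().max().item() for k, acts in expert_outputs.items()}
    median = sorted(peaks.values())[len(peaks) // 2]
    return {k: v for k, v in peaks.items() if v > ratio * median}
```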
Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling
Github Link: https://github.com/bytedance/trae-agent
Paper Link: https://www.alphaxiv.org/abs/2507.23370
Trae Agent targets software issue resolution, a critical software engineering challenge on which LLMs have recently made substantial progress. Recent work has used ensemble reasoning to boost LLM-based issue resolution, but existing prompting-based methods struggle to explore large ensemble spaces and lack repository-level understanding, limiting their effectiveness. Trae Agent is presented as the first agent-based ensemble reasoning approach for repository-level issue resolution: it formulates resolution as an optimal-solution search and tackles the two challenges above with modular agents for generation, pruning, and selection. In experiments with three leading LLMs on the widely adopted SWE-bench benchmark, Trae Agent consistently outperforms four state-of-the-art ensemble reasoning techniques, improving Pass@1 by 10.22% on average over all baselines, and it has reached first place on the SWE-bench Verified leaderboard with a notable Pass@1 score of 75.20%.
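The generation/pruning/selection decomposition can be sketched as a pipeline; the function names here are illustrative stand-ins, not Trae Agent's actual interfaces.

```python
def resolve_issue(issue, generate, prune, select, n_candidates=8):
    """Generation -> pruning -> selection over an ensemble of candidates.

    generate: produces one candidate patch (e.g. one agent trajectory)
    prune:    cheap filter, e.g. does the patch build and pass regressions?
    select:   picks a final patch from the survivors, e.g. via an LLM judge
    """
    candidates = [generate(issue) for _ in range(n_candidates)]
    survivors = [p for p in candidates if prune(issue, p)]
    return select(issue, survivors) if survivors else None
```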
Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
Github Link: https://github.com/lavoiems/DiscreteLatentCode
Paper Link: https://www.alphaxiv.org/abs/2507.12318
Researchers at Mila, Université de Montréal, introduce Discrete Latent Codes (DLCs) as a conditioning representation for diffusion models, which enables state-of-the-art unconditional image generation on ImageNet (FID 1.59) and facilitates diverse, compositional image synthesis. The method also enables an efficient text-to-image pipeline that leverages pre-trained language models to generate DLCs, requiring significantly less training data than end-to-end models.
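For intuition, a DLC is a sequence of discrete tokens standing in for a continuous embedding, which is what makes it compositional and easy for a language model to generate. The sketch below uses a generic multi-codebook quantization analogy rather than the paper's exact construction.

```python
import torch

def quantize_to_dlc(embedding, codebooks):
    """Map a continuous image embedding to a sequence of discrete tokens.

    embedding: (D,) image-level feature; codebooks: list of (K, D) tensors.
    Each codebook contributes one token (nearest-neighbor index), so the
    DLC is a token sequence a language model can learn to produce.
    """
    return [int(torch.cdist(embedding[None], cb).argmin()) for cb in codebooks]
```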