
Daily Paper | Aug 7, 2025


  1. Qwen-Image Technical Report
  2. Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
  3. Agent Lightning: Train ANY AI Agents with Reinforcement Learning
  4. SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents
  5. Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models’ Instruction Following
  6. CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
  7. Trainable Dynamic Mask Sparse Attention
  8. TURA: Tool-Augmented Unified Retrieval Agent for AI Search

Qwen-Image Technical Report

Paper Link: https://www.alphaxiv.org/abs/2508.02324
Github Link: https://github.com/QwenLM/Qwen-Image

Alibaba Cloud’s Qwen Team developed Qwen-Image, a multimodal foundation model that advances text-to-image generation and image editing. The model delivers state-of-the-art performance in complex text rendering, especially for Chinese, and achieves high precision in various editing tasks by unifying generative and understanding capabilities.

Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Project Link: https://seed.bytedance.com/en/seed_diffusion
Paper Link: https://www.alphaxiv.org/abs/2508.02193

Developed by ByteDance Seed and Tsinghua University, Seed Diffusion introduces a large-scale discrete-state diffusion model for code generation that achieves an inference speed of 2,146 tokens/second on H20 GPUs. The model maintains competitive performance across various code generation and editing benchmarks, establishing a new efficiency benchmark for code models.
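The speedup comes from decoding many positions per step instead of one. As a rough intuition only (not Seed Diffusion's actual algorithm), here is a toy sketch of discrete-diffusion-style decoding: start from an all-masked sequence and fill in blocks of positions in parallel over a few denoising steps. The `predict` callable is a stand-in for the denoising model.

```python
MASK = "_"

def diffusion_decode(target_len, predict, steps=4):
    """Toy discrete-diffusion decoding: begin fully masked and, at each
    step, commit a whole block of positions in parallel -- unlike
    autoregressive decoding, which commits one token per step."""
    seq = [MASK] * target_len
    per_step = -(-target_len // steps)  # ceil division
    masked = list(range(target_len))
    for _ in range(steps):
        if not masked:
            break
        # Unmask a batch of positions at once (a real model would pick
        # its most confident positions here).
        batch, masked = masked[:per_step], masked[per_step:]
        for i in batch:
            seq[i] = predict(seq, i)
    return "".join(seq)

# Stand-in "model": the target string is known, so prediction is a lookup.
truth = "print(42)"
out = diffusion_decode(len(truth), lambda seq, i: truth[i], steps=3)
print(out)  # decoded in 3 parallel steps rather than 9 sequential ones
```

With 9 tokens and 3 steps, each step fills 3 positions at once, which is where the throughput advantage over token-by-token generation comes from.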

Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Github Link: https://github.com/microsoft/agent-lightning
Paper Link: https://www.alphaxiv.org/abs/2508.03680

Agent Lightning, developed by Microsoft Research, introduces a framework that completely decouples reinforcement learning (RL) training from AI agent execution, enabling continuous self-improvement for any LLM-based agent with minimal code modifications. It demonstrates stable and continuous performance improvement across diverse tasks, including Text-to-SQL (LangChain), Retrieval-Augmented Generation (OpenAI Agents SDK), and Math QA (AutoGen).
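The decoupling idea can be sketched in a few lines: the agent side (whatever framework it uses) only emits uniform transition records, and the trainer side only consumes them. This is an illustrative sketch of the interface shape, not Agent Lightning's actual API; all names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Transition:
    """A unified record any agent framework can emit: the RL trainer only
    ever sees (prompt, response, reward), never the agent's internals."""
    prompt: str
    response: str
    reward: float

def run_agent(task, policy):
    # Agent side: any framework (LangChain, AutoGen, ...) just logs
    # transitions -- no RL code lives here.
    response = policy(task)
    reward = 1.0 if response.strip() else 0.0  # toy reward
    return Transition(task, response, reward)

def train_step(transitions):
    # Trainer side: consumes transitions regardless of which agent
    # produced them -- the decoupling the framework is built around.
    return sum(t.reward for t in transitions) / len(transitions)

ts = [run_agent("translate: hola", lambda t: "hello"),
      run_agent("empty task", lambda t: "")]
avg_reward = train_step(ts)
print(avg_reward)  # 0.5
```

Because the trainer never imports the agent's framework, swapping LangChain for AutoGen (or anything else) requires no changes on the training side.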

SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

SE-Agent introduces a self-evolution framework for LLM-based agents, optimizing multi-step reasoning through iterative revision, recombination, and refinement of complete interaction trajectories. The framework consistently outperformed strong baselines on the challenging SWE-bench Verified benchmark, achieving up to 112% relative improvement and uniquely solving issues previously unaddressed by other models.
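The revise/recombine/refine loop over whole trajectories can be sketched abstractly. In this toy version, trajectories are lists of steps and the operators are deterministic stand-ins for what would really be LLM calls; none of this reflects SE-Agent's actual prompts or operators.

```python
def evolve(pool, score, revise, recombine, iters=3, keep=2):
    """Toy self-evolution over complete trajectories: each round revises
    every candidate, recombines neighboring pairs, then keeps the top
    `keep` by score."""
    for _ in range(iters):
        pool = pool + [revise(t) for t in pool]
        pool = pool + [recombine(a, b) for a, b in zip(pool, pool[1:])]
        pool = sorted(pool, key=score, reverse=True)[:keep]
    return pool[0]

# Trajectories as step lists; reward = count of verified steps ("ok").
score = lambda t: sum(s == "ok" for s in t)
revise = lambda t: t + ["ok"]                      # refine: append a fixed step
recombine = lambda a, b: a[: len(a) // 2] + b[len(b) // 2 :]  # splice halves

best = evolve([["ok"], ["bad"]], score, revise, recombine)
print(score(best))
```

The key point the paper makes is that the operators act on entire trajectories rather than single steps, letting later rounds exploit cross-trajectory structure.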

Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models’ Instruction Following

Github Link: https://github.com/Rainier-rq/verl-if
Paper Link: https://www.alphaxiv.org/abs/2508.02150

A self-supervised reinforcement learning framework enhances large language models’ instruction-following capabilities by leveraging internal signals, rather than external models, while preserving or improving their core reasoning performance. The approach uses an incremental curriculum and a novel reward model for both hard and soft constraints, achieving higher scores on instruction-following benchmarks and maintaining general reasoning abilities.
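The hard/soft distinction in the reward can be illustrated with a toy scorer: hard constraints gate the reward entirely, while soft constraints earn partial credit. The checkers below are illustrative stand-ins, not the paper's actual reward model.

```python
import re

def constraint_reward(response, hard, soft):
    """Toy instruction-following reward: all hard constraints must pass
    (else reward is 0), soft constraints contribute fractional credit."""
    if not all(check(response) for check in hard):
        return 0.0
    if not soft:
        return 1.0
    return sum(bool(check(response)) for check in soft) / len(soft)

hard = [lambda r: r.endswith("."),             # must end with a period
        lambda r: "banana" not in r.lower()]   # forbidden word
soft = [lambda r: len(r.split()) <= 20,        # length budget
        lambda r: bool(re.search(r"\d", r))]   # should include a number

print(constraint_reward("There are 3 apples.", hard, soft))  # 1.0
print(constraint_reward("There are apples.", hard, soft))    # 0.5
```

Because both checkers and rewards are computed from the response itself, no external judge model is needed, which is the self-supervised angle the paper emphasizes.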

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Paper Link: https://www.arxiv.org/pdf/2508.03686
Github Link: https://github.com/open-compass/CompassVerifier

Researchers from Shanghai AI Laboratory and the University of Macau introduce CompassVerifier, a lightweight model for verifying Large Language Model outputs, alongside VerifierBench, a challenging new benchmark. CompassVerifier demonstrates improved accuracy across diverse domains and answer types, and enhances reinforcement learning for LLM optimization by providing precise reward signals.
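Using a verifier as an outcome reward is a thin wrapper: the verifier's binary judgment becomes the RL reward. This is a hedged sketch of that pattern; `verify` is a hypothetical callable standing in for a model like CompassVerifier, and the exact-match stand-in below is far cruder than what a real verifier does.

```python
def verifier_reward(question, gold, response, verify):
    """Outcome reward from a verifier: 1.0 if the verifier judges the
    response correct against the gold answer, else 0.0."""
    return 1.0 if verify(question, gold, response) else 0.0

# Stand-in verifier: exact match after normalization. Real verifiers
# handle formatting variance, units, equivalent expressions, etc.
normalize = lambda s: s.strip().lower()
verify = lambda q, g, r: normalize(g) == normalize(r)

print(verifier_reward("2+2?", "4", " 4 ", verify))  # 1.0
print(verifier_reward("2+2?", "4", "5", verify))    # 0.0
```

The paper's point is that making `verify` robust (across domains, answer formats, and adversarial outputs) is what makes the reward signal usable for RL.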


Trainable Dynamic Mask Sparse Attention

Paper Link: https://www.arxiv.org/pdf/2508.02124
Github Link: https://github.com/SmallDoges/flash-dmattn

Dynamic Mask Attention (DMA) enables Large Language Models to process significantly longer contexts by dynamically selecting relevant tokens for attention computation, achieving up to 15.5x speedup over standard attention while maintaining or improving performance on long-context benchmarks.
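The core idea, selecting a small set of relevant keys per query instead of attending to everything, can be shown with a toy dense implementation. This sketch keeps only the top-`keep` keys per query by raw score; the paper's trainable masking and kernel-level sparsity are considerably more involved.

```python
import numpy as np

def dynamic_mask_attention(q, k, v, keep=4):
    """Toy sparse attention: each query attends only to its top-`keep`
    most relevant keys (by dot-product score); the rest are masked out."""
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (Tq, Tk)
    # Dynamic mask: per query row, keep scores >= the keep-th largest.
    kth = np.sort(scores, axis=-1)[:, -keep][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 8, 16
q, k, v = rng.normal(size=(T, d)), rng.normal(size=(T, d)), rng.normal(size=(T, d))
out = dynamic_mask_attention(q, k, v, keep=4)
print(out.shape)  # each query mixed at most 4 value rows
```

A dense reference like this wastes the compute it masks out; the speedups reported come from kernels that skip the masked work entirely, which is why the mask must be known before the score computation.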

TURA: Tool-Augmented Unified Retrieval Agent for AI Search

Paper Link: https://www.arxiv.org/pdf/2508.04604

TURA introduces a tool-augmented unified retrieval agent to bridge the gap between static content retrieval and dynamic information access in AI search. This framework enables the handling of real-time and transactional queries by integrating RAG with tool-augmented agents, leading to an 8.9% increase in session success rate and a 44.2% reduction in latency for complex queries in a production deployment at Baidu Inc.
