Qwen-Image Technical Report
Paper Link: https://www.alphaxiv.org/abs/2508.02324
Github Link: https://github.com/QwenLM/Qwen-Image
Alibaba Cloud’s Qwen Team developed Qwen-Image, a multimodal foundation model that advances text-to-image generation and image editing. The model delivers state-of-the-art performance in complex text rendering, especially for Chinese, and achieves high precision in various editing tasks by unifying generative and understanding capabilities.
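For readers who want to try the model, a minimal loading sketch via Hugging Face diffusers is shown below; the model id "Qwen/Qwen-Image" and the generation arguments are assumptions to be checked against the GitHub repo above.

```python
# Minimal sketch, assuming Qwen-Image loads through Hugging Face diffusers.
# The model id and prompt are illustrative; check the repo for the supported path.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",              # assumed Hugging Face model id
    torch_dtype=torch.bfloat16,
).to("cuda")

# Bilingual in-image text rendering is the model's highlighted strength.
prompt = 'A neon storefront sign that reads "通义千问 Qwen-Image", rainy night street'
image = pipe(prompt=prompt, num_inference_steps=50).images[0]
image.save("qwen_image_sample.png")
```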
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Project Link: https://seed.bytedance.com/en/seed_diffusion
Paper Link: https://www.alphaxiv.org/abs/2508.02193
Developed by ByteDance Seed and Tsinghua University, Seed Diffusion introduces a large-scale discrete-state diffusion model for code generation that achieves an inference speed of 2,146 tokens/second on H20 GPUs. The model maintains competitive performance across various code generation and editing benchmarks, establishing a new efficiency benchmark for code models.
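To make the discrete-state diffusion idea concrete, here is a toy sketch of parallel iterative denoising over a masked token sequence, the general mechanism that lets such models emit many tokens per step. The confidence-based unmasking schedule, step count, and dummy model are illustrative assumptions, not the paper's exact sampler.

```python
# Toy sketch of discrete-state diffusion decoding: start from a fully masked
# sequence and unmask the most confident positions in a few parallel steps.
import math
import torch

def diffusion_decode(model, seq_len, mask_id, steps=8, device="cpu"):
    tokens = torch.full((1, seq_len), mask_id, dtype=torch.long, device=device)
    k = math.ceil(seq_len / steps)                           # positions finalized per step
    for _ in range(steps):
        logits = model(tokens)                               # (1, seq_len, vocab_size)
        conf, pred = logits.softmax(-1).max(-1)              # per-position confidence and argmax
        still_masked = tokens.eq(mask_id)
        remaining = int(still_masked.sum())
        if remaining == 0:
            break
        conf = conf.masked_fill(~still_masked, float("-inf"))  # never re-pick finalized positions
        top = conf.topk(min(k, remaining), dim=-1).indices
        tokens.scatter_(1, top, pred.gather(1, top))
    return tokens

# Dummy "model" returning random logits, just to make the sketch runnable.
vocab, L = 32000, 64
dummy = lambda t: torch.randn(t.shape[0], t.shape[1], vocab)
decoded = diffusion_decode(dummy, seq_len=L, mask_id=0, steps=8)
```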
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Github Link: https://github.com/microsoft/agent-lightning
Paper Link: https://www.alphaxiv.org/abs/2508.03680
Agent Lightning, developed by Microsoft Research, introduces a framework that fully decouples reinforcement learning (RL) training from AI agent execution, enabling continuous self-improvement for any LLM-based agent with minimal code changes. It demonstrates stable performance gains across diverse tasks, including Text-to-SQL (LangChain), Retrieval-Augmented Generation (OpenAI Agents SDK), and Math QA (AutoGen).
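Conceptually, the decoupling means the agent keeps running in its own framework and only emits (prompt, response, reward) transitions for a separate trainer to consume. The sketch below illustrates that interface with hypothetical class and function names; it is not Agent Lightning's actual API.

```python
# Conceptual sketch of the decoupling: the agent runs in its own framework
# (LangChain, AutoGen, ...) and only reports transitions to an external RL trainer.
# All names here are hypothetical illustrations, not Agent Lightning's real API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Transition:
    prompt: str
    response: str
    reward: float

@dataclass
class TrajectoryBuffer:
    """Filled on the agent side, consumed asynchronously by the RL trainer."""
    transitions: List[Transition] = field(default_factory=list)

    def record(self, prompt: str, response: str, reward: float) -> None:
        self.transitions.append(Transition(prompt, response, reward))

def run_agent_episode(llm_call, task: str, buffer: TrajectoryBuffer) -> str:
    # The agent code stays unchanged except for the single `record` hook.
    prompt = f"Solve the task step by step:\n{task}"
    response = llm_call(prompt)
    reward = 1.0 if "SELECT" in response.upper() else 0.0   # e.g. a crude Text-to-SQL check
    buffer.record(prompt, response, reward)
    return response

# A trainer in a separate process would periodically pull buffer.transitions
# and run a policy-gradient update on the underlying LLM.
```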
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents
SE-Agent introduces a self-evolution framework for LLM-based agents that optimizes multi-step reasoning through iterative revision, recombination, and refinement of complete interaction trajectories. The framework consistently outperforms strong baselines on the challenging SWE-bench Verified benchmark, achieving up to a 112% relative improvement and resolving issues that other models leave unsolved.
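A minimal sketch of the self-evolution loop is given below, assuming placeholder revise/recombine/refine operators and a scoring function rather than SE-Agent's concrete prompts or the SWE-bench harness.

```python
# Minimal sketch: keep a pool of complete trajectories and repeatedly revise,
# recombine, and refine them, retaining the best-scoring candidates.
import random
from typing import Callable, List

Trajectory = List[str]   # a sequence of reasoning/action steps

def evolve(pool: List[Trajectory],
           revise: Callable[[Trajectory], Trajectory],
           recombine: Callable[[Trajectory, Trajectory], Trajectory],
           refine: Callable[[Trajectory], Trajectory],
           score: Callable[[Trajectory], float],
           generations: int = 3,
           keep: int = 4) -> Trajectory:
    for _ in range(generations):
        candidates = list(pool)
        candidates += [revise(t) for t in pool]                                   # revision
        candidates += [recombine(*random.sample(pool, 2))
                       for _ in pool if len(pool) > 1]                            # recombination
        candidates += [refine(t) for t in pool]                                   # refinement
        pool = sorted(candidates, key=score, reverse=True)[:keep]                 # keep the best
    return pool[0]
```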
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models’ Instruction Following
Github Link: https://github.com/Rainier-rq/verl-if
Paper Link: https://www.alphaxiv.org/abs/2508.02150
A self-supervised reinforcement learning framework enhances large language models’ instruction-following capabilities by leveraging internal signals, rather than external models, while preserving or improving their core reasoning performance. The approach uses an incremental curriculum and a novel reward model for both hard and soft constraints, achieving higher scores on instruction-following benchmarks and maintaining general reasoning abilities.
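As a rough illustration of the reward design, the sketch below mixes verifiable hard-constraint checks with a soft score supplied by the model itself; the specific constraints, weighting, and soft scorer are assumptions, not the paper's exact reward model.

```python
# Illustrative reward mixing verifiable "hard" constraints with a "soft" score,
# in the spirit of the paper's internal-signal reward; details are assumptions.
import re
from typing import Callable, Dict

def hard_constraint_reward(response: str, constraints: Dict[str, str]) -> float:
    checks = []
    if "max_words" in constraints:
        checks.append(len(response.split()) <= int(constraints["max_words"]))
    if "must_include" in constraints:
        checks.append(constraints["must_include"] in response)
    if constraints.get("format") == "bullet_list":
        checks.append(bool(re.search(r"^\s*[-*] ", response, flags=re.M)))
    return sum(checks) / max(len(checks), 1)   # fraction of satisfied hard constraints

def total_reward(response: str,
                 constraints: Dict[str, str],
                 soft_scorer: Callable[[str], float],
                 alpha: float = 0.5) -> float:
    # soft_scorer is where the self-supervised signal would plug in, e.g. the policy
    # judging how well the response follows the non-verifiable parts of the instruction.
    return alpha * hard_constraint_reward(response, constraints) + (1 - alpha) * soft_scorer(response)
```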
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
Paper: https://www.arxiv.org/pdf/2508.03686
Github: https://github.com/open-compass/CompassVerifier
Researchers from Shanghai AI Laboratory and the University of Macau introduce CompassVerifier, a lightweight model for verifying Large Language Model outputs, alongside VerifierBench, a challenging new benchmark. CompassVerifier demonstrates improved accuracy across diverse domains and answer types, and enhances reinforcement learning for LLM optimization by providing precise reward signals.
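In an RL loop, such a verifier can be dropped in as the reward function: it judges the policy's output against the reference answer and maps the verdict to a scalar. The prompt template and `verifier_generate` callable below are hypothetical placeholders; the released prompts and checkpoints are in the repo.

```python
# Sketch of a verifier-as-reward wrapper: the verifier model judges the response
# against the reference answer and the verdict becomes a scalar reward.
from typing import Callable

VERIFY_TEMPLATE = """Question: {question}
Reference answer: {gold}
Model response: {response}
Is the model response correct? Answer "yes" or "no"."""

def verifier_reward(question: str, gold: str, response: str,
                    verifier_generate: Callable[[str], str]) -> float:
    judgment = verifier_generate(VERIFY_TEMPLATE.format(
        question=question, gold=gold, response=response))
    return 1.0 if judgment.strip().lower().startswith("yes") else 0.0
```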
Trainable Dynamic Mask Sparse Attention
Paper: https://www.arxiv.org/pdf/2508.02124
Github: https://github.com/SmallDoges/flash-dmattn
Dynamic Mask Attention (DMA) enables Large Language Models to process significantly longer contexts by dynamically selecting relevant tokens for attention computation, achieving up to 15.5x speedup over standard attention while maintaining or improving performance on long-context benchmarks.
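The core idea can be emulated densely in a few lines of PyTorch: each query attends only to its top-k most relevant keys, so the softmax and value aggregation run over a small, dynamically chosen subset of positions. This toy version (causal masking omitted) is only an illustration, not the fused kernel shipped in flash-dmattn.

```python
# Toy dense emulation of dynamic-mask sparse attention: each query keeps only
# its `keep` highest-scoring keys and masks out the rest before the softmax.
import torch
import torch.nn.functional as F

def dynamic_mask_attention(q, k, v, keep: int):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5    # (B, H, L, L)
    topk = scores.topk(keep, dim=-1).indices                 # indices of the most relevant keys
    mask = torch.full_like(scores, float("-inf")).scatter_(-1, topk, 0.0)
    attn = F.softmax(scores + mask, dim=-1)                  # attention restricted to the dynamic mask
    return attn @ v

q = k = v = torch.randn(1, 4, 128, 64)
out = dynamic_mask_attention(q, k, v, keep=16)   # each query uses only 16 of 128 keys
```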
TURA: Tool-Augmented Unified Retrieval Agent for AI Search
Paper: https://www.arxiv.org/pdf/2508.04604
TURA introduces a tool-augmented unified retrieval agent that bridges the gap between static content retrieval and dynamic information access in AI search. The framework handles real-time and transactional queries by integrating RAG with tool-augmented agents, yielding an 8.9% increase in session success rate and a 44.2% reduction in latency for complex queries in a production deployment at Baidu Inc.
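A conceptual sketch of the routing idea: classify the query's intent, send static informational queries to RAG retrieval, and dispatch real-time or transactional ones to tool-calling agents. The intent labels, retriever, and tools below are hypothetical stand-ins for the production components.

```python
# Conceptual router: static queries go to RAG retrieval, dynamic intents are
# dispatched to tool-augmented agents. Labels and tools are illustrative only.
from typing import Callable, Dict

def answer(query: str,
           classify_intent: Callable[[str], str],
           retrieve: Callable[[str], str],
           tools: Dict[str, Callable[[str], str]]) -> str:
    intent = classify_intent(query)          # e.g. "static", "realtime", "transactional"
    if intent == "static":
        context = retrieve(query)            # classic RAG over indexed documents
        return f"[RAG] {context}"
    # Dynamic intents are handled by a tool-calling agent; "fallback" is assumed present.
    tool = tools.get(intent, tools["fallback"])
    return f"[TOOL:{intent}] {tool(query)}"
```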