On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
R-Zero: Self-Evolving Reasoning LLM from Zero Data
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Learning to Reason for Factuality
Self-Questioning Language Models
Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision