2025-04
Text
- [2025.04] OTC: Optimal Tool Calls via Reinforcement Learning
- [2025.04] CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
- [2025.04] VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
- [2025.03] DAPO: An Open-Source LLM Reinforcement Learning System at Scale
- Dialogue Natural Language Inference
- Generative Recommendation: Towards Next-generation Recommender Paradigm
- Observational Scaling Laws and the Predictability of Language Model Performance
- [todo] LLM Research Papers: The 2024 List
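- A Systematic Treatment: The Road to High-Performance Prompts via Structured Prompts
- https://www.blog.chai-research.com/post/chai-gpt-rlhf-part-i-reward-modelling Chai's RLHF write-up (part I, reward modelling)
- https://github.com/volcengine/verl reinforcement learning framework
- [Paper walkthrough] MTP: Letting an LLM Predict Multiple Tokens at Once. One embedding feeds multiple output heads.
- [Paper walkthrough] SPCT: DeepSeek's Method for Training a "Generalist" Reward Model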
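The MTP note above ("one embedding feeds multiple output heads") can be pictured with a toy sketch. It assumes the parallel-heads variant the note describes: a single shared hidden state from the trunk is passed to k independent linear heads, and head i predicts the token i + 1 positions ahead. Other variants (for example DeepSeek-V3's sequential MTP modules) are wired differently. All sizes, weights, and function names here are hypothetical, and the weights are random rather than trained.

```python
import random


def linear(vec, weight):
    """Project a hidden vector (length dim) with a (vocab x dim) matrix into logits."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def argmax(xs):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def mtp_predict(hidden, heads):
    """Toy multi-token prediction: one shared hidden state, k output heads.

    heads[i] predicts the token (i + 1) positions after the current one,
    so a single forward pass of the trunk yields k greedy token ids.
    """
    return [argmax(linear(hidden, w)) for w in heads]


def make_head(vocab, dim, rng):
    """Hypothetical randomly initialised output head (vocab x dim)."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, vocab, k = 8, 16, 3  # hypothetical toy sizes
    hidden = [rng.uniform(-1, 1) for _ in range(dim)]  # stand-in for the trunk's output embedding
    heads = [make_head(vocab, dim, rng) for _ in range(k)]
    print(mtp_predict(hidden, heads))  # k token ids, one per future offset
```

During training, each head would get its own cross-entropy loss against the token at its offset; at inference, the extra heads can serve as drafts for speculative decoding or simply be dropped.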
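Speech
- Overview of Large Speech Models (continuously updated, 2025.03). An excellent summary; need to read up on HiFi-GAN and RVQ.
- [2025.02] Recent Advances in Speech Language Models: A Survey. Not read yet; bookmarked.
- Baichuan-Audio: an end-to-end large audio model with real-time bilingual dialogue and speech generation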
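Since RVQ (residual vector quantization, the quantizer used in neural audio codecs such as SoundStream and EnCodec) is flagged above as something to learn, here is a minimal sketch of the idea. Each stage picks the nearest codeword to whatever residual the previous stages left behind, and decoding sums the chosen codewords. The codebooks below are hand-picked toy values, not learned ones, and the function names are hypothetical.

```python
def nearest(codebook, vec):
    """Index of the codeword with the smallest squared Euclidean distance to vec."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vec, codebooks):
    """Residual VQ: each stage quantizes the error left by the earlier stages."""
    residual = list(vec)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruction is the sum of the chosen codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


if __name__ == "__main__":
    # Toy 2-D codebooks: a coarse first stage, then a finer second stage.
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],
        [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0]],
    ]
    codes = rvq_encode([1.2, 1.0], codebooks)
    print(codes)                         # [1, 1]
    print(rvq_decode(codes, codebooks))  # [1.25, 1.0]
```

With only the first stage the reconstruction would be [1.0, 1.0]; the second stage moves it closer to the input. Adding stages (and dropping later ones at decode time) is what lets RVQ-based codecs trade bitrate for quality.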
Images
- MiniMax-AI/One-RL-to-See-Them-All
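- ViT (Vision Transformer) Explained
- https://paperswithcode.com/sota/image-classification-on-imagenet image classification benchmark
Applications
- Agents Are About to Be Absorbed into Large Models. Takeaway: teams that build both the base model and the RL are better placed to build good agents.
- GenAI Web Data: 2024 Annual Report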
- https://pippit.capcut.com/
- https://ai-2027.com/
- Technical analysis of Google's Agent2Agent (A2A) protocol, including its relationship to MCP
- The Model Context Protocol for Large Models: MCP Explained