Agent#
Related Work#
- [2026.01] Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces (a benchmark for agents in command-line environments)
- [2026.02] SWE-Universe: Scale Real-World Verifiable Environments to Millions (large-scale, real-world, verifiable software development environments)
- [2026.02] GLM-5: from Vibe Coding to Agentic Engineering
- [2026.01] When Single-Agent with Skills Replace Multi-Agent Systems and When They Fail
- [2025.11] [Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory](https://arxiv.org/abs/2511.20857)
- [2025.09] The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
- [2025.09] Effective context engineering for AI agents (Anthropic)
- [2025.08] A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
- [2025.05] Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration (System 1 and System 2)
- [2025.03] Why Do Multi-Agent LLM Systems Fail?
- [2024.10] AutoGLM: Autonomous Foundation Agents for GUIs (three insights: intermediate interface design, self-evolving curriculum RL, and policy distribution drift)
- AutoGLM demo video
- [2024.03] "In-Depth Long Read" Andrew Ng: The 4 Most Common AI Agent Design Patterns (reflection, tool use, planning, multi-agent collaboration)
- [2024.02] Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models (somewhat similar to Deep Research)
- [2024.01] Agent AI: Surveying the Horizons of Multimodal Interaction
- [Agent AI: Surveying the Horizons of Multimodal Interaction (Fei-Fei Li's team)](https://zhuanlan.zhihu.com/p/12759357195)
- [2023.12] An LLM Compiler for Parallel Function Calling
- [2023.03] Reflexion: Language Agents with Verbal Reinforcement Learning
- [2022.10] ReAct: Synergizing Reasoning and Acting in Language Models (ReAct, from Google: query, think, action, result)
Personalization#
Memory#
Open-Source Projects#
- mem0ai/mem0 (a memory layer for agents)
- Perplexica (an AI search engine)
- bytedance/deer-flow