VLA

Open-Source Work

  • [2025.06] AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
  • [2025.06] Unified Vision-Language-Action Model
  • [2025.05] Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
  • [2025.03] CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
    • https://cot-vla.github.io/
  • [2024.06] OpenVLA: An Open-Source Vision-Language-Action Model (see the inference sketch after this list)
    • https://github.com/openvla/openvla
  • [2024.03] Octo: An Open-Source Generalist Robot Policy
    • https://octo-models.github.io/
  • [2023.07] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
  • [2023.03] PaLM-E: An Embodied Multimodal Language Model
  • [2023.03] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
  • [2022.12] RT-1: Robotics Transformer for Real-World Control at Scale
  • [2022.08] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
    • https://say-can.github.io/
  • [2022.05] Gato: A Generalist Agent
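
Of the works above, OpenVLA is the easiest entry point, since it ships a ready-to-use Hugging Face checkpoint. A minimal inference sketch, adapted from the quickstart in the linked repo; the `openvla/openvla-7b` model id and the `predict_action()` interface come from its published README, so verify against the current repo before relying on them:

```python
# Minimal OpenVLA inference sketch (adapted from the openvla/openvla
# repo's quickstart; check the repo for the current interface).
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "openvla/openvla-7b"  # published checkpoint on Hugging Face

# The checkpoint uses custom modeling code, hence trust_remote_code=True.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

# A real deployment would grab the current camera frame here.
image = Image.open("frame.png")  # placeholder observation
instruction = "pick up the remote"
prompt = f"In: What action should the robot take to {instruction}?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
# predict_action() decodes the discretized action tokens and un-normalizes
# them with the statistics of the named training dataset.
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)  # 7-DoF end-effector action (x, y, z, roll, pitch, yaw, gripper)
```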

Datasets

  • [2023.10] Open X-Embodiment: Robotic Learning Datasets and RT-X Models (see the loading sketch below)
    • https://robotics-transformer-x.github.io/
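
Open X-Embodiment is distributed as RLDS/TFDS shards on Google Cloud Storage. A minimal loading sketch, following the project's published data-loading example; the GCS path, the `fractal20220817_data` sub-dataset name, and the `observation`/`action` field names are assumptions taken from that example, so check the project page for the current catalog:

```python
# Sketch of iterating one Open X-Embodiment sub-dataset in RLDS format.
# Path, dataset name, and field names are assumptions from the project's
# published example; see robotics-transformer-x.github.io for the catalog.
import tensorflow_datasets as tfds

# Each sub-dataset lives under gs://gresearch/robotics/<name>/<version>.
builder = tfds.builder_from_directory(
    builder_dir="gs://gresearch/robotics/fractal20220817_data/0.1.0"
)
ds = builder.as_dataset(split="train[:10]")  # first 10 episodes

for episode in ds:
    # RLDS stores each episode as a nested dataset of time steps.
    for step in episode["steps"]:
        obs = step["observation"]["image"]  # camera frame (assumed key)
        act = step["action"]                # robot action for this step
```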