
Multimodal Large Models#

Learning Resources#

Open-Source Projects#

  • LLaVA
    • [2023.10] Improved Baselines with Visual Instruction Tuning
    • [2023.04] Visual Instruction Tuning
  • minimind-v: a minimal VLM implementation
  • [2025.01] MiniMax-01: Scaling Foundation Models with Lightning Attention
    • MiniMax-AI/MiniMax-01
  • [2025.04] Kimi-VL Technical Report
    • MoonshotAI/Kimi-VL

Qwen#

  • Qwen-VL
    • [2023.08] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Deepseek#

  • DeepSeek-VL

Core Modules#

Encoder-Decoder#

  • [2021.02] Learning Transferable Visual Models From Natural Language Supervision
    • openai-clip
    • clip-vit-base-patch16
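The CLIP paper above trains a dual image/text encoder with a symmetric contrastive (InfoNCE) objective: matched image-text pairs on the diagonal of the similarity matrix are positives, everything else is a negative. A minimal NumPy sketch of that objective, assuming random toy embeddings in place of real ViT/text-encoder outputs (the function name, temperature value, and dimensions are illustrative, not from the paper's code):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss in the style of CLIP: row i of img_emb
    and row i of txt_emb form the positive pair."""
    # L2-normalize so the dot product equals cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(logits))             # positives on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()

    # average the image->text and text->image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

# toy usage: 4 random image/text embedding pairs of dimension 512
rng = np.random.default_rng(0)
loss = clip_contrastive_loss(rng.standard_normal((4, 512)),
                             rng.standard_normal((4, 512)))
print(float(loss))
```

For perfectly aligned embeddings (image and text vectors identical) the diagonal dominates and the loss approaches zero, which is the behavior the contrastive objective is meant to induce.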