Nearly 500 Stars in 14 Hours! A Must-Read Series for Quickly Leveling Up on LLMs and AI

The MLNLP community is a well-known machine learning and natural language processing community at home and abroad, with an audience of NLP master's and PhD students, university faculty, and industry researchers.
Its vision is to promote exchange and progress between academia, industry, and the wider community of enthusiasts in natural language processing and machine learning, with a particular focus on helping beginners grow.
Source | 深度學習自然語言處理 (Deep Learning NLP)

Project repository: https://github.com/InterviewReady/ai-engineering-resources
Tokenization (sketch below)
  • Byte-pair Encoding (https://arxiv.org/pdf/1508.07909)
  • Byte Latent Transformer: Patches Scale Better Than Tokens (https://arxiv.org/pdf/2412.09871)
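
The papers above define byte-pair encoding formally; as a quick intuition, here is a toy, illustrative BPE merge loop in Python. It is not taken from the repo or the papers, and the tiny vocabulary is made up; real tokenizers operate on bytes over large corpora.

    # Toy byte-pair encoding: repeatedly merge the most frequent adjacent symbol pair.
    from collections import Counter

    def get_pair_counts(vocab):
        # vocab maps a word (tuple of symbols) to its corpus frequency
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        return pairs

    def merge_pair(vocab, pair):
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = freq
        return merged

    vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("n", "e", "w", "e", "s", "t"): 6}
    for _ in range(3):                      # learn 3 merges
        best = get_pair_counts(vocab).most_common(1)[0][0]
        vocab = merge_pair(vocab, best)
        print("merged", best)
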
Vectorization (sketch below)
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/pdf/1810.04805)
  • IMAGEBIND: One Embedding Space To Bind Them All (https://arxiv.org/pdf/2305.05665)
  • SONAR: Sentence-Level Multimodal and Language-Agnostic Representations (https://arxiv.org/pdf/2308.11466)
  • FAISS library (https://arxiv.org/pdf/2401.08281)
  • Facebook Large Concept Models (https://arxiv.org/pdf/2412.08821v2)
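
As a rough intuition for how these embedding models get used, below is a minimal dense-retrieval sketch with random stand-in vectors and plain NumPy cosine similarity. In a real system the vectors would come from an encoder such as BERT or SONAR and be indexed with FAISS.

    # Toy dense retrieval: cosine similarity between L2-normalized embedding vectors.
    import numpy as np

    rng = np.random.default_rng(0)
    doc_vecs = rng.normal(size=(1000, 384)).astype("float32")   # pretend document embeddings
    query_vec = rng.normal(size=(384,)).astype("float32")       # pretend query embedding

    doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    query_vec /= np.linalg.norm(query_vec)

    scores = doc_vecs @ query_vec            # cosine similarity after normalization
    top_k = np.argsort(-scores)[:5]          # indices of the 5 nearest documents
    print(top_k, scores[top_k])
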
Infrastructure (example below)
  • TensorFlow (https://arxiv.org/pdf/1605.08695)
  • DeepSeek filesystem (https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md)
  • Milvus DB (https://www.cs.purdue.edu/homes/csjgwang/pubs/SIGMOD21_Milvus.pdf)
  • Billion-Scale Similarity Search: FAISS (https://arxiv.org/pdf/1702.08734)
  • Ray (https://arxiv.org/abs/1712.05889)
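
For the FAISS entries, this is the standard flat-index usage pattern in the style of the FAISS tutorial, with toy sizes and random data. It assumes a local faiss-cpu install; nothing here comes from the linked papers.

    # Minimal FAISS usage: exact (flat) L2 index over random vectors.
    import numpy as np
    import faiss

    d = 64
    xb = np.random.random((10_000, d)).astype("float32")   # database vectors
    xq = np.random.random((5, d)).astype("float32")        # query vectors

    index = faiss.IndexFlatL2(d)   # exact search; IVF/HNSW indexes trade accuracy for speed
    index.add(xb)
    distances, ids = index.search(xq, 4)   # 4 nearest neighbours per query
    print(ids)
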
Core Architecture (sketch below)
  • Attention Is All You Need (https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf)
  • FlashAttention (https://arxiv.org/pdf/2205.14135)
  • Multi-Query Attention (https://arxiv.org/pdf/1911.02150)
  • Grouped-Query Attention (https://arxiv.org/pdf/2305.13245)
  • Google Titans outperform Transformers (https://arxiv.org/pdf/2501.00663)
  • VideoRoPE: Rotary Position Embedding (https://arxiv.org/pdf/2502.05173)
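
All of the variants above build on ordinary scaled dot-product attention. A minimal single-head NumPy sketch with toy shapes; real implementations add masking, batching, multiple heads, and fused kernels such as FlashAttention.

    # Single-head scaled dot-product attention, the core op from "Attention Is All You Need".
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)     # (seq_q, seq_k) similarity logits
        return softmax(scores) @ V          # weighted sum of value vectors

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(8, 16)) for _ in range(3))
    print(attention(Q, K, V).shape)         # (8, 16)
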
Mixture of Experts (sketch below)
  • Sparsely-Gated Mixture-of-Experts Layer (https://arxiv.org/pdf/1701.06538)
  • GShard (https://arxiv.org/abs/2006.16668)
  • Switch Transformers (https://arxiv.org/abs/2101.03961)
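
A toy illustration of the sparse-gating idea these papers develop: route each token to its top-k experts and mix their outputs by the renormalized gate weights. All matrices here are random stand-ins, not anything from the papers.

    # Toy top-2 gated mixture-of-experts layer for a single token.
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    rng = np.random.default_rng(0)
    n_experts, d, k = 4, 8, 2
    W_gate = rng.normal(size=(d, n_experts))
    experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]   # each expert is a linear map

    def moe(x):                                   # x: (d,) one token
        gates = softmax(x @ W_gate)               # routing probabilities over experts
        top = np.argsort(-gates)[:k]              # indices of the top-k experts
        weights = gates[top] / gates[top].sum()   # renormalize over the selected experts
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

    print(moe(rng.normal(size=d)).shape)          # (8,)
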
RLHF (Reinforcement Learning from Human Feedback; sketch below)
  • Deep Reinforcement Learning from Human Preferences (https://arxiv.org/pdf/1706.03741)
  • Fine-Tuning Language Models with RLHF (https://arxiv.org/pdf/1909.08593)
  • Training language models with RLHF (https://arxiv.org/pdf/2203.02155)
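
One small but central piece of the RLHF pipeline is the reward model trained on human preference pairs. A minimal sketch of the usual pairwise loss with made-up scores; this is an illustration, not code from the papers.

    # Pairwise preference loss: push the reward of the preferred response above the rejected one.
    import numpy as np

    def preference_loss(r_chosen, r_rejected):
        # -log sigmoid(r_chosen - r_rejected), averaged over the batch
        diff = r_chosen - r_rejected
        return np.mean(np.log1p(np.exp(-diff)))

    r_chosen = np.array([1.2, 0.3, 2.0])     # reward-model scores for preferred responses
    r_rejected = np.array([0.5, 0.4, -1.0])  # scores for rejected responses
    print(preference_loss(r_chosen, r_rejected))
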
Chain of Thought (prompt example below)
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (https://arxiv.org/pdf/2201.11903)
  • Chain of thought (https://arxiv.org/pdf/2411.14405v1/)
  • Demystifying Long Chain-of-Thought Reasoning in LLMs (https://arxiv.org/pdf/2502.03373)
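
Chain-of-thought prompting needs no new machinery; it is a prompt format. A small illustrative example in the spirit of the first paper above (wording paraphrased, not copied from it).

    # Show one worked example with its intermediate steps, then ask the new question.
    example = (
        "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
        "How many tennis balls does he have now?\n"
        "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.\n\n"
    )
    question = "Q: A cafeteria had 23 apples. It used 20 and bought 6 more. How many are left?\nA:"
    cot_prompt = example + question   # send this string to the model instead of the bare question
    print(cot_prompt)
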
Reasoning (sketch below)
  • Transformer Reasoning Capabilities (https://arxiv.org/pdf/2405.18512)
  • Large Language Monkeys: Scaling Inference Compute with Repeated Sampling (https://arxiv.org/pdf/2407.21787)
  • Scaling test-time compute can beat scaling model parameters (https://arxiv.org/pdf/2408.03314)
  • Training Large Language Models to Reason in a Continuous Latent Space (https://arxiv.org/pdf/2412.06769)
  • DeepSeek R1 (https://arxiv.org/pdf/2501.12948v1)
  • A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods (https://arxiv.org/pdf/2502.01618)
  • Latent Reasoning: A Recurrent Depth Approach (https://arxiv.org/pdf/2502.05171)
  • Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo (https://arxiv.org/pdf/2504.13139)
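
Several of these papers scale inference-time compute by sampling many candidates and keeping the best one. A schematic best-of-N loop; generate and score are hypothetical placeholders for an LLM call and a verifier or reward model, not real APIs.

    # Repeated sampling ("best of N") with placeholder generator and scorer.
    import random

    def generate(prompt, temperature=0.8):
        # placeholder for an LLM call; returns a fake candidate answer
        return f"candidate-{random.randint(0, 9)}"

    def score(prompt, answer):
        # placeholder for a verifier / reward model
        return random.random()

    def best_of_n(prompt, n=8):
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda a: score(prompt, a))

    print(best_of_n("Prove that the sum of two even numbers is even."))
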
Optimizations (sketch below)
  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (https://arxiv.org/pdf/2402.17764)
  • FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (https://arxiv.org/pdf/2407.08608)
  • ByteDance 1.58-bit (https://arxiv.org/pdf/2412.18653v1)
  • Transformer Square (https://arxiv.org/pdf/2501.06252)
  • Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps (https://arxiv.org/pdf/2501.09732)
  • 1B outperforms 405B (https://arxiv.org/pdf/2502.06703)
  • Speculative Decoding (https://arxiv.org/pdf/2211.17192)
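
A sketch of the 1.58-bit idea as I read the BitNet b1.58 recipe (not code from the paper): scale the weights by their mean absolute value, then round and clip each one to the ternary set {-1, 0, +1}.

    # Ternary ("1.58-bit") weight quantization with absmean scaling; illustrative only.
    import numpy as np

    def quantize_ternary(W, eps=1e-8):
        gamma = np.abs(W).mean() + eps            # absmean scale factor
        Wq = np.clip(np.round(W / gamma), -1, 1)  # ternary weights
        return Wq, gamma                          # at inference: W is approximated by gamma * Wq

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 4))
    Wq, gamma = quantize_ternary(W)
    print(Wq)
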
Distillation (sketch below)
  • Distilling the Knowledge in a Neural Network (https://arxiv.org/pdf/1503.02531)
  • BYOL – Distilled Architecture (https://arxiv.org/pdf/2006.07733)
  • DINO (https://arxiv.org/pdf/2104.14294)
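
The classic distillation objective from Hinton et al. trains the student to match the teacher's temperature-softened output distribution, usually mixed with the normal hard-label loss. A minimal NumPy version with toy logits.

    # Soft-target distillation loss (single example, toy logits).
    import numpy as np

    def softmax(x, T=1.0):
        z = x / T
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def distill_loss(student_logits, teacher_logits, T=2.0):
        p_teacher = softmax(teacher_logits, T)
        p_student = softmax(student_logits, T)
        # cross-entropy against soft teacher targets; the paper suggests scaling by T^2
        # so the term stays comparable when mixed with the hard-label loss
        return -T * T * np.sum(p_teacher * np.log(p_student + 1e-12))

    teacher = np.array([4.0, 1.0, 0.2])
    student = np.array([2.5, 0.8, 0.4])
    print(distill_loss(student, teacher))
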
SSMs (State Space Models; sketch below)
  • RWKV: Reinventing RNNs for the Transformer Era (https://arxiv.org/pdf/2305.13048)
  • Mamba (https://arxiv.org/pdf/2312.00752)
  • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (https://arxiv.org/pdf/2405.21060)
  • Distilling Transformers to SSMs (https://arxiv.org/pdf/2408.10189)
  • LoLCATs: On Low-Rank Linearizing of Large Language Models (https://arxiv.org/pdf/2410.10254)
  • Think Slow, Fast (https://arxiv.org/pdf/2502.20339)
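
The common core of these models is a linear state-space recurrence. A toy sequential scan with fixed random matrices; Mamba additionally makes the matrices input-dependent and replaces the Python loop with parallel scan kernels.

    # Linear SSM recurrence: h_t = A h_{t-1} + B x_t,  y_t = C h_t  (toy shapes).
    import numpy as np

    rng = np.random.default_rng(0)
    d_state, d_in, T = 4, 2, 6
    A = 0.9 * np.eye(d_state)                 # toy stable state transition
    B = rng.normal(size=(d_state, d_in))
    C = rng.normal(size=(1, d_state))

    x = rng.normal(size=(T, d_in))
    h = np.zeros(d_state)
    ys = []
    for t in range(T):                        # sequential scan
        h = A @ h + B @ x[t]
        ys.append(C @ h)
    print(np.stack(ys).shape)                 # (T, 1)
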
Competition Models
  • Google Math Olympiad 2 (https://arxiv.org/pdf/2502.03544)
  • Competitive Programming with Large Reasoning Models (https://arxiv.org/pdf/2502.06807)
  • Google Math Olympiad 1 (https://www.nature.com/articles/s41586-023-06747-5)
Hype Makers
  • Can AI be made to think critically (https://arxiv.org/pdf/2501.04682)
  • Evolving Deeper LLM Thinking (https://arxiv.org/pdf/2501.09891)
  • LLMs Can Easily Learn to Reason from Demonstrations Structure (https://arxiv.org/pdf/2502.07374)
Hype Breakers
  • Separating communication from intelligence (https://arxiv.org/pdf/2301.06627)
  • Language is not intelligence (https://gwern.net/doc/psychology/linguistics/2024-fedorenko.pdf)
Image Transformers (sketch below)
  • An Image is Worth 16x16 Words (https://arxiv.org/pdf/2010.11929)
  • CLIP (https://arxiv.org/pdf/2103.00020)
  • DeepSeek image generation (https://arxiv.org/pdf/2501.17811)
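
The ViT paper's title is literal: the image is cut into fixed-size patches, and each flattened patch becomes one token for a standard Transformer. A small patchify sketch with made-up sizes.

    # Split an image into 16x16 patches and flatten each one into a token vector.
    import numpy as np

    def patchify(img, patch=16):
        H, W, C = img.shape
        assert H % patch == 0 and W % patch == 0
        img = img.reshape(H // patch, patch, W // patch, patch, C)
        img = img.transpose(0, 2, 1, 3, 4)                 # (nH, nW, patch, patch, C)
        return img.reshape(-1, patch * patch * C)          # one flattened token per patch

    img = np.random.rand(224, 224, 3)
    tokens = patchify(img)
    print(tokens.shape)   # (196, 768): 14*14 patches, each 16*16*3 values
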
Video Transformers
  • ViViT: A Video Vision Transformer (https://arxiv.org/pdf/2103.15691)
  • Joint Embedding abstractions with self-supervised video masks (https://arxiv.org/pdf/2404.08471)
  • Facebook VideoJAM AI video generation (https://arxiv.org/pdf/2502.02492)
Case Studies
  • Automated Unit Test Improvement using Large Language Models at Meta (https://arxiv.org/pdf/2402.09171)
  • Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering (https://arxiv.org/pdf/2404.17723v1)
  • OpenAI o1 System Card (https://arxiv.org/pdf/2412.16720)
  • LLM-powered bug catchers (https://arxiv.org/pdf/2501.12862)
  • Chain-of-Retrieval Augmented Generation (https://arxiv.org/pdf/2501.14342)
  • Swiggy Search (https://bytes.swiggy.com/improving-search-relevance-in-hyperlocal-food-delivery-using-small-language-models-ecda2acc24e6)
  • Swarm by OpenAI (https://github.com/openai/swarm)
  • Netflix Foundation Models (https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39)
  • Model Context Protocol (https://www.anthropic.com/news/model-context-protocol)
  • Uber QueryGPT (https://www.uber.com/en-IN/blog/query-gpt/)

