Nearly 500 Stars in 14 Hours! A Must-Read Series for Quickly Leveling Up on LLMs and AI

The MLNLP community is a well-known machine learning and natural language processing community at home and abroad, with an audience of NLP master's and PhD students, university faculty, and industry researchers.
Its vision is to promote exchange and progress between academia, industry, and the wider community of enthusiasts in natural language processing and machine learning, with a particular focus on helping beginners grow.
Source | 深度學習自然語言處理 (Deep Learning NLP)

Project repository: https://github.com/InterviewReady/ai-engineering-resources
Tokenization (sketch below)
  • Byte-pair Encoding (https://arxiv.org/pdf/1508.07909)
  • Byte Latent Transformer: Patches Scale Better Than Tokens (https://arxiv.org/pdf/2412.09871)
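
The papers above define byte-pair encoding formally; as a quick intuition, here is a toy, illustrative BPE merge loop in Python. It is not taken from the repo or the papers, and the tiny vocabulary is made up; real tokenizers operate on bytes over large corpora.

    # Toy byte-pair encoding: repeatedly merge the most frequent adjacent symbol pair.
    from collections import Counter

    def get_pair_counts(vocab):
        # vocab maps a word (tuple of symbols) to its corpus frequency
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        return pairs

    def merge_pair(vocab, pair):
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = freq
        return merged

    vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("n", "e", "w", "e", "s", "t"): 6}
    for _ in range(3):                      # learn 3 merges
        best = get_pair_counts(vocab).most_common(1)[0][0]
        vocab = merge_pair(vocab, best)
        print("merged", best)
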
Vectorization (sketch below)
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/pdf/1810.04805)
  • IMAGEBIND: One Embedding Space To Bind Them All (https://arxiv.org/pdf/2305.05665)
  • SONAR: Sentence-Level Multimodal and Language-Agnostic Representations (https://arxiv.org/pdf/2308.11466)
  • FAISS library (https://arxiv.org/pdf/2401.08281)
  • Facebook Large Concept Models (https://arxiv.org/pdf/2412.08821v2)
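
As a rough intuition for how these embedding models get used, below is a minimal dense-retrieval sketch with random stand-in vectors and plain NumPy cosine similarity. In a real system the vectors would come from an encoder such as BERT or SONAR and be indexed with FAISS.

    # Toy dense retrieval: cosine similarity between L2-normalized embedding vectors.
    import numpy as np

    rng = np.random.default_rng(0)
    doc_vecs = rng.normal(size=(1000, 384)).astype("float32")   # pretend document embeddings
    query_vec = rng.normal(size=(384,)).astype("float32")       # pretend query embedding

    doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    query_vec /= np.linalg.norm(query_vec)

    scores = doc_vecs @ query_vec            # cosine similarity after normalization
    top_k = np.argsort(-scores)[:5]          # indices of the 5 nearest documents
    print(top_k, scores[top_k])
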
Infrastructure (example below)
  • TensorFlow (https://arxiv.org/pdf/1605.08695)
  • DeepSeek filesystem (https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md)
  • Milvus DB (https://www.cs.purdue.edu/homes/csjgwang/pubs/SIGMOD21_Milvus.pdf)
  • Billion-Scale Similarity Search: FAISS (https://arxiv.org/pdf/1702.08734)
  • Ray (https://arxiv.org/abs/1712.05889)
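
For the FAISS entries, this is the standard flat-index usage pattern in the style of the FAISS tutorial, with toy sizes and random data. It assumes a local faiss-cpu install; nothing here comes from the linked papers.

    # Minimal FAISS usage: exact (flat) L2 index over random vectors.
    import numpy as np
    import faiss

    d = 64
    xb = np.random.random((10_000, d)).astype("float32")   # database vectors
    xq = np.random.random((5, d)).astype("float32")        # query vectors

    index = faiss.IndexFlatL2(d)   # exact search; IVF/HNSW indexes trade accuracy for speed
    index.add(xb)
    distances, ids = index.search(xq, 4)   # 4 nearest neighbours per query
    print(ids)
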
Core Architecture (sketch below)
  • Attention Is All You Need (https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf)
  • FlashAttention (https://arxiv.org/pdf/2205.14135)
  • Multi-Query Attention (https://arxiv.org/pdf/1911.02150)
  • Grouped-Query Attention (https://arxiv.org/pdf/2305.13245)
  • Google Titans outperform Transformers (https://arxiv.org/pdf/2501.00663)
  • VideoRoPE: Rotary Position Embedding (https://arxiv.org/pdf/2502.05173)
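
All of the variants above build on ordinary scaled dot-product attention. A minimal single-head NumPy sketch with toy shapes; real implementations add masking, batching, multiple heads, and fused kernels such as FlashAttention.

    # Single-head scaled dot-product attention, the core op from "Attention Is All You Need".
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)     # (seq_q, seq_k) similarity logits
        return softmax(scores) @ V          # weighted sum of value vectors

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(8, 16)) for _ in range(3))
    print(attention(Q, K, V).shape)         # (8, 16)
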
Mixture of Experts (sketch below)
  • Sparsely-Gated Mixture-of-Experts Layer (https://arxiv.org/pdf/1701.06538)
  • GShard (https://arxiv.org/abs/2006.16668)
  • Switch Transformers (https://arxiv.org/abs/2101.03961)
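
A toy illustration of the sparse-gating idea these papers develop: route each token to its top-k experts and mix their outputs by the renormalized gate weights. All matrices here are random stand-ins, not anything from the papers.

    # Toy top-2 gated mixture-of-experts layer for a single token.
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    rng = np.random.default_rng(0)
    n_experts, d, k = 4, 8, 2
    W_gate = rng.normal(size=(d, n_experts))
    experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]   # each expert is a linear map

    def moe(x):                                   # x: (d,) one token
        gates = softmax(x @ W_gate)               # routing probabilities over experts
        top = np.argsort(-gates)[:k]              # indices of the top-k experts
        weights = gates[top] / gates[top].sum()   # renormalize over the selected experts
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

    print(moe(rng.normal(size=d)).shape)          # (8,)
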
RLHF (Reinforcement Learning from Human Feedback; sketch below)
  • Deep Reinforcement Learning from Human Preferences (https://arxiv.org/pdf/1706.03741)
  • Fine-Tuning Language Models with RLHF (https://arxiv.org/pdf/1909.08593)
  • Training language models with RLHF (https://arxiv.org/pdf/2203.02155)
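
One small but central piece of the RLHF pipeline is the reward model trained on human preference pairs. A minimal sketch of the usual pairwise loss with made-up scores; this is an illustration, not code from the papers.

    # Pairwise preference loss: push the reward of the preferred response above the rejected one.
    import numpy as np

    def preference_loss(r_chosen, r_rejected):
        # -log sigmoid(r_chosen - r_rejected), averaged over the batch
        diff = r_chosen - r_rejected
        return np.mean(np.log1p(np.exp(-diff)))

    r_chosen = np.array([1.2, 0.3, 2.0])     # reward-model scores for preferred responses
    r_rejected = np.array([0.5, 0.4, -1.0])  # scores for rejected responses
    print(preference_loss(r_chosen, r_rejected))
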
Chain of Thought (prompt example below)
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (https://arxiv.org/pdf/2201.11903)
  • Chain of thought (https://arxiv.org/pdf/2411.14405v1/)
  • Demystifying Long Chain-of-Thought Reasoning in LLMs (https://arxiv.org/pdf/2502.03373)
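
Chain-of-thought prompting needs no new machinery; it is a prompt format. A small illustrative example in the spirit of the first paper above (wording paraphrased, not copied from it).

    # Show one worked example with its intermediate steps, then ask the new question.
    example = (
        "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
        "How many tennis balls does he have now?\n"
        "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.\n\n"
    )
    question = "Q: A cafeteria had 23 apples. It used 20 and bought 6 more. How many are left?\nA:"
    cot_prompt = example + question   # send this string to the model instead of the bare question
    print(cot_prompt)
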
Reasoning (sketch below)
  • Transformer Reasoning Capabilities (https://arxiv.org/pdf/2405.18512)
  • Large Language Monkeys: Scaling Inference Compute with Repeated Sampling (https://arxiv.org/pdf/2407.21787)
  • Scaling test-time compute can beat scaling model parameters (https://arxiv.org/pdf/2408.03314)
  • Training Large Language Models to Reason in a Continuous Latent Space (https://arxiv.org/pdf/2412.06769)
  • DeepSeek R1 (https://arxiv.org/pdf/2501.12948v1)
  • A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods (https://arxiv.org/pdf/2502.01618)
  • Latent Reasoning: A Recurrent Depth Approach (https://arxiv.org/pdf/2502.05171)
  • Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo (https://arxiv.org/pdf/2504.13139)
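
Several of these papers scale inference-time compute by sampling many candidates and keeping the best one. A schematic best-of-N loop; generate and score are hypothetical placeholders for an LLM call and a verifier or reward model, not real APIs.

    # Repeated sampling ("best of N") with placeholder generator and scorer.
    import random

    def generate(prompt, temperature=0.8):
        # placeholder for an LLM call; returns a fake candidate answer
        return f"candidate-{random.randint(0, 9)}"

    def score(prompt, answer):
        # placeholder for a verifier / reward model
        return random.random()

    def best_of_n(prompt, n=8):
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda a: score(prompt, a))

    print(best_of_n("Prove that the sum of two even numbers is even."))
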
Optimizations (sketch below)
  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (https://arxiv.org/pdf/2402.17764)
  • FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (https://arxiv.org/pdf/2407.08608)
  • ByteDance 1.58-bit (https://arxiv.org/pdf/2412.18653v1)
  • Transformer Square (https://arxiv.org/pdf/2501.06252)
  • Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps (https://arxiv.org/pdf/2501.09732)
  • 1B outperforms 405B (https://arxiv.org/pdf/2502.06703)
  • Speculative Decoding (https://arxiv.org/pdf/2211.17192)
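
A sketch of the 1.58-bit idea as I read the BitNet b1.58 recipe (not code from the paper): scale the weights by their mean absolute value, then round and clip each one to the ternary set {-1, 0, +1}.

    # Ternary ("1.58-bit") weight quantization with absmean scaling; illustrative only.
    import numpy as np

    def quantize_ternary(W, eps=1e-8):
        gamma = np.abs(W).mean() + eps            # absmean scale factor
        Wq = np.clip(np.round(W / gamma), -1, 1)  # ternary weights
        return Wq, gamma                          # at inference: W is approximated by gamma * Wq

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 4))
    Wq, gamma = quantize_ternary(W)
    print(Wq)
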
Distillation (sketch below)
  • Distilling the Knowledge in a Neural Network (https://arxiv.org/pdf/1503.02531)
  • BYOL – Distilled Architecture (https://arxiv.org/pdf/2006.07733)
  • DINO (https://arxiv.org/pdf/2104.14294)
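
The classic distillation objective from Hinton et al. trains the student to match the teacher's temperature-softened output distribution, usually mixed with the normal hard-label loss. A minimal NumPy version with toy logits.

    # Soft-target distillation loss (single example, toy logits).
    import numpy as np

    def softmax(x, T=1.0):
        z = x / T
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def distill_loss(student_logits, teacher_logits, T=2.0):
        p_teacher = softmax(teacher_logits, T)
        p_student = softmax(student_logits, T)
        # cross-entropy against soft teacher targets; the paper suggests scaling by T^2
        # so the term stays comparable when mixed with the hard-label loss
        return -T * T * np.sum(p_teacher * np.log(p_student + 1e-12))

    teacher = np.array([4.0, 1.0, 0.2])
    student = np.array([2.5, 0.8, 0.4])
    print(distill_loss(student, teacher))
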
SSMs (State Space Models; sketch below)
  • RWKV: Reinventing RNNs for the Transformer Era (https://arxiv.org/pdf/2305.13048)
  • Mamba (https://arxiv.org/pdf/2312.00752)
  • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (https://arxiv.org/pdf/2405.21060)
  • Distilling Transformers to SSMs (https://arxiv.org/pdf/2408.10189)
  • LoLCATs: On Low-Rank Linearizing of Large Language Models (https://arxiv.org/pdf/2410.10254)
  • Think Slow, Fast (https://arxiv.org/pdf/2502.20339)
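
The common core of these models is a linear state-space recurrence. A toy sequential scan with fixed random matrices; Mamba additionally makes the matrices input-dependent and replaces the Python loop with parallel scan kernels.

    # Linear SSM recurrence: h_t = A h_{t-1} + B x_t,  y_t = C h_t  (toy shapes).
    import numpy as np

    rng = np.random.default_rng(0)
    d_state, d_in, T = 4, 2, 6
    A = 0.9 * np.eye(d_state)                 # toy stable state transition
    B = rng.normal(size=(d_state, d_in))
    C = rng.normal(size=(1, d_state))

    x = rng.normal(size=(T, d_in))
    h = np.zeros(d_state)
    ys = []
    for t in range(T):                        # sequential scan
        h = A @ h + B @ x[t]
        ys.append(C @ h)
    print(np.stack(ys).shape)                 # (T, 1)
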
Competition Models
  • Google Math Olympiad 2 (https://arxiv.org/pdf/2502.03544)
  • Competitive Programming with Large Reasoning Models (https://arxiv.org/pdf/2502.06807)
  • Google Math Olympiad 1 (https://www.nature.com/articles/s41586-023-06747-5)
Hype Makers
  • Can AI be made to think critically (https://arxiv.org/pdf/2501.04682)
  • Evolving Deeper LLM Thinking (https://arxiv.org/pdf/2501.09891)
  • LLMs Can Easily Learn to Reason from Demonstrations Structure (https://arxiv.org/pdf/2502.07374)
Hype Breakers
  • Separating communication from intelligence (https://arxiv.org/pdf/2301.06627)
  • Language is not intelligence (https://gwern.net/doc/psychology/linguistics/2024-fedorenko.pdf)
Image Transformers (sketch below)
  • An Image is Worth 16x16 Words (https://arxiv.org/pdf/2010.11929)
  • CLIP (https://arxiv.org/pdf/2103.00020)
  • DeepSeek image generation (https://arxiv.org/pdf/2501.17811)
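
The ViT paper's title is literal: the image is cut into fixed-size patches, and each flattened patch becomes one token for a standard Transformer. A small patchify sketch with made-up sizes.

    # Split an image into 16x16 patches and flatten each one into a token vector.
    import numpy as np

    def patchify(img, patch=16):
        H, W, C = img.shape
        assert H % patch == 0 and W % patch == 0
        img = img.reshape(H // patch, patch, W // patch, patch, C)
        img = img.transpose(0, 2, 1, 3, 4)                 # (nH, nW, patch, patch, C)
        return img.reshape(-1, patch * patch * C)          # one flattened token per patch

    img = np.random.rand(224, 224, 3)
    tokens = patchify(img)
    print(tokens.shape)   # (196, 768): 14*14 patches, each 16*16*3 values
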
Video Transformers
  • ViViT: A Video Vision Transformer (https://arxiv.org/pdf/2103.15691)
  • Joint Embedding abstractions with self-supervised video masks (https://arxiv.org/pdf/2404.08471)
  • Facebook VideoJAM AI video generation (https://arxiv.org/pdf/2502.02492)
Case Studies
  • Automated Unit Test Improvement using Large Language Models at Meta (https://arxiv.org/pdf/2402.09171)
  • Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering (https://arxiv.org/pdf/2404.17723v1)
  • OpenAI o1 System Card (https://arxiv.org/pdf/2412.16720)
  • LLM-powered bug catchers (https://arxiv.org/pdf/2501.12862)
  • Chain-of-Retrieval Augmented Generation (https://arxiv.org/pdf/2501.14342)
  • Swiggy Search (https://bytes.swiggy.com/improving-search-relevance-in-hyperlocal-food-delivery-using-small-language-models-ecda2acc24e6)
  • Swarm by OpenAI (https://github.com/openai/swarm)
  • Netflix Foundation Models (https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39)
  • Model Context Protocol (https://www.anthropic.com/news/model-context-protocol)
  • Uber QueryGPT (https://www.uber.com/en-IN/blog/query-gpt/)

