
Source | 深度學習自然語言處理 (Deep Learning NLP)

Project address: https://github.com/InterviewReady/ai-engineering-resources

Tokenization
- Byte-pair Encoding: https://arxiv.org/pdf/1508.07909 (see the merge sketch after this list)
- Byte Latent Transformer: Patches Scale Better Than Tokens: https://arxiv.org/pdf/2412.09871
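
Byte-pair encoding, the first entry above, grows a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair in the training corpus. A minimal Python sketch of that merge loop, using a toy character-split corpus in the spirit of the paper's running example (real tokenizers work at byte level over far larger corpora):

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every standalone occurrence of the pair into a single new symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}

# Toy corpus: words split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}

for step in range(5):  # 5 merges, chosen arbitrarily for the demo
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
    print(f"merge {step + 1}: {best}")
```

Real implementations also record the learned merge list so the same merges can be replayed on new text at tokenization time.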

Vectorization
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: https://arxiv.org/pdf/1810.04805
- IMAGEBIND: One Embedding Space To Bind Them All: https://arxiv.org/pdf/2305.05665
- SONAR: Sentence-Level Multimodal and Language-Agnostic Representations: https://arxiv.org/pdf/2308.11466
- FAISS library: https://arxiv.org/pdf/2401.08281 (see the similarity-search sketch after this list)
- Facebook Large Concept Models: https://arxiv.org/pdf/2412.08821v2
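
Most of the vectorization entries above produce dense embeddings that then need fast nearest-neighbour lookup, which is what the FAISS library provides. A minimal sketch of exact L2 search, assuming faiss-cpu and NumPy are installed and using random vectors in place of real embeddings:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 128                                                    # embedding dimension
rng = np.random.default_rng(0)
xb = rng.standard_normal((10_000, d)).astype("float32")    # database vectors
xq = rng.standard_normal((5, d)).astype("float32")         # query vectors

index = faiss.IndexFlatL2(d)          # exact (brute-force) L2 index
index.add(xb)                         # add database vectors
distances, ids = index.search(xq, 4)  # 4 nearest neighbours per query
print(ids)
```

IndexFlatL2 is brute force; FAISS's approximate indexes (IVF, HNSW, PQ) trade a little recall for much lower latency at large scale.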

Infrastructure
- TensorFlow: https://arxiv.org/pdf/1605.08695
- Deepseek filesystem: https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md
- Milvus DB: https://www.cs.purdue.edu/homes/csjgwang/pubs/SIGMOD21_Milvus.pdf
- Billion Scale Similarity Search: FAISS: https://arxiv.org/pdf/1702.08734
- Ray: https://arxiv.org/abs/1712.05889 (see the task sketch after this list)
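
Ray, the last entry above, turns ordinary Python functions into distributed tasks. A minimal sketch of its remote-task API, assuming `pip install ray`; run locally it starts an in-process cluster:

```python
import ray

ray.init()  # starts a local Ray runtime if no cluster address is configured

@ray.remote
def square(x):
    # Executes in a Ray worker process rather than in the driver.
    return x * x

futures = [square.remote(i) for i in range(8)]  # schedule 8 tasks
print(ray.get(futures))                          # gather results: [0, 1, 4, ...]
```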

Core Architecture
- Attention is All You Need: https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf (see the attention sketch after this list)
- FlashAttention: https://arxiv.org/pdf/2205.14135
- Multi Query Attention: https://arxiv.org/pdf/1911.02150
- Grouped Query Attention: https://arxiv.org/pdf/2305.13245
- Google Titans outperform Transformers: https://arxiv.org/pdf/2501.00663
- VideoRoPE: Rotary Position Embedding: https://arxiv.org/pdf/2502.05173
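
The architecture papers above all build on the same scaled dot-product attention primitive, softmax(QK^T / sqrt(d)) V; FlashAttention and multi/grouped-query attention change how and where that computation happens, not its result. A single-head NumPy sketch with illustrative random inputs:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v) -> (n_q, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # (n_q, n_k) similarity logits
    weights = softmax(scores, axis=-1)  # each query attends over all keys
    return weights @ V                  # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```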

Mixture of Experts
- Sparsely-Gated Mixture-of-Experts Layer: https://arxiv.org/pdf/1701.06538 (see the routing sketch after this list)
- GShard: https://arxiv.org/abs/2006.16668
- Switch Transformers: https://arxiv.org/abs/2101.03961
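
The MoE papers above replace a dense feed-forward block with many experts plus a learned router that sends each token to only the top-k of them. A toy NumPy sketch of top-k routing (the expert count, k, and the linear "experts" are arbitrary choices for the demo; Switch Transformers routes to a single expert, k=1):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 4, 2

W_gate = rng.standard_normal((d, n_experts))                        # router weights
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]   # toy linear experts

def moe_layer(x):
    """x: (d,) token representation -> (d,) mixture of top-k expert outputs."""
    logits = x @ W_gate                                      # (n_experts,) routing scores
    top = np.argsort(logits)[-k:]                            # indices of the k best experts
    gate = np.exp(logits[top]) / np.exp(logits[top]).sum()   # softmax over the selected experts
    return sum(g * (x @ experts[i]) for g, i in zip(gate, top))

print(moe_layer(rng.standard_normal(d)).shape)  # (16,)
```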

RLHF
- Deep Reinforcement Learning from Human Preferences: https://arxiv.org/pdf/1706.03741
- Fine-Tuning Language Models with RLHF: https://arxiv.org/pdf/1909.08593
- Training language models with RLHF: https://arxiv.org/pdf/2203.02155 (see the preference-loss sketch after this list)
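
A core step in the RLHF papers above is fitting a reward model on human preference pairs with a pairwise loss of the form -log sigmoid(r_chosen - r_rejected), before any RL fine-tuning. A tiny NumPy sketch with made-up reward scores standing in for a real reward model's outputs:

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Negative log-likelihood that the preferred response scores higher."""
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# Hypothetical reward-model scores for (preferred, rejected) response pairs.
pairs = [(1.8, 0.3), (0.2, 0.9), (2.5, 2.4)]
losses = [preference_loss(c, r) for c, r in pairs]
print([round(l, 3) for l in losses])  # loss is small when chosen >> rejected
```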

Chain of Thought
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models: https://arxiv.org/pdf/2201.11903 (see the prompt sketch after this list)
- Chain of thought: https://arxiv.org/pdf/2411.14405v1/
- Demystifying Long Chain-of-Thought Reasoning in LLMs: https://arxiv.org/pdf/2502.03373
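
Chain-of-thought prompting, per the first paper above, simply places worked reasoning steps inside the few-shot examples so the model continues in the same style. A sketch of assembling such a prompt; the exemplar is the well-known tennis-ball problem and `call_llm` is a hypothetical placeholder, not a real API:

```python
# Few-shot exemplars that demonstrate step-by-step reasoning before the final answer.
EXEMPLARS = [
    {
        "question": "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?",
        "reasoning": "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
        "answer": "11",
    },
]

def build_cot_prompt(question: str) -> str:
    """Prepend worked examples so the model imitates the same reasoning style."""
    parts = []
    for ex in EXEMPLARS:
        parts.append(f"Q: {ex['question']}\nA: {ex['reasoning']} The answer is {ex['answer']}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_cot_prompt("A cafeteria had 23 apples. They used 20 and bought 6 more. How many apples do they have?")
print(prompt)
# call_llm(prompt)  # hypothetical model call; any chat/completions API would go here
```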

Reasoning
- Transformer Reasoning Capabilities: https://arxiv.org/pdf/2405.18512
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling: https://arxiv.org/pdf/2407.21787 (see the sampling sketch after this list)
- Scaling test-time compute can be better than scaling model parameters: https://arxiv.org/pdf/2408.03314
- Training Large Language Models to Reason in a Continuous Latent Space: https://arxiv.org/pdf/2412.06769
- DeepSeek R1: https://arxiv.org/pdf/2501.12948v1
- A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods: https://arxiv.org/pdf/2502.01618
- Latent Reasoning: A Recurrent Depth Approach: https://arxiv.org/pdf/2502.05171
- Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo: https://arxiv.org/pdf/2504.13139
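
"Large Language Monkeys" above spends extra inference compute by sampling many candidate solutions per problem and then selecting one, for instance by majority vote over final answers or with a verifier. A sketch of the sample-then-vote loop, where `sample_answer` is a hypothetical stand-in for a single stochastic model call:

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for one temperature > 0 model call returning a final answer."""
    return random.choice(["42", "42", "41"])  # simulated noisy answers

def answer_by_majority_vote(question: str, n_samples: int = 16) -> str:
    """Draw several independent samples and return the most common final answer."""
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(answer_by_majority_vote("What is 6 * 7?"))
```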

Optimizations
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits: https://arxiv.org/pdf/2402.17764 (see the quantization sketch after this list)
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision: https://arxiv.org/pdf/2407.08608
- ByteDance 1.58: https://arxiv.org/pdf/2412.18653v1
- Transformer Square: https://arxiv.org/pdf/2501.06252
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps: https://arxiv.org/pdf/2501.09732
- 1b outperforms 405b: https://arxiv.org/pdf/2502.06703
- Speculative Decoding: https://arxiv.org/pdf/2211.17192
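
The 1.58-bit entries above constrain weights to the ternary set {-1, 0, +1} plus a per-tensor scale, which is where log2(3) ≈ 1.58 bits comes from. A NumPy sketch of an absmean round-and-clip step in the spirit of BitNet b1.58 (a simplification: the real method quantizes during training rather than as one-off post-processing):

```python
import numpy as np

def ternary_quantize(W, eps=1e-8):
    """Quantize a weight matrix to {-1, 0, +1} with an absmean scale."""
    scale = np.abs(W).mean() + eps             # per-tensor scale
    W_q = np.clip(np.round(W / scale), -1, 1)  # round, then clip to the ternary set
    return W_q, scale

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.1
W_q, scale = ternary_quantize(W)
print(W_q)          # entries in {-1, 0, 1}
print(W_q * scale)  # dequantized approximation of W
```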

Distillation
- Distilling the Knowledge in a Neural Network: https://arxiv.org/pdf/1503.02531 (see the soft-target loss sketch after this list)
- BYOL – Distilled Architecture: https://arxiv.org/pdf/2006.07733
- DINO: https://arxiv.org/pdf/2104.14294
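
Hinton-style distillation, the first entry above, trains a small student on the teacher's temperature-softened output distribution. A NumPy sketch of the soft-target cross-entropy term, with random logits as placeholders; a full setup adds the usual hard-label loss and scales this term by T^2:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T                    # temperature softens the distribution
    z = z - z.max()              # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the teacher's and student's softened distributions."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -(p_teacher * np.log(p_student + 1e-12)).sum()

rng = np.random.default_rng(0)
teacher = rng.standard_normal(10)  # placeholder teacher logits over 10 classes
student = rng.standard_normal(10)  # placeholder student logits
print(distillation_loss(student, teacher, T=2.0))
```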

SSMs
- RWKV: Reinventing RNNs for the Transformer Era: https://arxiv.org/pdf/2305.13048
- Mamba: https://arxiv.org/pdf/2312.00752 (see the recurrence sketch after this list)
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality: https://arxiv.org/pdf/2405.21060
- Distilling Transformers to SSMs: https://arxiv.org/pdf/2408.10189
- LoLCATs: On Low-Rank Linearizing of Large Language Models: https://arxiv.org/pdf/2410.10254
- Think Slow, Fast: https://arxiv.org/pdf/2502.20339
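
At inference time the state-space models above reduce to a linear recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t, which is why they scale linearly with sequence length. A minimal NumPy sketch with time-invariant A, B, C for simplicity (Mamba makes these input-dependent, i.e. selective, and uses a parallel scan instead of this Python loop):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """x: (T, d_in) inputs -> (T, d_out) outputs via h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:            # sequential scan; real SSMs use a parallel scan or convolution
        h = A @ h + B @ x_t  # update the hidden state
        ys.append(C @ h)     # read out
    return np.stack(ys)

rng = np.random.default_rng(0)
T, d_in, d_state, d_out = 6, 3, 8, 2
A = 0.9 * np.eye(d_state)                       # stable toy dynamics
B = rng.standard_normal((d_state, d_in)) * 0.1
C = rng.standard_normal((d_out, d_state))
print(ssm_scan(rng.standard_normal((T, d_in)), A, B, C).shape)  # (6, 2)
```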

Competition Models
- Google Math Olympiad 2: https://arxiv.org/pdf/2502.03544
- Competitive Programming with Large Reasoning Models: https://arxiv.org/pdf/2502.06807
- Google Math Olympiad 1: https://www.nature.com/articles/s41586-023-06747-5

Hype Makers
- Can AI be made to think critically: https://arxiv.org/pdf/2501.04682
- Evolving Deeper LLM Thinking: https://arxiv.org/pdf/2501.09891
- LLMs Can Easily Learn to Reason from Demonstrations Structure: https://arxiv.org/pdf/2502.07374

Hype Breakers
- Separating communication from intelligence: https://arxiv.org/pdf/2301.06627
- Language is not intelligence: https://gwern.net/doc/psychology/linguistics/2024-fedorenko.pdf

Image Transformers
- An Image is Worth 16×16 Words: https://arxiv.org/pdf/2010.11929 (see the patchify sketch after this list)
- CLIP: https://arxiv.org/pdf/2103.00020
- DeepSeek image generation: https://arxiv.org/pdf/2501.17811
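
The ViT paper above turns an image into a token sequence by cutting it into fixed 16×16 patches and flattening each one before a learned linear projection. A NumPy sketch of just the patchify step, with an arbitrary 224×224 random image; the projection and position embeddings are omitted:

```python
import numpy as np

def patchify(image, patch=16):
    """(H, W, C) image -> (num_patches, patch*patch*C) flattened patch sequence."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0, "image must divide evenly into patches"
    x = image.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)           # group the two patch-grid axes first
    return x.reshape(-1, patch * patch * C)  # one row per patch token

img = np.random.default_rng(0).standard_normal((224, 224, 3))
tokens = patchify(img)  # a linear layer would map these to d_model next
print(tokens.shape)     # (196, 768): 14*14 patches, each 16*16*3 values
```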

Video Transformers
- ViViT: A Video Vision Transformer: https://arxiv.org/pdf/2103.15691
- Joint Embedding abstractions with self-supervised video masks: https://arxiv.org/pdf/2404.08471
- Facebook VideoJAM video generation: https://arxiv.org/pdf/2502.02492

Case Studies
- Automated Unit Test Improvement using Large Language Models at Meta: https://arxiv.org/pdf/2402.09171
- Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering: https://arxiv.org/pdf/2404.17723v1
- OpenAI o1 System Card: https://arxiv.org/pdf/2412.16720
- LLM-powered bug catchers: https://arxiv.org/pdf/2501.12862
- Chain-of-Retrieval Augmented Generation: https://arxiv.org/pdf/2501.14342
- Swiggy Search: https://bytes.swiggy.com/improving-search-relevance-in-hyperlocal-food-delivery-using-small-language-models-ecda2acc24e6
- Swarm by OpenAI: https://github.com/openai/swarm
- Netflix Foundation Models: https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39
- Model Context Protocol: https://www.anthropic.com/news/model-context-protocol
- Uber QueryGPT: https://www.uber.com/en-IN/blog/query-gpt/
