Literature Review

BlogPost (19)
How to Train LLM? - From Data Parallel To Fully Sharded Data Parallel
How to Inference Big LLM? - Using Accelerate Library
PySpark - How to Preprocess Large-Scale Data with Python
Democratizing Large Language Models: From 175B to 7B
Llama3 - Tokenizer (LLAMA3)
Reflections on Optimizer and LM Parameter Values
DeepSpeed - Sharding Optimizer, Gradients, Parameters, and Reducing Activations for Efficient Training
KL Divergence
Grouped Query Attention (Llama3)
Python Study & Review
Model Context Protocol (MCP) - provided by Anthropic
Flash Attention
Mixed Precision & QLoRA & Gradient Checkpointing
NCCL Operation
LLAMA1 Simple Takeaways (Auto-Regressive)
Docker
Basic Web Service Structure & FastAPI
Structure of Network Devices
Java (On Going…)

PaperReview (60)
SELECTION-INFERENCE: EXPLOITING LARGE LANGUAGE MODELS FOR INTERPRETABLE LOGICAL REASONING (LLM-7B)
Do Prompt-Based Models Really Understand the Meaning of Their Prompts? (T5)
GUESS THE INSTRUCTION! FLIPPED LEARNING MAKES LANGUAGE MODELS STRONGER ZERO-SHOT LEARNERS (T5)
Measuring Association Between Labels and Free-Text Rationales (T5-base)
Mutual Information Alleviates Hallucinations in Abstractive Summarization (BART)
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Survey of Hallucination in Natural Language Generation
Mitigating Label Biases for In-context Learning (GPT3-J, GPT3-175B)
Zero-shot Approach to Overcome Perturbation Sensitivity of Prompts (BERT-BASE, BERT-LARGE)
The False Promise of Imitating Proprietary LLMs (LLM-7B)
LIMA: Less Is More for Alignment (LLM-65B)
Rethinking the Role of Demonstrations (GPT3-J, OPT)
Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations (GPT3-175B, Text-davinci-002)
Explanation-based Finetuning Makes Models More Robust to Spurious Cues (GPT3-175B, T5, BART, OPT)
Instruction Mining: High-Quality Instruction Data Selection for Large Language Models (LLAMA)
What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning (GPT3, OPT, LLAMA)
Self-Alignment with Instruction Backtranslation (LLAMA)
ORCA: Interpreting Prompted Language Models via Locating Supporting Data Evidence in the Ocean of Pretraining Data
Faithful Low-Resource Data-to-Text Generation through Cycle Training (Seq2Seq)
REPLUG: Retrieval-Augmented Black-Box Language Models (LLM-7B)
LongLoRA: EFFICIENT FINE-TUNING OF LONG-CONTEXT LARGE LANGUAGE MODELS (LLAMA2)
Are Emergent Abilities of Large Language Models a Mirage? (GPT3)
We’re Afraid Language Models Aren’t Modeling Ambiguity (GPT3, InstructGPT, GPT4)
Mistral 7B & Mixtral (Mixtral of Experts) (Transformer)
Can We Edit Factual Knowledge by In-Context Learning? (GPT3-J, OPT)
Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning (LLAMA)
SILO LANGUAGE MODELS: ISOLATING LEGAL RISK IN A NONPARAMETRIC DATASTORE
LLM Augmented LLMs: Expanding Capabilities through Composition (PaLM)
Rethinking Interpretability in the Era of Large Language Models
Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning (Prompting for Reasoning)
SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures
Chain-Of-Thought
OVERTHINKING THE TRUTH: UNDERSTANDING HOW LANGUAGE MODELS PROCESS FALSE DEMONSTRATIONS (GPT3-J)
Beyond Memorization: Violating Privacy Via Inference With LLMs (GPT4)
Retrieval-Augmented Data Augmentation For Low-Resource Domain Tasks (LLM-7B, T5)
MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning (GPT2-XL, T5)
Preference-free Alignment Learning with Regularized Relevance Reward (LLAMA2, Mistral)
Emergent Abilities of Large Language Models (GPT3-175B, LaMDA)
Crafting In-context Examples according to LMs’ Parametric Knowledge
Understanding Emergent Abilities of Language Models from the Loss Perspective (LLAMA)
Rethinking Data Selection for Supervised Fine-Tuning (LLAMA)
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? (PaLM)
Physics of Language Models: Part 2.1 - Grade-School Math and the Hidden Reasoning Process (GPT2)
Reasoning in Flux: Enhancing Large Language Models Reasoning through Uncertainty-aware Adaptive Guidance (Mistral)
SmolLM2: When Smol Goes Big - Data-Centric Training of a Small Language Model (LLAMA)
Fine-grained Hallucination Detection and Editing for Language Models (LLAMA2, CHAT-GPT)
Titans: Learning to Memorize at Test Time (Transformer, RNN)
LATENT ACTION PRETRAINING FROM VIDEOS (VLM)
ENCRYPTION-FRIENDLY LLM ARCHITECTURE (BERT-BASE)
LANGBRIDGE: Multilingual Reasoning Without Multilingual Supervision (LLAMA2, MetaMath, ORCA)

PreviousPaperReview (23)
GloVe- Global Vectors for Word Representation
Sequence to Sequence Learning with Neural Networks
Attention Is All You Need (Transformer)
ADAM- A METHOD FOR STOCHASTIC OPTIMIZATION (2014 ICLR)
A Robustly Optimized BERT Pretraining Approach (RoBERTa) & A Lite BERT for Self-supervised Learning of Language Representations (ALBERT)
Improving Language Understanding by Generative Pre-Training (GPT)
Meet Your Favorite Character- Open-domain Chatbot Mimicking Fictional Characters with only a Few Utterances (2022 NAACL)
Contrastive Decoding: Open-ended Text Generation as Optimization
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020 NIPS)
Transformer-XL- Attentive Language Models Beyond a Fixed-Length Context (2019 ACL)
XLNet- Generalized Autoregressive Pretraining for Language Understanding (2019 NIPS)
Longformer- The Long-Document Transformer (2020)
Don't Stop Pretraining- Adapt Language Models to Domains and Tasks (2020 ACL)
BERTScore- Evaluating Text Generation with BERT (2020 ICLR)
BART- Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (2020 ACL)
Improving Neural Networks by Preventing Co-adaptation of Feature Detectors
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering (FiD)
Learning to Repair- Repairing Model Output Errors After Deployment Using a Dynamic Memory of Feedback (2022 NAACL)
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization (2020 ACL)
Focus Attention- Promoting Faithfulness and Diversity in Summarization (2021 ACL)
Long-Span Summarization via Local Attention and Content Selection (2020 ACL)
Demoting the Lead Bias in News Summarization via Alternating Adversarial Learning (2021 ACL)
MuSeM- Detecting Incongruent News Headlines Using Mutual Attentive Semantic Matching (2020 ICMLA)

CodingTestReview (1)
*****

No Category (1)
samples