Literature Review

BlogPost (19)
How to Train LLM? - From Data Parallel To Fully Sharded Data Parallel
How to Inference Big LLM? - Using Accelerate Library
PySpark - How to Preprocess Large-Scale Data with Python
Democratizing Large Language Models: From 175B to 7B
Llama3 - Tokenizer (LLAMA3)
Reflections on Optimizer and LM Parameter Values
DeepSpeed - Sharding Optimizer, Gradients, Parameters, and Reducing Activations for Efficient Training
KL Divergence
Grouped Query Attention (Llama3)
Python Study & Review
Model Context Protocol (MCP) - provided by Anthropic
Flash Attention
Mixed Precision & QLoRA & Gradient Checkpointing
NCCL Operation
LLAMA1 Simple Takeaways (Auto-Regressive)
Docker
Basic Web Service Structure & FastAPI
Structure of Network Devices
Java (On Going…)

PaperReview (60)
SELECTION-INFERENCE: EXPLOITING LARGE LANGUAGE MODELS FOR INTERPRETABLE LOGICAL REASONING (LLM-7B)
Do Prompt-Based Models Really Understand the Meaning of Their Prompts? (T5)
GUESS THE INSTRUCTION! FLIPPED LEARNING MAKES LANGUAGE MODELS STRONGER ZERO-SHOT LEARNERS (T5)
Measuring Association Between Labels and Free-Text Rationales (T5-base)
Mutual Information Alleviates Hallucinations in Abstractive Summarization (BART)
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Survey of Hallucination in Natural Language Generation
Mitigating Label Biases for In-context Learning (GPT3-J, GPT3-175B)
Zero-shot Approach to Overcome Perturbation Sensitivity of Prompts (BERT-BASE, BERT-LARGE)
The False Promise of Imitating Proprietary LLMs (LLM-7B)
LIMA: Less Is More for Alignment (LLM-65B)
Rethinking the Role of Demonstrations (GPT3-J, OPT)
Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations (GPT3-175B, Text-davinci-002)
Explanation-based Finetuning Makes Models More Robust to Spurious Cues (GPT3-175B, T5, BART, OPT)
Instruction Mining: High-Quality Instruction Data Selection for Large Language Models (LLAMA)
What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning (GPT3, OPT, LLAMA)
Self-Alignment with Instruction Backtranslation (LLAMA)
ORCA: Interpreting Prompted Language Models via Locating Supporting Data Evidence in the Ocean of Pretraining Data
Faithful Low-Resource Data-to-Text Generation through Cycle Training (Seq2Seq)
REPLUG: Retrieval-Augmented Black-Box Language Models (LLM-7B)
LongLoRA: EFFICIENT FINE-TUNING OF LONG-CONTEXT LARGE LANGUAGE MODELS (LLAMA2)
Are Emergent Abilities of Large Language Models a Mirage? (GPT3)
We’re Afraid Language Models Aren’t Modeling Ambiguity (GPT3, InstructGPT, GPT4)
Mistral 7B & Mixtral (Mixtral of Experts) (Transformer)
Can We Edit Factual Knowledge by In-Context Learning? (GPT3-J, OPT)
Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning (LLAMA)
SILO LANGUAGE MODELS: ISOLATING LEGAL RISK IN A NONPARAMETRIC DATASTORE
LLM Augmented LLMs: Expanding Capabilities through Composition (PaLM)
Rethinking Interpretability in the Era of Large Language Models
Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning (Prompting for Reasoning)
SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures
Chain-Of-Thought
OVERTHINKING THE TRUTH: UNDERSTANDING HOW LANGUAGE MODELS PROCESS FALSE DEMONSTRATIONS (GPT3-J)
Beyond Memorization: Violating Privacy Via Inference With LLMs (GPT4)
Retrieval-Augmented Data Augmentation For Low-Resource Domain Tasks (LLM-7B, T5)
MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning (GPT2-XL, T5)
Preference-free Alignment Learning with Regularized Relevance Reward (LLAMA2, Mistral)
Emergent Abilities of Large Language Models (GPT3-175B, LaMDA)
Crafting In-context Examples according to LMs’ Parametric Knowledge
Understanding Emergent Abilities of Language Models from the Loss Perspective (LLAMA)
Rethinking Data Selection for Supervised Fine-Tuning (LLAMA)
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? (PaLM)
Physics of Language Models: Part 2.1 - Grade-School Math and the Hidden Reasoning Process (GPT2)
Reasoning in Flux: Enhancing Large Language Models Reasoning through Uncertainty-aware Adaptive Guidance (Mistral)
SmolLM2: When Smol Goes Big - Data-Centric Training of a Small Language Model (LLAMA)
Fine-grained Hallucination Detection and Editing for Language Models (LLAMA2, CHAT-GPT)
Titans: Learning to Memorize at Test Time (Transformer, RNN)
LATENT ACTION PRETRAINING FROM VIDEOS (VLM)
ENCRYPTION-FRIENDLY LLM ARCHITECTURE (BERT-BASE)
LANGBRIDGE: Multilingual Reasoning Without Multilingual Supervision (LLAMA2, MetaMath, ORCA)

PreviousPaperReview (23)
GloVe- Global Vectors for Word Representation
Sequence to Sequence Learning with Neural Networks
Attention Is All You Need (Transformer)
ADAM- A METHOD FOR STOCHASTIC OPTIMIZATION (2014 ICLR)
A Robustly Optimized BERT Pretraining Approach (RoBERTa) & A Lite BERT for Self-supervised Learning of Language Representations (ALBERT)
Improving Language Understanding by Generative Pre-Training (GPT)
Meet Your Favorite Character- Open-domain Chatbot Mimicking Fictional Characters with only a Few Utterances (2022 NAACL)
Contrastive Decoding: Open-ended Text Generation as Optimization
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020 NIPS)
Transformer-XL- Attentive Language Models Beyond a Fixed-Length Context (2019 ACL)
XLNet- Generalized Autoregressive Pretraining for Language Understanding (2019 NIPS)
Longformer- The Long-Document Transformer (2020)
Don't Stop Pretraining- Adapt Language Models to Domains and Tasks (2020 ACL)
BERTScore- Evaluating Text Generation with BERT (2020 ICLR)
BART- Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (2020 ACL)
Improving Neural Networks by Preventing Co-adaptation of Feature Detectors
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering (FiD)
Learning to Repair- Repairing Model Output Errors After Deployment Using a Dynamic Memory of Feedback (2022 NAACL)
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization (2020 ACL)
Focus Attention- Promoting Faithfulness and Diversity in Summarization (2021 ACL)
Long-Span Summarization via Local Attention and Content Selection (2020 ACL)
Demoting the Lead Bias in News Summarization via Alternating Adversarial Learning (2021 ACL)
MuSeM- Detecting Incongruent News Headlines Using Mutual Attentive Semantic Matching (2020 ICMLA)

CodingTestReview (1)
*****

No Category (1)
samples