Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning

1. Introduction

•

A paper on Unsupervised Domain Adaptation in the ICL setting

◦

When the end task has no labels, can demonstrations be used to

▪

learn the target domain distribution

▪

learn the task signal

◦

and thereby achieve cross-domain adaptation?

•

In real-world settings, the end task (= target domain) has no labels.

→ (x, y) pairs are therefore sourced from another domain (= source domain) and used for ICL

→ syntactic & semantic domain shift

→ undesired output

•

Most prior work in this line relies on metric learning or continual pre-training.

◦

Apart from metric learning, few studies have tried to make an LM understand both the source and the target domain within a single training stage.

→ A retriever links the end task (= target domain) with similar (x, y) data from another domain (= source domain), and the model is trained to

•

learn the target domain distribution → language modeling loss

•

learn the task signal → end-task loss

2. Problem Definition

•

Source domain: $D^{S} = \{(x_{i}^{S}, y_{i})\}_{i=1,\dots,n}$

•

Target domain: $D^{T} = \{x_{i}^{T}\}_{i=1,\dots,M}$

•

Unsupervised Domain Adaptation: learning knowledge from the labeled source domain so that it also generalizes to the target domain

→ The key is to also learn the domain distribution of the target.

•

Downstream Task

◦

NER (presumably chosen because even ChatGPT (GPT-3.5) makes unstable predictions on it)

◦

Sentiment Analysis

3. Method - Domain Adaptive In-Context Learning

•

Basic premises

◦

(x,y) pair : Demonstration

◦

x: Context

◦

Unlabeled target domain: used to learn the target distribution.

◦

Labeled source domain: used to learn the task.

◦

Since the end task is unlabeled, the demonstrations used for domain adaptation are likewise input-only.

•

Both an encoder-based and a decoder-based model are presented.

◦

In effect, the encoder-based variant is plain fine-tuning, while the decoder-based variant is the actual ICL setting.

Context Construction with Retrieval

•

Given a source-domain (x, y) pair used to learn the task, a dense retriever fetches similar unlabeled inputs x from the unlabeled target domain.

◦

Sentiment - SimCSE

◦

NER - BERTScore

•

Source pair text: $(x^{S}, y)$

•

Retrieved unlabeled target text: $\bar{x}^{T} = \{x_{1}^{T}, \dots, x_{k}^{T}\}$
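The retrieval step can be illustrated with plain cosine similarity. The minimal sketch below assumes precomputed embeddings: `top_k_by_sentence_embedding` ranks target texts by one vector per sentence (the role SimCSE plays for sentiment), while `bertscore_f1` follows the BERTScore recipe of greedy token-level matching, without idf weighting or baseline rescaling (the NER case). All function names and the toy vectors are hypothetical; a real setup would obtain the vectors from SimCSE or a BERT encoder.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_by_sentence_embedding(query: Vector, pool: Dict[str, Vector], k: int) -> List[str]:
    """Sentence-level dense retrieval: rank unlabeled target texts by the cosine
    similarity of their embedding to the source sentence embedding."""
    return sorted(pool, key=lambda text: cosine(query, pool[text]), reverse=True)[:k]


def bertscore_f1(cand: List[Vector], ref: List[Vector]) -> float:
    """BERTScore-style F1 without idf weighting: every token is greedily matched
    to its most similar token on the other side."""
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def top_k_by_token_matching(query: List[Vector], pool: Dict[str, List[Vector]], k: int) -> List[str]:
    """Token-level retrieval: rank target texts by BERTScore-style F1 against the query tokens."""
    return sorted(pool, key=lambda text: bertscore_f1(query, pool[text]), reverse=True)[:k]


# Toy 3-d "embeddings" standing in for real encoder outputs.
target_pool = {
    "lipstick lasts all day": [1.0, 0.1, 0.0],
    "mascara smudges easily": [0.9, 0.0, 0.2],
    "guitar strings snapped": [0.0, 1.0, 0.1],
}
x_bar_T = top_k_by_sentence_embedding([1.0, 0.0, 0.1], target_pool, k=2)
print(x_bar_T)
```

The sentence-level variant compares one vector per text and is cheap; the token-level variant keeps per-token alignment, which is plausibly why a token-matching score suits a token-level task such as NER.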

Domain-Adaptive In-Context Learning

•

LM input: $[x^{S}; \bar{x}^{T}]$

(1) In-context task learning: a supervised task to predict the task label y

(2) In-context language modeling: a token prediction task to predict tokens of the retrieved target text
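To make the two objectives concrete, the minimal sketch below builds the concatenated input (source sentence followed by the retrieved target texts) and tags every token with the segment it came from, so that source tokens feed (1) and target tokens feed (2). Whitespace tokenization and the `[SEP]` separator are assumptions for illustration, not details taken from the paper.

```python
from typing import List, Tuple

SEP = "[SEP]"  # assumed separator between segments


def build_domain_adaptive_input(x_source: str, x_bar_target: List[str]) -> Tuple[List[str], List[str]]:
    """Concatenate [x^S; x_1^T; ...; x_k^T] and return (tokens, segment tags).

    "source" tokens go to in-context task learning (predict y);
    "target" tokens go to in-context language modeling;
    "sep" tokens belong to neither objective.
    """
    tokens: List[str] = []
    segments: List[str] = []
    for word in x_source.split():  # whitespace split stands in for a subword tokenizer
        tokens.append(word)
        segments.append("source")
    for text in x_bar_target:
        tokens.append(SEP)
        segments.append("sep")
        for word in text.split():
            tokens.append(word)
            segments.append("target")
    return tokens, segments


tokens, segments = build_domain_adaptive_input("great read", ["nice lipstick", "bad scent"])
for tok, seg in zip(tokens, segments):
    print(f"{tok:10s} {seg}")
```

Keeping the segment tags alongside the tokens lets each loss select its own positions without re-tokenizing.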

Encoder-only LMs with ICL (Cross domain Fine-Tuning)

•

Apply MLM to the target-domain part of the input

•

Apply task-specific supervised learning to the source-domain part

◦

Sentiment Analysis

▪

average pooling over the token representations

◦

NER

▪

An additional CRF layer on top of the LM features, a common practice for token-level classification
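For the encoder-only variant, the two objectives above can be sketched as follows: MLM positions are drawn only from target-domain tokens, and the sentiment head consumes an average-pooled vector. This is a simplified, hypothetical sketch: every selected token is replaced by `[MASK]` (BERT's 80/10/10 replacement scheme is skipped), the 15% rate and the loss weight `lam` are assumptions, and the CRF used for NER is omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple

MASK = "[MASK]"  # placeholder mask token


def mask_target_tokens(tokens: List[str], segments: List[str], rate: float = 0.15,
                       seed: int = 0) -> Tuple[List[str], Dict[int, str]]:
    """Choose MLM positions among target-domain tokens only.

    Returns the masked sequence and {position: original token} for the MLM loss.
    Source tokens are never masked, since they carry the task signal.
    """
    rng = random.Random(seed)
    candidates = [i for i, seg in enumerate(segments) if seg == "target"]
    n_mask = max(1, round(rate * len(candidates))) if candidates else 0
    masked = list(tokens)
    answers: Dict[int, str] = {}
    for i in sorted(rng.sample(candidates, n_mask)):
        answers[i] = masked[i]
        masked[i] = MASK
    return masked, answers


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average token representations into one vector for the sentiment classifier."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def joint_loss(task_loss: float, mlm_loss: float, lam: float = 1.0) -> float:
    """Task loss on the source part plus a weighted MLM loss on the target part."""
    return task_loss + lam * mlm_loss


tokens = ["great", "read", "[SEP]", "nice", "lipstick", "[SEP]", "bad", "scent"]
segments = ["source", "source", "sep", "target", "target", "sep", "target", "target"]
masked, answers = mask_target_tokens(tokens, segments)
print(masked, answers)
```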

Decoder-only LMs with ICL

Few-Shot Inference

•

The standard ICL setting

•

Given a target-domain query, retrieve source-domain demonstrations (x, y).
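The inference-time setup can be sketched as below, covering the three demonstration settings compared in the results (no demo, random demo, retrieved demo). Word-overlap (Jaccard) similarity is a deliberately simple stand-in for the dense retriever, and the prompt template and instruction text are hypothetical.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # a labeled source-domain demonstration (x, y)


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for a dense retriever."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def select_demos(query: str, source_pool: List[Pair], k: int,
                 strategy: str = "retrieved", seed: int = 0) -> List[Pair]:
    """Pick source-domain demonstrations for a target-domain query."""
    if strategy == "none":        # zero-shot
        return []
    if strategy == "random":      # rand demo
        return random.Random(seed).sample(source_pool, k)
    if strategy == "retrieved":   # retr demo: most similar source inputs first
        return sorted(source_pool, key=lambda pair: jaccard(query, pair[0]), reverse=True)[:k]
    raise ValueError(f"unknown strategy: {strategy}")


def build_prompt(demos: List[Pair], query: str,
                 instruction: str = "Classify the sentiment as positive or negative.") -> str:
    """Format demonstrations followed by the unlabeled target query."""
    lines = [instruction, ""]
    for x, y in demos:
        lines += [f"Input: {x}", f"Label: {y}", ""]
    lines += [f"Input: {query}", "Label:"]
    return "\n".join(lines)


source_pool = [
    ("the battery lasts long", "positive"),
    ("the screen cracked", "negative"),
    ("great battery life", "positive"),
]
query = "battery lasts all day"
print(build_prompt(select_demos(query, source_pool, k=1), query))
```

Only the demonstration-selection strategy changes between the three settings; the prompt format stays fixed, so any performance difference can be attributed to which demonstrations are shown.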

Cross Domain Fine-Tuning

•

The fine-tuning approach follows the same premise as the encoder approach.

•

Fine-tuned with LoRA

•

input = $[\text{prompt}; x^{T}; x^{S}; y]$

•

Optimized with a causal language modeling objective

•

On the target-domain tokens, the loss learns the target domain distribution.

•

On the source-domain tokens, the loss learns the task.
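The loss layout above can be sketched as a label-construction step for a single training sequence. The sketch assumes the common convention, used for example by PyTorch's `CrossEntropyLoss` (default `ignore_index=-100`) and by Hugging Face causal-LM models (which shift labels internally), that positions labeled -100 contribute no loss. Whether the paper also excludes the prompt from the loss is an assumption here, exposed as the hypothetical flag `train_on_prompt`; the token ids are made-up integers.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # positions with this label are excluded from the loss


def build_clm_example(prompt_ids: List[int], target_ids: List[int], source_ids: List[int],
                      label_ids: List[int], train_on_prompt: bool = False
                      ) -> Tuple[List[int], List[int], List[str]]:
    """Lay out [prompt; x^T; x^S; y] for causal LM fine-tuning.

    Returns input ids, per-position labels, and which objective each position serves:
    x^T tokens -> target-domain language modeling, x^S and y tokens -> task learning.
    The one-step shift for next-token prediction is left to the model/loss code.
    """
    input_ids = prompt_ids + target_ids + source_ids + label_ids
    prompt_labels = list(prompt_ids) if train_on_prompt else [IGNORE_INDEX] * len(prompt_ids)
    labels = prompt_labels + target_ids + source_ids + label_ids
    roles = (["prompt"] * len(prompt_ids) + ["domain"] * len(target_ids)
             + ["task"] * (len(source_ids) + len(label_ids)))
    return input_ids, labels, roles


input_ids, labels, roles = build_clm_example([1, 2], [30, 31], [50, 51], [7])
print(input_ids, labels, roles)
```

A single next-token loss over this sequence thus covers both goals at once; only the position of each token decides whether it teaches the domain or the task.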

4. Results

•

Experimental Setting

◦

NER (4 domains - 1 large dataset)

▪

News, Social media, Financial, and Biomedical

◦

Sentiment (4 domains - 1 large dataset)

▪

Book (BK), Electronics (E), Beauty (BT), and Music (M).

•

Encoder

◦

XLM-RoBERTa-large (561M)

•

Decoder

◦

Llama1-7B (FT)

◦

ChatGPT and LLaMA-Alpaca (Inference)

•

Experimental Result 

•

NER (source: News → target: the domain of each column)

•

Sentiment (Source → Target)

•

Inference 

◦

No demo: zero-shot

◦

Rand demo: randomly sampled (x, y) pairs

◦

Retr demo: (x, y) pairs retrieved by similarity to the target-domain x

•

Fine-tuning

◦

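No-ICL: fine-tune on the source domain → run inference on the target domain

◦

ICL-rand: the target contexts paired with each source example are sampled randomly instead of retrieved

◦

ICL-sup: no MLM on the target-domain tokens

◦

ICL-source: contexts are retrieved from the source domain instead of the target (source → source)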

•

Even large-scale LLMs (LLaMA-Alpaca and ChatGPT) perform poorly at cross-domain ICL.

•

Training a smaller model on the task and transferring it to the target domain for inference works better (No-ICL).

•

In short, LLMs still lack domain generalization ability, and unsupervised domain adaptation calls for a method that learns both the domain and the task.

•

(Junwon) Although the paper does not say so explicitly, MLM does not bring a large improvement in domain generalization. → This also seems evident from prior work.

•

Although LLMs generalize well, training with the auxiliary loss is more effective with RoBERTa for NER and with LLaMA for sentiment.

•

On both tasks, ChatGPT performs better with rand demo than with retrieved demonstrations.

◦

This contradicts prior work arguing that demonstrations should be semantically close to the test input (Previous work reveals that choosing demonstration examples that are close to the test input significantly enhances the effectiveness of ICL (Liu et al., 2022; Rubin et al., 2022).)

◦

However, those studies assume the following two conditions:

▪

The demonstrations and the test input come from the same dataset (in-distribution).

▪

They share the same label space (the demonstrations' label space is identical to the test input's).

(The paper does not state this explicitly, but since it mentions ‘OOD input-label pairs from another labeled dataset’, retrieval appears to have been done over concatenated ‘input-label’ text.)

The paper argues the following:

‘We make a hypothesis regarding this observation, for cross-domain ICL, providing diverse and distinct OOD demonstrations is more beneficial for LLMs to understand the task and generalize.’

•

Ablation Study

•

Continual pre-training on the target domain → SFT on the source domain → test on the target domain

•

Compared with No-ICL (fine-tune on the source domain → inference on the target domain), there is little difference.

◦

On sentiment, it gains only marginal benefits.