Can we edit Factual Knowledge by In-Context Learning?

Abstract

•

This paper studies whether in-context learning (ICL) can edit the knowledge stored in a model without any parameter updates.

•

IKE, the ICL-based editing method, is less likely to over-edit similar but unrelated facts and less likely to cause catastrophic forgetting, so the authors report fewer side effects than existing methods that modify parameters.

•

The scalability of the proposed method is demonstrated on OPT-175B.

1. Introduction

→ The goal of knowledge editing is two-fold.

•

Generalization

◦

The edit must be reflected consistently across the different prompts that describe the same fact.

•

Specificity

◦

The edit must not interfere with facts unrelated to the one being edited.

→ Existing gradient-based methods such as ROME modify the model parameters tied to the desired behavior (e.g., the president changing after an election), which leads to over-editing and catastrophic forgetting.

→ ICL, by contrast, (1) reduces computation overhead and (2) reduces side effects, so what remains is to verify whether editing through demonstrations actually works.

→ Validity is checked by whether the following goals are met in the ICL setting.

•

Generalization

→ Does the edit generalize to the various surface forms of the single updated fact?

•

Specificity

→ Is the modification accurate while irrelevant facts are preserved?

→ The paper proposes a prompting method for ICL-based knowledge editing that achieves both goals.

2. Task Formulation

(x^{*},y^{*}) : \text{new fact} \\ \text{maximize } P_{M}(y^{*}|x^{*}) \\ x^{*} : \text{The president of the US is} \\ y^{*} : \text{Joe Biden}

##### Generalization

x\in D_{x^{*}} \rightarrow y^{*}

##### Specificity

x\notin D_{x^{*}} \rightarrow y^{o}
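The two targets can be illustrated with a toy lookup. This is only a sketch: it assumes the editing scope D_{x^{*}} is given as an explicit set of prompts and that the original answers y^{o} are available in a dictionary, neither of which holds for a real LM. The names `expected_answer`, `edit_scope`, and `original_answers` are hypothetical.

```python
def expected_answer(prompt: str, edit_scope: set, y_new: str, original_answers: dict) -> str:
    """Target behavior after an edit.

    In-scope prompts (x in D_{x*}) should yield the new answer y*;
    out-of-scope prompts should keep their original answer y^o.
    """
    if prompt in edit_scope:
        return y_new                      # Generalization
    return original_answers[prompt]       # Specificity


# Hypothetical prompts, for illustration only.
edit_scope = {"The president of the US is", "Who is the US president?"}
original_answers = {"The capital of France is": "Paris"}

print(expected_answer("Who is the US president?", edit_scope, "Joe Biden", original_answers))
print(expected_answer("The capital of France is", edit_scope, "Joe Biden", original_answers))
```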

3. Method

In-Context Learning

C=\{(x_{1},y_{1}),(x_{2},y_{2}),...,(x_{k},y_{k})\}

ICL: P_{M}(y|x,C)

In-Context Knowledge Editing

f=(x^{*},y^{*}) \\ \text{IKE}: P_{M}(y^{*}|x,f,C)

•

If x is in the editing scope, maximize this probability. (Generalization)

•

If x is outside the editing scope, minimize the distance between P_{M}(y^{*}|x,f,C) and P_{M}(y|x). (Specificity)

→ Demonstration construction is split into two sub-goals.

#### Demonstration Formatting

•

Copy: copy the prediction of the target prompt in new facts. : 1

x_{i} = x^{*} \ and \ y_{i} = y^{*}

•

Update: The prediction of prompts in the editing scope should also be updated for the generalization of knowledge editing. : 3

x_{i} \in D_{x^{*}} \ and \ y_{i} = y^{*}

•

Retain: LMs should keep their original prediction in out-of-scope prompts. : 4

x_{i} \notin D_{x^{*}} \ and \ y_{i} = y^{o}

•

Template

T(f, x, y) = \text{New Fact: } f . \text{ Prompt: } x \quad y
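A minimal sketch of how the template T(f, x, y) and the three demonstration types could be assembled into one prompt. The class `Demo`, the functions `render` and `build_ike_prompt`, the newline separator, and all example strings are assumptions made for illustration; the notes only fix the "New Fact: f. Prompt: x y" layout.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """One in-context demonstration: a fact, a probe prompt, and its answer."""
    fact: str    # f, written out as "x* y*"
    prompt: str  # x_i
    answer: str  # y_i


def render(fact: str, prompt: str, answer: str = "") -> str:
    """Template T(f, x, y); the answer is left empty for the test query."""
    return f"New Fact: {fact}. Prompt: {prompt} {answer}".rstrip()


def build_ike_prompt(demos: list, new_fact: str, test_prompt: str) -> str:
    """Concatenate the rendered demonstrations, then the open test query."""
    lines = [render(d.fact, d.prompt, d.answer) for d in demos]
    lines.append(render(new_fact, test_prompt))
    return "\n".join(lines)


# Hypothetical demonstrations covering the three types.
fact = "The Eiffel Tower is located in Rome"
demos = [
    Demo(fact, "The Eiffel Tower is located in", "Rome"),         # copy: x_i = x*
    Demo(fact, "Where is the Eiffel Tower? It is in", "Rome"),    # update: x_i in scope
    Demo(fact, "The Louvre is located in", "Paris"),              # retain: x_i out of scope
]
print(build_ike_prompt(demos, "The president of the US is Joe Biden",
                       "Who is the US president? The answer is"))
```

The model's continuation of the last line is then read as its answer to the test prompt.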

#### Demonstration Organizations

→ Which demonstrations make good in-context examples for IKE?

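→ A sentence encoder E encodes the concatenation of the new fact f's prompt x^{*}, original answer y^{o}, and targeted prediction y^{*}. Records in the training corpus are concatenated and encoded the same way, and the demonstrations are then arranged by cosine similarity as follows.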

\cos(c_0, f) < \cos(c_1, f) < \ldots < \cos(c_k, f)
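The retrieval and ordering step can be sketched as below. The bag-of-words `embed` function is a toy stand-in for the sentence encoder E (an assumption, chosen so the example runs without external models), and `select_demonstrations` is a hypothetical name. Only the ascending cosine ordering comes from the notes.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a placeholder for the sentence encoder E."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_demonstrations(query: str, corpus: list, k: int) -> list:
    """Return the k corpus records most similar to the query,
    sorted by ascending cosine similarity (least similar first)."""
    if k <= 0:
        return []
    q = embed(query)
    ranked = sorted(corpus, key=lambda c: cosine(embed(c), q))
    return ranked[-k:]


# Each string is the concatenation "x y^o y*" of one record (hypothetical data).
query = "the president of the us is donald trump joe biden"
corpus = [
    "the president of the us is barack obama donald trump",
    "bananas are yellow",
    "the president of france is x y",
]
print(select_demonstrations(query, corpus, k=2))
```

With this ordering, the most similar demonstration ends up last in the list, which places it nearest to the test prompt when the demonstrations are prepended in order.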

#### Vs. Gradient Based Method

→ Because ICL only needs forward passes, it avoids the computation overhead that grows with model size and the side-effect risk that comes from modifying parameters.

→ Because the demonstrations C specify the fact to edit, the method offers a human-interpretable interface for calibrating model behavior.

4. Experiments

Experimental Setting

#### Baselines

•

Fine-tuning

•

MEND: transforms the fine-tuning gradient of an updated fact by decomposing the weight matrix into rank-1 form with the pre-trained hyper-network.

•

ROME: learns to locate factual retrievals of a specific set of MLP modules and update knowledge by directly writing in new key-value pairs in the MLP module.

•

PROMPT: P_{M}(y^{*}|x,f)

#### Models

•

GPT-2 XL, GPT-Neo, GPT-J, GPT-NeoX, OPT-175B

#### Benchmark

•

Counterfact

◦

(s^*, r^*, o^c) \to (s^*, r^*, o^*)

◦

s^* and r^* form the prompt x^*

◦

(s^{'}, r^*, o^c) in Neighborhood prompts

◦

First 20,000 → test set & Remaining → Test Prompt

#### Metric

•

Efficacy (post-editing accuracy for target prompts)

◦

Efficacy Score 

\mathbb{E}\left[\mathbb{I}\left[P(o^*) > P(o^c)\right]\right]

◦

Efficacy Magnitude

\mathbb{E}[P(o^*) - P(o^c)]

•

Generalization

◦

The same metrics as above are applied to paraphrase prompts.

•

Specificity (accuracy of neighborhood prompts)

◦

Specificity Score

\mathbb{E}[\mathbb{I}[P(o^c) > P(o^*)]]

◦

Specificity Magnitude

\mathbb{E}[P(o^c) - P(o^*)]

•

The harmonic mean of the 6 metrics
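The score and magnitude formulas above can be computed from per-prompt probabilities as in this sketch. The function names `win_rate` and `mean_gap` and all probability values are hypothetical. The final aggregation here uses only two scores as an example; which values enter the harmonic mean is left to the caller.

```python
from statistics import harmonic_mean


def win_rate(p_a: list, p_b: list) -> float:
    """E[ I[P(a) > P(b)] ]: fraction of prompts where a is more probable than b."""
    return sum(a > b for a, b in zip(p_a, p_b)) / len(p_a)


def mean_gap(p_a: list, p_b: list) -> float:
    """E[ P(a) - P(b) ]: average probability margin of a over b."""
    return sum(a - b for a, b in zip(p_a, p_b)) / len(p_a)


# Hypothetical per-prompt probabilities, for illustration only.
p_new_target = [0.9, 0.6, 0.2]  # P(o*) on target prompts
p_old_target = [0.1, 0.3, 0.5]  # P(o^c) on target prompts
p_new_neigh = [0.2, 0.4]        # P(o*) on neighborhood prompts
p_old_neigh = [0.7, 0.3]        # P(o^c) on neighborhood prompts

# Efficacy: the new object o* should beat the original o^c.
efficacy_score = win_rate(p_new_target, p_old_target)
efficacy_magnitude = mean_gap(p_new_target, p_old_target)

# Specificity: on neighborhood prompts the original o^c should still win.
specificity_score = win_rate(p_old_neigh, p_new_neigh)
specificity_magnitude = mean_gap(p_old_neigh, p_new_neigh)

# statistics.harmonic_mean raises StatisticsError on negative inputs,
# so a negative magnitude has to be handled before aggregating.
summary = harmonic_mean([efficacy_score, specificity_score])
print(efficacy_score, efficacy_magnitude, specificity_score, specificity_magnitude, summary)
```

Generalization uses the same two functions with probabilities measured on paraphrase prompts instead of target prompts.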

Experimental Results

→ Fine-tuning (FT) reaches a high Efficacy Score (ES) and Paraphrase Score (PS), but its Specificity Score is markedly low.

→ ROME performs well on every metric, but its computational overhead limits scaling up to larger models.

→ IKE performs well relative to ROME at a low computational cost. (89.6 vs. 91.5)

→ PROMPT, which supplies only the new fact, does well on Efficacy and the related Generalization (presumably because attention lets the model copy and paste), but as the Specificity numbers show, it fails on similar but unrelated prompts. :: In the end, demonstrations not only edit the new fact but also act as a suppressor that keeps the other facts unchanged.

→ On OPT-175B, performance rises sharply compared with the model without demonstrations, and Specificity is better than ROME's.

Analysis

#### Demonstration Numbers (no firm interpretation here, since the copy/update/retain composition at small demonstration counts is unknown)

→ With 0 demonstrations (= PROMPT), Specificity is low.

→ Even with a small number of demonstrations (4 to 8), the LM has low confidence in the edited fact, so efficacy and generalization are low.

→ Specificity increases clearly as the number of demonstrations grows.

#### Demonstration Organization

→ Because the new fact f is concatenated in front of the test prompt, randomly selected demonstrations still maintain efficacy and generalization, but Specificity drops.

→ Ordering has little effect on performance (choosing demonstrations with high cosine similarity to f matters more).

#### Demonstration Formatting

→ w/o copy: because f is present, all efficacy metrics hold up even without copy demonstrations.

→ w/o update: generalization degrades, while retention performance is the best because only copy and retain demonstrations remain (except on OPT-175B).

→ w/o retain: specificity degrades (score 35.2 → -47.6).

:: IKE's main contribution: with well-designed demonstrations, only the intended fact is edited while facts that should not change are left untouched.

#### IKE Benefits from Model Scaling (32 demo except GPT2-XL due to max len)

→ IKE's performance correlates with model scale, so it can extend to larger LLMs.

(The models are not from the same family, but OPT-175B performs best.)

#### Resilience to Over-Editing

•

Contrastive knowledge evaluation is used to assess the over-editing problem in knowledge editing comprehensively.

•

Compute the difference in LM probability between (s^*, r^*, o^c) \to (s^*, r^*, o^*) and (s^*, r^*, o^c) \to (s^*, r^{'}, o^*), where r^{'} is a relation similar to r but unrelated.

•

The experiment checks whether injecting (s^*, r^*, o^*) also raises P_{M}(y^{*}|x^*,r^{'}).

•

Metric (CKA Score)

\frac{P(o^* | s^*, r^*)}{\mathbb{E}_{r' \in R}[P(o^* | s^*, r')]}

→ A low CKA score means the probability of the fact that should be edited is low while the probability of unrelated facts is high.

→ With the same subject and the same object, IKE is shown to be the least affected by over-editing.
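A small sketch of the CKA score as the ratio defined above. The function name `cka_score` and the probability values are hypothetical, and the expectation over r' is taken as a plain average over a finite list of similar relations, which is an assumption.

```python
def cka_score(p_edited: float, p_similar_relations: list) -> float:
    """P(o* | s*, r*) divided by the mean of P(o* | s*, r') over similar relations r'."""
    if not p_similar_relations:
        raise ValueError("need at least one similar relation r'")
    mean_unrelated = sum(p_similar_relations) / len(p_similar_relations)
    if mean_unrelated == 0:
        return float("inf")  # o* is never predicted for any r'
    return p_edited / mean_unrelated


# Hypothetical probabilities: o* is likely under the edited relation r*
# and unlikely under the similar but unrelated relations r'.
print(cka_score(0.8, [0.1, 0.3]))
```

A ratio well above 1 indicates the edit stayed on r*, while a ratio near or below 1 indicates o* leaked into the unrelated relations.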

#### Maintenance for Original Knowledge

•

Measure the drop rate of P_{M}(o^{c}|s^*,r) before and after the edit.

◦

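Even if the answer to "The president of the US is" has been edited from Donald Trump → Joe Biden,

◦

the fact that the president of the US in 2017 was Donald Trump must not change, so the authors also built a separate time-aware dataset and ran experiments on it (reported in the Appendix).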

5. Conclusion

•

Including examples in the demonstrations that are similar to the target fact but unrelated largely prevents over-editing.

•

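ROME is likely to also change the fact about the first US president when editing a sentence about the current US president, whereas with ICL a single relevant demonstration makes this far less likely. (On a test prompt such as "Who was the first US president?", the two are comparable in answer score, but the gap in confidence is large.)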

•

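In summary, ROME literally erases a fact and overwrites it, whereas the advantage of ICL is that, as long as the demonstrations are controlled properly, the boundary of the facts to be edited can be specified interpretably and precisely.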