Self-Alignment with Instruction Backtranslation

#### Seed Data: 3,200 human-annotated (instruction, response) pairs from Open Assistant

#### Unlabelled data: ClueWeb corpus → starts with 502k segments y_{i}

#### Self-Augmentation

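• Fine-tune the base model on the seed data as (y, x) pairs, i.e. train it to predict the instruction x from the output y.

• For every unlabelled y_{i}, generate a candidate instruction \hat{x}_{i}.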
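The two self-augmentation steps can be sketched as follows. This is a minimal illustration, assuming each seed example is a dict with the hypothetical keys `instruction` and `response`; the template text is a placeholder of our own, not the prompt used in the paper.

```python
from typing import Dict, List

# Hypothetical template (an assumption): the backward model reads an
# output y and is trained to produce the instruction x behind it.
BACKWARD_TEMPLATE = "Response:\n{y}\n\nInstruction:\n"


def to_backward_examples(seed: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Swap the roles of (x, y): the prompt holds y, the target is x."""
    examples = []
    for pair in seed:
        examples.append(
            {
                "prompt": BACKWARD_TEMPLATE.format(y=pair["response"]),
                "target": pair["instruction"],
            }
        )
    return examples


def backward_prompt(y: str) -> str:
    """Build the inference-time prompt for one unlabelled segment y_i."""
    return BACKWARD_TEMPLATE.format(y=y)
```

Using the same template for training and inference keeps the backward model's input distribution consistent between the two steps.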

#### Self-Curation

• Fine-tune the base model on the seed data as (x, y) pairs to obtain the intermediate model M0.

• Wrap each (\hat{x}_{i}, y_{i}) in a rating prompt and score it with M0 to obtain a_{i}. The pairs with a_{i} ≥ k form the curated dataset A_{k}.
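A rough sketch of the curation filter, assuming the rating model's reply contains a line such as `Score: 4` on a 5-point scale. The reply format, `parse_score`, and `curate` are illustrative assumptions, not the paper's exact implementation.

```python
import re
from typing import List, Optional, Tuple

# Assumed reply format: the rating model writes "Score: <1-5>" somewhere.
SCORE_PATTERN = re.compile(r"Score:\s*([1-5])\b")


def parse_score(reply: str) -> Optional[int]:
    """Return the last score found in the reply, or None if there is none."""
    matches = SCORE_PATTERN.findall(reply)
    return int(matches[-1]) if matches else None


def curate(
    pairs: List[Tuple[str, str]], replies: List[str], k: int
) -> List[Tuple[str, str]]:
    """Keep the (x_hat_i, y_i) pairs whose parsed score a_i satisfies a_i >= k."""
    kept = []
    for pair, reply in zip(pairs, replies):
        score = parse_score(reply)
        # Pairs without a parsable score are dropped rather than guessed.
        if score is not None and score >= k:
            kept.append(pair)
    return kept
```

Dropping unparsable replies is a conservative choice; an alternative would be to re-query the model for those pairs.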

#### Iterative Self-Curation

• Fine-tune M0 on the seed data + A_{k}^{(t-1)} (the set curated with M0) to obtain M1.

• Re-score (\hat{x}_{i}, y_{i}) with M1 to obtain new scores a_{i}. The pairs with a_{i} ≥ k form the dataset for this iteration, A_{k}^{(t)}.

• Fine-tune M1 on the seed data + A_{k}^{(t)} to obtain M2.
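The loop described above can be sketched with the fine-tuning and scoring steps passed in as callables. `finetune` and `score` are hypothetical placeholders for our own LLaMA code, and the default of two iterations mirrors the M0 → M1 → M2 sequence in these notes.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (x_hat_i, y_i)


def iterative_self_curation(
    m0: object,
    seed: Sequence[Pair],
    candidates: Sequence[Pair],
    finetune: Callable[[object, List[Pair]], object],
    score: Callable[[object, Pair], int],
    k: int,
    iterations: int = 2,
) -> object:
    """Alternate between scoring candidates and fine-tuning on seed + curated data."""
    model = m0
    for _ in range(iterations):
        # Re-score every candidate with the current model and keep a_i >= k.
        curated = [pair for pair in candidates if score(model, pair) >= k]
        # Fine-tune the current model on the seed data plus the curated set.
        model = finetune(model, list(seed) + curated)
    return model
```

Keeping the model operations behind callables lets the loop be tested with stubs before any GPU code is wired in.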

The seed data and the augmented data are given different prompt prefixes:

• Seed: "Answer in the style of an AI Assistant."

• Augmentation: "Answer with knowledge from a web search."
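A small sketch of how the two prefixes might be attached when building the training set. The prefix strings come from the notes above; joining the prefix to the instruction with a blank line and the `prompt`/`response` keys are assumptions.

```python
from typing import Dict, List, Tuple

SEED_PREFIX = "Answer in the style of an AI Assistant."
AUGMENTED_PREFIX = "Answer with knowledge from a web search."


def tag_examples(pairs: List[Tuple[str, str]], prefix: str) -> List[Dict[str, str]]:
    """Prepend the source-specific prefix to each instruction."""
    return [
        # Assumption: prefix and instruction are separated by a blank line.
        {"prompt": f"{prefix}\n\n{instruction}", "response": response}
        for instruction, response in pairs
    ]


def build_training_set(
    seed: List[Tuple[str, str]], augmented: List[Tuple[str, str]]
) -> List[Dict[str, str]]:
    """Combine seed and augmented pairs, each tagged with its own prefix."""
    return tag_examples(seed, SEED_PREFIX) + tag_examples(augmented, AUGMENTED_PREFIX)
```

At inference time the same mechanism lets us choose which of the two styles to ask for by picking the prefix.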

• Base model

◦ LLaMA

• Training details

◦ Learning rate 1e-5, linearly decaying to 9e-6 by the end of training; weight decay 0.1; batch size 32 (examples); dropout 0.1.

→ Matching the batch size of 32 should be enough for us.

◦ For fine-tuning with fewer than 3,000 examples, the paper uses batch size 8 (more details in Table 18 of the paper).

• Generation setting

◦ Nucleus sampling: T = 0.7, p = 0.9.
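The training hyperparameters above can be written down as two small helpers. This is a sketch under assumptions: the decay is taken to be linear per optimizer step, and the function names are our own.

```python
def linear_decay_lr(
    step: int, total_steps: int, start: float = 1e-5, end: float = 9e-6
) -> float:
    """Linearly interpolate from `start` at step 0 to `end` at the last step."""
    if total_steps <= 1:
        return start
    # Clamp the step so out-of-range values do not extrapolate.
    clamped = min(max(step, 0), total_steps - 1)
    fraction = clamped / (total_steps - 1)
    return start + (end - start) * fraction


def batch_size_for(num_examples: int) -> int:
    """Batch size 8 for fewer than 3000 examples, otherwise 32."""
    return 8 if num_examples < 3000 else 32
```

With the 3,200 seed examples, `batch_size_for` returns 32, which agrees with the note above about matching batch size 32.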

→ The higher the score threshold (A_{k} with k = 5 rather than k = 4), the more the instruction and output lengths tend to follow those of the seed data.

→ The augmented data expands diversity.

#### Functions to implement

1. LLaMA fine-tuning code

→ Probably reuse Geonwoo's (건우) code as is.

2. LLaMA back-translation code

→ Prompt design will likely matter here.

3. LLaMA instruction inference code

4. Data augmentation code

→ Merges the inference outputs.

5. Prompt strings

→ Prompt for step 2

→ Prompt for step 6 (from the paper)

6. Data curation code (with argument k)

→ Select K of the examples generated in step 4.

7. Data merging code (seed + curation)

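How should we evaluate whether performance actually improves across iterations?

→ In the paper:

(1) How well does the model identify a human-annotated, high-quality dataset?

(2) How much does a model tuned on 100 examples of the selected data improve, measured against text-davinci-003?

→ Should we check this with the loss instead?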
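One way to make evaluation (1) concrete, sketched under assumptions: given the scores a_{i} that a model assigns to a held-out set with known quality labels, measure how precisely and how completely the threshold k recovers the high-quality examples. The labelled held-out set and the function name are hypothetical, and the paper's own protocol may differ.

```python
from typing import List, Tuple


def precision_recall_at_k(
    scores: List[int], is_high_quality: List[bool], k: int
) -> Tuple[float, float]:
    """Treat a_i >= k as 'selected' and compare against the quality labels."""
    selected = [s >= k for s in scores]
    true_pos = sum(1 for sel, hq in zip(selected, is_high_quality) if sel and hq)
    num_selected = sum(selected)
    num_high_quality = sum(is_high_quality)
    # Guard against empty selections or an all-negative label set.
    precision = true_pos / num_selected if num_selected else 0.0
    recall = true_pos / num_high_quality if num_high_quality else 0.0
    return precision, recall
```

Computing these two numbers with M0, M1, and M2 as the scorer would show whether the curation ability itself improves across iterations, independently of the training loss.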