# Zephyr: Direct Distillation of LM Alignment

The paper: http://arxiv.org/abs/2310.16944

## Purpose 
The paper aims to produce a smaller language model (LM) that is well aligned with user intent, using a method called distilled direct preference optimization (dDPO). The method improves intent alignment substantially without requiring any human annotation, setting a new state of the art for 7B-parameter chat models.

## Methods 
- Distilled Supervised Fine-Tuning (dSFT) using AI-generated dialogues.
- AI Feedback (AIF) for collecting preferences on model outputs.
- Distilled Direct Preference Optimization (dDPO) for refining the model on the AI-ranked preference pairs (a minimal sketch of the underlying objective follows this list).
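
For context, dDPO applies the standard direct preference optimization loss to preference pairs ranked by an AI annotator instead of humans. Below is a minimal PyTorch-style sketch of that loss; the function and tensor names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss for a batch of preference pairs.

    Each tensor holds the summed log-probability of the chosen or rejected
    completion under the trainable policy or the frozen reference (dSFT)
    model; beta controls how far the policy may drift from that reference.
    """
    # Implicit rewards: how much more (or less) likely the policy makes
    # each completion relative to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen completion's implicit reward above the rejected one's.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In practice, the log-probabilities would come from a forward pass of the policy and of a frozen copy of the dSFT model over the same prompt and completions, which is what lets the method skip reward-model training and sampling during fine-tuning.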

## Key Findings 
1. Zephyr-7B outperforms other open 7B models and is competitive with much larger aligned models on chat benchmarks such as MT-Bench and AlpacaEval.
2. Preference learning is crucial for achieving alignment with user intent.
3. The approach does not require human annotation or additional sampling during fine-tuning.

## Discussion 
The paper highlights the effectiveness of dDPO in aligning smaller LMs with user intent, pointing to a cheaper route for training efficient, aligned LMs. It demonstrates that smaller models can match the performance of larger models aligned with human feedback.

## Critiques 
1. GPT-4, used as an evaluator, may be biased towards models distilled from it.
2. The scalability of the method to larger models like LLAMA2-70B is untested.
3. Safety considerations, such as the production of harmful outputs, are not addressed in this study.

## Tags
#AIAlignment #LanguageModels #dDPO #ZEPHYR7B #ChatModelBenchmarks
