Abstract

Modern alignment techniques based on human preferences, such as RLHF and DPO, typically employ divergence regularization relative to a reference model to ensure training stability. However, this often limits the flexibility of models during alignment, especially when there is a clear distributional discrepancy between the preference data and the reference model. In this paper, we focus on the alignment of recent text-to-image diffusion models, such as Stable Diffusion XL (SDXL), and find that this "reference mismatch" is indeed a significant problem in aligning these models due to the unstructured nature of visual modalities: e.g., a preference for a particular stylistic aspect can easily induce such a discrepancy. Motivated by this observation, we propose a novel and memory-friendly preference alignment method for diffusion models that does not depend on any reference model, coined margin-aware preference optimization (MaPO). MaPO jointly maximizes the likelihood margin between the preferred and dispreferred image sets and the likelihood of the preferred sets, simultaneously learning general stylistic features and preferences. For evaluation, we introduce two new pairwise preference datasets, Pick-Style and Pick-Safety, comprising self-generated image pairs from SDXL that simulate diverse scenarios of reference mismatch induced by style preference shifts. Our experiments validate that MaPO significantly improves alignment on Pick-Style and Pick-Safety, as well as general preference alignment when fine-tuning SDXL on Pick-a-Pic v2, surpassing the base SDXL and other existing methods.

Warning: This paper contains examples of harmful content, including explicit text and images.

General Pipeline

In alignment tuning, the reference model can impede preference optimization when there is a distributional discrepancy (i.e., reference mismatch) between the preference data and the distribution represented by the reference model. We present margin-aware preference optimization (MaPO), which aligns text-to-image diffusion models without any reference model. To demonstrate the effectiveness of MaPO, we self-curate offline preference data that simulate the two cases of reference mismatch: reference-chosen mismatch and reference-rejected mismatch.


Figure: general pipeline of MaPO.
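To make the idea concrete, the following is a minimal PyTorch-style sketch of a reference-free, margin-plus-likelihood objective in the spirit of MaPO. The function name, the coefficients (margin_scale, win_weight), and the exact scaling are illustrative assumptions rather than the paper's precise formulation.

import torch.nn.functional as F

def margin_aware_loss(noise_pred_w, noise_pred_l, noise_w, noise_l,
                      margin_scale=1.0, win_weight=1.0):
    """Illustrative reference-free objective (not the paper's exact form):
    maximize the margin between rejected and chosen denoising losses while
    keeping the chosen loss small."""
    # Per-sample denoising MSE, used here as a negative log-likelihood surrogate.
    loss_w = F.mse_loss(noise_pred_w, noise_w, reduction="none").mean(dim=(1, 2, 3))
    loss_l = F.mse_loss(noise_pred_l, noise_l, reduction="none").mean(dim=(1, 2, 3))

    # Margin term: a larger (loss_l - loss_w) gap means the chosen image is
    # relatively more likely under the current model; no reference model appears.
    margin_term = -F.logsigmoid(margin_scale * (loss_l - loss_w))

    # Likelihood term: keep fitting the chosen images themselves.
    return (margin_term + win_weight * loss_w).mean()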

Style-grounded Preference

We construct two splits of the style-grounded preference dataset, cartoon and pixel art, by prepending “Disney style animated image.” and “Pixel art style image.” to the context prompts for the chosen images, respectively. For the rejected images, we instead prepend “Realistic 8k image.” to the context prompt. An ideally aligned model should therefore generate animated or pixel art images given the plain prompt, depending on the split. Because the stylistic prefixes induce major changes in the chosen images, we intend to simulate the situation in which the reference model is distant from the chosen style (reference-chosen mismatch).


Figure: Pick-Style examples.
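Below is a minimal sketch of how these style-grounded prompt pairs can be assembled; the helper and field names are hypothetical and only illustrate the prefixing scheme described above.

CHOSEN_PREFIXES = {
    "cartoon": "Disney style animated image. ",
    "pixel_art": "Pixel art style image. ",
}
REJECTED_PREFIX = "Realistic 8k image. "

def build_style_pair(context_prompt: str, split: str) -> dict:
    # Stylistic prefix for the chosen image, realistic prefix for the rejected one.
    return {
        "prompt": context_prompt,                             # prompt used at alignment time
        "chosen_prompt": CHOSEN_PREFIXES[split] + context_prompt,
        "rejected_prompt": REJECTED_PREFIX + context_prompt,  # used to generate rejected images
    }

print(build_style_pair("a castle on a hill at sunset", "cartoon"))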

Safety-grounded Preference

Given the context prompts, we prepend “Sexual, nudity, +19 image.” for the rejected images and add nothing for the chosen ones. An ideally aligned model should thus generate safe images, avoiding sexual content given the prompt. By applying the prefix only to the rejected side, we simulate the situation where the reference model is distant from the rejected style (reference-rejected mismatch).


Figure: Pick-Safety examples.
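The safety split follows the same pattern as the style sketch above, except that only the rejected side receives a prefix; again, the helper and field names are hypothetical.

UNSAFE_PREFIX = "Sexual, nudity, +19 image. "

def build_safety_pair(context_prompt: str) -> dict:
    return {
        "prompt": context_prompt,
        "chosen_prompt": context_prompt,                    # chosen images keep the unmodified prompt
        "rejected_prompt": UNSAFE_PREFIX + context_prompt,  # prefix applied only to the rejected side
    }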

General Preference

We also test our method in mild reference mismatch situations. The training dataset, Pick-a-Pic v2, comprises roughly 1M image pairs with corresponding prompts, generated by a range of generative models including SDXL variants. Because it contains samples from SDXL variants, we use Pick-a-Pic v2 to test the effectiveness of MaPO in situations where the reference mismatch is marginal.
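For reference, here is a rough sketch of streaming preference pairs from Pick-a-Pic v2 with the Hugging Face datasets library; the dataset id and the expected columns are assumptions that should be checked against the actual release.

from datasets import load_dataset

# Dataset id assumed to be "yuvalkirstain/pickapic_v2"; verify before use.
ds = load_dataset("yuvalkirstain/pickapic_v2", split="train", streaming=True)

for example in ds.take(2):
    # Each record is expected to hold a caption, two candidate images, and a
    # label marking which image annotators preferred.
    print(sorted(example.keys()))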



Evaluation on imgsys

We evaluate the MaPO checkpoint adapted on Pick-a-Pic v2 for general human preference alignment on the imgsys public benchmark. MaPO outperforms or matches 21 out of 25 state-of-the-art text-to-image diffusion models, ranking 7th on the leaderboard at the time of writing, compared to Diffusion-DPO's 20th place, while also consuming 14.5% less wall-clock training time when fine-tuning on Pick-a-Pic v2. We thank the imgsys team for helping us obtain the human preference data.



BibTeX

@misc{hong2024marginaware,
      title={Margin-aware Preference Optimization for Aligning Diffusion Models without Reference},
      author={Jiwoo Hong and Sayak Paul and Noah Lee and Kashif Rasul and James Thorne and Jongheon Jeong},
      year={2024},
      eprint={2406.06424},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}