FIRE-CIR: Fine-grained Reasoning for Composed Fashion Image Retrieval

1Louis Vuitton

2Inria, École normale supérieure, CNRS, PSL Research University

3Courant Institute of Mathematical Sciences and Center for Data Science, New York University

Abstract

Composed image retrieval (CIR) aims to retrieve a target image that depicts a reference image modified by a textual description. While recent vision-language models (VLMs) achieve promising CIR performance by embedding images and text into a shared space for retrieval, they often fail to reason about what to preserve and what to change. This limitation hinders interpretability and yields suboptimal results, particularly in fine-grained domains like fashion.

In this paper, we introduce FIRE-CIR, a model that brings compositional reasoning and interpretability to fashion CIR. Instead of relying solely on embedding similarity, FIRE-CIR performs question-driven visual reasoning: it automatically generates attribute-focused visual questions derived from the modification text, and verifies the corresponding visual evidence in both reference and candidate images.

To train such a reasoning system, we automatically construct a large-scale fashion-specific visual question answering dataset, containing questions requiring either single- or dual-image analysis. During retrieval, our model leverages this explicit reasoning to re-rank candidate results, filtering out images inconsistent with the intended modifications.
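As a rough illustration of the question-driven re-ranking idea described above, the following minimal sketch scores each candidate by how many attribute questions it answers as expected. It is not the authors' implementation: the `Question` class, the `rerank` function, the `AnswerFn` interface, and the toy lookup table standing in for a VQA model are all hypothetical, and the soft consistency score used here is an assumption in place of whatever filtering rule the paper actually applies.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Question:
    """Hypothetical attribute-focused question derived from the modification text."""
    text: str
    expected: str  # answer that a correct target image should yield


# Hypothetical VQA interface: (reference id, candidate id, question text) -> answer.
AnswerFn = Callable[[str, str, str], str]


def rerank(
    reference: str,
    candidates: List[Tuple[str, float]],
    questions: List[Question],
    answer: AnswerFn,
) -> List[Tuple[str, float, float]]:
    """Re-rank first-stage candidates by question consistency.

    `candidates` holds (image_id, embedding_score) pairs.
    Returns (image_id, consistency, embedding_score) tuples, best first.
    """
    scored = []
    for image_id, emb_score in candidates:
        # Count questions whose answer matches the expected one.
        passed = sum(
            answer(reference, image_id, q.text).strip().lower() == q.expected.lower()
            for q in questions
        )
        consistency = passed / len(questions) if questions else 1.0
        scored.append((image_id, consistency, emb_score))
    # Assumption: sort by consistency, using the embedding score as a tie-breaker.
    return sorted(scored, key=lambda s: (s[1], s[2]), reverse=True)


# Toy stand-in for a VQA model: precomputed answers per candidate image.
Q_COLOR = "Is the dress red?"
Q_NECK = "Does it have the same neckline as the reference?"
TOY_ANSWERS = {
    "img_a": {Q_COLOR: "yes", Q_NECK: "no"},
    "img_b": {Q_COLOR: "yes", Q_NECK: "yes"},
    "img_c": {Q_COLOR: "no", Q_NECK: "yes"},
}


def toy_answer(reference: str, candidate: str, question: str) -> str:
    return TOY_ANSWERS[candidate][question]


questions = [Question(Q_COLOR, "yes"), Question(Q_NECK, "yes")]
candidates = [("img_a", 0.91), ("img_b", 0.88), ("img_c", 0.86)]
ranking = rerank("ref", candidates, questions, toy_answer)
print([image_id for image_id, _, _ in ranking])  # ['img_b', 'img_a', 'img_c']
```

In this toy run, `img_b` moves to the top despite a lower embedding score because it is the only candidate consistent with both questions. The per-question answers also give an attribute-level explanation of why the other candidates were demoted.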

Experimental results on the Fashion IQ benchmark show that FIRE-CIR outperforms state-of-the-art methods in retrieval accuracy. It also provides interpretable, attribute-level insights into retrieval decisions.

BibTeX

@article{garderes2026firecir,
  author    = {Garderes, Francois and Gauthier, Camille-Sovanneary and Ponce, Jean and Chen, Shizhe},
  title     = {FIRE-CIR: Fine-grained Reasoning for Composed Fashion Image Retrieval},
  journal   = {IEEE/CVF Conference on Computer Vision and Pattern Recognition - FINDINGS Track (CVPRF)},
  year      = {2026},
}