Lumina-OmniLV is a universal low-level vision model that supports 100+ sub-tasks with both textual and visual prompts, spanning restoration, enhancement, weak-semantic dense prediction, and stylization.
We present Lumina-OmniLV (abbreviated as OmniLV), a universal multimodal multi-task framework for low-level vision that addresses over 100 sub-tasks across four major categories: image restoration, image enhancement, weak-semantic dense prediction, and stylization. OmniLV leverages both textual and visual prompts to offer flexible, user-friendly interaction.
Built on Diffusion Transformer (DiT)-based generative priors, our framework supports arbitrary resolutions and achieves optimal performance at 1K resolution while preserving fine-grained details and high fidelity. Through extensive experiments, we demonstrate that separately encoding text and visual instructions, combined with co-training using shallow feature control, is essential to mitigate task ambiguity and improve multi-task generalization.
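To make the dual-prompt design concrete, below is a minimal PyTorch-style sketch of separately encoded text and visual instructions with shallow feature control, where the visual-prompt signal steers only the early DiT blocks. All module names, dimensions, and the zero-initialized gating scheme are illustrative assumptions, not the released OmniLV implementation.

```python
import torch
import torch.nn as nn

class DualPromptDiT(nn.Module):
    """Illustrative sketch: text and visual instructions are encoded
    separately, and the visual-prompt features are injected only into
    the shallow (early) DiT blocks -- "shallow feature control"."""

    def __init__(self, dim=1024, depth=24, nhead=16, shallow_ctrl_depth=4):
        super().__init__()
        self.text_proj = nn.Linear(768, dim)    # text-encoder output -> DiT width
        self.visual_proj = nn.Linear(dim, dim)  # visual-prompt tokens -> control signal
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead, batch_first=True)
            for _ in range(depth)
        )
        # Zero-initialized gates: the control branch starts as an identity,
        # so co-training does not disturb the pretrained generative prior.
        self.ctrl_gates = nn.ParameterList(
            nn.Parameter(torch.zeros(dim)) for _ in range(shallow_ctrl_depth)
        )

    def forward(self, x_tokens, text_emb, visual_prompt_tokens):
        text_tokens = self.text_proj(text_emb)
        # Pool the visual prompt into one control token that broadcasts
        # over the image tokens (a simplification for this sketch).
        ctrl = self.visual_proj(visual_prompt_tokens).mean(dim=1, keepdim=True)
        h = x_tokens
        n_text = text_tokens.size(1)
        for i, blk in enumerate(self.blocks):
            if i < len(self.ctrl_gates):
                h = h + self.ctrl_gates[i] * ctrl  # inject only in shallow blocks
            # The text instruction conditions every block via token concatenation.
            h = blk(torch.cat([text_tokens, h], dim=1))[:, n_text:]
        return h

model = DualPromptDiT()
x   = torch.randn(2, 256, 1024)  # noisy latent image tokens
txt = torch.randn(2, 16, 768)    # text-instruction embeddings
vp  = torch.randn(2, 64, 1024)   # tokens from an input/output example pair
print(model(x, txt, vp).shape)   # torch.Size([2, 256, 1024])
```

Keeping the visual-prompt pathway separate from the text pathway, and gating it to zero at initialization, reflects the paper's finding that separate encoding plus shallow-feature co-training mitigates task ambiguity.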
We organize the visual results into five categories. Each category shows featured examples by default and can be expanded with one click to show all comparison pairs.
Additional analyses from the paper highlighting model behavior and design tradeoffs.
@article{pu2025luminaomnilv,
  title   = {Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision},
  author  = {Pu, Yuandong and Zhuo, Le and Zhu, Kaiwen and Xie, Liangbin and Zhang, Wenlong and Chen, Xiangyu and Gao, Peng and Qiao, Yu and Dong, Chao and Liu, Yihao},
  journal = {arXiv preprint arXiv:2504.04903},
  year    = {2025}
}
We would like to thank the Cambrian authors for providing this webpage template.