Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision

Yuandong Pu1,2,   Le Zhuo2,   Kaiwen Zhu1,2,   Liangbin Xie3,4,   Wenlong Zhang2,   Xiangyu Chen2,6,   Peng Gao2,   Yu Qiao2,   Chao Dong4,5,2,   Yihao Liu2,†
1Shanghai Jiao Tong University,   2Shanghai AI Laboratory,   3University of Macau
4Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences,   5Shenzhen University of Advanced Technology,   6Institute of Artificial Intelligence (TeleAI), China Telecom
† Corresponding author

Abstract

We present Lumina-OmniLV (abbreviated as OmniLV), a universal multimodal multi-task framework for low-level vision that addresses over 100 sub-tasks across four major categories: image restoration, image enhancement, weak-semantic dense prediction, and stylization. OmniLV leverages both textual and visual prompts to offer flexible, user-friendly interactions. Built on Diffusion Transformer (DiT)-based generative priors, our framework supports arbitrary resolutions, achieving optimal performance at 1K resolution, while preserving fine-grained details and high fidelity. Through extensive experiments, we demonstrate that separately encoding text and visual instructions, combined with co-training using shallow feature control, is essential to mitigate task ambiguity and enhance multi-task generalization. Our findings also reveal that integrating high-level generative tasks into low-level vision models can compromise detail-sensitive restoration. These insights pave the way for more robust and generalizable low-level vision systems.

Method

Overall framework of OmniLV. First, the input images are encoded into latent space by a VAE encoder. Then, the image latent and the noise latent are patchified into visual tokens. Optionally, in-context pairs can be added to the visual tokens to handle complex scenarios. In parallel, the instruction prompt and the description prompt are processed by Gemma2B. Finally, the denoised result is decoded to obtain the desired output image.
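The patchify step described above can be illustrated with a minimal NumPy sketch. This is not the OmniLV implementation: the function names (`patchify`, `unpatchify`, `build_visual_tokens`), the patch size of 2, and the token ordering (in-context latents first, then the image latent, then the noise latent) are all assumptions made for illustration. The VAE, the text encoder, and the DiT itself are omitted.

```python
import numpy as np


def patchify(latent: np.ndarray, patch: int = 2) -> np.ndarray:
    """Split a (C, H, W) latent into (N, patch*patch*C) tokens.

    The patch size of 2 is an assumed default, not a value from the source.
    """
    c, h, w = latent.shape
    if h % patch or w % patch:
        raise ValueError("latent height and width must be divisible by patch")
    x = latent.reshape(c, h // patch, patch, w // patch, patch)
    x = x.transpose(1, 3, 2, 4, 0)  # (H/p, W/p, p, p, C)
    return x.reshape((h // patch) * (w // patch), patch * patch * c)


def unpatchify(tokens: np.ndarray, c: int, h: int, w: int, patch: int = 2) -> np.ndarray:
    """Inverse of patchify: rebuild the (C, H, W) latent from its tokens."""
    x = tokens.reshape(h // patch, w // patch, patch, patch, c)
    x = x.transpose(4, 0, 2, 1, 3)  # (C, H/p, p, W/p, p)
    return x.reshape(c, h, w)


def build_visual_tokens(image_latent, noise_latent, context_latents=(), patch=2):
    """Concatenate tokens along the sequence axis.

    Assumed ordering: optional in-context latents, image latent, noise latent.
    """
    parts = [patchify(latent, patch) for latent in context_latents]
    parts.append(patchify(image_latent, patch))
    parts.append(patchify(noise_latent, patch))
    return np.concatenate(parts, axis=0)
```

With these assumptions, each additional in-context latent lengthens the token sequence by `(H/patch) * (W/patch)` tokens, so the sequence length grows linearly with the number of in-context examples.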

Image Restoration

Image Enhancement

Dense Prediction