
ECCV 2024 | BlazeBVD, a universal method for blind video deflickering, delivers visually striking results

2024-07-23



In recent years, the short-video ecosystem has grown rapidly, and creative editing tools built around short video continue to emerge. Wink, Meitu's professional mobile video editing tool, has taken the lead with its original video quality restoration capabilities, and its user base both in China and abroad continues to grow.

Behind the popularity of Wink's quality restoration features lies Meitu's insight into users' pain points in video creation, such as blurry footage, heavy noise, and low image quality, as demand for video editing applications accelerates. It is also backed by the strong video restoration and enhancement technology of the Meitu Imaging Research Institute (MT Lab), which has so far launched features such as quality restoration (HD), quality restoration (UHD), portrait enhancement, and resolution upscaling.

Recently, the Meitu Imaging Research Institute (MT Lab) and the University of Chinese Academy of Sciences jointly proposed BlazeBVD, a breakthrough STE-based blind video deflickering (BVD) method that processes low-quality videos degraded by unknown illumination flicker while preserving the original video's content and color as much as possible. The work has been accepted at ECCV 2024, a top computer vision conference.



Paper link: https://arxiv.org/pdf/2403.06243v1

BlazeBVD targets video flicker. Flicker breaks temporal consistency, a prerequisite for high-quality video output, and even slight flicker can seriously degrade the viewing experience. It generally stems from poor shooting conditions and hardware limitations of the capture device, and it is often made worse when image processing techniques are applied frame by frame. Flicker artifacts and color distortion also appear frequently in recent video generation tasks, including those based on generative adversarial networks (GANs) and diffusion models (DMs). Across these video processing scenarios, it is therefore crucial to explore blind video deflickering (BVD) to eliminate flicker while preserving video content.

The BVD task is agnostic to the cause and degree of flicker, which gives it broad application prospects. It covers scenarios that are independent of flicker type and severity, such as old-film restoration, high-speed camera footage, and color-distortion correction, and it operates on a single flickering video without additional guidance such as the flicker type or a reference video. Existing BVD approaches mainly rely on traditional filtering, forced temporal consistency, and atlas-based methods; although deep learning methods have made significant progress on BVD, the lack of prior knowledge greatly hinders them in practice, and BVD still faces many challenges.

BlazeBVD: effectively improving blind video deflickering

Inspired by the classic deflickering method scale-time equalization (STE), BlazeBVD introduces a histogram-assisted solution. An image histogram, defined as the distribution of pixel values, is widely used in image processing to adjust the brightness or contrast of an image. Given an arbitrary video, STE improves its visual stability by smoothing the histogram sequence with Gaussian filtering and correcting the pixel values in each frame via histogram equalization. Although STE is effective only for mild flicker, it verifies two points:

Histograms are far more compact than pixel values and capture brightness and flicker information well.

A video whose histogram sequence has been smoothed shows no visually obvious flicker.

It is therefore feasible to improve both the quality and the speed of blind video deflickering by exploiting STE and histogram cues, as the sketch below illustrates.
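As an illustration only, here is a minimal NumPy/SciPy sketch of the STE idea described above: per-frame histograms are reduced to cumulative distributions, the CDF sequence is smoothed over time with a Gaussian filter, and each frame is remapped by histogram matching. The function name `ste_deflicker` and the smoothing width `sigma` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def ste_deflicker(frames: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """frames: (T, H, W) uint8 grayscale video; returns a deflickered copy.
    A sketch of scale-time equalization, not the paper's implementation."""
    # Per-frame 256-bin histograms -> cumulative distributions (CDFs).
    hists = np.stack([np.bincount(f.ravel(), minlength=256) for f in frames])
    cdfs = np.cumsum(hists.astype(np.float64), axis=1)
    cdfs /= cdfs[:, -1:]                                   # normalize to [0, 1]
    # Smooth the CDF sequence along the time axis with a Gaussian filter.
    smooth = gaussian_filter1d(cdfs, sigma=sigma, axis=0)
    out = np.empty_like(frames)
    levels = np.arange(256)
    for t in range(len(frames)):
        # Histogram matching: remap gray level v so the frame's CDF at v
        # lands where the smoothed CDF takes the same value. A tiny ramp
        # keeps the smoothed CDF strictly increasing for interpolation.
        xp = smooth[t] + 1e-8 * levels
        lut = np.interp(cdfs[t], xp, levels).astype(np.uint8)
        out[t] = lut[frames[t]]
    return out
```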

By smoothing these histograms to generate a singular-frame set, filtered illumination maps, and exposure masks, BlazeBVD achieves fast and stable texture recovery under lighting fluctuations and over- or under-exposure. Compared with previous deep learning methods, BlazeBVD is the first to carefully use histograms to reduce the learning complexity of the BVD task, cutting the complexity and resource cost of learning from video data. Its core is the STE flicker prior: filtered illumination maps that guide global flicker removal, a singular-frame set that identifies the indices of flickering frames, and exposure maps that locate local regions affected by over-exposure or darkening. A sketch of how such priors might be extracted follows.
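To make the three priors concrete, here is a hedged sketch of how they could be computed from a grayscale video; the outlier threshold `k` and clipping bounds `lo`/`hi` are illustrative assumptions rather than values from the paper.

```python
import numpy as np
from scipy.ndimage import median_filter

def extract_flicker_priors(frames: np.ndarray, k: float = 2.5,
                           lo: int = 10, hi: int = 245):
    """frames: (T, H, W) uint8. Returns (filtered illumination curve,
    singular frame indices, per-frame exposure masks). Illustrative only."""
    # Global illumination proxy: mean luminance of each frame.
    illum = frames.reshape(frames.shape[0], -1).mean(axis=1)
    # Temporally filtered illumination curve (guides global flicker removal).
    illum_filt = median_filter(illum, size=5)
    # Singular frames: illumination deviates strongly from the filtered curve.
    resid = np.abs(illum - illum_filt)
    singular = np.where(resid > k * (resid.std() + 1e-8))[0]
    # Exposure masks: local regions that are nearly clipped dark or bright.
    masks = (frames < lo) | (frames > hi)
    return illum_filt, singular, masks
```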

Using this flicker prior, BlazeBVD combines a global flicker removal module (GFRM) and a local flicker removal module (LFRM) to effectively correct the global illumination and local exposure textures of adjacent frames. In addition, to enhance inter-frame consistency, a lightweight temporal consistency model (TCM) is integrated, improving performance without consuming much extra time.



Figure 1: Comparison of BlazeBVD and existing methods on blind video deflickering

Specifically, BlazeBVD consists of three stages (a schematic sketch follows the list):

First, STE is introduced to correct the histogram sequence of the video frames in illumination space and to extract the flicker prior, comprising the singular-frame set, the filtered illumination maps, and the exposure maps.

Second, since the filtered illumination maps are temporally stable, they serve as cues for the global flicker removal module (GFRM), which contains a 2D network, to guide the color correction of video frames. Meanwhile, the local flicker removal module (LFRM) restores the over-exposed or darkened regions marked by the local exposure maps using optical flow information.

Finally, a lightweight temporal consistency model (TCM) is introduced to process all frames, with an adaptive mask-weighted loss designed to improve video consistency.
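Putting the three stages together, the following is a high-level sketch of the inference data flow under assumed interfaces: `gfrm`, `lfrm`, `tcm`, and `flow_net` are hypothetical callables standing in for the paper's modules, and `adaptive_mask_weighted_loss` shows one plausible reading of the mask-weighted loss; none of this is the released implementation.

```python
import torch

def blazebvd_infer(frames, priors, gfrm, lfrm, tcm, flow_net):
    """frames: list of (C, H, W) tensors; priors from the STE stage.
    gfrm/lfrm/tcm/flow_net are hypothetical stand-ins for the modules."""
    illum_filt, singular_frames, masks = priors  # masks: (T, H, W) bool tensors
    outputs, prev = [], None
    for t, frame in enumerate(frames):
        # Stage 2a: GFRM corrects global illumination/color, guided by the
        # temporally filtered illumination cue for frame t.
        x = gfrm(frame, illum_filt[t])
        # Stage 2b: LFRM repairs over-/under-exposed regions flagged by the
        # exposure mask, using optical flow from the previous restored frame.
        if prev is not None and masks[t].any():
            flow = flow_net(prev, x)
            x = lfrm(x, masks[t], prev, flow)
        outputs.append(x)
        prev = x
    # Stage 3: the lightweight temporal consistency model refines all frames
    # jointly to enforce inter-frame consistency.
    return tcm(torch.stack(outputs))

def adaptive_mask_weighted_loss(pred, target, mask, w: float = 2.0):
    # One plausible reading of an adaptive mask-weighted loss: errors inside
    # exposure-mask regions are up-weighted. `w` is an illustrative value.
    weight = 1.0 + (w - 1.0) * mask.float()
    return (weight * (pred - target).abs()).mean()
```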

Comprehensive experiments on synthetic, real, and generated videos demonstrate BlazeBVD's superior qualitative and quantitative results, with model inference roughly 10x faster than the state of the art.



Figure 2: BlazeBVD training and inference process

Experimental Results

Extensive experiments show that BlazeBVD, a general method for blind video deflickering, outperforms previous work on both synthetic and real datasets, and ablation studies verify the effectiveness of each module BlazeBVD introduces.



Table 1: Quantitative comparison with baseline methods



Figure 3: Visual comparison with baseline methods



Figure 4: Ablation experiment

Boosting productivity with imaging technology

This paper proposes BlazeBVD, a general method for blind video deflickering that uses a 2D network to restore low-quality flickering videos affected by illumination changes or local exposure problems. Its core is to preprocess the flicker prior with the STE filter in illumination space; these priors are then combined with the global flicker removal module (GFRM) and the local flicker removal module (LFRM) to correct global flicker and local exposure textures; finally, a lightweight temporal consistency model (TCM) improves the coherence and inter-frame consistency of the video, while also achieving a 10x speedup in model inference.

As an explorer in China's imaging and design fields, Meitu continues to launch convenient and efficient AI features that bring innovative services and experiences to users. As its core R&D center, the Meitu Imaging Research Institute (MT Lab) will keep iterating and upgrading its AI capabilities, offering video creators new ways to create and opening up a broader world.