
A single image can reconstruct the entire painting process: this paper implemented the idea before Paints-UNDO

2024-07-30


AIxiv is a column where Synced publishes academic and technical content. Over the past few years, the AIxiv column has received more than 2,000 submissions covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, please submit it or contact us for coverage. Submission email: [email protected]; [email protected]

About the authors: Song Yiren is a PhD candidate at ShowLab, National University of Singapore. His main research interests are image and video generation and AI security.

Huang Shijie is a second-year master's student at the National University of Singapore and currently an algorithm engineer intern at Tiamat AI, focusing on visual generation. He is looking for a PhD position starting in fall 2025.

Recently, Lvmin Zhang released his latest model, Paints-UNDO. This AI tool can reconstruct the entire painting process from a single image, and it stunned the AIGC community.



Paints-UNDO demo.

As early as a month ago, NUS, SJTU, Tiamat, and other institutions had jointly published ProcessPainter: Learn Painting Process from Sequence Data, which tackles a similar task. Since the Paints-UNDO technical report has not yet been released, let's take a look at how ProcessPainter is implemented!



  • Paper title: ProcessPainter: Learn Painting Process from Sequence Data
  • Paper link: https://arxiv.org/pdf/2406.06062
  • Code link: https://github.com/nicolaus-huang/ProcessPainter

Open any painting instruction book and you will find step-by-step demonstrations. In the era of generative AI, however, the denoising process by which diffusion models generate images is completely different from how human painters work, so the AI generation process cannot be used directly for painting instruction.

To solve this problem, ProcessPainter is the first work to generate painting processes with a diffusion model, by training a temporal model on synthetic data and on videos of human painters at work. Moreover, painting processes vary greatly across subjects and painters, with very different styles, yet few studies have taken the painting process itself as a research object. Building on a pre-trained Motion Model, the authors train Motion LoRA on a small number of painting sequences from a specific painter to learn that painter's techniques.



An in-depth look at ProcessPainter's core technology



1. Temporal Attention

The core innovation of ProcessPainter is learning to generate the painting process with temporal attention. The key to generating a painting sequence is that the whole sequence shows the same picture evolving from abstract to concrete, with adjacent frames consistent and related in content and composition. To achieve this, the authors introduce the temporal attention module from AnimateDiff into the UNet. Placed after each diffusion layer, this module exchanges information across frames through inter-frame self-attention, ensuring smooth transitions and continuity across the whole sequence.
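To make this concrete, here is a minimal PyTorch sketch of an AnimateDiff-style temporal self-attention layer; the shapes and module layout are assumptions for illustration, not the paper's exact code. Spatial positions are folded into the batch so that attention runs along the frame axis:

```python
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    """AnimateDiff-style temporal attention (sketch): each spatial
    position attends across the frame axis, so adjacent frames of
    the painting sequence stay consistent."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, f, c, h, w = x.shape
        # Fold spatial positions into the batch; the sequence axis is frames.
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, f, c)
        normed = self.norm(tokens)
        out, _ = self.attn(normed, normed, normed)
        tokens = tokens + out  # residual connection
        return tokens.reshape(b, h, w, f, c).permute(0, 3, 4, 1, 2)

# Example: a 4-frame sequence of 320-channel UNet features at 32x32.
feats = torch.randn(1, 4, 320, 32, 32)
print(TemporalSelfAttention(320)(feats).shape)  # torch.Size([1, 4, 320, 32, 32])
```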

Experiments show that this training strategy maintains consistent painting effects across frames. What distinguishes painting-process generation from ordinary video generation is that the changes are far more dramatic: the first frame is a low-completion color block or line drawing, while the last frame is a finished painting, which makes training challenging. To this end, the authors first pre-trained the temporal module on a large synthetic dataset, so the model could learn the step-by-step processes of various SBR (stroke-based rendering) methods, and then trained the Painting LoRA model on painting-process data from dozens of artists.

2. Artwork Replication Network

In painting practice, we want to know how a work was painted and how a half-finished painting can be refined toward the desired finished result. This leads to two tasks: painting process reconstruction and painting process completion. Since both tasks take images as input, the authors propose the Artwork Replication Network.

This network can take image input at any frame and flexibly control how the painting process is generated. Similar to previous controllable generation methods, the authors introduce a ControlNet-style variant that constrains specific frames of the generated result to match the reference image.
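As a rough illustration, the sketch below shows how a ControlNet-style branch could inject a reference image's features into only the frame it constrains. The module name, encoder layout, and zero-initialized projection follow the general ControlNet idea, but all details here are assumptions, not the paper's released code:

```python
import torch
import torch.nn as nn

class ArtworkReplicationNet(nn.Module):
    """Hypothetical sketch of a ControlNet-style branch: a reference
    image is encoded, and its features are added only to the frame it
    should constrain (e.g. the first or the last frame)."""

    def __init__(self, channels: int = 320):
        super().__init__()
        # Assumes the encoder output matches the UNet feature resolution.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Zero-initialized projection, as in ControlNet, so training
        # starts from an identity mapping of the base model.
        self.zero_proj = nn.Conv2d(channels, channels, 1)
        nn.init.zeros_(self.zero_proj.weight)
        nn.init.zeros_(self.zero_proj.bias)

    def forward(self, frame_feats, ref_image, frame_idx):
        # frame_feats: (batch, frames, channels, h, w); ref_image: (batch, 3, h, w)
        residual = self.zero_proj(self.encoder(ref_image))
        frame_feats = frame_feats.clone()
        frame_feats[:, frame_idx] = frame_feats[:, frame_idx] + residual
        return frame_feats

# Example: constrain the last frame of a 4-frame sequence.
net = ArtworkReplicationNet()
feats = torch.randn(1, 4, 320, 32, 32)
ref = torch.randn(1, 3, 32, 32)
print(net(feats, ref, frame_idx=-1).shape)  # torch.Size([1, 4, 320, 32, 32])
```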

3. Synthetic Datasets and Training Strategies

Since real painting-process data is hard to obtain and too scarce to support large-scale training, the authors built a synthetic dataset for pre-training.

Three methods were used to synthesize data:

1. Use Learn to Paint to generate painting sequences made of semi-transparent Bézier-curve strokes;

2. Use Neural Style Painting with customized brushstrokes to generate oil-painting-style and Chinese-painting-style sequences;

3. The SBR (stroke-based rendering) methods above fit a target image from coarse to fine, which means already-painted parts can be covered and repainted. However, many painting types, such as Chinese painting and sculpture, cannot significantly modify completed parts due to material constraints; instead, the work proceeds region by region. For this, the authors use SAM (Segment Anything) and saliency detection to add content sub-region by sub-region to a blank canvas, drawing salient objects first and then gradually spreading to the background, thereby synthesizing painting-process videos (see the sketch after this list).
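The snippet below sketches that region-by-region synthesis under stated assumptions: boolean region masks (e.g. from SAM) are revealed in order of decreasing saliency, from salient objects out to the background. The function and its inputs are hypothetical stand-ins, not the paper's actual pipeline:

```python
import numpy as np

def synth_painting_sequence(image, masks, saliency):
    """Assemble a painting-process sequence from a finished image:
    regions are revealed in order of decreasing mean saliency, so
    salient objects appear first and the background last."""
    # Sort region masks by mean saliency, highest first.
    order = sorted(masks, key=lambda m: saliency[m].mean(), reverse=True)
    canvas = np.full_like(image, 255)  # blank white canvas
    frames = [canvas.copy()]
    for mask in order:
        canvas[mask] = image[mask]     # "paint" the next region
        frames.append(canvas.copy())
    return frames

# Example with two hypothetical boolean masks on a 64x64 RGB image.
img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
sal = np.random.rand(64, 64)
m1 = np.zeros((64, 64), bool); m1[:32] = True
m2 = ~m1
print(len(synth_painting_sequence(img, [m1, m2], sal)))  # 3 frames
```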

During training, the authors first pre-trained the Motion Model on the synthetic dataset, then froze its parameters and trained the Artwork Replication Network. When fine-tuning the Painting LoRA model, the first step is to fine-tune the spatial-attention LoRA using only the final frames, to prevent the semi-finished paintings in the training set from degrading the model's generation quality.
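The staged schedule can be summarized as the following parameter-freezing sketch; the nn.Linear modules are stand-ins for the real components, and the stage names are assumptions based on the description above:

```python
import torch
import torch.nn as nn

# Stand-ins for the real components (assumed names, not the paper's code).
motion_model   = nn.Linear(8, 8)  # temporal/motion module
replication_net = nn.Linear(8, 8) # Artwork Replication Network
spatial_lora   = nn.Linear(8, 8)  # spatial-attention LoRA
temporal_lora  = nn.Linear(8, 8)  # temporal-attention LoRA

def freeze(m: nn.Module, frozen: bool = True):
    for p in m.parameters():
        p.requires_grad = not frozen

# Stage 1: pre-train the motion module on synthetic painting sequences.
freeze(motion_model, frozen=False)
stage1_opt = torch.optim.AdamW(motion_model.parameters(), lr=1e-4)

# Stage 2: freeze the motion module, train the replication network.
freeze(motion_model); freeze(replication_net, frozen=False)
stage2_opt = torch.optim.AdamW(replication_net.parameters(), lr=1e-4)

# Stage 3a: fine-tune the spatial LoRA on final frames only,
# protecting image quality from semi-finished training paintings.
freeze(replication_net); freeze(spatial_lora, frozen=False)
stage3a_opt = torch.optim.AdamW(spatial_lora.parameters(), lr=1e-4)

# Stage 3b: freeze the spatial LoRA, fine-tune the temporal LoRA
# on complete painting sequences.
freeze(spatial_lora); freeze(temporal_lora, frozen=False)
stage3b_opt = torch.optim.AdamW(temporal_lora.parameters(), lr=1e-4)
```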

After that, the authors froze the spatial-attention LoRA parameters and fine-tuned the temporal-attention LoRA on complete painting sequences. At inference time, when generating a painting sequence from text, ProcessPainter does not use the Artwork Replication Network. For the reconstruction and completion tasks, it uses the network to take reference input at a specific frame. To make the generated frames match the input image as closely as possible, ProcessPainter applies DDIM inversion to recover the reference image's initial noise and substitutes it for the initial noise of that frame in the UNet.
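A minimal sketch of that last step, assuming a standard deterministic DDIM formulation (the noise-predictor interface and frame layout here are assumptions): invert the reference latent back to initial noise, then overwrite the corresponding frame's noise in the sequence.

```python
import torch

@torch.no_grad()
def ddim_invert(eps_model, latent, timesteps, alphas_cumprod):
    """Minimal DDIM inversion sketch: run the deterministic DDIM
    update backwards, from a clean latent toward the initial noise
    that would regenerate it."""
    x = latent
    for t_prev, t in zip(timesteps[:-1], timesteps[1:]):
        eps = eps_model(x, t_prev)  # predicted noise at the less-noisy step
        a_prev, a_t = alphas_cumprod[t_prev], alphas_cumprod[t]
        x0 = (x - (1 - a_prev).sqrt() * eps) / a_prev.sqrt()  # predicted clean latent
        x = a_t.sqrt() * x0 + (1 - a_t).sqrt() * eps          # step toward more noise
    return x

# Toy usage: a dummy noise predictor, and a 4-frame sequence whose last
# frame's initial noise is replaced by the inverted reference latent.
eps_model = lambda x, t: torch.zeros_like(x)
alphas = torch.linspace(0.9999, 0.01, 1000)
timesteps = torch.arange(0, 1000, 100)
ref_latent = torch.randn(1, 4, 32, 32)
seq_noise = torch.randn(1, 4, 4, 32, 32)   # (batch, frames, c, h, w)
seq_noise[:, -1] = ddim_invert(eps_model, ref_latent, timesteps, alphas)
```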

ProcessPainter results

Trained on the synthetic dataset, the ProcessPainter base model can generate painting sequences in different process styles.



By training Motion LoRA on a small number of painting sequences from human painters, ProcessPainter can learn a specific painter's painting process and style.



Given a reference image, ProcessPainter can reverse-engineer a finished artwork into its painting steps, or infer a completed painting from a work in progress.



The combination of these components allows ProcessPainter not only to generate painting processes from text, but also to convert reference images into painting sequences or complete unfinished paintings. This provides new tools for art education and opens a new track for the AIGC community. Perhaps in the near future, Civitai will host all kinds of LoRAs that mimic the painting processes of human painters.

For more details, please read the original paper or visit the Github project homepage.