
“Jimeng AI” launches: can ByteDance surpass Kuaishou?

2024-08-13


Reporter of China Business Network: Yang Xinyi; Editor of China Business Network: Wei Guanhong

"The pressure is now on Douyin and Jianying." When breakout generative AI (artificial intelligence) products emerged in June this year, led by Kuaishou's self-developed video generation model Keling, observers began watching closely for ByteDance's response.

Recently, the mobile version of "Jimeng AI", a one-stop AI creation platform developed by ByteDance's Jianying team, was officially launched on the Apple App Store.

The reporter of Daily Economic News learned that the app currently offers text-to-image and text/image-to-video generation. In addition, Jimeng has launched a membership system with multiple subscription options.

Comparing Jimeng, Keling and Sora in actual use, the reporter of Daily Economic News found that all three large video generation models capture and understand prompts fairly accurately and completely, but Jimeng falls relatively short in character rendering, richness of content and video smoothness. In terms of length, Jimeng supports generating videos of up to 12 seconds.

"How many seconds of smooth video a model can generate is a key factor in judging the quality of a video generation model," a large-model engineer said in an interview with the Daily Economic News. "'Smoothness' needs to be viewed from multiple dimensions, such as whether the generated content contains factual errors, the model's memory, and its sense of space."


The mobile version of “Jimeng AI” is now available. Image source: App screenshot


“Jimeng AI” is live: will its results surpass Keling's?

At the beginning of this year, the emergence of Sora ushered in the "ChatGPT era" of video. Subsequently, Keling, the "dark horse" launched by Kuaishou, raised expectations for the performance of domestic AI video models. ByteDance, the parent company of Douyin and likewise a short-video giant, is regarded as one of the players with the greatest potential to catch up in this field.


Keling AI web page. Image source: Screenshot of the official website

At the end of March, the AI creation platform "Jimeng AI", developed by ByteDance's Jianying team, opened for internal testing. On May 9, the web version was launched; at launch it had only three major functions (image generation, smart canvas and video generation), and it has since added a story creation function. On August 6, the mobile version was officially launched on the Apple App Store, and it now offers text-to-image and text/image-to-video generation.


Jimeng AI web page. Image source: Screenshot of the official website

As for real-world use of Jimeng, in early July this year, "Sanxingdui: Future Revelation", China's first AIGC continuous-narrative science fiction short series, was launched on Douyin. In this 13-episode series, Jimeng, as the chief AI technical supporter, applied 10 AI technologies, including AIGC script creation, concept and storyboard design, image-to-video conversion, video editing and media content enhancement.

According to media reports, while working with Bona Film Group on "Sanxingdui: Future Revelation", Jimeng AI improved its "video generation" function: it added support for 24fps, 30fps and 60fps frame interpolation and 2x super-resolution, added horizontal and vertical camera movement, and added control over the direction and amplitude of camera movement.


Image source: Screenshot of Jianying WeChat public account

After the launch of the Jimeng App, a reporter from the Daily Economic News selected several Sora video prompts officially released by OpenAI to conduct comparative tests on Jimeng, Keling and Sora.

Judging from the reporter's test results, all three large video generation models capture and understand prompts fairly accurately and completely, and the generated video content is coherent and smooth.

However, in the fineness of character portrayal, Sora has some advantage over Jimeng and Keling; in the naturalness of movement, Jimeng is slightly inferior to the other two products tested. For example, in the video on the theme of "Ladies on the Streets of Tokyo", the head and neck of the character generated by Jimeng are slightly distorted when she turns her head, and the hand holding the bag is also deformed.


Image source: Screenshot of video generated by reporter

In terms of the richness of elements in the generated content, Sora also performed best among the three. For example, in the video on the theme of "astronaut", Sora presented multiple elements associated with the prompt, such as a spacecraft and scenes outside the cabin, while Jimeng and Keling presented only a male character wearing a space suit.


Image source: Screenshot of video generated by reporter

Chen Chen, partner of iResearch Research, said in an interview with the Daily Economic News that in terms of generation results, the overall quality of Jimeng's AI images is good, but its AI videos still fall short in areas such as duration, richness of elements and continuity of motion.

"(The elements are not rich enough) It is more a matter of model alignment. However, if the ability to associate 'astronaut' with 'spaceship' is missing, it is a problem with the base model's capabilities." A large-model engineer pointed out to the reporter of "Daily Economic News" that how many seconds of smooth video a model can generate is a key factor in judging the quality of a large video generation model. "'Smoothness' needs to be viewed from multiple dimensions, such as whether the generated content contains factual errors, the model's memory, and whether its sense of space is correct."

In trials, a reporter from the Daily Economic News found that, for the same prompt, the longer the requested video, the more likely the accuracy and smoothness of the main subject and its movements are to suffer.

At present, Jimeng supports generating 3-second, 6-second, 9-second and 12-second videos, each consuming a different number of points. Sora was already able to synthesize 1-minute videos when it was released. On June 21, Keling launched an image-to-video function that converts static images into vivid 5-second videos according to the text supplied; its continuation function can extend a video by about 5 seconds at a time, and the longest video that can be generated is about 3 minutes.


Will AI video be the gold mine for large models?

The emergence of Sora has undoubtedly opened up a new arena for large models. In July this year, Alibaba DAMO Academy released the one-stop AI video creation platform "Xunguang", SenseTime launched Vimi, the first controllable character video generation large model for consumer (C-end) users, and Zhipu announced that its AI video generation model Qingying (Ying) would be officially launched on Zhipu Qingyan...

As leading AI players collectively push into large video generation models, an unavoidable question arises: can AI video make money for large-model companies?

Take OpenAI, a star company in the industry, as an example. After it launched a number of leading large models such as Sora, some media reported in July this year, citing insiders and an analysis of undisclosed internal financial data, that OpenAI may face a loss of up to US$5 billion this year. The company's annual revenue is estimated at between US$3.5 billion and US$4.5 billion, far below its operating costs.

Meanwhile, domestic video models seem somewhat "anxious" to commercialize. On July 30, Keling launched a global membership system similar to the one it launched in the domestic market. For example, the monthly plan comes in three tiers priced at US$10, US$37 and US$92, which can generate about 66, 300 and 800 5-second videos respectively.

The reporter of Daily Economic News noticed that Jimeng has also launched a membership system. Basic membership can be purchased in several ways: 79 yuan for a single month, 69 yuan with continuous monthly subscription, or 659 yuan for an annual subscription. Basic members receive 505 points per month, enough to generate about 2,020 images or 168 AI videos. In addition, a standard membership with 2,020 points per month and a premium membership with 6,555 points per month will be launched soon.
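
The membership figures quoted for Keling and Jimeng can be turned into rough unit costs. The following is a minimal Python sketch of that arithmetic, not an official pricing calculation: the tier labels are hypothetical names introduced here, the video counts are the approximate ones reported above, and no currency conversion is attempted because the article gives no exchange rate.

```python
# Rough unit-cost arithmetic from the membership figures quoted in the article.
# Tier labels ("tier_1", ...) are hypothetical names used only for this sketch.

# Keling global monthly plans: price in US$ -> approximate number of 5-second videos.
keling_plans = {
    "tier_1": (10, 66),
    "tier_2": (37, 300),
    "tier_3": (92, 800),
}


def unit_cost(price: float, count: int) -> float:
    """Average price per generated item."""
    return price / count


keling_usd_per_video = {
    tier: round(unit_cost(price, videos), 3)
    for tier, (price, videos) in keling_plans.items()
}

# Jimeng basic membership: 79 yuan for a single month, 505 points,
# about 2,020 images or about 168 videos per month.
jimeng_points = 505
points_per_image = jimeng_points / 2020
points_per_video = jimeng_points / 168
yuan_per_video = unit_cost(79, 168)

if __name__ == "__main__":
    for tier, cost in keling_usd_per_video.items():
        print(f"Keling {tier}: about US${cost} per 5-second video")
    print(f"Jimeng: {points_per_image:.2f} points per image")
    print(f"Jimeng: about {points_per_video:.2f} points per video")
    print(f"Jimeng: about {yuan_per_video:.2f} yuan per video")
```

On these figures, Keling's per-video price falls as the tier rises, from roughly US$0.15 to roughly US$0.115. The Jimeng numbers are not directly comparable: the article does not say which video length the figure of 168 videos assumes, and points consumption varies with duration.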

"Because large AI models carry high training and inference costs, and C-end users' demand for AI tools is relatively scattered and their willingness to pay is insufficient, the commercialization of large video models in the C-end market will still face a long incubation period," Chen Chen said.

As for the B-end market, Chen Chen told the reporter of "Daily Economic News": "For the B-end, the AI technology revolution is reshaping existing workflows, compressing redundant steps and creating new demand for creative tools. In this process, large AI video models can gradually be combined with existing film and television production, advertising creative work and media content planning to help automate complex processes and produce content intelligently. Whether model capabilities can be effectively embedded in actual workflows and bring substantial efficiency gains and cost reductions is a key factor in building commercial capability."

"We will gradually begin to explore the commercialization of Kimi, but it is not the current focus. The current focus is still on creating a next-generation model with more powerful capabilities." In August this year, Moonshot AI (Dark Side of the Moon) told a reporter from the "Daily Economic News" that now is not the time to focus on commercialization.

Perhaps the same is true for the fledgling Jimeng, which still has milestones to reach and surpass. "Jimeng's current product functions and business model focus on serving UGC (user-generated content), and ecosystem integration with Douyin will also be a focus of future development," Chen Chen said. "Perhaps direct benchmarking on technical parameters such as duration, frame rate and picture detail is not what Jimeng most needs to focus on at this stage. The key lies in real-world application and ecosystem integration."