Motiff releases China's first large UI model, with benchmark scores surpassing Apple and GPT-4o

2024-08-17

The pace of artificial intelligence development may exceed your imagination. Since GPT-4 brought multimodal technology to the public, multimodal large models have entered a stage of rapid development, gradually shifting from basic model R&D toward exploration and application in vertical fields and deep integration with every industry. In interface interaction, international technology giants such as Google and Apple have invested in UI multimodal large models, widely regarded as a necessary step in the mobile-phone AI revolution.

Against this backdrop, China's first large UI model was born. On August 17, at the IXDC2024 International Experience Design Conference, Motiff, a design tool for the AI era, launched its self-developed multimodal UI large model, the Motiff Large Model. It is the world's first large model developed by a UI design tool company, marking the rise of Chinese UI design on the global stage.



At the IXDC conference, Motiff's vice president introduced China's first large UI model, the Motiff Large Model

The Motiff model has strong UI understanding and can execute open-ended commands. On five industry-recognized UI capability benchmarks, Motiff's scores exceeded those of GPT-4o and Apple's Ferret UI. It also surpassed Google's ScreenAI on the two key metrics of Screen2Words (interface description and inference) and Widget Captioning (component description); its Widget Captioning score reached 161.77, setting a new SoTA. Compared with existing solutions such as Ferret UI and ScreenAI, the Motiff model can flexibly interpret interface elements in context, reaching the level of a "design expert" and coming closest to how humans understand and describe UI interfaces.



In an authoritative UI benchmark comparison, Motiff's model exceeded GPT-4o and Ferret UI on every metric

The model that understands UI best, with near-human expression: a cornerstone of the coming interface interaction revolution

At the IXDC conference, Zhang Haoran, vice president of Motiff, presented the Motiff Large Model in detail. It has two core capabilities, user interface understanding and interactive navigation, and is expected to lead the interface interaction revolution. "Human creation starts with cognition and understanding, and UI creation in the AI era will likewise start with a large model that fully understands the user interface," Zhang said.

The Motiff model's understanding of user interfaces is comparable to that of a "design expert". It can identify every image, icon, and piece of text in an interface, along with more than 40 fine-grained UI component types, and accurately mark the regional coordinates of each element. It can also answer a wide range of questions about the user interface, infer functionality from interface information, and describe interface content in detail.
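To make the capability above concrete, here is a minimal sketch of what a structured UI-understanding result could look like and how it might be queried. Motiff has not published an API, so the schema (field names `type`, `bbox`, `text`) and the sample data are purely illustrative assumptions.

```python
# Hypothetical sketch: a structured UI-understanding result with component
# types and bounding-box coordinates, as described in the article.
# The schema and sample data are assumptions, not Motiff's actual output.

SAMPLE_RESPONSE = {
    "components": [
        {"type": "navigation_bar", "bbox": [0, 0, 375, 44], "text": "App Store"},
        {"type": "icon", "bbox": [330, 8, 360, 38], "text": "search"},
        {"type": "button", "bbox": [280, 120, 360, 152], "text": "GET"},
    ]
}

def element_at(response, x, y):
    """Return the most specific component whose bounding box contains (x, y)."""
    hits = [
        c for c in response["components"]
        if c["bbox"][0] <= x <= c["bbox"][2] and c["bbox"][1] <= y <= c["bbox"][3]
    ]
    # Prefer the smallest region, e.g. an icon nested inside a navigation bar.
    return min(
        hits,
        key=lambda c: (c["bbox"][2] - c["bbox"][0]) * (c["bbox"][3] - c["bbox"][1]),
        default=None,
    )
```

A downstream tool could use such coordinates to answer "what is at this point?" queries, e.g. `element_at(SAMPLE_RESPONSE, 345, 20)` resolves to the search icon rather than the enclosing navigation bar.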

Compared with large models such as GPT-4o, Ferret UI, and ScreenAI, Motiff's model also shows a clear advantage in interface analysis. In the App Store interface, for example, it can divide the page from a UI-design perspective into modules such as the top navigation bar and the application information module, then analyze each module's function and layout in detail, which supports design suggestions, automatic generation of UI design prototypes, and more. Motiff's interface analysis is at an industry-leading level, making it the multimodal large model that understands UI design best.





Motiff's large model can answer a variety of questions about UI interfaces

The Motiff model is also the closest to humans in understanding and expression. Previous solutions such as Ferret UI and ScreenAI struggle to interpret an icon's meaning from context. By collecting a large volume of high-quality UI-domain data through manual annotation and other methods, the Motiff model can recognize and state the different meanings the same icon takes on in different interfaces, significantly improving the accuracy and contextual relevance of its descriptions.



The output in Figure 1 was generated by Google's ScreenAI, which misreads the heart-shaped icon as "heart" rather than "favorite"; the outputs in Figures 2, 3, and 4 were generated by the Motiff model, which accurately describes each icon's meaning using the surrounding interface information.

Motiff also has interactive navigation capabilities: it can suggest operation steps based on user needs and, once granted permission, carry out those operations on the user's behalf. This lays the groundwork for a future interface interaction revolution in which users no longer tap the screen by hand but operate the device through voice or image input alone. Mobile assistants such as Siri may become the new entrance to all apps, truly "smart" phones and computers will emerge, and a new paradigm of software applications and a new era of interface interaction will open.
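The navigation flow described above (derive steps from a goal, obtain permission, then execute) can be sketched as a simple loop. Everything here, including the `Step` structure, the hard-coded planner, and the action names, is a hypothetical illustration; it is not Motiff's actual interface.

```python
# Illustrative sketch of an interactive-navigation flow: plan steps for a
# user goal, then execute them only after explicit permission is granted.
# The step names and planner logic are assumptions for demonstration; a real
# UI model would derive the steps from live screenshots of the device.

from dataclasses import dataclass

@dataclass
class Step:
    action: str   # e.g. "tap", "type"
    target: str   # the UI element to act on

def plan(goal):
    """Stand-in planner returning a fixed step list for one known goal."""
    if goal == "turn on dark mode":
        return [
            Step("tap", "Settings"),
            Step("tap", "Display"),
            Step("tap", "Dark Mode toggle"),
        ]
    return []  # unknown goals produce no steps in this sketch

def execute(steps, permission_granted):
    """Carry out the steps only with user permission; return the action log."""
    if not permission_granted:
        return []
    return [f"{s.action}:{s.target}" for s in steps]
```

The permission gate mirrors the article's point that the model acts on the user's behalf only "after obtaining permission"; without it, `execute` performs nothing.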

In addition, the Motiff model has brought its error rate down into the single digits on some metrics. The industry views such a sharp drop as marking AI's progress from an auxiliary tool toward the "technical singularity" of completing work independently. One of the core problems facing large models today is a high error rate: GPT-4, for example, errs 30% to 40% of the time on several metrics, and more than 70% in the UI field. By contrast, the Motiff model has reduced its error rate to under 15% overall, and to just 7% on individual metrics.

Why can an AI application company independently develop a world-leading large model? Zhang Haoran attributed it at the conference to the relentless pursuit of "making the product better." "As a leader in AI products, Motiff is committed to breaking through technical bottlenecks from real application scenarios and continuously raising the bar for AI capability," he said.

Motiff can "beat" GPT-4o, Apple Ferret UI and other international leading models in the UI field, thanks to its long-term technical accumulation. Since its establishment in 2021, Motiff has been focusing on interface interaction and design. Its parent company, Yuanfudao Group, established the AI ​​Lab in 2014 to focus on the frontier exploration of AI technology. In 2018, Yuanfudao Group ranked first in the world in the well-known machine reading competition MSMARCO. At that time, the machine reading comprehension ability had surpassed Baidu and Microsoft.

AI efficiency + generation: a powerful assistant for designers that streamlines the software development workflow

There is no doubt that the release of the Motiff model marks a "GPT moment" for interface design. UI design demands precise handling of visual elements and a deep understanding of user interaction logic, and Motiff significantly improves both the efficiency and the generative capabilities of design tools.

With Motiff's AI UI generation feature, users simply enter a command and, in under 30 seconds, Motiff generates two versions of a design draft. In recent blind reviews by more than 30 industry professionals, the drafts generated by Motiff were rated better than those of Galileo AI, the previously leading AI UI generation tool, and the feature quickly became a market leader after release. This advantage stems directly from the UI expertise of Motiff's large model. The AI UI generation feature opened to global users on August 17; registering a Motiff account is all it takes to try it for free.



In addition, the accuracy of component recognition in the AI design system feature has improved markedly. With the large model's support, designers can finish in minutes work that used to take at least several weeks, and the AI copy feature recognizes images and text in design drafts with over 97% accuracy.

"Motiff will plan to open its large model capabilities to medium and large enterprises, and work with customers to create a new interface production relationship in the AI ​​era." Zhang Haoran pointed out at the meeting that Motiff's large model can help optimize the software development workflow and effectively shorten the existing interface production process.

Motiff has already set many "firsts" in the interface design industry. Beyond pioneering multiple AI features, it is the first interface design software in China built on a self-developed graphics rendering engine and the only high-performance product in the world that can smoothly edit one million layers on a single canvas. Known on overseas social media as the "AI version of Figma," Motiff raises productivity more than 100-fold while costing over 80% less than Figma, and in July it topped the daily and weekly "most popular products" lists on the well-known launch platform Product Hunt.



(Overseas social platforms carry many posts comparing Motiff and Figma; Motiff is described as an AI tool every designer needs to know.)