
Llama 3.1 is not selling well! Industry insiders: Open source models are more expensive

2024-08-27


Yunzhong, reporting from Aofei Temple
QbitAI | WeChat official account QbitAI

Meta's open-source large model Llama 3 has received a cool reception in the market, further intensifying the debate between open-source and closed-source large models.

According to the foreign outlet The Information, Meta's open-source large model Llama 3 has struggled to gain traction on AWS, the world's largest cloud vendor. AWS's corporate customers prefer to use Anthropic's closed-source large model Claude.

According to a Microsoft insider, Llama is not Microsoft's first choice when selling; salespeople only recommend Llama to companies with data expertise, such as those with in-house engineers and data scientists.

The challenges Meta now faces may push it to build its own sales team for AI products to serve enterprise needs. This series of problems also highlights how hard it is to commercialize open-source large models: judging by market choices, the actual results and commercial returns of open-source models may fall short of enterprise customers' expectations.

face"Open source or closed source", domestic major model manufacturers have formed completely different positions based on their own technical routes and business strategies. So, how should enterprises choose large models, and how to find the best balance between the two?

Against this backdrop, Xin Zhou, General Manager of Baidu Smart Cloud's AI and Large Model Platform, gave a media interview in which he analyzed in detail the underlying logic and business strategies behind the open-source vs. closed-source debate, as well as his predictions for the future market.

Xin Zhou believes there is an essential difference between open-sourcing a large model and open-sourcing software: because open-source models do not disclose the key ingredients that determine model quality, such as training source code and pre-training and fine-tuning data, they cannot rely on community developers to improve quality and performance the way open-source software does. Training of the base model can only be controlled by the vendor itself.

When asked whether open-source or closed-source models are more expensive, Xin Zhou said that open-source models are free, which gives the impression of low cost. But applying large models is not a single technology; it is a complete "technology + service" solution, and enterprises need to calculate the "total account". When a business actually goes live, matching the results of a closed-source model with an open-source one requires substantial manpower, funding, and time, so the overall cost ends up higher.

Which scenarios suit open-source models, and which suit closed-source ones? Xin Zhou believes open-source models are better suited to academic research than to large-scale commercial projects that serve external users. In serious projects worth millions or even tens of millions, closed-source models remain the mainstay.

“The open source model is not cheap”

The following is an edited transcript of the conversation:

1. What roles do various model manufacturers play in the large model market? What is their business model?

Xin Zhou: In this large-model feast, each vendor's positioning and business model are different; they can be roughly divided into three categories:

The first type is the pure cloud vendor, whose business model is essentially selling computing resources. Reducing costs and improving resource elasticity through scale to achieve profitability is a long-standing model for cloud vendors. Whether a model is open source or closed source, as long as it is hosted by the cloud vendor, the cloud vendor makes money.

The second type is both a cloud vendor and a model vendor. They hope that model calls will drive business onto their cloud. At present, the profit from model API calls alone is still very low, so they are focused on capturing a favorable share of the market while constantly looking for new expansion opportunities at the large-model table.

The third type is the entrepreneurial model vendor. After the major cloud vendors announced model price cuts, these startups' call volumes dropped sharply. The large-model field will soon become a battle among a few major cloud vendors, so large-model startups will either focus on specific industries and do to-B privatized deployment projects, or pivot to to-C products.

2. Why do you say that "open-source models are not cheap and their technology will fall increasingly behind"?

Xin Zhou: Let's start with the problem of the technology falling behind.

First, open-sourcing a large model does not improve the model's quality.

Compare this with open-source software. The Android mobile operating system and the MySQL database, for example, are both open-source software; developers from all walks of life can participate in developing the code. This not only reduces the cost of software development but also speeds up iteration and improves security. That is the value of open source for software.

Open-sourcing a model is much more complicated. What could be open-sourced includes the training source code, the parameter weights, the training data, and so on. However, model vendors currently tend to open-source only the parameter weights, while the training source code and training data remain closed. This makes it impossible for developers to improve the model or contribute to its quality.

Take Llama as an example: every improvement in its performance is the result of Meta's own training, not of developer participation. Llama 2 and Llama 3 do not differ much in network structure. What did Meta optimize? On the one hand, the training process, such as multi-stage training; on the other hand, it added far more data. The training data for Llama 2 and Llama 3 differ by an order of magnitude, and more data and more training time bring better results.

But these gains were all created by Meta itself. It cannot leverage the power of outside developers, and there is no community feedback loop like the one open-source software has.

Second, open-source models will fall increasingly behind because there is no sound business model to fund continuous iteration.

Model training and data labeling are very expensive. Unless you have strong corporate resources like Meta's to sustain continuous development of an open-source model, a startup that open-sources its model will not be able to close the business loop. At the same time, developers cannot contribute to the model's quality, so startups taking this path will inevitably fall further and further behind. Judging by results, the best model today is actually OpenAI's, and the models at the top of the evaluation leaderboards are all closed source.

Now let's talk about why the open-source model is not cheap. Applying a large model is a complete "technology + service" solution, and enterprises need to "calculate the total account". How do you calculate that total?

The first layer is hardware resource cost. Closed-source commercial models come with corresponding tool chains, including training and inference tool chains, whose performance is better than the open-source equivalents. For customers, training can save roughly 10-20% of hardware costs, and inference can save even more; the larger the business, the greater the savings.

The second layer is the business benefit the model brings. At the same parameter scale, closed-source models perform better. Some customers are not very sensitive to the difference between 90% and 95% accuracy. But for some businesses, such as commercial advertising, a one-point difference in CPM or CTR can mean a difference of tens of millions per day for an advertising platform. Companies with such high requirements for model performance are more willing to buy a better-performing closed-source model.

The third layer is opportunity cost and labor cost. With a closed-source commercial model, you can converge faster and launch new products ahead of competitors. The vendor has already adapted the model and hardware to an optimal state, so customers simply copy mature experience. With open source, you still have to adapt and tune everything yourself, and the cost in compute and engineers is higher.

So when we say that enterprises applying large models need to "calculate the total ledger", the resulting totals can differ greatly.
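To make this "total account" concrete, here is a minimal back-of-envelope sketch in Python. All figures and the function names are illustrative placeholders based only on the ranges mentioned above (for example the 10-20% tool-chain saving), not real quotes from any vendor.

```python
# A minimal back-of-envelope sketch of the "total account" described above.
# All figures are illustrative placeholders, not real numbers from any vendor.

def total_account(hardware_cost, toolchain_saving_rate,
                  revenue_impact, engineering_cost, delay_cost):
    """Rough total cost of ownership for one deployment option.

    hardware_cost         : raw compute spend for training and inference
    toolchain_saving_rate : fraction of hardware cost saved by an optimized tool chain
    revenue_impact        : business value attributed to model quality (subtracted)
    engineering_cost      : in-house adaptation and tuning labor
    delay_cost            : opportunity cost of a slower launch
    """
    effective_hardware = hardware_cost * (1 - toolchain_saving_rate)
    return effective_hardware + engineering_cost + delay_cost - revenue_impact

# Hypothetical comparison: a closed-source offering with a tuned tool chain
# versus a self-adapted open-source stack at the same raw hardware spend.
closed = total_account(hardware_cost=10_000_000, toolchain_saving_rate=0.15,
                       revenue_impact=2_000_000, engineering_cost=500_000,
                       delay_cost=0)
open_stack = total_account(hardware_cost=10_000_000, toolchain_saving_rate=0.0,
                           revenue_impact=1_500_000, engineering_cost=2_000_000,
                           delay_cost=1_000_000)
print(f"closed-source total: {closed:,.0f}   open-source total: {open_stack:,.0f}")
```

Under these made-up inputs the "free" open-source option comes out more expensive once adaptation labor and launch delay are counted, which is the shape of the argument above.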

3. Why is open source so much more expensive than closed source in terms of hardware costs?

Xin Zhou: Most enterprise customers purchase two or more types of hardware because they need to consider supply-chain security and flexibility. If an open-source model has to be adapted to each kind of hardware separately, the cost is very high.

That is where a closed-source commercial model has an advantage: it can amortize the cost of hardware and software adaptation across large-scale sales. Moreover, multi-chip adaptation is highly technical work. The Baige heterogeneous computing platform has done a great deal of optimization specifically for multi-chip heterogeneity and can adapt to a wide range of hardware. Baige shields the differences at the hardware layer and provides many acceleration, inference, and training libraries; it also offers end-to-end optimization for the Wenxin large models.

The benefit for customers is that whatever hardware they use, the model runs well right away, and the time and labor saved are substantial.
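For readers curious what "shielding hardware differences" can look like in practice, below is a purely hypothetical Python sketch of a backend registry. The names and interfaces are invented for illustration and are not Baige's actual API.

```python
# Hypothetical sketch of a hardware-abstraction layer of the kind described above.
# NOT Baige's real API; chip names and interfaces are invented for illustration.

from typing import Callable, Dict

# Registry mapping a chip identifier to an inference backend tuned for it.
_BACKENDS: Dict[str, Callable[[str], str]] = {}

def register_backend(chip: str):
    """Decorator that registers an inference function for one hardware target."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        _BACKENDS[chip] = fn
        return fn
    return wrap

@register_backend("gpu_vendor_a")
def infer_on_a(prompt: str) -> str:
    return f"[vendor-A kernel] {prompt}"

@register_backend("gpu_vendor_b")
def infer_on_b(prompt: str) -> str:
    return f"[vendor-B kernel] {prompt}"

def run(prompt: str, chip: str) -> str:
    """Application code calls one entry point; the platform picks the adapted backend."""
    if chip not in _BACKENDS:
        raise ValueError(f"no adapted backend for chip {chip!r}")
    return _BACKENDS[chip](prompt)

if __name__ == "__main__":
    print(run("hello", "gpu_vendor_a"))
```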

4. What scenarios are open-source and closed-source models each suited to?

Xin Zhou: The general idea is this: if you want to experiment and validate in an individual business scenario, you can first use a closed-source model out of the box to verify quickly. In serious commercial projects worth millions or tens of millions of yuan, in businesses with high requirements for scale and precision, closed-source commercial models are still the best choice for enterprises. Only in business scenarios that do not demand high quality and performance, but do require private deployment and are particularly price-sensitive, should you consider an open-source model.

Open source is valuable for advancing academia and research, for example the engineering optimization of inference performance, or the impact of pre-training and fine-tuning data on results. If a vendor open-sources more, such as training code, training data, and instruction fine-tuning data, it becomes even more valuable for academic research and technological development. Even opening only the model weights gives researchers a good base model.

5. Some vendors hope to pursue open source and closed source at the same time: the open-source model attracts users and grows the ecosystem, while the closed-source model handles commercialization. Does this logic work?

Xin Zhou: If you haven't tried it in practice, it seems feasible. But the reality is:

On the public cloud, the call volumes announced by various vendors show that closed-source models are called far more than open-source ones, which means open-source models are not actually pulling users in and growing the ecosystem there. Moreover, fine-tuning can be done on the public cloud with both open-source and closed-source models, so customers simply pick the best model available.

In private deployment, this logic makes sense to a degree. Many companies start by testing an open-source model and, if the results look good, decide to buy the closed-source model from the same vendor, because models from the same lineage adapt better to their existing prompts. In that case the logic holds. But this value is gradually shrinking: the general capabilities of models from all vendors are improving rapidly and switching costs keep falling, so the advantage of staying within one model lineage is gradually disappearing.

Some vendors release open-source models to promote hardware. NVIDIA, for example, has released open-source models; the business logic is very simple: you have to buy its cards to run the model.

6. Why hasn’t Baidu launched an open source model?

Xin Zhou: The call volumes disclosed by various vendors make it clear that commercial closed-source models account for the most calls on the public cloud, and open-source models have little impact there.

In the private-deployment market, as customers' understanding of large models improves, open source versus closed source is no longer a key factor. After talking with many large enterprise customers, I found that many factors determine whether a business leader adopts a model. The usual priority order is: quality, performance, security, then price. Whether the model is open source or closed source is not decisive.

7. You mentioned that when companies choose models, they value quality, performance, security, and price most. Is the "Qianfan Large Model All-in-One Machine" launched by Baidu Smart Cloud an attempt at a new business model that integrates software and hardware?

Xin Zhou: Enterprises are still at the exploratory stage of using large models, and they badly need low-cost, out-of-the-box products to quickly validate scenarios and results. The "Qianfan Large Model All-in-One Machine" fits this stage well, because demand for private deployment in China is strong. Our all-in-one machine is open and can be adapted to a wide range of hardware; it integrates the mainstream chips and models on the market. Baidu Smart Cloud's Qianfan all-in-one machine provides two capabilities:

First, an integrated platform for software and hardware adaptation. The platform has the Wenxin large models, the industry's mainstream open-source large models, and a showroom of scenario applications built in. Popular open-source models have already been adapted and optimized, so users can run them directly on the machine without tuning the models themselves. At the same time, the Qianfan all-in-one machine offers an integrated hardware-and-software solution spanning basic management and control, the AI framework, model training, prediction and inference, and scenario applications, giving customers full-process software and hardware services.

Second, the Qianfan all-in-one machine has undergone end-to-end performance optimization that squeezes the most out of the hardware, so it is very cost-effective and customers can get started quickly at low cost.

In terms of overall price, the Qianfan all-in-one machine costs far less than purchasing servers, large models, and a platform separately, and it works out of the box for customers.

8. Many people now think that basic large models alone are not enough; industry models are needed to truly bring large models into industrial use. So how much does it cost an enterprise to train an industry model itself?

Xin Zhou: The cost is very high. First, it depends on the parameter scale of the model to be trained, and the cost grows linearly with it. Second, it depends on how much data you have. Finally, there is the cost of labeling that data.

Training a model with 70 billion parameters from scratch may cost around 30 million using elastic cloud resources; a model with more parameters may cost hundreds of millions. That is for an experienced team. If you are inexperienced and take detours along the way, the cost will be even higher.
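As a rough illustration of the linear scaling Xin Zhou describes, the sketch below anchors on the quoted figure of roughly 30 million for a 70-billion-parameter model and extrapolates; the extrapolated parameter count, data multiplier, and labeling figure are assumptions for illustration, not quoted numbers.

```python
# Illustrative cost model based on the linear scaling described above.
# Only the "70B for about 30 million" anchor comes from the interview;
# everything else here is an assumption for the sake of the example.

def training_cost(params_b: float, data_multiplier: float = 1.0,
                  labeling_cost: float = 0.0,
                  base_params_b: float = 70.0,
                  base_cost: float = 30_000_000) -> float:
    """Scale the quoted 70B / 30M anchor linearly in parameters and data volume."""
    compute = base_cost * (params_b / base_params_b) * data_multiplier
    return compute + labeling_cost

print(f"70B, baseline data:        {training_cost(70):,.0f}")
print(f"400B, 2x data, + labeling: {training_cost(400, data_multiplier=2, labeling_cost=5_000_000):,.0f}")
```

The second line lands in the hundreds of millions, consistent with the order of magnitude mentioned above.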

9. With such high costs, how do companies determine whether they need to create an industry model?

Xin Zhou: We do not recommend that customers build an industry base model from day one regardless of cost. Whether it pays off is another matter, but the cost is bound to be very high. We help customers do a demand analysis first.

For example, draw a coordinate system with the horizontal axis representing task sensitivity and the vertical axis representing demand for industry data. Task sensitivity means how strongly the scenario is tied to the industry and its business; in the medical field, for instance, the questions are highly specialized. The vertical axis, demand for industry data, reflects how closed the industry is: the less of its data is available on the public web, the more pre-training is needed. In medicine, for example, de-identified medical-record information needs to be pre-trained into the model.

On these axes, the lower-left corner has neither strong industry characteristics nor a need for industry data, so a general-purpose model can be used directly. The upper-right corner, by contrast, is highly sensitive to the industry's business attributes and requires a large amount of industry data; that is where an industry model is needed.
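The quadrant analysis can be summarized as a simple decision rule. The sketch below is illustrative only: the 0-1 scales and the threshold are made up, not Baidu's actual methodology.

```python
# A small sketch of the quadrant analysis described above: task sensitivity on one
# axis, demand for industry data on the other. Scales and thresholds are invented.

def recommend_model(task_sensitivity: float, industry_data_need: float,
                    threshold: float = 0.5) -> str:
    """Map a (sensitivity, data-need) pair, each in [0, 1], to a rough recommendation."""
    if task_sensitivity < threshold and industry_data_need < threshold:
        return "use a general-purpose model directly (lower-left quadrant)"
    if task_sensitivity >= threshold and industry_data_need >= threshold:
        return "build an industry model with pre-training on domain data (upper-right quadrant)"
    return "start with a general model plus fine-tuning, and reassess later"

# Example: a medical scenario that is highly specialized and relies on
# de-identified records not available on the public web.
print(recommend_model(task_sensitivity=0.9, industry_data_need=0.85))
```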

We usually recommend that companies proceed in three steps.

The first step is to verify value: build the initial large-model software and hardware infrastructure and a preliminary industry model, then pair it with relatively mature generative-AI applications so results show quickly, for example the lightweight edition of the Qianfan large model platform plus mature applications such as intelligent customer service, enterprise knowledge management, and digital humans.

The second step is to connect deeply with the enterprise's various applications. The large-model infrastructure is upgraded to the Qianfan large model flagship edition, which includes an application-building platform in addition to model training and tuning. Baidu and its ecosystem partners get deeply involved in the enterprise's internal large-model training and operations, building a technical culture, developing relevant talent, and working with the enterprise to tackle hard problems deep in its business to deliver more value.

The third step is comprehensive innovation under the enterprise's own control. Having mastered the technologies for large-model and application development and built a corresponding talent pool, the enterprise can develop more autonomously and begin innovating across the board. Baidu acts as long-term technical support and consultant, assisting development and continuously bringing in new technologies and solutions.

10. What is your judgment on the large model market in the coming year?

Xin Zhou: I have three judgments about the development trend in the coming year:

First, multimodality will become a new hot spot in the market.

Second, applications built on large models will see a major explosion, and one very important direction is the Agent. If a large model only performs prescribed "input in, output out" actions, its value is greatly limited. It should work more like a human: using tools, collaborating, planning, thinking, reflecting, and iterating. It has to be combined with various components and plug-ins to meet the needs of specific business scenarios, so Agents will become key to the next wave of call-volume growth for model vendors (see the sketch at the end of this answer).

Third, there will be more opportunities in enterprise applications such as knowledge bases, customer service, digital humans, and code-writing assistance. For example, Baidu has a product called "Wenxin KuaiMa" that uses large models for code writing. It is widely used inside Baidu, with an adoption rate of 46% and 30% of new code generated by it, helping companies significantly improve development efficiency. At the same time, a large number of companies focused on AI application development will emerge; those that can push the deployment and replication cost of applications low enough and operate efficiently enough will stand out.
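As an editorial illustration of the agent pattern described in the second judgment above (planning, tool use, reflection, iteration), here is a minimal generic loop. The tools, the planner, and the stopping rule are invented for the example and are not tied to any vendor's framework.

```python
# A minimal, generic agent-loop sketch: plan, call a tool, record the observation,
# reflect, and stop. The tool set and planner below are toy stand-ins.

from typing import Callable, Dict, List

Tool = Callable[[str], str]

TOOLS: Dict[str, Tool] = {
    "search": lambda q: f"(search results for: {q})",
    "calculate": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy calculator
}

def plan(task: str, history: List[str]) -> str:
    """Stand-in planner: a real agent would ask a large model which tool to use next."""
    return "calculate" if any(ch.isdigit() for ch in task) else "search"

def agent(task: str, max_steps: int = 3) -> List[str]:
    history: List[str] = []
    for _ in range(max_steps):
        tool_name = plan(task, history)
        observation = TOOLS[tool_name](task)
        history.append(f"{tool_name} -> {observation}")
        # Reflection step: stop once an observation looks usable (toy criterion).
        if observation:
            break
    return history

print(agent("3 * (7 + 5)"))
```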