
The price war among large models has pushed gross margins negative

2024-09-24



The shakeout among domestic large models is accelerating. This elimination round will last one to two years, and only a few truly capable foundation-model companies will survive.

Text | Wu Junyu, special contributor to Caijing

Editor | Xie Lirong

The price war over large models in the Chinese market has been running for nearly half a year. It has already pushed gross margins negative, and there is no sign of it stopping: the leading cloud vendors are planning yet another round of price cuts, to be implemented in late September this year.

In May this year, Chinese cloud vendors started a price war over large-model inference. ByteDance's cloud service Volcano Engine, Alibaba Cloud, Baidu AI Cloud, and Tencent Cloud successively cut the price of large-model inference by more than 90%.

To use a large model, you submit a prompt and receive the model's reasoned output. Each request goes through an API (application programming interface, something like a utility meter) and is billed by the number of tokens consumed (the token is the large model's unit of text; a token can be a word, a punctuation mark, a number, or a symbol). It is like paying for water and electricity by usage.
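The pay-per-token billing described above can be sketched in a few lines. The function and all the rates and token counts here are illustrative, not any vendor's actual API or prices; real vendors publish their own per-million-token rates, usually with separate input and output prices.

```python
# Hypothetical sketch of pay-per-token API billing. Prices and token
# counts are made up for illustration; input and output tokens are
# typically billed at different per-million-token rates.

def api_cost(input_tokens, output_tokens,
             input_price_per_m, output_price_per_m):
    """Return the cost of one call, billed per million tokens."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# A call with 2,000 prompt tokens and 500 output tokens, at
# illustrative rates of 40 yuan (input) / 120 yuan (output) per million:
cost = api_cost(2_000, 500, 40, 120)
print(f"{cost:.3f} yuan")  # 0.08 + 0.06 = 0.140 yuan
```

At these rates a single call costs a fraction of a fen, which is why vendors quote prices per million tokens.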

After the price cuts, consumption of inference compute has indeed grown rapidly. On Baidu's second-quarter earnings call in August, the company disclosed that average daily API calls to its Ernie models rose from 200 million in May to 600 million in August, while average daily token consumption rose from 250 billion to 1 trillion over the same period. ByteDance announced in August that, as of July, average daily token usage of its Doubao models exceeded 500 billion, and that daily token usage by enterprise customers had grown 22-fold compared with May.

Token prices have dropped by more than 90%, which reduces cloud vendors' inference revenue in the short term. But vendors hope that lowering the trial-and-error threshold for enterprise customers in this way will drive compute consumption up tenfold or more, ultimately delivering long-term revenue growth.

The inference price war in the domestic large-model market has now lasted half a year. Three basic facts stand out:

First, the price war has already pushed gross margins negative. Several cloud vendors, including Alibaba Cloud and Baidu AI Cloud, recently told us that before May this year the gross margin on domestic large-model inference was above 60%, roughly in line with international peers. After the major vendors cut prices in May, the gross margin on inference turned negative.

Second, domestic models are generally priced at only 20%-50% of OpenAI models of the same tier, and their gross margins are far lower than OpenAI's. According to a research report released in August by FutureSearch, an international market research firm, the gross margin of OpenAI's GPT-4 series flagship models is about 75%, that of its GPT-4o series mainstream models about 55%, and OpenAI's blended gross margin at least 40%.

Third, insufficient model capability is an important cause of the price war. A senior executive in charge of one cloud vendor's large-model business believes that domestic flagship models are generally not yet as capable as OpenAI's GPT-4 series flagships, so vendors must encourage customers to experiment by cutting prices. As prices keep falling, price is no longer the decisive factor for enterprise customers; what they care about most is model capability and real-world results.

The inevitable price war

We checked the inference prices published on the official websites of Alibaba Cloud, Volcano Engine, Baidu AI Cloud, Tencent Cloud, and OpenAI. Compared with OpenAI models of the same tier, domestic models are generally priced at only 20%-50%.

For example, the three flagship models Tongyi Qianwen-Max (Alibaba), ERNIE-4.0-8K (Baidu), and Hunyuan-Pro (Tencent) charge 120 yuan, 120 yuan, and 100 yuan per million output tokens, respectively. The OpenAI flagship they benchmark against, GPT-4-Turbo, charges 210 yuan per million output tokens (listed at 30 US dollars on OpenAI's website, converted here at a dollar-yuan rate of 1:7). These three domestic flagships are thus priced at only about 50% of GPT-4-Turbo.

Similarly, the three entry-level models Qwen-Long (Alibaba), ERNIE-Speed-Pro-128K (Baidu), and Hunyuan-Embedding (Tencent) charge 2 yuan, 0.8 yuan, and 5 yuan per million output tokens, respectively. OpenAI's budget model GPT-4o mini charges 4.2 yuan per million output tokens (listed at 0.6 US dollars, converted at 1:7). Alibaba's and Baidu's entry-level models are priced at only 48% and 19% of OpenAI's.
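The percentages above follow directly from the article's listed prices and its 1:7 conversion. A quick check, using only figures already quoted in the text:

```python
# Reproducing the article's entry-level price comparison: convert
# OpenAI's USD price to yuan at the stated 1:7 rate, then express each
# domestic model's output price as a share of it.

USD_TO_CNY = 7.0

openai_mini_usd = 0.6                          # GPT-4o mini, per million output tokens
openai_mini_cny = openai_mini_usd * USD_TO_CNY  # 4.2 yuan

domestic = {"Qwen-Long": 2.0, "ERNIE-Speed-Pro-128K": 0.8}
for name, price in domestic.items():
    print(f"{name}: {price / openai_mini_cny:.0%}")
# Qwen-Long: 48%
# ERNIE-Speed-Pro-128K: 19%
```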

The price war has already pushed gross margins negative, but that has not stopped cloud vendors from cutting further.

We have heard that Alibaba Cloud and other leading cloud vendors are planning a new round of price cuts for late September this year, with high-performance flagship models as the focus.

The cloud-vendor executive quoted above believes there is little room left to cut prices on cheap, small models: the last round already hit enterprise customers' "psychological floor". The next question is whether each company's flagship models will keep getting cheaper. Flagships will also be further segmented, into cost-effective versions that solve most problems and premium, higher-priced versions for the hardest ones.

If inference is already at negative gross margin, why keep cutting prices?

The large cloud providers are looking at the long-term trend: the compute mix of cloud computing is shifting dramatically, and capturing more inference workloads means capturing more of the incremental market. IDC, an international market research firm, forecasts that from 2022 to 2027 China's general-purpose compute will grow at a compound annual rate of 16.6% and its AI compute at 33.9%. Within AI compute over the same period, the share of inference will rise to 72.6% while the share of training falls to 27.4%.

Cloud vendors are trading short-term revenue for expected long-term growth. In the near term, inference brings in little revenue: a technical staffer at a Chinese cloud vendor explained that no vendor's model-call revenue will exceed 1 billion yuan in 2024, a small sum for businesses with tens of billions of yuan in annual revenue. Vendors are willing to accept revenue and operating losses over the next one to two years. The bet is that large-model call volume will grow at least tenfold in that time, so that long-term growth more than covers the short-term losses.

He added that as customer demand grows, compute costs will be spread across more usage, so the large-model business still has a chance to turn profitable in the end. Even if the bet fails, a cohort of model makers will die in the price war and the survivors will pick up their customers.

Different cloud vendors have different competitive calculations, but Volcano Engine, Alibaba Cloud, and Baidu AI Cloud all see this as a war they must fight.

Volcano Engine is not yet among the top five in China's public-cloud market by share, but its revenue grew more than 150% in 2023, and large models are an important opportunity for it to catch up. Tan Dai, president of Volcano Engine, told us in May that during a March visit to Silicon Valley he saw US AI application startups showing the dynamics of China's early mobile-internet era of 2012-2014: "Small AI application teams were quickly reaching revenue and raising funding. The Chinese market may follow this trend, but only if inference prices come down and the threshold for trial and error is lowered."

Alibaba Cloud ranks first in China's public-cloud market and must match its competitors' cuts. Liu Weiguang, general manager of Alibaba Cloud's public cloud division, told us in June that after multiple rounds of modeling the company identified two tensions:

  • First, after a price cut, revenue from existing usage falls while incremental revenue grows; ideally the increment covers the loss on the existing base.

  • Second, what if competitors cut even more aggressively? The conclusion: scale matters more than profit right now. Alibaba Cloud wants to use large models to raise cloud computing's penetration across the whole industry.

Baidu AI Cloud treats AI as its core strategy. A Baidu large-model technology leader told us in July that large models are a battle that must be fought, price war included. The strategy has shown results: Baidu AI Cloud's revenue grew 14% in the second quarter of 2024, its fastest in two years, and management disclosed on the second-quarter earnings call that large-model revenue rose from 4.8% of Baidu AI Cloud's revenue in the fourth quarter of 2023 to 9% in the second quarter of 2024.

An AI strategist at a leading Chinese technology company offered this analysis: Volcano Engine is backed by ByteDance, whose advertising business can subsidize it; not yet a top-five cloud player, it hopes to grab share through the price war. Alibaba Cloud's revenue comes mainly from the four core public-cloud products (computing, storage, networking, and databases); cheap models drive customers' data consumption and thus sales of those underlying products. For Baidu, large models are the core strategy: it was the first in China to build the business, so when competitors start a price war, it has to follow.

Price is not the deciding factor

The flip side of the negative-margin price war is that low price is not the main factor in whether enterprise customers adopt large models.

The cloud-vendor executive quoted above argues that vendors cannot rely on sustained losses to push large models into industry, and that low-performance, low-price models mean little. Insufficient capability is the real driver of the negative-margin price war. With domestic call prices down sharply, price is no longer what enterprise customers worry about most; capability and results are.

An IT director at an insurance company agrees. He says bluntly that IT spending in financial insurance currently runs about 3%-5% of company revenue, and after excluding the roughly 80% spent on hardware, only about 20% of the IT budget actually goes to digital transformation. Adopting a new technology like large models requires calculating the return on investment. Beyond the explicit model costs there are implicit ones: the model must be integrated with existing IT systems, data governance is needed to ready business data for it, and product managers who understand AI must be hired. What he cares about most is model capability and actual results.

The Center for Research on Foundation Models (CRFM) at Stanford University has long run global model benchmarks. Its Massive Multitask Language Understanding (MMLU) leaderboard as of September 17 shows a top ten dominated by the Claude 3.5 series from AI startup Anthropic (backed by Amazon), Meta's Llama 3.1 series, the GPT-4 series from OpenAI (backed by Microsoft), and Google's Gemini 1.5 series. Among Chinese models, only Alibaba's Qwen2 Instruct (72B) currently makes the top ten.

Large-model technology experts at several Chinese cloud vendors expressed the same view to Caijing: in this market, a low-performance, low-price strategy is unsustainable. Ideally, a healthy and durable business loop rests on high performance at reasonable prices.

A more instructive benchmark is OpenAI. As of September this year, OpenAI had 1 billion monthly active users and 11 million paying users (10 million individual subscribers and 1 million corporate subscribers). In May, OpenAI management said annualized revenue (monthly revenue × 12; subscription software companies collect renewals every month and so have stable revenue expectations, hence the annualized figure) had reached 3.4 billion US dollars, or roughly 24 billion yuan.

Based on the disclosed annualized revenue and subscriber mix, the latest FutureSearch report breaks down OpenAI's revenue: 10 million individual subscribers bring in 1.9 billion US dollars, or 56%; 1 million corporate subscribers bring in 710 million, or 21%; and API calls bring in 510 million, or 15%.

Even after multiple rounds of price cuts, OpenAI maintains relatively healthy gross margins. In April the output price of its flagship GPT-4-Turbo was cut by 67%; in August the output price of its mainstream GPT-4o was cut by 30%. FutureSearch's August report still puts the GPT-4 series flagships at roughly 75% gross margin, the GPT-4o series at roughly 55%, and the blended margin above 40%.

OpenAI enjoys a uniquely favorable environment: ample compute, a huge consumer (to-C) user base, and the world's largest enterprise (to-B) software market.

OpenAI's success over the past two years was achieved by "brute force" with enormous compute. Chinese companies lack OpenAI's compute and financing environment; compute is their key shortcoming.

A model technology expert at a Chinese cloud vendor explained that for over a year, Chinese cloud vendors have paid more than 1.5 times list price for NVIDIA's AI chips, keeping compute costs high; this caps model performance and slows industrial adoption. A server dealer said that in 2023 an eight-GPU server fitted with NVIDIA H100/H800-series chips at times sold in China for more than 3 million yuan, over 1.5 times NVIDIA's official price.

How can Chinese companies find a path that suits them when compute is scarce and expensive? It takes careful budgeting and tailoring.

Over the past two years, large-model development has followed the scaling law (proposed by OpenAI in 2020): model performance is driven mainly by the amount of compute, the number of model parameters, and the amount of training data.
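For reference (this formulation comes from the 2020 OpenAI paper, not from the article itself), the scaling law's power-law form says test loss $L$ falls predictably as parameters $N$, data $D$, or compute $C$ grow, when the other two factors are not the bottleneck:

```latex
% Power-law scaling reported by Kaplan et al. (OpenAI, 2020).
% N_c, D_c, C_c are fitted constants; the exponents are small positive numbers.
\[
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}
\]
```

This is why the trade-offs discussed below are possible: a smaller model trained on more and better data can match a larger one at far lower inference cost.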

The cloud-vendor executive quoted above says the core principle is to work within the scaling law's constraints: raise data quality and quantity, shrink model parameters where appropriate, and adopt an MoE (mixture of experts) architecture, a design that combines multiple specialist models to improve performance while cutting inference cost. Two concrete business strategies follow:

  • First, improve data quality and quantity and optimize algorithms and architectures to raise model performance while shrinking model size; this cuts compute consumption, improves results in mainstream applications, and fits mainstream market demand.

  • Second, adopt a more precise, segmented product strategy: rather than expecting a few models to solve everything, let different models solve different problems; for example, aim cost-effective models at the value market and premium models at the high end.
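The MoE idea mentioned above can be sketched in a few lines: a learned gate scores a set of expert networks and routes each input to only the top-k of them, so most parameters sit idle on any given call. All dimensions, weights, and the choice of k here are illustrative toy values, not any vendor's actual architecture.

```python
import numpy as np

# Toy sketch of mixture-of-experts (MoE) routing. Only k of the
# n_experts expert matrices run per input, which is how MoE models
# hold down inference cost relative to their total parameter count.

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2      # illustrative sizes

experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]  # expert weights
gate_w = rng.standard_normal((d, n_experts))                       # router weights

def moe_forward(x):
    logits = x @ gate_w                         # gate score per expert
    top = np.argsort(logits)[-k:]               # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                    # softmax over the chosen experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.standard_normal(d))
print(y.shape)  # (16,)
```

With k = 2 of 8 experts active, each call touches a quarter of the expert parameters, illustrating why MoE cuts inference cost without shrinking total capacity.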

OpenAI's models this year have evolved along these lines. GPT-4o has fewer parameters than GPT-4 yet accurately handles most everyday problems, while GPT-4 Turbo targets harder ones. OpenAI's latest o1-preview is the strongest of all: trained with reinforcement learning and no longer a single model, it deliberates repeatedly before producing an answer. The output prices of GPT-4o, GPT-4 Turbo, and o1-preview are 70 yuan, 210 yuan, and 420 yuan per million tokens, respectively (listed at 10, 30, and 60 US dollars on OpenAI's website, converted at 1:7).

The knockout round accelerates

The negative-margin price war is accelerating the shakeout of the large-model market. Many industry insiders told Caijing the same thing: this elimination round will last one to two years, and only three to five foundation-model companies will survive.

An Xiaopeng, executive member of the China Information Technology 100 Forum and director of the Alibaba Cloud Intelligent Technology Research Center, told Caijing in July that large models demand continuous investment, clusters of tens of thousands or even hundreds of thousands of GPUs, and commercial returns; many companies have none of these. In the future, the Chinese market will support only three to five foundation-model makers.

Developing large models means buying chips and servers and leasing land to build data centers, investments that can reach tens of billions of yuan a year and show up in technology companies' capital expenditure. Microsoft disclosed on its fiscal 2024 fourth-quarter earnings call that nearly all of the quarter's 19 billion US dollars of capital expenditure went to compute. Over the past year (the third quarter of 2023 through the second quarter of 2024), Alibaba, Tencent, and Baidu spent 23.2 billion, 23.1 billion, and 11.3 billion yuan on capital expenditure, up 77.1%, 154.1%, and 46.9%, respectively, all driven by compute investment.

On top of the continuous compute investment, the inference business itself needs subsidies running into billions of yuan a year. An executive at a Chinese cloud vendor noted that with calls priced below cost, more calls mean bigger short-term losses; at current inference volumes, each of the leading cloud vendors in the price war will subsidize more than one billion yuan of inference consumption in 2024.

Alibaba Cloud, Volcano Engine, Baidu AI Cloud, and Tencent Cloud can fight the price war with their groups behind them, but it is hard for large-model startups to hold on. The AI strategist quoted above believes Alibaba Cloud and Volcano Engine have the deepest reserves in this war: Alibaba can profit from the cloud itself, and Volcano Engine is subsidized by ByteDance's advertising business. Baidu cannot match Alibaba or ByteDance on price, but its Ernie models are technically strong, and a cohort of customers willing to pay for that technology will help it withstand the war.

In the short term, large-model startups must rely on big companies and fresh financing to survive. A technical staffer at one such startup told Caijing in September that the domestic "five little tigers" of large models, Zhipu AI, Baichuan Intelligence, Moonshot AI, 01.AI, and MiniMax, have all taken investment from Alibaba. One form this takes is investment paid in compute, with the portfolio companies running on Alibaba Cloud. Whether the five can keep going depends in part on whether Alibaba keeps investing.

Technical staff at the leading cloud vendors and the large-model startups mentioned above also agree that Chinese large-model startups face a hard two years. Breaking into the foundation-model market is difficult, leaving perhaps three ways out: become a development shop for government and enterprise projects, pivot to to-B vertical-industry models, or pivot to the to-C application market. Differentiation has in fact already begun: Zhipu AI is winning large numbers of government and enterprise projects, while Moonshot AI focuses solely on the to-C market.

Editor | Qin Lixin