Li Mu's first year as a founder: Zhang Yiming and Su Hua gave advice, and Huang Renxun (Jensen Huang) helped with the GPUs

2024-08-15

This may be the most sincere and informative retrospective on large-model entrepreneurship to date.

On the evening of August 14, 2024, Li Mu published a one-year retrospective in his columns on Bilibili and Zhihu, titled "One Year in Entrepreneurship, Three Years in the Real World", sharing the progress, struggles and reflections of his first year running a large-model startup.

In the article, he reviewed his entrepreneurial journey:

From the initial idea of building a productivity tool on top of large models, to meeting Zhang Yiming, being "enlightened" by him, and deciding to build the model itself;

He was stood up during financing and, as a cautious first-time founder, did not end up with "1 billion in cash on hand" like some of his peers;

He contacted Huang Renxun (Jensen Huang) directly and got the H100s that Huang "arranged" for him, only to hit a long list of bugs when training on them;

In the end, the company found a way to break even commercially and kept moving toward its goal of "intelligent agents that accompany humans."

While sharing the pitfalls he encountered, Li Mu kept asking himself this year: "Why do you want to start a business?" He was "questioned" by Su Hua, and was inspired by Cai Haoyu's casual comments on his company. In the end, his answer to this question was very Li Mu:

If I were to answer this question again today, I would say, "I just lost my mind."

But he also said, "My deeper motivation is the fear that my life might have no meaning."

"So what is the meaning of a person's existence? When I was a child, I was depressed because I couldn't figure this question out. So I want to create value and gain the meaning of existence. I choose to 'be motivated' to improve my ability to create value; I choose to record long videos and write teaching materials to create educational value; I choose to write summaries of my doctoral studies, work, and entrepreneurship, describing the entanglements and difficulties in them, to create the value of real cases; I choose to start a business and unite the strength of many people to create greater value."

The following is the full text of Li Mu's review, reproduced from Li Mu. Editor's notes in brackets provide some additional information:

A report to friends: progress, struggles and reflections in the first year of an LLM startup

I first thought about starting my own business in my fifth year at Amazon, but the pandemic delayed the decision. Seven and a half years in, I got too restless and quit. Looking back, if there is something you must try in your life, you should do it early. Once you really start, you find there are too many new things to learn, and you keep lamenting that you didn't start earlier.


Li Mu is a well-known researcher in the field of AI. He left Amazon in 2023 to found Boson.ai. He previously served as a chief scientist at Amazon and is one of the authors of the artificial intelligence framework Apache MXNet. He did his undergraduate studies at Shanghai Jiao Tong University and earned his doctorate at Carnegie Mellon University, and has taught at Berkeley and Stanford University. He continues to update the video series "Learn AI with Li Mu" on Bilibili, where he currently has 800,000 followers, and many young people in the AI field call him their "cyber mentor."

The origin of the name: BosonAI

Before starting the company, I worked on a series of projects named Gluon. In quantum physics, the gluon is a boson that binds quarks together, which symbolized that the project began as a joint effort between Amazon and Microsoft. The project manager came up with that name on the spur of the moment, but naming is hard for programmers; we struggle with file names and variable names every day. In the end, I simply named the new company Boson, hoping people would smile at the joke that "bosons and fermions make up the world." What I didn't expect was that many people would mistake it for Boston.

"I'm in Boston, let's meet up sometime?" "Huh? But I'm in the Bay Area."

Financing: Lead investor backed out the day before signing

At the end of 2022, I came up with two ideas for using large language models (LLMs) as productivity tools. I happened to meet Zhang Yiming and asked him for advice. After the discussion, he asked: why not build the LLM itself? I instinctively backed off: our team at Amazon had been doing this for several years with tens of thousands of GPUs, and it was full of difficulties.

Yiming chuckled and said: These are just short-term difficulties; we need to look at the long term.

My strength is that I listen to advice, so I actually went for the LLM. After gathering a founding team of leads for data, pre-training, post-training, and architecture, we went out to raise funds. Luckily, we got seed investment very quickly. But the money was not enough to buy GPUs, so we had to raise a second round. The lead investor in this round was a very large institution, and it took several months to prepare documents and negotiate terms. The day before signing, the lead investor said it would not invest, which directly led several follow-on investors to withdraw. I am very grateful to the remaining investors for completing the round, which gave us our admission ticket to work on LLMs.

Reflecting on it today, I think we could have kept raising funds while the capital market was still enthusiastic. Maybe, like some of our friendly competitors, we would now have 1 billion in cash on hand. At the time, I worried that raising too much money would make it hard to exit, or that we would be hoisted up too high. Now that I think about it, starting a business is about changing your fate, so why was I planning a retreat?

Machines: The first to try out new things

Once we had the money, we went to buy GPUs. I asked various suppliers, and they all said H100s would be delivered in a year. On a whim, I wrote an email to Jensen Huang. He replied within seconds and said he would look into it. An hour later, Supermicro's CEO called. We paid a little extra, jumped the queue, and got the machines 20 days later. I had the honor of being among the first to "eat the crab," that is, to try something new.


This is not the first time Li Mu has emailed Jensen Huang ("Lao Huang"). He once emailed him to get researcher friends into a closed-door forum at NIPS through the back door. Among his fans, rumor has it that he is someone who can have a meal with Lao Huang at any time.

We ate so much crab that we began to question our lives. We ran into all kinds of incredible bugs. For example, insufficient GPU power supply caused instability, which Super Micro engineers later patched by modifying the BIOS code. The fiber was cut at the wrong angle, which made communication unstable. Nvidia's recommended network layout was not optimal, so we designed a new one, which Nvidia later adopted. I still don't understand it. We bought fewer than a thousand cards, so we are a small buyer. Didn't the big buyers run into these problems? Why were we the ones who had to debug them?


In conversations with people in the industry, I found that some have already "answered" Li Mu: many large companies hit these problems earlier, but they did not report the problems or their solutions. Instead, they treated them as a technical moat and left the bugs for their competitors.

At the same time, we also rented the same number of H100s, and we also encountered various bugs. The GPU had problems every day, and we even wondered if we were the only ones who tried this cloud. Later, I saw the technical report of Llama 3, which said that after they switched to H100, the training of the model was interrupted hundreds of times. I empathized with the pain between the lines.

Comparing self-built machines with rented GPUs, three years of rental costs about the same as building your own. The advantage of renting is that you don't have to worry about operations. Building your own has two advantages. First, if Nvidia's technology is still far ahead after three years, it can control prices so that the GPUs retain their value. Second, self-built data storage is cheap. Storage needs to sit close to the GPUs, and storage is expensive on both the big clouds and the small GPU clouds. Yet a single training run can use several TB just to store checkpoints, and training data storage starts at 10PB. On AWS S3, 10PB costs two million a year. Spend that money on building your own and you can get up to 100PB.
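The storage figures above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration: the function and variable names are hypothetical, it assumes decimal units (1 PB = 1,000,000 GB), and it treats "two million" as a plain currency amount, since the article does not state the currency or what the self-built figure includes (power, staff, and replacement hardware are ignored here).

```python
# Back-of-the-envelope check of the storage figures quoted in the article.
# Assumptions (not from the article): decimal units, a flat monthly rate,
# and a self-built cost that counts only the initial spend.

GB_PER_PB = 1_000_000
MONTHS_PER_YEAR = 12


def implied_price_per_gb_month(annual_cost: float, capacity_pb: float) -> float:
    """Per-GB monthly price implied by an annual bill for a given capacity."""
    return annual_cost / (capacity_pb * GB_PER_PB * MONTHS_PER_YEAR)


def cost_per_pb(budget: float, capacity_pb: float) -> float:
    """Cost per petabyte when a budget buys the given capacity."""
    return budget / capacity_pb


annual_s3_cost = 2_000_000     # article: 10PB on S3 costs two million a year
s3_capacity_pb = 10
self_built_capacity_pb = 100   # article: the same money buys up to 100PB

s3_rate = implied_price_per_gb_month(annual_s3_cost, s3_capacity_pb)
rented_per_pb_year = cost_per_pb(annual_s3_cost, s3_capacity_pb)
owned_per_pb = cost_per_pb(annual_s3_cost, self_built_capacity_pb)

print(f"Implied S3 rate: {s3_rate:.4f} per GB-month")
print(f"Rented: {rented_per_pb_year:,.0f} per PB per year")
print(f"Self-built: {owned_per_pb:,.0f} per PB")
```

Under these assumptions, the quoted S3 bill works out to roughly 0.017 per GB per month, and the self-built option is ten times cheaper per petabyte before running costs are counted, which matches the article's tenfold capacity claim.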

Business: Grateful to customers, breaking even in the first year

Fortunately, our income and expenses broke even in the first year.

Our expenses are mainly manpower and computing power. Thanks to OpenAI's deep pockets and Nvidia's leading position, both are quite large. Our income comes from customizing models for major customers. Most of the companies that adopted LLMs early have very decisive CEOs. They are not scared off by the high cost of computing power and manpower, and they push their internal teams to cooperate and try new technologies. I am very grateful to these customers for giving us room to breathe; otherwise I would have spent the past few months running from investor to investor.

In the future, more companies will try LLMs, whether to upgrade their own products or to reduce costs and increase efficiency. On the one hand, the cost of the technology is falling. On the other hand, industry leaders (such as our customers) will release LLM-based products one after another, which will pull the rest of their industries along.

We are also watching how LLMs land in consumer (to-C) products. The top players of the last wave, such as c.ai and Perplexity, are still looking for business models, but there are also about ten LLM-native applications with good revenue. We provided a model to a role-playing startup. They focus on hardcore users and have broken even, which is also impressive. Model capabilities are still evolving, and more modalities (voice, music, pictures, videos) are being integrated. I believe more imaginative applications will appear in the future.

In general, the industry and capital are still impatient. This year, several companies that had existed for only a little over a year but had raised billions of dollars chose to exit. Going from technology to product is a long process, and two or three years is normal. Counting the time for user needs to emerge, it may take longer. We focus on the present, feel our way through the fog, and remain optimistic about the future.


Commercialization is a sharp sword hanging over the heads of almost all LLM companies. Li Mu casually revealed that the company has broken even. In the first year, BosonAI chose two types of business: customizing models for large customers and providing base models for startups. In fact, this idea is very pragmatic, go wherever there is money. The experience of being stood up by investors seems to have also affected Li Mu's choice of commercialization. He hopes to "support" himself and buy time and space for technological progress.

Technology: Four stages of LLM cognition

My understanding of LLMs has gone through four stages. The first stage ran from BERT to GPT3. It felt like a new architecture plus big data. At Amazon, we also did large-scale training and put it into products as quickly as we could.

The second stage was when I had just started the company and GPT4 was released, which shocked everyone. Much of the shock came from the technology not being disclosed. According to rumors, it cost 100 million to train one model and tens of millions to label the data. Many investors asked me how much it would cost to replicate GPT4, and I said 300 to 400 million. Later, one of them actually invested hundreds of millions.

The third stage was the first six months of the company. We couldn't build GPT4, so we started with specific problems. We went looking for clients in gaming, education, sales, finance, and insurance, and trained models for their specific needs. At the beginning, there were no good open source models on the market, so we trained from scratch. Later, many good models came out, which reduced our costs. We then designed evaluation methods based on each business scenario, labeled data, found out where the model was failing, and made targeted improvements.


In half a year, Boson quickly changed from closed source to open source, with everything based on results and customers. On the contrary, practitioners like Li Mu who have a deeper understanding of AI development are less concerned about the so-called open source and closed source debate at this stage.

At the end of 2023, we were pleasantly surprised to find that our Photon (a type of Boson) series models outperformed GPT4 in customer applications. The advantage of custom models is that the inference cost is 1/10 of calling the API. Although APIs are much cheaper today, our own technology is also improving, and the cost is still 1/10. In addition, latency and other factors can be better controlled. At this stage, we know that for specific applications, we can outperform the best models on the market.

The fourth stage is the second half of the year. Although the clients got the models specified in their contracts, those models were not what they really wanted, because even GPT4 was far from enough. At the beginning of the year, we found that a model trained for a single application had a hard time making another leap forward. Looking back, if AGI means reaching the level of an ordinary person, what customers want is the level of a professional. Games need professional planners and professional actors, education needs gold medal teachers, sales needs gold medal salespeople, and finance and insurance need senior analysts. All of this is AGI plus professional industry capability. Although we were in awe of AGI at the time, we felt it was unavoidable.

At the beginning of the year, we designed the Higgs (the "God particle," a type of boson) series of models. The goal is for general capability to keep pace with the best models while excelling at one particular capability. The capability we chose is role-playing: playing a virtual character, a teacher, a salesperson, an analyst, and so on. In mid-2024, the series was iterated to its second generation. On Arena-Hard and AlpacaEval 2.0, which test general capability, V2 is on par with the best models, and it is not far behind on MMLU-Pro, which tests knowledge.

Higgs-V2 is built on the Llama3 base model, with complete post-training of our own. We don't have the resources to spend heavily on data annotation the way Meta does, so the fact that V2 is better than Llama3 Instruct should be due mainly to algorithmic innovation.

Then we built a collection of benchmarks for evaluating role-playing, covering role-playing according to a character setting and role-playing according to a scene. I feel a little embarrassed that our own model took first place on our own leaderboard. However, the evaluation data was never touched during model training. Because this evaluation set was intended for internal use, we wanted it to truly reflect the model's capabilities, so we had to avoid overfitting the model to it. But the teammates who built the evaluation set wanted to write a technical report, so they released it. Interestingly, the role-playing test samples come from c.ai, yet their model ranks at the bottom.

The fourth stage of cognition is that a good vertical model must have good general capabilities, such as reasoning and instruction following, which are also needed in the vertical. In the long run, both general and vertical models must move towards AGI. However, the vertical model can be slightly more specialized, with high scores in professional courses and okay general courses, so the R&D cost is slightly lower and the R&D method will be different.

What about the fifth stage of understanding? It is still in progress and I hope to share it soon.

Vision: Human companionship

To be honest, we first focused on technology and on customizing for customers, and only then slowly thought about what vision we wanted to pursue. We looked at what customers want, what we want, and what may be needed in the future. For me, many years ago, I dreamed of having a robot nanny to help take care of my children and keep them company, because I find this hard to do myself, and I don't quite understand how children today think and see the world.

I hope to have a very powerful virtual assistant at work who can invent new things with me. When I get old, I also want interesting robots to keep me company. My prediction is that as production tools become more and more advanced, one person will be able to complete tasks that previously required a team, which will make people more independent as individuals. Everyone will be busy pursuing their own things, which will make them lonelier.

Putting all this together, we defined our vision as "intelligent agents that accompany humans": agents with high emotional intelligence and an IQ to match. The human equivalent would be a professional team. If you want it to play with you, it is a professional planner plus an actor. If you want it to accompany you in sports, it is a motivator plus a professional coach. If you want it to accompany you in learning, it can explain the things you don't understand. The advantage of a model is that it can provide long-term companionship and really come to understand you. And it can "sincerely serve you."

However, current technology is still far from this vision. Today it can keep you company in a chat, but in many scenarios the chat is not that good, the content is thin, and the IQ and EQ are sometimes lacking. These are the problems that need to be solved now. If you are building overseas applications in this area, please contact us.

Team: Challenging things require teamwork

After starting a business, I truly felt the importance of a team. At a big company, I felt like a screw, my team members were screws, and even the team itself was a screw. But a startup team is like a car. The car is small, but it can run, carry loads, turn nimbly, and reach every corner. Not long after the company was founded, MiHoYo's Mr. Cai came to take a look. Seeing everyone in one room, he remarked with feeling how great it is to have a small team.


The two are alumni of Shanghai Jiao Tong University. Li Mu entered as an undergraduate in 2004 and Cai Haoyu in 2005, and both stayed on to study for a master's degree.

Of course, there are some inconveniences. You have to check whether there is gas at all times. Be careful not to shake the car apart on difficult roads. Every member is important. There is no redundancy. If one person is not strong enough, it may be a flat tire. People are also precious. If one person leaves, it may be one less tire.

In the past, I would choose projects that I could lead the development of. But this also means that the problems would not be too far beyond my ability. When starting a business, I chose a big problem to work on, and I can only rely on the team. Although there are a lot of "I" in this article, in fact, the work is done by the team. Without the team, I might have to switch to selling courses (no applause is needed here).


Li Mu's AI courses on Bilibili are the most popular courses on the Chinese Internet. Many people call him "a mentor I have never met." These courses are easy to understand, full of patience, and often share industry gossip.

Personal pursuit: fame or fortune?

So far, I have made my decisions by following my inner voice: studying for a doctorate, making videos, and starting a business after years of working. Entrepreneurship requires strong motivation to overcome endless difficulties, which in turn requires a deeper analysis of one's own motives.

Motivation comes from either desire or fear. Ten years ago, I might have been more interested in fame and fortune, but at my age, I feel that the marginal utility of money is not high, and the emotional value of fame is also very small. My deep motivation comes from the fear that life may be meaningless. Not to mention the vastness of the universe, even in the long river of human history, a person is just a grain of sand. Accidents come and go quickly. One hundred billion people have lived on the earth, and most of them will not leave a mark in history. I hardly recognize the thousands of names in my family tree.

So what is the meaning of a person's existence? When I was a child, I was depressed because I couldn't figure this question out. So I wanted to create value and gain the meaning of existence. I chose to "be motivated" to improve my ability to create value; I chose to record long videos and write teaching materials to create educational value; I chose to write summaries of my doctoral studies, work, and entrepreneurship, describing the entanglements and difficulties in them, creating the value of real cases; I chose to start a business and unite the strength of many people to create greater value.

Postscript

Last year, when I was walking with Su Hua at Stanford, he patted me on the shoulder and asked, "Tell me the truth, why do you want to start a business?" At the time, I didn't take it seriously: "I just want to do something different." Su Hua just smiled.

Now I understand, because he has experienced the ups and downs of entrepreneurship. If I were to answer this question again today, I would say, "I just lost my mind." But I am also glad that I didn't expect it to be so difficult at the time, so I jumped in. Otherwise, what you might see is "Reflection on ten years of work". I think the story I wrote today is more interesting.

Salute to all entrepreneurs.

Finally, Li Mu also put in a recruiting plug: BosonAI is headquartered in Santa Clara and is hiring in the San Francisco Bay Area and Vancouver.