Li Mu: One year in entrepreneurship is like three years in the real world

2024-08-15

Published with authorization by Heart of the Machine

Author: Li Mu

A report to my friends on the progress, struggles, and reflections from my first year of LLM entrepreneurship.

I first thought about starting a company in my fifth year at Amazon, but the pandemic delayed the decision. At seven and a half years, the itch became too strong and I quit. Looking back, if there is something you feel you must try in life, you should do it early. Once you actually start, you find there are too many new things to learn, and you keep asking yourself why you didn't start sooner.

Origin of the name: BosonAI

Before starting the company, I worked on a series of projects named Gluon. In quantum physics, the gluon is a boson that binds quarks together, which symbolized that the project began as a collaboration between Amazon and Microsoft. The project manager came up with that name on the spur of the moment, but naming is hard for programmers; we agonize over file names and variable names every day. So in the end the new company was simply named Boson. I hoped people would smile at the joke that "bosons and fermions make up the world." What I didn't expect was that many people would mistake it for Boston.

"I'm in Boston, let's meet up sometime?" "Huh? But I'm in the Bay Area."

Financing: Lead investor pulled out the day before signing

At the end of 2022, I came up with two ideas for productivity tools built on large language models (LLMs). I happened to meet Zhang Yiming and asked him for advice. After the discussion, he asked: why not build the LLM itself? I instinctively backed off: our team at Amazon had been doing this for several years, and it takes tens of thousands of GPUs, among many other difficulties. Yiming chuckled: those are short-term difficulties; you have to take the long view.

My strength is that I take advice, so I really did go and build LLMs. I assembled a founding team with people in charge of data, pre-training, post-training, and architecture, and went out to raise funds. Luckily, the seed investment came together very quickly. But the money was not enough to buy GPUs, so I had to raise a second round. The lead investor in that round was a very large institution, and we spent several months working on documents and negotiating terms. The day before signing, the lead said it would not invest, which directly caused several follow-on investors to withdraw. I am very grateful to the remaining investors for completing the round and giving us our ticket to do LLMs.

Reflecting on it today, I could have kept raising funds while the capital market was still enthusiastic. Maybe, like some competitors, I would now have 1 billion in cash on hand. At the time I worried that raising too much would make it hard to exit, or that we would be hoisted too high to come back down. Thinking about it now, starting a company is about changing your fate, so why plan for a retreat?

Machines: The first to try out new things

Once I had the money, I went to buy GPUs. I asked various suppliers, and they all said H100s would be delivered in a year. On a whim, I wrote an email to Jensen Huang. He replied immediately and said he would look into it. An hour later, Supermicro's CEO called me. I paid a little extra, jumped the queue, and got the machines 20 days later. I had the honor of being an early adopter.

Being an early adopter made us question our life choices. We ran into all kinds of incredible bugs. For example, insufficient power supply to the GPUs caused instability, until Supermicro engineers modified the BIOS code and patched it. For example, the optical fibers were cut at the wrong angle, which made communication unstable. For example, Nvidia's recommended network layout was not optimal, so we designed a new one, and Nvidia itself later adopted it. I still don't understand it. We bought fewer than a thousand GPUs, so we count as a small buyer. Didn't the big buyers hit these problems? Why were we the ones doing the debugging?

At the same time, we rented the same number of H100s and ran into all sorts of bugs there too. GPUs failed every day, and we even wondered whether we were the only ones using that cloud. Later, I read the Llama 3 technical report, which said that after they switched to H100s, model training was interrupted hundreds of times. I could feel the pain between the lines.

Comparing building your own cluster with renting GPUs, three years of rental costs about the same as building. The advantage of renting is that you don't have to worry about operations. Building your own has two advantages. One is that if Nvidia's technology is still far ahead after three years, it can control pricing so that the GPUs hold their value. The other is that self-built data storage is cheap. Storage needs to sit close to the GPUs, and storage is expensive on both the large clouds and the small GPU clouds. Yet a single training run can use several TB of space for checkpoints, and training data storage starts at 10PB. On AWS S3, 10PB costs two million a year. Spend that money on your own storage and you can reach 100PB.
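The storage comparison above can be checked with a few lines of arithmetic. This is only a sketch built on the figures quoted in the text (10PB for two million a year on S3, 100PB for the same money when self-built). The variable names are illustrative, the currency is left unspecified as in the source, and the three-year horizon is borrowed from the rental comparison. Power, operations staff, and hardware replacement for the self-built option are ignored.

```python
# Figures quoted in the text; the currency unit is not specified in the source.
S3_CAPACITY_PB = 10            # training data storage "starts at 10PB"
S3_ANNUAL_COST = 2_000_000     # quoted yearly cost of 10PB on AWS S3
SELF_BUILT_CAPACITY_PB = 100   # capacity the same money buys when self-built
YEARS = 3                      # assumption: horizon from the rent-vs-build comparison


def cost_per_pb(total_cost: float, capacity_pb: float) -> float:
    """Return the cost of one PB given a total cost and a capacity."""
    return total_cost / capacity_pb


# Yearly cost of one PB on S3, according to the quoted figures.
s3_cost_per_pb_year = cost_per_pb(S3_ANNUAL_COST, S3_CAPACITY_PB)

# One-off cost of one self-built PB, if one year of S3 spend buys 100PB.
self_built_cost_per_pb = cost_per_pb(S3_ANNUAL_COST, SELF_BUILT_CAPACITY_PB)

# How many times more capacity the same budget buys when self-built.
capacity_ratio = s3_cost_per_pb_year / self_built_cost_per_pb

# Cumulative S3 spend for 10PB over the assumed horizon.
s3_total_over_horizon = S3_ANNUAL_COST * YEARS

print(f"S3 cost per PB per year:   {s3_cost_per_pb_year:,.0f}")
print(f"Self-built cost per PB:    {self_built_cost_per_pb:,.0f}")
print(f"Capacity ratio:            {capacity_ratio:.0f}x")
print(f"S3 spend over {YEARS} years:     {s3_total_over_horizon:,.0f}")
```

Note that the S3 figure recurs every year while the self-built figure is closer to a one-off purchase, so the gap widens the longer the data is kept.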

Business: Grateful to customers, breaking even in the first year

Fortunately, we broke even in our first year. Our spending went mainly to people and compute. Thanks to OpenAI's deep pockets and Nvidia's dominant position, both were expensive. Our income came from customizing models for large customers. Most of the companies that adopted LLMs early had very decisive CEOs. They were not scared off by the high compute and labor costs, and they pushed their internal teams to cooperate in trying the new technology. I am very grateful to our customers for giving us room to breathe; otherwise I would have spent the past few months running from one investor to the next.

In the future, more companies will try LLMs, whether to upgrade their own products or to cut costs and improve efficiency. On the one hand, the cost of the technology is falling; on the other, industry leaders (such as our customers) will release LLM-based products one after another, which will pull the rest of their industries along.

We are also watching how LLMs land in consumer (to-C) applications. The top-traffic products of the last wave, such as c.ai and Perplexity, are still looking for business models, but there are also about ten LLM-native applications with good revenue. We provided a model to a role-playing startup. They focus on hardcore users and have broken even, which is impressive. Model capabilities are still evolving, and more modalities (voice, music, images, video) are being integrated. I believe more imaginative applications will appear.

Overall, the industry and capital are still impatient. This year, several companies that had existed for only a little over a year yet had raised billions of dollars chose to exit. Going from technology to product is a long process, and 2 or 3 years is normal. Factoring in the time for user demand to emerge, it may take longer. We focus on the present, feel our way through the fog, and remain optimistic about the future.

Technology: Four stages of LLM cognition

My understanding of LLMs has gone through four stages. The first stage ran from BERT to GPT-3. My impression was: a new architecture plus big data, something we could do. At Amazon, we were among the first to do large-scale training and ship it in products.

The second stage came when GPT-4 was released, just as I was starting the company. It was a big shock, largely because the technology was not made public. According to rumors, it cost 100 million to train the model and tens of millions to label the data. Many investors asked me how much it would cost to reproduce GPT-4. I said 300 to 400 million. Later, one of them really did invest hundreds of millions.

The third stage was the first half year of the startup. We couldn't build GPT-4, so we decided to start from specific problems. We went looking for customers in games, education, sales, finance, and insurance, and trained models for their specific needs. At first there were no good open source models on the market, so we trained from scratch. Later, many good models came out, which reduced our costs. We then designed evaluation methods for each business scenario, labeled data, found where the model fell short, and made targeted improvements.

At the end of 2023, we were pleasantly surprised to find that our Photon (a type of boson) series models outperformed GPT-4 in customer applications. An advantage of custom models is that inference costs 1/10 of calling the API. APIs are much cheaper today, but our own technology has improved too, and our cost is still 1/10. In addition, QPS, latency, and so on are easier to control. At this stage, we understood that for specific applications we could beat the best models on the market.

The fourth stage is the second half year of the startup. Although customers got the model specified in the contract, it was not what they really wanted, because even GPT-4 was far from enough. At the beginning of the year, we found that training for a single application could hardly produce another leap in the model. Looking back, if AGI means reaching the level of an ordinary human, then customers want the level of a professional. Games need professional planners and professional actors, education needs gold-medal teachers, sales needs gold-medal salespeople, and finance and insurance need senior analysts. All of these are AGI plus industry expertise. Although we were in awe of AGI at the time, we felt there was no way around it.

At the beginning of the year, we designed the Higgs (the "God particle," a type of boson) series of models. Their general capabilities track the best models, while they excel at one particular capability. The capability we chose is role-playing: playing a virtual character, a teacher, a salesperson, an analyst, and so on. In mid-2024 it was upgraded to the second generation. On Arena-Hard and AlpacaEval 2.0, which test general capabilities, V2 is on par with the best models, and it is not far behind on MMLU-Pro, which tests knowledge.

Higgs-V2 starts from the Llama 3 base model and then goes through complete post-training. We cannot spend as much on data annotation as Meta does, so the fact that V2 outperforms Llama 3 Instruct should come mainly from algorithmic innovation.

We then built a role-playing evaluation set, covering role-playing from a character description and role-playing from a scene. Embarrassingly, our own model took first place on our own leaderboard. But the evaluation data was never touched during model training. The evaluation set was originally meant for internal use, and we wanted it to reflect the model's true capabilities, so we had to avoid overfitting the model to it. The colleagues who built the evaluation wanted to write a technical report, though, so they released it. Interestingly, the role-playing test samples came from c.ai, yet their own model ranked near the bottom.

The insight of the fourth stage is that a good vertical model must also have good general capabilities, such as reasoning and instruction following, because the vertical needs them too. In the long run, both general and vertical models must move toward AGI. A vertical model can, however, be somewhat lopsided, with high scores in its specialty and passable scores in general subjects, so its R&D cost is somewhat lower and its R&D methods will differ.

What about the fifth stage of understanding? It is still in progress and I hope to share it soon.

Vision: Human companionship

To be honest, we started from technology, customized products for customers, and only then slowly thought about what vision to pursue. We looked at what customers want, what we want, and what we may need in the future. For my part, many years ago I dreamed of a robot nanny to help me look after my children and keep them company, because I find this hard to do myself and I don't quite understand how children today think. At work, I hope for a very capable virtual assistant that can invent new things with me. When I am old, I also want interesting robots to keep me company. My prediction is that production tools will keep advancing until one person can do what used to take a team, which will make individuals more independent. Everyone will be busy pursuing their own things, and therefore lonelier.

Putting all this together, we defined our vision as "intelligent agents that accompany humans": agents with high emotional intelligence and an IQ that stays online. If you replaced one with real people, it would take a professional team. If you want it to play with you, it is a professional planner plus an actor. If you want it to exercise with you, it is a motivator plus a professional coach. If you want it to study with you, it can explain what you don't understand. The advantage of a model is that it can provide long-term companionship and really understand you. And it can "serve you wholeheartedly."

However, current technology is still far from this vision. Today's technology can keep you company in conversation, but in many scenarios the conversation is not very good, the content is thin, and the IQ and EQ sometimes go offline. These are the problems to solve now. If you are building overseas applications in this area, please contact us.

Team: Challenging things require teamwork

Only after starting a company did I really feel the importance of a team. At a big company, I felt like a screw, my teammates were screws, and even the team itself was a screw. A startup team is like a car. The car is small, but it can run, carry loads, turn nimbly, and reach every corner. Not long after the company was founded, miHoYo's Lao Cai came to visit. Seeing everyone in one room, he remarked with feeling how great a small team is.

Of course, there are inconveniences. You have to keep checking whether there is gas in the tank. On rough roads you have to be careful not to shake the car apart. Every member matters, and there is no redundancy. If one person is not strong enough, that can be a flat tire. People are also precious. If one person leaves, that can be one tire fewer.

In the past, I chose projects whose development I could lead myself. But that also meant the problems were not very challenging. In a startup, I chose a big problem, and I can only rely on the team. Although this article is full of "I," the work was in fact done by the team. Without the team, I might have had to switch to selling courses.

Personal pursuit: fame or fortune?

So far, I have made decisions based on my inner voice, such as studying for a doctorate, making videos, and starting a business after work. Entrepreneurship requires strong motivation to overcome the endless difficulties. This requires a deeper analysis of one's own motivations.

Motivation comes from either desire or fear. Ten years ago, I might have cared more about fame and fortune, but at my age the marginal utility of money is low, and the emotional value of fame is small too. My deepest motivation comes from the fear that life may be meaningless. Never mind the vastness of the universe; even in the long river of human history, a person is just a grain of sand, arriving and departing in a hurry. One hundred billion people have lived on the earth, and most of them left no mark on history. I hardly recognize any of the thousands of names in my own family tree.

So what is the meaning of a person's existence? As a child, I was depressed because I couldn't figure this out. So, subconsciously, I wanted to create value and thereby earn a meaning for my existence. I chose to keep striving, to improve my ability to create value; I chose to record long videos and write teaching materials, to create educational value; I chose to write summaries of my PhD, my work, and my startup, describing the struggles and difficulties in them, to create value as an example; I chose to start a company and unite the strength of many people, to create greater value.

Postscript

Last year, when I was walking with Su Hua at Stanford, he patted me on the shoulder and asked, "Tell me the truth, why do you want to start a business?" I answered offhandedly: "I just want to do something different." Su Hua smiled.

Now I understand that smile, because he had already lived through the ups and downs of entrepreneurship. If I were to answer the question again today, I would say, "I just lost my mind." But I am also glad I didn't know then how hard it would be, so I plunged right in. Otherwise, you might now be reading "Reflections on ten years of work." I think the story I wrote today is more interesting.

Salute to all entrepreneurs.

(A closing advertisement: we are hiring in the Bay Area and Vancouver, see https://jobs.lever.co/bosonai. If you are interested in developing overseas applications, please contact us at [email protected])
