2024-09-27
This article is the sixth in our "What to invest in AGIX" series. The AGIX index is a combination of 40 "high AI purity" companies selected from thousands of listed technology companies around the world; it serves as a coordinate for tracking the AGI process and gives investors a valuable tool for capturing AI alpha. In the "What to invest in AGIX" series, we analyze the portfolio companies of the AGIX index in depth, providing the market with a comprehensive AI investment reference.
Tesla is one of the top 10 holdings in the AGIX index portfolio. The company's large-scale investment in autonomous driving and robotics over the past 10 years has positioned it to become the strongest AGI player in the physical world. Recently, Tesla saw its second wave of stock-price gains in 2024: the stock not only reached its highest level in two months but also erased all of this year's declines and turned positive. AI is the most important factor driving this round of growth.
On October 10, Tesla will officially unveil its robotaxi. According to ARK's analysis, by 2029 nearly 90% of Tesla's enterprise value and earnings could be attributable to the self-driving taxi business. This week, Tesla also lowered the price of the FSD option to boost new-car sales; a higher FSD take rate will also help Tesla collect more data to improve FSD performance. Although the Optimus robot is still far from large-scale commercialization, using Optimus to replace factory workers and improve labor efficiency could lift profits significantly. According to ARK's modeling of Tesla, deploying Optimus in Tesla factories over the next five years could save up to US$3-4 billion in costs.
01.
Autonomous driving is very similar to AGI
Sarah Guo: What do you think of the state of autonomous driving today? How long until we see autonomous driving become widespread?
Andrej Karpathy: I worked in autonomous driving for 5 years and find the field fascinating. Judging from its current state, autonomous driving and AGI are very similar. That may be because I know the field well, but I do feel that in autonomous driving we are close to AGI. For example, there are already finished products that users can pay to use. Waymo is a good example: it is now very common in San Francisco and many people have tried it. I take Waymo rides often myself; it has become a commercial product.
My first Waymo experience was almost 10 years ago. A friend worked there at the time and took me on a Waymo ride around the block. From a technical perspective, Waymo was already very good 10 years ago, yet going from that demo to a product deployed at scale in cities took 10 years, and Waymo is still expanding today.
Elad Gil: It took 10 years to go from demo to a successful paid product. To what extent was that due to regulation? When do you think the technology itself was ready?
Andrej Karpathy: I think autonomous driving had actually reached a fairly mature level 10 years ago, but a 30-minute demo cannot surface all the challenges they faced over the following decade. There is a big gap between a demo and an actual product, and of course regulation accounts for part of it.
But I think we have, to a certain extent, reached AGI within the domain of autonomous driving. At the same time, there is a big gap between a demo and global deployment. Although Waymo is already running in San Francisco, it has not yet had a substantial impact in terms of worldwide adoption. This is where I think AGI and autonomous driving are similar.
Coming back to autonomous driving: many people think Waymo is technologically ahead of Tesla, but I personally think Tesla is further ahead than Waymo. This view may differ from the current mainstream, but I do have confidence in Tesla's autonomous driving.
Tesla faces software-level problems, while Waymo's challenges come from hardware. By comparison, software problems are easier to solve. Tesla has deployed vehicles at scale around the world, while Waymo has not reached that scale. So I believe that once Tesla's system can run efficiently at scale, the results will be amazing. I test drove the latest version of FSD just yesterday and the drive was very smooth; the way the system handled a series of maneuvers convinced me that Tesla has already achieved quite good results in autonomous driving.
Overall, I think Tesla's biggest challenge is software, while Waymo's challenges come more from hardware. Waymo may look stronger today, but over a 10-year horizon I believe Tesla will be further ahead in both scale and revenue model.
Elad Gil: How long do you think it will take to solve the software problem? You mentioned that Waymo's vehicles carry many expensive lidars and sensors that support the software system. A system that, like Tesla's, relies only on cameras would not only cut costs significantly but also reduce system complexity and apply to more vehicle models. When might that change be realized?
Andrej Karpathy: I personally hope it will be solved within the next few years. In fact, Tesla also uses many expensive sensors during the training phase, along with techniques that cannot be scaled to the fleet, such as building ground-truth models from extra sensors and doing detailed mapping. At test time, all of this is distilled into a package that relies only on the vision system and ships on production vehicles. Many people may not realize it, but this is a very smart "arbitrage" between sensors and cost: the cameras capture enough information, and the neural network is capable of processing it. The extra sensors are very useful during training, but their role at test time is far less important, so I think cameras alone are enough.
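To make the "sensor arbitrage" concrete, here is a minimal sketch under stated assumptions: at training time an expensive sensor (lidar depth, in this toy setup) supplies supervision targets for a camera-only network, so the deployed car needs no extra hardware. The network, names, and shapes are illustrative inventions, not Tesla's actual pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CameraDepthNet(nn.Module):
    """Hypothetical camera-only depth network; the real FSD stack is not public."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, 1, 1)  # coarse per-pixel depth

    def forward(self, images):
        depth = self.head(self.encoder(images))
        # Upsample back to input resolution to compare against lidar depth.
        return F.interpolate(depth, size=images.shape[-2:],
                             mode="bilinear", align_corners=False)

model = CameraDepthNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, lidar_depth):
    """Camera pixels in; lidar depth is used ONLY as the training target.
    At deployment the lidar is gone and the camera net runs alone."""
    loss = F.l1_loss(model(images), lidar_depth)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy shapes: a batch of 2 RGB frames with matching lidar depth maps.
print(train_step(torch.randn(2, 3, 64, 64), torch.rand(2, 1, 64, 64)))
```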
Elad Gil: A recent trend in autonomous driving is the gradual shift from heuristic algorithms designed around edge cases to end-to-end deep learning. What are the reasons and the logic behind it?
Andrej Karpathy: End-to-end is actually what we wanted to do from the beginning. When I first joined Tesla, we talked about how neural networks would eventually replace the entire stack. There was a lot of C++ code in the system back then, but today very little C++ runs in the test suite; neural networks have gradually replaced it. At first they were used only for image recognition, then expanded to processing multiple frames and producing predictions, and over time the C++ code was replaced piece by piece. Ultimately the system only needs to issue driving commands, and the neural network can output the result.
So what Tesla is doing is end-to-end AI driving, whereas Waymo probably did not choose this technical route; although they have tried it, the results have not been satisfactory.
I personally believe the end-to-end route is correct and the inevitable direction of future development. Seen this way, Tesla's system in ten years is likely to be an end-to-end neural network: video streams in, driving commands out. Of course, getting there requires gradually improving every module of the system. I don't think the current intermediate predictions are a misleading detour in that development; on the contrary, they are an important part of the system, because when you train a fully end-to-end network, the supervision signal from imitating human driving is very limited and cannot support training such a large network. Intermediate predictions help develop features and detectors that make the end-to-end problem more tractable, so my guess is that they are doing a lot of pre-training that will allow end-to-end fine-tuning later.
Overall, I think neural networks replacing the entire stack is necessary, but the process has to be gradual. Tesla's current attempts have shown initial results, and they make the future look promising.
💡
Intermediate predictions: the non-final outputs generated during model training or inference. These predictions serve as intermediate steps in a multi-step computation, helping the model approach the final result gradually. They are useful in complex tasks such as hierarchical decision-making, machine translation, or multi-task learning, where intermediate results can be evaluated to optimize performance, correct biases, or improve training. Intermediate predictions also help explain the inner workings of a model and can guide model tuning.
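As a concrete illustration of the callout, here is a hedged sketch of training with intermediate predictions as auxiliary losses: the network emits both an intermediate output (a stand-in "lane layout" head) and the final driving command, and both contribute to the loss. The architecture, targets, and weighting are toy assumptions, not any production system.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DrivingNet(nn.Module):
    """Toy end-to-end net with an intermediate prediction head."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.lane_head = nn.Linear(256, 64)    # intermediate: lane layout
        self.control_head = nn.Linear(256, 2)  # final: steering, throttle

    def forward(self, x):
        h = self.backbone(x)
        return self.lane_head(h), self.control_head(h)

model = DrivingNet()

def loss_fn(x, lane_target, control_target, aux_weight=0.5):
    lane_pred, control_pred = model(x)
    # The auxiliary loss on the intermediate prediction shapes internal
    # features that the sparse end-to-end signal alone could not develop.
    aux = F.mse_loss(lane_pred, lane_target)
    main = F.mse_loss(control_pred, control_target)
    return main + aux_weight * aux

x = torch.randn(4, 512)
print(loss_fn(x, torch.randn(4, 64), torch.randn(4, 2)))
```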
02.
Tesla is also a robotics company
Sarah Guo: Before leaving Tesla, you also worked on its humanoid robot project. Moving from autonomous driving to robots, which technologies transfer?
Andrej Karpathy: Basically all of them, though I don't think people realize this yet. There is not much difference between robots and cars; understanding Tesla as simply a car company is, I think, a misunderstanding.
Tesla is actually a large-scale robotics company: it not only produces cars, it also builds the automated machines that manufacture them, which is a very different field in itself. I think of Tesla as a company that does robotics at scale.
Migrating from automotive technology to humanoid robots doesn't actually require much extra work. In fact, the early Optimus robot effectively thought it was a car, because it used exactly the same computer and cameras as the cars. Amusingly, we were running neural networks designed for cars on the robot, and as it walked around the office, the "drivable space" it recognized became "walkable space". Some fine-tuning is needed, but this really demonstrates the versatility of the technology.
Sarah Guo: From that perspective, Tesla can indeed be regarded as a robotics company, and many core technologies can migrate across platforms. What production robots still lack are the actuators and the corresponding action data.
Andrej Karpathy: Yes. Some pieces are not perfect yet, but I want to emphasize how much transfers directly. The Optimus project, for example, started very quickly: after Elon Musk announced it, the relevant teams and tooling were in place fast, and resources like CAD models and supply chains were prepared quickly. Tesla already had a wealth of robot-building resources in-house, all drawn from its cars. The feeling was a bit like the movie Transformers: the car reconfigures into a robot, and everything is the same, just slightly adjusted. Beyond hardware, the whole way of thinking changes, the annotation teams, the coordination between component groups, and so on, but in general a lot of experience and resources carry over.
Elad Gil: What do you think the first application scenario for humanoid robots will be?
Andrej Karpathy: Many people imagine robots helping with daily chores like laundry, but I think those applications are still a long way off. I don't think direct-to-consumer is a good starting point for humanoid robots, because we still can't fully guarantee safety when robots interact with people such as the elderly; an accident like knocking over an old lady would bring huge legal risk, so that direction isn't suitable. Even in many simple interaction scenarios, a robot could easily knock someone over.
Today's technology is simply not mature enough and needs further improvement. So I think the best first customer for a robot developer is the developer itself. Companies that realize this will first incubate the technology internally, applying it in their own factories, for instance in material handling, which avoids signing contracts with third parties and all the overhead of lawyers and agreements.
After successful internal incubation, you can enter the B2B market and work with companies running large warehouse operations on tasks like material handling. Through these partnerships, robotics companies can build up a track record of safety, and after successful deployments at multiple companies, gradually transition to consumer applications. I believe we will see many consumer robots eventually; Unitree's products, for example, are worth watching, and I'd like to buy a Unitree G1 myself.
Once robots are common across scenarios, a full ecosystem will emerge, with all kinds of robots being developed on top of shared robot platforms. But in terms of scale, I think the path of gradual advancement is the most sensible.
It may start with material-handling work and then expand into more niche, high-demand areas. One project I'm personally fond of is the "leaf blower" problem: one day Optimus robots might walk down the street gently picking up every fallen leaf, so we no longer need leaf blowers. I think that's a great project and I hope it becomes an early application.
Sarah Guo: On robot form factors, some people argue that humanoid robots are the better choice because so much of the physical world is designed around human behavior, so a unified humanoid hardware platform could take on more and more tasks. Another view is that humanoids are not necessarily the only answer for general-purpose robots. Where do you stand?
Andrej Karpathy: I think many people underestimate the fixed-cost complexity of each robot platform. Every platform carries a high fixed cost, so the general-purpose route makes more sense: attempt a wide variety of tasks on one unified platform.
So I think humanoid robots have great potential: humans can easily teleoperate them, which helps with data collection. And, as in the view you just mentioned, the entire world is built around human behavior and habits, which is another reason the humanoid form matters.
Of course, humanoid robots may change in all sorts of ways in the future, but for any new robot platform, fixed cost is a key consideration.
I also want to emphasize that you gain more by sharing information and transferring learning across different tasks.
In AI, we want to build one neural network that handles many tasks and raises its overall intelligence by learning across them. What's interesting about language models is that they are multi-task models for text: they handle many different kinds of problems while sharing information across those tasks, all within a single neural network.
Likewise, we hope the data collected during the leaf-picking task helps with other tasks; if you build a system specialized for one specific task, your margins will likely narrow.
Sarah Guo: Robots like the Unitree G1 are currently priced at around US$30,000, and it seems hard for humanoid robots today to combine low cost with high capability. If we instead adopted a wheeled base with a robotic arm for specific tasks, wouldn't we have a better chance of building a cost-effective general-purpose robot?
Unitree G1 robot
Andrej Karpathy: Looking for a cheaper general-purpose platform makes sense from a hardware perspective. In some circumstances, wheels or other structures may be more efficient than legs, but I think that may be chasing a local optimum. In the long run, it is probably wiser to pick one form and polish it to perfection. And from a human-psychology perspective, the advantage of humanoid robots is clear: they feel familiar, which makes people want to interact with them.
Of course, given the uncanny valley effect, perhaps more abstract forms will be better received; I'm genuinely unsure how people will react to different robot forms. If we end up with an eight-wheeled monster doing the job, I don't know whether people will like it or be frightened.
Elad Gil: Robot dogs are another form-factor route, and dogs are also a form familiar to humans.
Andrej Karpathy: Yes, but many people who have watched Black Mirror associate robot dogs with certain horror scenes, so psychological acceptance will vary. By comparison, the humanoid form may be easier for people to accept, and it also makes a robot's functions and behaviors easier to understand.
Elad Gil: To achieve the humanoid form, what key technical advances are needed?
Andrej Karpathy: I don't think there is a clear answer yet. One of the more interesting observations is that in humanoid robot design, the lower body is not well suited to imitation learning; it is more a matter of inverted-pendulum control. The upper body relies more on teleoperation, data collection, and end-to-end learning. In some sense, robotic systems have to bring multiple techniques together, and I'm not yet sure how all these pieces will mesh.
💡
Inverted pendulum: the classic control problem of keeping a pendulum balanced in its unstable upright position, with wide applications in robotics, aerospace, and other fields. Traditional control methods include PID control, the linear quadratic regulator (LQR), and sliding-mode control.
With the development of AI, reinforcement learning has gradually been introduced to inverted-pendulum control; the RL approach has drawn attention for its ability to learn optimal policies without an accurate model. RL-based inverted-pendulum balance control is a practical technique that is widely used in robotics, automation, and related fields.
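For readers who want the classical baseline made concrete, here is a minimal LQR balance controller for the standard linearized cart-pole. The dynamics are the textbook linearization about the upright equilibrium; the masses and cost weights are illustrative choices, not values from any specific robot.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Linearized cart-pole about the upright equilibrium (illustrative
# parameters: cart mass M, pole mass m, pole length l, gravity g).
M, m, l, g = 1.0, 0.1, 0.5, 9.8
A = np.array([[0, 1, 0, 0],
              [0, 0, -m * g / M, 0],
              [0, 0, 0, 1],
              [0, 0, (M + m) * g / (M * l), 0]])
B = np.array([[0.0], [1 / M], [0.0], [-1 / (M * l)]])

Q = np.diag([1.0, 1.0, 10.0, 1.0])  # penalize pole angle most heavily
R = np.array([[0.1]])               # penalize control effort

# Solve the continuous-time algebraic Riccati equation, then K = R^-1 B^T P.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P

def control(state):
    """state = [cart position, cart velocity, pole angle, pole angular rate];
    u = -Kx pushes the pole back toward upright."""
    return (-K @ state).item()

print(control(np.array([0.0, 0.0, 0.05, 0.0])))  # small tilt -> corrective force
```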
Elad Gil: When I talk with people in robotics, I find they care a great deal about issues like actuation, control, and dexterous manipulation.
Andrej Karpathy: Yes. In the early stages there will indeed be a lot of teleoperation, for example having robots imitate humans picking items up off the floor, until the system can run autonomously 95% of the time. Then the robots' share of the work gradually increases and humans shift from operators to supervisors.
In fact, I don't think there are any special technical obstacles; it's more that a lot of groundwork needs to be done. We already have the right tools and resources, such as the transformer architecture, which acts like an excellent "coordinator": we only need to prepare the right data, train, experiment, and then deploy. The process is complicated, but there aren't many essential technical bottlenecks.
03.
Synthetic data, small models, and LLM companies
Sarah Guo: Where do you think we are in terms of "large blob" research?
💡
Large blob research: usually refers to a research direction in deep learning and computer vision. A blob ("binary large object") is a large contiguous region in an image or feature map that may contain important visual information or represent a specific object or part of a scene. Studying these large regions helps improve a model's ability to understand and process large-scale visual features.
Andrej Karpathy: I feel we are in a phase of rapid development. The transformer is not just another neural network; it is a powerful and versatile one.
For example, when people discuss scaling laws, they are largely describing properties of the transformer architecture. Before the transformer, people mainly worked with stacked LSTMs, but no clear scaling law was found there. The transformer is the first model that made scaling clear and effective.
💡
Stacked LSTM: a deep neural network structure formed by stacking multiple LSTM (long short-term memory) layers on top of one another.
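For reference, a stacked LSTM in PyTorch is just an LSTM with more than one layer. This illustrative skeleton (sizes are arbitrary) shows the kind of pre-transformer sequence model being described.

```python
import torch
import torch.nn as nn

class StackedLSTM(nn.Module):
    """A 4-layer (stacked) LSTM language model skeleton, the style of
    architecture that dominated sequence modeling before the transformer."""
    def __init__(self, vocab_size=10_000, embed_dim=256, hidden_dim=512, layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # num_layers > 1 stacks LSTM layers, each feeding the next.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=layers,
                            batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.head(h)  # next-token logits

model = StackedLSTM()
logits = model(torch.randint(0, 10_000, (2, 128)))  # (batch, seq, vocab)
print(logits.shape)  # torch.Size([2, 128, 10000])
```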
The transformer is like a general-purpose computer, more specifically a differentiable neural computer (DNC): we can give it very large-scale inputs and outputs and train this computer via backpropagation, and it eventually becomes a self-improving system that learns to complete tasks.
💡
Differentiable neural computer (DNC): a special type of neural network capable of storing and retrieving information, similar to the memory system in a computer. It is "differentiable", meaning its parameters can be optimized through backpropagation to perform better on complex tasks.
Although the transformer is a miracle we stumbled upon in algorithms research, it embodies several key innovations, such as residual connections, layer normalization, and attention blocks. Unlike earlier approaches, the transformer avoids the saturating nonlinear activation functions that make gradients vanish; the combination of these innovations, as described in the original paper, greatly improves training efficiency and performance.
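Here is a minimal pre-norm transformer block showing the ingredients just named, residual connections, layer normalization, and an attention block, with a smooth nonlinearity in the MLP. This is a generic textbook block, not any specific production model; the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Minimal pre-norm transformer block: attention + MLP, each wrapped
    in a residual connection and preceded by layer normalization."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),               # smooth nonlinearity; gradients don't saturate
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual 1
        x = x + self.mlp(self.norm2(x))                    # residual 2
        return x

x = torch.randn(2, 16, 512)
print(Block()(x).shape)  # torch.Size([2, 16, 512])
```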
Sarah Guo: Lately there has been discussion of the "data wall", and the cost of scaling up the next generation of models will be extremely high. What do you think about the data problem?
Andrej Karpathy: This goes back to what we discussed at the start. I feel the neural network architecture itself is no longer the bottleneck today, although before the transformer arrived, architecture genuinely was the obstacle. The new bottlenecks are mainly the loss function and the dataset, which is why many companies and researchers no longer focus on architectural changes. Llama, for example, has no particularly notable architectural innovation; the only big change may be rotary positional encodings (RoPE). The transformer itself has changed little over the past five years, and everyone instead builds on that foundation with innovations in training, datasets, and loss functions.
💡
"rotary positional encodings" (rope, rotary positional encodings):a positional encoding technique for transformer models. it represents position information in the input sequence by rotating vectors. compared with traditional position encoding, rope can give the model more advantages when processing long sequences. its key feature is to encode the position of each element in the sequence by rotating the angle of the vector while maintaining relative distance information. this approach allows the model to have better flexibility and scalability in different locations, and is especially suitable for tasks that deal with long-distance dependencies.
Sarah Guo: When there is not enough data on the internet, will we start using synthetic data, or similarly more expensive data-collection methods?
Andrej Karpathy: A lot of current research focuses on language models. Internet data is not the ideal data source for transformers, but it works as a bootstrap for steadily improving model capability. Internet data is just a collection of web pages; what's really valuable is the "inner monologue" in our heads, those complex, deep trajectories of thought.
If we had billions of examples of such thought trajectories, we might be close to AGI in some sense. But that data doesn't exist yet, so current research mainly focuses on reorganizing existing datasets into an inner-monologue-like format. This is why synthetic data matters: today's models can help us generate the next generation of models. It's an iterative process, like climbing a ladder rung by rung toward the goal.
Elad Gil: How useful is synthetic data? As you said, each model can help train the next one, or at least provide tools for tasks such as data annotation, part of which may be synthetic.
Andrej Karpathy: I think synthetic data is essential for improving model capability, but it must be used carefully, because models "collapse" without you noticing. For example, if you ask ChatGPT for a joke a few times, you'll discover it may only know about three jokes: it seems to know many, but really it knows just those few. That is collapse. Each individual output looks fine, but along any particular direction the model's diversity and flexibility have sharply shrunk. This is a problem when generating data, especially synthetic data, where collapse happens easily, because what we actually need from the data is diversity and richness, in other words entropy, to avoid an overly homogeneous dataset.
💡
Mode collapse: a phenomenon in generative adversarial networks (GANs) where the generator starts producing very similar or repetitive samples instead of diverse ones. It is regarded as a problem because it indicates the model has failed to learn the rich diversity of the data.
For example, someone released a persona dataset containing 1 billion fictional character backgrounds, such as "I am a teacher" or "I am an artist; I live here; I do this job". When generating synthetic data, you have the model imagine interacting with a specific person, which gives it more room to explore, draws out more varied outputs, and increases the diversity of the dataset. So the biggest challenge in generating synthetic data is to carefully inject entropy while keeping the data distribution stable.
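A hedged sketch of this entropy-injection idea: condition each synthetic-data prompt on a randomly sampled persona, so the generator is steered into different regions of its distribution instead of collapsing onto a few templates. The personas, tasks, and prompt wording below are invented for illustration; the actual call to a teacher LLM is left as a comment.

```python
import random

# Tiny illustrative persona pool; the dataset mentioned above has ~1 billion.
PERSONAS = [
    "a retired physics teacher in Helsinki",
    "a freelance illustrator who loves birdwatching",
    "a night-shift nurse studying for a pilot's license",
]

TASKS = [
    "explain why the sky is blue",
    "plan a three-day budget trip",
    "debug a slow SQL query",
]

def make_prompt():
    """Sample a random persona per example: the persona injects entropy,
    pushing each generation toward a different region of the model's
    distribution and working against mode collapse."""
    persona = random.choice(PERSONAS)
    task = random.choice(TASKS)
    return (f"You are {persona}. In your own voice and with your own "
            f"concerns, {task}.")

# Each prompt would then be sent to a teacher LLM (client call omitted);
# the varied personas keep the resulting synthetic dataset diverse.
for _ in range(3):
    print(make_prompt())
```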
Sarah Guo: What can this research teach us about human cognition? For example, some believe that understanding how thought trajectories form will help us understand how the brain works.
Andrej Karpathy: Studying models and studying human cognition are two entirely different things, but in some cases they can be compared. For example, I think the transformer is stronger than the human brain in certain respects; models are more efficient systems than brains, but because of data limitations their performance is not yet at brain level. That is only a rough account, though.
For instance, in memory, transformers outperform the human brain on long sequences: give one a sequence and ask it to compute forward and backward over it, and it can remember both ends of the sequence and complete the task, which human memory finds very hard. So in some respects I think gradient-based training really is more efficient than the brain, and models may one day genuinely surpass humans on certain cognitive dimensions.
Elad Gil: Memory capacity is one of the strengths of computers.
Andrej Karpathy: Yes, and the human brain has many limitations. Working-memory capacity, for example, is very small, while transformers' working memory is far larger, and the gap keeps widening. Transformers also learn more efficiently. The brain operates under many hidden constraints, background, obligations, environment, which make it a noisier, more limited system. So in some respects these models are already stronger than the brain, yet they haven't reached their full potential.
Elad Gil: On the relationship between humans and AI, one view is that we use AI as an external tool, while another holds that humans and AI models will merge more deeply. What do you think?
Andrej Karpathy: I think we have already achieved a degree of human-AI integration. Technical tools have always been extensions of human ability; as people often say, "the computer is a bicycle for the mind". The problem with today's models is the bottleneck at the input/output interface, so human-AI integration still takes continued effort. But once the models are polished, using them is very simple, just a few easy steps. So although some obstacles remain, current technology has already made this integration relatively easy and feasible.
Elad Gil: Some people in the AI field believe that if a conflict arises between humans and AI in the future, it could be resolved through some form of human-AI fusion.
Andrej Karpathy: Yes, this is very similar to Neuralink's philosophy. I'm not sure exactly what that fusion will look like, but what is clear is that we want to reduce the input/output latency between humans and their tools. You can think of it as adding a new layer on top of our cerebral cortex; this new cortex might be cloud-based, essentially the next layer of the brain.
Elad Gil: The book Accelerando has a similar premise: everything is delivered to the brain through wearable smart glasses, and if you lose the glasses, it's like losing part of your personality or memory.
Andrej Karpathy: I think that is quite likely to happen. Today's phones have almost become part of us, like an external peripheral of the brain; every time we put the phone down, it feels like reverting to some default state.
Another example: if we had a universal translator and relied on it for a long time, then suddenly losing it would leave us unable to communicate directly with people who speak other languages. There's a video of a child holding a magazine and trying to swipe it with a finger; the child can't tell what is natural and what technology has added. It makes me think that as technology becomes ubiquitous, people may grow so dependent on these tools that they only notice the boundary between technology and nature once the tools disappear. Devices like always-available translators will especially dull people's sensitivity to that boundary.
Sarah Guo: the "exocortex" sounds like a very important thing, and it is important to everyone. today, llm research is led by a few ai labs, and only they have the resources to promote the development of next-generation model training. what do you think of this structure in llm research today? what impact will it have on the popularity of ai technology in the future?
Andrej Karpathy: the ecosystem of llm is indeed monopolized by several closed platforms today, while meta llama, which ranks at the bottom, is relatively open. this phenomenon is also a reflection of the open source ecosystem to a certain extent. when we think of llm as the "outer layer", issues of information and data privacy are involved. there is a saying in the encryption field that is "not your keys, not your tokens". maybe in the future in the llm field we will emphasize "not your weights, not your brain". if ai is the new cerebral cortex for everyone in the future, and if this cortex is controlled by a certain company, people will feel that they are "renting" a brain instead of actually owning it.
Sarah Guo: are you willing to give up ownership and control of your own brain to rent a more powerful one?
Andrej Karpathy: i think this is a critical trade-off. the future trend may be that most people will use the powerful closed source model as the default option, but in some specific cases, open source systems will become the alternative. just like now, when some closed source model providers have problems with their apis, people turn to the open source ecosystem and therefore feel more in control.
this may also be the direction of future brain technology development: when problems arise, we can switch to open source systems, while in most cases we still rely on closed systems. it is important to keep open source systems moving forward, but today perhaps not everyone is aware of this issue.
Elad Gil: What do you think of small models? What level of performance can today's small models reach?
Andrej Karpathy: I think models can get much smaller still. Because of problems with the datasets, current models waste a lot of capacity storing irrelevant information. The key for a small model is to focus on core cognition, and that core can be very small; it is more like a way of thinking. When information is needed, the model can flexibly use tools to look it up rather than storing masses of unnecessary detail.
In terms of parameters, I think 100 million may be all we need. Efficient compression can make models very small. The principle of compression is simple: use a very large model, or a great deal of compute, to supervise a smaller model. This process packs a great deal of capability into the small model.
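The compression recipe described here matches standard knowledge distillation, sketched below: a large teacher's output distribution supervises a small student through a softened KL term blended with ordinary cross-entropy. The temperature and mixing weight are conventional choices for illustration, not values from the interview.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a KL term that pulls the
    student's softened distribution toward the teacher's."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Example shapes: batch of 8, vocabulary of 100.
s = torch.randn(8, 100, requires_grad=True)   # small student's logits
t = torch.randn(8, 100)                       # large teacher's logits
y = torch.randint(0, 100, (8,))
print(distill_loss(s, t, y))
```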
The essence of the matter is that today's big models are digesting internet datasets in which only about 0.001% of the content relates to cognition, while the remaining 99.999% is irrelevant information, boilerplate copyright text, for example. Most of it contributes nothing to improving thinking patterns.
Elad Gil: Can this process be described mathematically, or by some information-theoretic framework? Can the relationship between model size and cognitive ability be quantified? For example, might a 1-billion-parameter model eventually be enough for good understanding?
Andrej Karpathy: It may take even fewer than 1 billion parameters for a model to have this kind of cognitive ability, which matters once you account for model cost, on-device deployment, and so on. And what we end up discussing may not be a single cognitive model. I think models should be able to work in parallel rather than relying only on sequential processing. It's like a company: a lot of work can be done in parallel, but a hierarchy is also needed to route information well. So in the future there may be "companies of LLMs": different models focus on their own domains, say one programmer model and one project-manager model, handling a great deal of work in parallel while also collaborating, forming a "group brain" composed of LLMs.
Elad Gil: This cluster of LLMs sounds like an ecosystem, each part with its own expertise and position.
Andrej Karpathy: I think things will definitely develop in this direction. The cloud model, the most intelligent one, acts as the CEO, while many cheaper and open-source models are the employees in this group. When the system runs into a very hard problem, the task is automatically escalated and assigned to other parts of the group.
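A hedged sketch of this escalation pattern: a cheap "worker" model answers by default and hands off to a stronger "CEO" model when its confidence is low. Both model calls are stubs and the confidence threshold is an invented heuristic, purely to illustrate the routing logic.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # 0..1, self-reported or estimated

def cheap_worker(task: str) -> Answer:
    """Stub for a small open-source model handling routine work."""
    return Answer(text=f"[worker draft for: {task}]", confidence=0.4)

def cloud_ceo(task: str) -> Answer:
    """Stub for the strongest (most expensive) model in the hierarchy."""
    return Answer(text=f"[CEO answer for: {task}]", confidence=0.95)

def solve(task: str, threshold: float = 0.7) -> Answer:
    # Default to the cheap model; escalate only when it is unsure,
    # mirroring how hard problems rise through a company hierarchy.
    draft = cheap_worker(task)
    return draft if draft.confidence >= threshold else cloud_ceo(task)

print(solve("summarize this meeting").text)
print(solve("prove this theorem").text)
```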
04.
Education in the AI era
Sarah Guo: After leaving OpenAI you started working on your own education project. Why education?
Andrej Karpathy: I have always loved education. I enjoy learning and I enjoy teaching, and I am very passionate about this field.
💡
Karpathy founded Eureka Labs, an AI-native education platform that aims to revolutionize how people learn. In Eureka Labs' first course, LLM101n, students are guided through building their own large language model, with the goal of making AI education more interactive and accessible. The platform plans to enhance the learning experience by combining AI teaching assistants with human course design, reflecting Karpathy's long-held vision of integrating AI and education.
An important reason I entered this field is that so much AI seems aimed at replacing humans, costing many people their jobs, whereas I am more interested in technology that enhances human capability. Overall, I stand on humanity's side: I hope AI helps people become more capable rather than marginalized.
Besides, I think a "perfect tutor" that can teach every subject is a pretty good idea. If everyone had such an AI tutor guiding their learning across all subjects, I believe everyone could achieve far more.
Elad Gil: Since the 1980s, the literature has shown that one-on-one tutoring can improve individual performance by 2 standard deviations, and there are many examples of personalized tutoring. How do you see AI and tutoring combining?
Andrej Karpathy: I draw a lot of inspiration from those examples. I am now building a complete course with the goal of making it the first choice for anyone learning AI. I previously taught Stanford's first deep learning course; there were only 20 to 30 students, but the results were good. The challenge now is scaling that kind of course to 8 billion people worldwide. Given differences in language and ability, a single teacher cannot do that.
So the key is using AI to scale what good teachers do. The teacher's core task becomes course design and writing materials, while the AI interacts with students on the front end and delivers the content. Current AI cannot create a complete course on its own, but it is good enough to explain and transfer knowledge. This way teachers focus on back-end design, while the AI works with students in many languages on the front end and helps them through the material.
Sarah Guo: So the AI is comparable to a teaching assistant?
Andrej Karpathy: The teaching assistant is one of the directions I'm considering: a front end that interacts directly with students and guides them through the course. I think this is feasible with current technology, and there is no comparable product on the market, so there is a lot of potential here, and as the technology advances we can keep adjusting the approach. I feel that many companies today lack an intuitive grasp of model capabilities, so the products they build are either too ambitious or not on target. That is why I think this field has great potential.
Sarah Guo: With good tools, how far can the limits of human capability be pushed? Compare it to the Olympics: thanks to advances in training science and technology over the past 10 years, top runners perform better than they did 10 years ago.
Andrej Karpathy: I feel we are nowhere near the full potential yet. Think about it from two angles: the first is globalization, hoping everyone can receive a high level of education; the second is the limit of individual ability. Both perspectives are valuable.
Elad Gil: When we discuss one-on-one tutoring, we usually mention personalization and adaptivity, that is, giving each person learning challenges matched to their level. Do you think AI can do this today?
Andrej Karpathy: I think the low-hanging fruit in AI education today is translation. Current models are very good at such tasks, but these are still the basics.
Personalization that adapts to each person's level is hard, but not impossible. I think it should be a focus of AI development, and the potential is clearly there, though it may involve new territory. A simpler version might be achieved through prompt engineering, but I think the really useful path is to build this capability into the model itself, so it can work the way a teacher does.
This does touch areas that are still underdeveloped. Simple versions may not be far off, such as getting some help by prompting the model, but I mean solutions that actually work, not ones that merely look good in a demo: the ability to work as effectively as a real teacher, understanding each person's context and providing personalized guidance. That still requires further development.
Elad Gil: Could we achieve that adaptivity by bringing in other models?
Andrej Karpathy: I think this is a characteristic of AI: many capabilities can be shown with just one prompt, which is why we see so many demos. But can we finally deliver an actual product? Making a demo may not be hard, but there is still a long way from a demo to a product that works at scale.
Sarah Guo: A few weeks ago you mentioned that learning and entertainment are different: learning should be challenging and needs an incentive system, such as social status or role models. To what extent can incentive systems change people's motivation to learn? Are you more focused on providing resources so people can go as far as their abilities allow, or on changing the number of people willing to learn and drawing more people into learning?
Andrej Karpathy: I hope to make learning a little easier, since some people are not naturally drawn to it. Many people study out of practical need, such as finding a job, which is entirely reasonable. Education plays an important role in our society because it not only provides knowledge but also improves a person's economic standing, and that is a big part of why people are motivated by it.
Sarah Guo: What will our future look like in a post-AGI society?
Andrej Karpathy: In the post-AGI era, I think education will become more like entertainment. Successful education lies not only in transferring knowledge but also in deeply understanding and applying it.
Sarah Guo: Who is Eureka's first audience?
Andrej Karpathy: The primary audience for the first course is undergraduates, particularly those pursuing degrees in technical fields; if you are in a technology-related undergraduate program, you are the ideal target group.
That said, I think our current concept of education is somewhat outdated. The old pattern of going to school, graduating, and then working for the rest of your life is breaking down amid today's changes: technology moves fast and people need to keep learning. So although the course is aimed at undergraduates, its real audience is much wider. People of any age can take part, and those with a technical background who want a deeper understanding of the material will get a lot out of it.
I plan to launch the course soon; early next year may be a suitable time, but before then I will work hard to make sure its quality meets the standard I expect.
Elad Gil: If you had children, what knowledge and skills would you want them to learn?
Andrej Karpathy: My answer would be mathematics, physics, computer science, and similar subjects, because they provide core training for thinking ability. That view is of course shaped by my own background, but I believe these areas are very helpful for problem-solving skills, and even as we approach the AGI era these skills will remain important. During the critical years when people have a lot of time and attention, I think we should focus mainly on tasks that are relatively simple to perform rather than tasks that require a lot of memorization. I recognize the value of learning other subjects too, but I believe 80% of the time should go to these core areas, because they are more practical and hold long-term value.
Typesetting: fia