
meta connect 2024: llama 3.2 is here, ar glasses orion debut

2024-09-26



tencent technology | authors: wu bin, hao boyang

editor: zheng kejun

at 1:00 am beijing time on september 26, the annual meta connect 2024 was held in menlo park, california. zuckerberg opened with the widely anticipated quest 3s, signaling that meta's metaverse dream is now spreading to the entry-level consumer market.

at the same time, meta announced version 3.2 of its llama large model, which adds visual multimodal capabilities, along with a new ai voice conversation feature. the greatest significance of these releases is that meta now covers all the mainstream modalities, laying a solid foundation for integrating ai with its xr hardware. meanwhile, several ai applications meta announced, such as real-time ai translation and real-time ai digital humans reminiscent of a "her 2.0", take industry solutions a step further and have the potential to become killer applications.

in addition, zuckerberg called the newly unveiled orion the most powerful ar glasses on the planet, offering another entrance to the future metaverse. although still imperfect, it is a product that carries meta's ambition for the final form of ai spatial computing hardware. according to foreign media reports, its unit cost exceeds 10,000 us dollars.

meta continues down the two paths of the metaverse and ai, striving to combine virtual reality and artificial intelligence through product integration.

at the 2024 connect conference, zuckerberg once again showed the world the entrance to the virtual-reality future he has been planning: the merging of the virtual and the real, of direct control and artificial intelligence, and of shipping products with a future operating experience.

zuckerberg summarized the conference: five new products, with meta working hard to build a more open future

meta quest 3s is launched, with a price cut but no reduction in features

meta quest 3s appeared first, and in a rare move, its price was announced before its specs and new features.

zuckerberg was eager to let everyone know that this new vr headset costs only $299.99, a full $200 less than last year's quest 3, with surprisingly few compromises compared to the quest 3.

the meta quest 3s carries the same qualcomm snapdragon xr2 gen 2 chip and 8gb of ram as its "big brother" quest 3, meaning the two have exactly the same processing power.

at the same time, quest 3s uses the same touch plus controllers as quest 3 and supports hand-tracking.

what's more, quest 3s actually offers longer battery life. according to meta's official figures, the quest 3s, with its built-in 4324 mah battery, lasts up to 2.5 hours, while the quest 3, despite a larger 5060 mah battery, manages at most 2.2 hours; the 3s's lower-resolution display presumably draws less power.

as a budget vr headset, the quest 3s's one real compromise is optics: instead of the more mainstream pancake optics, it uses the same fresnel lenses as the quest 1 and 2 series. this design is more mature and cheaper, but it also makes the device bulkier and heavier than the quest 3.

but the real difference between the two products is the display: quest 3s uses a fast-switching lcd at 1832 x 1920 per eye (about 20 pixels per degree, ppd) with a 90/120 hz refresh rate.

its horizontal and vertical fields of view are only 96 and 90 degrees respectively, while quest 3 offers 2064 x 2208 per eye (about 25 ppd) and a 110-degree horizontal, 96-degree vertical field of view.
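
for intuition, pixels per degree is roughly resolution divided by field of view. below is a minimal sanity check of those figures; note that official ppd numbers are typically measured at the lens center, so this flat average lands a bit lower than the quoted values.

```python
# rough pixels-per-degree (ppd) estimate: horizontal pixels / horizontal fov.
# official ppd figures are usually center-of-lens values, so this simple
# average comes out lower than the quoted 20 and 25 ppd.

def approx_ppd(h_pixels: int, h_fov_deg: float) -> float:
    return h_pixels / h_fov_deg

print(f"quest 3s: ~{approx_ppd(1832, 96):.1f} ppd")   # ~19.1
print(f"quest 3:  ~{approx_ppd(2064, 110):.1f} ppd")  # ~18.8
```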

clearly, quest 3s embodies meta's years-long metaverse ambition: virtual reality glasses with core specs close to the mainstream flagship at a lower price, to push vr toward a much broader user base.

another driving force behind meta's creation of such a product is its virtual reality ecosystem, which is still being gradually improved.

at the press conference, zuckerberg said quest 3s will support dolby vision and add a screen-recognition feature that can identify the computer in front of you and mirror its screen with one click, expanding the use scenarios of headsets like quest 3s.

however, meta fumbled one thing: during the on-stage demo, quest 3s unexpectedly crashed, drawing a loud reaction from the audience, and zuckerberg had to quickly brush it off.

later, zuckerberg demonstrated the latest version of horizon worlds. although it still looks very much like a game of playing house, it is making slow progress: this year it added an avatar feature that lets multiple people watch youtube together, hoping to attract more youtube users.

in terms of third-party applications, the biggest surprise is batman: arkham shadow, which launches on october 22 and will be bundled with new quest 3 and 3s purchases until april next year. the previously announced alien: rogue incursion and the zombie game arizona sunshine will also come to the quest platform. in addition, meta announced it will bring wordle (the new york times' daily word game) to quest.

oh, and by the way: to steer you toward the new quest 3s, meta has proactively discontinued the quest 2 and quest pro. once remaining stock sells out, your only choices will be the cheaper quest 3s or the higher-spec quest 3.

zuckerberg says the most powerful on-device model, llama 3.2, is here

meta launches the multimodal llama 3.2 large model; lightweight versions can run on mobile phones

as at previous connect conferences, ai was again the protagonist.

zuckerberg announced the 3.2 update of meta's foundation model llama. the multimodal models come in two sizes, 90b and 11b, while the on-device models come in 1b and 3b.

zuckerberg demonstrated a new product feature built on llama 3.2: by uploading a picture, you can not only erase or add elements with a magic-brush tool, but also change a character's clothing from a text description, or even swap the current background for a rainbow.

according to the technical documentation meta provided, llama 3.2 can essentially be understood as llama 3.1 with multimodal support, because meta did not update the language model's parameters during the image-recognition training process.

in terms of training method, meta took a fairly conventional approach: it added an image encoder and adapter layers to llama 3.1, trained them on paired images and text, and then refined the result on domain-specific content.
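
as a rough illustration of that adapter idea (a schematic sketch, not meta's actual architecture or code; all names here are assumptions), a gated cross-attention layer can let a frozen language model attend to image features:

```python
import torch
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    """schematic adapter: text tokens (queries) attend over image features."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # gate starts at 0, so at initialization the frozen language model
        # behaves exactly as it did before the adapter was added
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, text_hidden: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        attended, _ = self.attn(self.norm(text_hidden), image_feats, image_feats)
        return text_hidden + torch.tanh(self.gate) * attended

# toy usage: only the adapter (and the image encoder) would be trained,
# matching the article's point that the language model stays frozen
adapter = CrossAttentionAdapter(d_model=512, n_heads=8)
text = torch.randn(2, 16, 512)   # (batch, text tokens, hidden size)
img = torch.randn(2, 64, 512)    # (batch, image patches, hidden size)
out = adapter(text, img)         # same shape as text: (2, 16, 512)
```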

finally, in the post-training alignment stage, llama 3.2 goes through multiple rounds of supervised fine-tuning, rejection sampling (generating several candidate answers and keeping only the best-scoring ones) and direct preference optimization.
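
a minimal sketch of what that rejection-sampling step looks like in practice (generate(), reward() and the sampling count are illustrative stand-ins, not meta's pipeline):

```python
def rejection_sample(prompt, generate, reward, n_candidates: int = 8):
    """sample several candidate answers and keep the one the reward model
    scores highest; the winners become training data for the next round
    of supervised fine-tuning."""
    candidates = [generate(prompt) for _ in range(n_candidates)]
    return max(candidates, key=lambda c: reward(prompt, c))
```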

interestingly, during this process meta used llama 3.1 to generate multiple sets of image captions, optimizing the model's descriptions of images.


in the test results meta provided, the 90b version of llama 3.2 leads gpt-4o mini in image reasoning across multiple tests, while the 11b version comprehensively beats claude 3 haiku, the small model in the claude 3 family.

zuckerberg said the 1b and 3b on-device versions of llama 3.2 will be the strongest edge ai.

they currently accept text input and output and support a context length of up to 128k tokens. the two on-device models were produced by pruning (trimming the less-used parameters of a larger model) and distilling (using the large model as a teacher whose outputs the small model learns to imitate) llama 3.1 8b and 70b. synthetic data generated by llama 3.1 405b was also added during fine-tuning to improve performance on capabilities such as summarization, rewriting, instruction following, language reasoning, and tool use.
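
the distillation half of that recipe is a standard technique; here is a minimal sketch of the usual teacher-student objective (not meta's training code):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """push the small student's token distribution toward the frozen
    teacher's, with a temperature that softens both distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # the temperature**2 factor keeps gradient magnitudes comparable
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```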

the presentation showed that the 3b version of llama 3.2 beats google's gemma 2 2b (released in june) and microsoft's phi 3.5 mini 3.8b (released in august) on many metrics, especially on common on-device tasks such as summarization, instruction following and rewriting.

for example, on the ifeval test set, which measures how well a model follows user instructions, llama 3.2 3b scores more than 20% higher than the similarly sized phi 3.5 mini. llama 3.2 also holds a clear lead on the two benchmarks that test tool-calling ability.

this arguably makes llama 3.2 the "strongest" on-device model in real-world application experience, as zuckerberg claims. in basic capabilities such as reasoning and mathematics, however, llama 3.2 3b mostly lags behind phi 3.5 mini.

additionally, these models supported qualcomm and mediatek hardware on day one and are optimized for arm processors.

beyond llama 3.2's multimodal image understanding, meta also launched meta ai voice at this connect, covering all the mainstream modalities at once. in the live demo it handled conversational interruptions like gpt-4o, and its voice sounded very natural, though it did not show gpt-4o's rich intonation and emotional expression.

although its performance is only on par with gpt-4o, meta ai voice has found a new selling point: it offers the voices of five celebrities, including judi dench, who played the stern spy chief m in the 007 films, and awkwafina of crazy rich asians.

compared with openai, which was accused of imitating scarlett johansson's voice, meta is clearly playing it safer. according to the wall street journal, meta paid "millions of dollars" for each celebrity voice, and some celebrities negotiated limits on how their voices may be used and guarantees that they will not be held liable for what meta ai says.

according to reuters, the celebrity voices will be available this week in the united states and other english-speaking markets through meta's family of apps, including facebook, instagram and whatsapp.

beyond filling out its foundation-model capabilities, meta also demonstrated some new ai application features. these are largely achievable with existing ai solutions, but meta has taken them a step further and tailored them to the use scenarios of social media and ai glasses.

for example, meta ai studio now supports building ai digital humans directly. in the live demonstration, conversational latency with the digital human was very low, and its movements and voice came across as strikingly real and natural.


imagine an ai with such a lifelike voice and face as your emotional companion; i would call it a "visible" her 2.0.

whether it will usher in a golden age of ai companion products remains to be tested by users.

another impressive product is meta live translation, which uses meta ai's new multimodal capabilities to replace the speaker's mouth movements in the original language with mouth movements matching the target language. companies such as heygen have already built this capability, but given the reach of meta's apps, it could become the first such product to go truly mainstream.

although llama 3.1 is already the most widely used open-source model among developers, meta also released llama stack, its first official llama development tool, at the connect conference to better expand at the application layer. it greatly simplifies developers' workflows for using llama models in different environments, and enables one-click deployment of tool-using applications with retrieval-augmented generation (rag) and integrated safety features.
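
for readers unfamiliar with rag, the pattern llama stack packages looks roughly like the sketch below; this illustrates the general technique, not the llama stack api itself, and embed(), search() and llama_generate() are hypothetical stand-ins:

```python
def rag_answer(question: str, embed, search, llama_generate, k: int = 3) -> str:
    """retrieval-augmented generation: fetch relevant documents first,
    then ask the model to answer grounded in that retrieved context."""
    query_vec = embed(question)            # embed the user question
    docs = search(query_vec, top_k=k)      # nearest-neighbor document lookup
    context = "\n\n".join(docs)
    prompt = (f"answer using only this context:\n{context}\n\n"
              f"question: {question}")
    return llama_generate(prompt)
```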

the release of llama 3.2 matters enormously to meta: it fills llama's core gap in cutting-edge multimodal models and lays the groundwork for meta's subsequent ai hardware, such as the multimodal features of the ai-enabled ray-ban glasses.

the best-selling ray-ban glasses get a follow-up while the iron is hot

at last year's meta connect conference, probably no one expected that the most popular product would be not quest 3 but the second-generation ai glasses from meta and eyewear maker ray-ban.

although the first generation was little known, that has not stopped european and american tech enthusiasts from rushing to buy the second-generation ray-ban smart glasses. according to idc, meta has shipped more than 700,000 pairs, with orders in the second quarter of this year more than double the first quarter's. over the product's life cycle, global sales of ray-ban meta glasses had exceeded 1 million units as of may 2024, and the market expects 2024 shipments to top 1.5 million.

meta struck while the iron was hot and immediately launched its new product this year.

rather than a new product, it would be more accurate to call this a brand-new translucent style, since the overall design is identical to last year's.

but the transparent body does look more futuristic; it turns out hardware companies the world over share the same understanding of a "sense of technology": it must be translucent.

meta has added more ai features to this generation of glasses. the biggest improvement is real-time ai image recognition, which lets users ask the ray-ban meta glasses about the scene or object in front of them. users can also scan qr codes directly through the glasses and call phone numbers they spot in their field of view.

the glasses also support smartphone-like reminders, real-time translation between english, french, italian and spanish, and integration with music streaming apps such as amazon music, audible and iheartradio.

orion: the final form of ar glasses, as meta tells it?

orion should have been mass-produced long ago, but the pandemic forced meta to tighten budgets across the board, and zuckerberg decided to postpone the release, which is why meta did not unveil its first ar glasses until 2024.

this is a pair of ar glasses weighing just 98 grams, which is not especially light by ar-glasses standards.

orion's frames are made of magnesium alloy, which is lighter and dissipates heat better than aluminum alloy. the lenses are silicon carbide: durable, lightweight, and with a refractive index high enough to let the light from the projectors in the glasses spread across a larger field of view.

but calling orion a pair of ar glasses isn't quite rigorous. to work properly, it needs a wristband and a separate compute puck.

the compute puck provides the extra processing power, and the glasses cannot work away from it; to use orion normally, you have to carry the puck with you at all times.

the wristband is more interesting. made of high-performance textile, it uses electromyography (emg) to read the neural signals associated with hand gestures and, within milliseconds, converts them into input commands for the compute puck, which feels a bit like science fiction.
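
meta has not published the wristband's pipeline, but a classic surface-emg gesture setup works roughly like this sketch: window the multi-channel signal, extract a simple per-channel energy feature, and hand it to a trained classifier (all details here are illustrative assumptions):

```python
import numpy as np

def rms_features(emg: np.ndarray, window: int = 64) -> np.ndarray:
    """emg: (samples, channels) array -> (windows, channels) of
    root-mean-square energy, a standard surface-emg feature."""
    n = emg.shape[0] // window
    chunks = emg[: n * window].reshape(n, window, -1)
    return np.sqrt((chunks ** 2).mean(axis=1))

# a trained classifier (any scikit-learn-style model) would then map each
# feature row to a gesture label; keeping each window short is what makes
# low-latency input of the kind meta describes plausible
```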

in terms of display, orion has a 70-degree field of view and is equipped with micro led projectors inside the frame that project images into the silicon carbide lenses, similar to the working principle of current ar glasses.

zuckerberg said he hopes people will use orion for two main purposes: interacting with digital information overlaid on the real world and interacting with artificial intelligence.

the latter is easier to understand: orion has the same ai capabilities as the ray-ban meta glasses, including the newly added image recognition and voice interaction.

the former is more abstract. on stage, meta demonstrated blending holographic images with the real world: it built an ar version of messenger for the glasses that enables real-time holographic video calls, as if the other person were standing right next to you.

to promote the ar glasses, meta even enlisted nvidia ceo jensen huang as one of the first users to try orion. as zuckerberg put it: "jensen huang tried it and said it was good!"

in zuckerberg's view, ar glasses will mature gradually. on one hand, adoption will spread quickly through display-free ai glasses such as ray-ban meta.

on the other hand, glasses with small displays, such as meta's upcoming hypernova, will spread the category further by offering easier interactions, such as talking to meta ai or messaging friends.

zuckerberg said orion represents the final form of ar glasses: mature ar glasses with enough computing power that you can leave your smartphone at home.

that said, even freed from our phones, we still have to carry the compute puck when we go out, which falls somewhat short of the final form we imagined.

reality also pours on a basin of cold water: orion's battery life is only 2 hours. at most, orion lets you play superhero in the virtual world for 2 hours at a time.

and this ultimate ar-glasses freedom will not come cheap. according to the verge, techcrunch and other outlets, meta staff showing the prototype said orion's current hardware cost exceeds $10,000, far above the price of apple's vision pro.

conclusion

from being widely ridiculed for the metaverse's stumbles in 2022, to becoming the king of open-source ai in 2023, to opening the door to a new generation of ai hardware with smart glasses this year, zuckerberg has pulled off an almost impossible comeback in the past three years.

during this period, he made two important bets: open-source ai and lightweight smart glasses. both bore fruit at today's connect.

from the demonstration of the ai-enabled ray-ban glasses, we can indeed see the advantage of glasses as a carrier in the ai era: they not only let you call a large model by voice, but also directly exploit the potential of multimodal ai. "looking" directly is a far more natural experience than "scanning" with a phone, and that directness may well decide the handover to the next generation of smart devices.

orion, the last product unveiled, is a glimpse of the future that carries meta's ambition for the final form of ai spatial computing devices. compared with the heavy and uncomfortable vision pro, zuckerberg's lightweight mr vision looks more like the future of spatial computing, and that vision has now taken shape. if the ai-era migration of smart devices is destined to happen, meta, far more than small experiments like the ai pin, is the company closest to the threshold.

if you had told someone all this in 2022, they would never have believed it. zuckerberg, a seemingly stubborn tech geek, really has kept his promise and is bringing us ever closer to the entrance of the metaverse.