Before Apple's Fall Conference: A Deep Dive into the Apple Intelligence Panorama

2024-09-06

Tencent Technology | By Guo Xiaojing and Wu Bin

Video planning: Wu Bin

Editor: Zheng Kejun

At WWDC24 (Apple Worldwide Developers Conference) in June this year, the world heard the term "Apple Intelligence" for the first time. Apple used the shared initials of artificial intelligence (AI) as a pun to define its own Apple AI; previously, Apple had avoided the term "AI" when discussing its machine-learning features.

Although the invitation for Apple's latest fall conference never mentions "AI," it still gives off the flavor of "AI" everywhere.

Note: on the left, the invitation for Apple's fall 2024 product launch; on the right, the WWDC24 invitation

For example, the theme of the invitation is "It's Glowtime," and its dominant colors and the colorful halo around the Apple logo closely match the palette of the WWDC24 invitation.

Caption: Siri's new icon and the iPhone display effect after waking Siri

Siri's colors are changing too. Its new logo and the color scheme shown after waking Siri are completely different from the old Siri's.

Note: the 2023 Apple fall conference and WWDC23 invitations alongside the old Siri logo; the color schemes differ noticeably

I even asked an AI product to help me analyze which AI-related elements appear in the fall conference invitation, and it told me:

Graphic design: the logo is built from lines tracing Apple's classic outline, with a sense of motion that evokes intelligent interaction. This dynamic design may be simulating the smooth, intelligent interaction experience that AI technology brings;

Color tone: the invitation logo uses neon tones of blue, purple, orange, and pink. The rich color combination closely resembles AI-generated art, and the colors themselves may relate to the characteristics or application scenarios of AI features in Apple products;

Visual connection: the visual link between the invitation and Siri may hint at the core position of AI technology in Apple products, suggesting that Siri will be further upgraded and optimized, better integrated with other AI features, and able to provide users with smarter, more convenient services.

From never mentioning AI, to punning on "AI" in the product name, to matching even the invitation's color scheme to "AI": these designs are obviously no coincidence. Apple is hinting that Apple Intelligence will again be the highlight of this fall's event.

So how strong is Apple Intelligence, really? In this article, we will walk through the following questions in detail:

Background: based on the information available, what does the overall picture of Apple Intelligence look like?

Progress review: how has Apple Intelligence progressed from June to August?

Looking ahead: what can we expect from Apple Intelligence in the near term?

Deeper thinking: with Apple's "AI," how will the ecosystem evolve?

Backstory: Completely Private and Secure

At WWDC24, one passage of Cook's speech was especially important:

"We are pleased to launch a new chapter in Apple innovation. Apple Intelligence will change the way users use our products, and what our products can do for users. Our unique approach combines generative AI with users' needs to provide truly useful intelligence. It can access that information in a completely private and secure way to help users do what matters most to them. This is AI that only Apple can provide. We can't wait for users to experience what it can do."

There are three key points in this passage:

1. Combine generative AI with user needs to provide useful intelligence;

2. It is unique: this is AI that only Apple can provide;

3. It works in a completely private and secure way.

To do this, Apple needs to solve the four core problems that all edge AI faces:

First, usefulness: genuinely meet users' needs rather than inventing needs, and get the interaction right;

Second, security: the model needs to run locally;

Third, smoothness: problems the local hardware cannot handle must be solved by larger models in the cloud;

Fourth, complete privacy and security: once a cloud model is used, the large amount of personal information on the phone is at risk of being leaked.

Clearly, the repeatedly emphasized security and privacy are the biggest prerequisites for any service Apple provides to its users.

It is fair to say that before Apple, no manufacturer had come up with a truly complete solution.

So what is Apple's thinking? Let's look at the Apple Intelligence panorama, which offers a glimpse into Apple's overall logic for AI.

Caption: Apple Intelligence panorama, translated by Tencent Technology

The personal intelligent system layer connects the familiar top-level application layer with Apple's self-developed chip layer at the bottom.

This personal intelligent system layer is the core structure of Apple Intelligence, and we can view it as several parts.

The first part is AFM-on-device (Apple Foundation Model). This is a 3-billion-parameter model, the most important piece of Apple's edge AI. Because the privacy and security requirements are so high, running the edge model locally has the highest priority; only what it cannot do gets sent to the cloud.

However, edge models face an impossible triangle of performance, parameter count, and memory and power consumption, which is also the problem that most vexes manufacturers.

Excellent performance requires many parameters, which means a large memory footprint and high power consumption, and excessive power consumption in turn hurts performance. So how should these three be balanced?

Apple's solution is as follows:

Low-bit palettization: this technique makes the model lighter, like compressing high-definition photos so they don't take up too much space on the phone.

LoRA adapters: these small add-ons let the model quickly learn new skills as needed, like Lego bricks snapped together into different shapes.

Talaria tool: this tool helps monitor and tune the model's energy consumption to ensure it does not drain too much power.

Grouped query attention: lets the model quickly focus on important information, like using tags to find books quickly.

Shared vocabulary: sharing the input and output vocabulary reduces memory usage, like having everyone look up words in one dictionary to save space.

In short, these optimization techniques let AFM-on-device stay smart while saving power and responding quickly.
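
To make the first of these techniques more concrete, here is a minimal Swift sketch of low-bit palettization. It is purely illustrative (the function names, the 4-bit width, and the simple 1-D k-means are our assumptions, not Apple's published implementation): weights are clustered into a 16-entry palette, and each weight is then stored as a 4-bit index into that table.

```swift
import Foundation

// Minimal sketch of low-bit palettization (illustrative only, not Apple's
// published implementation): cluster Float32 weights into a 16-entry
// palette with a few rounds of 1-D k-means, then store one 4-bit index
// per weight instead of the full 32-bit value.
struct PalettizedTensor {
    let palette: [Float]  // 16 shared centroid values (the "palette")
    let indices: [UInt8]  // one 4-bit index per weight (a byte each here)

    // Reconstruct approximate weights by palette lookup.
    func dequantized() -> [Float] { indices.map { palette[Int($0)] } }
}

func palettize(_ weights: [Float], bits: Int = 4, iterations: Int = 10) -> PalettizedTensor {
    let k = 1 << bits
    guard let lo = weights.min(), let hi = weights.max(), lo < hi else {
        return PalettizedTensor(palette: [weights.first ?? 0],
                                indices: [UInt8](repeating: 0, count: weights.count))
    }
    // Start with centroids spread uniformly across the weight range.
    var palette = (0..<k).map { lo + (hi - lo) * Float($0) / Float(k - 1) }
    var assignment = [UInt8](repeating: 0, count: weights.count)
    for _ in 0..<iterations {
        // Assignment step: snap each weight to its nearest centroid.
        for (i, w) in weights.enumerated() {
            var best = 0
            for j in 1..<k where abs(palette[j] - w) < abs(palette[best] - w) { best = j }
            assignment[i] = UInt8(best)
        }
        // Update step: move each centroid to the mean of its assigned weights.
        var sums = [Float](repeating: 0, count: k)
        var counts = [Int](repeating: 0, count: k)
        for (i, w) in weights.enumerated() {
            sums[Int(assignment[i])] += w
            counts[Int(assignment[i])] += 1
        }
        for j in 0..<k where counts[j] > 0 { palette[j] = sums[j] / Float(counts[j]) }
    }
    return PalettizedTensor(palette: palette, indices: assignment)
}
```

Storing 4-bit indices plus one small lookup table shrinks a 32-bit weight matrix roughly eightfold, which is exactly the kind of saving that helps a 3-billion-parameter model fit within a phone's memory budget.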

The LoRA adapter (low-rank adaptation adapter) is the biggest highlight. It is a technique for specializing machine learning models, especially large language and generative models: it is like snapping a dedicated "summarize text" or "reply to email" attachment onto the base model, letting the model handle those specific tasks better.
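
As a rough sketch of the underlying idea (dimensions and type names here are hypothetical, and this is the generic LoRA formulation rather than Apple's adapter format): the frozen base weight matrix W is augmented with a low-rank product B·A, so switching tasks only means swapping the tiny A and B matrices.

```swift
import Foundation

// Illustrative LoRA forward pass (generic formulation, not Apple's adapter
// format): the frozen base weight W (d x d) is augmented with a low-rank
// update B * A, where A is (r x d) and B is (d x r) with r << d.
func matVec(_ m: [[Float]], _ v: [Float]) -> [Float] {
    m.map { row in zip(row, v).reduce(0) { $0 + $1.0 * $1.1 } }
}

struct LoRALayer {
    let w: [[Float]]   // frozen base weights, d x d, shared by every task
    let a: [[Float]]   // task adapter "down" projection, r x d
    let b: [[Float]]   // task adapter "up" projection, d x r
    let scale: Float   // alpha / r scaling factor from the LoRA paper

    func forward(_ x: [Float]) -> [Float] {
        let base = matVec(w, x)               // W·x: the frozen computation
        let update = matVec(b, matVec(a, x))  // B·(A·x): tiny task-specific tweak
        return zip(base, update).map { $0 + scale * $1 }
    }
}
```

For example, with a hidden size of 3,072 and rank 16, an adapter adds about 98,000 parameters per layer versus roughly 9.4 million for the full matrix, which is why a phone can keep many task adapters on disk and load them on demand.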

Video: how the LoRA adapter works

In this way, a model with only 3 billion parameters can achieve performance comparable to mainstream models of more than 7 billion parameters (according to Apple's official test results).

The Talaria tool, combined with other techniques (grouped query attention, a shared input-output vocabulary, low-bit quantization, mixed configuration strategies, activation quantization, and embedding quantization), lets Apple's model achieve a time-to-first-token latency of about 0.6 milliseconds per prompt token and a generation rate of 30 tokens per second on the iPhone 15 Pro, meeting performance requirements while easing the phone's power and memory pressure.

Even so, according to Apple's official documentation, running Apple Intelligence requires at least an iPhone 15 Pro. And according to Tencent Technology, at least 8 GB of memory is needed to support it.

Therefore, at this stage, the capability of the 3-billion-parameter edge model is the upper limit of what the local AI brain of the iPhone and Mac can solve.

Complex computations therefore still need to be sent to the cloud and processed by models with larger parameter counts.

This is the second important component of Apple Intelligence: the cloud model (AFM-server).

It is worth noting the orchestration layer, whose role is to decide whether a user request should be handled on the device or uploaded to the cloud, much like a commander. Apple performs no manual intervention here; the algorithm makes the call entirely on its own, and users cannot choose to keep their data on the device only.
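
Apple has not disclosed how this orchestrator actually decides. As a purely illustrative sketch, with every type name and threshold being our own assumption, the routing logic might look something like this:

```swift
// Purely illustrative sketch of the orchestration decision. Apple has not
// disclosed its actual routing logic; every name and threshold below is a
// made-up assumption.
enum ExecutionTarget {
    case onDevice       // handled by the 3B-parameter AFM-on-device
    case privateCloud   // escalated to AFM-server via Private Cloud Compute
}

struct RequestEstimate {
    let promptTokens: Int          // size of the request plus its context
    let needsWorldKnowledge: Bool  // open-ended question vs. on-device data
}

func route(_ request: RequestEstimate, onDeviceLimit: Int = 2048) -> ExecutionTarget {
    // Prefer the local model: it is private by construction and has no
    // network latency. Escalate only when the request plausibly exceeds
    // what the small on-device model can handle.
    if request.needsWorldKnowledge || request.promptTokens > onDeviceLimit {
        return .privateCloud
    }
    return .onDevice
}
```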

Apple did not disclose the cloud model's exact parameter count, but unlike the on-device model, which is distilled from a larger model, the cloud model is trained from scratch while sharing some of the advanced training methods used for the on-device model.

The most important feature of this cloud model is that it realizes the "completely private and secure" promise Cook mentioned, with protection provided by Private Cloud Compute (PCC).

How can it be kept so confidential? Many specialized techniques are involved. Let's first walk through the whole process in simple terms:

User-initiated request: for example, you ask Siri on your phone, "What time should I pick up my kids from school so I can still make the office meeting?"

Secure encapsulation: your phone immediately encrypts the request into a "secret package" that only PCC can decrypt.

Secret delivery: this "secret package" is sent to PCC through a secure channel, like a tunnel with a combination lock that only those who know the code can pass through.

PCC decrypts and processes: after receiving the package, PCC uses its super brain (a powerful AI model) to understand your request and find the answer, applying various technologies along the way to ensure your data is not leaked or abused.

Data is deleted after use: once PCC finds the answer, it immediately deletes all of your temporarily stored data, like wiping a blackboard clean, leaving no trace.

Return result: PCC then re-encrypts the answer and sends it back through the secure channel; your phone decrypts it, and you get the answer.
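
The sketch below illustrates that encrypt, process, delete, re-encrypt round trip in Swift using CryptoKit. It is a conceptual toy, not the real PCC protocol: actual PCC adds hardware attestation, one-time credentials, and target diffusion, and the session key would be negotiated rather than shared as casually as it is here.

```swift
import Foundation
import CryptoKit

// Conceptual sketch of the round trip described above. NOT the real PCC
// protocol; it only illustrates the idea that the server reads a request
// transiently and keeps nothing afterward.
func pccRoundTrip(request: String) throws -> String {
    // 1. The phone seals the request with a fresh, single-use session key.
    let sessionKey = SymmetricKey(size: .bits256)
    let sealedRequest = try AES.GCM.seal(Data(request.utf8), using: sessionKey)

    // 2. "Server side": decrypt, produce an answer, keep no copy. In real
    //    PCC the node deletes all request data once the response is sent.
    let plaintext = try AES.GCM.open(sealedRequest, using: sessionKey)
    let question = String(decoding: plaintext, as: UTF8.self)
    let answer = "Answer to: \(question)"  // stand-in for the AFM-server model

    // 3. The answer is re-sealed for the trip back; only the phone can open it.
    let sealedAnswer = try AES.GCM.seal(Data(answer.utf8), using: sessionKey)

    // 4. Back on the phone: unseal and read the result.
    let result = try AES.GCM.open(sealedAnswer, using: sessionKey)
    return String(decoding: result, as: UTF8.self)
}
```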

In short, PCC encrypts your request, delivers it through a secret channel to the AI for processing, and deletes it once processing is done. So what technologies does PCC use to ensure user data is never leaked or abused?

To use a vivid if not entirely rigorous metaphor: PCC is like a specially designed safe. Even if you could get inside, it would be hard to find where specific valuables are kept, because they are stored in random places. This is the so-called "target diffusion" technique.

In addition, the safe's doors and locks are very strong and can only be opened by verified staff (that is, attested PCC nodes). A special key (encryption) is required each time, and these keys are single-use, becoming invalid once used, so even a stolen key cannot open the door.

Most importantly, even if someone got into the vault, any valuables taken out are quickly put back and locked away without a trace, so no one can find where they were before.

For now, only Apple has managed to implement this layered security solution.

This brings us to the third important part of Apple Intelligence: the self-developed chip layer. These security-hardened servers all run on Apple's own M2 Ultra chip.

These chips provide strong encryption capabilities, executing complex cryptographic algorithms to secure data in transit and during processing. They also integrate multiple hardware security features, such as the Secure Enclave, an isolated hardware area dedicated to handling encryption keys and sensitive operations, ensuring that user data is protected even at the server level.

The chips also support secure boot, which ensures servers can only run software signed by Apple, preventing malware from loading at system startup.

Whether it is the iPhone's A-series chips or the Mac's M-series chips, all are fully designed in-house by Apple. These chips are born with the Apple system in mind and support Apple Intelligence seamlessly in performance, power consumption, and security, an advantage no other manufacturer has.

These three parts constitute the core of Apple Intelligence: the on-device model running locally together with its set of function-specific adapters, the cloud model delivered through Private Cloud Compute, and the powerful self-developed chip layer.

At this point, you may ask: where is OpenAI? Wasn't it rumored that Apple Intelligence's foundation model came from OpenAI?

Apple has not announced many details of its cooperation with OpenAI. Judging from what has been shown so far, ChatGPT is not pre-installed as an independent app; Apple has only given OpenAI an application programming interface and integrated ChatGPT into the system, much like Apple's arrangement with Google Search. OpenAI gets no higher privileges than that.

Throughout, users retain the right to actively choose whether to use OpenAI's services, and once data is handed over to OpenAI, Apple is no longer responsible for its security.

Moreover, OpenAI will not be the only partner. According to foreign media reports, Apple is also discussing large-model cooperation with Google, so OpenAI should be just one partner within the Apple Intelligence ecosystem.

Figure: Apple's official diagram of ChatGPT embedded in the iPhone

Progress Review: How Has Apple Intelligence Progressed from June to August?

With this foundation in place, users can start to experience the product's features.

Let's first review the specific Apple Intelligence features announced at WWDC24 in June, which fall into the following categories:

Writing Tools: help you proofread, rewrite content in a chosen style, summarize text, and more

Image generation (Image Playground): creates fun, playful images from prompts

Genmoji: generates fun, personalized emoji

A more capable Siri: more natural, more personalized interaction, integrated more deeply with the system

This past August, developers in North America began using eligible Apple hardware to experience some Apple Intelligence features.

The available features are divided more finely than those announced at WWDC24, but overall they still cover text assistance (writing, summarizing, email replies, and so on), image features (photo processing, Genmoji), and call recording and organization. Below is an incomplete feature list compiled by Tencent Technology from public information:

We can see that the text-processing features have progressed fastest.

In multimodal processing, only the on-device features, such as photo search and call recording, appear to have launched, while image generation has not. Multimodal features that require cloud capabilities do not seem to be ready yet.

Simple features woven into the system, such as focus modes and prioritized notifications, should be processed purely on-device and have already shipped, as have cosmetic touches like Siri's new UI effects.

However, the much-anticipated Siri upgrade, deeply integrated with the system and able to move between apps on voice commands, and the integration of OpenAI's GPT as a third-party large model, have not yet launched. According to Apple's official website, support for other (non-English) languages and further platform capabilities will be completed over the next year.

Judging from the progress bar, Apple has delivered only about 30% of what it promised at WWDC24.

Moreover, users who want to try it face strict prerequisites: the device must be an iPhone 15 Pro or iPhone 15 Pro Max or a later model, or an iPad or Mac with an M1 chip or later, and both Siri and the device language must be set to English (United States).

Caption: Apple's official website lists the hardware and system requirements for experiencing Apple Intelligence

What Are You Most Looking Forward to at This Year's Fall Launch Event?

The prototype of Apple Intelligence has taken shape, but it will be a long time before all Apple users can use it.

If your device meets Apple's hardware, language, and region requirements, you will have your first encounter with Apple Intelligence this fall. Of course, the new features may not go live in September; according to foreign media reports, they should arrive in October.

Beyond Apple Intelligence, this fall's launch event also gives us the A18 chip in the iPhone 16 to look forward to.

According to the information leaked so far, Apple will use TSMC's N3E process for the A18 chip, the same process as the M4. Compared with the N3B process used in last year's A17 Pro, N3E offers a bigger advantage in energy efficiency.

What is the N3E process?

Think of it as an internal upgrade of the phone chip: although the size of the houses (basic cells) has not changed, the roads (circuits) and facilities (transistors) have been redesigned so the whole district (chip) runs more efficiently, like smoothing city traffic while cutting residents' electricity bills. A chip built on this process can make the phone feel faster, last longer on a charge, and juggle multiple tasks more smoothly.

It is rumored that the NPU in the A18 SoC will be greatly upgraded, with overall compute exceeding the M4's 38 TOPS.

This would put the NPU compute of the upcoming iPhone 16 on par with Apple's best desktop silicon. And to clear Apple Intelligence's memory threshold, the iPhone 16 will for the first time raise its RAM to 8 GB.

From memory to power consumption to compute, everything seems designed for Apple Intelligence.

The A17 Pro that Apple launched last year delivers 35 TOPS of NPU compute; the A18 will only go higher.

Clearly, from this September onward, every hardware product Apple launches will actively embrace Apple Intelligence.

And not just the iPhone, Mac, and iPad: in the future, even the Apple Watch, HomePod, and Vision Pro may become part of Apple's AI strategy.

At WWDC24, Apple senior vice president Craig Federighi demonstrated a fast, highly relevant Apple AI scenario: upon receiving a last-minute notification that a meeting time had changed, he asked Siri whether he could still make his child's originally scheduled activity.

As he kept talking to Siri, the phone glided between Mail, Calendar, Maps, and other apps. In the end, without lifting a finger, he got a sensible recommendation from the phone.

This may be the ideal state of future edge AI devices: one command, and the device automatically summons the right app and completes the task.

Notice that in this demo, every app Siri called was an official Apple app; it mobilized "family members" to cooperate on a single task. In that case, the system, architecture, and interfaces pose no problem, and there is no question of how to share the spoils.

To exaggerate a little, perhaps the future Siri will need no wake word at all: always listening in the background, even able to interject while people are talking, just like a real friend.

Can Apple Intelligence Remain the Ecosystem King?

Apple has demonstrated the ideal future AI-phone interaction along the path of least resistance. But when an app is not Apple's own, can Siri still freely mobilize the data inside it?

Apple Intelligence's official introduction page contains this sentence: "if you use standard UI frameworks, APIs, and development kits, you can easily give your developed apps these AI features."

Developers could even see in WWDC24 workshops that they can integrate AI features into their own apps with just three or four lines of code, as in the sketch below.
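
Here is a minimal sketch of what that integration looks like using Apple's App Intents framework, on which Siri's new cross-app abilities are built. The intent name, parameter, and dialog are hypothetical; this shows the general shape of "a few lines of code" rather than Apple's exact Apple Intelligence API.

```swift
import AppIntents

// Hypothetical intent exposing one app action to the system. Once declared,
// the system (and by extension Siri) can discover and invoke it without the
// app shipping its own AI stack.
struct SummarizeNoteIntent: AppIntent {
    static var title: LocalizedStringResource = "Summarize Note"

    @Parameter(title: "Note Title")
    var noteTitle: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // In a real app, look up the note and produce a summary here.
        return .result(dialog: "Here is a summary of \(noteTitle).")
    }
}
```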

There are two messages here: third-party applications are welcome to join the AI party, and Apple will prepare all the kits and tools so developers can adopt Apple Intelligence in the easiest possible way. That is indeed a tasty "appetizer."

But to access Apple Intelligence, apps must hand over their "data" and become minor members of the Apple ecosystem. Is it really that simple?

For Apple, the technical breakthroughs described at the start of this article may actually be the easiest problems; the ecosystem is the mountain standing in Apple's way.

If phone interaction really works the way Craig demonstrated, Apple becomes the "king" who controls the only entrance, and Siri becomes the king's sole "power broker," deciding which apps the king presents to users, or whether he meets only with "family members."

In that case, Apple would become the king standing above every kind of super app.

Who decides how the benefits are distributed? Will the business models that super apps built be wiped out overnight by AI? These questions still await our thinking.

Final Thoughts

Siri carries Jobs' romantic dream. Dag Kittlaus, one of Siri's founders, recalled of his contact with Jobs that the two "chatted for three hours" at Jobs' home. Jobs was full of longing for the future of artificial intelligence, and he convinced Siri's founding team that "Siri will eventually leave its mark on the universe."

Caption: Dag Kittlaus, one of the founders of Siri

However, the day after Siri was released, Jobs passed away, leaving behind this AI dream and a Siri that would "wander" for many years.

Today, Cook has put Siri back in the spotlight with Apple Intelligence, but the dream may no longer be the same.

Today's Apple is thoroughly pragmatic: maintain the technological lead, defend effectively, keep the stock price under no pressure, and wait until the super apps of the AI era take shape before deciding whether to act.

Yet while everyone is anticipating AI phones and AI PCs, some more surprising AI-native hardware may yet appear and completely rewrite the story.

References:

Introducing Apple’s On-Device and Server Foundation Models - Apple Machine Learning Research

Apple Intelligence - Apple Developer

Introducing Apple Intelligence for iPhone, iPad, and Mac - Apple

Private Cloud Compute: A new frontier for AI privacy in the cloud - Apple Security Research

https://mrmad.com.tw/ios-18-new-function