equipping robots with "worm brains"? non-transformer liquid neural network!

2024-10-01


machine heart report

machine heart editorial department

a new architecture inspired by c. elegans: all three model sizes achieve sota performance and can be deployed in highly resource-constrained environments. mobile robots may need the brain of a bug.

in the era of large models, the transformer proposed in google's 2017 seminal paper "attention is all you need" has become a mainstream architecture.

however, liquid ai, a startup recently co-founded by former researchers from mit's computer science and artificial intelligence laboratory (csail), has taken a different route.

liquid ai says their goal is to "explore ways to build models beyond the base generative pre-trained transformer (gpt)."

to achieve this goal, liquid ai launched its first multimodal ai models: liquid foundation models (lfms). these are a new generation of generative ai models built from first principles, with the 1b, 3b and 40b lfms achieving sota performance at comparable scales while maintaining a smaller memory footprint and more efficient inference.

liquid ai post-training director maxime labonne said on x that lfm is the release he is most proud of in his career. the core advantage of lfms is that they can outperform transformer-based models while taking up less memory.

some people say that lfm is the terminator of transformer.

some netizens praised lfm as a game changer.

some netizens believe that "it may be time to abandon transformers. this new architecture looks very promising."

liquid ai releases three models

the lfm range is available in three different sizes and variants:

  • dense lfm-1.3b (the smallest), ideal for highly resource-constrained environments.

  • dense lfm-3b, optimized for edge deployment.

  • lfm-40.3b moe (the largest), a mistral-style mixture-of-experts model, designed to handle more complex tasks.

sota performance

in a comparison with models of equivalent scale, lfm-1b achieved top scores on every benchmark, making it the most advanced model at this scale. this is the first time a non-gpt architecture has significantly outperformed transformer-based models. for example, lfm-1.3b outperformed meta's llama-3.2-1.2b and microsoft's phi-1.5 on third-party benchmarks.

lfm-3b delivers impressive performance, ranking first in comparisons against 3b-scale transformer, hybrid, and rnn models. it is also comparable to phi-3.5-mini on multiple benchmarks while being 18.4% smaller, which makes lfm-3b well suited for mobile and other edge text applications.

lfm-40b achieves a new balance between model size and output quality. it can activate 12b parameters at runtime, with performance comparable to larger models, while the moe architecture enables higher throughput and can be deployed on more cost-effective hardware.

memory efficiency

lfms take up less memory than transformer architectures. this is especially true for long inputs, since the kv cache in transformer-based llms grows linearly with sequence length. by efficiently compressing the input, lfms can process longer sequences on the same hardware. lfm-3b occupies the least memory among 3b-class models: for example, it requires only 16 gb of memory, while meta's llama-3.2-3b requires more than 48 gb.
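to make the linear growth concrete, here is a minimal back-of-the-envelope sketch in python. the layer count, kv-head count, and head dimension are illustrative assumptions for a generic 3b-class transformer, not the published configuration of lfm-3b, llama-3.2-3b, or any other model named above; the point is only that per-token kv-cache memory is constant, so total cache memory grows linearly with context length, while a model that compresses its input into a fixed-size state does not grow this way.

```python
# back-of-the-envelope kv-cache estimate for a generic 3b-class transformer.
# all architecture numbers are illustrative assumptions, not the published
# configuration of lfm-3b, llama-3.2-3b, or any other model.

def kv_cache_bytes(seq_len: int,
                   n_layers: int = 28,       # assumed layer count
                   n_kv_heads: int = 8,      # assumed kv heads (grouped-query attention)
                   head_dim: int = 128,      # assumed per-head dimension
                   bytes_per_value: int = 2  # fp16/bf16
                   ) -> int:
    """memory for cached keys + values across all layers, for one sequence."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len

for ctx in (4_096, 32_768, 1_000_000):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"context {ctx:>9,} tokens -> kv cache ≈ {gib:7.2f} GiB")

# a model that keeps a fixed-size recurrent/compressed state instead of a
# growing kv cache spends the same activation memory regardless of context.
```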

lfm really takes advantage of context length

liquid's announcement includes a table comparing the performance of several models at different context lengths.

this efficient context window enables long-context tasks on edge devices for the first time. for developers, it unlocks new applications, including document analysis and summarization, more meaningful interactions with context-aware chatbots, and improved retrieval-augmented generation (rag) performance.

these models are competitive not only on raw performance benchmarks but also in operational efficiency, making them well suited to a variety of use cases, from enterprise-grade applications in financial services, biotech, and consumer electronics to deployment on edge devices.

users can try the models through platforms such as lambda chat and perplexity ai.

how liquid goes beyond the generative pre-trained transformer (gpt)

liquid uses a hybrid of computational units deeply rooted in dynamical systems theory, signal processing, and numerical linear algebra. the result is a family of general-purpose ai models that can model any type of sequential data, including video, audio, text, time series, and signals, which liquid used to build and train its new lfms.

as early as last year, liquid ai was using a method called liquid neural networks (lnn). unlike traditional deep learning models, which require thousands of neurons to perform complex tasks, lnns show that far fewer neurons, combined with innovative mathematical formulations, can achieve the same results.

liquid ai's new models retain the core benefit of this adaptability, allowing for real-time adjustments during inference without the computational overhead of traditional models, and they can efficiently handle up to 1 million tokens while minimizing memory usage.

for example, in terms of inference memory footprint, the lfm-3b model outperforms popular models such as google's gemma-2, microsoft's phi-3, and meta's llama-3.2, especially as the token count grows.

while other models experience a dramatic increase in memory usage when processing long contexts, lfm-3b takes up much less space, making it ideal for applications that require heavy sequential data processing, such as document analysis or chatbots.

liquid ai has built its foundational model as a universal model across multiple data modalities, including audio, video, and text.

with this multi-modal capability, liquid aims to solve a variety of industry-specific challenges ranging from financial services to biotech and consumer electronics.

liquid ai is optimizing its models for products from multiple hardware manufacturers, including nvidia, amd, apple, qualcomm and cerebras.

liquid ai is inviting early users and developers to test their new models and provide feedback. while the model isn't perfect yet, the company plans to use the feedback to improve the product. they will hold an official launch event on october 23, 2024 at mit.

in an effort to maintain transparency and advance science, the company plans to publish a series of technical blog posts in advance of the launch. they also encourage users to conduct red team testing to explore the limits of the model to help improve future versions.

the lfms introduced by liquid ai combine high performance with efficient memory usage, providing a powerful alternative to traditional transformer-based models. this positions liquid ai to become an important player in the foundation model space.

liquid ai: starting with a tiny bug

this startup, which openly competes with openai and other large language model companies, was incubated by mit's computer science and artificial intelligence laboratory (csail) and was founded in march 2023.

in december 2023, the company raised us$37.5 million in seed financing at a valuation of us$300 million.

investors include github co-founder tom preston-werner, shopify co-founder tobias lütke, and red hat co-founder bob young, among others.

daniela rus, director of mit csail, is one of the company's founders. this famous roboticist and computer scientist is also the first female director of the laboratory.

in addition to daniela rus, the other three co-founders of liquid ai were all postdoctoral researchers at mit csail.

co-founder and ceo ramin hasani was the chief artificial intelligence scientist at vanguard, one of the largest fund management companies in the united states, before engaging in postdoctoral research at mit csail.

co-founder and cto mathias lechner had already studied the neural structure of nematodes with hasani when the two were students at the technical university of vienna.

co-founder and chief scientific officer alexander amini was a phd student of daniela rus.

the four founders (from left to right): ceo ramin hasani, daniela rus, chief scientific officer alexander amini, and cto mathias lechner

in 2017, daniela rus recruited hasani and lechner to mit csail, where rus and her doctoral student amini also joined the research on liquid neural networks.

daniela rus pointed out that generative ai has clear limitations in safety, interpretability, and computing power, making it difficult to apply to robotics problems, especially mobile robots.

inspired by the neural structure of the nematode caenorhabditis elegans, a "frequent guest" of scientific research, daniela rus and postdoctoral researchers in her laboratory developed a new type of flexible neural network, known as a liquid neural network.

caenorhabditis elegans is also the only organism whose connectome had been fully mapped as of 2019. although its brain is simple, it is far better at learning and adapting to its environment than any current artificial intelligence system.

caenorhabditis elegans is only about 1 mm long and has just 302 neurons and 96 muscles, yet it is capable of complex intelligent behaviors such as sensing, escaping, foraging, and mating.

it is the simplest living intelligent agent and the smallest vehicle for approaching general artificial intelligence by simulating biological neural mechanisms.

in recent years, researchers have also used findings about the c. elegans nervous system to build computational biological simulations. by studying how the c. elegans brain works, daniela rus and her colleagues designed liquid time-constant networks:

a continuous-time model consisting of multiple simple dynamical systems that regulate each other through nonlinear gates.
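the liquid time-constant (ltc) paper (arxiv.org/abs/2006.04439, listed in the references below) expresses this idea roughly as the following ode, in which the same gate f appears both in the leakage term and in the drive toward a bias state:

$$
\frac{d\mathbf{x}(t)}{dt} = -\left[\frac{1}{\tau} + f\big(\mathbf{x}(t), \mathbf{I}(t), t, \theta\big)\right] \odot \mathbf{x}(t) + f\big(\mathbf{x}(t), \mathbf{I}(t), t, \theta\big) \odot A
$$

here x(t) is the hidden state, I(t) the input, f a bounded nonlinearity with parameters θ, τ a base time constant, and A a bias vector. because f sits inside the decay term, each neuron's effective time constant varies with its state and input rather than being fixed, which is where the "liquid" name comes from.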

if a standard neural network is like a series of evenly spaced dams, each fitted with many valves (the weights), then the flow of computation must pass through the valves of one dam before rushing on to the next layer.

liquid neural networks, by contrast, need no dams, because each neuron is governed by an ordinary differential equation (ode).

this type of network is characterized by variable time constants, and its output is obtained by solving differential equations. research shows that it outperforms traditional models in stability, expressiveness, and time-series prediction.
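to illustrate "variable time constants" and "output obtained by solving differential equations", here is a minimal numpy sketch of a single ltc-style hidden layer, advanced with a fused semi-implicit euler step in the spirit of the solver described in the ltc paper. the layer sizes, weights, and toy input are invented for illustration; this is not liquid ai's implementation.

```python
import numpy as np

# minimal sketch of a liquid time-constant (ltc) style layer, advanced with a
# fused semi-implicit euler step for dx/dt = -(1/tau + f) * x + f * A.
# sizes, weights, and the toy input are invented; this is not liquid ai's code.

rng = np.random.default_rng(0)
hidden, inputs = 8, 3                              # toy dimensions (assumed)
W = rng.normal(scale=0.5, size=(hidden, hidden))   # recurrent weights
U = rng.normal(scale=0.5, size=(hidden, inputs))   # input weights
b = np.zeros(hidden)                               # bias
A = rng.normal(size=hidden)                        # per-neuron target/bias state
tau, dt = 1.0, 0.1                                 # base time constant, step size

def step(x, I):
    """advance the hidden state x by one step dt given input I."""
    f = np.tanh(W @ x + U @ I + b)                 # bounded nonlinearity (the "gate")
    # fused update: x <- (x + dt * f * A) / (1 + dt * (1/tau + f))
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))

x = np.zeros(hidden)
for t in range(50):                                # drive the cell with a toy signal
    I = np.array([np.sin(0.2 * t), np.cos(0.2 * t), 1.0])
    x = step(x, I)
print("final hidden state:", np.round(x, 3))
```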

later, daniela rus and her collaborators proposed an approximation, closed-form continuous-time neural networks, that uses closed-form solutions to efficiently simulate the interaction between neurons and synapses. this not only greatly improves the model's computational speed but also scales better, and it performs well in time-series modeling, outperforming many advanced recurrent neural network models.
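the closed-form paper (arxiv.org/abs/2106.13898, listed in the references below) replaces the numerical ode solver with a gated, time-dependent blend of two branches. the sketch below shows the general shape of that gating; the tiny linear heads f, g, h and their sizes are illustrative stand-ins, not the published architecture.

```python
import numpy as np

# sketch of the closed-form continuous-time (cfc) style update: instead of
# solving an ode step by step, the new state is a time-gated blend of two
# branches. the small linear heads are illustrative, not liquid ai's code.

rng = np.random.default_rng(1)
hidden, inputs = 8, 3
Wf, Wg, Wh = (rng.normal(scale=0.5, size=(hidden, hidden + inputs)) for _ in range(3))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cfc_step(x, I, t_gap):
    """closed-form update: x = sigma(-f * t) * g + (1 - sigma(-f * t)) * h."""
    z = np.concatenate([x, I])
    f = np.tanh(Wf @ z)                 # sets how quickly the gate shifts over time
    g = np.tanh(Wg @ z)                 # "early time" branch
    h = np.tanh(Wh @ z)                 # "late time" branch
    gate = sigmoid(-f * t_gap)
    return gate * g + (1.0 - gate) * h  # no ode solver in the loop

x = np.zeros(hidden)
for t in range(50):
    I = np.array([np.sin(0.2 * t), np.cos(0.2 * t), 1.0])
    x = cfc_step(x, I, t_gap=0.1)       # constant time gap between samples
print("final hidden state:", np.round(x, 3))
```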

liquid ai team members have claimed that the architecture is suitable for analyzing any phenomenon that fluctuates over time, including video processing, autonomous driving, brain and heart monitoring, financial trading (stock quotes), and weather forecasts.

in addition to being flexible like a liquid, another characteristic of liquid neural networks is that they are much smaller in scale than generative ai models that often have billions of parameters.

for example, lfm-1.3b, which can be deployed in highly resource-constrained environments, has only 1.3b parameters (comparable to the largest gpt-2 at 1.5b), maintains a smaller memory footprint and more efficient inference, and can run on a variety of robot hardware platforms.

liquid neural networks also have the advantage of interpretability, thanks to their small size and simple architecture.

however, it remains to be seen how the new architecture will compete with mainstream models from competitors such as openai.

hasani has said that liquid ai currently has no plans to develop consumer applications like chatgpt. the company is focusing first on corporate clients looking to apply its models to financial and medical research.

reference links:

https://venturebeat.com/ai/the-tireless-teammate-how-agentic-ai-is-reshaping-development-teams/

https://arxiv.org/abs/2106.13898

https://arxiv.org/abs/2006.04439

https://www.jiqizhixin.com/articles/2023-12-12?from=synced&keyword=Liquid%20AI