
AMD releases its first AI small language model: 670 billion training tokens, up to 3.88x speedup from speculative decoding

2024-10-01


Kuai Technology reported on October 1 that AMD has released its first small language model (SLM), named "AMD-135M".

Compared with ever-larger large language models (LLMs), it is smaller, more flexible, and more targeted, making it well suited for deployment by private and specialized enterprises.

The AMD-135M small model belongs to the Llama family and comes in two versions:

The first is the base model "AMD-Llama-135M", trained on 670 billion tokens over six days on eight Instinct MI250 64GB accelerators.

The second is the extended "AMD-Llama-135M-code", additionally trained on 20 billion tokens of programming-focused data on the same hardware for four days.

Creation and deployment process

It uses a method called "speculative decoding": a smaller draft model generates multiple candidate tokens in a single forward pass, which are then sent to a larger, more accurate target model for verification or correction.

This approach can generate multiple tokens at once without affecting output quality and can also reduce memory usage. However, because it involves more data transfers, power consumption will also increase.
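To make the idea concrete, here is a minimal sketch of a greedy speculative-decoding loop as described above. It is an illustration only, not AMD's implementation: `draft_model`, `target_model`, and the block size `k` are assumed names, and both models are assumed to return next-token logits over a shared vocabulary.

```python
# Minimal greedy speculative-decoding sketch (illustrative, not AMD's code).
# draft_model / target_model: callables mapping a [batch, seq] token tensor
# to [batch, seq, vocab] logits; k: number of draft tokens proposed per step.
import torch

def speculative_decode(draft_model, target_model, prompt_ids, k=4, max_new_tokens=64):
    tokens = list(prompt_ids)
    while len(tokens) - len(prompt_ids) < max_new_tokens:
        # 1. The small draft model cheaply proposes k candidate tokens.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            logits = draft_model(torch.tensor([ctx]))[0, -1]
            nxt = int(torch.argmax(logits))
            draft.append(nxt)
            ctx.append(nxt)

        # 2. The large target model scores the whole proposed block in a
        #    single forward pass and keeps the longest matching prefix.
        logits = target_model(torch.tensor([tokens + draft]))[0]
        accepted, correction = [], None
        for i, d in enumerate(draft):
            pred = int(torch.argmax(logits[len(tokens) - 1 + i]))
            if pred == d:
                accepted.append(d)
            else:
                # First mismatch: keep the target model's own token instead.
                correction = pred
                break

        tokens.extend(accepted)
        if correction is not None:
            tokens.append(correction)
    return tokens
```

Because the target model always verifies or overrides the draft's proposals, the output matches what the target model alone would have produced with greedy decoding; the speedup comes from accepting several draft tokens per target forward pass.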

AMD also used AMD-Llama-135M-code as a draft model for CodeLlama-7B to test performance with and without speculative decoding.

For example, on the MI250 accelerator, performance improved by up to about 2.8 times; on a Ryzen AI CPU, by up to about 3.88 times; and on a Ryzen AI NPU, by up to about 2.98 times.
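A draft/target pairing like this can be reproduced with Hugging Face Transformers' assisted generation, which implements the same speculative idea. The sketch below is a hedged example: the model IDs `amd/AMD-Llama-135m-code` and `codellama/CodeLlama-7b-hf` are assumptions not confirmed by the article, and assisted generation requires the two models to share a tokenizer/vocabulary.

```python
# Illustrative sketch: small draft model + larger target model via
# Hugging Face Transformers assisted generation (speculative decoding).
# The model IDs below are assumptions and may differ from AMD's releases.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "codellama/CodeLlama-7b-hf"   # large, accurate target model
draft_id = "amd/AMD-Llama-135m-code"      # small, fast draft model

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

inputs = tokenizer("def quicksort(arr):", return_tensors="pt")

# assistant_model enables assisted generation: the draft proposes tokens
# and the target verifies them, so the output matches target-only decoding.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```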

Speculative decoding

The training code, datasets, and other resources for the AMD-135M small model have been open-sourced under the Apache 2.0 license.

According to AMD, its performance is roughly on par with, or slightly ahead of, other open-source small models: on tasks such as HellaSwag, SciQ, and ARC-Easy it exceeds Llama-68M and Llama-160M, while on HellaSwag, WinoGrande, SciQ, MMLU, ARC-Easy, and other tasks it is roughly comparable to GPT-2-124M and OPT-125M.