
OpenAI upgrades Whisper speech transcription AI model, making it 8 times faster without sacrificing quality

2024-10-03


IT Home news, October 3: At its DevDay event on October 1, OpenAI announced the Whisper large-v3-turbo speech transcription model. It has 809 million parameters and runs 8 times faster than large-v3 with almost no loss in quality.

The Whisper large-v3-turbo speech transcription model is an optimized version of large-v3 with only 4 decoder layers, compared with 32 in large-v3.

The Whisper large-v3-turbo speech transcription model has 809 million parameters, making it slightly larger than the 769-million-parameter medium model but much smaller than the 1.55-billion-parameter large model.
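
To put those sizes side by side, the short Python sketch below computes the ratios from the parameter counts quoted in this article. The dictionary keys and the `ratio` helper are illustrative names, not part of any Whisper API.

```python
# Parameter counts as quoted in the article, in millions.
PARAMS_M = {"medium": 769, "large-v3-turbo": 809, "large": 1550}


def ratio(a: str, b: str) -> float:
    """Return how many times larger model `a` is than model `b`."""
    return PARAMS_M[a] / PARAMS_M[b]


turbo_vs_medium = ratio("large-v3-turbo", "medium")
turbo_vs_large = ratio("large-v3-turbo", "large")

print(f"turbo / medium: {turbo_vs_medium:.2f}")  # about 1.05
print(f"turbo / large:  {turbo_vs_large:.2f}")   # about 0.52
```

By these figures, the turbo model is about 5% larger than medium and roughly half the size of large.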

OpenAI says Whisper large-v3-turbo is 8 times faster than the large model and requires 6 GB of VRAM, compared with 10 GB for the large model.

The Whisper large-v3-turbo speech transcription model is 1.6 GB in size, and OpenAI continues to release Whisper (including code and model weights) under the MIT license.

IT Home cited test results from Awni Hannun showing that, on an M2 Ultra, 12 minutes of audio was transcribed in 14 seconds.
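
As a rough back-of-the-envelope check, that result can be expressed as a speed relative to real time. The Python sketch below uses only the two figures quoted above; the variable names are illustrative, and the result describes this single reported test, not a general benchmark.

```python
# Figures from the reported M2 Ultra test.
audio_seconds = 12 * 60   # 12 minutes of audio
wall_seconds = 14         # time taken to transcribe it

# Seconds of audio processed per second of wall-clock time.
speedup = audio_seconds / wall_seconds

print(f"about {speedup:.1f}x faster than real time")  # about 51.4x
```

In other words, each second of compute covered a little over 51 seconds of audio in that test.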