
NVIDIA launches new visual language model NVEagle, which can chat by looking at pictures

2024-09-02


Pinwan reported on September 2 that, according to official news from NVIDIA, the company, together with research teams from Georgia Tech, UMD, and HKPU, has launched a new visual language model, NVEagle.

NVEagle is reported to understand complex real-world scenes and to interpret and respond more accurately based on visual input. The core of its design is to convert images into visual tokens and then combine them with text embeddings, improving the model's grasp of visual information. NVEagle comes in three versions: Eagle-X5-7B, Eagle-X5-13B, and Eagle-X5-13B-Chat. The 7B and 13B versions target general visual language tasks, while the 13B-Chat version is fine-tuned specifically for conversational AI and interacts more effectively based on visual input.
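To make the idea concrete, the sketch below shows, in generic PyTorch, how an image can be turned into a sequence of visual tokens and concatenated with text embeddings before being fed to a language model. All module names and dimensions here are illustrative assumptions, not NVEagle's actual code.

```python
import torch
import torch.nn as nn

class VisualTokenizer(nn.Module):
    """Illustrative stand-in: encode image patches, project into the LLM space."""
    def __init__(self, patch_dim=768, llm_dim=4096):
        super().__init__()
        # stand-in for a pretrained vision encoder producing patch features
        self.vision_encoder = nn.Linear(3 * 14 * 14, patch_dim)
        # projection that maps patch features into the language model's embedding space
        self.projector = nn.Linear(patch_dim, llm_dim)

    def forward(self, image_patches):
        # image_patches: (batch, num_patches, 3*14*14) flattened pixel patches
        feats = self.vision_encoder(image_patches)   # (B, N, patch_dim)
        visual_tokens = self.projector(feats)        # (B, N, llm_dim)
        return visual_tokens

def build_multimodal_input(visual_tokens, text_embeddings):
    # Prepend visual tokens to the text embeddings so the language model
    # attends to both modalities in one sequence.
    return torch.cat([visual_tokens, text_embeddings], dim=1)

if __name__ == "__main__":
    tokenizer = VisualTokenizer()
    patches = torch.randn(1, 256, 3 * 14 * 14)       # one image as 256 patches
    vis = tokenizer(patches)                          # (1, 256, 4096)
    txt = torch.randn(1, 16, 4096)                    # embeddings of a 16-token prompt
    seq = build_multimodal_input(vis, txt)            # (1, 272, 4096)
    print(seq.shape)
```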

One of the highlights of NVEagle is its mixture-of-experts (MoE) mechanism, which dynamically selects the most suitable visual encoder for each task and greatly improves the model's ability to process complex visual information. The model has been released on Hugging Face for easy use by researchers and developers.
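The following is a minimal sketch of what a mixture of visual-encoder experts with a learned gate can look like, in the spirit of the mechanism described above. The expert modules, gating network, and sizes are assumptions made for illustration and do not reflect NVEagle's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualEncoderMoE(nn.Module):
    """Illustrative mixture of visual-encoder 'experts' weighted by a gating network."""
    def __init__(self, in_dim=588, out_dim=1024, num_experts=5):
        super().__init__()
        # each expert stands in for a different pretrained vision encoder
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, out_dim), nn.GELU())
            for _ in range(num_experts)
        )
        # gating network scores each expert from a pooled view of the input
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, patches):
        # patches: (batch, num_patches, in_dim)
        pooled = patches.mean(dim=1)                       # (B, in_dim)
        weights = F.softmax(self.gate(pooled), dim=-1)     # (B, num_experts)
        expert_outs = torch.stack(
            [expert(patches) for expert in self.experts], dim=1
        )                                                  # (B, E, N, out_dim)
        # weighted sum over experts: the gate decides which encoders dominate
        return torch.einsum("be,bend->bnd", weights, expert_outs)

if __name__ == "__main__":
    moe = VisualEncoderMoE()
    x = torch.randn(2, 256, 588)
    y = moe(x)
    print(y.shape)  # torch.Size([2, 256, 1024])
```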