Meta Releases New AI Models for Text, Image, and Music Generation

Meta’s FAIR team releases Chameleon model for processing text and images together.
New multi-token prediction method improves LLM efficiency.
JASCO model allows controlled AI music generation with text and other inputs.
AudioSeal detects AI-generated speech faster and more accurately.
Meta introduces tools to enhance diversity in text-to-image generation.

Meta’s Fundamental AI Research (FAIR) team has spent over a decade conducting open AI research. As the technology advances rapidly, collaboration with the global AI community becomes ever more vital to Meta.

Today, Meta is sharing some of its latest FAIR research models with the global community. By publishing this work openly, the company aims to encourage further iteration on the research and to advance AI responsibly.

Meta’s Chameleon Model Can Process and Generate Both Text and Images

Meta is releasing key components of the Chameleon models under an academic research license. Chameleon is a family of mixed-modal models that can understand and generate both images and text. Just as humans can process words and images simultaneously, Chameleon can process and deliver both image and text at the same time. While most large language models produce unimodal results (turning text into images, for instance), Chameleon can take any combination of text and images as input and produce any combination as output. That opens up uses such as writing creative captions for images or combining text prompts and images to create entirely new scenes.

Multi-Token Prediction Helps AI Models Predict Words Accurately

Trained on large volumes of text, large language models (LLMs) have already proved to be valuable tools for helping people generate creative text, brainstorm ideas and answer questions. LLMs have a single training objective: predicting the next word. That approach is simple but inefficient, since children need far less text to reach language fluency.

Meta recently proposed a method for building better and faster LLMs: multi-token prediction. With this approach, language models are trained to predict multiple future words at once instead of one token at a time. In keeping with responsible open science principles, Meta is making the pretrained models for code completion available under a noncommercial research license.
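To make the difference between the two training objectives concrete, here is a minimal Python sketch of how the training targets change. It illustrates only the idea of predicting several future tokens per position; the function name `multi_token_examples` and the parameter `n_future` are hypothetical, and the sketch says nothing about Meta's actual model architecture or loss, which the article does not describe.

```python
from typing import List, Tuple


def multi_token_examples(
    tokens: List[str], n_future: int
) -> List[Tuple[List[str], List[str]]]:
    """Build (context, targets) pairs for a token sequence.

    With n_future == 1 this is ordinary next-token prediction.
    With n_future > 1 each context is paired with the next
    n_future tokens, so one position supervises several predictions.
    (Hypothetical helper for illustration only.)
    """
    if n_future < 1:
        raise ValueError("n_future must be at least 1")
    examples = []
    # Stop early enough that every context has n_future targets left.
    for i in range(1, len(tokens) - n_future + 1):
        context = tokens[:i]
        targets = tokens[i:i + n_future]
        examples.append((context, targets))
    return examples


sentence = ["the", "cat", "sat", "on", "the", "mat"]

# Next-token prediction: one target word per context.
for context, targets in multi_token_examples(sentence, n_future=1):
    print(context, "->", targets)

# Multi-token prediction: three target words per context.
for context, targets in multi_token_examples(sentence, n_future=3):
    print(context, "->", targets)
```

In the second loop, every context carries three supervision signals instead of one, which is the intuition behind getting more out of the same text.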

JASCO Offers More Control Over AI Music Generation

Generative AI has enabled people to explore their creativity in new ways, such as turning a text prompt into a piece of music. While existing text-to-music models such as MusicGen rely mainly on text input, Meta’s new model, JASCO, can also accept other inputs, such as chords or beats, to give more control over the generated music.

This allows both symbolic and audio-based conditions to be incorporated in the same text-to-music generation model.

Results indicate that JASCO is comparable to the evaluated baselines in generation quality while offering more versatile control over the generated music.

AudioSeal Helps Spot AI-Generated Speech

Meta has also introduced AudioSeal, which it describes as the first audio watermarking technique designed specifically for the localized detection of AI-generated speech. AudioSeal makes it possible to pinpoint individual AI-generated segments within a longer audio snippet.
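The idea of localized detection can be sketched in a few lines of Python. The example assumes a detector has already produced one watermark score per audio frame; those scores, the threshold and the function name `flagged_segments` are all hypothetical. This is not AudioSeal's API or algorithm, only an illustration of turning per-frame scores into flagged segments.

```python
from typing import List, Tuple


def flagged_segments(
    scores: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Group consecutive frames whose score meets the threshold.

    Returns (start, end) frame indices, with end exclusive.
    The scores are assumed to come from some per-frame detector;
    this helper is a hypothetical illustration.
    """
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a flagged run begins here
        elif score < threshold and start is not None:
            segments.append((start, i))  # the run ended at the previous frame
            start = None
    if start is not None:
        # The clip ended while a flagged run was still open.
        segments.append((start, len(scores)))
    return segments


# Made-up per-frame scores for a short clip.
frame_scores = [0.1, 0.2, 0.9, 0.8, 0.95, 0.1, 0.7, 0.9]
print(flagged_segments(frame_scores))
```

Working at the frame level is what lets a detector mark only the suspect stretch of a recording rather than labeling the whole file.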

AudioSeal’s localized detection approach sets it apart from conventional methods, allowing faster and more efficient detection. Detection is up to 485 times faster than with previous methods, making AudioSeal suitable for large-scale and real-time applications.

AudioSeal is being released under a commercial license. It is one of several lines of research Meta is pursuing to help prevent the misuse of generative AI tools.

Enhancing Diversity in Text-To-Image Generation Systems

It is important that text-to-image models work well for everyone and reflect the geographic and cultural diversity of the world. To that end, Meta has developed automatic indicators that measure potential geographical disparities in text-to-image models.

Meta also conducted an annotation study to understand how perceptions of geographic representation vary among people from different regions. The study collected more than 65,000 annotations and more than 20 survey responses per example, covering appeal, similarity, consistency and shared recommendations for improving automatic and human evaluation of text-to-image models. The goal is more diversity and better representation in AI-generated images.

Today, Meta is releasing its geographic disparities evaluation code and annotations, which it hopes will help the community improve diversity across generative models.


Global Brands Magazine
