Meta’s Llama 3.1 is open-source, kind of. Here’s how it could reshape the AI race

Meta today released a trio of new open-source large language models called Llama 3.1, the largest of which may lead to new chatbots that rival ChatGPT. In fact, Meta CEO Mark Zuckerberg believes the company’s Llama-powered AI assistant will be more widely used than ChatGPT by the end of this year. 

Llama 3.1 is actually a small family of models: Llama 3.1 405B, 70B, and 8B. (The numbers denote each model’s parameter count, in billions; parameters are the learned weights applied at the neuron-like connection points where the model’s calculations are made.) The 405B model was trained on a massive amount of data: 15 trillion tokens, each representing a word or word fragment. That training data extends into 2024, whereas earlier models were limited in their recency by cutoff dates sometimes years in the past.
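To make “tokens” concrete: a tokenizer splits text into those word-or-fragment units before the model ever sees it. Here is a minimal sketch using the Hugging Face transformers library (the repo ID below is the one Meta published at release; access is gated behind a license acceptance on huggingface.co):

    from transformers import AutoTokenizer

    # Requires accepting Meta's license for the gated Llama repo.
    tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

    text = "Llama 3.1 was trained on roughly 15 trillion tokens."
    ids = tok.encode(text)
    print(len(ids))  # token count; each ID maps to a word or word fragment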

The 405B model was trained using 16,000 of Nvidia’s H100 graphics processing units. State-of-the-art “frontier” models like these are trained by processing large amounts of web-scraped, licensed, or synthetically generated text and image data. The new models can also reach out, via APIs, to external tools and knowledge sources for things like up-to-date information, math expertise, and coding.
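The article doesn’t spell out how that wiring works, so the following is only an illustrative pattern with a hypothetical tool name, not Meta’s exact format: the model emits a structured tool call, the host application executes it, and the result is fed back into the conversation.

    import json

    def get_weather(city: str) -> str:
        # Stand-in tool; a real one would call a live API.
        return f"Sunny in {city}"

    TOOLS = {"get_weather": get_weather}

    # Suppose the model's reply contained this structured call:
    model_output = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'

    call = json.loads(model_output)
    result = TOOLS[call["tool"]](**call["arguments"])
    print(result)  # appended to the context for the model's next turn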

Developers can download the new Llama models from Meta or from Hugging Face, or access them via major cloud services like AWS, Azure, and Databricks.
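As a rough sketch, loading and querying the smallest model through transformers looks like this (assuming a capable GPU, the accelerate package, and approved access to the gated repo; the repo ID is the one published at release):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # halves memory vs. float32
        device_map="auto",           # spreads layers across available GPUs
    )

    inputs = tok("Explain model distillation in one sentence.",
                 return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))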

Meta calls the 405B version “the world’s largest and most capable openly available foundation model.” The company says the model beats OpenAI’s GPT-4 and GPT-4o, as well as Anthropic’s Claude 3.5 Sonnet, on commonly used benchmark tests, and “is competitive with” those models across a range of tasks. Meta believes developers will use its new Llama models to create more agentic chatbots, tools with greater reasoning capabilities, and better computer-coding agents.

The company also points to “synthetic data generation” and “model distillation” as examples of Llama 3.1 405B’s power. The former means a large model creating training data for a smaller model; the latter means a large “teacher” model transferring elements of its intelligence to a smaller “student” model. Meta says it altered its commercial license agreement to allow for these uses, a change that could shape how models are built from one another and improve the return on investment of smaller models.
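In outline, the synthetic-data half of that pipeline is straightforward. The sketch below uses a tiny stand-in model as the teacher so it runs anywhere; in practice the teacher would be Llama 3.1 405B served behind an API, and the generated pairs would feed a standard supervised fine-tuning run for the student.

    from transformers import pipeline

    teacher = pipeline("text-generation", model="gpt2")  # stand-in for a 405B teacher

    prompts = [
        "Explain what a large language model is.",
        "Why do open model weights matter to developers?",
    ]

    # Step 1: the teacher writes answers -- this is the synthetic data.
    synthetic = [
        {"prompt": p,
         "response": teacher(p, max_new_tokens=48)[0]["generated_text"]}
        for p in prompts
    ]

    # Step 2: the (prompt, response) pairs become the training set for a
    # smaller "student" model via any standard fine-tuning loop.
    for row in synthetic:
        print(row["prompt"], "->", row["response"][:60])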

But the model will also power consumer use cases: it now drives Meta’s AI assistant at Meta.ai (for U.S. users, anyway) and within WhatsApp.

The new models are text-based, not multimodal. But Zuckerberg says in a new video posted on Instagram that his company is working on next-gen models to power multimodal features such as an “Imagine” feature that creates images based on a photo of a person and a prompt (for example, “Imagine me playing soccer”). Zuckerberg says his company is also working on technology that will allow users to create their own AI apps and share them across the company’s social platforms. 

Over the past few years, as the AI race has heated up and attracted billions in investment dollars, companies have grown more and more secretive about how their models are built and how they work. 

Meta says it’s making the model weights publicly available through Hugging Face and a group of technology partners (including Nvidia), along with some new safety tools designed to make sure people don’t prompt the model to do harmful things. 

Open-source advocates believe that AI can advance faster, and remain safer, if AI companies develop the burgeoning technology out in the open. Meta has long touted its commitment to open source, but many developers have noted that the company is open about only some aspects of its models.

“Meta is continuing the industry standard of open-washing in AI,” says Nathan Lambert, a machine learning expert who works at the Allen Institute for AI. Lambert says Zuckerberg and Meta’s definition of open-source differs in spirit from the major proposed definitions currently being debated by institutional working groups (which Meta participates in). 

Meta’s definition of “open” seems to permit a lack of information about the data used to train the models. The parameter weights (generated during the model’s pretraining) released with a model are important, but AI researchers have come to believe that the substance and curation of the training data play an equal role in a model’s performance. “Meta’s release documents detail the data being ‘publicly available’ with no definition or documentation,” Lambert says.

Scale AI CEO Alexandr Wang says his company, which produces and sculpts synthetic training data, provided a large amount of data used in the fine-tuning and reinforcement learning from human feedback (RLHF) of the new Llama models. 

Others say it’s the terms of Meta’s commercial usage license that fall short. “Meta isn’t open-washing (per se) but Meta’s custom license and limits on usage does violate the ethos of open source,” Gartner analyst Arun Chandrasekaran tells Fast Company in an email. 

Despite this, Chandrasekaran believes Llama 3.1 will have real impact for both businesses and consumers. “[T]his will be a very useful model to a large set of enterprise clients,” he says, “and we can also expect Meta to push AI features more aggressively in its consumer products.”

The big picture is that Meta is, first and foremost, a very rich social media company that makes its money selling ads within social feeds. It has assembled an impressive organization of highly paid AI researchers who can develop models that help with important parts of Meta’s business, such as content moderation. But it’s also in a position to seed the growing AI ecosystem with its free models and tools, which could benefit both Meta’s influence and its bottom line in the future.


