Microsoft Artificial Intelligence AI

Move over LLMs: Why Microsoft, Salesforce & others are developing ‘small language models’

By Webb Wright, NY Reporter

August 7, 2024 | 11 min read

Large language models like OpenAI’s GPT-4 have been celebrated for their ability to perform a wide variety of tasks – but their training also requires a staggering amount of resources. The AI industry is starting to focus more of its attention on smaller systems designed for specific functions.

Microsoft is one of a group of major tech companies that's been investing in small language models. / Adobe Stock

At its core, science is the process of breaking things down into their smallest bits, and then observing their behavior. (The word itself suggests reduction: like “scissors” and “rescind,” “science” stems from a Latin root meaning “to split.”) Physicists have spent whole careers trying to peer past the atom into the subatomic realm; biologists toil to map the fundamental building blocks of life; and now, computer scientists are trying to distill artificial intelligence into as tiny a package as possible.

“You have this miraculous object,” says Sébastien Bubeck, Microsoft’s vice president of generative AI, “but what exactly was needed for this miracle to happen; what are the basic ingredients that are necessary?”

Large language models (LLMs) – the algorithms underpinning popular AI chatbots like ChatGPT, Gemini and Claude – have proliferated in recent years. Now, so-called small language models (SLMs) are growing in popularity. Leading tech companies building these bite-sized AI models pitch them as less resource-hungry alternatives to LLMs that, despite their wee size, can still deliver big benefits to brands.

In June of last year, Microsoft unveiled Phi-1, an SLM designed to assist with Python coding. The company followed that up with the release of Phi-2 in December and its Phi-3 models this past spring, both of which are larger than their predecessors but still nowhere close to being as big as the leading LLMs. (Phi-3-medium, the largest of the latest models, has 14bn parameters, while GPT-4 is rumored to have 1.76tn parameters – roughly a 125-fold difference.) Microsoft claimed, in a blog post announcing the release of the Phi-3 models, that they are “the most capable and cost-effective small language models available.”

The company’s recent investments in SLMs stem in part from a belief that the LLM paradigm – in which a small handful of enormous models like GPT-4 monopolize the market – will gradually give way to a world of smaller models designed to specialize in a particular task, or a narrow set of tasks.

Sure, GPT-4 can compose poetry, assemble recipes from the ingredients in your fridge, help you to debug a finicky bit of code or handle a dazzling number of other tasks. But what if you're an advertising agency executive who just needs an AI model that will, for example, help analyze consumer behavior in order to deliver more precisely targeted ads? In that case, a smaller model trained specifically on consumer and ad performance data could very well be more effective and less unwieldy than one that’s been trained on the entirety of the internet.

The power of SLMs, in other words, stems from the fact that their relatively small corpus of training data is focused on a single domain. “The whole fine-tuning process … is highly specialized for specific use-cases,” says Silvio Savarese, chief scientist at Salesforce – another company that’s been developing SLMs.

Think of it this way: If you needed to install some screws for a home-repair project, it would make much more sense to pay $3 for a perfectly functional screwdriver than to pay $100 for a multitool that’s crammed with knives, tiny scissors and so on.

The widespread embrace of SLMs is part of a broader shift around the use of AI that's been occurring throughout the private sector. As many experts have pointed out in recent months, the phase of grandiose AI hype sparked by the launch of ChatGPT in late 2022 is slowly but surely giving way to more sober, practical perspectives.

Business leaders are talking less about AI as humanity’s most significant technological achievement to date and are instead rolling up their sleeves and trying to actually integrate the technology into their daily operations.

Part of this process will include the embrace of SLMs, according to Brian Yamada, chief innovation officer at WPP-owned ad agency VML. “As we move into the operationalization phase of this AI era, small will be the new big,” he says. “Smaller, narrow models – or combinations of models – will solve specific use-cases and save time, money and compute.”

Some have been expressing a kind of claustrophobic unease with the current dominion of a handful of LLMs over the marketplace.

Earlier this year, Jack Dorsey, the former Twitter CEO and founder of Block, spoke at a live event about the dangers of allowing a small contingent of algorithms to become the gatekeepers to the world’s mechanisms for retrieving information and solving problems. “The only answer to this is not to work harder at open sourcing algorithms, or making them more explainable … but to give people choice of what algorithm they want to use from a party that they trust … and give people a choice to have a marketplace around an algorithm,” he said.

Similarly, IPG CEO Philippe Krakowsky said at a June event that he feared widespread reliance upon the same AI models throughout the ad industry would lead to a “reversion to the mean” – a stifling of creativity.

SLMs have the added benefit of being much less expensive to build and operate than their larger counterparts, because they require less compute at both stages. This has become a primary selling point for the companies building them. “The first and biggest benefit of SLMs is the cost,” says Microsoft’s Bubeck. “We’re talking several orders of magnitude cheaper.”

While there isn’t a standardized definition of SLMs, a reasonable range, Bubeck says, is about three to four billion parameters – small enough to operate from a smartphone.
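
To see why a model of that size could plausibly fit on a phone, here is a rough back-of-the-envelope sketch. It assumes that the memory needed to hold a model is approximately its parameter count multiplied by the bytes used to store each parameter, and it ignores everything else a running model needs (activations, caches, runtime overhead). The function name and the precision labels are illustrative choices, not anything taken from Microsoft.

```python
# Rough, weight-only memory estimate for a small language model.
# Assumption: memory ~= parameter count x bytes per parameter.
# Activations, caches and runtime overhead are deliberately ignored.

# Storage cost per parameter at three common numeric precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # The three-to-four-billion range Bubeck describes.
    for params in (3e9, 4e9):
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(params, precision)
            print(f"{params / 1e9:.0f}bn params @ {precision}: ~{gb:.1f} GB")
```

Under these assumptions, a model in that range needs somewhere between about 1.5 GB and 8 GB for its weights alone, depending on how aggressively the numbers are compressed. A model with more than a trillion parameters would need hundreds of times more.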

Relatively low operational costs paired with potential business applications make SLMs an attractive investment. In addition to Microsoft and Salesforce, other leading tech companies like Google, Anthropic and Apple – which earlier this year unveiled its long-awaited generative AI strategy – have also recently launched their own small language models.

There’s a tradeoff, of course. Parameters in an AI model are roughly analogous to synapses in a brain: as they’re reduced, so too is the scope of the system’s capabilities. So while smaller models are going to save you money, they’ll also be more limited. “You have to find the right balance between the intelligence that you need versus the cost,” says Bubeck.

To Salesforce’s Savarese, SLMs represent a step towards a radically different, and more capable, form of AI. The growing focus on smaller, more specialized language models is, he says, slowly giving rise to “a world of agents.”

The word “agent” as it’s commonly used refers to any entity capable of acting and making decisions autonomously in the world. But in Savarese’s words – and in Salesforce’s marketing materials – it’s used specifically to describe AI systems that can not only perform a narrow task, but also generate and execute plans across time.

Many AI chatbots today have no problem, for example, creating a detailed travel itinerary for your planned fall vacation. They might produce step-by-step instructions for booking flights, suggest destinations you ought to visit and estimate how much money you can expect to spend overall. An agent, on the other hand, would actually be able to take action on your behalf – booking your flight and making restaurant reservations, for example. It would be able to “interact with the real world to perform these tasks,” says Savarese.

“This is the new way we are going,” he predicts.

Last month, Salesforce unveiled a 1bn-parameter SLM that the company claims outperforms LLMs like Claude on certain narrow tasks. “On-device agentic AI is here!” Salesforce CEO Marc Benioff declared in a tweet.
