10 differences between small language models (SLMs) and large language models (LLMs) for enterprise AI

Kane Simms
June 14, 2024


With all this talk about large language models, you’d be forgiven for thinking that they’re going to solve the world’s problems. All we need is more data and more computing power!

However, when it comes to enterprise AI, bigger isn’t always better. It’s true that large language models have some brilliant capabilities, but do you *really* need a large language model for your use cases? Perhaps a small language model will do.

What is a small language model (SLM)?

A small language model is an AI model, similar to a large language model, only trained on less data and with fewer parameters. It fundamentally does the same thing as a large language model, understanding and generating language, but it is smaller and less complex.

How big is a small language model?

Small language models come in a variety of shapes and sizes, and the point at which a model becomes a large language model differs depending on who you ask. Typically, though, anything below 30 billion parameters is considered a small language model, and SLMs can be as small as a few hundred million parameters.
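Because the cut-off is a convention rather than a standard, it helps to treat it as a tunable parameter. A minimal sketch, assuming the 30-billion-parameter rule of thumb above; `size_category` and `SLM_THRESHOLD` are hypothetical names for illustration:

```python
# Rule of thumb from this article: models below ~30 billion parameters count as "small".
# The cut-off is a convention, not a formal standard, so it is a parameter here.
SLM_THRESHOLD = 30_000_000_000


def size_category(num_parameters: int, threshold: int = SLM_THRESHOLD) -> str:
    """Return "SLM" or "LLM" for a model with the given parameter count."""
    if num_parameters <= 0:
        raise ValueError("num_parameters must be positive")
    return "SLM" if num_parameters < threshold else "LLM"


print(size_category(2_700_000_000))      # Phi-2 (2.7B parameters) -> SLM
print(size_category(2_000_000_000_000))  # a 2-trillion-parameter model -> LLM
```

If your organisation draws the line elsewhere, pass a different `threshold` rather than editing the constant.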

What’s the difference between a small language model (SLM) and a large language model (LLM)?

There are 10 primary differences between the two that will help you understand which type of model you might consider for a given use case:

Size. This is obvious. As mentioned above, LLMs are a lot larger than SLMs. Some of the more recent LLMs, such as Claude 3 and Olympus, are reported to have as many as 2 trillion parameters. Compare that with Phi-2, at 2.7 billion.
Training data. LLMs require extensive, varied data sets for broad learning requirements. SLMs use smaller, more specialised and focused data sets.
Training time. Training an LLM can take months. SLMs can be trained in weeks.
Computing power and resources. Because of their large data sets and parameter counts, LLMs consume a LOT of computing resources to train and run. SLMs use far less (still a lot, but less), making them a more sustainable option.
Proficiency. LLMs are typically more proficient at handling complex, sophisticated and general tasks. SLMs are best suited to narrower, simpler tasks.
Adaptation. LLMs are harder to adapt to customised tasks and require heavy lifting for things like fine tuning. SLMs are much easier to fine tune and customise for specific needs.
Inference. LLMs require specialised hardware, like GPUs, to run inference, so they are usually accessed through cloud services over the internet. The smallest SLMs can run locally on a Raspberry Pi or a phone, meaning they can work without an internet connection.
Latency. If you’ve tried building a voice assistant with an LLM, you’ll know that latency is a huge issue. Depending on the task, you can be waiting seconds for an LLM to respond. SLMs, because of their size, are typically much quicker.
Cost. Because LLMs are bigger and consume more computing resources for inference, their cost per token is inevitably higher. For SLMs, it’s a lot lower, meaning they’re cheaper to run.
Control. With LLMs, you’re in the hands of the model builders. If they change the model, you may see drift or, worse, catastrophic forgetting. With SLMs, you can run the model on your own servers, tune it, then freeze it in time so that it never changes.
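To put the size gap into numbers, here is a rough, weights-only memory estimate (parameters × bits per parameter ÷ 8 bytes) for Phi-2 (2.7 billion parameters) and a 2-trillion-parameter model. This is a back-of-the-envelope sketch: it assumes 16-bit or 4-bit weights and ignores activations, caching and runtime overhead, so treat the results as lower bounds.

```python
def weight_memory_gb(num_parameters: float, bits_per_parameter: int = 16) -> float:
    """Approximate memory (in GB, 10**9 bytes) needed just to store a model's weights.

    This ignores activations, the KV cache and runtime overhead, so real
    memory use during inference will be higher.
    """
    bytes_needed = num_parameters * bits_per_parameter / 8
    return bytes_needed / 1e9


models = {"Phi-2": 2.7e9, "2-trillion-parameter LLM": 2e12}
for name, params in models.items():
    for bits in (16, 4):  # 16-bit weights vs. 4-bit quantised weights
        print(f"{name}, {bits}-bit: ~{weight_memory_gb(params, bits):,.1f} GB")
```

At 16 bits per weight, Phi-2 needs roughly 5.4 GB for its weights, while a 2-trillion-parameter model needs about 4,000 GB (4 TB). Even quantised to 4 bits, the larger model still needs around 1,000 GB spread across many GPUs, whereas Phi-2 drops to about 1.35 GB, which is within reach of many laptops and recent phones.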

How to decide what size of model to use

To decide which model to use, start by experimenting with large language models. This validates that the task you’re trying to accomplish can, in fact, be done: if it can be done at all, an LLM should be able to do it.

Once you’ve proven that the task is doable, you can then start working down in model sizes to figure out whether the same task can be done using a smaller model. When you reach a model size where your results start to change, get less accurate or slightly more unpredictable, you’ve reached your potential model size.

That doesn’t necessarily mean you should go back up in model size. It may mean that the model size you’ve reached requires some further tuning or training.
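The working-down process can be sketched as a loop that compares each smaller model against the largest model's score. This is a minimal sketch under assumptions: `evaluate` stands in for whatever test set and scoring you use, the 0.02 tolerance is arbitrary, and the model names and scores below are made up for illustration, not real benchmark results.

```python
from typing import Callable, Optional, Sequence, Tuple


def find_candidate_size(
    models: Sequence[str],
    evaluate: Callable[[str], float],
    tolerance: float = 0.02,
) -> Tuple[str, Optional[str]]:
    """Work down from the largest model, comparing each to the largest model's score.

    `models` must be ordered largest first, and the largest model is assumed to
    have already proven the task is doable. Returns (smallest model that still
    matches the baseline within `tolerance`, first model where results degrade,
    or None if none do).
    """
    if not models:
        raise ValueError("need at least one model")
    baseline = evaluate(models[0])
    smallest_ok = models[0]
    for model in models[1:]:
        if evaluate(model) < baseline - tolerance:
            # Results have started to slip: this size is a candidate for tuning.
            return smallest_ok, model
        smallest_ok = model
    return smallest_ok, None


# Hypothetical model names and made-up evaluation scores, for illustration only.
scores = {"large-70b": 0.95, "medium-13b": 0.94, "small-3b": 0.88, "tiny-1b": 0.70}
ordered = ["large-70b", "medium-13b", "small-3b", "tiny-1b"]
print(find_candidate_size(ordered, scores.__getitem__))  # ('medium-13b', 'small-3b')
```

In the terms used above, `small-3b` is the potential model size: rather than jumping straight back up to `medium-13b`, you might first try prompt tuning, RAG or fine tuning on it.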

Prompt tuning

The first option is prompt tuning: giving the model some in-context learning, i.e. examples and data that it can use to accomplish the task, delivered to it in the prompt.
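In practice, in-context learning often means showing the model a few worked examples before the real input. A minimal sketch of assembling such a few-shot prompt; the `Input:`/`Output:` layout, the intent labels and `build_few_shot_prompt` are all hypothetical choices, and the resulting string would be sent to whichever model you are testing.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a prompt that shows the model worked examples before the real input."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the customer message as stolen_card, lost_card or balance_enquiry.",
    [
        ("Someone took my card from my bag", "stolen_card"),
        ("I can't find my debit card anywhere", "lost_card"),
        ("How much is in my current account?", "balance_enquiry"),
    ],
    "my credit card was stolen",
)
print(prompt)
```

Ending the prompt with a bare `Output:` nudges the model to reply with just a label in the same format as the examples.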

Retrieval augmented generation

Second, consider retrieval augmented generation (RAG) or *indexing*. This provides the model with external data that it can pull into its responses at runtime. For some use cases, especially search-based ones, you may find this gives you the results you’re looking for. It’s also easier than fine tuning, as it doesn’t require tampering with model weights or access to the raw model.
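The core RAG loop is: find the passages most relevant to the question, then put them into the prompt. A minimal sketch, assuming simple word overlap as the relevance score; real systems typically use embedding similarity and a vector index instead, and the documents and function names here are hypothetical.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case word set; a stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Inject the retrieved passages into the prompt at runtime."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


docs = [
    "To report a stolen card, call the 24-hour fraud line and we will block it.",
    "Our branches are open 9am to 5pm, Monday to Friday.",
    "Replacement cards usually arrive within 5 working days.",
]
print(build_rag_prompt("How do I report a stolen card?", docs, k=1))
```

Note that the model itself is untouched: updating what it "knows" is just a matter of changing `docs`.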

Fine tuning

Lastly, if the first two options haven’t solved your problem, then fine tuning is the final consideration. This is where you train the model for a specific task using data related to that task. There are a number of fine tuning methods you can use, ranging from tuning the output using embeddings all the way through to tuning the parameters of the model itself. To do this, you typically need access to the raw model rather than an API, so you’re usually heading into small language model territory here.
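Whichever fine tuning method you choose, it starts with task-specific data. A minimal sketch of preparing that data as JSON Lines, with a held-out slice for checking the tuned model; the `prompt`/`completion` field names are an assumption (check what your fine-tuning tool expects), and the training step itself depends on the model and tooling, so it isn't shown.

```python
import json
import random
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def make_records(pairs: Sequence[Tuple[str, str]]) -> List[Record]:
    """Turn (input, desired output) pairs into training records.

    The "prompt"/"completion" field names are an assumption; use whatever
    schema your fine-tuning tool expects.
    """
    return [{"prompt": p, "completion": c} for p, c in pairs]


def train_eval_split(
    records: Sequence[Record], eval_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Shuffle reproducibly and hold back a slice for evaluating the tuned model."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def to_jsonl(records: Sequence[Record]) -> str:
    """Serialise records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Hypothetical example data; a real data set would be far larger.
pairs = [
    ("my credit card was stolen", "stolen_card"),
    ("someone took my card", "stolen_card"),
    ("I lost my debit card", "lost_card"),
    ("I can't find my card anywhere", "lost_card"),
    ("what is my balance?", "balance_enquiry"),
]
train, held_out = train_eval_split(make_records(pairs))
print(to_jsonl(train), end="")
```

To hand the data to a training tool, you would write the output of `to_jsonl(train)` to a file such as `train.jsonl`, and keep `held_out` aside to compare the tuned model against the untuned one.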

Relevance for enterprise teams

Think about the tasks that you have for AI within your organisation. It’s probably something along the lines of:

Intent classification
Knowledge retrieval
Content summarisation
Sentiment analysis
Conversation management
Contextual response generation
Translation

And things like that. Of course, there are many more use cases for AI, but these are among the most common.

Now consider whether you need the intense power of a large language model for these tasks.

Classification? Really? You need all the internet’s information to be able to recognise that ‘my credit card was stolen’ means ‘stolen card’?
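To make the point concrete, here is a toy nearest-example classifier that uses nothing but word overlap (Jaccard similarity) against six hypothetical labelled utterances, yet still maps paraphrases of ‘my credit card was stolen’ to a stolen-card intent. It is a sketch to illustrate scale, not a recommendation: a production system would use far more examples and typically a trained classifier or a small fine-tuned model.

```python
import re
from typing import List, Set, Tuple

# A handful of labelled example utterances (hypothetical intents and data).
EXAMPLES: List[Tuple[str, str]] = [
    ("my credit card was stolen", "stolen_card"),
    ("someone took my card", "stolen_card"),
    ("i lost my debit card", "lost_card"),
    ("i can't find my card anywhere", "lost_card"),
    ("what is my account balance", "balance_enquiry"),
    ("how much money do i have", "balance_enquiry"),
]


def words(text: str) -> Set[str]:
    """Lower-case word set, keeping apostrophes inside words like "can't"."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two word sets, from 0 (nothing shared) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(utterance: str) -> str:
    """Return the intent of the most similar labelled example."""
    u = words(utterance)
    _, label = max(EXAMPLES, key=lambda ex: jaccard(u, words(ex[0])))
    return label


print(classify("my card has been stolen"))  # stolen_card
print(classify("I've lost my card"))        # lost_card
```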

The *vast* majority of enterprise AI requirements are specific to that enterprise. You more than likely don’t need the most powerful AI tools on the planet to do what you want (and you certainly don’t need the cost).

Try working backwards from a large language model and see whether a small language model is more fit for your purpose.
