Latent Semantic Indexing: Why It Won’t Benefit Your SEO

Lachlan Perry


If you love SEO, or want to learn more about how to create high-quality, user-focused content that ranks, you’ve more than likely come across the term Latent Semantic Indexing (LSI).

It’s a term floated by influencers and SEO “gurus” who claim that applying “LSI keywords” will greatly improve your organic SEO campaigns and deliver higher rankings.

Latent Semantic Indexing has a long-standing history of being misunderstood. In this article, we’ll try to dispel the myths and fight the misinformation surrounding LSI, and explain why it’s nothing more than a fancy term used to sell SEO courses and eBooks on “how to write better content for better rankings”.

Before we weigh up whether LSI has any effect on your SEO, it’s important to understand the science behind it and where the term came from.

What Is Latent Semantic Indexing?

We’re glad you asked.

Latent semantic indexing, also referred to as latent semantic analysis, is a mathematical technique that aims to uncover the hidden (latent) relationships between words and concepts in order to improve the accuracy of information retrieval. It does this through a matrix factorisation method called singular value decomposition (SVD).

By using SVD, a retrieval system can scan through unstructured data and identify relationships or common traits amongst the concepts and context contained within that data.

Before techniques like this existed, computers and search engines couldn’t determine the relationships between sets of words – they treated each word individually, with no way of knowing whether two terms were contextually related.
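To make that concrete, here is a minimal sketch of latent semantic analysis in Python – purely an illustration of the SVD idea described above, not anything Google is confirmed to use. It assumes scikit-learn is installed, and the three toy documents are made up for the example.

```python
# Minimal LSA sketch: build a term-document matrix, factorise it with
# truncated SVD, and compare documents in the reduced "concept" space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the doctor examined the patient",          # toy documents, invented
    "the physician treated the sick patient",   # for illustration only
    "the stock market closed higher today",
]

X = TfidfVectorizer().fit_transform(docs)   # term-document matrix
svd = TruncatedSVD(n_components=2)          # the SVD step
concepts = svd.fit_transform(X)             # documents mapped into concept space

# Documents 0 and 1 use different wording but share context ("patient"),
# so they should come out more similar to each other than to document 2.
print(cosine_similarity(concepts))
```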

Its history began in the late 80s, when Susan Dumais, a renowned information-retrieval researcher who later joined Microsoft, along with fellow inventors George Furnas and Scott Deerwester, coined the term and set out to simplify an information retrieval process that was otherwise “tedious” and still required “users or providers of information to specify explicit relationships and links between data objects or text objects”.

The original patent dates back to 1992; Google, meanwhile, has been making use of natural-language processing for almost a decade now, as reflected in many of its algorithm updates.

What LSI Set Out To Understand & Solve

If you’re not yet following, I’ll simplify it even further for you. 

Some words have multiple, different meanings (polysemy), and some words mean the same thing as each other (synonymy).

Before LSI, there was a fundamental flaw in how technology-based systems retrieved results for a user’s request: because some words carry several meanings, matching on the individual words themselves could not always reliably surface the best information for the user.

It is all about understanding the relationships and concepts behind words – think of the term “hotdog”. The individual words “hot” and “dog” mean vastly different things, but when put together, they create an entirely new concept.

What Are LSI Keywords?

Before we begin this section, it’s important to make a very clear distinction between these two concepts.

Latent Semantic Indexing (LSI) is a real concept coined by those researchers – “LSI keywords” are not.

There is no evidence to suggest that Google is using LSI as part of its ranking algorithm or to identify pages of “higher value” – despite what this featured snippet says.

[Image: a featured snippet from Google about “when Google started using LSI”]

The year is now 2021 and, by any measure, LSI is old, patented technology that may have no bearing on how Google handles synonyms, polysemous terms and semantically related words.

That’s not to say Google isn’t looking for relationships like these – it almost certainly is, given how accurate its search engine has become – but suggesting that capability is based on a 30-year-old patent, without any evidence, is a little hard to believe.

“LSI keywords” sound like you’re applying the scientific approach of a patent that set out to innovate information retrieval, understanding and evaluation, but in reality you’re simply using synonyms and related phrases to butter up your keywords – which isn’t exactly scientific in nature.

Is there any harm in adding synonyms and related keywords to your content? Absolutely not. 

But the widely spread claim that using this technique will improve your search rankings is no more than a farce pushed by SEO gurus, LSI “activists” and content marketers to sell you their expensive courses.

We use synonyms and related words to enrich content and make it exciting and engaging for our readers, particularly because we tend to write long-form, evergreen content.

There are plenty of guides out there pushing for the “use of LSI keywords”. In fact, if you conduct a Google search for “latent semantic indexing”, you’ll see that almost all of the top results include a section on LSI keywords and how to use them for your website copy to achieve better SEO results. 

The reality is that LSI keywords aren’t a real thing. 

Don’t just take it from me, take it from the mouth of Google’s John Mueller too.

Latent Semantic Indexing & Its Effect On Your SEO

To make a long story short, there is no evidence that Google is using LSI to better understand the context of a page, or that it uses LSI to determine where you should rank for a keyword based on the synonyms and related words you’ve used.

There has long been debate about the importance of LSI and its effect on how well your SEO campaigns perform, but Google itself has never published a patent – or any related filing – describing the use of LSI in its ranking algorithm.

There are LSI patents that appear on Google Patents, but none of them seems to indicate that latent semantic indexing plays a vital role in how Google looks for synonyms and related words. They do talk about semantics and phrase co-occurrence, but as far as LSI goes, there isn’t much to go on.

If you’re looking to overhaul your content strategy, you might’ve looked at LSI keyword generators or similar tools to help you find semantically related keywords.

As we mentioned before, there is no science behind using “LSI keywords”: we don’t know exactly how Google handles synonyms and related keywords, and there is no definitive proof that LSI is part of its algorithm.

The problem with these tools is that they don’t provide any information about how they generate their keyword ideas, or what technology is being used to determine “yes, these are LSI keywords”.

It’s thought that these tools operate on phrase-based indexing rather than latent semantic indexing, and this is generally where the confusion lies.

That may be because phrase-based indexing relies on phrases that are “valid” or “good”, taking into consideration how frequently those phrases are used and whether they are related to each other.

When we think of the keywords we want to rank for, spinning off as many synonyms and related words as we can gives us the impression that we’re ticking every box – and that is likely what these LSI tools are really operating on.

The phrase-based indexing system patented by Anna Lynn Patterson identifies phrases in documents across the internet and indexes those documents according to the particular phrases they contain.

When a user submits a search query, the search engine attempts to provide the most relevant results from its repository of information, whilst looking for phrases in the user’s search term.

The information retrieval system then ranks the results it shows the user, using those phrases to influence the ranking order (that is, which result is most relevant).

It’s a very interesting patent, and given several related patents are assigned to Google, it may well be something that Google is fully utilising.
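To illustrate the general idea, here is a toy sketch in Python of how a phrase-based index might work – a simplification for illustration only, not a reproduction of Patterson’s patent or Google’s implementation. The documents and phrases are invented.

```python
from collections import defaultdict

# Toy documents (invented for the example)
docs = {
    1: "the prime minister addressed parliament house in canberra",
    2: "a hot dog stand opened outside parliament house",
    3: "how to cook the perfect hot dog at home",
}

def phrases(text, n=2):
    """Extract candidate two-word phrases from a piece of text."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

# Inverted index: phrase -> documents containing that phrase
index = defaultdict(set)
for doc_id, text in docs.items():
    for phrase in phrases(text):
        index[phrase].add(doc_id)

def search(query):
    """Rank documents by how many of the query's phrases they contain."""
    scores = defaultdict(int)
    for phrase in phrases(query):
        for doc_id in index.get(phrase, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Document 2 shares two phrases with the query, so it ranks first.
print(search("hot dog near parliament house"))
```

The real patent covers far more (identifying which phrases are “good”, phrase co-occurrence statistics, and so on), but the core idea of indexing and ranking by phrases rather than single words is what matters here.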

There is no harm in adding related words to your content, but pushing these concepts without evidence-based findings may not be the best use of your time in your overall SEO strategy.

Forget LSI – Here’s What You Should Do Instead

The usual advice for using LSI keywords is to “help Google better understand the context of your pages”, but the reality is there are much more efficient ways to do this.

Even in 2021, Google’s machine learning isn’t perfect. It still needs a little assistance understanding what is on your page, and the best way to help is through structured data.

Focus On Structured Data

Structured data, commonly known as schema markup, is organised information that the crawlers of search engines such as Google and Bing digest to better understand the context of your web pages.

There are plenty of ways you can create structured data for your web pages, and the list of different schema types is available on schema.org.

For eCommerce websites, you might want to consider using review rating markup to surface reviews of the products you sell, as well as listing their price and stock availability.
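As a rough illustration, here is what Product markup with a rating, price and availability might look like, generated as JSON-LD from Python. The product name and figures are made-up placeholders – check schema.org and Google’s structured data documentation for the full list of supported properties.

```python
import json

# Hypothetical product data for illustration only
product_markup = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Running Shoe",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "212",
    },
    "offers": {
        "@type": "Offer",
        "price": "129.95",
        "priceCurrency": "AUD",
        "availability": "https://schema.org/InStock",
    },
}

# Embed the output in your page inside a <script type="application/ld+json"> tag.
print(json.dumps(product_markup, indent=2))
```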

When you implement schema on your web pages, you can sometimes trigger rich results. Rich results enhance the way your website looks amongst the search results, making it more enticing for users to click on compared with your competitors.

Google shows rich results at its own discretion, so don’t be disheartened if they don’t show up for your website – the important thing is that you’re helping its crawlers understand what your web pages are about.

[Image: a rich result for nike.com in Google search results]

Take Note Of The Concept Of Co-occurrence

As discussed earlier with the phrase-based indexing patent, the concept of word co-occurrence is becoming increasingly important as search engines try to understand how certain words relate to each other.

If Google does indeed use phrase-based indexing, then the significance of these related words and how frequently they are mentioned with one another can help Google better understand the context of your pages. 

This is more than just using synonyms – this is about using phrases related to your topic. If you have a blog post about “Australian politics”, you would expect phrases like “parliament house” and “prime minister” to appear in the document.

As such, the appearance of some of these phrases can also help predict the presence of others.
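Here is a small sketch of what counting phrase co-occurrence could look like – again, just an illustration of the concept, not how Google measures it. The corpus and the list of watched phrases are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Invented mini-corpus about Australian politics
corpus = [
    "the prime minister spoke at parliament house about the federal budget",
    "parliament house hosted a debate between the prime minister and the opposition",
    "the federal budget was handed down by the treasurer",
]

watched_phrases = ["prime minister", "parliament house", "federal budget", "treasurer"]

# Count how often pairs of phrases appear in the same document
co_occurrence = Counter()
for doc in corpus:
    present = sorted(p for p in watched_phrases if p in doc)
    for a, b in combinations(present, 2):
        co_occurrence[(a, b)] += 1

# Phrases that frequently show up together hint at a shared topic.
for pair, count in co_occurrence.most_common():
    print(pair, count)
```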

Google also moved beyond “keyword stuffing” over a decade ago, so be sensible about the number of related phrases you work in – you don’t want to dilute the quality of your content by repeating unnecessary or meaningless phrases over and over.

Pay Attention To Technical SEO

Core Web Vitals, or the “Page Experience Update”, is set to roll out in mid-2021, so now is the time to understand how to optimise your website properly.

There is no telling what the weight of this ranking signal will be, but the general expectation is that it will be lightweight in nature, just like the SSL update.

John Mueller also had some forward-looking advice for SEOs at SMX Virtual, stating that websites that are “technically better” will have a small advantage, but that overall, content is still king.

We don’t know exactly what 2021 will hold for SEOs, but keeping on top of your technical SEO is always good advice. This includes looking for things such as the following (a small audit sketch is included after the list):

  • Ensuring your canonical URLs are set properly 
  • Setting noindex on low-quality or thin content pages that serve no purpose to the user
  • Checking that URL parameters or session IDs aren’t being mistakenly indexed
  • Double-checking there are no crawling or indexing problems with your sitemap and robots.txt file
  • Triple-checking that Googlebot can render your pages on mobile, as Google moves all websites to its mobile-first index.
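To tie a couple of those checks together, here is a minimal audit sketch in Python – an illustration only, assuming the requests and beautifulsoup4 packages are installed and using placeholder URLs you would swap for your own pages. It reports each page’s canonical tag and robots meta directive; it isn’t a substitute for a proper crawl.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URLs - replace with pages from your own site
URLS = [
    "https://example.com/",
    "https://example.com/blog/",
]

for url in URLS:
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")

    canonical = soup.find("link", rel="canonical")
    robots_meta = soup.find("meta", attrs={"name": "robots"})

    print(url)
    print("  status:   ", resp.status_code)
    print("  canonical:", canonical["href"] if canonical else "missing")
    print("  robots:   ", robots_meta["content"] if robots_meta else "not set")
```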

The Wrap Up

LSI is technology that dates back to the 80s, and there is no real evidence it has any bearing on how Google searches for semantically related keywords.

There isn’t any harm in using synonyms throughout your content for the purpose of enrichment, but there is no evidence to suggest it is part of Google’s algorithm and there’s certainly no suggestion it will help improve your SEO rankings.

Dispelling the misinformation regarding LSI is essential if we want to build trust amongst our readers and fellow digital marketers.

The continued spread of false information only erodes the value of the information users receive, and that’s not what our job as SEOs is about.
