Thin content was one of the
first SEO issues Google targeted with its Panda algorithm update in 2011. That
update rocked the entire industry and kick-started the search giant’s war
against low-quality content.
It also made life increasingly difficult for black hat SEOs trying to game the SERPs. However, there are plenty of genuine, technical SEO reasons why you might end up with thin content on your website. In this article, we explain exactly what thin content is, how to find it on your site and what you need to do about it.
Google describes thin content as
having “little or no added value”. This is the description you’ll see if you’re
unlucky enough to get a manual action warning in Google
Search Console, informing you that you’ve been penalised for having thin
content on your site.
You definitely don’t want one of those.
The question at this point is:
what kind of content does Google consider to have “little or no added value”?
Back in the early Panda days,
Google was mostly targeting deceptive uses of thin content – for example, automatically generated content, thin affiliate pages, content scraped from other sources and doorway pages.
With automatically generated content, we are looking at low-quality content, often created by basic machine concatenation, and offering limited, if any, value. For example, grabbing a news story in Spanish and then running it through Google Translate before adding it to your site – a big no-no.
We are starting to see examples of machines
(or ‘robots’) writing high-value content, and this is something that will
become more prevalent as AI and machine learning continue to improve. This does
not count as thin content, but you would still want a human editor to
review this type of content before publishing it.
Affiliate websites offering
useful, comprehensive purchase advice have nothing to fear from Google. However,
pages filled with affiliate links that offer no useful or relevant information
for the end user are prime targets for getting hit by a search penalty.
If you’re in the affiliate game,
stick to the following guidelines:
If you systematically add content to your website from external sources, you’re also at risk of a thin content penalty. There are a number of ways in which content is copied (or scraped) from other sources, a few of the more common ones being:
Doorway pages are a means of spamming the search engine results pages (SERPs) with very thin content that targets a very specific term, or a close group of terms, with the purpose of sending this traffic to another website or destination.
This creates a poor user search experience and adds unwanted steps for the user to get to their desired end result. Often, doorway pages mean that the user ends up on a lower quality and less relevant search result page than required, resulting in excessive searching to discover the content they needed.
Essentially, if your content is
copied from anywhere else, generated by software or you’re creating pages with
little or no content, you could be in trouble. Even if you’re not trying to be
deceptive (for example, reposting relevant news stories), you have to question
why Google would choose to rank your page when it’s simply repeating content
that’s already available – it has nothing new or valuable to offer.
As Google explains in its own documentation:
“One of the most important steps in improving your site’s ranking in
Google search results is to ensure that it contains plenty of rich information
that includes relevant keywords, used appropriately, that indicate the subject
matter of your content.
“However, some webmasters attempt to improve their pages’ ranking and
attract visitors by creating pages with many words but little or no authentic
content. Google will take action against domains that try to rank more highly
by just showing scraped or other cookie-cutter pages that don’t add substantial
value to users.”
It all comes down to adding
substantial value to the end user because this is what Google aims to deliver
as a search engine.
For more info on thin content,
take a look at this video from Google’s former head of web spam, Matt Cutts:
It’s not a particularly recent
video but everything Matt Cutts says is still relevant today.
While the most publicised danger of thin content is getting hit by a Google search penalty, your problems run much deeper than this if you’ve got too much of it. If Google’s algorithms can tell you’re using thin content deceptively, then you can bet the majority of users who visit your site can see it as soon as they land on the page.
Whatever your objectives are with the page, you’re not going to convince many people to take action this way. You’ll struggle to keep users on the page, encourage them to engage with your brand or inspire them to convert.
Essentially, this is the real
danger of thin content: your marketing objectives are going to fall flat.
Now, in terms of the Google
Search penalties, these can be pretty devastating and it helps to understand
how Google’s Panda algorithm works.
The Google Panda update was
first released in 2011 with the purpose of de-valuing low-value and thin
websites, to stop them from appearing so prominently in SERPs.
The other, less publicised
side of this update was the additional ranking gains (tied to content quality
signals) that rewarded websites creating high-quality content.
Google Panda updates can impact
(remember, this ‘impact’ can be positive or negative) a single page, a whole
topic or theme, multiple themes, or entire websites.
The Panda filter applies a
number of perceived content quality criteria as well as questions that the
Google Quality Raters would be asking themselves when manually viewing content
– things like:
Related reading: The SEO’s guide to Google quality raters
The above is just the starting
point for protecting your website and content from Panda.
It is important to get a second
opinion on your content. Be objective and honest with yourself and your team
about the quality of what is being produced, and how it needs to improve.
While the penalties for having
too much thin content can be severe, there are quite a lot of scenarios where
you’re naturally going to end up with content that could fall into this
category.
If you have a search function on
your website, the results pages are going to offer very little or no original
content. This can’t be helped, of course. The purpose of a search results page
is to show snippets of other pages across your site and help users choose the
most relevant option.
Solution: Prevent Google from crawling results pages by adding a disallow line for these pages in your robots.txt file.
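Before a disallow rule goes live, it can be worth checking that it blocks what you expect and nothing more. The sketch below uses Python’s standard `urllib.robotparser` module to test URLs against a set of rules. The `/search` path, the `example.com` URLs and the `is_crawlable` helper are all assumptions for illustration, so swap in whatever your own site uses. Note that Python’s parser does simple prefix matching, which may not mirror every detail of how Googlebot interprets rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules. "/search" is an assumed path for internal
# search results pages; replace it with the path your site actually uses.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines, so no network
    # request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/search?q=marquee+hire",
        "https://www.example.com/services/",
    ):
        status = "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(f"{url}: {status}")
```

Running the same check against a list of your most important landing pages is a quick way to make sure a broad rule hasn’t blocked something you want to rank.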
In many cases, it’s perfectly
reasonable to have a photo or video gallery on your website. You might be a
wedding photographer, a marquee hire company or a business with a bunch of
video case studies to show off.
If the purpose of this page is
to allow visitors to browse your photos or videos and choose which ones they
want to view, this causes some thin content issues. You probably don’t want a
load of text getting in the way on the gallery page itself and your problems
get worse if each image or video has its own dedicated page.
Solution: This really depends on how you structure your gallery. You might choose to create content for your gallery page and no-index the individual image/video pages, for example. Or you might take the opposite approach and create unique content for each image/video and no-index the gallery page.
Alternatively, you could create a carousel that displays all images/videos on the same URL – it all depends on what you want to rank for and the kind of content you’re planning to create.
Shopping cart pages aren’t there
to provide users with valuable content; they’re designed to help people manage
orders and complete purchases. Technically, we’re in thin content territory
here but the fix is pretty simple.
Solution: Once again, keep these pages out of Google’s index, either with a disallow rule in your robots.txt file or with a noindex robots meta tag on the pages themselves.
Duplicate pages are a natural part of managing a website. Moving over to HTTPS from HTTP creates duplicates, as does having www and non-www versions of your domain. Managing multilingual websites and recreating pages for multiple locations can also result in duplicates.
Technically, duplicate content
isn’t quite the same thing as thin content but the two do overlap in certain
cases.
Solution: Mark the page version you want to rank with canonical tags, use 301 redirects if you’re sending users to a different URL and use hreflang tags for international languages/locations.
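As a rough illustration of the canonical and hreflang tags mentioned above, here is a minimal Python sketch that builds the `<link>` elements you would place in a page’s `<head>`. The function names and the `example.com` URLs are hypothetical, and in practice most CMS platforms or SEO plugins will generate these tags for you.

```python
from html import escape


def canonical_tag(url: str) -> str:
    """Build a canonical link element pointing at the preferred URL."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def hreflang_tags(alternates: dict[str, str]) -> list[str]:
    """Build one alternate link element per language/region version.

    alternates maps an hreflang value (e.g. "en-gb") to that version's URL.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(code, quote=True)}" '
        f'href="{escape(url, quote=True)}">'
        for code, url in alternates.items()
    ]


if __name__ == "__main__":
    # Hypothetical URLs used purely for illustration.
    print(canonical_tag("https://www.example.com/marquee-hire/"))
    for tag in hreflang_tags({
        "en-gb": "https://www.example.com/marquee-hire/",
        "fr-fr": "https://www.example.com/fr/location-chapiteau/",
    }):
        print(tag)
```

Each language version should list every alternate, including itself, so the same set of hreflang tags normally appears on all of the pages in the group.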
In many cases, thin content
isn’t detrimental to the user experience at all. In fact, it’s sometimes better
to forget about content and simply deliver the functionality users need – eg:
Luckily, keeping these pages
safe from search penalties is relatively simple. By no-indexing pages, telling
Google which version to index (canonical tags) and/or using 301 redirects to
send users to the right place, non-deceptive thin content shouldn’t be a
problem.
Product pages are one of the most common scenarios where thin and/or duplicate content occurs on a website. This is especially true if you’re selling multiple versions of the same or very similar product.
Naturally, brands try to avoid having duplicate content across these pages but it’s difficult to say the same thing in a hundred different ways.
It becomes a battle of thin
content vs duplicate content and this causes a lot of confusion for website
owners, SEOs and marketers in general.
The truth is, duplicate content
is the lesser of two evils here and it’s better to provide users with
comprehensive product details – even if they’re the same or similar – than
publishing pages with very little (albeit unique) content.
Here’s what Google’s Andrey Lipattsev had to say about duplicate product pages during a Q&A on duplicate content with fellow Googler John Mueller:
“And even, that shouldn’t be the first thing people think about. It shouldn’t be the thing people think about at all. You should think, I have plenty of competition in my space, what am I going to do? And changing a couple of words is not going to be your defining criteria to go on. You know, the thing that makes or breaks a business.”
More to the point, there is no
search penalty for duplicate content but there is for thin content.
So, when it comes to product
pages, don’t worry too much about duplicate content for very similar products
or variations of the same product. Instead, focus on optimising for the best
experience and giving Google any clues you can about which page to prioritise
in terms of indexing.
Here are some tips:
The key takeaway from the
Q&A on duplicate content is that when pages are similar (or the same),
Google is looking for a way to differentiate between them and product
descriptions are just one of the hundreds of factors it looks at.
There are a number of ways to discover thin content (by word count, duplication and value), and a few of the more common approaches are covered below.
Using Copyscape (and other free tools), you can
crawl the web to look for any content that has been copied from your domain, as
well as any content added to your own site over the years that was
copied (in part or in full) from external sites.
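If you already have two pieces of text to hand and want a rough idea of how much they overlap, a simple comparison can be sketched in Python. The example below splits each text into overlapping five-word sequences (‘shingles’) and measures how many they share. This is a generic technique and not a description of how Copyscape or Google detect copying, and the function names and the five-word window are assumptions chosen for illustration.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Split text into overlapping word sequences of the given length."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a: str, text_b: str, size: int = 5) -> float:
    """Return shared shingles divided by total distinct shingles (0.0 to 1.0)."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    original = "Thin content was one of the first SEO issues Google targeted with Panda."
    suspect = "Thin content was one of the first SEO issues that search engines went after."
    print(f"Similarity: {jaccard_similarity(original, suspect):.2f}")
```

A score close to 1.0 suggests the two texts are largely the same, while a score near 0.0 suggests little shared wording. Where you draw the line is a judgment call that depends on the type of content.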
You can also use Google search
operators to manually check Google for instances of content copying or scraping.
Here’s an example of what you
need to do:
Here’s an example of the above in action. In this case, I am checking for any duplication of content from a post I created for Search Engine Journal:
As you can see, the first site
appearing is the originator website, and as this content is opinion-driven, it
is intended to be distributed, shared socially and used on other websites.
An important aspect of this is
the purpose of the content, whether it’s to drive traffic back to the main
website, encourage shares or something else.
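One common way to run this kind of manual check is to search for a distinctive sentence in quotation marks (an exact-match search), optionally adding the `-site:` operator to hide results from your own domain. The Python sketch below simply builds that query string and the matching Google search URL. It illustrates one possible approach under those assumptions, and the function names and `example.com` domain are hypothetical.

```python
from typing import Optional
from urllib.parse import quote_plus


def duplicate_check_query(snippet: str, exclude_domain: Optional[str] = None) -> str:
    """Build an exact-match query for a sentence taken from your content.

    If exclude_domain is given, results from that domain are filtered out
    with the -site: operator so only other websites are shown.
    """
    # Remove any double quotes so the snippet stays a single quoted phrase.
    cleaned = snippet.replace('"', "").strip()
    query = f'"{cleaned}"'
    if exclude_domain:
        query += f" -site:{exclude_domain}"
    return query


def google_search_url(query: str) -> str:
    """Return a Google search URL for the query, safely URL-encoded."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    query = duplicate_check_query(
        "your marketing objectives are going to fall flat",
        exclude_domain="example.com",
    )
    print(query)
    print(google_search_url(query))
```

Leaving `exclude_domain` empty keeps your own site in the results, which is useful when you want to confirm that the original page appears first.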
I’ve been using our machine learning software Apollo Insights for nearly ten years. One of the ways in which I use the data is to locate pages that are not contributing towards total site success.
You can see this in action below
(the ‘Page Activity’ widget):
Another metric I use Apollo
Insights for is locating content with a limited word count.
Although more words don’t
always mean better quality content, in most cases a page with very few words is
unlikely to be providing the depth of user and search value needed to deliver
an optimum search experience.
You can see this below using a
deep data grid – in this case I am looking at depth of content based on
expected content structural elements, things like presence of multiple levels
of header tags, and checking that the page is active and real:
Remaining with Apollo, ‘Auditor’ tells me how many pages have fewer words on them than I would expect from a high-quality website page. I can also look at the bigger picture and combine this knowledge with items like: external linking, framed content, pages orphaned off from the main website and much more.
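If you don’t have access to a dedicated tool, a basic word count check can be sketched with Python’s standard library. The example below is not how Apollo Insights works. It simply strips the HTML tags from a page, ignores script and style blocks, counts the remaining words and flags pages under a threshold. The 300-word threshold, the function names and the sample pages are assumptions, so adjust them to suit the type of page you are auditing.

```python
import re
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text from a page while skipping script and style blocks."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that sits outside the skipped tags.
        if not self._skip_depth:
            self.chunks.append(data)


def visible_word_count(html: str) -> int:
    """Count the words a visitor would actually see on the page."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    return len(re.findall(r"\b\w+\b", " ".join(extractor.chunks)))


def find_thin_pages(pages: dict[str, str], min_words: int = 300) -> list[str]:
    """Return the URLs whose visible word count falls below min_words."""
    return [url for url, html in pages.items() if visible_word_count(html) < min_words]


if __name__ == "__main__":
    # Hypothetical pages: URL mapped to its raw HTML.
    pages = {
        "/gallery/": "<html><body><h1>Gallery</h1><p>Our photos.</p></body></html>",
        "/guide/": "<html><body><p>" + "useful advice " * 200 + "</p></body></html>",
    }
    print(find_thin_pages(pages))
```

A low word count on its own doesn’t prove a page is thin, so treat the output as a list of pages to review rather than a list of pages to delete.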
The first stage in fixing thin content is understanding what high-quality and value-enhancing content looks like in the first place. The example below is from Think With Google: ‘The Customer Journey to Online Purchase’.
Some of the key points which
flag this as high quality for me include:
Using external comparisons is a great way to set a minimum benchmark for your own content quality. The goal is to create content on your website that is far better than any other examples available online.
Once you identify what ‘good’ looks like in your niche, you want to move towards creating ‘great’ content. At this stage, you need to find
the content that doesn’t work at present (see previous section on ‘finding thin
content’) and boost the content so that it can contribute more towards total
site success, as well as its own standalone value.
“You will also need to find new opportunities for effective content creation. Don’t limit your content value by re-purposing alone, there is always an opportunity to create something amazing with digital content.”
Other tactics for creating new
quality content include:
If you would like to chat about
thin content, how we can help identify and fix it, or simply want to make your
existing content work harder for you, then contact us at our London or Portsmouth
offices.
Lee is Head of Services at Vertical Leap.