Duplicate content takes two forms: content that is repeated from one site to another, and large sections of the same information repeated across multiple pages of one site. Either way, publishing duplicate content on your website can negatively impact your Google ranking, even if it’s unintentional.
As the internet expands, search engines must prioritize how information is ranked in order to deliver the most relevant results to people searching for answers. The pace at which content is published, read, and indexed for future queries is impressive, but the process isn’t perfect.
Google defines the amount of time and resources it devotes to crawling a site as the crawl budget. It’s important to realize that Google doesn’t index everything on your website, even if it crawls it. Google’s crawlers and indexing systems decide which pages to index. Google explains, “each page must be evaluated, consolidated, and assessed to determine whether it will be indexed after it has been crawled.”
A host of factors affect whether your URLs are indexed and whether they earn a place on search engine results pages (SERPs). Link metrics also affect your overall search visibility: the organic keywords you rank for, your search engine rankings, and your impressions.
SEO best practices improve your ability to rank higher in search. Black hat SEO and other bad tactics hurt your chances of ranking high in search, if you rank at all. And this brings us back to duplicate content.
Is There a Duplicate Content Penalty?
Google states that it doesn’t penalize websites simply for having duplicate content, but it attaches a caveat. If your duplicate content is not the result of intentional manipulation of search results or spamming practices, you shouldn’t be penalized for it. If it is, you may be.
Google states, “In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the site’s ranking may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.”
3 Duplicate Content SEO Issues You Want to Avoid
Search engines want to provide the best user experience by showing a variety of original content rather than multiple pages containing the same content. Let’s look at how duplicate content negatively impacts SEO.
Duplicate Content Impacts Link Equity
“Link equity” refers to how certain links transfer authority and value from one webpage to another.
The number of external links your page earns matters. According to Backlinko, the top result in Google has 3.8 times more backlinks than positions two to ten. If your site has duplicate content, external websites might link to a duplicate version instead of your preferred URL. Duplicate content harms your link-building campaigns by splitting the links you earn across several URLs rather than concentrating them on one page.
Identical Content Wastes Your Crawl Budget
If numerous web pages contain duplicate content and you want only one indexed, crawlers will still crawl every duplicate variant, taking time away from crawling your other important pages.
Your Blog Post Won’t Index
There are two types of duplicate content: internal and external.
Internal duplicate content occurs when one site creates duplicate content through multiple URLs on the same site. External duplicates occur when two or more different websites have the same page copied. External and internal duplicates can occur as exact- or near-duplicate pages.
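To make the exact versus near-duplicate distinction concrete, here is a minimal sketch that compares two blocks of text by their overlapping three-word phrases (often called shingles) and scores them with Jaccard similarity. The function names and the 0.5 threshold are illustrative assumptions, not how Google or any particular duplicate content checker measures similarity.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(text_a: str, text_b: str, near_threshold: float = 0.5) -> str:
    """Label a pair as exact, near, or distinct.

    "exact" here means the shingle sets match after ignoring case and
    punctuation. The threshold is an assumed value for illustration only.
    """
    score = jaccard_similarity(text_a, text_b)
    if score == 1.0:
        return "exact"
    if score >= near_threshold:
        return "near"
    return "distinct"
```

You could run a check like this over pairs of your own pages to flag candidates for review. Treat the labels as a starting point for a manual look, not a verdict.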
As I’ve already addressed, Google doesn’t index everything on your website. In Search Console, the Index Coverage report shows which pieces of content are not indexed.
Among the reasons Google lists for excluding pages:
- Pages with redirects
- Pages with noindex tags
- Duplicate pages without user-selected canonical tags
- Pages that were indexed but not submitted in the sitemap
As you can see, duplication issues are one of the core reasons content isn’t indexed. It is a waste of time and money to focus on SEO content creation if your pages aren’t going to appear in an organic search, so it’s vital your web pages are indexable.
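Two of the reasons in that list, noindex tags and missing canonical tags, can be checked directly in a page’s HTML. The sketch below uses Python’s standard `html.parser` module to look for a `rel="canonical"` link and a robots `noindex` meta tag. The class name, function name, and example URL are hypothetical, and the check does not look at the `X-Robots-Tag` HTTP header, which can also carry a noindex directive.

```python
from html.parser import HTMLParser


class IndexabilityParser(HTMLParser):
    """Collects the canonical URL and any noindex directive from page HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None for valueless attributes, so default to "".
        attributes = {name: (value or "") for name, value in attrs}
        rel_values = attributes.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.canonical = attributes.get("href") or None
        elif tag == "meta" and attributes.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in attributes.get("content", "").lower():
                self.noindex = True


def indexability_report(html: str) -> dict:
    """Return the canonical URL (or None) and whether a noindex tag is present."""
    parser = IndexabilityParser()
    parser.feed(html)
    return {"canonical": parser.canonical, "noindex": parser.noindex}


# Example with a made-up page: it declares a canonical URL and no noindex tag.
sample = '<head><link rel="canonical" href="https://example.com/page"></head>'
report = indexability_report(sample)
```

A page that reports no canonical URL and has known duplicates is a candidate for the “duplicate without user-selected canonical” status described above.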
Common Causes of Duplicate Content
There are many unintentional reasons your website may end up with duplicate content, including:
- Faceted/filtered navigation
- Tracking parameters
- Session IDs
- HTTPS vs. HTTP, and non-www vs. www
- Case-sensitive URLs
- Trailing slashes vs. non-trailing-slashes
- Print-friendly URLs
- Mobile-friendly URLs
- AMP URLs
- Tag and category pages
- Attachment image URLs
- Paginated comments
- Search results pages
- Staging environment
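Several of the causes above (tracking parameters, session IDs, HTTP vs. HTTPS, www vs. non-www, case differences, and trailing slashes) produce URLs that differ only in form. As a rough sketch, the code below normalizes URLs with Python’s standard `urllib.parse` module and groups the ones that collapse to the same address. The parameter list is a hypothetical example, and the choices to prefer HTTPS, drop `www.`, and lowercase the path are assumptions about your preferred URL format; your server may treat paths as case-sensitive.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that do not change what the page shows.
IGNORED_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid", "sessionid", "sid",
}


def normalize_url(url: str) -> str:
    """Collapse common URL variants into one assumed preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Lowercase the path and drop the trailing slash (keep "/" for the homepage).
    path = parts.path.lower().rstrip("/") or "/"
    # Remove tracking and session parameters, then sort what remains.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(("https", host, path, urlencode(sorted(query)), ""))


def group_duplicates(urls):
    """Return normalized URLs that more than one input URL maps to."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Running `group_duplicates` over a crawl export or your sitemap gives you a shortlist of URL variants to consolidate with redirects or canonical tags.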
How Much Duplicate Content is Acceptable?
While it’s likely unintentional, website owners create duplicate content. Moz reports that some experts estimate up to 29% of the web is actually duplicate content! While some duplicate content may be acceptable, when blog articles repeat the same information multiple times, you run the risk of keyword cannibalization.
What is Keyword Cannibalization?
Keyword cannibalization occurs when several blog posts on your site can each rank for the same search term in Google. It is a common duplicate content SEO issue that arises when blocks of content are repeated within a post or when you’ve already optimized another article for the same keyword.
Posts and articles optimized for similar keywords compete with each other for search engine visibility. Usually, Google displays only one or two results from the same site for any given query. However, if yours is an authoritative domain, you might get three.
When you have cannibalized content, your own URLs compete in search queries for first-page positions. For example, this could be the difference between one link in the 5th or 6th position and two links in the 21st and 22nd positions. Which would you prefer?
You can avoid keyword cannibalization by using a duplicate content checker and by ensuring that each type of content you publish uses SEO best practices for quality content. Using a topic-based search strategy and organizing your blog posts into topic clusters is a great way to prevent cannibalizing your content.
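If you keep a list of your posts and the keyword each one targets, a simple check can surface cannibalization before you publish. The sketch below groups posts by target keyword and reports any keyword claimed by more than one URL. The function name and the sample posts are hypothetical, and it only catches identical keywords (ignoring case and extra spaces), not closely related ones.

```python
from collections import defaultdict


def find_cannibalized_keywords(posts):
    """Map each target keyword to its URLs, keeping only keywords with conflicts.

    `posts` is an iterable of (url, target_keyword) pairs.
    """
    urls_by_keyword = defaultdict(list)
    for url, keyword in posts:
        # Normalize case and whitespace so trivial differences still match.
        normalized = " ".join(keyword.lower().split())
        urls_by_keyword[normalized].append(url)
    return {kw: urls for kw, urls in urls_by_keyword.items() if len(urls) > 1}


# Made-up content inventory for illustration.
posts = [
    ("/blog/duplicate-content-seo", "Duplicate Content"),
    ("/blog/fix-duplicate-content", "duplicate  content"),
    ("/blog/topic-clusters", "topic clusters"),
]
conflicts = find_cannibalized_keywords(posts)
```

Any keyword that appears in the result is a candidate for merging the posts or re-optimizing one of them for a different term.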
Do You Need Help Updating or Removing Duplicate Content?
Removing duplicate content is part of an SEO-focused content strategy. One way to fix duplicate content SEO issues is to update competing blog posts and optimize them for different keywords. Check out our Guide to Updating Old Blog Posts to learn how to update cannibalized content.
If you need more help, the content team at SMA Marketing has a comprehensive strategy for identifying duplicate content. We consider each URL independently, taking a holistic approach to update, optimize, and remove content. Give us a call!