How to Find and Eliminate Duplicate Content On Your Website
Duplicate content is a major burden for thousands of sites online. Just a few instances can cause Google to rank your site lower in search results, and your rankings are unlikely to recover until the duplication is addressed. Duplicate content can also interfere with your user experience, leaving your site visitors feeling that your site is more fluff than substance. The worst part is that many of these sites suffer from duplicate content issues without ever knowing the issues exist.
Even if you’ve taken pains to ensure that all of the content on your site is original, it’s possible that your site is being flagged for duplicate content by Google. Because of this, it’s necessary to periodically check your site for any possible duplicate content issues and address them as proactively as possible.
Types of Duplicate Content
Duplicate content comes in many forms, some far easier to spot than others. Any of them can harm your search visibility, so keep an eye out for all of them when you scan your site for potential issues.
1. Straight Plagiarism
Straightforward plagiarism is what most people think of when they think of duplicate content. This occurs when a site takes content from another site and pastes it into its own pages. This is the most egregious type of content duplication and often occurs when “scraper sites” use scripts to automatically republish content from major media publications.
2. Search Engine Manipulation
Another type of malicious content duplication is designed to manipulate search engines. In this practice, a site copies all or part of its own existing content onto additional pages, giving search engines more pages to index without the extra effort it takes to create something original.
3. Duplicate Title Tags or Descriptions
Duplicate title tags and meta descriptions are far more common, and far more forgivable. Since they aren’t highly visible to the user, they don’t interfere with user experience, and since they are commonly duplicated by accident, they don’t harm your search visibility as much. Still, duplicated phrases in your title tags and meta descriptions can accumulate and work against you.
4. Multiple URLs for One Page
It’s also common to have one page accidentally reachable through two separate URLs that aren’t properly canonicalized. Under this setup, Google sees two different pages with identical content, which can harm your search visibility.
Tracking Down Duplicate Content
There are a few ways to find the duplicate content on your site. Use a tool like Siteliner or Screaming Frog to automatically scan your website and report duplicate content. Or, log into Google Webmaster Tools (since renamed Google Search Console) and review what Google found when it crawled your site. Under “Search Appearance,” head to “HTML Improvements,” where you’ll be able to view and download a full list of duplicate title tags and meta descriptions from your site.
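If you would rather check title tags and meta descriptions yourself, the idea can be sketched in a few lines of Python. This is a minimal illustration, not how Siteliner, Screaming Frog, or Google do it. It assumes you have already saved each page’s HTML into a dictionary keyed by URL; the names HeadTagParser and find_duplicates are hypothetical, invented for this example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collects the <title> text and the meta description of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def find_duplicates(pages):
    """Return the titles and descriptions shared by more than one URL.

    `pages` is assumed to be a dict of {url: html_source}.
    """
    by_title = defaultdict(list)
    by_description = defaultdict(list)
    for url, html in pages.items():
        parser = HeadTagParser()
        parser.feed(html)
        # Collapse whitespace and ignore case so trivial variations still match.
        title = " ".join(parser.title.split()).lower()
        description = " ".join(parser.description.split()).lower()
        if title:
            by_title[title].append(url)
        if description:
            by_description[description].append(url)
    return {
        "titles": {t: u for t, u in by_title.items() if len(u) > 1},
        "descriptions": {d: u for d, u in by_description.items() if len(u) > 1},
    }
```

Calling find_duplicates(pages) returns two dictionaries, one for titles and one for descriptions, each mapping a repeated phrase to the list of URLs that use it. Exact matching after whitespace and case cleanup is a deliberate simplification; it will not catch titles that differ by only a word or two.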
Eliminating the Problem
Fortunately, Google is forgiving when it comes to duplicate content. If you’re caught with only a handful of instances of duplicate content on your site, Google will likely understand that you aren’t necessarily trying to manipulate your rankings or deceive your visitors. You may still face a dip in your search visibility, but you won’t see your rankings completely tank. Once you correct the duplicate content on your site, your search visibility should eventually recover.
1. Rewrite Where Possible
This is the simplest way to take care of the problem, but it’s also very time-consuming. Unfortunately, in many cases, such as duplicated meta descriptions, it is the only way to fix the problem. Either eliminate the text entirely or rewrite it from scratch to be an original section of content.
2. Restructure Your URLs
Duplicate content errors that are the result of Google seeing one page as multiple pages can only be fixed by clearing up your URL structures. For example, Google sees thisisyoursite.com/ and thisisyoursite.com/?sessionid=111 as two different pages, even though your users only see them as one. Choose one consistent format for all your URLs, and stick with it.
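To see which of your URLs collapse into the same page, you can normalize each one and group the results. The sketch below is only an illustration under stated assumptions: the list of ignorable parameters (IGNORED_PARAMS) is a guess that you would need to adapt to your own site, the function names are hypothetical, and the example URLs add an http:// prefix to the addresses used above.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query parameters that never change what the page displays.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url):
    """Reduce a URL to one consistent format so duplicates compare equal."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    path = parts.path or "/"  # treat "site.com" and "site.com/" as the same
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    query = urlencode(sorted(kept))  # parameter order should not matter
    return urlunsplit((scheme, host, path, query, ""))  # drop the #fragment


def group_duplicate_urls(urls):
    """Return {normalized_url: [original urls]} for groups with 2+ members."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

For example, normalize_url("http://thisisyoursite.com/?sessionid=111") and normalize_url("http://thisisyoursite.com/") both give the same result, so group_duplicate_urls would report them as one group. Any group it returns is a candidate for cleanup.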
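Once done, you’ll probably still have a handful of duplicate URL issues. You can fix most of these through canonicalization. Essentially, you’ll be using canonical tags to tell Google which URL is the preferred version of a page, so that the duplicates are consolidated under it instead of competing with it. 301 redirects can also be helpful here, since they permanently send both visitors and search engines from a duplicate URL to the preferred one.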
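A canonical tag is a link element placed in a page’s head, in the form <link rel="canonical" href="...">, with the preferred URL as the href. To confirm that your pages actually carry one, a rough check like the following can help. It is a sketch only: it assumes you have each page’s HTML in a dictionary keyed by URL, and the names CanonicalParser and canonical_report are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attrs.get("rel") or "").lower().split()
        if "canonical" in rel_values and attrs.get("href"):
            self.canonicals.append(attrs["href"].strip())


def canonical_report(pages):
    """Map each URL to its canonical href, or to "missing" / "conflicting".

    `pages` is assumed to be a dict of {url: html_source}.
    """
    report = {}
    for url, html in pages.items():
        parser = CanonicalParser()
        parser.feed(html)
        found = parser.canonicals
        if not found:
            report[url] = "missing"
        elif len(set(found)) > 1:
            report[url] = "conflicting"
        else:
            report[url] = found[0]
    return report
```

In the report, every URL that shows the same content should map to the same canonical address. Pages marked "missing" or "conflicting" are the ones to look at first.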
Once your duplicate content issues are fixed, don’t be surprised if it takes a while for your search visibility to be restored. Google sometimes takes days or weeks to re-scan your site for content, so it may be two weeks or more before it notices that your duplicate content issues have been fully corrected. Even after your initial scan is complete, it’s a best practice to check your site on a monthly basis to see if any other duplicate content issues have arisen. Stay sharp and take action immediately to minimize the potential consequences.