First of all, duplicate content is not bad per se. However, it will diminish your SEO efforts after you create hundreds of pages with similar content.
In other words, Google doesn’t have a duplicate content penalty, but it has to determine which page is the “original” one, and it then lowers the value of the duplicate page or ignores it completely.
It lowers the rank position in the SERPs.
Search engines are powerful machines, and only the most relevant content appears in the first few pages of the SERPs. If the content on two pages of your website is similar or identical, Google has to decide which one is more important. Sadly, this can lead to lower rankings for both pages.
It wastes crawling bandwidth.
Googlebot crawls your site for information to determine which page is the authentic one. When there’s more than one page with similar content, the chances are Googlebot will crawl the less relevant pages instead of the one with the higher priority.
It can ruin the backlink value.
Let’s presume you have a “News” section on your website, and the same news article has two different but valid URLs. Since people can link to whichever URL they like, the backlinks won’t be distributed evenly between the two.
Most importantly, the URL with higher authority may get only a few shares, while the URL with lesser authority may be shared numerous times. Backlinks aren’t the only ranking factor, but with their value split this way, the rankings will probably slide down the SERPs. If you’re lucky, they’ll stay where they were before.
Identifying Duplicate Content Issues
For the most part, duplicate content issues occur without any human factor being involved. Nevertheless, a developer’s intervention is needed once the duplicates prevent other pages from ranking. The following are the most common reasons for duplicate content on a website.
Most sites use some kind of URL parameters, which can either help you or create more confusion. To Google, uppercase and lowercase versions of a URL count as two separate pages.
This creates multiple versions of the same URL, so consider using only lowercase URLs, which is the standard practice.
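As a rough illustration of the lowercase rule, the sketch below normalises a URL with Python's standard `urllib.parse` module. The function name `lowercase_url` and the sample URL are made up for this example, and lowercasing the path is only safe if your server does not rely on case-sensitive paths.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_url(url: str) -> str:
    """Return the URL with scheme, host, and path lowercased.

    The query string and fragment are left untouched because parameter
    values are often case-sensitive.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,
        parts.fragment,
    ))


# Hypothetical mixed-case URL based on the article's example domain.
print(lowercase_url("https://www.Website.com/HomeDecor/Pillows?Color=Red"))
# prints https://www.website.com/homedecor/pillows?Color=Red
```

In practice, a function like this would feed a redirect rule or a site audit, so that every mixed-case variant ends up pointing at the single lowercase address.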
Another URL issue can be the session ID parameter. For example, e-commerce owners keep track of the visitors with the help of sessions. A session is a short history that tells them what the user did on the page. It has its special identifier called the session ID.
Some websites fail to remove the session ID from the URL. Each time a new visitor arrives on the site, a new ID is appended to the existing URL. We’re then left with multiple URLs for the same pages. We can easily fix this by storing the session information in cookies.
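While the cookie-based fix is being rolled out, it can help to see which URLs differ only by a session ID. The sketch below strips such a parameter from the query string. The parameter names in `SESSION_PARAMS` are assumptions and should be replaced with whatever your platform actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names (compared in lowercase); adjust to your platform.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Return the URL without any session ID query parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Rebuild the URL with only the remaining parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical URL with a session ID appended by the platform.
print(strip_session_id("https://www.website.com/pillows/sale?sid=abc123&color=red"))
# prints https://www.website.com/pillows/sale?color=red
```

Two URLs that become identical after this step are the same page as far as content goes, which makes them easy to spot in a crawl export.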
The last common URL issue has to do with transient categories. In many online stores, the URL of a page changes depending on the categories the visitor clicks through. For example, a visitor wants to buy something from a home improvement store. They click on home decor > pillows > pillows on sale, and the URL looks like this: www.website.com/homedecor/pillows/sale.
If the platform allows the page to be accessed from different categories and the visitor reaches the same page from the “pillows” section, the URL looks like this: www.website.com/pillows/sale.
A developer can fix these categories within the platform you use.
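What that fix looks like depends on the platform, but the idea can be sketched in a few lines: pick one path per page and point every other path at it. Everything here is hypothetical, including the `PAGE_PATHS` data, the page identifier, and the rule of preferring the longest path. Your platform may offer a "primary category" setting that does the same job.

```python
from typing import Dict, List

# Hypothetical data: every category path under which a page is reachable,
# keyed by a page identifier invented for this example.
PAGE_PATHS: Dict[str, List[str]] = {
    "pillows-sale": ["/homedecor/pillows/sale", "/pillows/sale"],
}


def canonical_path(page_id: str) -> str:
    """Pick one path per page; here, simply the longest one (an assumed rule)."""
    return max(PAGE_PATHS[page_id], key=len)


def redirect_map() -> Dict[str, str]:
    """Map every non-canonical path to the canonical path of its page."""
    mapping = {}
    for page_id, paths in PAGE_PATHS.items():
        target = canonical_path(page_id)
        for path in paths:
            if path != target:
                mapping[path] = target
    return mapping


print(redirect_map())
# prints {'/pillows/sale': '/homedecor/pillows/sale'}
```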
WWWs and HTTPs
When there’s a WWW version of your site, but the site is accessible without it as well, every page has a duplicate. The same goes for HTTP and HTTPS, although we encounter this issue much more rarely.
The solution is to choose which version you want to go with and redirect the other version to the chosen one.
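The redirect itself is normally configured on the web server or in the platform settings, but the logic behind it is simple. The sketch below works out where a given URL should be permanently (301) redirected. It assumes HTTPS with www is the preferred version, and the constants and function name are invented for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumed preference: the article's example domain, served over HTTPS with www.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.website.com"
# Hosts that should all end up on PREFERRED_HOST.
KNOWN_HOSTS = {"website.com", "www.website.com"}


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in KNOWN_HOSTS:
        return None  # not one of our hosts; leave it alone
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already the preferred version
    return urlunsplit(parts._replace(scheme=PREFERRED_SCHEME, netloc=PREFERRED_HOST))


print(redirect_target("http://website.com/pillows/sale?color=red"))
# prints https://www.website.com/pillows/sale?color=red
```

The path and query string are carried over unchanged, so visitors and crawlers land on the same page, just under the chosen protocol and host.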
Printer Friendly Pages
If you have printer friendly pages on the website, they are practically the same as the regular ones, with different formatting. Google will find those, and again, you’ll have more duplicate content.
Fighting Duplicate Content
Most of the aforementioned issues can be solved with a canonical URL. It is the “correct” URL for a piece of content that is available at more than one address.
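A canonical URL is usually declared with a `<link rel="canonical">` element in the `<head>` of every duplicate page. As a minimal sketch, the helper below builds that element for a given address. The function name and the sample URL are illustrative only.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    # Escape the URL so characters such as & and " stay valid inside the attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


# Both /pillows/sale and /homedecor/pillows/sale would carry this same tag.
print(canonical_link_tag("https://www.website.com/homedecor/pillows/sale"))
# prints <link rel="canonical" href="https://www.website.com/homedecor/pillows/sale">
```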
The tools we can use to determine whether there’s any duplicate content on the website range from simple, free ones to tools you have to purchase.
The easiest method is to copy a piece of content and paste it, in quotation marks, into the search engine. The pages that contain the same content will be displayed in the search results.
Google Webmaster Tools (since renamed Google Search Console) can help, too. Under Search Appearance > HTML Improvements, it shows you the URLs with duplicate material.
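If you already have the text of your pages at hand, for example from a crawl export, a simple script can also flag exact duplicates. The sketch below is not how any of the tools above work. It just groups pages whose text is identical once whitespace and letter case are ignored, and the function names and sample data are invented for the example. Near-duplicates with small wording changes will not be caught.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Return groups of URLs whose page text has the same fingerprint."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical page texts keyed by URL.
pages = {
    "/homedecor/pillows/sale": "Pillows on sale.\nFree shipping!",
    "/pillows/sale": "Pillows on  sale. Free shipping!",
    "/news": "Our latest news.",
}
print(find_duplicates(pages))
# prints [['/homedecor/pillows/sale', '/pillows/sale']]
```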
To prevent making the same content on multiple pages, it’s important to:
- remove the session ID extension in the URL in the platform settings
- have only lowercase letters in the page URL
- start using print style sheets instead of printer friendly pages
- choose one protocol (HTTP or HTTPS) and one host form (with or without www) for all of your URLs
- be consistent when making new content and avoid having the same page sections across multiple pages
- use identical URLs for the same pages on internal links
- avoid blocking duplicate content pages from indexing, since Google can detect that you want to hide something from it
- link back to the canonical version of the page; make sure the anchor text sounds natural
Duplicate content issues can hurt your SEO efforts in the long run. Even though Google claims it doesn’t penalise websites that have the same content on multiple pages, it can be bad for your ranking and ruin the backlink value.
Identifying the bigger problems takes a few minutes and can be done with the help of the simplest tools. After you realise which duplicate issues you’re having the most trouble with, it’s time to start combating them.