Duplicate and near-duplicate pages often run into indexing issues and perform poorly in search, but the rel="canonical" attribute can prevent that. Using canonicals correctly is one of the most basic operations in technical website optimization. Be careful, though: if this attribute is misused, it will not solve your ranking problems and may even aggravate them.
In this article, our team will tell you all about the canonical tag: what it is, when you do (and don’t) need to use it, and why it may boost your website traffic.
What is rel canonical and what is it for?
The canonical is an attribute that tells search engines which page is the preferred one among duplicates. It is used when website pages have duplicate or similar content and, as a result, compete with each other in rankings. The canonical tag lets you specify which of the duplicates should be indexed and used to evaluate your content and its quality.
If you have a single page that is accessible by multiple URLs, or different pages with similar content (for example, a page with both a mobile and a desktop version), Google sees these as duplicate versions of the same page. Google will choose one URL as the canonical version and crawl that, and all other URLs will be considered duplicate URLs and crawled less often.
Google Developers Documentation
Usually, Google respects the canonical URL you set, but not always. For Google Search, the canonical tag is not a directive but only a hint: when Google picks the canonical page, the URL you specify is weighed together with other signals.
If you do not specify which URL is the main one among duplicates, expect one of two scenarios: 1) the search engine will choose it by itself, relying on other signals; 2) the search engine will treat all similar pages as duplicates. In both cases, a negative ranking impact is likely, so we recommend not relying on Google and setting the main page yourself.
Basic rules for specifying canonicity
Identifying the potential canonical URL is quite simple. Still, there are rules you should follow to make it work properly:
- Specify absolute URLs, not relative ones.
- Go for the HTTPS version if you’ve switched to SSL.
- Make sure the letter case of the URL in the attribute exactly matches the case of the actual URL (URL paths are case-sensitive).
- Do not specify several different URLs as canonical for the same page.
- Make sure the canonical page can be crawled and indexed.
- Add only one canonical link per page, and place it in the <mark>head</mark> section.
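Several of these rules can be checked from the URL string alone. Below is a minimal Python sketch, assuming you already have the href value and, optionally, the live URL of the canonical page; the function name <mark>canonical_href_problems</mark> is hypothetical. Crawlability, indexability, and the number of tags per page need a crawl or the page HTML, so they are not covered here.

```python
from typing import List, Optional
from urllib.parse import urlsplit


def canonical_href_problems(href: str, live_url: Optional[str] = None) -> List[str]:
    """Check a canonical href against URL-level rules (hypothetical helper).

    Returns a list of problems; an empty list means no checked rule was violated.
    """
    problems = []
    parts = urlsplit(href)
    # An absolute URL needs both a scheme and a host.
    if not parts.scheme or not parts.netloc:
        problems.append("relative URL")
    elif parts.scheme != "https":
        problems.append("not HTTPS")
    # Same URL apart from letter case: paths are case-sensitive, so this is a mismatch.
    if live_url is not None and href != live_url and href.lower() == live_url.lower():
        problems.append("letter case differs from the live URL")
    return problems


print(canonical_href_problems("http://example.com/"))  # ['not HTTPS']
```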
How to specify a canonical address
There are various ways to specify a canonical address. Google’s documentation lists five main canonicalization methods:
- rel=canonical <mark>link</mark> tag;
- rel=canonical HTTP header;
- Sitemap file;
- 301 redirect;
- AMP version of the website.
The rel=canonical <mark>link</mark> tag
To prevent duplicate content issues, you can use a rel=canonical link tag. It is the easiest and best-known way to specify the canonical address of a page, and it is just a fragment of HTML code. Simply add a <mark>link</mark> tag to the <mark>head</mark> of the duplicate page, pointing to the main version of the page.
Code example: <mark>link rel="canonical" href="https://example.com/"</mark>
The syntax is simple and clear:
- link rel="canonical" indicates that this page has a canonical version.
- href="https://example.com/" specifies the URL of the canonical version.
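If your pages are rendered by a script or template, the tag can be generated rather than typed by hand. A minimal Python sketch, assuming the canonical URL is already known; <mark>canonical_link_tag</mark> is a hypothetical helper name. HTML-escaping the URL keeps query strings with <mark>&</mark> valid inside the attribute.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link> element that goes into the <head> of a duplicate page."""
    # quote=True also escapes double quotes; "&" becomes "&amp;" so the attribute stays valid HTML.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


print(canonical_link_tag("https://example.com/"))
# <link rel="canonical" href="https://example.com/">
```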
Pros of this method:
- You can mark any number of pages.
- Easy to write even with basic knowledge of HTML.
- Many popular SEO-friendly CMSs have dedicated fields for canonical URLs or plugins that add them, such as Yoast SEO for WordPress.
Cons of this method:
- Increases the page code size.
- Marking up every page on a large site can take a while.
- Only suitable for HTML documents.
The rel=canonical HTTP header
This option is typically used for documents where a link rel="canonical" tag cannot be added, e.g., PDF documents, because they don’t have a <mark>head</mark> section. However, this method of marking canonical pages also works for ordinary HTML documents.
To set it on an Apache server, open the .htaccess file and add a Link header with the <mark>Header</mark> directive (this requires the mod_headers module). The server then sends a response header of the form <mark>Link: <https://example.com/document.pdf>; rel="canonical"</mark>, where the URL is the canonical address of the document.
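The same header can be sent from application code instead of .htaccess. A minimal Python sketch using a plain WSGI application, swapped in here for the Apache setup; the app name <mark>pdf_app</mark> and the URL <mark>https://example.com/white-paper.pdf</mark> are hypothetical.

```python
def canonical_link_header(canonical_url: str) -> tuple[str, str]:
    """Return the (name, value) pair of a Link header declaring the canonical URL."""
    return ("Link", f'<{canonical_url}>; rel="canonical"')


def pdf_app(environ, start_response):
    """Hypothetical WSGI app serving a PDF together with its canonical Link header."""
    headers = [
        ("Content-Type", "application/pdf"),
        canonical_link_header("https://example.com/white-paper.pdf"),  # example URL
    ]
    start_response("200 OK", headers)
    return [b"%PDF-1.4 (placeholder body)"]
```

For a quick local check, <mark>wsgiref.simple_server.make_server("", 8000, pdf_app).serve_forever()</mark> serves it, and the header then appears in the response of any request.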
Pros of this method:
- You can mark any number of pages.
- Suitable for all documents supported by Google Search.
- Won’t increase the page size.
- You can create an automation rule for canonical tags (but only if the site has a clear structure of URLs).
Cons of this method:
- May require access to the server settings.
- Currently, Google supports it for web searches only.
- Marking up every page on a large site can take a while.
Sitemap file
A sitemap is a file where you provide information about your site’s pages to help search engines crawl it more efficiently. Google treats the URLs listed in sitemap.xml as suggested canonicals, so add every canonical page to the file and leave the duplicates out. If you skip this step, Google will choose the canonical version by itself, which might not end well.
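A sitemap that lists only canonical URLs can be generated from the same list you use for your canonical tags. A minimal Python sketch with the standard <mark>xml.etree.ElementTree</mark> module, assuming you already have that list; <mark>build_sitemap</mark> is a hypothetical name, and optional fields such as <mark>lastmod</mark> are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_urls: list[str]) -> str:
    """Return sitemap.xml content listing each canonical URL once, in input order."""
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." instead of an ns0: prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in dict.fromkeys(canonical_urls):  # drops repeated URLs, keeps order
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```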
Pros of this method:
- Simple setup and update.
- Perfect for large sites.
- Page size remains the same.
Cons of this method:
- There is no guarantee that the URLs specified in the Sitemap will be considered canonical in all cases.
- Less important for Googlebot than the rel=canonical attribute.
- Google still has to identify, on its own, the duplicates of every canonical URL you list in the sitemap.
301 redirect
If you want to specify a canonical page and retire its duplicate versions permanently, this method is the best for you. A 301 redirect clearly shows the bot that the URL the redirect points to is the more important and relevant one.
A 301 redirect means that the page has permanently moved to a different address (the redirect target). To configure it, open your hosting or server settings, choose the canonical version, and redirect all other versions to it.
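On the application side, a 301 is just a status line plus a <mark>Location</mark> header. A minimal Python WSGI sketch, assuming a hand-maintained map of duplicate paths; the <mark>REDIRECTS</mark> entries and <mark>redirect_app</mark> are hypothetical, and on most hosts you would configure the same thing in the server instead.

```python
# Hypothetical mapping: duplicate path -> absolute canonical URL.
REDIRECTS = {
    "/index.html": "https://example.com/",
    "/shirts/": "https://example.com/shirts",
}


def redirect_app(environ, start_response):
    """WSGI app that 301-redirects known duplicate paths to their canonical URL."""
    target = REDIRECTS.get(environ.get("PATH_INFO", ""))
    if target is not None:
        # Permanent redirect: the duplicate is retired in favor of the canonical URL.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical page content"]
```

A dictionary lookup keeps every duplicate-to-canonical decision explicit and reviewable; pattern-based rules scale better but are easier to get wrong.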
Pros of this method:
- Allows you to get rid of outdated and irrelevant copies.
Cons of this method:
- Duplicate pages stop being available, so you cannot keep them for users or future indexing.
- Requires access to the server settings.
Canonical tags in practice
Although canonicals are mainly used to keep similar or duplicate content out of search results, there are a few more cases when specifying a canonical page is important for your website. The following sections describe them, along with the extra benefits you can get from canonical tags.
As a rule of good digital manners
Today, using the canonical attribute has become a part of Internet etiquette. Even if you have a small site with an unlikely chance of duplication, we recommend specifying a canonical page to prevent potential problems.
In this case, add a self-referencing canonical tag (one whose href is the page’s own clean URL) to every main page. Then, if copies of the page with additional URL parameters appear, they carry the same tag pointing to the main URL, so they are much less likely to be indexed and divert traffic away from it.
Display and sorting options
The most common use of canonical is to point sorting and display variants of a page, which add GET parameters and other URL modifications, to the main page. You can often find it on the websites of online stores and aggregators; for instance, look at how it works on etsy.com.
On the Men's Shirts & Tees page, we can see many filters and sorting options: by price, popularity, reviews, and so on.
Each of these options adds a GET parameter to the URL.
For example, if we choose to sort by Lowest Price, we will get the following URL:
https://www.etsy.com/c/clothing/mens-clothing/shirts-and-tees?explicit=1&category_landing_page=1&order=price_asc
The number of such URLs can grow almost without limit as sorting and filtering parameters combine. Since they display the same information as the main page (a list of products), the search engine sees them as duplicates of it.
If all these pages are indexed and appear in search, we will get severe internal competition for ranking. That’s why we need the canonical tag. If such pages are essential for users but pointless for promotion, you need to specify the main page as canonical.
Technically, you should add <mark>link rel="canonical" href="https://www.etsy.com/c/clothing/mens-clothing/shirts-and-tees"</mark> to the <mark>head</mark> of the sorted page. This tells search engines that the main version of the document, the one that should appear in search, is the URL without GET parameters.
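Deriving that canonical URL is a matter of dropping the query string. A minimal Python sketch with <mark>urllib.parse</mark>, assuming every GET parameter on the listing page is a sort or display option; <mark>listing_canonical</mark> is a hypothetical name. The same function yields the self-referencing canonical for the main page itself.

```python
from urllib.parse import urlsplit, urlunsplit


def listing_canonical(url: str) -> str:
    """Canonical URL for a sort/display variant: the same address without GET parameters."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; drop the query string and any #fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


print(listing_canonical(
    "https://www.etsy.com/c/clothing/mens-clothing/shirts-and-tees"
    "?explicit=1&category_landing_page=1&order=price_asc"
))
# https://www.etsy.com/c/clothing/mens-clothing/shirts-and-tees
```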
Unoptimized filter pages
This case is similar to the previous one, but here you have more room for maneuver. If a specific search query matches a filter, the filter page can be optimized and promoted separately, and then it should not be canonicalized to the main page.
However, sometimes a filter page cannot be optimized for any keyword cluster, especially when it is generated by combining several filters. Such pages should point to the main page as canonical.
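One way to encode this decision is to keep an explicit list of filter combinations you have optimized. A minimal Python sketch under that assumption; the <mark>OPTIMIZED_FILTER_SETS</mark> contents and <mark>filter_page_canonical</mark> are hypothetical, and a URL that mixes a filter with sort parameters would simply fall back to the main page.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit

# Hypothetical: filter combinations that have their own optimized landing pages.
OPTIMIZED_FILTER_SETS = {
    frozenset({("color", "black")}),
    frozenset({("material", "linen")}),
}


def filter_page_canonical(url: str) -> str:
    parts = urlsplit(url)
    filters = frozenset(parse_qsl(parts.query))
    if filters in OPTIMIZED_FILTER_SETS:
        # Optimized filter page: self-referencing canonical (fragment dropped).
        return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    # Unoptimized page (e.g., several filters combined): point to the main category page.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```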
UTM and tracking parameters
To track referral sources, for example how many users came to your site from a newsletter or a Facebook post, you can add special parameters (such as UTM tags) to the URL.
Google Analytics shows which tagged URLs lead visitors to your site.
Although these parameters only serve to collect analytics data, the tagged URLs duplicate the content of the original page. Since you are not the only one who can create such tags, a self-referencing canonical tag on the original page is a great solution: every tagged variant then points back to the clean URL.
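When generating a self-referencing canonical, you may want to strip only tracking parameters while keeping meaningful ones such as a search query. A minimal Python sketch; treating every <mark>utm_*</mark> key plus <mark>fbclid</mark> and <mark>gclid</mark> as tracking parameters is an assumption, so adjust the list to your traffic. Note that <mark>urlencode</mark> may re-encode the remaining values (e.g., a space becomes <mark>+</mark>).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of tracking parameters besides the utm_* family.
TRACKING_PARAMS = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Return the URL without tracking parameters and without a #fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(strip_tracking("https://example.com/blog/post?utm_source=newsletter&utm_medium=email"))
# https://example.com/blog/post
```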
Common mistakes of canonicalization
Canonicalization has some pitfalls, so you’d better not take it lightly. A careless setup can impair your site’s ranking.
Let’s look at the most common errors and how to avoid or correct them.
1. Using a canonical tag for pages with different content
Some webmasters mistakenly canonicalize pages to others with different content, hoping to boost their SEO. This doesn’t work: Google recommends canonicals for one page available at several addresses or for several pages with similar content.
A typical example is pointing the canonical of an out-of-stock product page to a category page, or vice versa. Since the content of such pages is significantly different, search engines can simply ignore the attribute and display both pages in the search results.
2. Blocking non-canonical addresses using robots.txt
Blocking a page in the robots.txt file prevents it from being crawled. Consequently, the bot never sees the canonical tag specified on that page. If you’re interested in learning more about robots.txt, be sure to check our other article once you’re done with this one.
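Before relying on a canonical tag, you can check whether your robots.txt would stop the bot from ever reading it. A minimal Python sketch with the standard <mark>urllib.robotparser</mark> module, assuming you pass the robots.txt text and a list of duplicate URLs yourself; <mark>blocked_duplicates</mark> is a hypothetical name. The standard parser implements simple prefix rules and may not interpret every wildcard exactly as Googlebot does.

```python
from urllib.robotparser import RobotFileParser


def blocked_duplicates(robots_txt: str, duplicate_urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the duplicate URLs that robots.txt blocks, so their canonical tags go unseen."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in duplicate_urls if not parser.can_fetch(agent, url)]
```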
3. Blocking non-canonical URLs using noindex tag
Canonical and noindex tags contradict each other, so when you combine both of them on the same URL, Google usually prefers rel=canonical. If you don’t need to index your URL, use the noindex tag and forget about rel=canonical. But if you don’t need to index the page and, at the same time, want to specify the canonical one, use only rel=canonical or 301 redirect.
4. Ignoring 4xx server response code for non-canonical documents
If a non-canonical document returns a 4xx code, the canonical tag on it is lost, just as in the previous cases. Search engine bots do not process the content of pages that return a 4xx error, so they never read the tag.
5. Specifying the first page as the canonical for all subsequent pagination pages
Since the first page and pagination pages have different content, you shouldn’t specify only the first one as canonical. When you set the first page as canonical for all pages, Google may treat them as duplicates, dropping all their content, including internal links. Instead, we recommend you use self-referencing canonical tags for pagination pages.
6. Specifying multiple canonical tags for one document
Using multiple rel=canonical tags on the same page is a mistake, as Google will probably ignore all of them. This usually happens when the tag is added by several sources at once: manually, by a plugin, or by a CMS setting. In such cases, double-check the page code and make sure only one rel=canonical tag remains.
7. Specifying rel=canonical in other HTML sections (not in the <mark>head</mark>)
An essential requirement for rel=canonical to function properly is its inclusion within the <mark>head</mark> section of the HTML document. If the attribute is added to the <mark>body</mark> or another page section, it will be ignored.
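Mistakes 3, 6, and 7 can all be spotted in the page HTML. A minimal Python sketch built on the standard <mark>html.parser</mark> module; <mark>CanonicalAudit</mark> and <mark>audit_canonicals</mark> are hypothetical names. It parses the raw HTML only, so tags injected later by JavaScript are not seen.

```python
from html.parser import HTMLParser


class CanonicalAudit(HTMLParser):
    """Collects canonical links (noting whether each sits in <head>) and a robots noindex."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []  # list of (href, inside_head)
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False  # <body> implicitly closes <head>
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonicals.append((attrs.get("href"), self.in_head))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit_canonicals(html: str) -> list[str]:
    parser = CanonicalAudit()
    parser.feed(html)
    parser.close()
    issues = []
    if len(parser.canonicals) > 1:
        issues.append(f"{len(parser.canonicals)} canonical tags found")
    if any(not in_head for _, in_head in parser.canonicals):
        issues.append("canonical tag outside <head>")
    if parser.noindex and parser.canonicals:
        issues.append("noindex combined with rel=canonical")
    return issues
```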
8. Ignoring 4xx or 3xx server response code for canonical documents
A 4xx status code means that the page you specified as canonical is unavailable, and a 3xx code means that it redirects elsewhere. In both cases, search engines can’t index it as the canonical version, and it won’t show up on the search results page. This usually happens when the page specified as the main one is broken, deleted, or moved. In this case, replace the canonical address with a correct, working URL.
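If you already have crawl results, both status-code mistakes (4 and 8) can be flagged with a simple pass over them. A minimal Python sketch, assuming two hypothetical inputs: a map of page to declared canonical and a map of URL to HTTP status; <mark>status_problems</mark> is a hypothetical name, and it treats anything other than 200 on the canonical target as a problem.

```python
def status_problems(canonical_of: dict[str, str], status: dict[str, int]) -> list[str]:
    """Flag 4xx duplicates (tag unreadable) and canonical targets that don't return 200."""
    problems = []
    for page, target in canonical_of.items():
        page_code = status.get(page)
        if page_code is not None and 400 <= page_code < 500:
            problems.append(f"{page}: returns {page_code}, so its canonical tag cannot be read")
        target_code = status.get(target)
        if target_code is None:
            problems.append(f"{page}: canonical target {target} was not crawled")
        elif target_code != 200:
            problems.append(f"{page}: canonical target {target} returns {target_code}")
    return problems
```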
9. Adding duplicate pages without specifying canonicality
This warning appears if there are one or more identical or similar pages, but no canonical version is specified for them. In this case, Google will determine the main page on its own, displaying it in the search results. Naturally, it may be different from the version you want to index and promote.
To fix it, you should analyze the groups of duplicates and specify the most suitable page as canonical.
10. Specifying an incorrect canonical tag for pages with different language versions
When you use tags with the hreflang attribute, specify a canonical page for each language version. The canonical page has to be in the same language as the alternate one or, if no such page exists, in the closest available language.
11. Specifying a canonical URL which is also canonicalized to a different page
This problem crops up when page A has a canonical page B, and page B has a canonical page C. As a result, it causes a “chain of canonicity” issue.
Because of the canonical chain, search engines may misinterpret the tags or ignore them completely. Therefore, when setting a canonical URL, make sure the page you’re pointing to doesn’t have a canonical that indicates another page.
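Chains (and loops) are easy to find once you have each page’s declared canonical. A minimal Python sketch, assuming a hypothetical map of page to canonical URL collected by a crawl; pages that point to themselves count as final. <mark>find_chains</mark> returns, for each chained page, the final URL it should point to directly.

```python
def resolve_canonical(url: str, canonical_of: dict[str, str]) -> tuple[str, int]:
    """Follow canonical links from url; return (final URL, number of hops). Raises on loops."""
    seen = {url}
    hops = 0
    while url in canonical_of and canonical_of[url] != url:
        url = canonical_of[url]
        if url in seen:
            raise ValueError(f"canonical loop at {url}")
        seen.add(url)
        hops += 1
    return url, hops


def find_chains(canonical_of: dict[str, str]) -> dict[str, str]:
    """Map every page whose canonical is itself canonicalized to the final target."""
    fixes = {}
    for page in canonical_of:
        final, hops = resolve_canonical(page, canonical_of)
        if hops > 1:
            fixes[page] = final
    return fixes


print(find_chains({"/a": "/b", "/b": "/c", "/c": "/c"}))  # {'/a': '/c'}
```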
12. Specifying URL with a different protocol
When specifying a canonical page, use the same protocol as the main version of the site. If it uses HTTPS, the URL in the href attribute must start with https:// as well.
13. Pointing internal links to non-canonical URLs
Internal links should point to the main version of the document. Consequently, keep the number of internal links to canonicalized pages to a minimum to save your crawl budget. But there are some exceptions, e.g., when you deliberately link to a non-canonical version of the document to improve UX, such as showing products already sorted.
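Given a list of internal links and the canonical map from a crawl, the links to update can be listed automatically while leaving deliberate exceptions alone. A minimal Python sketch; the inputs and the name <mark>links_to_fix</mark> are hypothetical.

```python
def links_to_fix(internal_links: list[str], canonical_of: dict[str, str],
                 exceptions: frozenset = frozenset()) -> list[tuple[str, str]]:
    """Return (current link, canonical URL) pairs for links pointing at non-canonical pages."""
    fixes = []
    for link in internal_links:
        target = canonical_of.get(link, link)  # pages missing from the map are treated as canonical
        if target != link and link not in exceptions:
            fixes.append((link, target))
    return fixes
```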
Conclusion
Plenty of people still think that canonical is only a recommendation for search engines, but it can become an effective tool for managing site indexing, and you can handle it on your own. Setting up a canonical tag is not out of your depth: you just need to choose the best option for your purpose and implement it carefully.
Be sure that your efforts are worth it, as canonical tags have a strong impact on the success of your online store. Correct canonicalization improves site ranking, boosts website promotion, and limits the damage when third-party sites copy your content. It’s quite the thing to take your business to the next level.