Google, Yahoo! and Microsoft announced a joint agreement today at the SMX West conference in Santa Clara to support a new protocol which is intended to assist webmasters in reducing duplicate content issues on websites. All three are issuing blog postings about this, and Matt Cutts presented the new protocol in a session just a few minutes ago at SMX.
This is a really exciting addition to the SEO’s toolbox! Duplicate content often occurs when webmasters accidentally create alternate URLs for the same content across their sites. The larger the site, the more likely it is to have serious duplication issues. This was one of the most difficult issues I worked on when I was in charge of SEO for Superpages.com: nearly any site that uses dynamic URLs with querystrings to specify how content is delivered ends up with some level of duplication.
Here are just a few examples of duplicate URLs:
The solution the search engines collaborated on for canonical and duplicate content issues is very straightforward. One adds a link tag like the following within the HEAD tags of a document:
<link rel="canonical" href="http://example.com/page.html"/>
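For context, here is a minimal sketch of where the tag sits in a page. The duplicate URL, its `sessionid` and `sort` parameters, and the page title are made up for illustration. Note that the link element takes the preferred URL in its `href` attribute.

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Hypothetical duplicate, served at:
       http://example.com/page.html?sessionid=123&sort=price -->
  <title>Example page</title>
  <!-- Hint to search engines that the preferred URL for this content is page.html -->
  <link rel="canonical" href="http://example.com/page.html"/>
</head>
<body>
  <p>Same content as http://example.com/page.html</p>
</body>
</html>
```

The same tag would go on every alternate URL that serves this content, each one naming the single preferred URL.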
Matt provided a number of caveats and advance clarifications about use of the tag:
- It’s a hint to the search engines. Not a directive/mandate/requirement.
- Far better to avoid dupes and normalize URLs in the first place.
- If you’re a power user, exhaust alternatives first.
- Does not work across domains.
- DOES work across subdomains.
(The example Matt gave was from Zappos’ new design subdomain: zeta.zappos.com vs. http://www.zappos.com)
- Pages do not have to be identical.
- Can one use relative / absolute URLs? Yes, but we suggest absolute!
- Can you follow a chain of canonicals? We may, but don’t count on it.
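A rough sketch of how a few of these points might look together: a canonical that crosses subdomains, uses an absolute URL, and points straight at the final preferred page rather than relying on a chain. The `beta.example.com` and `www.example.com` hosts and the `/shoes/` path are placeholders, not the Zappos URLs Matt showed.

```html
<!-- Hypothetical page served at http://beta.example.com/shoes/?view=grid -->
<head>
  <title>Shoes</title>
  <!-- Absolute URL, same domain but a different subdomain.
       It names the final preferred URL directly, so no engine has to
       follow a chain of canonicals to find it. -->
  <link rel="canonical" href="http://www.example.com/shoes/"/>
</head>
```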
Matt added a further disclaimer about how search engines may not be able to handle some extreme cases, so don’t push the envelope too much. He said:
- Point to a 404?
- Or create an infinite loop?
- Or point to an uncrawled URL?
- Or www/non-www conflict?
- Search engines will do the best they can.
Then, he jokingly quoted Ghostbusters in this context: “Don’t cross the beams!”
This whole protocol is really interesting and a great tool for webmasters to use. However, given the caveats and the strong suggestion that webmasters try to fix duplicate content issues before resorting to the canonical tag, I would prefer to solve such problems at the source rather than rely on it. It’s good to have the option, though!
Here are the top announcement articles about this Canonical Tag protocol:
- Google Blog: Specify Your Canonical
- Yahoo! Search Blog: Fighting Duplication: Adding more arrows to your quiver
- Live Search Blog: Partnering to help solve duplicate content issues
- Search Engine Land: Google, Yahoo & Microsoft Unite On “Canonical Tag” To Reduce Duplicate Content Clutter
Interesting idea, Chris! Given the trouble that the search engines have had in telling the difference between variant versions of a given page, it is good that the site’s publisher will now be able to provide some hints.
Instead of the Absolute Link proposal that Matt Cutts suggested, I would recommend using domain relative links in conjunction with a BASE HREF tag. The domain relative link structure is much better for testing pages on a development or staging server and makes content migration easier. Adding a BASE HREF tag should allow the domain relative links to be consistently resolved to the correct domain.
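A minimal sketch of that suggestion, assuming the search engines resolve a relative canonical against the BASE HREF the way browsers resolve other relative links (the announcement does not confirm this). The host name and path are placeholders.

```html
<head>
  <!-- Set once per environment (development, staging, production) -->
  <base href="http://www.example.com/"/>
  <!-- Domain relative canonical; under normal URL resolution this
       becomes http://www.example.com/page.html -->
  <link rel="canonical" href="/page.html"/>
</head>
```

Keep in mind that a BASE HREF changes how every relative URL in the page resolves, not just the canonical link.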