Textpattern tips, tutorials and code snippets

Dynamic canonical URLs

Google rarely issues an explicit directive that helps websites rank better. Most of its guidelines are vague and not very constructive, but the recent announcement of support for canonical URLs is as explicit a directive as its implementation of XML sitemaps and the “nofollow” value for the rel attribute.

A complement to sitemaps and redirects, canonical URLs help search engines understand which URL should be used for any given page of content. For static websites, this is mostly irrelevant; an HTML file is an HTML file and there is no alternative. But for dynamically generated sites where multiple database query strings can lead to the same content, Google, Yahoo, Bing and others can keep their indexes clean by focusing on the correct version. While Textpattern maintains a fairly straightforward URL structure, it is still affected. For instance, all three of these links go to the same place:

  • http://graphicpush.com/rdfa-microformats-standards-big-questions
  • http://graphicpush.com/index.php?id=374
  • http://graphicpush.com/374 (using smd_short_url plugin)

Obviously, I want the GOOG to index the first URL — it is the most descriptive and SEO-friendly. To help explicitly control that, I need to implement a canonical URL link element.

The Basic Structure

We’ve all used the link tag for other purposes — importing CSS files, establishing global navigation elements, and more. This follows the same structure, but uses “canonical” as the value for the rel attribute. For example:

<link rel="canonical" href="http://graphicpush.com/rdfa-microformats-standards-big-questions" />

Now, no matter what URL Google actually follows to my page, it knows that http://graphicpush.com/rdfa-microformats-standards-big-questions is the URL that should be indexed. That’s it.
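
Because the element is metadata, it sits inside the document head alongside the title and other link elements. A minimal sketch of that placement follows; the title text and stylesheet path are placeholders for illustration, not taken from the live site.

```html
<head>
   <!-- Placeholder title and stylesheet path, for illustration only -->
   <title>Example article title</title>
   <link rel="stylesheet" href="/css/screen.css" />
   <!-- The one URL search engines should index for this page -->
   <link rel="canonical" href="http://graphicpush.com/rdfa-microformats-standards-big-questions" />
</head>
```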

Making It Dynamic

Because we’re dictating what URL should be used, the <txp:permlink /> tag will work perfectly for individual articles, because it generates a URL based on the Permanent link mode setting in Textpattern’s preferences, not what URL was used to access the page. This would be our code:

<link rel="canonical" href="<txp:permlink />" />

Yes, it’s that simple.

Sections and Categories

The above tag works great for individual articles, but section and category pages require manual URL reconstruction. (Keep in mind the <txp:site_url /> tag automatically creates a trailing slash so there’s no need to add one directly after.)

Sections in Messy Mode

<link rel="canonical" href="<txp:site_url />index.php?s=<txp:section />" />

Sections in Clean URL Mode

<link rel="canonical" href="<txp:site_url /><txp:section />" />

Categories in Messy Mode

<link rel="canonical" href="<txp:site_url />index.php?c=<txp:category />" />

Categories in Clean URL Mode

<link rel="canonical" href="<txp:site_url />category/<txp:category />" />
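
To make the four patterns concrete, here is roughly what each would render to, assuming the site URL is http://graphicpush.com/ and using a hypothetical section named articles and a hypothetical category named design (neither name comes from the real site). A given site would use only the messy pair or the clean pair, depending on its permanent link mode.

```html
<!-- Section, messy mode (hypothetical section "articles") -->
<link rel="canonical" href="http://graphicpush.com/index.php?s=articles" />

<!-- Section, clean URL mode -->
<link rel="canonical" href="http://graphicpush.com/articles" />

<!-- Category, messy mode (hypothetical category "design") -->
<link rel="canonical" href="http://graphicpush.com/index.php?c=design" />

<!-- Category, clean URL mode -->
<link rel="canonical" href="http://graphicpush.com/category/design" />
```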

Wrapping It Together

The canonical URL link tag belongs in the metadata, so I suggest integrating it with whatever model you are using for custom titles and descriptions. This usually involves conditional tags. For instance, if you were running a simple navigation structure with sections and articles, this could be used in a page template:

<txp:if_individual_article>
   <link rel="canonical" href="<txp:permlink />" />
<txp:else />
   <link rel="canonical" href="<txp:site_url /><txp:section />" />
</txp:if_individual_article>
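
If the site also exposes category listings, the same template can be extended with one more conditional. The following is only a sketch, assuming clean URL mode and the category URL pattern shown earlier; it uses Textpattern's if_category conditional to detect a category listing and falls back to the section URL otherwise.

```html
<txp:if_individual_article>
   <!-- Single article: let Textpattern build the permanent link -->
   <link rel="canonical" href="<txp:permlink />" />
<txp:else />
   <txp:if_category>
      <!-- Category listing: rebuild the clean category URL by hand -->
      <link rel="canonical" href="<txp:site_url />category/<txp:category />" />
   <txp:else />
      <!-- Anything else is treated as a section listing -->
      <link rel="canonical" href="<txp:site_url /><txp:section />" />
   </txp:if_category>
</txp:if_individual_article>
```

Testing for the category before the section matters here, because a category listing still reports a current section and would otherwise fall through to the section URL.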

The conditionals can get complicated quickly for URL scenarios that are not native to Textpattern, especially ones that require manual URL construction. For instance, a site using the tru_tags plugin will require additional work for the actual tag pages:

<txp:if_section name="tag">
   <link rel="canonical" href="<txp:site_url />tag/<txp:tru_tags_tag_parameter />" />
</txp:if_section>

Also, the plugin gbp_permanent_links does not always play nice with <txp:permlink />, so manual URL reconstruction may be necessary. For instance, something like site.com/section/category/article-title is not “out of the box” functionality, so the canonical link tag might look like this:

<link rel="canonical" href="<txp:site_url /><txp:section />/<txp:category />/<txp:article_url_title />" />

Worth the Effort?

Yes. History has routinely shown us that by the time Google publicly unveils a new technological aspect of its ranking algorithm, it has already been in practice for some time, and web developers are wise to adhere to the recommended best practice. Yes, it’s one more tag to worry about, but its simplicity makes it almost effortless to integrate into any Textpattern environment.

10 Comments

I’ve implemented this but made a minor tweak for the default section.

<txp:if_individual_article>
   <link rel="canonical" href="<txp:permlink />" />
<txp:else />
   <txp:if_section name="">
      <link rel="canonical" href="<txp:site_url />" />
   <txp:else />
      <link rel="canonical" href="<txp:site_url /><txp:section />" />
   </txp:if_section>
</txp:if_individual_article>

This is really helpful. I wasn’t even aware of canonical URLs. Thanks for the tip.

Very helpful. Thanks, Kevin and Matt.

I’ve used a variation of this technique which avoids the conditional tags at the cost of a dash of php:

<txp:php>echo rtrim(site_url(array()), '/');</txp:php><txp:page_url />

If the site_url or page_url tag offered an attribute to strip the trailing or leading slash, we could make do with a simple <txp:site_url /><txp:page_url />, but that gives a double slash after the domain name, so the rtrim is necessary.

Still, it’s a bit easier than the nested tags, and it works for the front page, section pages and permalinks alike.

This is really useful for building a proper SEO/SERP website. Thank you for taking the time to write this awesome post!

Question: How do I get rid of /default? My home page now has http://www.mischavanderspek.com/default as its canonical URL, which isn’t very nice.

Thank you,
Mischa

Mischa, have you looked at MattD’s comment above? He has this line of code which should do what you want:

<txp:if_section name="">
   <link rel="canonical" href="<txp:site_url />" />
</txp:if_section>
  • Will Atkinson
  • 25 October 2011

Like Jonathan, I am unsure how to implement canonicals using PHP on a site where every page includes the same single header (which is shared through sessions).

I wish txp:site_url didn’t automatically create a trailing slash, as that prevents combining it with txp:page_url to create a full URL to the page for use in things like Twitter buttons.

Please ignore my previous comment; of course I can use txp:permlink.
