First things first: what is a sitemap? A sitemap is an XML document that tells Google where to find the pages on your website, so it doesn't have to discover them all on its own. It can be generated by a crawler that works much like Googlebot: it follows all the links on all your pages and records every URL it finds. Then all Google has to do is come in, get the content and get out. Sitemaps are a good idea and should be submitted through your Google Webmaster Tools account or, failing that, announced in your robots.txt. The robots.txt syntax for this is a bit further down.
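To make the "XML document" part a bit more concrete, here is a minimal Python sketch that writes a sitemap from a list of URLs you already have. It skips the crawling step entirely, and the function name build_sitemap and the example.com addresses are made up for illustration. The only fixed pieces are the urlset/url/loc elements and the namespace from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing the given page URLs.

    `urls` is assumed to be an iterable of absolute URLs you collected
    some other way (a crawler, a database query, and so on).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    print(build_sitemap([
        "http://example.com/",
        "http://example.com/about/",
    ]))
```

Each url entry can also carry optional lastmod, changefreq and priority children, which is what the online generators below fill in for you.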

One of my goals is to keep the sitemaps for this blog up to date so Google can crawl it often and easily and help users find info around here. So I've tried a few services and programs. The first method was free online sitemap generators. Google was a helpful bastard here and pointed me to http://www.freesitemapgenerator.com/, which is a nice little tool. However, it only crawls about 5000 URLs and has difficulty following some auto-generated links (it adds the smarty prefix twice in some cases). So if you double-check its results, there's something usable here. The downside is that some sites have more than 5000 URLs (especially database-driven and tag-based ones), and the bigger downside is that you get the result by email, and since free servers are usually under heavy load, it takes a while to arrive.
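Double-checking a generated sitemap by hand gets old fast, so here is a rough Python sketch of one way to automate it. It rests on an assumption of mine: that the doubled prefix shows up as the same path segment appearing twice in a row (something like /blog/blog/post). The function name suspicious_urls is hypothetical, and your generator's mistakes may look different.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace prefix mapping for the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def suspicious_urls(sitemap_xml):
    """Return URLs whose path repeats the same segment twice in a row.

    This is only a heuristic: a repeated segment is assumed to mean
    the generator added a prefix twice.
    """
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Compare each segment with the one right after it.
        if any(a == b for a, b in zip(segments, segments[1:])):
            flagged.append(url)
    return flagged
```

Pass it the contents of the sitemap file (read in binary mode) and look over whatever comes back before uploading anything.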

The second hit is http://www.xml-sitemaps.com/, which needs no account and lets you tell Google that all your pages have been modified just now, as well as set priority info. It also lets you download a tarballed version of the sitemap as well as an HTML version. It's pretty nice, but I have no idea what its limitations are.

I have also tried a program called Google Sitemap Generator, which was a bit disappointing since it failed me on a more complex website. But where the free web services fail, this one can help a bit.

Also, for WordPress there's a nice little plugin called xml-sitemap which does all the work for you: it generates the sitemap and pings the major players to index it. That means that once the sitemap is done, it alerts search engines that your sitemap has been updated and resides at… wherever you put it (usually {domain}/sitemap.xml).

That’s it about sitemaps. Just remember to update them often and try not to change their location, so Google can always re-download them by itself.

As for the robots.txt syntax, it’s as simple as:

Sitemap: http://{domain}/sitemap.xml.gz

where {domain} is obviously your domain. Just add this line to your robots.txt, and note that this example points to a gzip-compressed sitemap (hence the .gz extension).
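If you'd rather script this step than do it by hand, something along these lines should work. It's a small Python sketch, assuming sitemap.xml and robots.txt both sit in your web root. The three function names are my own invention and not part of any tool mentioned above.

```python
import gzip
from pathlib import Path


def sitemap_directive(domain, filename="sitemap.xml.gz"):
    """Build the robots.txt line that announces the sitemap."""
    return f"Sitemap: http://{domain}/{filename}"


def compress_sitemap(xml_path):
    """Write a gzip-compressed copy next to the sitemap and return its path."""
    xml_path = Path(xml_path)
    gz_path = xml_path.with_name(xml_path.name + ".gz")
    with open(xml_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        dst.write(src.read())
    return gz_path


def announce_in_robots(robots_path, directive):
    """Append the directive to robots.txt unless it is already there."""
    robots_path = Path(robots_path)
    existing = robots_path.read_text() if robots_path.exists() else ""
    if directive in existing.splitlines():
        return
    # Make sure the new line starts on its own line.
    if existing and not existing.endswith("\n"):
        existing += "\n"
    robots_path.write_text(existing + directive + "\n")
```

A typical run would be compress_sitemap("sitemap.xml") followed by announce_in_robots("robots.txt", sitemap_directive("example.com")), with example.com swapped for your own domain. Running it twice does not duplicate the line.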

If you have any trouble with this, don’t hesitate to comment or contact me.