All about Sitemaps
Search engines are becoming increasingly effective at crawling and indexing vast amounts of web content every day. Even so, waiting for Google to discover and crawl your site takes a lot of patience. And what if your site publishes time-sensitive content (such as news articles) or specialized digital media such as images and videos? Despite continuing improvements in crawling and in how quickly newly added content surfaces in organic search results, Google still needs time to find it all. This is where Sitemaps come to the rescue.
So, what is a Sitemap?
To put it simply, sitemap.xml is a file that contains a detailed list of links to the webpages on your website. Google introduced Google Sitemaps so that web crawlers could more easily find dynamic pages, which were typically being missed.
It helps search engines understand how your website and its content are organized and, hence, crawl it better.
Sitemaps also provide valuable metadata about the pages they list, such as when a webpage was last updated and how often it changes.
While small websites with few pages do not really need a sitemap (though it is always advisable to have one), a sitemap is all the more necessary for larger sites, where it supports navigation.
Sitemaps are especially important for sites that use Adobe Flash or JavaScript menus that do not include HTML links.
When do I use Sitemaps?
Even if the internal linking of your pages is optimized for crawling by the search engines, it’s advisable to have a Sitemap if your website meets any of the following criteria:
- If the site is really large: On a huge site, Googlebot is more likely to overlook some of the new or recently updated pages.
- If the site has a large number of pages that are isolated or not well linked to each other: If many pages on your site are not naturally linked with each other, there is again a chance that Google could overlook them.
- If the site is new and has few external links to it: Googlebot crawls the web by following links from one page to another. So, if only a few external links point to your site, Google might not discover some of your pages.
- If the site uses rich media content or Ajax: Such content is not easily crawled by Google, so a Sitemap gives Google additional information to take into account wherever required.
SEO benefits of having a sitemap:
- Easier navigation and visibility for the search engines
Giving a sitemap to the search engines makes it easier for them to crawl and access your site and subsequently index its pages for inclusion in their results. And since this protocol is also supported by other search engines such as Yahoo and Bing, it can lead to better indexing of your website across all of them in the long run.
- Easier navigation and visibility for the users
A Sitemap can be a lifesaver for users if you have a complicated and lengthy website. It’s very natural for a user to leave when confronted with a large number of pages all at once. In this situation, a sitemap guides them through the content of your website and helps them choose the section they are interested in. This helps sustain traffic in the long run.
- It informs the search engines promptly when a new page has been added (changes are indexed faster than when you don't have a sitemap)
Changes to your website will be indexed faster by the search engines if you reflect those changes in your sitemap and submit it to them. This is useful for websites whose content is updated frequently. Web crawlers do not necessarily visit your website every day, so an updated sitemap lets them learn about changes without waiting for their next regular visit.
- Reduces the chances of any page being skipped during indexing
Since all the internal pages of the website are listed in the sitemap, the chances of Google’s crawlers missing an important page are greatly reduced.
The old cold war between XML and HTML sitemaps
HTML sitemaps
HTML sitemaps can be understood as content archives: actual webpages containing a link to every page on your website. An HTML sitemap is like any other regular webpage and can be read by users as well as search engines.
If you have a small website, a single-page HTML sitemap will suffice. The HTML sitemap of a huge website, on the other hand, is more likely to be designed as a set of content archives, perhaps organized by categories such as section or publication date.
For SEO and usability purposes, it is advisable to put a short description alongside each link to let users know where they will be taken on clicking it. Additionally, there should be a link to the sitemap on every page. The logic behind this is that if a user or a crawler lands on any page of your website, it will be easier for them to find their way around the site and discover other pages.
XML Sitemaps
An XML Sitemap is consumed directly by the search engines and is not normally seen by users. All of the major search engines, including Google, Bing, and Yahoo, use these sitemaps while crawling. The XML Sitemap protocol defines a set of XML tags (some required, some optional) through which webmasters provide information about their pages: the URL, how frequently the page changes, when it was last modified, and its priority relative to the other pages in the Sitemap.
A single XML Sitemap file can include up to 50,000 URLs. The file size limit was originally 10 MB; the current sitemaps.org protocol allows up to 50 MB uncompressed. An added advantage of XML sitemaps is that they can carry metadata (additional information) about the pages they list.
Here is a sample XML Sitemap from sitemaps.org:
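The 50,000-URL limit means a large site has to spread its URLs over several sitemap files. A minimal Python sketch of that splitting step follows; the function name `split_for_sitemaps` and the example URLs are illustrative assumptions, not part of the protocol.

```python
from typing import Iterator, List

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit quoted above


def split_for_sitemaps(urls: List[str],
                       limit: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `limit` entries."""
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


# Hypothetical site with 120,000 pages
urls = [f"http://www.example.com/page/{i}" for i in range(120_000)]
parts = list(split_for_sitemaps(urls))
print([len(part) for part in parts])  # [50000, 50000, 20000]
```

Each slice would then be written out as its own sitemap file.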
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
<changefreq>weekly</changefreq>
</url>
<url>
<loc>http://www.example.com/catalog?item=73&amp;desc=vacation_new_zealand</loc>
<lastmod>2004-12-23</lastmod>
<changefreq>weekly</changefreq>
</url>
<url>
<loc>http://www.example.com/catalog?item=74&amp;desc=vacation_newfoundland</loc>
<lastmod>2004-12-23T18:00:15+00:00</lastmod>
<priority>0.3</priority>
</url>
<url>
<loc>http://www.example.com/catalog?item=83&amp;desc=vacation_usa</loc>
<lastmod>2004-11-23</lastmod>
</url>
</urlset>
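A file like this rarely needs to be typed by hand. As a rough sketch, assuming Python's standard `xml.etree.ElementTree` module, the same structure can be generated from a list of page records. The function name `build_sitemap` and the dictionary keys are illustrative choices. Note that the serializer escapes `&` in URLs as `&amp;`, which the XML format requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Serialize dicts with keys loc, lastmod, changefreq, priority to sitemap XML bytes."""
    ET.register_namespace("", SITEMAP_NS)  # emit the protocol namespace without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:  # only <loc> is required; the other tags are optional
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
     "changefreq": "monthly", "priority": 0.8},
    {"loc": "http://www.example.com/catalog?item=12&desc=vacation_hawaii",
     "changefreq": "weekly"},
])
print(sitemap_xml.decode("utf-8"))
```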
Which is better? HTML or XML?
After going through both kinds of sitemap, the question remains: which is better? Should you create an HTML sitemap, which is an old-school landing page through which users and search engines can find most of your pages, or an XML sitemap, which is only visible to the search engines?
Well, Matt Cutts was asked the same question. Let’s watch what he has to say in this regard:
[Embedded YouTube video, ID: hi5DGOu1uA0]
As he said, XML sitemaps do not guarantee that your pages will definitely be crawled; they just help Google discover them. HTML sitemaps, on the other hand, when linked from your homepage, can help the search engines discover the rest of your site’s pages by following the links. Additionally, HTML sitemaps carry a usability advantage.
It’s advisable to have both kinds of sitemap. But if you really have to choose one, I would suggest starting with an HTML sitemap; you can add its XML version later, as it is quite an easy task.
Image Sitemap:
Google always has an eye for good, original image content. If your site is rich in images and you wish to get as many of them indexed as possible, you should create a Google Image Sitemap.
These Google image extensions for Sitemaps not only give Google additional information about the images on your URLs but also help it reach images it would not have found otherwise (such as images loaded through JavaScript code). They therefore indicate to Google which images you want crawled and indexed.
To give this information to Google, you add image-specific tags to your sitemap, either by creating a separate sitemap for your images or by adding the image information to an existing sitemap. Only 2 XML tags are required, but you have the option of providing additional metadata about your image content.
You can list up to 1000 images for each page.
Here is a screenshot of the format for Image sitemap by Google
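In text form, the two required tags are `image:image`, which wraps each image, and `image:loc`, which holds its URL. The Python sketch below generates such an entry with the standard `xml.etree.ElementTree` module. The namespace is the one from Google's image sitemap extension; the function name `build_image_sitemap` and the example URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
MAX_IMAGES_PER_PAGE = 1000  # per-page limit quoted above


def build_image_sitemap(pages):
    """`pages` maps a page URL to the list of image URLs shown on that page."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        if len(image_urls) > MAX_IMAGES_PER_PAGE:
            raise ValueError(f"{page_url} lists more than {MAX_IMAGES_PER_PAGE} images")
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # The two required tags: <image:image> wrapping <image:loc>
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


image_xml = build_image_sitemap({
    "http://www.example.com/sample.html": [
        "http://www.example.com/image.jpg",
        "http://www.example.com/photo.jpg",
    ],
})
print(image_xml.decode("utf-8"))
```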
Video Sitemap:
Video content is especially hard to get indexed through crawling, as its metadata is not always available on the page where it appears.
If your site uses videos, Google Video sitemaps are an excellent way to tell Google about the video content on your site (and hence ask it to crawl that content). They improve the findability of your videos and hence your site’s performance in Google Video Search results.
Through a video sitemap, webmasters can inform Google about the category, title, and description of a video, among other details.
As with image sitemaps, the video-specific tags can be added to an existing sitemap or placed in a separate sitemap dedicated to listing videos.
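As a hedged illustration of those video-specific tags, the Python sketch below writes one `video:video` entry per page using the standard `xml.etree.ElementTree` module. The namespace and tag names follow Google's video sitemap extension; the function name `build_video_sitemap`, the dictionary keys, and the example values are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"


def build_video_sitemap(videos):
    """Each dict needs page_url, thumbnail_loc, title, description, content_loc;
    category is optional."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("video", VIDEO_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for item in videos:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = item["page_url"]
        video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
        for tag in ("thumbnail_loc", "title", "description", "content_loc", "category"):
            if tag in item:  # skip optional tags that were not supplied
                ET.SubElement(video, f"{{{VIDEO_NS}}}{tag}").text = item[tag]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


video_xml = build_video_sitemap([{
    "page_url": "http://www.example.com/videos/grilling.html",
    "thumbnail_loc": "http://www.example.com/thumbs/grilling.jpg",
    "title": "Grilling steaks for summer",
    "description": "How to grill a steak evenly every time.",
    "content_loc": "http://www.example.com/media/grilling.mp4",
    "category": "Cooking",
}])
print(video_xml.decode("utf-8"))
```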
News Sitemap:
If you are a blogger who primarily writes about current events and updates them promptly, or you run a news-based website, you should consider creating a Google News sitemap.
With personalized search these days, news modules are displayed at the top of the organic results on the SERPs, so getting into the news column is a great opportunity to widen the visibility of your content. It is advisable to create a separate sitemap for news content, as this particular sitemap will be crawled more often by Google to check for new articles. Through a Google News sitemap, your articles are therefore discovered and crawled faster, improving the coverage of your content.
Additionally, these sitemaps identify the title and publication date of each article and specify the type of content in each article using genres and access tags.
A Google News sitemap is all the more recommended for sites that are new, have dynamic content, or require several links to be clicked before a reader reaches their content.
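A minimal Python sketch of a news sitemap entry, carrying the title and publication date mentioned above, might look like this. The namespace and tag names follow Google's news sitemap extension. The function name `build_news_sitemap`, the publication name, and the article data are hypothetical, and the genres and access tags are left out for brevity.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles, publication_name, language):
    """Each article dict needs loc, title, and publication_date (a W3C date string)."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        # The publication block identifies the site that published the article
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = article["publication_date"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


news_xml = build_news_sitemap(
    [{"loc": "http://www.example.com/business/article55.html",
      "title": "Companies A, B in Merger Talks",
      "publication_date": "2008-12-23"}],
    publication_name="The Example Times",
    language="en",
)
print(news_xml.decode("utf-8"))
```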
Building a Sitemap
Unless you have a small website and are willing to build the sitemap by hand, you should consider automating the process of creating your sitemap, for example with the RankWatch Sitemap generator tool.
Submitting your Sitemap
Unlike the robots.txt file, which has a default location in the site’s root and is therefore read by search engines when they visit a site, a Sitemap has no standardized file name or location and is therefore not read by the search engines by default. You can, however, provide a reference to your sitemap in your robots.txt file.
To make sure Google picks it up, though, you should submit it through your Google Webmaster Tools account, which will also tell you if your sitemap has any errors that need to be fixed.
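The robots.txt reference mentioned above is a single `Sitemap:` line. As a small sketch, Python's standard `urllib.robotparser` module (its `site_maps()` method needs Python 3.8 or later) can confirm that the line is picked up. The robots.txt content and the sitemap URL here are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the Sitemap line may appear anywhere in the file
robots_txt = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
sitemaps = parser.site_maps()  # list of declared sitemap URLs, or None if there are none
print(sitemaps)  # ['http://www.example.com/sitemap.xml']
```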