Shyam biswas Search engine optimization (SEO) blog: XML Sitemaps

XML Sitemaps

Google, Yahoo!, and Microsoft all support a protocol known as XML Sitemaps. Google first announced it in 2005, and Yahoo! and Microsoft agreed to support the protocol in 2006. Using the Sitemaps protocol, you can supply the search engines with a list of all the URLs you would like them to crawl and index.

Adding a URL to a Sitemap file does not guarantee that the URL will be crawled or indexed. However, it can lead the search engine to crawl and index pages that it would not otherwise have discovered or indexed. In addition, Sitemaps appear to help pages that have been relegated to Google’s supplemental index make their way into the main index.

The Sitemaps protocol is a complement to, not a replacement for, the search engines’ normal, link-based crawl. The benefits of Sitemaps include the following:

• For the pages the search engines already know about through their regular spidering, they use the metadata you supply, such as the last date the content was modified (lastmod date) and the frequency at which the page is changed (changefreq), to improve how they crawl your site.

• For the pages they don’t know about, they use the additional URLs you supply to increase their crawl coverage.

• For URLs that may have duplicates, the engines can use the XML Sitemaps data to help choose a canonical version.

• Verification/registration of XML Sitemaps may indicate positive trust/authority signals.

• The crawling/inclusion benefits of Sitemaps may have second-order positive effects, such as improved rankings or greater internal link popularity.

Matt Cutts, the head of Google’s webspam team, who goes by GoogleGuy in online forums, has explained Google Sitemaps in the following way:

Imagine if you have pages A, B, and C on your site. We find pages A and B through our normal web crawl of your links. Then you build a Sitemap and list the pages B and C. Now there’s a chance (but not a promise) that we’ll crawl page C. We won’t drop page A just because you didn’t list it in your Sitemap. And just because you listed a page that we didn’t know about doesn’t guarantee that we’ll crawl it. But if for some reason we didn’t see any links to C, or maybe we knew about page C but the URL was rejected for having too many parameters or some other reason, now there’s a chance that we’ll crawl that page C.

Sitemaps use a simple XML format that you can learn about at http://www.sitemaps.org. XML Sitemaps are a useful, and in some cases essential, tool for your website. In particular, if you have reason to believe that your site is not fully indexed, an XML Sitemap can help you increase the number of indexed pages. As sites grow in size, the value of XML Sitemap files tends to increase dramatically, as additional traffic flows to the newly included URLs.

Layout of an XML Sitemap

The first step in creating an XML Sitemap is to create an .xml Sitemap file in a suitable format. Since this requires a certain level of technical know-how, it would be wise to involve your development team in the Sitemap generation process from the beginning. Figure 6-2 shows an example of some code from a Sitemap.

FIGURE 6-2. Sample XML Sitemap from Google.com
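
To make the layout concrete, here is a minimal Python sketch that builds a small Sitemap using the tag names defined at sitemaps.org (urlset, url, loc, lastmod, changefreq). It is an illustration only, not the Sitemap from the figure: the build_sitemap helper and the example.com URLs and dates are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol at sitemaps.org.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a Sitemap document (as a string) for a list of URL entries.

    Each entry is a dict with a required "loc" key and optional
    "lastmod" and "changefreq" keys, mirroring the protocol's tag names.
    (build_sitemap is a hypothetical helper written for this post.)
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child of <url>.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq"):
            if tag in entry:
                ET.SubElement(url, tag).text = entry[tag]
    # ElementTree escapes characters such as "&" in the URLs for us.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Made-up example entries; replace them with your own URLs.
sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/",
     "lastmod": "2013-07-12", "changefreq": "daily"},
    {"loc": "http://www.example.com/about?x=1&y=2"},
])
print(sitemap_xml)
```

Only loc is required for each URL; lastmod and changefreq are the optional metadata mentioned in the list of benefits above.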

To create your XML Sitemap, you can use the following:

An XML Sitemap generator

This is a simple script that you can configure to automatically create Sitemaps, and sometimes submit them as well. Sitemap generators can create these Sitemaps from a URL list, access logs, or a directory path hosting static files corresponding to URLs. Here are some examples of XML Sitemap generators:

• SourceForge.net’s google-sitemap_gen

• ROR Sitemap Generator

• XML-Sitemaps.com Sitemap Generator

• Sitemaps Pal

• XML Echo

Simple text

You can provide Google with a simple text file that contains one URL per line. However, Google recommends that once you have a text Sitemap file for your site, you use the Sitemap Generator to create a Sitemap from this text file using the Sitemaps protocol.

Syndication feed

Google accepts Really Simple Syndication (RSS) 2.0 and Atom 1.0 feeds. Note that the feed may provide information on recent URLs only.
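
As a rough illustration of the first two options, the Python sketch below walks a directory of static files, maps each file to a URL, and writes the result as a simple text Sitemap with one URL per line. It does not reproduce how any of the generators listed above work. The function names, the example.com base URL, and the assumption that file paths map directly onto URL paths are all made up for this example.

```python
import os
import tempfile
from urllib.parse import quote


def urls_from_directory(root_dir, base_url):
    """Map every file under root_dir to a URL under base_url.

    Assumes the directory layout mirrors the site's URL paths, which
    may not be true for your site.
    """
    urls = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            rel_path = os.path.relpath(os.path.join(dirpath, name), root_dir)
            # Use forward slashes and percent-encode unsafe characters.
            url_path = quote(rel_path.replace(os.sep, "/"))
            urls.append(base_url.rstrip("/") + "/" + url_path)
    return urls


def write_text_sitemap(urls, out_path):
    """Write a simple text Sitemap: one URL per line."""
    with open(out_path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as site_root:
    os.makedirs(os.path.join(site_root, "blog"))
    for rel in ("index.html", "about.html", os.path.join("blog", "post 1.html")):
        with open(os.path.join(site_root, rel), "w", encoding="utf-8") as fh:
            fh.write("<html></html>")
    demo_urls = urls_from_directory(site_root, "http://www.example.com/")
    sitemap_path = os.path.join(site_root, "sitemap.txt")
    write_text_sitemap(demo_urls, sitemap_path)
    with open(sitemap_path, encoding="utf-8") as fh:
        demo_text = fh.read()

print(demo_text)
```

A real site would usually filter out files that should not be indexed before writing the list.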


Friday, July 12, 2013
