It is one thing to create a website and put up some content, but quite another to get it noticed by Google.
Often, the more content you have, the more of your pages search engines crawl and index. But that is not always the case: if the crawling process is not optimal, search engines might miss some of your content. Today, we have some guidelines from Google that explain which fields in sitemaps are important, when to use XML Sitemaps and RSS/Atom feeds, and how to optimize them for Google.
XML Sitemaps or RSS feeds?
The first question you might ask is which to use: XML Sitemaps or RSS/Atom feeds? Or should you use RSS/Atom feeds alongside XML Sitemaps? XML Sitemaps, an indispensable part of your site, describe the whole set of URLs within it. RSS/Atom feeds, on the other hand, describe only the most recent changes.
Because XML Sitemaps contain complete site information, they are much larger than RSS/Atom feeds, and as a result they are downloaded less frequently. So the question is not which format to use, but why not use both? Each has its own purpose, and each complements the other.
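To make the sitemap side concrete, here is a minimal Python sketch (standard library only) that writes every page of a site together with its last modification time. The `Page` class, the `build_sitemap` function and the example.com URLs are hypothetical; only the `urlset`/`url`/`loc`/`lastmod` elements and their namespace come from the sitemaps.org protocol. It assumes the page list is already limited to canonical, crawlable URLs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    url: str            # hypothetical: a canonical URL Googlebot may fetch
    modified: datetime  # last modification time, timezone-aware


def build_sitemap(pages: list[Page]) -> bytes:
    """Return an XML sitemap with a <loc> and <lastmod> for every page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page.url  # ElementTree escapes & and <
        # W3C Datetime format, e.g. 2014-10-15T09:30:00+00:00
        ET.SubElement(url, "lastmod").text = page.modified.isoformat(timespec="seconds")
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    site = [
        Page("https://www.example.com/", datetime(2014, 10, 15, 9, 30, tzinfo=timezone.utc)),
        Page("https://www.example.com/about", datetime(2014, 9, 1, 12, 0, tzinfo=timezone.utc)),
    ]
    print(build_sitemap(site).decode("utf-8"))
```

Because the file lists every page, it grows with the site, which is why it is the heavier of the two formats.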
XML sitemaps give Google information about all the pages on your site,
while RSS/Atom feeds let Google know what has been most recently updated
on your site. Google also adds that “submitting sitemaps or feeds does
not guarantee the indexing of those URLs.”
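A feed, by contrast, only has to carry recent changes. The following minimal sketch builds an Atom feed (RFC 4287) from hypothetical `(url, modified)` pairs, keeping only pages changed at or after a cutoff `since`, newest first. The function name, the feed title and the author name "Example Site" are placeholders; how you pick `since` is an assumption left to you, but it should reach back at least to the last time Google fetched the feed.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace defined by the Atom Syndication Format (RFC 4287).
ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(feed_id: str, updates: list[tuple[str, datetime]],
                    since: datetime) -> bytes:
    """Return an Atom feed of pages modified at or after `since`, newest first.

    `updates` holds hypothetical (url, modified) pairs with timezone-aware datetimes.
    """
    recent = sorted((u for u in updates if u[1] >= since),
                    key=lambda u: u[1], reverse=True)

    feed = ET.Element("feed", xmlns=ATOM_NS)
    ET.SubElement(feed, "id").text = feed_id
    ET.SubElement(feed, "title").text = "Recent updates"  # placeholder title
    # Feed-level <updated>: the newest change, or the cutoff if nothing changed.
    newest = recent[0][1] if recent else since
    ET.SubElement(feed, "updated").text = newest.isoformat(timespec="seconds")
    # Atom requires a feed author unless every entry has its own.
    author = ET.SubElement(feed, "author")
    ET.SubElement(author, "name").text = "Example Site"  # placeholder name

    for url, modified in recent:
        entry = ET.SubElement(feed, "entry")
        ET.SubElement(entry, "id").text = url
        ET.SubElement(entry, "title").text = url
        ET.SubElement(entry, "link", href=url)  # rel defaults to "alternate"
        # The URL plus its modification time is what the crawler needs.
        ET.SubElement(entry, "updated").text = modified.isoformat(timespec="seconds")

    return ET.tostring(feed, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    cutoff = datetime(2014, 10, 1, tzinfo=timezone.utc)
    changes = [
        ("https://www.example.com/old-post", datetime(2014, 9, 1, tzinfo=timezone.utc)),
        ("https://www.example.com/new-post", datetime(2014, 10, 2, tzinfo=timezone.utc)),
    ]
    print(build_atom_feed("https://www.example.com/feed.atom", changes, cutoff).decode("utf-8"))
```

Only the pages changed since the cutoff appear, so the feed stays small and cheap to download often, while the sitemap carries the complete list.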
Sitemap and RSS/Atom feed best practices
In order to optimize the crawl process, you should use XML Sitemaps
along with RSS/Atom feeds. Here are some best practices for them from Google.
- The two most important pieces of information for Google are the URL itself and its last modification time.
- Only include URLs that can be fetched by Googlebot (i.e., don’t include URLs blocked by robots.txt).
- Only include canonical URLs.
- Specify a last modification time for each URL in an XML sitemap and in an RSS/Atom feed.
- For a single XML sitemap, update it at least once a day and ping Google each time.
- For a set of XML sitemaps, maximize the number of URLs in each XML sitemap. The limit is 50,000 URLs or a maximum size of 10MB uncompressed. Ping Google when each XML sitemap is updated.
- When a new page is added or an existing page is meaningfully changed, add the URL and the modification time to the RSS/Atom feed.
- So that Google doesn't miss updates, the RSS/Atom feed should contain all updates made since at least the last time Google downloaded it. The best way to achieve this is to use PubSubHubbub.
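Two of these rules, leaving out URLs blocked by robots.txt and packing a large site into as few sitemaps as the limits allow, lend themselves to a short sketch. The Python below is a minimal illustration under stated assumptions: the function names are hypothetical, canonicalization is assumed to have happened upstream, and the ping step is left out. It uses the standard library's `urllib.robotparser`, which does not implement Google-specific matching such as `*` wildcards in paths, so treat its verdicts as an approximation of Googlebot's. `MAX_BYTES` uses the 10MB figure quoted above; the sitemaps.org protocol has since raised the limit to 50MB, so the limits are parameters.

```python
from urllib.robotparser import RobotFileParser
from xml.sax.saxutils import escape

# Limits quoted in the guideline; the byte limit applies to the uncompressed file.
MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
FOOTER = "</urlset>\n"


def crawlable(urls: list[str], robots_txt: str, user_agent: str = "Googlebot") -> list[str]:
    """Keep only URLs that robots.txt allows `user_agent` to fetch (approximation)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def url_entry(url: str, lastmod: str) -> str:
    """One <url> element; `lastmod` is a W3C Datetime string such as 2014-10-15."""
    return f"  <url><loc>{escape(url)}</loc><lastmod>{lastmod}</lastmod></url>\n"


def split_sitemaps(entries: list[tuple[str, str]], max_urls: int = MAX_URLS,
                   max_bytes: int = MAX_BYTES) -> list[str]:
    """Greedily pack (url, lastmod) pairs into as few sitemaps as the limits allow."""
    overhead = len((HEADER + FOOTER).encode("utf-8"))
    sitemaps: list[str] = []
    current: list[str] = []
    size = overhead
    for url, lastmod in entries:
        line = url_entry(url, lastmod)
        line_size = len(line.encode("utf-8"))
        # Start a new sitemap when adding this URL would break either limit.
        if current and (len(current) >= max_urls or size + line_size > max_bytes):
            sitemaps.append(HEADER + "".join(current) + FOOTER)
            current, size = [], overhead
        current.append(line)
        size += line_size
    if current:
        sitemaps.append(HEADER + "".join(current) + FOOTER)
    return sitemaps


if __name__ == "__main__":
    robots = "User-agent: Googlebot\nDisallow: /private\n"
    urls = [f"https://www.example.com/page{i}" for i in range(3)]
    urls.append("https://www.example.com/private/draft")
    allowed = crawlable(urls, robots)
    for i, sitemap in enumerate(split_sitemaps([(u, "2014-10-15") for u in allowed])):
        print(f"sitemap{i}.xml: {sitemap.count('<url>')} URLs")
```

Filling each file up to the limit before starting the next one keeps the number of sitemaps, and therefore the number of files Google has to fetch, as small as possible.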
Good luck getting your webpages crawled quickly :)