The best way to help Google find your website’s pages is to create a Sitemap. With its help, you can show the search bots how your site’s pages are organized and which ones are the most important and relevant. In this article, we will look at what a sitemap.xml file is and explain how to create and configure it.
Sitemap.xml is an important component of any site’s technical optimization. In a way, it is a guide for bots, showing them the most important routes. From the Sitemap, the search engines can understand how the site is organized, what takes priority, when the content was last updated, and so on.
However, not all optimization experts use this tool, and there are different opinions online on whether it’s really necessary. Some might say that Google’s resources and capabilities allow it to find and figure out the structure of any site without outside help. In our experience, the resources of search engines, however large, are small compared to the number of documents found online. As a result, problems with crawling and indexing occur more often than you might imagine, affecting both million-page platforms and smaller sites.
What is a Sitemap?
A Sitemap is a text file containing a list of all the website pages that search bots need to know about. Basically, it is a document that guides the search engines in crawling the main content of a site.
A sitemap.xml file is located in the root directory of a site. There you can specify the URLs, the priority of their scanning, the date of the last update, the availability of other language versions, etc. You can also add additional information, depending on the type of content. For example, you can specify a video’s duration, rating, or age limit.
Elements making up an XML Sitemap
All the elements are marked with special tags, which visually resemble HTML code.
Mandatory elements:
- First line. The first line indicates the XML version and the encoding required for Sitemap files, which is UTF-8;
- urlset: a tag that declares the standard of the current protocol. It is the parent tag for the ones that follow;
- url: a tag for each URL entry. It is a parent to the tags below and a child of urlset;
- loc: the URL of a page. This URL must begin with the protocol (such as HTTP) and end with a trailing slash if your web server requires it.
Optional elements:
- lastmod: a tag that indicates the date when the page was last updated. It is a child tag of url. Google considers the value of this tag only if it matches the actual time of the last page update;
- changefreq: a tag that indicates the approximate refresh rate of the page. Valid values are: always, hourly, daily, weekly, monthly, yearly, and never;
- priority: a tag that indicates the priority of the page in comparison with other pages. The value is between 0.0 and 1.0.
Although the Sitemaps XML protocol lists the changefreq and priority tags as valid, Google ignores their values, according to Google Search Central.
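To make the tag hierarchy above more concrete, here is a minimal Python sketch that assembles a Sitemap with the mandatory elements plus lastmod. The function name build_sitemap and the example.com URLs are illustrative assumptions, not part of any official tool. The changefreq and priority tags are left out because Google ignores them.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (declared on urlset).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return Sitemap XML as UTF-8 bytes.

    pages is an iterable of (loc, lastmod) pairs; lastmod may be None.
    """
    ET.register_namespace("", SITEMAP_NS)  # serialize without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # xml_declaration=True emits the first line with version and encoding.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages used only for illustration.
xml_bytes = build_sitemap([
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/blog/", None),
])
print(xml_bytes.decode("utf-8"))
```

Only set lastmod when you can supply the real modification date, since an inaccurate value is of no use to the search engine.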
Why does my site need a Sitemap?
You will definitely need a Sitemap if your site is large or you have just started a new project. In the case of the former, it will help search bots detect and index the pages located far from the main one. When it comes to new sites, using a Sitemap is one way to speed up their indexing. Otherwise, you might have to wait for a long time until the search engines pay attention to your pages. You will also need a Sitemap if you have a lot of multimedia or news-related content or even a large archive of pages that are not interlinked.
A Sitemap makes it easier for Google and other search engines to crawl and index your site, thereby increasing its chances of appearing in search results.
However, don’t think that a Sitemap is unimportant if your website doesn’t fall into any of these categories. Even though Google, in its documentation, offers a list of sites that may not need a Sitemap, we consider it a must-have component of a successful promotion.
A Sitemap offers the following benefits:
- It helps crawlers know which pages to index. By adding a URL to the Sitemap, you’re emphasizing its importance.
- It serves as a tool to manage the crawl budget. With the help of a Sitemap, you can hint at which pages to crawl more often and which to spend fewer resources on, although search engines treat these hints as advisory.
- It allows you to indicate regional versions of pages — this is an easy way to organize a multi-regional site. For this, you only need to add hreflang attributes to URLs.
- It facilitates the crawling of sites with a convoluted structure. If the structure and linking on the site are not organized correctly, search bots may not always get to the right pages when following the links from the main page. In this case, adding them to a Sitemap will help solve that.
- It speeds up the crawling of media files and news pages. If you want your site’s content to appear in search results for pictures, videos, or news, adding information about it to the Sitemap is worthwhile.
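The regional versions mentioned in the list above can be sketched as follows. This assumes the approach in which every url entry lists all language versions of the page, itself included, as xhtml:link elements with hreflang attributes. The function name and the URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(groups):
    """groups: list of dicts mapping an hreflang code to that version's URL."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for versions in groups:
        # One url entry per language version of the same page.
        for loc in versions.values():
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
            # Each entry repeats the full set of alternates.
            for lang, href in versions.items():
                ET.SubElement(
                    url,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": href},
                )
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical English and German versions of one page.
hreflang_xml = build_hreflang_sitemap([
    {"en": "https://example.com/en/", "de": "https://example.com/de/"},
])
print(hreflang_xml.decode("utf-8"))
```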
Google’s requirements for Sitemap files
To avoid problems associated with the use of a Sitemap by search engine crawlers, you should adhere to the following rules:
- Use UTF-8 encoding.
- The file size cannot exceed 50 MB uncompressed.
- A single file cannot contain more than 50,000 URLs.
- The links within a Sitemap must be located within the same domain as the file itself.
- If the file is too big, divide it into several files and specify them in the Sitemap index file.
- The server response when accessing the file should be 200 OK.
- Specify only URLs (without GET parameters and session IDs).
- Mark additional language versions of a page with the hreflang attribute.
- URLs may contain only ASCII characters: percent-encode non-Latin characters and escape special symbols such as the ampersand.
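As a rough illustration of the splitting rule above, the sketch below divides a long URL list into files of at most 50,000 entries and builds a Sitemap index that points to them. The file names (sitemap-1.xml and so on) and the helper names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit from the list above


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a Sitemap index (UTF-8 bytes) listing the given Sitemap files."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical site with 120,000 pages.
urls = [f"https://example.com/page-{n}/" for n in range(120_000)]
chunks = chunk_urls(urls)
names = [f"https://example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))]
index_xml = build_sitemap_index(names)
print(len(chunks), [len(c) for c in chunks])  # prints: 3 [50000, 50000, 20000]
```

Each chunk would then be written to its own file, and only the index needs to be submitted.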
How to create a Sitemap
If you think your website actually needs a Sitemap, follow these steps to create it:
Step 1. Examine the website’s structure and determine canonical addresses
The first thing to do is to look at the pages on your site and see how they are organized. If your site is small, it will be quite easy to collect everything without additional tools. On large websites, manual collection will require a lot of time and effort, but parsers such as Netpeak Spider or Screaming Frog can help with that.
By creating a Sitemap file, you let search engines know which URLs to show in search results. Such URLs are called canonical. If you’ve added the same content on multiple URLs, select its main version and include only that in your Sitemap.
From all the pages on your site, determine the important ones — they will be added to the Sitemap.
Important! Make sure that the pages you selected for the Sitemap return a 200 OK server response, are canonical, and contain no GET parameters or session IDs.
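The checks above can be sketched as a simple filter over crawl results. The record layout (url, status, canonical) is a hypothetical export format, not the actual output of Netpeak Spider or Screaming Frog, and the session ID check only looks for the common jsessionid marker.

```python
from urllib.parse import urlsplit


def is_sitemap_candidate(page):
    """Return True if a crawled page is suitable for the Sitemap.

    `page` is a dict with the assumed keys 'url', 'status' and 'canonical'.
    """
    parts = urlsplit(page["url"])
    has_session_id = "jsessionid" in page["url"].lower()
    return (
        page["status"] == 200                  # server answers 200 OK
        and page["canonical"] == page["url"]   # page is its own canonical
        and not parts.query                    # no GET parameters
        and not has_session_id                 # no session ID in the URL
    )


# Hypothetical crawl results.
crawled = [
    {"url": "https://example.com/", "status": 200,
     "canonical": "https://example.com/"},
    {"url": "https://example.com/shoes/?sort=price", "status": 200,
     "canonical": "https://example.com/shoes/"},
    {"url": "https://example.com/old-page/", "status": 404,
     "canonical": "https://example.com/old-page/"},
    {"url": "https://example.com/shoes/", "status": 200,
     "canonical": "https://example.com/shoes/"},
]
sitemap_urls = [p["url"] for p in crawled if is_sitemap_candidate(p)]
print(sitemap_urls)
# prints: ['https://example.com/', 'https://example.com/shoes/']
```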
Step 2. Define file format
Google supports several formats for Sitemap files.
The most common format for a Sitemap is XML. If you use RSS, mRSS, or Atom 1.0 feeds on your site, you can also use them as a source for your Sitemap. A plain text file with one URL per line is supported as well.
Step 3. Create your Sitemap
Once you’ve defined the format and prepared a list of URLs, you can begin creating the Sitemap itself. You can do this in several ways:
- With a built-in CMS tool.
If your site uses one of the content management systems, check whether it has a built-in Sitemap generator. This is a fairly basic tool present in most CMS, and it can generate a Sitemap for you without additional plugins. In this case, you only need to add the URLs defined in step 1 and specify the desired parameters.
- Manually.
Manually creating a Sitemap is viable if you don’t have many URLs or are planning to use the text format. For this, you will need a text editor such as Notepad.
To create a Sitemap in text format, simply add a list of URLs to the document and save it with a .txt extension.
For a Sitemap in XML format, you will need to use the proper syntax with the URLs. If you know how to work with HTML, that should be a piece of cake. Take your time, and make sure you added everything correctly.
- Using a generator.
If you need to add several hundred or thousands of pages to a Sitemap, it is better to use a generator. There are many paid and free online tools for creating Sitemaps. You can choose the best one for you from Google’s selection.
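For the text format mentioned above, a Sitemap is just a list of URLs, one per line. The following sketch, with a hypothetical helper name and URLs, removes blank lines and duplicates before producing the file content.

```python
def build_text_sitemap(urls):
    """Return the content of a text Sitemap: one URL per line."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        # Skip empty lines and URLs that were already added.
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    return "\n".join(lines) + "\n"


text_sitemap = build_text_sitemap([
    "https://example.com/",
    "https://example.com/about/",
    "https://example.com/",  # duplicate, dropped
    "  ",                    # blank, dropped
])
print(text_sitemap)
```

The returned string can be saved as a UTF-8 file with a .txt extension.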
Step 4. Check whether the file you created is correct
For your Sitemap to work, it must be error-free. This is especially true if you’ve created the Sitemap manually.
You can use a Sitemap Validation Tool to check whether your Sitemap is correct. There are plenty of such tools available online and they will highlight the problems and provide suggestions on what to correct.
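If you prefer a quick local check before turning to an online validator, a few basic rules can be tested with a short script. This is only a sketch under simple assumptions: it checks well-formedness, the root element, the URL count, and the presence of an absolute loc in every entry. It does not replace a full validation tool.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def find_sitemap_problems(xml_data):
    """Return a list of human-readable problems found in a Sitemap."""
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not urlset in the Sitemap namespace")

    entries = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(entries) > 50_000:
        problems.append("more than 50,000 URLs in one file")

    for number, entry in enumerate(entries, start=1):
        loc = entry.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        if not loc:
            problems.append(f"entry {number}: missing loc")
        elif urlsplit(loc).scheme not in ("http", "https"):
            problems.append(f"entry {number}: loc is not an absolute URL")
    return problems


# Hypothetical Sitemap with two deliberate mistakes.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>/relative/path/</loc></url>
  <url><lastmod>2024-01-15</lastmod></url>
</urlset>"""

for problem in find_sitemap_problems(sample):
    print(problem)
```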
Step 5. Make the Sitemap available for search engines
For that, you need to:
- Add the file to the site’s root directory;
- Add the Sitemap link to the robots.txt file.
A Sitemap file can be placed in any part of the site, but it only affects the directory it is located in and the directories below it. For this reason, we recommend that you place the Sitemap file at the root level to make it work for the entire site.
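The robots.txt link is a single line that starts with Sitemap: followed by the full URL of the file. As a small sketch, the hypothetical helper below appends that line to existing robots.txt content unless it is already there. The sample rules and the URL are assumptions.

```python
def add_sitemap_to_robots(robots_txt, sitemap_url):
    """Return robots.txt content with a Sitemap directive for sitemap_url."""
    directive = f"Sitemap: {sitemap_url}"
    lines = robots_txt.splitlines()
    # Leave the file untouched if the directive is already present.
    if any(line.strip().lower() == directive.lower() for line in lines):
        return robots_txt
    lines.append(directive)
    return "\n".join(lines) + "\n"


robots = "User-agent: *\nDisallow: /admin/\n"
updated = add_sitemap_to_robots(robots, "https://example.com/sitemap.xml")
print(updated)
```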
Step 6. Add the Sitemap to Google Search Console
Once you’ve completed the above steps, it’s time to tell Google about your new Sitemap. You can do this by adding your Sitemap to the Google Search Console in the Sitemaps tab.
Once it’s added, you will see any errors found in the file, which pages are already indexed and which are not, along with the reasons why.
Sitemap files with extended syntax
Now that you know about a classic Sitemap’s purpose and creation process, let’s look into Sitemap files with extended syntax.
In their guidelines, Google mentions 3 types:
- for videos;
- for images;
- for news.
Video Sitemaps
The XML video Sitemap is one of the extended syntaxes that Google supports. Using extended syntax for videos in a Sitemap helps Google display this type of content in search results where possible.
Other benefits include:
- It lets Google understand the website content.
- It allows adding a detailed description to the file.
- Your videos will be available for search in Google Video.
- A video thumbnail will be displayed in the search results (which can increase the number of hits from the SERP).
A video Sitemap includes the following mandatory elements:
- URL of the page where the video is located. If there are multiple videos on a page, create a single loc tag for it with a child video element for each video present there.
- Actual URL of a video file. Google can scan the following file types: 3GP, 3G2, ASF, AVI, DivX, M2V, M3U, M3U8, M4V, MKV, MOV, MP4, MPEG, OGV, QVT, RAM, RM, VOB, WebM, WMV, and XAP.
- URL of the player for a particular video. Usually, this information is specified in the src attribute of the embed tag.
- URL of the video’s thumbnail. Recommended sizes: between 160×90 and 1920×1080 pixels. Image formats: .jpg, .png, or .gif.
- Title. It must match the title of the page where the video is displayed.
- Video description. It must match the meta description of the page. The maximum length is 2,048 characters.
You can find all XML tags for the video Sitemap in Google’s Guidelines.
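A video entry built from the elements listed above might look like the following sketch. The tag names (video:video, video:thumbnail_loc, video:title, video:description, video:content_loc, video:player_loc) come from Google’s video extension, while the function name and all URLs and texts are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

# Order in which the video tags are written for each video.
VIDEO_TAGS = ("thumbnail_loc", "title", "description",
              "content_loc", "player_loc")


def build_video_sitemap(page_url, videos):
    """Return a Sitemap with one page entry and a video element per video."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("video", VIDEO_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    for video in videos:
        node = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
        for tag in VIDEO_TAGS:
            if tag in video:
                ET.SubElement(node, f"{{{VIDEO_NS}}}{tag}").text = video[tag]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical page with a single video.
video_xml = build_video_sitemap(
    "https://example.com/videos/grilling/",
    [{
        "thumbnail_loc": "https://example.com/thumbs/grilling.jpg",
        "title": "Grilling steaks for summer",
        "description": "A short guide to grilling steaks.",
        "content_loc": "https://example.com/media/grilling.mp4",
        "player_loc": "https://example.com/player?video=grilling",
    }],
)
print(video_xml.decode("utf-8"))
```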
XML Sitemap for images
Google also supports extended syntax for images. You can create a separate image Sitemap or add syntax to an already existing one. Using extended syntax in a Sitemap provides the search engines with additional information about the images present on a site. It can also help Google detect and index images it might have missed when crawling the site.
As is the case with standard Sitemaps, there are mandatory and optional XML tags.
The mandatory tags include:
- image:image, which contains all the information about a single image.
- image:loc, which holds the URL of an image.
Optional tags, such as image captions, titles, and license URLs, were also used in the past but have since been deprecated.
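As an illustration of the two mandatory tags, the sketch below adds image:image and image:loc elements to a page entry. The helper name and the URLs are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """pages: dict mapping a page URL to the list of image URLs on it."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # One image:image wrapper per picture, with its URL in image:loc.
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical page with two images.
image_xml = build_image_sitemap({
    "https://example.com/gallery/": [
        "https://example.com/img/photo-1.jpg",
        "https://example.com/img/photo-2.jpg",
    ],
})
print(image_xml.decode("utf-8"))
```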
Sitemap files for Google News
We recommend using this syntax if your website has a news section and you want to increase the likelihood of your articles appearing in Google News.
Requirements for this type of Sitemap include:
- Include only articles published within the last two days.
- Update the file as new articles are published.
- Include no more than 1,000 URLs per file.
There are only 3 mandatory tags:
- Name of the media where the article is published.
- An article’s publishing date.
- An article’s title.
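The requirements and tags above can be combined in a short sketch that keeps only articles from the last two days, caps the list at 1,000 URLs, and writes the news tags. The tag names follow Google’s news extension, where the publication name sits inside news:publication together with a language code. The function name, the sample articles, and the fixed "now" value are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles, publication_name, language, now=None):
    """articles: dicts with 'url', 'title' and an aware datetime 'published'."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=2)
    # Keep recent articles only and respect the 1,000 URL cap.
    recent = [a for a in articles if a["published"] >= cutoff][:1000]

    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in recent:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["url"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = (
            article["published"].isoformat()
        )
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical articles; the second one is older than two days.
fixed_now = datetime(2024, 1, 16, 12, 0, tzinfo=timezone.utc)
news_xml = build_news_sitemap(
    [
        {"url": "https://example.com/news/fresh/", "title": "Fresh story",
         "published": datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc)},
        {"url": "https://example.com/news/old/", "title": "Old story",
         "published": datetime(2024, 1, 10, 8, 0, tzinfo=timezone.utc)},
    ],
    publication_name="Example Times",
    language="en",
    now=fixed_now,
)
print(news_xml.decode("utf-8"))
```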
Conclusions
An XML Sitemap is one of the components of well-executed technical SEO that you can implement on your site today. And if you already have a Sitemap, it is worth making sure it contains no errors and checking whether all the pages are properly indexed.
If your website looks confusing or you simply don’t have the time to figure out what to add, get in touch with us. We’ll happily answer all your questions and help you create your Sitemap.