- Introduction to XML sitemaps
- Sitemap guidelines: Limits and add-ons
- Where can I find my sitemap?
- 10 XML Sitemap best practices to follow for webmasters
Creating and optimising sitemaps is an essential yet overlooked search engine optimisation (SEO) practice.
Sitemap problems are common among our new clients, primarily because sitemaps are not required, so adding one to a website is rarely a priority. Unfortunately, we’ve also seen cases where existing sitemaps are forgotten about, which can negatively affect SEO.
Sitemaps are important for your website and for search engines, as they give crawl bots an easy way to understand your site’s architecture. Sitemaps also provide useful metadata, including:
- The last time a web page was updated
- How often web pages are updated
- Relationship between pages on your site
With that being said, are you ready to learn how to optimise sitemaps?
In this ultimate guide to XML sitemaps, we’ll cover basic sitemap guidelines and best practices to follow to ensure your sitemaps are correctly optimised for users and search engines. Let’s begin!
Sitemaps: An Introduction
A sitemap lists the web pages, videos and other files on your website. It helps search engines understand the structure of your site and the relationships between its URLs, so they can crawl it more efficiently.
While a proper internal linking structure should ensure all your pages are reached from your website’s menu, a sitemap can improve crawling speed, especially for large websites.
Google supports various sitemap formats, including RSS feeds and .txt files, but most sitemaps are XML files.
What Does an XML Sitemap Look Like?
You no longer have to be a web developer to create a sitemap. Still, as the best way to understand a sitemap’s structure is to explore an existing one, we’ve prepared a standard XML sitemap example as shown on Sitemaps.org that can be used as a guide:
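As a minimal sketch of that structure, the Python below builds a one-URL sitemap using the element names defined by the Sitemaps.org protocol (urlset, url, loc, lastmod, changefreq and priority). The example.com address, the values and the build_sitemap helper are placeholders for illustration only.

```python
from xml.sax.saxutils import escape


def build_sitemap(entries):
    """Return sitemap XML for a list of dicts with 'loc' and optional tags."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for entry in entries:
        lines.append("  <url>")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                # escape() converts &, < and > so the XML stays valid
                lines.append(f"    <{tag}>{escape(str(entry[tag]))}</{tag}>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)


# Placeholder entry; only 'loc' is required by the protocol.
print(build_sitemap([
    {
        "loc": "https://www.example.com/",
        "lastmod": "2024-01-01",
        "changefreq": "monthly",
        "priority": "0.8",
    }
]))
```

Each url element describes one page, and every tag other than loc is optional.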
Note that creating a simple sitemap is now relatively stress-free, thanks to the various online mapping tools available on the Internet. Screaming Frog is a popular SEO tool that can help you figure out what is best to go in your sitemaps.
Sitemap Guidelines: Limits and Add-ons
When building a sitemap and making it accessible to Google, you should be aware of the following factors to prevent issues down the line.
No sitemap should contain more than 50,000 URLs, and no sitemap file should exceed 50MB (uncompressed).
Even so, you should aim to stay well within these limits to avoid bogging down your web server with large files. If you have a large website with more than 50,000 URLs, split them across multiple sitemaps.
You could also create a sitemap index file containing a list of all your website’s sitemaps and submit this to Google. Whether you choose this method or submit your sitemaps individually, remember to use only canonical URLs and ensure they are properly formatted, with no stray or unescaped characters.
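The splitting and indexing described above might look roughly like this in Python. It is a sketch only: the sitemap-1.xml style file names, the example.com URLs and both helper functions are assumptions, not a prescribed layout.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # the per-file URL limit mentioned above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive groups of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document listing each child sitemap."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for loc in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(loc)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


# Hypothetical site with 120,000 product URLs.
urls = [f"https://www.example.com/product/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
index = build_sitemap_index(
    f"https://www.example.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)
)
print(len(chunks), "sitemaps listed in the index")
```

Each chunk would then be written to its own sitemap file, and only the index needs to be submitted.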
You can also use your sitemap to point to various media types, including images and videos, and specify what language is used on a specific web page using hreflang tags.
Where Can I Find My Sitemap?
In most cases, a website’s sitemap is located at the root. For example, you can find Soar Online’s sitemap by visiting: https://soaronline.co.uk/sitemap_index.xml.
If you don’t have a sitemap index, the URL will take the following format: https://soaronline.co.uk/sitemap.xml.
However, if your sitemap is housed in a folder, it will follow this structure: domainname/folder/sitemap.xml. Also, a sitemap located in a folder can only list URLs within that folder, so if you have one folder for products and another for pages, the two will never overlap.
If you’re still unsure about where to find your sitemap, you could also check your robots.txt file, which is also located at the root of your site, e.g. https://soaronline.co.uk/robots.txt. Alternatively, go to Google Search Console (GSC) and click the “Sitemaps” tab housed under “Index” in the left-hand sidebar to see if a previously submitted sitemap is available.
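To illustrate the robots.txt route, here is a small Python sketch that pulls the sitemap locations out of a robots.txt body by looking for "Sitemap:" lines. The sample file contents and the sitemaps_from_robots helper are made up for the example; a real script would first download the file from your own domain.

```python
def sitemaps_from_robots(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


# Hypothetical robots.txt contents.
sample = """\
User-agent: *
Disallow: /wp-admin/
Sitemap: https://www.example.com/sitemap_index.xml
"""
print(sitemaps_from_robots(sample))
```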
Now that we’ve covered the basics, it’s time to delve into sitemap best practices to help you take maximum advantage and enhance your indexability.
10 Sitemap Best Practices to Follow
While there are many sitemap best practices to follow, we’ve compiled some of the most valuable and fundamental methods to help improve your SEO.
#1 Prioritise High-Quality Pages
As with most SEO practices, quality is vital.
Avoid compiling hundreds or thousands of low-quality pages in your sitemap, as search bots may interpret your site as low-quality overall. Submit the most valuable URLs in your sitemap, such as:
- Properly optimised pages, images and video
- Unique content
- Content that would prompt engagement
- Canonicalised URLs
#2 Isolate Problematic Pages
Sometimes search engines won’t index all of your pages, and even using tools like Google Search Console to mitigate the issue can be frustrating as it won’t isolate the problem pages for you.
Large websites often encounter this issue, especially eCommerce sites, which have multiple pages for each product.
That’s why SEO experts, such as ourselves, recommend creating multiple smaller sitemaps and categorising them by type – page, product, portfolio and so on – to help identify which web pages are having indexing issues. For example, it could be that some of your product pages aren’t getting indexed because they don’t contain an image.
Once you’ve diagnosed the main issue, you can then take appropriate action, which may be to fix those issues or assign a noindex tag.
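One way to sketch the categorisation step is to group URLs by their first path segment, so each group can go into its own sitemap. The /product/ and /portfolio/ paths, the group_by_section helper and the sitemap-<type>.xml naming are assumptions for illustration; your own URL structure may call for different rules.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls, default="page"):
    """Group URLs by first path segment, e.g. /product/red-shoe/ -> 'product'."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # URLs with one segment (or none) are treated as top-level pages.
        section = segments[0] if len(segments) > 1 else default
        groups[section].append(url)
    return dict(groups)


urls = [
    "https://www.example.com/",
    "https://www.example.com/about/",
    "https://www.example.com/product/red-shoe/",
    "https://www.example.com/product/blue-shoe/",
    "https://www.example.com/portfolio/acme/",
]
for section, members in group_by_section(urls).items():
    print(f"sitemap-{section}.xml: {len(members)} URLs")
```

With one sitemap per type, the per-sitemap indexing figures in Google Search Console show which group is struggling.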
#3 Robots Meta Tag vs Robots.txt
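If you don’t want a page to be indexed, use a robots meta tag rather than blocking the page in robots.txt. Bots have to crawl a page to see the tag, so a page that is blocked in robots.txt can still end up indexed if other pages link to it. Use “noindex,nofollow” if you also don’t want the page’s links to be followed, and bear in mind that tagged pages are still crawled, so they continue to use crawl budget.
A “noindex,follow” tag will prevent search bots from indexing the specified page but preserve link juice, which is handy for utility and footer pages such as Terms and Conditions or About Us pages.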
Robots.txt files are better for disallowing a whole site section, such as a category page. Bots will also interpret the robots meta tag as a firm instruction, whereas the robots.txt file directive is more suggestive.
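If you want to check which directives a page actually carries before listing it in a sitemap, a rough Python sketch using the standard html.parser module could look like this. The RobotsMetaParser class, the robots_directives helper and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page markup.
page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print("noindex" in robots_directives(page))
```

A page whose directives include noindex should normally be left out of the sitemap.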
#4 Create Dynamic XML Sitemaps for Large Sites
If you manage a large website, you should create a dynamic sitemap with rules to determine when a page should be included in your sitemap or marked as indexable.
Dynamically generating a sitemap is usually much quicker and requires fewer resources than maintaining a static sitemap. Dynamic sitemaps are always up to date, too, as they are created every time they’re requested, meaning they more accurately reflect the state of your website.
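As a rough illustration of the idea, the Python sketch below renders a sitemap from page records on demand and applies a simple inclusion rule. The Page record, its published and noindex fields, and the in-memory list standing in for a CMS database are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from xml.sax.saxutils import escape


@dataclass
class Page:
    """Hypothetical record for one page, e.g. a row from a CMS database."""
    url: str
    last_modified: date
    published: bool = True
    noindex: bool = False


def is_indexable(page):
    """Inclusion rule: only published pages without a noindex flag."""
    return page.published and not page.noindex


def render_sitemap(pages):
    """Build the sitemap XML from the current page records on each request."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for page in pages:
        if is_indexable(page):
            lines.append("  <url>")
            lines.append(f"    <loc>{escape(page.url)}</loc>")
            lines.append(f"    <lastmod>{page.last_modified.isoformat()}</lastmod>")
            lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)


pages = [
    Page("https://www.example.com/", date(2024, 3, 1)),
    Page("https://www.example.com/draft/", date(2024, 3, 2), published=False),
    Page("https://www.example.com/terms/", date(2023, 1, 15), noindex=True),
]
print(render_sitemap(pages))
```

In a real site, render_sitemap would be called by whatever handles requests for the sitemap URL, so the output always reflects the current records.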
#5 Get Rid of Index Bloat
Have you come across any links that are not relevant to search engines within your sitemap? We advise removing them.
Remember, only high-quality, important pages need to be included in your sitemap, as search bots will still crawl other URLs or sections linked from those pages. Disallow or remove sections of your website where duplication occurs, such as blog tags and author URLs.
If you have a dynamic sitemap, be mindful of child sitemaps generated from web pages you wouldn’t expect to see and remove those.
#6 Be Aware of Missing Pages or Sections
Double-check your sitemaps to ensure all the necessary pages are included in case you’ve accidentally removed key web pages or sections, especially landing pages.
#7 Updating Publication Times
Fresh content will help generate new traffic; however, you shouldn’t try to trick search engines into re-indexing pages you haven’t substantially changed, as Google could end up removing the date stamps.
Only update modification times for posts, pages and products that you’ve actually changed.
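One possible way to enforce this is to store a hash of each page’s content and only move the modification date when the hash changes. The record layout and the refresh_lastmod helper below are hypothetical, and note the assumption that any change to the content counts; deciding what is “substantial” remains an editorial call.

```python
import hashlib
from datetime import date


def refresh_lastmod(record, new_content, today):
    """Update record['lastmod'] only if the content differs from the stored hash.

    `record` is a dict with optional 'content_hash' and 'lastmod' keys.
    Returns True when the modification date was changed.
    """
    new_hash = hashlib.sha256(new_content.encode("utf-8")).hexdigest()
    if new_hash != record.get("content_hash"):
        record["content_hash"] = new_hash
        record["lastmod"] = today.isoformat()
        return True
    return False


record = {}
refresh_lastmod(record, "Original article text", date(2024, 1, 10))
# Re-saving the same text later leaves the date untouched.
changed = refresh_lastmod(record, "Original article text", date(2024, 2, 20))
print(changed, record["lastmod"])
```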
#8 Only Include Canonical Versions of URLs
If you have multiple similar pages, for example, product pages for the same item in different colours, you should assign a “link rel=canonical” tag to these pages, which will tell search bots which page they should crawl and index. Only that canonical version belongs in your sitemap.
Most websites tend to use a self-referring canonical tag, but as long as the canonical doesn’t point to an irrelevant page, or worse, a 404 page, the tag will make it easier for crawl bots to discover your most important pages.
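A simple filter along these lines keeps only the URLs whose declared canonical is the URL itself. The mapping of page URL to canonical URL is assumed to have been collected already (for example, from a crawl), and the canonical_only helper and colour-variant URLs are invented for the example.

```python
def canonical_only(canonicals):
    """Keep URLs whose declared canonical is the URL itself.

    `canonicals` maps each page URL to the canonical URL declared on that page.
    """
    return [url for url, canonical in canonicals.items() if canonical == url]


# Hypothetical colour variants that all point at one canonical product page.
canonicals = {
    "https://www.example.com/shoe/": "https://www.example.com/shoe/",
    "https://www.example.com/shoe/?colour=red": "https://www.example.com/shoe/",
    "https://www.example.com/shoe/?colour=blue": "https://www.example.com/shoe/",
}
sitemap_urls = canonical_only(canonicals)
print(sitemap_urls)
```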
#9 Don’t Use Multiple URL Formats
Only include absolute URLs in your sitemap – links that contain all the information necessary to locate the source – and make sure these are in your preferred URL format.
What does this mean?
Well, if you manage an HTTPS website, the URLs in your sitemap should also use that protocol. The same goes for whether your website uses www., and trailing or non-trailing forward slashes at the end of URLs – make sure your sitemap URLs match.
If you’ve recently migrated to the secure protocol, but the URLs in your sitemap are still showing HTTP, check whether a plugin is causing conflict. It’s a common issue for WordPress websites, so you must check your sitemap after performing a website migration.
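A quick consistency check like the Python sketch below can flag sitemap URLs that don’t match your preferred format. The preferences shown (HTTPS, www. and trailing slashes) are only example defaults, and the format_problems helper is hypothetical; set the arguments to whatever your site actually uses.

```python
from urllib.parse import urlparse


def format_problems(url, scheme="https", use_www=True, trailing_slash=True):
    """Return a list of ways `url` deviates from the preferred URL format."""
    parts = urlparse(url)
    if not parts.scheme or not parts.netloc:
        return ["not an absolute URL"]
    problems = []
    if parts.scheme != scheme:
        problems.append(f"scheme is {parts.scheme}, expected {scheme}")
    if parts.netloc.startswith("www.") != use_www:
        problems.append("www. prefix does not match preference")
    path = parts.path or "/"
    # The bare root path is skipped; only deeper paths are checked for slashes.
    if path != "/" and path.endswith("/") != trailing_slash:
        problems.append("trailing slash does not match preference")
    return problems


for url in ("https://www.example.com/blog/", "http://example.com/blog", "/blog/"):
    print(url, format_problems(url))
```

Running a check like this after a migration makes leftover HTTP or non-www URLs easy to spot.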
#10 Don’t Create a Sitemap If You Don’t Have to
While the sitemap is essentially a directory of all your web pages, helping users quickly find the content they’re looking for, and search engines understand your site architecture, not everyone needs one.
If you own a small website or rarely update your site, search bots will have no problem crawling and indexing your web pages, providing you have a well-organised linking structure.
However, large websites with hundreds or thousands of pages, or websites that publish content regularly, will reap the benefits of using a sitemap.
If you think having a sitemap would be valuable, remember to correctly format any URLs you submit and stay within the size limits to avoid putting any strain on your web server.