Ultimate Guide on How to Optimise XML Sitemaps

  • Introduction to XML sitemaps
  • Sitemap guidelines: Limits and add-ons
  • Where can I find my sitemap?
  • 10 XML Sitemap best practices to follow for webmasters


Creating and optimising sitemaps is an essential yet overlooked search engine optimisation (SEO) practice.

Sitemap problems are common among our new clients, primarily because sitemaps are not required, so adding one to a website is rarely a priority. Unfortunately, we’ve also seen cases where existing sitemaps are forgotten about, which can negatively affect SEO.

Sitemaps are important for your website and for search engines because they give crawl bots an easy way to understand your site’s architecture. Sitemaps also provide useful metadata, including:

  • The last time a web page was updated
  • How often web pages are updated
  • Relationship between pages on your site

With that being said, are you ready to learn how to optimise sitemaps?

In this ultimate guide to XML sitemaps, we’ll cover basic sitemap guidelines and best practices to follow to ensure your sitemaps are correctly optimised for users and search engines. Let’s begin!

Sitemaps: An Introduction

A sitemap lists the web pages, videos and other files on your website. It helps search engines understand the structure of your site and the relationship between its URLs, so they can crawl your website more efficiently.

While a proper internal linking structure should ensure all your pages are reached from your website’s menu, a sitemap can improve crawling speed, especially for large websites.

Google supports various sitemap formats, including RSS feeds and .txt files, but most sitemaps are formatted as XML files.

What Does an XML Sitemap Look Like?

You no longer have to be a web developer to create a sitemap. Still, the best way to understand a sitemap’s structure is to explore an existing one, so we’ve prepared a standard XML sitemap example that can be used as a guide:
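
As a minimal sketch of that structure, the Python snippet below builds a two-URL sitemap using only the standard library. The example.com URLs and the build_sitemap helper are illustrative assumptions rather than anything taken from a real site; the urlset, url, loc and lastmod elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string for a list of (loc, lastmod) pairs.

    lastmod is optional; pass None to leave it out for that URL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(urlset)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, for illustration only.
print(build_sitemap([
    ("https://www.example.com/", "2022-01-01"),
    ("https://www.example.com/about/", None),
]))
```

Each url element holds one page: loc is the only required child, and lastmod is one of the optional metadata fields mentioned earlier.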





Note that creating a simple sitemap is now relatively stress-free, thanks to the various online mapping tools available. Screaming Frog is a popular SEO tool that can help you decide which URLs belong in your sitemaps.

Sitemap Guidelines

When building a sitemap and making it accessible to Google, you should be aware of the following factors to prevent issues down the line.

Size Limitation

No sitemap should contain more than 50,000 URLs, and each file size should not exceed 50MB (uncompressed).

Even with this requirement, you should aim to be well within the limit to avoid bogging down your web server with large files. If you have a large website with more than 50,000 URLs, create multiple sitemaps to prevent potential issues.

You could also create a sitemap index file containing a list of all your website’s sitemaps and submit this to Google. Whether you choose this method or submit your sitemaps individually, remember to use only canonical URLs and ensure that they are correctly formatted, with no stray characters.
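
A rough sketch of how that split might work in Python: the URL list is cut into chunks of at most 50,000, and a sitemap index then points at one file per chunk. The file names (sitemap-1.xml and so on) and the example.com domain are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # the per-file URL limit described above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (XML string) listing each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="unicode")


# Hypothetical site with 120,001 URLs, which needs three sitemap files.
all_urls = [f"https://www.example.com/page-{n}/" for n in range(120_001)]
chunks = chunk_urls(all_urls)
index_xml = build_sitemap_index(
    f"https://www.example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
)
```

This sketch only checks the URL count. A real generator would also need to keep each uncompressed file under the 50MB limit.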

Optional Add-ons

You can also use your sitemap to point to various media types, including images and videos, and specify what language is used on a specific web page using hreflang tags.
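
To illustrate the hreflang add-on, here is a hedged Python sketch that generates a sitemap in which every language version of a page lists all of its alternates (itself included) as xhtml:link elements. The language codes and example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialise sitemap elements without a prefix and alternates as xhtml:link.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_hreflang_sitemap(alternates):
    """Return sitemap XML for one page available in several languages.

    `alternates` maps a language code to the URL of that language version.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # Every version lists the full set of alternates, itself included.
        for lang, href in alternates.items():
            ET.SubElement(
                url, f"{{{XHTML_NS}}}link",
                rel="alternate", hreflang=lang, href=href,
            )
    return ET.tostring(urlset, encoding="unicode")


hreflang_xml = build_hreflang_sitemap({
    "en-gb": "https://www.example.com/en/page/",
    "de": "https://www.example.com/de/page/",
})
```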


Where Can I Find My Sitemap?

In most cases, a website sitemap is located at the root. For example, you can find Soar Online’s sitemap by visiting:

If you don’t have a sitemap index, the URL will take on the following format “

However, if your sitemap is housed in a folder, it would follow this structure: domainname/folder/sitemap.xml. Also, a sitemap located in a folder can only list URLs within that folder, so if you have one folder for products and another for pages, the two will never overlap.

If you’re still unsure about where to find your sitemap, you could also check your robots.txt file, which is also located at the root of your site. Alternatively, go to Google Search Console (GSC) and click the “Sitemaps” tab housed under “Index” in the left-hand sidebar to see if a previously submitted sitemap is available.
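
If you would rather check robots.txt programmatically, the sketch below pulls out any Sitemap lines from the file’s text. The sample robots.txt content and the sitemaps_from_robots helper are made up for illustration; fetching the file from your own domain is left out.

```python
def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    found = []
    for line in robots_txt.splitlines():
        # Drop comments. Simplification: assumes sitemap URLs contain no "#".
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


# Hypothetical robots.txt content.
sample = """\
User-agent: *
Disallow: /wp-admin/

# Sitemaps
Sitemap: https://www.example.com/sitemap_index.xml
sitemap: https://www.example.com/news-sitemap.xml
"""
print(sitemaps_from_robots(sample))
```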

Now that we’ve covered the basics, it’s time to delve into sitemap best practices to help you take maximum advantage and enhance your indexability.

10 Sitemap Best Practices to Follow

While there are many sitemap best practices to follow, we’ve compiled some of the most valuable and fundamental methods to help improve your SEO.

#1 Prioritise High-Quality Pages

As with most SEO practices, quality is vital.

Avoid compiling hundreds or thousands of low-quality pages in your sitemap, as search bots may interpret your site as low quality overall. Submit the most valuable URLs in your sitemap, such as:

  • Properly optimised pages, images and video
  • Unique content
  • Content that would prompt engagement
  • Canonicalised URLs

#2 Isolate Problematic Pages

Sometimes search engines won’t index all of your pages, and diagnosing the issue with tools like Google Search Console can be frustrating, as they won’t isolate the problem pages for you.

Large websites often encounter this issue, especially eCommerce sites, which have multiple pages for each product.

That’s why SEO experts, such as ourselves, recommend creating multiple smaller sitemaps categorised by type (page, product, portfolio, etc.) to help identify which web pages are having indexing issues. For example, it could be that some of your product pages aren’t getting indexed because they don’t contain an image.

Once you’ve diagnosed the main issue, you can then take appropriate action, which may be to fix those issues or assign a noindex tag.
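
One possible way to split URLs by type is to group them by their first path segment, as in this Python sketch. It assumes your URL structure mirrors your content types (for example /products/... and /portfolio/...), which will not hold for every site; the URLs shown are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls, default="pages"):
    """Group URLs by first path segment; top-level URLs go to `default`."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # A URL needs a parent folder to count as part of a section.
        section = segments[0] if len(segments) > 1 else default
        groups[section].append(url)
    return dict(groups)


groups = group_by_section([
    "https://www.example.com/",
    "https://www.example.com/about/",
    "https://www.example.com/products/red-shoe/",
    "https://www.example.com/products/blue-shoe/",
    "https://www.example.com/portfolio/case-study/",
])
# Each key could then become its own file, e.g. products-sitemap.xml.
```

With one sitemap per group, the indexing figures reported for each file point you at the section that needs attention.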

#3 Robots Meta Tag vs Robots.txt

If you don’t want a page to be indexed or its links to be followed, use the meta robots “noindex,nofollow” tag rather than blocking the page using robots.txt. Keep in mind that bots must still crawl a page to read the tag, so a noindexed page continues to use crawl budget.

If you want to keep a page out of the index but preserve link juice, use “noindex,follow” instead, which is handy for utility and footer pages such as Terms and Conditions or About Us pages.

Robots.txt files are better for disallowing a whole site section, such as a category. Bots will interpret the robots meta tag as a firm instruction not to index a page, whereas a robots.txt directive only stops crawling, so a blocked URL can still end up indexed if other pages link to it.

#4 Create Dynamic XML Sitemaps for Large Sites

If you manage a large website, you should create a dynamic sitemap with rules to determine when a page should be included in your sitemap or marked as indexable.

Dynamically generating a sitemap is usually much quicker and requires fewer resources than maintaining a static sitemap. Dynamic sitemaps are also always up to date, as they are created every time they are requested, meaning they more accurately reflect the state of your website.
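
The rules behind a dynamic sitemap can be very simple. The following Python sketch shows one possible set: include a page only if it returns a 200 status, is not marked noindex, and is its own canonical. The Page fields and the rules themselves are assumptions for illustration; your CMS will expose this information differently.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical record of what the site knows about a page."""
    url: str
    status: int = 200
    noindex: bool = False
    canonical: str = ""  # empty means no canonical tag is set


def should_include(page):
    """Apply the inclusion rules to a single page."""
    if page.status != 200:
        return False  # redirects and error pages stay out
    if page.noindex:
        return False  # noindexed pages stay out
    if page.canonical and page.canonical != page.url:
        return False  # only the canonical version is listed
    return True


def sitemap_urls(pages):
    """Return the URLs that belong in the sitemap right now."""
    return [p.url for p in pages if should_include(p)]


pages = [
    Page("https://www.example.com/shoes/"),
    Page("https://www.example.com/old-page/", status=404),
    Page("https://www.example.com/terms/", noindex=True),
    Page("https://www.example.com/shoes/?colour=red",
         canonical="https://www.example.com/shoes/"),
]
print(sitemap_urls(pages))
```

Because the list is rebuilt from the current page data on every request, a page that starts returning a 404 or gains a noindex tag drops out of the sitemap automatically.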

#5 Get Rid of Index Bloat

Have you come across any links that are not relevant to search engines within your sitemap? We advise removing them.

Remember, only high-quality, important pages need to be included in your sitemap, as search bots will still crawl other URLs or sections linked to those pages. Disallow or remove sections of your website where duplication occurs, such as blog tag and author URLs.

If you have a dynamic sitemap, be mindful of child sitemaps generated from web pages you wouldn’t expect to see and remove those.

#6 Be Aware of Missing Pages or Sections

Double-check your sitemaps to ensure all the necessary pages are included in case you’ve accidentally removed key web pages or sections, especially landing pages.

#7 Updating Publication Times

Fresh content will help generate new traffic; however, you shouldn’t try to trick search engines into re-indexing pages that you haven’t substantially changed, as Google could end up removing the date stamps.

Only update modification times for posts, pages and products that you’ve actually repurposed.
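
One way to enforce this is to store a fingerprint of each page’s content and only move the modification date when the fingerprint changes. The sketch below assumes you keep the previous hash and date somewhere (a database, for instance); the function names are illustrative. Note that a hash detects any change, however small, so deciding what counts as substantial still needs human judgement.

```python
import hashlib


def content_fingerprint(body):
    """Return a SHA-256 hex digest of the page content."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def updated_lastmod(old_hash, old_lastmod, body, today):
    """Return (hash, lastmod), changing lastmod only if the content changed."""
    new_hash = content_fingerprint(body)
    if new_hash == old_hash:
        return old_hash, old_lastmod  # nothing changed: keep the old date
    return new_hash, today


# Hypothetical page content and dates.
original = "Our guide to XML sitemaps."
stored_hash = content_fingerprint(original)

unchanged = updated_lastmod(stored_hash, "2022-01-10", original, "2022-03-01")
changed = updated_lastmod(stored_hash, "2022-01-10", original + " Now updated.", "2022-03-01")
```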

#8 Only Include Canonical Versions of URLs

If you have multiple pages which are similar, for example, product pages for the same item in different colours, you should assign a “link rel=canonical” tag to these pages, which will tell search bots which page they should crawl and index. Only that canonical version should then appear in your sitemap.

Most websites tend to use a self-referring canonical tag, but as long as the canonical doesn’t point to an irrelevant page, or worse, a 404 page, the tag will make it easier for crawl bots to discover your most important pages.

#9 Don’t Use Multiple URL Formats

Only include absolute URLs in your sitemap – links that contain all the information necessary to locate the source – and make sure these are in your preferred URL format.

What does this mean?

Well, if you manage an HTTPS website, the URLs in your sitemap should also use that protocol. The same goes for whether your website uses www., and trailing or non-trailing forward slashes at the end of URLs: make sure your sitemap URLs match.

If you’ve recently migrated to the secure protocol, but the URLs in your sitemap are still showing HTTP, check whether a plugin is causing a conflict. It’s a common issue for WordPress websites, so always check your sitemap after performing a website migration.
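
A quick consistency check along these lines could look like the Python sketch below. The preferred scheme, host and trailing-slash setting are assumptions you would replace with your own, and the check is deliberately simple: with trailing slashes required, it would also flag file URLs such as a PDF.

```python
from urllib.parse import urlparse


def matches_preferred_format(url, scheme="https", host="www.example.com",
                             trailing_slash=True):
    """Return True if an absolute URL uses the preferred scheme, host and slash style."""
    parts = urlparse(url)
    if parts.scheme != scheme or parts.netloc != host:
        return False  # also rejects relative URLs, which have no scheme
    path = parts.path or "/"
    if path == "/":
        return True  # the homepage is fine either way
    return path.endswith("/") == trailing_slash


# Hypothetical sitemap URLs, some of them in the wrong format.
sitemap_urls = [
    "https://www.example.com/blog/",
    "http://www.example.com/blog/",   # wrong protocol
    "https://example.com/blog/",      # missing www.
    "https://www.example.com/blog",   # missing trailing slash
    "/blog/",                         # relative, not absolute
]
mismatched = [u for u in sitemap_urls if not matches_preferred_format(u)]
print(mismatched)
```

Running a check like this after a migration gives you a list of the URLs that still need correcting.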

#10 Don’t Create a Sitemap If You Don’t Have to

While the sitemap is essentially a directory of all your web pages, helping users quickly find the content they’re looking for, and search engines understand your site architecture, not everyone needs one.

If you own a small website or rarely update your site, search bots will have no problem crawling and indexing your web pages, provided you have a well-organised linking structure.

However, large websites with hundreds or thousands of pages, or websites that publish content regularly, will reap the benefits of using a sitemap.

If you think having a sitemap would be valuable, remember to correctly format any URLs you submit and stay within the size limits to avoid putting any strain on your web server.
