- What is crawling and indexing?
- How can I check whether my website is indexed with search engines?
- Common reasons why Google might not be indexing your web pages
- How to get indexed by Google
If Google doesn’t index your website, you won’t show up in search results for queries or generate organic traffic. So in a sense, your website is practically invisible.
Despite their owners' best efforts, many web pages never get indexed by Google, especially those belonging to large websites.
Usually, indexing issues come down to the following factors:
- Robots.txt files
- Crawl budget
- Noindex tags
- Rogue canonicals
- Nofollow internal links
- A lack of high-quality backlinks
- Value and quality of content on a web page
Although many SEO professionals believe that indexing problems are purely technical, that is a myth. While poor technical signals can affect indexing, it's just as important to have valuable, high-quality content on your web pages.
We’re going to address some of the more common indexing issues and how to mitigate them to improve your website’s indexability.
Crawling and Indexability: What are they?
Before we delve into how to speed up your website’s indexation, it’s essential to understand how search engines crawl and index your site.
Google crawls websites with a web spider called Googlebot, which follows links through your website, reading and processing the data it finds into a collective index.
This crawling allows Google to maintain an index containing hundreds of billions of web pages, and to retrieve and organise relevant results from it in a fraction of a second whenever someone searches.
Because millions of pages on the web contain similar information, Google orders search results by relevance, hence the ranking algorithm. Don't get indexing and ranking confused. The former refers to having a place in the index, whereas the latter refers to your positioning in search engine result pages (SERPs).
Still, you can’t generate any visibility if you fail to show up on SERPs in the first place.
How to Check if Your Site is Indexed on Google
There are several ways to check if your site and web pages are indexed with Google.
You could use the "site:" search operator to check if you've been indexed, or use Google Search Console (GSC) to inspect your URLs.
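For example, running queries like these in Google's search box gives a rough view of what's indexed (yourdomain.com is a placeholder):

```txt
site:yourdomain.com           # all pages Google has indexed for the domain
site:yourdomain.com/blog/     # indexed pages within a specific section
```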
The site: search indexing method will roughly indicate how many of your web pages Google has indexed. However, Google Search Console provides a more comprehensive overview of the status of your website in Google, including information on:
- Mobile usability
- Web page specification, e.g. is it a product, page or post
- Review snippets
- Sitelinks search box
You can also use Google Search Console to assess the performance of your web pages in search, identify the number of valid pages (with and without warnings) and establish the number of URLs that are good, poor or need improvement.
The Overview tab will provide you with a quick rundown of the number of valid URLs. As long as this number is not zero, Google has indexed at least one of your pages.
If you want to inspect your URLs more closely, use the URL Inspection Tool. If a page is indexed with Google, a green tick will appear in the first box with text reading “URL is on Google.”
Why Isn’t Google Indexing My Web Pages?
There could be several reasons why Google isn't indexing your web pages. Here, we'll cover the most common statuses GSC reports for unindexed pages.
While the information GSC provides may not resolve the issue on its own, it's an excellent place to start your diagnosis.
#1 “Crawled – Currently Not Indexed”
The “Crawled – Currently Not Indexed” status shows that Google has crawled the page but decided not to include it in its index.
COVID-19 has caused an e-commerce boom, and as a result, Google has become pickier when it comes to the quality of web pages. So if you notice that your URLs have been assigned a “Crawled – Currently Not Indexed” status, your next step should be to consider reasons why Google decided against indexing your page.
Have you got the correct canonical tags in place? Is there duplication? Is a noindex tag assigned or a crawl block in your robots.txt file?
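If you suspect a noindex tag or a rogue canonical, view the page's source and look for snippets like these in the &lt;head&gt; (the URL is a placeholder):

```html
<!-- This tag tells Google not to index the page -->
<meta name="robots" content="noindex">

<!-- A canonical pointing at a different URL tells Google to index that page instead -->
<link rel="canonical" href="https://yourdomain.com/some-other-page/">
```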
#2 “Discovered – Currently Not Indexed”
If Google Search Console has given your URL “Discovered – Currently Not Indexed” status, this means that Google knows about your URL but hasn’t crawled the page yet.
Our readers with small websites (under 10,000 pages) will probably find that the issue resolves itself after Google performs another crawl. However, small websites that keep receiving this status, and larger sites in general, should act immediately, as this can become a serious problem.
Google may report pages as “Discovered – Currently Not Indexed” due to the following:
- An overloaded server
- Content overload
- Poor-quality content
- Poor internal linking status
Identify whether patterns are occurring between pages that fall into this category to help you find a solution to the issue.
It might be that you have an insufficient crawl budget, which will prevent Googlebot from crawling your website efficiently. On the other hand, if you don't optimise your crawl budget, crawling spiders could end up spending a disproportionate amount of time on web pages that don't matter, potentially leaving important parts of your website undiscovered.
If your crawl budget is wasted on unimportant pages, this will hurt your overall SEO strategy. To prevent this problem from arising, optimise your crawl budget and ensure you have a solid internal linking structure that will allow a Googlebot to navigate your website and index the pages that matter efficiently.
#3 “Duplicate Content”
Duplicate content can harm your website's SEO and prevent search engines from indexing your web pages.
Note that duplicate content can be caused by various factors irrespective of whether it was maliciously or non-maliciously created. Non-malicious duplication could include:
- Content found via discussion forums
- Items in an online store that are linked to by multiple distinct URLs
- Printer-only versions of web pages
- Language variations, e.g. having several versions of the same pages in different languages to target an international audience
Malicious forms of duplication may include:
- Using content from competitor websites, e.g. manufacturer product descriptions
- Copying content to drive traffic numbers higher
- Using duplicate content to mislead search engines and users
Marketers and web administrators should avoid duplicate content to protect their SEO and improve the chances of having their web pages indexed. Instead, create unique content and use 301 redirects to consolidate duplicate pages into a single version that provides value to users.
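As a sketch, a 301 redirect from a duplicate URL to the preferred version might look like this on an Apache server (the paths are hypothetical; other servers have equivalent directives):

```apache
# .htaccess – permanently redirect the duplicate URL to the preferred page
Redirect 301 /old-duplicate-page/ https://yourdomain.com/preferred-page/
```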
How to Get Indexed by Google
Organic search traffic is fundamental to business growth, so the faster your site gets indexed, the more time you have to increase visibility, build your audience, and drive conversions.
Going to Google Search Console (GSC) as your first port of call is good practice when publishing a new post, page or product on your site: requesting indexing via the URL Inspection tool effectively tells Google to go and take a look at the web page.
However, requesting indexing will not fix underlying problems with older pages. If you’ve come across invalid pages in GSC and are unsure about where to start, we’ve provided a checklist below to help you diagnose and resolve the issue:
- Remove crawl blocks in your robots.txt file
- Include the page in your sitemap
- Remove rogue canonical tags
- Add “powerful” internal links
- Build high-quality backlinks
#1 Check Whether There is a Crawl Block in Your Robots.txt File
If you’ve published a new page or site and Google hasn’t indexed it, there might be a crawl block in your robots.txt file.
To check whether a robots.txt file is preventing your website or specific web pages from being indexed, visit yourdomain.com/robots.txt.
If a crawl block is in place, one of these two snippets will appear:

```txt
User-agent: Googlebot
Disallow: /
```

```txt
User-agent: *
Disallow: /
```
Essentially, these snippets tell Googlebot (in the first case) or all crawlers (in the second) that they're not allowed to crawl your site; a Disallow rule with a more specific path blocks just that section. Once you remove the offending rules, crawlers will have free access to your site.
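If you only want to keep crawlers out of certain sections rather than the whole site, use a narrower rule instead of blocking everything (the /admin/ path here is a hypothetical example):

```txt
User-agent: *
Disallow: /admin/
```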
GSC will also tell you if a specific page is being blocked by a robots.txt file in the “Coverage” block of the tool.
#2 Add Web Pages to Your Sitemap
Sitemaps are critical for healthy indexing as they inform Google about the pages on your website and offer some guidance on how often your site should be re-crawled.
Although Google should discover your pages whether or not they're in your sitemap, listing them there is a helpful way for Googlebot to find them.
To check if a page is in your sitemap, you could either visit yourdomain.com/sitemap.xml or use the URL Inspection tool in Google Search Console. If the URL is not indexed due to a sitemap issue, GSC will show a "Sitemap: N/A" status.
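A minimal sitemap is just an XML file listing the URLs you want crawled; the URL and date below are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/important-page/</loc>
    <lastmod>2021-06-01</lastmod>
  </url>
</urlset>
```

You can also point crawlers at it by adding a `Sitemap: https://yourdomain.com/sitemap.xml` line to your robots.txt file, and submit it in GSC under "Sitemaps".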
#3 Remove Incorrect Canonical Tags
A canonical tag is a snippet of HTML code that allows Google to determine the preferred version of a web page from a set of duplicate pages on your site.
Most pages either have no canonical tag or a self-referencing canonical URL, which is an excellent defensive SEO move, ensuring that multiple versions of the same page don’t get indexed separately.
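A self-referencing canonical sits in the page's &lt;head&gt; and simply points at the page's own preferred URL (the URL here is a placeholder):

```html
<link rel="canonical" href="https://yourdomain.com/this-page/">
```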
If your website has a rogue canonical link, you might be telling Google about a preferred version of a web page that doesn’t exist, resulting in the page you want to get indexed going undiscovered.
You can check whether a URL has a canonical in place by going to the URL Inspection Tool on GSC.
#4 Optimise Your Crawl Budget by Removing Low-Quality Pages
A large number of low-quality pages will waste your crawl budget and run the risk of important pages going undiscovered.
Although Google says that crawl budget isn't something most site owners need to worry about, if you have hundreds or thousands of pages that miss the mark on what they set out to achieve, crawlers might never get round to some of your web pages.
You could optimise your crawl budget by:
- Improving page loading speed
- Putting a crawl block on non-canonicalised web pages
- Monitoring your crawl limit in GSC
- Repurposing stale content
- Reducing redirect chains
While the crawl budget rarely affects smaller websites, it’s still good practice to optimise.
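For instance, if faceted navigation or internal search generates thousands of near-duplicate URLs, you can stop crawlers from wasting budget on them in robots.txt (the paths and parameter below are hypothetical examples; Googlebot supports the * wildcard):

```txt
User-agent: *
Disallow: /search/
Disallow: /*?sort=
```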
#5 Build High-Quality Backlinks
Backlinks are a critical component of Google's algorithm, and building high-quality backlinks will encourage Google to index your pages. After all, if another site refers users to your website, it must hold some value.
Conversely, if you have a bunch of spammy or low-quality backlinks, Google might refuse to index your pages and, in some cases, penalise you. Bear in mind, too, that a backlink only helps once Google has crawled the page it sits on.
Note that Google doesn't only index websites with backlinks – there are still billions of indexed web pages without them. Still, Google is likely to crawl and re-crawl web pages with high-quality backlinks pointing to them faster than those without, leading to faster indexing.