- How to perform a site crawl on Screaming Frog
- Using Screaming Frog to analyse key areas of your website
- How to create custom configurations in Screaming Frog
- Customising robots.txt files and locating orphan pages
If you are new to search engine optimisation (SEO) and are looking for tools to help search engine bots crawl your website, then this post is for you!
We regularly use Screaming Frog at Soar Online to support our technical SEO efforts.
Screaming Frog is a website crawler that lets you run a full technical audit and inspect your website in detail. Thanks to its versatility, you can also use it for other tasks, including:
- Competitor analysis
- Verifying schema markup
Many SEO professionals use Screaming Frog, but the tool can be a challenge if you’re new to it. We’re going to explore a few of our favourite Screaming Frog features and explain how you can use them to get the most out of this SEO Spider tool.
But first, let’s break down the user interface.
Getting Started with Screaming Frog
After installing Screaming Frog, we advise familiarising yourself with the menu, options and settings to help you navigate the platform.
The first control element is the “File” option, which will house your last six crawls and allow you to set default settings for the software.
As the name suggests, the “Bulk Export” option allows you to export multiple URLs and their data in one go.
The “Reports” control element creates downloadable crawl overviews and data reports.
You can construct a website sitemap in the “Sitemaps” control element. Screaming Frog offers various sitemap options, so it’s great for large websites with a complicated site structure.
Screaming Frog has two visualisation types – a directory tree visualisation and crawl visualisation.
While visualisations don’t offer the best way to diagnose issues, they can help provide perspective and highlight underlying patterns in data that may be difficult to identify using traditional reporting methods.
How to Crawl Your Site
When you perform a crawl in Screaming Frog, the software defaults to Spider mode and conducts the audit according to the configurations and filters you have set.
To run an audit, enter your URL in the bar near the top of the software and click “Start”. Alternatively, change the “Mode” to “List” and upload your sitemap, which instructs the platform to crawl the links contained within this blueprint of the website.
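If you would rather prepare the list of URLs yourself before using List mode, a sitemap can be flattened into a plain URL list with a few lines of Python. This is a minimal sketch, not part of Screaming Frog: the sitemap content and the function name `urls_from_sitemap` are made up for illustration, and it assumes a standard sitemaps.org `urlset` file.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> value in the sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, for illustration only.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/services/</loc></url>
</urlset>"""

url_list = urls_from_sitemap(sample_sitemap)
print("\n".join(url_list))  # one URL per line
```

The printed list can be saved as a text file and used wherever a one-URL-per-line list is accepted.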
Now that we’ve covered how to get started with Screaming Frog, let’s delve into how you can use the software to support SEO tasks.
You can use Screaming Frog to identify low-quality backlinks on your website, which you might need to disavow.
Knowing the quality of your backlinks is extremely important, especially if you’re comparing your website performance to competitors or assessing a new client to see how their website is faring.
As of the Screaming Frog 8.0 update, you can now integrate other analytical tools such as Ahrefs, Google Analytics, and Majestic with the SEO spider software. All you’ll need is your API key to connect the accounts.
Majestic has a powerful backlink tracking ability, so we’re going to use it as an example of how an integration works in conjunction with Screaming Frog.
After you connect the two accounts and perform a site audit, Screaming Frog will return its usual data alongside link metrics gathered from Majestic, which appear in the “Link Metrics” tab.
If you’re analysing your own data, you can use backlink metrics to assess your performance against competitors, paying particular attention to the differences in engagement levels across your top-performing content.
You can also look deeper into your internal and external links to see how they’re being used, where they link to and if page authority is being passed down to lower-level pages in your sitemap.
If your images aren’t correctly optimised, your web pages may load slowly when users try to access them.
Use Screaming Frog to check your image sizes and identify any that may slow response times. You can also use Screaming Frog to review ALT text and spot images that fail to display.
To find data on your images, go to the “Images” tab, where you can filter by size or other factors to go through the ones that may be causing issues on your site and optimise accordingly.
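If you export the image data, the same size filter can be applied outside the tool. The sketch below assumes a CSV export with columns named `Address` and `Size (Bytes)`; those column names, the sample rows and the 100 KB threshold are assumptions for illustration, so check them against your own export.

```python
import csv
import io

# Hypothetical export; column names are assumptions and may differ in your file.
sample_export = """Address,Size (Bytes)
https://www.example.com/img/hero.jpg,412000
https://www.example.com/img/logo.png,18000
https://www.example.com/img/team.jpg,150500
"""


def large_images(csv_text, threshold_bytes=100 * 1024):
    """Return (url, size) pairs above the threshold, largest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    heavy = []
    for row in rows:
        size = int(row["Size (Bytes)"])
        if size > threshold_bytes:
            heavy.append((row["Address"], size))
    return sorted(heavy, key=lambda item: item[1], reverse=True)


heavy_images = large_images(sample_export)
for url, size in heavy_images:
    print(f"{size / 1024:.0f} KB  {url}")
```

Sorting largest first puts the images most likely to affect loading speed at the top of your optimisation list.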
Custom Extractions with Regex, CSS and XPath
Screaming Frog scrapes essential information about your website by default. However, if you’re looking for something specific, there are two features you can use to conduct an advanced site crawl and analysis: Custom Searches and Custom Extractions.
You can find the Custom Search feature in the Configuration element in the menu. It allows you to search for a chosen line of text within the source code of your web pages. For example, if you own an eCommerce website, this feature could help you identify which of your products are “Out of Stock”, so you can decide whether each web page is still needed or should be removed.
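Conceptually, a custom search checks each page’s source for a piece of text. The sketch below illustrates that idea in Python; it is not how Screaming Frog implements the feature, and the page contents and the function name `pages_containing` are hypothetical.

```python
def pages_containing(pages, phrase):
    """Return the URLs whose HTML source contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [url for url, html in pages.items() if needle in html.lower()]


# Hypothetical page sources keyed by URL, for illustration only.
sample_pages = {
    "https://www.example.com/product/blue-widget": "<p class='stock'>In Stock</p>",
    "https://www.example.com/product/red-widget": "<p class='stock'>Out of Stock</p>",
    "https://www.example.com/product/green-widget": "<p class='stock'>out of stock</p>",
}

out_of_stock = pages_containing(sample_pages, "Out of Stock")
print(out_of_stock)
```

A plain substring match like this is enough when the wording is consistent across product templates.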
On the other hand, the Custom Extractions feature, which is located under the same Configuration element, collects data from the HTML source using three methods: XPath, CSS path and regex.
XPath, an abbreviation of XML Path Language, selects HTML elements on a web page, so you can extract any information contained in tags such as div, span, p and headings.
Google Chrome also makes it easy to copy an XPath. Right-click the element’s code within the Inspect tool and go to Copy > Copy XPath. You might need to tweak the syntax, but you can then paste the expression into Screaming Frog to perform the extraction.
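To get a feel for what an XPath expression selects, you can try one against a small snippet. This sketch uses Python’s built-in ElementTree module, which supports only a limited subset of XPath and needs well-formed markup, so treat it as an illustration rather than a match for Screaming Frog’s own engine. The HTML snippet and class names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product markup for illustration only.
page = """<html><body>
  <div class="product">
    <h1>Blue Widget</h1>
    <span class="price">19.99 GBP</span>
  </div>
</body></html>"""

root = ET.fromstring(page)

# Select the first <h1> anywhere in the document.
heading = root.find(".//h1").text

# Select the <span> whose class attribute equals "price".
price = root.find(".//span[@class='price']").text

print(heading, "-", price)
```

In the tool itself, the equivalent expressions would be written in full XPath syntax, for example `//span[@class='price']`.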
You can also scrape data from your site using a CSS path, which uses patterns to select elements, with the option of adding an attribute field. This is probably the quickest option for extracting data out of the three methods.
Regex, short for regular expression, is a sequence of characters that defines a search pattern. You can also use Regex to extract schema markup in JSON-LD format and tracking scripts; however, as it is pretty complex, it’s best suited for more advanced users.
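As an illustration of the regex approach, the sketch below pulls JSON-LD blocks out of a page’s source and parses them. The pattern and sample HTML are assumptions written for this example; real pages may need a more forgiving pattern.

```python
import json
import re

# Capture everything between a JSON-LD <script> tag and its closing tag.
JSON_LD_PATTERN = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_json_ld(html):
    """Return each JSON-LD block on the page as a parsed Python object."""
    blocks = []
    for raw in JSON_LD_PATTERN.findall(html):
        try:
            blocks.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
    return blocks


# Hypothetical page source, for illustration only.
sample_html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example Ltd"}
</script>
</head><body></body></html>"""

schema_blocks = extract_json_ld(sample_html)
print([block["@type"] for block in schema_blocks])
```

Parsing the captured text as JSON doubles as a quick validity check on the markup.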
Creating Custom Configurations
Screaming Frog’s 8.0 update now includes a Custom Configurations feature.
If you want to scrape specific information, you might need to set custom configurations to perform an audit in your preferred way. Before the latest update, you would have to add your custom configuration settings each time you wanted to switch between crawls on different sites.
Now you can save your custom configuration profiles in Screaming Frog. All you have to do is go to File > Configuration > Save As.
You can save an unlimited number of these and even share them with other users – useful if other people in your team need to access these settings.
Customising Robots.txt Files
A robots.txt file is a text file that web admins create to instruct search bots how to crawl a site, specifically which URLs they should and should not crawl.
The robots.txt file plays a significant role in the overall management of a website. For example, failure to properly manage disallow entries, which are the rules that tell crawl bots not to visit specific URLs, could prevent critical sections of the site from being crawled.
Screaming Frog allows you to run a site audit and ignore the robots.txt file, to help you identify whether key content is being blocked from crawls and take appropriate action. It’s also a helpful feature to utilise when setting up a new site or performing a site migration.
To set this option, go to Configuration > Robots.txt > Settings.
You can also create and test your own rules against the website’s current robots.txt file within the “Custom” tab of the menu element mentioned above, to see how changes to the file could affect your site.
Running a site audit with the new rule in place will also give you an idea of how search engines would crawl your web pages.
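You can also sanity-check a draft rule set before trying it in the tool. This sketch uses Python’s standard `urllib.robotparser` module; the rules and URLs are hypothetical, and Python’s parser may interpret edge cases differently from Screaming Frog or Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules, for illustration only.
draft_rules = """User-agent: *
Disallow: /checkout/
Disallow: /blog/
"""

parser = RobotFileParser()
parser.parse(draft_rules.splitlines())


def blocked_urls(urls, user_agent="*"):
    """Return the URLs that the draft rules would stop this agent from crawling."""
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


key_pages = [
    "https://www.example.com/",
    "https://www.example.com/blog/post-1",
    "https://www.example.com/checkout/basket",
    "https://www.example.com/services/",
]

blocked = blocked_urls(key_pages)
print(blocked)
```

Here the `/blog/` rule would block a content page, which is the kind of accidental disallow worth catching before the file goes live.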
Sourcing Orphan Pages in Sitemap
An orphan page is a page that search bots cannot find via your internal linking structure, meaning users will also have difficulty accessing these web pages.
Orphan pages can occur for several reasons, including:
- Old pages unlinked but left as published
- Site architecture issues
- CMS creating additional URLs as part of page templates
- Pages that no longer exist but are being linked to via another website
While a small number of orphan pages isn’t a huge problem, it’s important to create a solid internal linking structure to help Google understand and rank your website better.
A solid structure is especially important for your top-level pages, as the more links a page receives, the more important it appears to Google. To discover orphan pages in Screaming Frog, you need additional URL sources, such as sitemaps, Google Search Console or Google Analytics.
Once you’re set, crawl both your site and your sitemap, then head to the “Internal” tab and filter the URLs by HTML for export. Create separate files for the URLs found in your site crawl and in the sitemap, import each file into its own tab within a Google Sheets document and remove any duplicates. URLs that appear in the sitemap but not in the site crawl are your potential orphan pages.
Alternatively, if you’ve previously configured a “Crawl Analysis”, the right-hand pane will show an overview of URLs that require attention post-crawl. This includes Orphan URLs, which you can view by filtering under the Sitemap, Search Console and Analytics tabs.
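If you prefer a script to a spreadsheet, the manual comparison of the two URL lists comes down to a set difference. This is a minimal sketch with made-up URLs; the function names are hypothetical, and the only normalisation applied is trimming whitespace and trailing slashes.

```python
def normalise(url):
    """Trim whitespace and any trailing slash so equivalent URLs compare equal."""
    return url.strip().rstrip("/")


def find_orphan_candidates(sitemap_urls, crawled_urls):
    """Return sitemap URLs that the site crawl never reached, without duplicates."""
    crawled = {normalise(url) for url in crawled_urls}
    listed = {normalise(url) for url in sitemap_urls}
    return sorted(listed - crawled)


# Hypothetical exports, for illustration only.
sitemap_urls = [
    "https://www.example.com/",
    "https://www.example.com/services/",
    "https://www.example.com/old-landing-page/",
    "https://www.example.com/old-landing-page/",  # duplicate entry
]
crawled_urls = [
    "https://www.example.com/",
    "https://www.example.com/services",
]

orphans = find_orphan_candidates(sitemap_urls, crawled_urls)
print(orphans)
```

Treat the result as a list of candidates to review, since a page can be missing from a crawl for other reasons, such as being blocked by robots.txt.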