Maximizing Your Crawl Budget For Efficient Indexing
When it comes to making sure all of the important pages on your website are indexed, there are a variety of things you can do to ensure those pages are crawled both efficiently and frequently. One of the first and most important areas to focus on is identifying Google’s crawl budget for your site and aligning various technical SEO elements to make the most of it.
But, what’s a crawl budget?
Your crawl budget is the number of pages Google will crawl on your site each time its spiders/robots visit.
Sounds pretty simple, right? At a basic level, yes, but this is something many SEOs tend to overlook when in reality it should be a significant factor in determining how you instruct search engines to crawl your site and how you execute a variety of technical SEO initiatives.
How To Determine Your Crawl Budget
Identifying your crawl budget is much simpler than many would assume. All it takes is access to the Google Webmaster Tools (now Google Search Console) account for the website you are working on. To find your crawl budget, all you need to do is navigate to the Crawl Stats report (Crawl > Crawl Stats) in Search Console.
Once you are in the Crawl Stats report, take a look at the top graph and the metrics to the right of it. Google tells you how many pages on your website are crawled on a high, average and low day. The average number of pages crawled per day is your crawl budget.
Now that you’ve identified your crawl budget, how do you make the most of it so that your website is crawled in its entirety and your important pages are all indexed efficiently? Wonder no more: there are several simple technical SEO strategies you are probably already using in some capacity, but may not be aligning with your crawl budget to make the most of it.
Ways To Maximize Your Crawl Budget
Like I just said, there are several simple things you can be doing; we will review seven simple ways to maximize your crawl budget.
Site Architecture & Internal Linking
The architecture of your site and its internal linking structure make a significant impact on how search engines crawl your site. Structuring your site properly with a faceted navigation and categorical structure will influence search engines to crawl your website efficiently. Aligning your site architecture with your internal linking strategy will also maximize the authority distributed throughout your site and increase the authority of deeper pages within your site.
Fix Broken Links & Redirect Chains
Broken links and redirect chains are two elements that can be remedied very easily, but can be a huge waste of your crawl budget if not addressed. Broken links are links that lead to inaccessible pages (i.e. 404s) and consume crawl resources every time a crawler follows them. Fixing broken links ensures that you are not wasting any of your crawl budget. A website crawl and analysis will help you identify the broken links you need to fix.
Redirect chains are also a waste of your crawl budget because they pass the crawler through multiple URLs before it reaches the final live URL you want the spider to crawl. Website crawling software such as Screaming Frog allows you to crawl your site and identify redirect chains that you should fix to make the most of your crawl resources.
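As a rough illustration, here is a minimal Python sketch (assuming the `requests` library is installed) that flags broken links and redirect chains for a handful of URLs. The URL list is a hypothetical placeholder; a real audit would use the URLs exported from your crawler of choice.

```python
# Minimal sketch: flag broken links (4xx/5xx) and redirect chains.
# Assumes the `requests` library is installed; URLs below are placeholders.
import requests

urls_to_check = [
    "https://www.example.com/",
    "https://www.example.com/old-page",
]

for url in urls_to_check:
    try:
        # allow_redirects=True lets requests record each hop in response.history
        response = requests.get(url, allow_redirects=True, timeout=10)
    except requests.RequestException as exc:
        print(f"ERROR  {url} -> {exc}")
        continue

    if response.status_code >= 400:
        # Broken link: the crawler spends budget only to hit an error page.
        print(f"BROKEN {url} -> HTTP {response.status_code}")
    elif len(response.history) > 1:
        # More than one hop before the final URL is a redirect chain.
        hops = " -> ".join(r.url for r in response.history + [response])
        print(f"CHAIN  {hops}")
```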
Page Speed
Page load times are an essential technical SEO element to optimize for a wide variety of reasons, one of which is maximizing your crawl budget. By making your pages load faster, you reduce the amount of time Google takes to crawl each page, allowing you to get more out of the resources Google assigns to crawling your website. A variety of tools can be used to measure your page load speeds and identify areas of improvement, such as Google’s PageSpeed Insights, GTmetrix, Pingdom and WebPageTest.
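If you want to check speeds programmatically rather than through the web UIs, here is a minimal sketch against the public PageSpeed Insights v5 API. It assumes the `requests` library; the exact JSON field names come from the public Lighthouse response and may change over time, so treat this as illustrative only.

```python
# Minimal sketch: query Google's PageSpeed Insights v5 API for a URL's
# Lighthouse performance score. Requires the `requests` library; JSON
# field names are based on the public API and may change.
import requests

API_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def pagespeed_score(page_url, strategy="mobile"):
    params = {"url": page_url, "strategy": strategy}
    data = requests.get(API_ENDPOINT, params=params, timeout=60).json()
    # Lighthouse reports the performance score on a 0-1 scale.
    return (
        data.get("lighthouseResult", {})
        .get("categories", {})
        .get("performance", {})
        .get("score")
    )

if __name__ == "__main__":
    print(pagespeed_score("https://www.example.com/"))
```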
Remove Duplicate Pages
Duplicate pages can arise for a variety of reasons and can be a huge waste of your crawl resources. Even if a page has a canonical tag on it to signal that it is a duplicate, its URL will still be crawled and use some of your crawl budget. It’s best to remove as many instances of duplicate pages as possible so that you are not wasting resources on them and the main version of each page is crawled and indexed efficiently.
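One way to surface these crawlable duplicates is to group a set of URLs by the canonical target they declare. The sketch below is a minimal, standard-library-plus-`requests` example with placeholder URLs, not a full duplicate-detection tool.

```python
# Minimal sketch: group a set of URLs by their rel="canonical" target to
# spot duplicates that are still eligible for crawling. URLs are
# placeholders and the `requests` library is assumed to be installed.
from collections import defaultdict
from html.parser import HTMLParser
import requests

class CanonicalParser(HTMLParser):
    """Extracts the href of the first <link rel="canonical"> tag."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")

urls = [
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?sort=price",
]

groups = defaultdict(list)
for url in urls:
    parser = CanonicalParser()
    parser.feed(requests.get(url, timeout=10).text)
    groups[parser.canonical or url].append(url)

for canonical, members in groups.items():
    if len(members) > 1:
        # Each extra member is a crawlable duplicate of the canonical URL.
        print(f"{canonical} has {len(members)} crawlable variants: {members}")
```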
Sitemaps
Sitemaps come in the form of HTML and XML sitemaps, both of which are worth optimizing. The HTML sitemap helps search engines crawl through the important pages on the site because crawlers will follow the internal links within it. Make sure all of the important pages on your site are contained within the HTML sitemap and that duplicate pages are not listed in it.
XML sitemaps tell search engines which pages to crawl and index based on the priority of each page compared to other pages on the site. To optimize your XML sitemap for the best use of your crawl budget, make sure you do the following (a minimal generation sketch follows the list):
- Include the most important pages on your site and assign them higher priorities.
- Remove duplicate URLs & URLs that have no value in being indexed.
- Do not include URLs of any of the directories that are disallowed in your robots.txt file.
- Do not include any URLs that have “noindex” or “nofollow” tags.
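For reference, here is a minimal Python sketch that writes a small XML sitemap with priority values using only the standard library. The URLs and priority numbers are placeholders for illustration.

```python
# Minimal sketch: generate a small XML sitemap with <priority> values
# using only the standard library. URLs and priorities are placeholders.
import xml.etree.ElementTree as ET

pages = [
    ("https://www.example.com/", "1.0"),
    ("https://www.example.com/category/shoes", "0.8"),
    ("https://www.example.com/blog/latest-post", "0.6"),
]

urlset = ET.Element(
    "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
)
for loc, priority in pages:
    url_el = ET.SubElement(urlset, "url")
    ET.SubElement(url_el, "loc").text = loc
    # A higher priority hints that the page matters more relative to others.
    ET.SubElement(url_el, "priority").text = priority

ET.ElementTree(urlset).write(
    "sitemap.xml", encoding="utf-8", xml_declaration=True
)
```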
Robots.txt Optimization & Server Log Analysis
A robots.txt file provides rules for search engines in terms of which directories the spiders/crawlers should crawl and which directories they are disallowed from crawling. At a basic level, you should be blocking all directories that you do not want crawled or indexed (i.e. admin backend/CMS pages) so that you are not wasting crawl resources on directories that should not be crawled.
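Once your rules are in place, it is worth sanity-checking which paths they actually block. Here is a minimal sketch using Python’s built-in `urllib.robotparser`; the domain and paths are hypothetical examples.

```python
# Minimal sketch: verify which paths your robots.txt actually blocks,
# using the standard library's robotparser. Paths are hypothetical examples.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()

test_paths = [
    "https://www.example.com/admin/login",     # backend, should be disallowed
    "https://www.example.com/category/shoes",  # marketing page, should be allowed
]

for path in test_paths:
    allowed = rp.can_fetch("Googlebot", path)
    print(f"{'ALLOWED' if allowed else 'BLOCKED'}  {path}")
```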
In more advanced situations, server log analysis can be performed to better understand which pages and/or directories are being crawled more frequently than they should be. This requires access to your server logs and the knowledge to interpret them, but aligning your robots.txt with the areas you identify as wasting crawl resources (via analyzing your server logs) will allow you to achieve much more efficient crawling. A very simple example of this could be that you analyze your server logs and find that your privacy policy is being crawled more often than your important marketing pages because the privacy policy is linked throughout your entire site with a footer link. Identifying the wasted resources through log analysis and blocking that page (which has no value in being indexed) will ensure that those resources are used on more important pages.
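As a starting point for that kind of analysis, here is a minimal sketch that tallies Googlebot requests per URL from an access log in the common/combined format. The log path and the regex are assumptions about your server setup and will likely need adjusting.

```python
# Minimal sketch: count Googlebot hits per URL from an access log in the
# common/combined log format. The log path and format are assumptions;
# adjust the regex to match your server's configuration.
import re
from collections import Counter

LOG_PATH = "access.log"  # hypothetical path to your server log
# Matches: ... "METHOD /path HTTP/1.1" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[^"]*".*"(?P<agent>[^"]*)"$'
)

hits = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1

# The most frequently crawled URLs; low-value pages near the top are
# candidates for de-prioritizing via internal linking or robots.txt.
for path, count in hits.most_common(20):
    print(f"{count:6d}  {path}")
```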
Build Authority To Your Website
While not as much of a technical SEO strategy, building the authority of your site is essential to maintaining and growing your crawl budget. The crawl budget for your site is determined by the authority of your site (previously PageRank), so increasing your site’s authority will prevent you from losing crawl budget and can actually increase it, since search engines such as Google will view your site as more authoritative and therefore crawl it more frequently.
Wrapping Up
To wrap up, the crawl budget for your site is the number of pages a search engine such as Google crawls each time it visits your site. Making the most efficient use of these resources will allow you to have the important pages on your site crawled more frequently and efficiently, ultimately increasing the number of pages you have indexed and the efficiency with which they are indexed. Simple ways to do this are:
- Develop an efficient site architecture and internal linking structure
- Fix broken links and redirect chains
- Improve your page load speeds
- Remove or minimize duplicate pages on your site
- Optimize your sitemaps properly
- Optimize your robots.txt in alignment with your XML sitemaps and server log analysis findings
- Build more authority to your site
That said, what other strategies have you deployed to make the most of your crawl budget? Drop a comment below and let us know your thoughts!