Robots.txt file directives

In this post we will delve deeper into the robots.txt file. We talked about the major aspects of the robots.txt file in the All about robots.txt post. Here we will learn more about robots.txt file directives.

We have already covered what robots.txt directives are, but we will look at them in more detail here, taking each directive in turn.

 


User-agent Directive

 

It is the opening directive of every rule group in a robots.txt file. It tells each search bot whether the rules that follow, which say what to crawl and what not to crawl, apply to it.

 

User-agent: *

All search spiders are addressed by the rules that follow.

 

User-agent: msnbot

Only the MSN search bot (msnbot) is addressed.
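For example, a complete rule group is a User-agent line followed by the rules that apply to that bot (the Disallow directive is explained further below; the folder name here is just an illustration):

User-agent: msnbot
Disallow: /private/

A bot that is not matched by a more specific group falls back to the rules listed under User-agent: *.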

 


Allow Directive

 

Something interesting to bear in mind: this directive was first introduced by Google and is not part of the original robots.txt standard, so not every search bot understands it.

The purpose of this directive is to tell Googlebot what it is allowed to crawl and index on your website. OK, good, what about other search bots? Good question.

Other search bots will simply crawl everything you did not list under a Disallow directive, which we will discover next. First, here is a quick example of Allow in action.
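A common pattern (the folder names below are only an illustration, not taken from a real site) is to disallow a folder and then re-allow one path inside it:

User-agent: Googlebot
Disallow: /media/
Allow: /media/public/

A crawler that understands Allow will skip /media/ but still crawl /media/public/; a crawler that does not will simply skip the whole /media/ folder.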

 


Disallow Directive

 

This directive tells search bots not to crawl the specified URL. Each Disallow line takes exactly one URL path. If you want to disallow more URLs on your website, you have to add a new Disallow line for each one, like the snippet below:

Disallow: /includes/

Disallow: /misc/

Disallow: /modules/

Disallow: /profiles/

 

The above example is taken from our website's robots.txt file. Notice that it disallows those folders and everything inside them. The path after Disallow does not have to be a folder; it can also point to a specific page, such as a login page, or to any other part of the site you do not want web crawlers to crawl.
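For instance, to keep crawlers away from a single page rather than a whole folder (the path below is a generic example, not taken from our file), you would add a line such as:

Disallow: /user/login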

 


Crawl-delay Directive

 

This directive tells search bots, or web crawlers, how long they should wait before loading and crawling a page's contents.

However, not every search bot supports the Crawl-delay directive. Google, for example, ignores it. Yes, you guessed it: Google has its own delay mechanism, a setting called the crawl rate that defines how quickly Googlebot crawls your site.

Ultimately, search bots are crawl-hungry, and that can put a lot of traffic on your website. The good news is that the bots which do support Crawl-delay will obey it and wait for the period specified. Look at the example below:

Crawl-delay: 10

 

Search bots will wait 10 seconds between requests while crawling your website, which can save you some bandwidth.
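If you want the delay to apply to one crawler only rather than to all of them (the bot name below is just an example), place the Crawl-delay line inside that crawler's rule group:

User-agent: msnbot
Crawl-delay: 10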


Sitemap Directive

 

With this directive, you can tell search engines – specifically Bing, Yandex and Google – where to find your XML sitemap. You can, of course, also submit your XML sitemaps to each search engine through their respective webmaster tools, and doing so is highly recommended, because search engine webmaster tools programs give you lots of valuable information about your site. If you would rather not do that, adding a Sitemap line to your robots.txt is fair enough.
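A minimal example (the domain and file name below are placeholders; use your own sitemap URL):

Sitemap: https://www.example.com/sitemap.xml

The Sitemap line is not tied to any User-agent group, so it can be placed anywhere in the file.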

 


Test your robots.txt file

 

We strongly recommend testing your robots.txt file after any change you make, before uploading it to your site root and submitting it to Google. One of the most trusted tools is Google Search Console, specifically its robots.txt testing tool, as stated in the All about robots.txt post.

 


 


