All about Robots.txt file

In this article we will explore an important file on your website: the file that tells search engines how to work with your site and what they should and should not read. It is the robots.txt file, which almost every website has. This article is all about the robots.txt file.

What is the robots.txt file?

 

We are in the age of robotics and artificial intelligence. But wait a minute: the robots file is not a robot. It is a plain text file, that simple, which sits in the root of your website and instructs web robots (typically search engine crawlers) how to crawl the pages of your site. This file tells web robots how to interact with your pages: whether to crawl and index a given page or simply skip it. Keep reading to see what this means in practice.

 


What is the robots.txt file, practically?

 

Ultimately, your site, like any website, contains folders, which in turn contain images, pages, banners, and so on. A given website has its own hierarchy or structure.

And? Well, search engine web robots, or crawlers, will crawl all of your website's contents. But how? Good question: they start by reading your robots.txt file. This simple, plain text file is their starting point. It acts as a set of instructions for web robots, like a traffic officer telling crawlers what to read and what to skip on your website.

This important file sits under the root folder of your website, and you or your team can create it yourself, or you can have a tool generate it for you, e.g. Attracta.
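For example, assuming your site lived at example.com (a placeholder domain used here only for illustration), crawlers would look for the file at:

https://example.com/robots.txt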

Now that we have understood what the robots.txt file is, let's learn more about its structure.

 


Robots.txt file structure

 

A complete robots.txt file can contain as few as two lines, although a robots.txt file may contain many more lines, as we will see later.

User-agent: [user-agent name]

Disallow: [URL string not to be crawled]

 

Other files can contain multiple lines of Allow/Disallow statements, as shown in the figure below:

[Figure: the robots.txt file of the index.com page, with multiple Allow/Disallow lines]
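As an illustration only, a robots.txt file of that shape might look like the following sketch; the paths here are hypothetical and not taken from the original screenshot:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Allow: /blog/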

The robots file above belongs to the index.com page. Looking at the first line, User-agent: *, it means that the Allow/Disallow statements that follow apply to all web robots or web crawlers. But what are web-crawling bots anyway?

 


What is a web-crawling bot?

 

As defined by Wikipedia, a web crawler is software that collects documents from the web in order to build a searchable index. The crawler starts by reading your robots.txt file, which sits in the root of your website. A full list of user agent bots can be found here.

Let’s get back to our subject. In your robots.txt file you can specify different crawling criteria. What does that mean? It means that in your robots.txt file you can list separate Allow/Disallow statements for each web bot.

Look at the example below:

[Figure: a robots.txt file with a dedicated section for msnbot]
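As a sketch of what such a file might contain (again, the paths are hypothetical, not from the original screenshot):

User-agent: msnbot
Disallow: /drafts/
Disallow: /internal/

User-agent: *
Disallow: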

It explicitly disallows some pages from being crawled by msnbot. msnbot is a web-crawler bot; it will read the robots.txt file above and obey what is allowed and disallowed under its own section, User-agent: msnbot, while all other web crawlers fall under the catch-all section, User-agent: *. But wait, what do the Allow/Disallow keywords mean?

 


Allow/Disallow in the robots.txt file

 

As stated above, web crawlers read the robots.txt file to learn what they are allowed to crawl and what they are not. Let us look at the robots.txt snippets below to understand what is going on.

User-agent: *

Disallow: /

 

The above robots.txt file will block all web crawlers from crawling all of the contents of your website.

User-agent: *

Disallow:

 

The above robots.txt file will allow all web crawlers to crawl all of the contents of your website. Note that we left Disallow empty; because no page is disallowed, web crawlers are free to crawl all of your website's contents.

 

User-agent: Googlebot

Disallow: /content/products-0/

 

The above robots.txt file will block Googlebot, a web-crawler bot, from crawling all the pages under the /content/products-0/ folder of the orcaes.co website.
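The Allow keyword works the other way around: it explicitly opens a path for crawling. As a sketch (the featured.html path is a hypothetical example), major crawlers such as Googlebot understand a combination like the one below, which blocks a folder while leaving a single page inside it crawlable:

User-agent: Googlebot
Disallow: /content/products-0/
Allow: /content/products-0/featured.html

Because the Allow rule is more specific (longer) than the Disallow rule, Googlebot treats that one page as crawlable while skipping the rest of the folder.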

 


Test Your Robots.txt File

 

When you are done with your robots.txt file, you can test it. Even if you are sure that your file is good and correct, we recommend testing it. Google created a testing tool (updated in July 2014) to let webmasters check their robots.txt files. To test your robots.txt file, open Google Webmaster Tools, select your site from the list, and Google will return notes highlighting any errors.

 


Summary

 

A search engine's job in life is to crawl your website and index it, so that it can be shown in the SERPs to searchers who are looking for your information.

As soon as a given crawler reaches your website, it will first visit your robots.txt file. That is it; it is all about the robots.txt file: the web crawler will read it and see what it is and is not allowed to crawl.

In the coming post, we will see how the robots.txt file affects your SEO, and how you can tweak or optimize your robots file to get more traffic.

 


