Blocking Search Engines with Robot Files and Tags

Posted by EssexMax on 25 January 2012 under Search Engine

If you want to tell search engines what to do, or what not to do, when crawling your site, there are three tools, which we’ll cover in a little detail:

  • Robots.txt
  • The Robots Meta Tag
  • The Rel=nofollow Attribute

The robots.txt file

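This is a plain text file that you place in the root of your website. In it, you give instructions to the various search engine crawlers that may come sniffing around your site. Typically, you use robots.txt to “Disallow” crawling of particular pages or folders. Strictly speaking, Disallow blocks crawling rather than indexing: a blocked URL can still turn up in search results if other sites link to it, although the search engine won’t have read its content.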

Let’s say we didn’t want Google to crawl three things: our “contact us” page (as it has an email address we don’t want spammed), our “privacy info” page (as it’s boring and we don’t want to try to rank for it), and a folder of “work in progress” pages (under development and not ready to be shared or indexed yet). In the robots.txt file, we’d add:

User-agent: *
Disallow: /contact.html
Disallow: /privacy.html
Disallow: /work-in-progress/

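The “User-agent: *” line means the rules apply to every crawler, not just Google. Once the robots.txt file has been uploaded, well-behaved search engines such as Google should obey the instructions and stay away from those two pages and the work-in-progress folder.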
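If you’d like to sanity-check rules like these before uploading them, Python’s standard urllib.robotparser module can read them. This is only a minimal sketch: www.example.com is a placeholder domain, “Googlebot” is used purely as an example crawler name, and the is_allowed helper is our own invention rather than anything a search engine provides.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact.html
Disallow: /privacy.html
Disallow: /work-in-progress/
"""


def is_allowed(path, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to crawl path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # www.example.com is a placeholder domain (an assumption for this sketch).
    return parser.can_fetch(user_agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/index.html", "/contact.html", "/work-in-progress/draft.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

With these rules, /index.html comes back as allowed, while /contact.html and anything under /work-in-progress/ come back as blocked. Bear in mind this only shows how a rule-following crawler would read the file; it can’t force a badly behaved bot to comply.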

Robots Meta Tag

This is a Meta tag that lives in the <head> section of a web page. Here is an example of the Robots tag in situ:

<head>
<meta name="robots" content="noindex, nofollow">
</head>

There are two parts to the tag’s content value, separated by a comma:

  • Index / Noindex: This tells the search engine whether it should index the page, or keep it out of its search results
  • Follow / Nofollow: This tells the search engine whether it may follow the links it finds on the page, or should not visit any of them

The Robots meta tag is a per-page option. Unless you use this tag, it’s assumed that any page is “index” and “follow”, so there’s no real need to add “index,follow” to a page.
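To see how a crawler might interpret the tag, here is a rough sketch using Python’s built-in html.parser module. The class and function names (RobotsMetaParser, page_rules) are made up for this illustration, and it only understands the four values covered above; real search engines recognise more.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the values from a <meta name="robots"> tag, if one is present."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Split "noindex, nofollow" into {"noindex", "nofollow"}.
            self.directives |= {
                part.strip().lower() for part in content.split(",") if part.strip()
            }


def page_rules(html):
    """Return whether the page may be indexed and whether its links may be followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # With no tag (or no matching value) the defaults are index and follow.
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


if __name__ == "__main__":
    blocked = '<head><meta name="robots" content="noindex, nofollow"></head>'
    plain = "<head><title>Home</title></head>"
    print(page_rules(blocked))
    print(page_rules(plain))
```

The second page has no Robots meta tag at all, so it falls back to the “index” and “follow” defaults, which is why adding “index,follow” by hand changes nothing.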

Rel = Nofollow

There may be times when you have a webpage that contains links, but you don’t want a search engine to follow some of them. Again, this could be a link to your “Contact Us” page, or a link to a competitor or advertiser that you’re not keen for Google to visit. For this, you can use rel="nofollow". Despite often being called a tag, it is really an attribute that sits inside the link’s own <a> tag, and looks something like this:

<a href="http://www.example.com/" rel="nofollow">Example link</a>
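As a rough way to audit which links on a page carry rel="nofollow", the sketch below again leans on Python’s built-in html.parser module. LinkCollector, find_links and the sample page are all hypothetical, written just for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Records each link's href and whether it is marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return  # an <a> without href is an anchor, not a link
        # rel can hold several space-separated values, e.g. "nofollow noopener".
        rel_values = (attrs.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_values))


def find_links(html):
    """Return a list of (href, is_nofollow) pairs for every link in html."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# A made-up page fragment for demonstration purposes.
SAMPLE_PAGE = """
<p><a href="/about.html">About us</a></p>
<p><a href="/contact.html" rel="nofollow">Contact Us</a></p>
<p><a href="http://www.example.com/" rel="nofollow noopener">An advertiser</a></p>
"""

if __name__ == "__main__":
    for href, is_nofollow in find_links(SAMPLE_PAGE):
        print(href, "nofollow" if is_nofollow else "follow")
```

Unlike the Robots meta tag, which covers every link on the page, this attribute works link by link, so the “About us” link here is still followed while the other two are not.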

Hopefully, that explains the basics of the three ways you can tell search engines such as Google and Bing which parts of your site they’re OK to visit, and which pages, folders and links they should stay away from.

Any questions? Please add a comment below.
