 
 

First Webs, Inc.

Rockford; Chicago; the World

 

What is a robots.txt file?

 

A robots.txt file is a small text file that tells search engine spiders, robots, and crawlers which parts of a web site they should skip when they crawl (and index) it.  A robots.txt file is placed in the root directory of a web site (most often called the public directory, the htdocs directory, or the www directory).

A robots.txt file is used by webmasters and Internet marketers to advise which directories or files to EXCLUDE from being indexed by search engines.  By default, search engines normally index everything they can read, so a conscious effort must be made to tell them what NOT TO INCLUDE!

A robots.txt file is created in a plain text editor and uploaded to the root directory of a site via FTP in ASCII mode.  There are many robots.txt file generators on the Internet.  In this author's view, the robots.txt file is so simple (once understood) that a file generator is simply not needed.

Care must be taken to set up the robots.txt file correctly.  Two statements are common in a robots.txt file: the "User-agent" statement and the "Disallow" statement.  The "User-agent" statement names the robot that the statements after it apply to.  The "Disallow" statement is the exclusion statement.

Common illustrations follow below:   

The "User-agent" statement can name one specific robot, or it can use the "wild card" character " * " (asterisk) to address all robots.

* This statement specifies the User-agent for Google:
User-agent: googlebot

* This statement specifies the User-agent for all robots:
User-agent: *

* This Disallow specifies not to access the page  /private-stuff.html  in the root directory:
Disallow: /private-stuff.html

* This Disallow specifies not to access the entire directory   /images/:
Disallow: /images/

*  This Disallow specifies not to access the entire site.  Clearly you will want to make sure NOT to use this Disallow, or your site will likely never be indexed.
Disallow: /
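
The statements above can be combined into one file and checked before uploading. The sketch below is only an illustration: the file contents, the robot name "somebot", and the domain www.yourdomain.com are made up for the example, and the checking is done with Python's standard urllib.robotparser module rather than by any particular search engine. Real robots may interpret a file slightly differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt built from the statements illustrated above.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /images/

User-agent: *
Disallow: /private-stuff.html
Disallow: /images/
"""

# Hypothetical file that shuts every robot out of the whole site.
BLOCK_ALL_TXT = """\
User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


SITE = "http://www.yourdomain.com"  # placeholder domain
parser = build_parser(ROBOTS_TXT)

# googlebot has its own group, so only that group's Disallow lines apply to it.
print(parser.can_fetch("googlebot", SITE + "/private-stuff.html"))  # True
# Any other robot falls back to the "User-agent: *" group.
print(parser.can_fetch("somebot", SITE + "/private-stuff.html"))    # False
print(parser.can_fetch("somebot", SITE + "/images/logo.gif"))       # False
print(parser.can_fetch("somebot", SITE + "/index.html"))            # True

# "Disallow: /" excludes every page, including the home page.
print(build_parser(BLOCK_ALL_TXT).can_fetch("somebot", SITE + "/index.html"))  # False
```

Note that a robot with its own "User-agent" group ignores the " * " group, so any exclusion meant for every robot has to be repeated in each named group.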

There is a lot of good information on the treatment of robots.txt files on the Internet.  Refer to these sites:

http://www.webmasterworld.com/forum93/
http://en.wikipedia.org/wiki/Robots.txt

http://www.robotstxt.org/wc/robots.html
http://www.searchengineworld.com/robots/robots_tutorial.htm

Hint #1:  Requesting exclusion does not guarantee exclusion.  The robots.txt file is a request to robots, not a lock, and a web site is a public document.  Files excluded by the robots.txt instructions can still be viewed by typing the complete address in the browser.

Hint #2:  To see if a site has a robots.txt file in place, in your browser type the domain name followed by /robots.txt.  For example,  http://www.yourdomain.com/robots.txt.
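
The same rule (domain name followed by /robots.txt) can be written as a small helper. This is a minimal sketch using Python's standard urllib.parse module; the function name robots_url and the sample page address are invented for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host name; robots.txt sits in the root directory,
    # so the page's own path and query string are discarded.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.yourdomain.com/products/list.html?page=2"))
# http://www.yourdomain.com/robots.txt
```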

Hint #3:  To validate a robots.txt file, go to http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
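
For a quick check without an online tool, a few lines of Python can catch the most common slips. This is a rough sketch, not a replacement for a real validator: the function name check_robots and its messages are invented here, it knows only the two statements covered in this article, and the rule that a Disallow path should begin with "/" is treated as a likely mistake rather than a hard error.

```python
KNOWN_FIELDS = {"user-agent", "disallow"}  # only the two statements covered here


def check_robots(text):
    """Return a list of (line_number, message) problems found in text."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and extra spaces
        if not line:
            continue  # blank or comment-only line
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, "unknown field '%s'" % field))
        elif field == "user-agent":
            if not value:
                problems.append((number, "User-agent has no value"))
            seen_user_agent = True
        elif not seen_user_agent:
            problems.append((number, "Disallow appears before any User-agent"))
        elif value and not value.startswith("/"):
            problems.append((number, "Disallow path should start with '/'"))
    return problems


sample = "Disallow: /images/\nUser-agent *\nUser-agent: *\nDisallow: images/\n"
for number, message in check_robots(sample):
    print("line %d: %s" % (number, message))
```

An empty result means only that none of these particular slips were found; it says nothing about how a given search engine will treat the file.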

 

129 So. Phelps Ave., Ste 908

Rockford, IL 61108

815.332.8062

vwwanner@nilsem.com

Copyright 2005-2015

First Webs Resource Site

VernonWanner.com

All Rights Reserved
