Robots are processes on computers on the Internet that methodically surf the web in order to compile information about the web, typically an index of web pages. An example of a robot is "Scooter" which surfs the web in order to compile the AltaVista index.
Robots are constantly scanning the Internet updating their indexes. Every so often they revisit the pages they have already indexed to see if they have changed, and to update their indexes if they have.
Generally the robots don't need much encouragement to visit your web. So long as there's a link to your web on a page that's already been indexed, the robots will eventually find, traverse, and index your web. However, if you want to be sure, you can submit your home page URL to one or more robots (webcrawlers). See the Launching A Web page for more information.
When a robot indexes a page in your web, it typically indexes the page by the words that appear on the page (keywords). It may also record the first few lines of text on the page as a sample of page content to be presented to users in search results.
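As a rough illustration of the idea, here is a minimal Python sketch of indexing a page by its words and keeping its first lines as a sample. It is not how any particular robot works: the function name index_page, the snippet_lines parameter, and the example.com URL are all hypothetical, and real robots deal with HTML markup, common words, and ranking in ways this ignores.

```python
import re
from collections import defaultdict

def index_page(url, text, index, snippet_lines=2):
    """Record every word of `text` against `url`; return a short sample of the page."""
    # Index the page under each distinct word it contains (case-insensitive).
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        index[word].add(url)
    # Keep the first few non-empty lines as the sample shown in search results.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return " ".join(lines[:snippet_lines])

# Hypothetical page text and URL, for illustration only.
index = defaultdict(set)
snippet = index_page(
    "http://www.example.com/",
    "Captain Pegleg's Fish and Chips\nThe finest fresh fish.\nOpen daily.",
    index,
)
```

Looking up index["fish"] then gives the set of pages containing that word, and snippet holds the text a search result might display.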
Usually the robots do a fairly good job of indexing each page. However, if it's important that a particular page (e.g. your home page) be indexed comprehensively, or that its description in search results be readable, then you may wish to assist robots by explicitly adding keywords and a description at the top of the page.
To do this, add the following META tags in the HEAD section of your page. The two tags can be specified independently, but if you're going to bother to add one, you might as well add both. Here's an example for a fish and chips shop:
<META NAME="keywords" CONTENT="fish,chips,seafood">
<META NAME="description" CONTENT="The finest fresh fish and chips.">
The description doesn't have to provide a "title" for the page, as this is provided by the page's TITLE construct. The keywords and description can run over multiple lines. With the keywords, I find it's a good idea to place each group of synonyms on the same line. Don't forget to put commas between the keywords!
<META NAME="keywords"
CONTENT="fish, garfish, bream, salmon,
chips, potato, fried, grilled,
seafood, ocean, food">
<META NAME="description"
CONTENT="Come to Captain Pegleg's tasty fish and
chips shop just off Eastern beach. We always
have the freshest fish.">
At least one robot (AltaVista) won't store more than 1024 bytes of information about your web, so don't go overboard with the keywords and description.
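If you want to check how much room your tags take up, something like the following Python sketch can help. It pulls the keywords and description out of a page with the standard html.parser module and counts their bytes. How AltaVista actually counts towards its limit isn't stated here, so treating the limit as the combined UTF-8 size of the two CONTENT values is an assumption, and the names MetaExtractor and meta_size are made up for the example.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect the CONTENT of the keywords and description META tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in ("keywords", "description"):
            # Collapse line breaks and runs of spaces in multi-line values.
            self.meta[name] = " ".join((attrs.get("content") or "").split())

def meta_size(html_text):
    """Return the extracted tags and their combined size in bytes (assumed measure)."""
    parser = MetaExtractor()
    parser.feed(html_text)
    total = sum(len(value.encode("utf-8")) for value in parser.meta.values())
    return parser.meta, total

PAGE = """<HEAD>
<META NAME="keywords" CONTENT="fish, chips,
seafood">
<META NAME="description" CONTENT="The finest fresh fish and chips.">
</HEAD>"""

meta, total = meta_size(PAGE)
within_limit = total <= 1024  # the 1024-byte figure quoted above
```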
Usually it's very desirable for robots to index your web, as this will likely increase the number of people visiting your web. However, if, for whatever reason, you don't want some or all of your web indexed, you can give a hint to the robots not to index it. Nothing will stop them from indexing a page if they are determined to (or if they ignore the hints you provide), but at least you can try. There are two ways to tell robots that pages should not be indexed.
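The first way is to add a META tag in the HEAD section of each page that you don't want indexed. The CONTENT of the tag is a comma-separated list of one or more terms. The terms available are INDEX and NOINDEX (whether the page itself may be indexed), FOLLOW and NOFOLLOW (whether the links on the page may be followed), ALL (shorthand for INDEX,FOLLOW), and NONE (shorthand for NOINDEX,NOFOLLOW).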
So, to indicate that robots should neither index a page, nor follow the links it contains, you would add:
<META NAME="ROBOTS" CONTENT="NONE">
To indicate that robots can index the page, but shouldn't follow links on the page, you would add:
<META NAME="ROBOTS" CONTENT="INDEX,NOFOLLOW">
These tags should appear in the HEAD section of the page concerned. If you don't specify either INDEX or NOINDEX, the default is INDEX. If you don't specify either FOLLOW or NOFOLLOW, the default is FOLLOW.
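The defaults described above can be sketched in a few lines of Python. This is only an illustration of how a well-behaved robot might interpret the CONTENT value, not code from any actual robot. The function name parse_robots_content is hypothetical, and the handling of ALL (taken to mean INDEX,FOLLOW) and of unrecognised terms (ignored) are assumptions.

```python
def parse_robots_content(content):
    """Return (index, follow) for the CONTENT of a ROBOTS META tag."""
    index, follow = True, True  # defaults when a term is not specified
    for term in content.upper().split(","):
        term = term.strip()
        if term == "NONE":        # neither index nor follow
            index, follow = False, False
        elif term == "ALL":       # assumed shorthand for INDEX,FOLLOW
            index, follow = True, True
        elif term == "NOINDEX":
            index = False
        elif term == "INDEX":
            index = True
        elif term == "NOFOLLOW":
            follow = False
        elif term == "FOLLOW":
            follow = True
        # Unrecognised terms are ignored (an assumption of this sketch).
    return index, follow

none_flags = parse_robots_content("NONE")
index_only = parse_robots_content("INDEX,NOFOLLOW")
defaults = parse_robots_content("")
```

Each result is a pair of booleans: whether the page may be indexed, and whether its links may be followed.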
The other way to indicate that robots shouldn't index your web is to create a file called robots.txt in the root directory of your webserver's web. This is the most effective way to exclude robots; the per-page tags described above are really a last resort. For more information on the format of robots.txt files, see the references below.
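To get a feel for how a robot reads such a file, here is a small Python sketch using the standard urllib.robotparser module. The robots.txt content, the /private/ directory, and the example.com URLs are hypothetical. The file shown asks all robots to stay out of one directory; it is a minimal example rather than a full account of the format.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: every robot is asked to keep out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a robot calling itself "Scooter" may fetch each URL.
allowed_home = parser.can_fetch("Scooter", "http://www.example.com/index.html")
allowed_private = parser.can_fetch("Scooter", "http://www.example.com/private/menu.html")
```

Here allowed_home comes out True and allowed_private False. As with the META tags, this only tells you what a cooperative robot would do; nothing forces a robot to consult the file.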