Introduction to robots.txt
Robots.txt is a plain-text file in the root directory of your web site that tells search engine robots and spiders which parts of the site they are allowed to crawl. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code.
How to create a robots.txt file
A site owner who wishes to give instructions to web robots must place a text file called robots.txt in the root of the web site hierarchy (e.g. www.website.com/robots.txt). You can create the robots.txt file manually, using any text editor. It should be an ASCII-encoded text file, not an HTML file, and the filename should be lowercase.
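Because the file always sits at the root of the site, its address can be worked out from any page URL. The following is a minimal Python sketch of that idea; the function name robots_txt_url is made up for illustration, and the URL reuses the www.website.com placeholder from above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site hosting page_url.

    robots_txt_url is a hypothetical helper name, not part of any library.
    """
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt,
    # and any query string or fragment is dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.website.com/dir/file.html?page=2"))
```

Whatever page you start from, the result points at the same single file in the root directory.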
Put the robots.txt file in your server's root directory. This is standard web management practice. It must be in the main directory because otherwise user agents (search engine crawlers) will not be able to find it: they do not search the whole site for a file named robots.txt. Instead, they look only in the main directory, and if they don't find it there, they assume that the site has no robots.txt file and index everything they find along the way. The file should contain its instructions in a specific format. The structure of a robots.txt file is pretty simple:
# Comments appear after the "#" symbol at the start of a line, or after a directive
# this example allows all robots to visit all files
User-agent: *
Disallow:
# exclude all robots from part of the server
User-agent: *
Disallow: /scripts/
Disallow: /images/
Disallow: /admin/
# Example that tells all crawlers not to enter one specific file
User-agent: *
Disallow: /dir/file.html
Disallow: /dir/file2.html
# allow google image bot to search all images
User-agent: Googlebot-Image
Allow: /*
# Block all images on your site from Google image search:
User-agent: Googlebot-Image
Disallow: /
# To remove a specific image from Google Images
User-agent: Googlebot-Image
Disallow: /images/image.jpg
# To remove a specific file type from Google Images (for example, .gif)
User-agent: Googlebot
Disallow: /*.gif$
# disallow WayBack archiving site
User-agent: ia_archiver
Disallow: /
# disallow all files with ? in url
User-agent: *
Disallow: /*?*
# Sitemap
Sitemap: http://www.domain.com/sitemap.xml
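If you want to check how a crawler would read rules like these, Python's standard library includes urllib.robotparser. The sketch below feeds it the "exclude all robots from part of the server" example and asks whether two URLs may be fetched. The helper name build_parser and the page URLs are assumptions for illustration. This parser compares path prefixes, so the sketch sticks to plain directory rules rather than the * and $ wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the "exclude all robots from part of the server" example.
RULES = """\
User-agent: *
Disallow: /scripts/
Disallow: /images/
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (hypothetical helper name)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(RULES)

# /admin/ is disallowed for every robot, so this URL may not be fetched.
admin_ok = parser.can_fetch("*", "http://www.website.com/admin/login.html")
# No rule covers the home page, so it may be fetched.
home_ok = parser.can_fetch("*", "http://www.website.com/index.html")
print(admin_ok, home_ok)
```

To test a live site instead, the same class can download the file with set_url() followed by read().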
All search engines, or at least all the important ones, now look for a robots.txt file as soon as their spiders reach your web site. So even if you currently do not need to exclude the spiders from any part of your site, having a robots.txt file is still a good idea; it can act as a sort of invitation into your site.
- Author:
PromoteClick.com - Online advertising solutions.
Robots.txt Optimization for WordPress
By telling search engines which directories and files to skip, you steer them toward your high-quality content, which can help your site's ranking; it is a practice the major search engines, Google included, recommend. An example WordPress robots.txt file:
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads
# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*
# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /
# digg mirror
User-agent: duggmirror
Disallow: /
Sitemap: http://www.example.com/sitemap.xml
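Several rules above rely on wildcards: * stands for any run of characters, and a trailing $ (as in the /*.gif$ rule shown earlier) pins the pattern to the end of the URL. These are extensions honoured by crawlers such as Googlebot, and not every robot understands them. The Python sketch below shows one way such patterns could be matched against a URL path. It assumes the common prefix-matching interpretation, and the function names are made up for illustration; it is not the code any search engine actually runs.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into a compiled regex.

    Hypothetical helper: '*' matches any run of characters and a
    trailing '$' anchors the pattern to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)


def path_matches(pattern: str, path: str) -> bool:
    """True if the rule pattern applies to path (matched from the start)."""
    return robots_pattern_to_regex(pattern).match(path) is not None


for rule, path in [
    ("/*?*", "/page?id=3"),
    ("/*?*", "/page"),
    ("/category/*/*", "/category/news/2009"),
    ("/*.gif$", "/images/photo.gif"),
]:
    print(rule, path, path_matches(rule, path))
```

Without the $ anchor a rule behaves as a prefix, which is why a rule like /wp-admin also covers everything beneath that directory.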






