Using Robots.txt to Control Search Engines
Introduction to robots.txt
Robots.txt is a plain text file located in the root directory of your web site. It tells search engine robots and spiders which parts of the site they are allowed to crawl. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code.
How to create a robots.txt file?

If a site owner wishes to give instructions to web robots, they must place a text file called robots.txt in the root of the web site hierarchy (e.g. www.website.com/robots.txt). You can create the robots.txt file manually with any plain text editor, such as Notepad. It should be an ASCII-encoded text file, not an HTML file, and the filename should be lowercase.
Place the robots.txt file in your server's root directory. It must be in the main directory because user agents (search engine crawlers) do not search the whole site for a file named robots.txt. They look only in the main directory, and if they don't find the file there, they assume that the site has no robots.txt file and crawl everything they find along the way. The file should contain the instructions in a specific format.
robots.txt will only work if it is in the top level directory of your web site
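The placement rule above can be sketched in a few lines of Python. This is only an illustration: the function name robots_txt_url and the sample page URL are assumptions, not part of any crawler. It shows that, whatever page a robot starts from, the only address it checks is /robots.txt on that host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the single location where a crawler looks for robots.txt."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page deep inside the site:
print(robots_txt_url("http://www.website.com/blog/2012/post.html?id=7"))
# -> http://www.website.com/robots.txt
```

A file saved as www.website.com/blog/robots.txt would never be requested, which is why it has no effect.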
The structure of a robots.txt file is pretty simple:
# this example allows all robots to visit all files
User-agent: *
Disallow:
# exclude all robots from part of the server
User-agent: *
Disallow: /scripts/
Disallow: /images/
Disallow: /admin/
# Example that tells all crawlers not to enter one specific file
User-agent: *
Disallow: /dir/file.html
Disallow: /dir/file2.html
# allow google image bot to search all images
User-agent: Googlebot-Image
Allow: /*
# Block all images on your site from Google image search:
User-agent: Googlebot-Image
Disallow: /
# To remove a specific image from Google Images
User-agent: Googlebot-Image
Disallow: /images/image.jpg
# To remove a specific file type from Google Images (for example, .gif)
User-agent: Googlebot
Disallow: /*.gif$
# disallow WayBack archiving site
User-agent: ia_archiver
Disallow: /
# disallow all files with ? in url
User-agent: *
Disallow: /*?*
# Sitemap
Sitemap: http://www.domain.com/sitemap.xml
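Simple prefix rules like the ones above can be tried out before uploading the file. The sketch below uses Python's standard urllib.robotparser module; the helper name is_allowed, the agent name AnyBot and the test URLs are assumptions made for illustration. As far as I know, this standard-library parser does plain prefix matching and does not interpret the * and $ wildcards that some search engines accept inside paths, so it is only suitable for checking rules without wildcards.

```python
from urllib.robotparser import RobotFileParser

# A subset of the rules shown above, without wildcard paths.
RULES = """\
User-agent: *
Disallow: /scripts/
Disallow: /images/
Disallow: /admin/
Disallow: /dir/file.html
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(RULES, "AnyBot", "http://www.domain.com/index.html"))        # True
print(is_allowed(RULES, "AnyBot", "http://www.domain.com/admin/login.html"))  # False
print(is_allowed(RULES, "AnyBot", "http://www.domain.com/dir/file.html"))     # False
```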
All search engines, or at least all the important ones, now look for a robots.txt file as soon as their spiders reach your web site. So even if you currently do not need to exclude spiders from any part of your site, having a robots.txt file is still a good idea: it can act as a sort of invitation into your site.
Be sure to use the right case. File and directory names in robots.txt rules are case sensitive. If the name of your directory is "Support", don't write "support" in the robots.txt file.
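The case-sensitivity warning can be demonstrated with a short sketch, again using Python's urllib.robotparser. The directory names, the agent name AnyBot and the host are made up for the example. A rule written with a lowercase "support" does not protect a directory that is actually named "Support".

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# The rule is written in lowercase, but the real directory is "Support".
parser.parse(["User-agent: *", "Disallow: /support/"])

lower = parser.can_fetch("AnyBot", "http://www.domain.com/support/faq.html")
upper = parser.can_fetch("AnyBot", "http://www.domain.com/Support/faq.html")

print(lower)  # False: the path matches the rule exactly
print(upper)  # True: different case, so the rule does not apply
```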
Robots.txt Optimization for WordPress
Telling search engines which directories and files hold your high-quality content can improve the ranking of your site, and is a practice recommended by Google and other search engines. An example WordPress robots.txt file:
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads
# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*
# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /
# digg mirror
User-agent: duggmirror
Disallow: /
Sitemap: http://www.example.com/sitemap.xml
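Several rules in the WordPress example rely on wildcards: * stands for any run of characters and a trailing $ anchors the end of the URL. These are extensions honored by the major search engines rather than part of the original robots.txt convention. The function below is a minimal sketch of how such a pattern could be matched against a path, under the assumption that a rule matches from the start of the path. The name rule_matches and the sample paths are hypothetical, and real crawlers may differ in details.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check a robots.txt path pattern (with * and optional trailing $) against a path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with ".*" for each wildcard.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    regex = body + ("$" if anchored else "")
    # re.match anchors at the start, mirroring prefix matching of rules.
    return re.match(regex, path) is not None


print(rule_matches("/*?", "/index.php?p=12"))               # True
print(rule_matches("/*?", "/about/"))                       # False
print(rule_matches("*/feed", "/2012/04/post/feed"))         # True
print(rule_matches("/wp-admin", "/wp-admin/options.php"))   # True
print(rule_matches("/*.gif$", "/images/logo.gif"))          # True
print(rule_matches("/*.gif$", "/images/logo.gif?size=2"))   # False
```

Without wildcards the pattern degrades to a plain prefix test, which is why "Disallow: /wp-admin" also covers everything below that directory.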
The robots.txt files of big Web sites:
NY Times Robots.txt
User-agent: *
Allow: /ads/public/
Disallow: /ads/
Disallow: /adx/bin/
Disallow: /aponline/
Disallow: /archives/
Disallow: /auth/
Disallow: /cnet/
Disallow: /college/
Disallow: /external/
Disallow: /financialtimes/
Disallow: /idg/
Disallow: /indexes/
Disallow: /library/
Disallow: /nytimes-partners/
Disallow: /packages/flash/multimedia/TEMPLATES/
Disallow: /pages/college/
Disallow: /paidcontent/
Disallow: /partners/
Disallow: /reuters/
Disallow: /thestreet/
User-agent: Mediapartners-Google
Disallow:
Sitemap: http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/sitemap.xml.gz
Sitemap: http://www.nytimes.com/sitemap_news.xml.gz
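The NY Times file opens a public subfolder inside an otherwise blocked /ads/ directory by listing the Allow line first. A short check with Python's urllib.robotparser illustrates the effect. The agent name AnyBot and the two page names are assumptions for the example. Parsers that apply the first matching rule depend on this order, while crawlers that pick the most specific (longest) rule reach the same result here.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Allow: /ads/public/",   # the exception comes first
    "Disallow: /ads/",       # then the general block
])

public_page = parser.can_fetch("AnyBot", "http://www.nytimes.com/ads/public/info.html")
other_page = parser.can_fetch("AnyBot", "http://www.nytimes.com/ads/banner.html")

print(public_page)  # True: covered by the Allow rule
print(other_page)   # False: falls under Disallow: /ads/
```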
Spiegel Robots.txt
User-agent: *
Disallow: /100year
Disallow: /15off
Disallow: /accept
Disallow: /approved
Disallow: /aspnet_client
Disallow: /bin
Disallow: /CheetahMailTag
Disallow: /common
Disallow: /config
Disallow: /controls
Disallow: /coremetrics
Disallow: /css
Disallow: /email
Disallow: /emailpromo
Disallow: /emailspecial
Disallow: /emi
Disallow: /fashion
Disallow: /forthehome
Disallow: /friends
Disallow: /js
Disallow: /member
Disallow: /mindware
Disallow: /moderntail
Disallow: /moderntails
Disallow: /nkreturns
Disallow: /normakamali
Disallow: /normakamalipromo
Disallow: /omniture
Disallow: /pop
Disallow: /reality
Disallow: /reports
Disallow: /request
Disallow: /search
Disallow: /stylist
Disallow: /together
Disallow: /trends
Disallow: /tv
Disallow: /ups
Disallow: /utilities
Disallow: /windows
eBay Robots.txt
#
# allow-all
#
#
# The use of robots or other automated means to access the eBay site
# without the express permission of eBay is strictly prohibited.
# Notwithstanding the foregoing, eBay may permit automated access to
# access certain eBay pages but soley for the limited purpose of
# including content in publicly available search engines. Any other
# use of robots or failure to obey the robots exclusion standards set
# forth at <http://www.robotstxt.org/wc/exclusion.html> is strictly
# prohibited.
# v3
#
User-agent: *
Disallow: /help/confidence/
Disallow: /help/policies/
Disallow: /disney/
Disallow: *rt=nc
### END FILE ###
CNN Robots.txt
Sitemap: http://www.cnn.com/sitemap_index.xml
Sitemap: http://www.cnn.com/sitemap_news.xml
Sitemap: http://www.cnn.com/video_sitemap_index.xml
User-agent: *
Disallow: /.element
Disallow: /editionssi
Disallow: /ads
Disallow: /aol
Disallow: /audio
Disallow: /audioselect
Disallow: /beta
Disallow: /browsers
Disallow: /cl
Disallow: /cnews
Disallow: /cnn_adspaces
Disallow: /cnnbeta
Disallow: /cnnintl_adspaces
Disallow: /development
Disallow: /NewsPass
Disallow: /NOKIA
Disallow: /partners
Disallow: /pipeline
Disallow: /pointroll
Disallow: /POLLSERVER
Disallow: /pr
Disallow: /PV
Disallow: /quickcast
Disallow: /Quickcast
Disallow: /QUICKNEWS
Disallow: /test
Disallow: /virtual
Disallow: /WEB-INF
Facebook Robots.txt
# contact us here: http://www.facebook.com/apps/site_scraping_tos.php
# to apply for white listing. Our general terms are available
# at http://www.facebook.com/apps/site_scraping_tos_terms.php
User-agent: baiduspider
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php
User-agent: Googlebot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php
User-agent: msnbot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php
User-agent: naverbot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php
User-agent: seznambot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php
User-agent: Slurp
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php
User-agent: teoma
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php
User-agent: twiceler
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php
User-agent: Yandex
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php
User-agent: *
Disallow: /
# E-mail sitemaps@lists.facebook.com if you are authorized to access these and are getting denied.
Sitemap: http://www.facebook.com/sitemap.php
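The Facebook file works as a whitelist: each named crawler gets its own group of rules, and the final "User-agent: *" group shuts out everyone else. The sketch below reproduces that layout in shortened form and checks it with Python's urllib.robotparser. The path /profile.php and the agent name SomeOtherBot are hypothetical and only serve the illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the layout above: one named group plus a catch-all.
RULES = """\
User-agent: Googlebot
Disallow: /photo.php
Disallow: /feeds/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Ask whether the given agent may fetch a path on the site."""
    return parser.can_fetch(agent, "http://www.facebook.com" + path)


print(allowed("Googlebot", "/profile.php"))     # True: not listed in its group
print(allowed("Googlebot", "/photo.php"))       # False: listed in its group
print(allowed("SomeOtherBot", "/profile.php"))  # False: falls into the * group
```

A crawler that finds a group with its own name uses that group and ignores the catch-all, which is why the same list of Disallow lines has to be repeated for every whitelisted robot.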
by vojin