Using Robots.txt to Control Search Engines

Introduction to robots.txt

Robots.txt is a plain text file in the root directory of your web site that tells search engine robots (also called spiders) which parts of the site they are allowed to crawl. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code.

How to create a robots.txt file

A site owner who wishes to give instructions to web robots must place a text file called robots.txt in the root of the web site hierarchy (e.g. www.website.com/robots.txt). You can create the robots.txt file manually, using any plain text editor such as Notepad. It should be an ASCII-encoded text file, not an HTML file, and the filename should be lowercase.
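If you would rather generate the file than type it by hand, the following is a minimal Python sketch of the same requirements (plain ASCII text, lowercase filename). The function names build_robots_txt and write_robots_txt are hypothetical helpers made up for this illustration, and the temporary directory only stands in for your real web root.

```python
from pathlib import Path
import tempfile


def build_robots_txt(rules, sitemap=None):
    """Build robots.txt text from a {user-agent: [disallowed paths]} mapping."""
    lines = []
    for agent, disallowed in rules.items():
        lines.append(f"User-agent: {agent}")
        # An empty Disallow value means nothing is blocked for this agent.
        for path in disallowed or [""]:
            lines.append(f"Disallow: {path}".rstrip())
        lines.append("")  # a blank line separates groups
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).rstrip("\n") + "\n"


def write_robots_txt(web_root, text):
    """Write the text as ASCII to <web_root>/robots.txt (lowercase name)."""
    target = Path(web_root) / "robots.txt"
    # encoding="ascii" raises UnicodeEncodeError if a non-ASCII character slips in.
    target.write_text(text, encoding="ascii")
    return target


# Usage: a temporary directory stands in for the web root (an assumption).
with tempfile.TemporaryDirectory() as web_root:
    written = write_robots_txt(
        web_root,
        build_robots_txt({"*": ["/scripts/", "/admin/"]},
                         sitemap="http://www.website.com/sitemap.xml"),
    )
    print(written.name)
    print(written.read_text(encoding="ascii"))
```

Passing an empty list for an agent produces an empty "Disallow:" line, which allows that agent to visit everything.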

Place the robots.txt file in your server's root directory. This is standard web management practice. It must be in the main directory because otherwise user agents (search engine robots) will not be able to find it: they do not search the whole site for a file named robots.txt. Instead, they look only in the main directory, and if they don't find it there, they simply assume that the site does not have a robots.txt file and index everything they find along the way. The file should contain its instructions in a specific format.

robots.txt will only work if it is in the top level directory of your web site
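To make the location rule concrete, here is a small Python sketch of how a robot could work out where to look, starting from any page on the site. The function name robots_url is a hypothetical helper for this example; the URL is the placeholder domain used above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the only place a robot looks for robots.txt for this page's site."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.website.com/blog/2010/post.html?id=3"))
# prints http://www.website.com/robots.txt
```

Whatever the depth of the page, the result always points at the top level, which is why a robots.txt placed in a subdirectory is never found.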

The structure of a robots.txt is pretty simple:

# Comments appear after the "#" symbol at the start of a line, or after a directive

# this example allows all robots to visit all files
User-agent: *
Disallow:

# exclude all robots from part of the server
User-agent: *
Disallow: /scripts/
Disallow: /images/
Disallow: /admin/

# Example that tells all crawlers not to enter one specific file
User-agent: *
Disallow: /dir/file.html
Disallow: /dir/file2.html

# allow google image bot to search all images
User-agent: Googlebot-Image
Allow: /*

# Block all images on your site from Google image search:
User-agent: Googlebot-Image
Disallow: /

# To remove a specific image from Google Images
User-agent: Googlebot-Image
Disallow: /images/image.jpg

# To remove a specific file type from Google Images (for example, .gif)
User-agent: Googlebot
Disallow: /*.gif$

# disallow WayBack archiving site
User-agent: ia_archiver
Disallow: /

# disallow all files with ? in url
User-agent: *
Disallow: /*?

# Sitemap
Sitemap: http://www.domain.com/sitemap.xml
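You can check how a set of rules will be read before uploading it. The sketch below uses Python's standard urllib.robotparser module on the "exclude all robots from part of the server" example; the robot name ExampleBot and the helper allowed are made up for the illustration. Note that this module only does simple prefix matching, so it is not a good test for wildcard rules such as /*.gif$.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# exclude all robots from part of the server
User-agent: *
Disallow: /scripts/
Disallow: /images/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse text directly, no network access


def allowed(url, agent="ExampleBot"):
    """True if the given robot may fetch the URL under ROBOTS_TXT."""
    return parser.can_fetch(agent, url)


print(allowed("http://www.website.com/index.html"))        # True
print(allowed("http://www.website.com/admin/login.html"))  # False
print(allowed("http://www.website.com/images/logo.gif"))   # False
```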

All search engines, or at least all the important ones, now look for a robots.txt file as soon as their spiders reach your web site. So even if you currently do not need to exclude the spiders from any part of your site, having a robots.txt file is still a good idea; it can act as a sort of invitation into your site.

Be sure to use the right case. File names on your server are case sensitive, and robots compare paths exactly as written. If the name of your directory is "Support", don't write "support" in the robots.txt file.
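A short Python sketch of the case rule, again using the standard urllib.robotparser module (the robot name ExampleBot is a made-up placeholder): a rule written for "/Support/" has no effect on "/support/".

```python
from urllib.robotparser import RobotFileParser

case_parser = RobotFileParser()
case_parser.parse(["User-agent: *", "Disallow: /Support/"])

# The path must match the rule exactly, including upper and lower case.
blocked_exact = not case_parser.can_fetch(
    "ExampleBot", "http://www.website.com/Support/faq.html")
blocked_lower = not case_parser.can_fetch(
    "ExampleBot", "http://www.website.com/support/faq.html")

print(blocked_exact)  # True: "/Support/" is covered by the rule
print(blocked_lower)  # False: "/support/" is a different path
```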

Robots.txt Optimization for WordPress

Pointing search engines at your high-quality directories and files, and away from the rest, can help the ranking of your site and is recommended by Google and other search engines. An example WordPress robots.txt file:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /

Sitemap: http://www.example.com/sitemap.xml
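Several rules in this file rely on wildcards: * stands for any run of characters, and a trailing $ (as in /*.gif$ earlier) pins the rule to the end of the URL. These are extensions honored by the major search engines rather than by every robot. The Python sketch below shows one way such a rule could be matched; rule_to_regex and rule_matches are hypothetical helpers written for this illustration, not the code any search engine actually runs.

```python
import re


def rule_to_regex(rule):
    """Translate a wildcard path rule into a compiled regular expression."""
    anchored = rule.endswith("$")  # a trailing $ pins the match to the end
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then let every * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(rule, path):
    """True if the rule applies to the path (matching starts at the first character)."""
    return rule_to_regex(rule).match(path) is not None


print(rule_matches("/*?", "/page?id=3"))                     # True
print(rule_matches("/*?", "/page"))                          # False
print(rule_matches("/category/*/*", "/category/news/2010"))  # True
print(rule_matches("/category/*/*", "/category/news"))       # False
print(rule_matches("*/feed", "/blog/post/feed"))             # True
print(rule_matches("/*.gif$", "/images/logo.gif"))           # True
print(rule_matches("/*.gif$", "/images/logo.gif?x=1"))       # False
```

Rules without wildcards behave as plain prefixes, so "/wp-admin" also covers "/wp-admin/options.php".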

The robots.txt files of big Web sites:

NY Times Robots.txt

User-agent: *
Allow: /ads/public/
Disallow: /ads/
Disallow: /adx/bin/
Disallow: /aponline/
Disallow: /archives/
Disallow: /auth/
Disallow: /cnet/
Disallow: /college/
Disallow: /external/
Disallow: /financialtimes/
Disallow: /idg/
Disallow: /indexes/
Disallow: /library/
Disallow: /nytimes-partners/
Disallow: /packages/flash/multimedia/TEMPLATES/
Disallow: /pages/college/
Disallow: /paidcontent/
Disallow: /partners/
Disallow: /reuters/
Disallow: /thestreet/

User-agent: Mediapartners-Google
Disallow:

Sitemap: http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/sitemap.xml.gz
Sitemap: http://www.nytimes.com/sitemap_news.xml.gz
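The first two rules of this file show how Allow and Disallow combine: /ads/public/ is opened up while the rest of /ads/ stays closed. Below is a Python sketch using the standard urllib.robotparser module on a shortened copy of the rules; the robot name ExampleBot, the helper nyt_allowed and the page paths are made up for the test. Python's parser applies the first rule that matches, while Google applies the most specific (longest) matching rule; for this file both approaches give the same answers.

```python
from urllib.robotparser import RobotFileParser

NYT_EXCERPT = """\
User-agent: *
Allow: /ads/public/
Disallow: /ads/
Disallow: /archives/
"""

nyt_parser = RobotFileParser()
nyt_parser.parse(NYT_EXCERPT.splitlines())


def nyt_allowed(path):
    """Check a path on the example host against the excerpt above."""
    return nyt_parser.can_fetch("ExampleBot", "http://www.nytimes.com" + path)


print(nyt_allowed("/ads/public/index.html"))  # True: the Allow rule matches first
print(nyt_allowed("/ads/banner.html"))        # False: falls under Disallow: /ads/
print(nyt_allowed("/archives/1999.html"))     # False
print(nyt_allowed("/pages/world/index.html")) # True: no rule applies
```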

Spiegel Robots.txt

User-agent: *
Disallow: /100year
Disallow: /15off
Disallow: /accept
Disallow: /approved
Disallow: /aspnet_client
Disallow: /bin
Disallow: /CheetahMailTag
Disallow: /common
Disallow: /config
Disallow: /controls
Disallow: /coremetrics
Disallow: /css
Disallow: /email
Disallow: /emailpromo
Disallow: /emailspecial
Disallow: /emi
Disallow: /fashion
Disallow: /forthehome
Disallow: /friends
Disallow: /js
Disallow: /member
Disallow: /mindware
Disallow: /moderntail
Disallow: /moderntails
Disallow: /nkreturns
Disallow: /normakamali
Disallow: /normakamalipromo
Disallow: /omniture
Disallow: /pop
Disallow: /reality
Disallow: /reports
Disallow: /request
Disallow: /search
Disallow: /stylist
Disallow: /together
Disallow: /trends
Disallow: /tv
Disallow: /ups
Disallow: /utilities
Disallow: /windows

eBay Robots.txt

### BEGIN FILE ###
#
# allow-all
#
#
# The use of robots or other automated means to access the eBay site
# without the express permission of eBay is strictly prohibited.
# Notwithstanding the foregoing, eBay may permit automated access to
# access certain eBay pages but soley for the limited purpose of
# including content in publicly available search engines. Any other
# use of robots or failure to obey the robots exclusion standards set
# forth at <http://www.robotstxt.org/wc/exclusion.html> is strictly
# prohibited.
# v3
#

User-agent: *
Disallow: /help/confidence/
Disallow: /help/policies/
Disallow: /disney/
Disallow: *rt=nc

### END FILE ###

CNN Robots.txt

Sitemap: http://www.cnn.com/sitemap_index.xml
Sitemap: http://www.cnn.com/sitemap_news.xml
Sitemap: http://www.cnn.com/video_sitemap_index.xml
User-agent: *
Disallow: /.element
Disallow: /editionssi
Disallow: /ads
Disallow: /aol
Disallow: /audio
Disallow: /audioselect
Disallow: /beta
Disallow: /browsers
Disallow: /cl
Disallow: /cnews
Disallow: /cnn_adspaces
Disallow: /cnnbeta
Disallow: /cnnintl_adspaces
Disallow: /development
Disallow: /NewsPass
Disallow: /NOKIA
Disallow: /partners
Disallow: /pipeline
Disallow: /pointroll
Disallow: /POLLSERVER
Disallow: /pr
Disallow: /PV
Disallow: /quickcast
Disallow: /Quickcast
Disallow: /QUICKNEWS
Disallow: /test
Disallow: /virtual
Disallow: /WEB-INF
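A file may list several Sitemap lines, and they are independent of the User-agent groups. As a rough sketch, the sitemap addresses can be read back with Python's standard urllib.robotparser module; site_maps() needs Python 3.8 or later, and the variable names are made up for this example. Only the first few lines of the file are used here.

```python
from urllib.robotparser import RobotFileParser

CNN_EXCERPT = """\
Sitemap: http://www.cnn.com/sitemap_index.xml
Sitemap: http://www.cnn.com/sitemap_news.xml
Sitemap: http://www.cnn.com/video_sitemap_index.xml
User-agent: *
Disallow: /ads
"""

cnn_parser = RobotFileParser()
cnn_parser.parse(CNN_EXCERPT.splitlines())

# site_maps() returns None when the file lists no sitemaps, hence the "or []".
sitemaps = cnn_parser.site_maps() or []
for url in sitemaps:
    print(url)
```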

Facebook Robots.txt

# Notice: if you would like to crawl Facebook you can
# contact us here: http://www.facebook.com/apps/site_scraping_tos.php
# to apply for white listing. Our general terms are available
# at http://www.facebook.com/apps/site_scraping_tos_terms.php

User-agent: baiduspider
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: Googlebot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: msnbot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: naverbot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: seznambot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: Slurp
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: teoma
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: twiceler
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: Yandex
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: *
Disallow: /

# E-mail sitemaps@lists.facebook.com if you are authorized to access these and are getting denied.
Sitemap: http://www.facebook.com/sitemap.php
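This file works as a whitelist: each named robot gets its own group with a short list of blocked pages, and the final "User-agent: *" group shuts out everyone else. A Python sketch with the standard urllib.robotparser module, using a shortened copy of the rules, shows the effect; UnknownBot and /profile.php are made-up names for the test.

```python
from urllib.robotparser import RobotFileParser

FACEBOOK_EXCERPT = """\
User-agent: Googlebot
Disallow: /feeds/
Disallow: /photo.php

User-agent: *
Disallow: /
"""

fb_parser = RobotFileParser()
fb_parser.parse(FACEBOOK_EXCERPT.splitlines())

SITE = "http://www.facebook.com"

# A robot with its own group follows only that group, not the catch-all.
print(fb_parser.can_fetch("Googlebot", SITE + "/profile.php"))   # True
print(fb_parser.can_fetch("Googlebot", SITE + "/photo.php"))     # False
# Any robot without a group of its own falls under "User-agent: *".
print(fb_parser.can_fetch("UnknownBot", SITE + "/profile.php"))  # False
```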

Comments

  • careergirl Apr 4, 2010 @ 9:50 pm
    How do I fix the robots.txt to work with Technorati?
  • BudgetBath_Inc Feb 22, 2010 @ 9:36 am
    Same here, Technorati couldn't crawl my lens either... I wish someone had a way around it.
  • reasonablerobinson Feb 7, 2010 @ 12:34 pm
    I've just been told that this file has stopped my Squidoo lens being crawled by Technorati. Can I put this right?

by vojin

Hi, I'm Vojin. My interests are in developing online business, programming, scifi movies and games. You can follow me on Twitter at http://twitter.co...