Ajax Crawling

Yes, Bing does "Ajax crawling"

Almost two years ago Google proposed the "hash-bang" standard to make JavaScript-rich websites crawlable. This was great news for websites making heavy use of JavaScript to enhance the user experience. Bing is late to the party, but it's here now. Say "hello" to the little spider!

Hardcore JavaScript sites

A modern approach to leveraging the user agent's processing power

Pulling the content in data-only form (JSON, XML) and using JavaScript to build the HTML representation brings many advantages:

  • the amount of data transferred over the wire is minimal,
  • new content can be added to the existing page without reloading it,
  • different representations can be rendered from the same data without going back to the server over and over, and
  • user interaction with the page can be refined far beyond what is possible in plain or modestly JavaScript-assisted HTML,

to mention just a few.

There are downsides too. Besides some technical challenges, there was an obstacle of epic proportions: JavaScript-generated content was invisible to search engines, condemning that shiny new website to an island of isolation and oblivion.

The problem lies in the way a JavaScript-rich site fakes URL history and in-site links. Until HTML5, the only way to change the URL without making the browser reload the whole page was to add content after the "#", a part of the URL originally intended for pointing to a location within the page. This part of the URL is called "the fragment" and is never sent to the server.

http://si.draagle.com/#!browse/group/root/

and

http://si.draagle.com/#!source/est/drug=esu&fact=est_cyy

point to the same page from the server's (and the crawler's) perspective.
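
A minimal sketch of why the two URLs look identical to the server, using Python's standard urllib.parse module. Python is chosen here only for illustration, and the helper name server_visible_part is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def server_visible_part(url: str) -> str:
    """Return the URL without its fragment, i.e. what is sent in the request."""
    parts = urlsplit(url)
    # Browsers keep everything after "#" to themselves, so drop the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


a = "http://si.draagle.com/#!browse/group/root/"
b = "http://si.draagle.com/#!source/est/drug=esu&fact=est_cyy"

print(urlsplit(a).fragment)  # !browse/group/root/
print(urlsplit(b).fragment)  # !source/est/drug=esu&fact=est_cyy
print(server_visible_part(a) == server_visible_part(b))  # True
```

Both requests arrive as http://si.draagle.com/, so without some extra convention the server cannot tell which view was asked for.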

The obvious solution would be to build a separate tree of pages just for search engines and serve them a stripped-down version of the content. Unfortunately, this tactic is commonly used by spammers and is brutally penalised by most search engines. Don't try this at home!

Do ya speak #!?

Google's solution

Google proposed a different deal. Site owners should modify their Ajax links to include #! instead of just #. Whenever Googlebot sees #! in a URL, it treats the link as Ajax-crawlable and temporarily converts #! to ?_escaped_fragment_=, thus:

(1) http://si.draagle.com/#!drug/kxi/?sub=10

becomes

(2) http://si.draagle.com/?_escaped_fragment_=drug%2fkxi%2f%3fsub%3d10

Note how the fragment part got URL-escaped.
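
The rewrite from (1) to (2) can be sketched in a few lines of Python. This is an illustration under assumptions, not the crawler's actual code: the function name to_escaped_fragment_url is hypothetical, the parameter name _escaped_fragment_ (with a trailing underscore) follows Google's published scheme, and the sketch percent-encodes every reserved character as in the example above, while a real crawler may escape a narrower set:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment_url(url: str) -> str:
    """Rewrite a #! URL into its _escaped_fragment_ form (illustrative only)."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # plain "#" fragment or none at all: nothing to convert
    # Percent-encode the fragment (minus the "!") so it survives as a query value.
    escaped = quote(parts.fragment[1:], safe="")
    param = "_escaped_fragment_=" + escaped
    # Keep any query string the URL already had.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_escaped_fragment_url("http://si.draagle.com/#!drug/kxi/?sub=10"))
# http://si.draagle.com/?_escaped_fragment_=drug%2Fkxi%2F%3Fsub%3D10
```

Python emits upper-case hex digits (%2F rather than %2f); the two spellings are equivalent.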

Your part of the contract is to generate exactly the same content when Googlebot asks for (2) as the browser would generate for (1).
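
On the server, honouring that contract starts with recovering the original fragment from the crawler's request. A rough Python sketch, assuming the parameter is named _escaped_fragment_ as in Google's published scheme; fragment_requested_by_crawler, render_snapshot and handle are hypothetical names, and the HTML strings are placeholders:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def fragment_requested_by_crawler(url: str) -> Optional[str]:
    """Return the decoded fragment a crawler asked for, or None for normal requests."""
    query = urlsplit(url).query
    # parse_qs undoes the percent-encoding for us.
    values = parse_qs(query, keep_blank_values=True).get("_escaped_fragment_")
    return values[0] if values else None


def render_snapshot(fragment: str) -> str:
    # Hypothetical stand-in: a real site would return the same HTML that its
    # JavaScript builds in the browser for this fragment.
    return "<html><body>snapshot for " + fragment + "</body></html>"


def handle(url: str) -> str:
    fragment = fragment_requested_by_crawler(url)
    if fragment is None:
        # Ordinary visitor: serve the JavaScript application as usual.
        return "<html><body>JavaScript application</body></html>"
    return render_snapshot(fragment)


url = "http://si.draagle.com/?_escaped_fragment_=drug%2fkxi%2f%3fsub%3d10"
print(fragment_requested_by_crawler(url))  # drug/kxi/?sub=10
```

The hard part, producing a snapshot identical to what the browser renders, is left out here because it depends entirely on how the site builds its pages.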

Bing me too

The copycat was shy for too long

For almost two years, the only big search engine able to search Ajax content was Google, and considering its market dominance, that wasn't such a big deal. Still, we must admit that Bing is a great search tool, in many ways comparable to the big G, and it offers a second opinion of sorts. Its market share has grown noticeably in the last year, and new trends such as the smartphone invasion and social search might shuffle the search engines' market shares even more.

A few days ago, I noticed an important change in Bing's Webmaster Tools, namely the "Configure your site to have bingbot crawl escaped fragmented URLs containing #!" check box.

The dilemma between having your site in most search engines' indexes and providing a superior user experience by applying the heavy artillery of JavaScript is diminishing fast. And this is good news for users and web builders alike.

Explore more

Google's proposal for Ajax crawling
Concise instructions on how to make your site Ajax-crawlable.
draagle.com, the hash-bang pioneer
draagle.com has been using #! to expose its content to search engine bots for ages ;-)

Guestbook Comments

  • Runnn Sep 8, 2011 @ 10:11 am
    You could do better. Looking forward to see more lens from you.
  • daria369 Aug 3, 2011 @ 4:14 pm
    Great info, keep up the good work!! :)
  • dellgirl Jul 22, 2011 @ 12:18 am
    Very nicely done, thanks for sharing this. You really did a good job of explaining this.
  • pramodbisht Jul 20, 2011 @ 4:50 am
    thanks for sharing nice lens
  • aka_sakabato Jul 18, 2011 @ 7:52 pm
    I never exactly know how these search engines work, thanks for the explainations
  • jseven Jul 18, 2011 @ 6:53 pm
    It's pretty foreign to me, but I'm glad we have people like you to explain. :)
  • Tolovaj Jul 18, 2011 @ 6:23 am
    Interesting info, I do not understand half of it, but it seems very useful. I have to come back later... Thanks for sharing:)
  • pheonix76 Jul 17, 2011 @ 6:24 pm
    Interesting -- I had never thought about this before! Nice lens.
  • mensday Jul 17, 2011 @ 2:50 pm
    nice
  • BFuniv.com Jul 17, 2011 @ 11:48 am
    useful, thanks
  • reasonablerobinson Jul 17, 2011 @ 11:47 am
    Blimey a dark art to a layman like me!
  • sukkran Jul 17, 2011 @ 11:33 am
    thanks for the info. very informative lens.
  • VinkoA Jul 17, 2011 @ 5:48 am
    Nice :)
  • kRRt1979 Jul 17, 2011 @ 5:15 am
    nice insight, well written!

by aleskotnik

Passionate about computers for as long as I can remember, heavily engaged in the early tribal wars known as C64 vs Spectrum and Amiga vs Atari. Religious s...
