Monday, April 14, 2008

Google crawling forms

Google now admits to crawling through HTML forms.

For select menus, check boxes, and radio buttons on the form, Google will choose from among the values defined in the HTML.

After gaining access to content past the form, Google may or may not index that content.

You can block Googlebot from crawling your forms by excluding them in your robots.txt file.
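As a rough sketch of what such an exclusion might look like, the snippet below checks a sample robots.txt with Python's standard urllib.robotparser. The site example.com and the form target /search are made-up placeholders, not anything from Google's announcement; substitute the path your own form submits to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /search is assumed to be the form's action URL.
robots_txt = """\
User-agent: Googlebot
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A URL produced by submitting the (hypothetical) form is blocked...
form_result_allowed = rp.can_fetch("Googlebot", "http://example.com/search?q=shoes")
# ...while an ordinary page is still crawlable.
normal_page_allowed = rp.can_fetch("Googlebot", "http://example.com/about.html")

print(form_result_allowed, normal_page_allowed)
```

Disallowing the form's target path covers every URL a GET submission could produce, since they all begin with that path.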

Googlebot will only attempt to crawl GET forms.
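To see why GET forms are crawlable at all: submitting one simply produces a URL with the chosen values in the query string. The sketch below enumerates those URLs for a made-up form. The action URL, the field names, and the idea of trying every combination are illustrative assumptions, not a description of how Googlebot actually picks values.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical form: its action URL and the values its HTML offers per field.
action = "http://example.com/products"
fields = {
    "category": ["books", "music"],  # e.g. options of a select menu
    "sort": ["price", "name"],       # e.g. a group of radio buttons
}

def candidate_urls(action, fields):
    """Yield one GET URL per combination of the offered field values."""
    names = list(fields)
    for combo in product(*(fields[name] for name in names)):
        yield action + "?" + urlencode(dict(zip(names, combo)))

urls = list(candidate_urls(action, fields))
for url in urls:
    print(url)
```

Each resulting URL is an ordinary link a crawler can fetch, which is not the case for a POST form, where the values travel in the request body.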

Googlebot tries to avoid forms that request user IDs, logins, passwords, contact information, and so on.

This should not impact PageRank.

Pros: Google can crawl places it previously couldn't reach and index more of your content, which gives you more visibility.
Cons: Pages you do not want indexed might require more work on your part to block.