Olaf,
> and extracting commands/instructions/information for bots, that can be embedded in html (eg meta tags). I want it to be a good webcrawler
Don't forget about the robots.txt file as well. I believe this file should be present in the root folder of a site, e.g.
http://spiegel.com/robots.txt

Thanks again for all the hard work you're putting into your crawler. I think your utility will benefit a lot of users.
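
In case it helps, here is a rough sketch of how a crawler could honour robots.txt before fetching a page. It uses Python's standard urllib.robotparser module; the bot name "OlafBot", the helper names and the sample rules are made up for illustration and are not taken from your crawler or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without
# network access. A real crawler would download it from the site root.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "OlafBot" is an assumed user-agent name, used only for this example.
print(is_allowed(parser, "OlafBot", "http://spiegel.com/index.html"))
print(is_allowed(parser, "OlafBot", "http://spiegel.com/private/page.html"))
```

Against a live site you would instead call set_url() with the robots.txt address and then read() to download the file before asking can_fetch().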
Malcolm