Let's say I want to host my own local search engine, and I have the application ready.
Now I want to activate my crawlers to scrape and index the web.
Would I be in hot water for doing this? Is there any implementation-level rule that I can check other than robots.txt?
Any thoughts or inputs on the subject, other than it being a huge waste of time and resources :D.

    There isn't a netiquette as far as I know.

    Usually, in all the companies I worked for, we banned all crawlers that ...

    1) ignored robots.txt / sitemap.xml, especially if they tried to call "randomized" queries / routes

    2) crawled too aggressively - either the number of calls per second exceeded a certain limit or the number of queries for a single site exceeded a certain limit (trying to fetch the same site every minute isn't nice either)

    3) behaved "weird"... E.g. TLS downgrades, HTTP request smuggling, randomization of user agent header, ...

    Check fair use law and see whether you meet its four conditions. That will cover you if your engine is to be used by the public.
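
To make the three points in the list above concrete, here is a minimal sketch of a "politeness gate" a crawler could consult before each request. It uses only Python's standard `urllib.robotparser`. The class name `PoliteGate`, the user agent string, the bot-info URL and the 5-second default delay are all assumptions made up for illustration, not values anyone in this thread recommended, and the actual HTTP fetching is left out.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical values: pick your own name, contact URL and default delay.
USER_AGENT = "MyLocalSearchBot/0.1 (+https://example.org/bot-info)"
DEFAULT_DELAY = 5.0  # seconds between hits to one host (assumed, not a standard)


class PoliteGate:
    """Decides whether, and how soon, a URL may be fetched."""

    def __init__(self, user_agent=USER_AGENT, default_delay=DEFAULT_DELAY,
                 clock=time.monotonic):
        self.user_agent = user_agent      # one fixed user agent, never randomized
        self.default_delay = default_delay
        self.clock = clock                # injectable so the logic can be tested
        self.rules = {}                   # host -> parsed robots.txt
        self.last_hit = {}                # host -> time of the last fetch

    def load_robots(self, host, robots_txt):
        """Parse the text of a host's robots.txt (fetched elsewhere)."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.rules[host] = parser

    def allowed(self, url):
        """True only if robots.txt for the host is loaded and permits the URL."""
        parser = self.rules.get(urlsplit(url).netloc)
        # Conservative choice: no robots.txt loaded yet means "do not fetch".
        return parser is not None and parser.can_fetch(self.user_agent, url)

    def wait_time(self, url):
        """Seconds to sleep before hitting this URL's host again."""
        host = urlsplit(url).netloc
        parser = self.rules.get(host)
        # Prefer the site's own Crawl-delay, fall back to our default.
        delay = (parser.crawl_delay(self.user_agent) if parser else None)
        delay = delay or self.default_delay
        last = self.last_hit.get(host)
        if last is None:
            return 0.0
        return max(0.0, last + delay - self.clock())

    def record_fetch(self, url):
        """Remember when this URL's host was last contacted."""
        self.last_hit[urlsplit(url).netloc] = self.clock()
```

The intended loop would be roughly: call `allowed(url)`, sleep for `wait_time(url)`, fetch with whatever HTTP client you use, then call `record_fetch(url)`. Refusing to fetch when no robots.txt has been loaded is a deliberately cautious choice in this sketch; a real crawler also has to decide what to do when robots.txt is missing or unreachable.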
