Search engine robots that visit your web site
Contents of this page
Search engine robots and others
Browsers
Link Checkers, Link monitors and bookmark managers
Validators
FTP clients and download managers
Research projects
Software packages
Offline browsers and other agents
Other miscellaneous agents
Sites that regularly visit
Other useful sites
...And finally, some fakers
Awards for this page
Search engines and other sites send robots to read and index your pages. This page reverses that process and indexes the robots. This information has been gleaned by looking at the server logs for www.jafsoft.com. You can read a detailed description of how we hunt spiders.
Whenever a page is read from a web site, the log file records a number of details including the time, the IP address and usually the referrer page and the user agent. You can see this in our analysis of a server log sample.
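As a rough illustration of the details a log line carries, here is a minimal Python sketch that pulls the IP address, time, referrer and user agent out of one line. It assumes the server writes the common Apache/NCSA "combined" log format; the sample line, the `ExampleBot/1.0` agent and the `parse_line` helper are made up for the example and are not taken from the jafsoft.com logs.

```python
import re

# Assumed layout: Apache/NCSA "combined" log format. Other servers or
# custom formats will need a different pattern.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# Hypothetical log line, for illustration only.
sample = ('192.0.2.10 - - [30/Jan/2006:10:15:32 +0000] '
          '"GET /index.html HTTP/1.0" 200 5120 '
          '"http://www.example.com/links.html" "ExampleBot/1.0"')

hit = parse_line(sample)
print(hit["ip"], hit["time"], hit["referrer"], hit["agent"])
```

Lines that do not fit the assumed format simply come back as `None`, so they can be counted or inspected by hand.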
Unlike many pages that list web robots, this page actually tries to visit the robots themselves. Where possible, links are provided to the robots' home pages, with descriptions of what they're up to. This page is updated regularly as more information is found (the last update was on 30-Jan-2006).
Well-behaved robots will identify themselves, often supplying web or email addresses you can contact. In any case, the pattern of pages being read and the IP addresses being used soon sorts the men from the robots.
Good robots will read robots.txt to see what your site policy is, but there are other ways of spotting robots. In addition to the search engine robots, other "user agents" will visit your site, e.g. to validate links to your site from other people's pages. Often these will just access the HEAD of the file, rather than doing a GET on the whole file.
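The two signals mentioned here (reading robots.txt, and using HEAD rather than GET) can be checked mechanically. The following Python sketch is one possible way to do it, assuming the log has already been reduced to `(ip, method, path)` tuples; the `likely_robots` function and the sample addresses are hypothetical, and the result is only a hint, since browsers and proxies occasionally do the same things.

```python
from collections import defaultdict

def likely_robots(hits):
    """hits: iterable of (ip, method, path) tuples taken from a server log.

    Returns a dict mapping each flagged IP to the set of reasons it was flagged.
    """
    methods = defaultdict(set)
    read_robots_txt = set()
    for ip, method, path in hits:
        methods[ip].add(method.upper())
        # Ignore any query string when looking for robots.txt requests.
        if path.split("?")[0] == "/robots.txt":
            read_robots_txt.add(ip)

    flagged = defaultdict(set)
    for ip in read_robots_txt:
        flagged[ip].add("read robots.txt")
    for ip, seen in methods.items():
        # Visitors that never issue a GET are unlikely to be people browsing.
        if seen == {"HEAD"}:
            flagged[ip].add("HEAD requests only")
    return dict(flagged)

# Hypothetical hits, for illustration only.
hits = [
    ("192.0.2.1", "GET", "/robots.txt"),
    ("192.0.2.1", "GET", "/index.html"),
    ("192.0.2.2", "HEAD", "/index.html"),
    ("192.0.2.3", "GET", "/index.html"),
]
print(likely_robots(hits))
```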
You can also visit our page describing the engines in some detail.
This page is regularly converted from this text file by the author's own text-to-HTML converter, AscToHTM, which is available as shareware (cost $30). The last update was on 30-Jan-2006.
Search engine robots and others
The following table lists the search engines that spider the web, the IP addresses that they use, and the robot names they send out to visit your site. Version numbers are usually included in the robot names, but are omitted here except where they imply a visit from a different IP address or (as with Inktomi) a different search engine.
Often multiple IP addresses are used, in which case we just give a flavour of the names or numbers. Inktomi is a company that offers search engine technology and is used by a number of sites (e.g. www.snap.com and www.hotbot.com).
Wherever <nn> appears, it indicates that a number of different digits may be used.
Home page/search engine | Robot identifier | IP address(es) |
---|---|---|
www.abacho.com | AbachoBOT | srv-ze-robot1.tricus.com |
www.abcdatos.com | abcdatos_botlink http://www.abcdatos.com/botlink/ | 217.126.39.167 |
www.aesop.com | AESOP_com_SpiderMan | 209.189.115.49 |
www.ah-ha.com | ah-ha.com crawler (crawler@ah-ha.com) | c7pub-216-250-141-186.center7.com |
www.alexa.com | ia_archiver | green.alexa.com sarah.alexa.com |
www.altavista.com | Scooter Mercator Scooter2_Mercator_3-1.0 roach.smo.av.com-1.0 Tv<nn>_Merc_resh_26_1_D-1.0 | test-scooter.pa.alta-vista.net brillo.pa.alta-vista.net av-dev4.pa.alta-vista.net scooter.aveurope.co.uk bigip1-snat.sv.av.com mercator.pa-x.dec.com scooter.pa.alta-vista.net election2000crawl-complaints-to-admin.webresearch.pa-x.dec.com scooter.sv.av.com avfwclient.sv.av.com tv<nn>.sv.av.com |
www.altavista.co.uk | AltaVista-Intranet jan.gelin@av.com | host-119.altavista.se |
www.alltheweb.com | FAST-WebCrawler crawler@fast.no | 209.67.247.154 |
www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html | ||
Wget | ext-gw.trd.fast.no | |
www.acoon.de | Acoon Robot | 194.231.42.178 |
www.antisearch.net | antibot | 62.210.155.50 |
www.atomz.com | Atomz | router-sc.atomz.com index.atomz.com |
www.axmo.com | AxmoRobot | 194.248.208.82 |
www.buscaplus.com | Buscaplus Robi http://www.buscaplus.com/robi/ | |
www.canseek.ca | CanSeek/ support@canseek.ca | 216.168.111.111 |
www.christcrawler.com/search.cfm | ChristCRAWLER http://www.christcrawler.com/ | 207.191.111.231 |
www.clush.com | Clushbot http://www.clush.com/bot.html | 209.249.80.242 |
www.crawler.de | Crawler admin@crawler.de | crawlit.crawler.de |
www.daadle.com | DaAdLe.com ROBOT/ | 216.12.213.32 |
www.daum.net | RaBot Agent-admin/ phortse@hanmail.net contact/jylee@kies.co.kr | 210.183.28.46 211.50.57.6 |
| RaBot Agent-admin/ webmaster@kisco.go.kr | 202.30.94.34 |
www.en.deepindex.com | DeepIndex | deepindex.net1.nerim.net |
www.ditto.com | DittoSpyder | 65.169.94.188 |
domanova.co.uk | Jack | |
www.earthcom.info | EARTHCOM.info | 194.108.39.74 |
www.entireweb.com | Speedy Spider | 62.13.25.209 |
www.excite.com | ArchitextSpider | Musical instruments are used in the names, such as viola.excite.com cello.excite.com piano.excite.com kazoo.excite.com ride.excite.com sabian.excite.com sax.excite.com bugle.excite.com snare.excite.com ziljian.excite.com bongos.excite.com maturana.excite.com mandolin.excite.com piccolo.excite.com kettle.excite.com ichiban.excite.com (and the rest of the band). More recently, first names are being used, like philip.excite.com peter.excite.com perdita.excite.com macduff.excite.com agouti.excite.com |
(excite) | ArchitectSpider | crimpshrine.atext.com ichiban.atext.com |
www.eurip.com | EuripBot | 81.169.172.30 |
www.euroseek.net | Arachnoidea arachnoidea@euroseek.net | 212.209.54.134 |
www.ezresults.com | EZResult | 216.28.23.59 |
www.fastsearch.net | Fast PartnerSite Crawler FAST Data Search Crawler FAST Data Search Document Retriever | psprdcrw001.sac2.fastsearch.net 65.198.110.185 69.38.159.128 |
www.fireball.de | KIT-Fireball | ???? |
http://france.misesajour.com/ | france.misesajour.com | 66.98.210.71 |
www.fybersearch.com | FyberSearch | 69.49.241.9 |
www.galaxy.com | GalaxyBot http://www.galaxy.com/galaxybot.html | 63.121.41.175 |
www.geckobot.com | geckobot | ???.rdc1.az.coxatwork.com |
www.gendoor.com (Genealogical Search Engine) | GenCrawler | ???? |
www.geona.com | GeonaBot | 69.59.142.17 |
www.getrax.com | getRAX | 81.169.156.246 |
www.google.com | Googlebot googlebot@googlebot.com http://googlebot.com/ | c<nn>.googlebot.com |
www.goo.ne.jp | moget/2.0 moget@goo.ne.jp | 202.229.31.13 |
www.girafa.com | Aranha | Aranha.girafa.com |
(inktomi) | Slurp.so/1.0 slurp@inktomi.com | q2004.inktomisearch.com j5006.inktomisearch.com |
(inktomi) | Slurp/2.0j slurp@inktomi.com www.inktomisearch.com | 202.212.5.34 goo313.goo.ne.jp |
(inktomi) | Slurp/2.0-KiteHourly slurp@inktomi.com; www.inktomi.com/slurp.html | y400.inktomi.com |
(inktomi) | Slurp/2.0-OwlWeekly spider@aeneid.com www.inktomi.com/slurp.html | 209.185.143.198 |
(inktomi) | Slurp/3.0-AU slurp@inktomi.com | j6000.inktomi.com |
http://hoppa.com/ (need V5 browsers to view) | Toutatis 2.5-2 | tisnix.xs4all.nl |
www.hubat.com | Hubater | 209.114.176.250 |
www.almaden.ibm.com (research centre) | http://www.almaden.ibm.com/cs/crawler | wfp2.almaden.ibm.com |
www.iltrovatore.it | IlTrovatore-Setaccio | 213.26.21.8 |
www.incywincy.com | IncyWincy | 64.81.243.66 |
www.infoseek.com | UltraSeek InfoSeek Sidewinder | cde2c923.infoseek.com cde2c91f.infoseek.com cca26215.infoseek.com |
www.intags.de | Mole2/1.0 webmaster@intags.de | 217.160.75.10 |
http://mp3bot.de/ | MP3Bot | <..> |
www.ip3000.com | C-PBWF-ip3000.com-crawler ip3000.com-crawler | www.ip3000.com |
www.istarthere.com | http://www.istarthere.com spider@istarthere.com | 66.220.24.80 |
www.knowledge.com | Knowledge.com/ | 213.170.2.69 |
www.kuloko.com | kuloko-bot/0.2 | 66.90.81.41 |
www.lexis-nexis.com | LNSpiderguy | firewall5.lexis-nexis.com |
www.linknz.co.nz | Linknzbot | 202.191.32.67 |
www.look.com | lookbot | magma.com |
www.looksmart.com | MantraAgent | fjupiter.looksmart.com |
www.loopimprovements.com (see also www.incywincy.com) | NetResearchServer www.loopimprovements.com/robot.html | leg-64-133-109-250-STK.sprinthome.com |
www.lycos.com | Lycos_Spider_(T-Rex) | bos-spider<n>.bos.lycos.com 216.35.194.188 |
www.joocer.com | JoocerBot | 80.46.38.169 |
www.mirago.co.uk | HenryTheMiragoRobot | 194.202.39.46 |
www.mojeek.com | MojeekBot | ??? |
www.mozdex.com | mozDex/ | (within comcast.net) |
http://search.msn.com/ | MSNBOT/0.1 http://search.msn.com/msnbot.htm) | 131.107.163.47 |
www.navadoo.com | Navadoo Crawler | ??? |
www.northernlight.com | Gulliver | marvin.northernlight.com taz.northernlight.com |
www.objectssearch.com | ObjectsSearch/0.01 | 68.88.244.177 |
http://szukaj.onet.pl/ | OnetSzukaj/ | ??? |
www.picosearch.com | PicoSearch/ | pipe.picosearch.com |
www.portaljuice.com | PJspider | timber.nextopia.com |
www.powerinter.net but it won't let us in :-( | DIIbot | node-d8e93393.powerinter.net |
http://navi.ocn.ne.jp/ | nttdirectory_robot super-robot@super.navi.ocn.ne.jp griffon griffon@super.navi.ocn.ne.jp | lilis00.navi.ocn.ne.jp lilis04.navi.ocn.ne.jp |
www.maxbot.com | Spider/maxbot.com admin@maxbot.com | search.wport.com |
??? | various (fakes agent on each access) | pool0058.cvx2-bradley.dialup.earthlink.net |
??? | gazz/1.0 gazz@nttrd.com | deleuze.infobee.ne.jp derrida.infobee.ne.jp |
??? | ??? | search-8.xift.com |
www.nationaldirectory.com | NationalDirectory-SuperSpider | spider.nationaldirectory.com 209.116.58.143 |
www.naver.com | dloader(NaverRobot)/ dumrobo(NaverRobot)/ | 211.218.151.209 |
www.noxtrum.com | noxtrumbot/ | 194.224.199.52 |
www.openfind.com (Chinese language) | Openfind piranha,Shark robot-response@openfind.com.tw Openbot/ | ??? abovenet4.openfind.com |
www.picsearch.org | psbot www.picsearch.org/bot.html | 217.75.104.26 |
www.pinpoint.com | CrawlerBoy Pinpoint.com | nitrogen.pinpoint.com |
www.petersnews.com | user<n>.ip3000.com | news<n>.petersnews.com |
www.qweery.nl | QweeryBot http://qweerybot.qweery.com) | 84.82.133.41 |
www.vestris.com/alkaline | AlkalineBOT | host130.uv-ray.com |
www.rambler.ru | StackRambler/ | 81.222.64.10 |
www.seznam.cz | SeznamBot | 212.80.76.87 |
www.search-10.com | Search-10 | 82.41.144.99 |
www.searchhippo.com | Fluffy the spider info@searchhippo.com) | 208.148.122.27 |
www.scrubtheweb.com | Scrubby/ | 208.145.190.254 |
www.singingfish.com | asterias | grouper.singingfish.com |
www.speedfind.de | speedfind ramBot xtreme | BWEB.highway.telekom.at |
www.s.u-tokyo.ac.jp | Kototoi/0.1 | crawler-red3.is.s.u-tokyo.ac.jp |
www.searchbyusa.com | SearchByUsa | ??? |
www.searchspider.com | Searchspider/ | 24.90.243.203 |
www.sightquest.com | SightQuestBot/ http://www.sightquest.com/bot.htm | 64.49.245.212 |
www.spidermonkey.ca | Spider_Monkey/ | 66.163.18.197 |
www.surfnomore.com | Surfnomore Spider v1.1 | 165.90.194.245 |
www.supersnooper.com | Robot@SuperSnooper.Com | 207.8.212.162 |
www.teoma.com | teoma_agent1 teoma_admin@hawkholdings.com | 63.236.92.148 |
http://mapper.teradex.com | Teradex_Mapper mapper@teradex.com | 65.110.6.26 |
www.travel-finder.com | ESISmartSpider | 202.46.33.15 |
www.traficdublu.ro | Spider TraficDublu | 81.196.*.*, 193.16.218.66 |
www.tutorgig.com | Tutorial Crawler http://www.tutorgig.com/crawler | 216.40.225.75 |
www.updated.com | updated/0.1beta crawler@updated.com | 38.119.96.107 |
www.uksearcher.co.uk | UK Searcher Spider | - |
www.vivante.com (coming soon) | Vivante Link Checker | 216.93.167.106 |
www.walhello.com | appie | uses an address at planet.nl, a Dutch ISP |
www.websmostlinked.com | Nazilla | - |
www.webwombat.com.au | www.WebWombat.com.au | 202.139.99.131 |
www.webseek.de | marvin/infoseek marvin-team@webseek.de | arthur4.sda.t-online.de |
www.webtop.com | MuscatFerret | ferret<nn>.webtop.com |
www.whizbanglabs.com | WhizBang! Lab | 216.250.143.108 |
www.wisenut.com | ZyBorg (info@WISEnut.com) | - |
www.wire.co.uk | WIRE WebRefiner: webrefiner@wire.co.uk | brighton.wire.co.uk |
www.worldsearchcenter.com | WSCbot | ??? |
www.yandex.com | Yandex | ya.yandex.ru |
www.yellowpet.com pet-based search engine | Yellopet-Spider | 212-82-36-23.ip.zeitraum.com |
www.yelo.no | Findexa Crawler | ??? |
www.yourbettersearch.com | YBSbot search engine indexer | 12.25.90.3 |
<client sites> | libwww-perl | www.linpro.no/lwp/ |
http://verno.ueda.info.waseda.ac.jp/ | Iron33 | 207.18.183.251 |
Browsers
Most browsers identify themselves with a string that begins "Mozilla...". I've chosen not to document those (as yet). Here are a few of the rarer browser identifiers that I've seen.
Browser identifier | Information |
---|---|
AmigaVoyager | http://v3.vapor.com/ Voyager browser for the Amiga |
xChaos_Arachne | http://browser.arachne.cz/ (DOS-compatible browser. Linux version under development) |
IBrowse | www.hisoft.co.uk (search for IBrowse) Amiga-based browser |
ICab | www.icab.de/index.html (Macintosh-only) |
JustView | http://www3.justsystem.co.jp/download/justview/3.01win1a.html (I think this is a browser. Site is in Japanese) |
KMeleon | http://kmeleon.sourceforge.net/ (Light browser based on the Mozilla code base) |
Konqueror | www.konqueror.org/konq-browser.html (Linux KDE browser) |
Lynx | http://lynx.browser.org/ (Cross-platform text-based browser) |
OmniWeb | www.omnigroup.com/products/omniweb/ (Macintosh-only) |
Opera | www.opera.com (Cross-platform, small, efficient and standards-led browser) |
Plucker | www.plkr.org/index.pl/faq#1.1 (Palm handhelds. Written in Python) |
pwWebSpeak | www.prodworks.com/issound/catalog/catalog_pwwebspeak.html Audio Browser |
QWeb | http://sunsite.auc.dk/qweb/ (Linux browser) (see also http://browswerwatch.internet.com/news/story/qweb8.html) |
retawq | http://retawq.sourceforge.net/ Text-based browser for text terminals. Runs under Linux |
SlimBrowser | www.flashpeak.com/sbrowser/sbrowser.htm Freeware tabbed browser |
Sleipnir | http://sleipnir.pos.to/software/sleipnir/index.html (Japanese) Japanese browser, apparently with an English version available. |
VMS_Mosaic | http://vaxa.wvnet.edu/vmswww/vms_mosaic.html (OpenVMS only version of Mosaic, a pre-Netscape browser) |
WannaBe | http://mindstory.com/wb2/ (Macintosh text-only browser) |
w3m | http://w3m.sourceforge.net/ (text-based browser) |
Link Checkers, Link monitors and bookmark managers
Link checkers and bookmark managers are run by people wanting to keep their pages and bookmarks up to date. Being visited by a link checker is good news as it means that someone has linked to you, and cares that you're still alive. Link monitors regularly check your pages for changes, usually because someone has selected your page as "one to watch".
(pause for warm glow :-)
If you have access to the server log, check the referrer page to try and get the URL from which you are linked. Sometimes these URLs are inside password protected parts of sites, so you won't be able to view the page.
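A small Python sketch of that check follows. It assumes the log has already been reduced to `(agent, referrer)` pairs and that an empty referrer is recorded as `-`, as it is in the common Apache log formats; the `referrers_by_agent` helper and the sample data are invented for illustration.

```python
from collections import defaultdict

def referrers_by_agent(hits):
    """hits: iterable of (agent, referrer) pairs from a server log.

    Returns a dict mapping each user agent to a sorted list of the
    distinct referrer URLs it supplied.
    """
    found = defaultdict(set)
    for agent, referrer in hits:
        # "-" (or an empty string) means the agent sent no referrer.
        if referrer and referrer != "-":
            found[agent].add(referrer)
    return {agent: sorted(urls) for agent, urls in found.items()}

# Hypothetical hits, for illustration only.
hits = [
    ("ExampleLinkChecker/1.0", "http://www.example.org/links.html"),
    ("ExampleLinkChecker/1.0", "http://www.example.org/links.html"),
    ("ExampleLinkChecker/1.0", "http://www.example.net/resources.html"),
    ("QuietChecker/0.1", "-"),
]
print(referrers_by_agent(hits))
```

Agents that never supply a referrer do not appear in the result at all, which is itself a useful hint about how they behave.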
If you build up a list of sites that link to you, these are the guys you should tell when you move (moral: never move).
It's also quite common for a link checker to give no indication of which URL it's coming from. Some link checkers always come from the same IP address; more usually they come from the client's site. It depends on whether the site owner has purchased a copy of the link-checking software or signed up to a centralized link-checking service. If the referrer URL field is blank but you have the client's IP address, you can always try visiting that address and surfing their site.
Some of these tools appear to imply they're extracting email addresses (e.g. emailSiphon). As such they're probably unwelcome visitors since these addresses are probably being collected for spammers.
A page listing various link checkers (and other tools) can be found at www.softwareqatest.com/qatweb1.html#LINK
Robot identifier | IP address(es) | Link Checker home page |
---|---|---|
ActiveBookmark | <client site> | http://libmaster.com/software.php |
ALink | <client site> | http://www.info-pack.com/alink/ Reciprocal Link Checker, Manager and Page Generator. |
AMeta | <client site> | http://www.info-pack.com/ameta/ Meta Tag Generator |
ASPSearch URL Checker | <client site> | http://search.santry.com/downloads/ a site search engine/index maintenance tool |
BlogBot | <client site> | http://sourceforge.net/projects/blogbot/ |
BMChecker | <client site> | www.fureai.or.jp/~yoichi37/soft/bmchecker.html (Japanese Bookmark Checker) |
Bookmark Buddy | <client site> | www.bookmarkbuddy.net/about.shtml |
Check&Get | <client site> | www.checkget.com |
CheckWeb | <client site> | www.checkweb.com |
CNET_Snoop | www.download.com (only if you have software listed at that site) | |
CSE HTML Validator | <client site> | www.htmlvalidator.com HTML page validator that includes a link checker amongst its functions. |
DRKSpider | <client site> | www.drk.com.ar/spider/ (An Open Source project) |
DISCo Watchman | <client site> | www.t-guild.com/gamesite/Software/Disco_w/Disco_w.htm |
DoctorHTML | draco.imagiware.com | http://www2.imagiware.com/RxHTML/ |
Email Extractor | <client site> | <email collector> We don't list links to email collectors on this site |
EmailSiphon | <client site> | <email collector> We don't list links to email collectors on this site |
EmailWolf | <client site> | www.pixeltech.com.au/~msw/ewolf/index.html |
FavOrg | <client site> | http://www.pcmag.com/article2/0,1759,1558477,00.asp A utility written by PC Magazine to fetch icon files (favicon.ico) for your IE favorites |
Favorites Sweeper | <client site> | www.manitoolssoftware.cjb.net Another "favorites" tidy-up utility |
FreshLinks.exe | <client site> | www.resqpc.com/features.html |
Funnel Web Profiler | <client site> | www.quest.com/funnel_web/profiler/ Profiles your site, including links to/from it |
Html Link Validator | <client site> | www.lithopssoft.com/hlv/index.html |
HTMLParser | <client site> | http://htmlparser.sourceforge.net/ an open source HTML parser that is probably exercising its link-checking features. |
The Informant The Intraformant | cosmo.dartmouth.edu | http://informant.dartmouth.edu/ |
InternetLinkAgent | <client site> | http://www1.odn.ne.jp/freeware/rank/ineternet/internetlinkagent.html (in Japanese) |
InternetPeriscope | <client site> | www.lokboxsoftware.com/internetperiscope.asp |
javElink | salix.ingetech.com | www.dailydiffs.com |
jdwhatsnew.cgi | <client site> | www.jdrowell.com/projects/jdwhatsnew/view |
JRTS Check Favorites Utility | <client site> | www.jrtwine.com/Products/CheckFavs/ |
Lambda LinkCheck | 195.139.70.25 | www.stud.ifi.uio.no/~lmariusg/download/python/LinkCheck.html |
LinkLint-checkonly | -- | www.goldwarp.com/bowlin/linklint/ |
LinkAlarm | linkalarm.com | www.linkalarm.com |
Linkbot | <client site> | www.tetranetsoftware.com/products/linkbot.htm |
Linkman (Mozilla...) | 66.89.128.242 | http://www.outertech.com/product.php?product=5 |
LinkProver | <client site> | www.tafweb.com/linkprover.html |
Links | -- | http://gossamer-threads.com/scripts/links/ (Link management cgi script) |
LinkScan Server | <client site> | www.elsop.com |
LinkSweeper | <client site> | www.lss.com.au/lss/windows/ls/linksweeper.htm |
Link Valet Online | 195.82.114.5 | www.htmlhelp.com/tools/valet/ |
LinkVerify Spider | frances.yourwebhost.com | www.enduser.co.uk/linkverify/ |
LinkWalker | lw.seventwentyfour.com 209.167.50.23 | www.seventwentyfour.com |
Morning Paper | <client site> | www.boutell.com/morning/ |
MoveAnnouncer | -- | www.moveannouncer.com (notifies webmasters when your pages have moved) |
mylinkcheck | -- | www.mylinkcheck.de (German) |
NetLookout | -- | www.frugalsoft.com |
NetMechanic www.elsop.com | gamma.netmechanic2.com | www.netmechanic.com |
NetMind-Minder | marvin.netmind.com (retired) gary.netmind.com meg.netmind.com inyanga.netmind.com leo.netmind.com gemini.netmind.com | www.netmind.com |
NetMonitor | -- | www.modemwizard.com/netmonitor.html |
Netprospector JavaCrawler | <client site> | www.actaddons.com/products/netprospector.asp |
online link validator | 216.93.171.138 | www.dead-links.com (online link checker - submit your URL) |
Rational SiteCheck | <client site> | www.rational.com/products/teamtest/prodinfo/sitecheck.jtmpl |
Robozilla | h-206-<n>-<n>-<n>.netscape.com | http://dmoz.org/ (checks links in the dmoz directory) |
RPT-HTTPClient | <client site> | www.purplefrog.com/~thoth/jchecklinks/ Java utility that uses the Java HTTPClient class library |
SiteBar | <client site> | www.sitebar.org |
SpurlBot | ??? | www.spurl.net Online bookmark agent |
SurfMaster | <client site> | www.maskbit.com/surfmaster.htm |
SyncIT | <client site> | www.bookmarksync.com |
Watchfire WebXM | <client site> | www.watchfire.com/products/webxm.asp |
WatzNew Agent | <client site> | www.watznew.com |
WebSite-Watcher | <client site> | www.aignes.com |
WebTrends Link Analyzer | <client site> | www.webtrends.com |
Weblink Scanner | <client site> | www.iterix.com/products/WeblinkScanner/weblinkScanner.asp |
Xenu's Link Sleuth | <client site> | www.snafu.de/~tilman/xenulink.html |
Z-Add Link Checker | <client site?> | http://w3.z-add.co.uk/linkcheck/ |
Validators
Validators check your web pages for HTML correctness and standards compliance. Since other people are unlikely to send a validator to your site, you don't usually see much of this.
However, if you choose to validate your own site, the validation attempts will appear in your logs. The following list is thus limited to the on-line validators I use (and recommend) and a URL submission service that I use.
Robot Identifier | IP address | Validator home page |
---|---|---|
W3C_Validator | abyss.w3.org | http://validator.w3.org/ |
WDG_Validator/ | 64.29.16.182 | www.htmlhelp.com/tools/validator/ |
Tooter | selfpromotion.com | www.selfpromotion.com. This is used as part of a link submission agent (trebor@animeigo.com) |
FTP clients and download managers
If you offer files for download, then you'll start to be visited by various FTP clients. Clients like Go!Zilla and GetRight are smart in that they can resume downloads that have been interrupted. This relies on your web server supporting the necessary protocol, but that's fairly standard these days.
If your download files are over 1 MB in size (or if your server is slow), you'll often see the same IP address make multiple partial downloads of your file (look at the file size). In the case of clients like Go!Zilla and GetRight, if these add up to the right number of bytes, then chances are the download succeeded.
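The byte-counting check can be sketched in a few lines of Python. This assumes the log has been reduced to `(ip, path, bytes_sent)` tuples and that you know the true size of each download file; the `completed_downloads` helper, the file name and the sizes below are hypothetical. A matching total is only a good sign, not proof, since a client may re-request bytes it already has.

```python
from collections import defaultdict

def completed_downloads(hits, file_sizes):
    """hits: iterable of (ip, path, bytes_sent) tuples from a server log.
    file_sizes: dict mapping a download path to its full size in bytes.

    Returns a dict mapping (ip, path) to True when the bytes sent to that
    IP for that file add up to exactly the file's size.
    """
    totals = defaultdict(int)
    for ip, path, sent in hits:
        if path in file_sizes:
            totals[(ip, path)] += sent
    return {key: total == file_sizes[key[1]] for key, total in totals.items()}

# Hypothetical file and hits, for illustration only.
file_sizes = {"/download/setup.exe": 1_500_000}
hits = [
    ("192.0.2.7", "/download/setup.exe", 600_000),   # first partial download
    ("192.0.2.7", "/download/setup.exe", 900_000),   # resumed remainder
    ("192.0.2.8", "/download/setup.exe", 400_000),   # gave up part way
    ("192.0.2.8", "/index.html", 5_120),             # not a download file
]
print(completed_downloads(hits, file_sizes))
```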
Research projects
These agents come from research projects. Of course that's how Google started...
citenikbot/ | http://www.citenik.co.uk/bot.html. One-man project due for release in 2004. |
CLIPS-index | http://clips-index.imag.fr/ (French) French research robot from a linguistics project (?) |
Computer_and_Automation_Research_Institute_Crawler | Robot from the research centre at the Hungarian Academy of Sciences at www.sztaki.hu. Crawls from IP 195.111.1.93 |
cosmos robot@xyleme.com |
Spider from www.xyleme.com which is a project to locate and index XML content on the web. The company i