Wednesday, August 25, 2010

[Programming] Search engine robots and others


Search engine robots that visit your web site


Contents of this page


Search engine robots and others
Browsers
Link Checkers, Link monitors and bookmark managers
Validators
FTP clients and download managers
Research projects
Software packages
Offline browsers and other agents
Other miscellaneous agents
Sites that regularly visit
Other useful sites
...And finally, some fakers
Awards for this page

Search engines and other sites send robots to read and index your pages. This page reverses that process and indexes the robots. The information here has been gleaned from the server logs for www.jafsoft.com. You can read a detailed description of how we hunt spiders.


Whenever a page is read from a web site, the log file records a number of details including the time, the IP address and usually the referrer page and the user agent. You can see this in our analysis of a server log sample.
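As a rough illustration of pulling those details out of a log, here is a minimal Python sketch. It assumes the server writes the Apache "combined" log format, which the text above does not actually state, and the sample line, host and agent name are invented for the example.

```python
import re

# Assumed layout: the Apache "combined" log format. Other servers differ.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# Invented sample line (documentation IP address and example.com names).
sample = ('192.0.2.10 - - [30/Jan/2006:10:15:32 +0000] '
          '"GET /index.html HTTP/1.0" 200 5120 '
          '"http://www.example.com/links.html" "ExampleBot/1.0 (bot@example.com)"')

entry = parse_line(sample)
print(entry["time"], entry["host"], entry["referrer"], entry["agent"])
```

The four printed fields are the ones named above: the time, the IP address, the referrer page and the user agent.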


Unlike many pages that list web robots, this page actually tries to visit the robots themselves. Where possible, links are provided to the robots' home pages, and descriptions are given of what they're up to. This page is updated regularly as more information is found (the last update was on 30-Jan-2006).


Well-behaved robots will identify themselves, often supplying web or email addresses you can contact. In any case, the pattern of pages being read and the IP addresses being used soon sort the men from the robots.


Good robots will read robots.txt to see what your site policy is, but there are other ways of spotting robots. In addition to the search engine robots, other "user agents" will visit your site, e.g. to validate links to your site from other people's pages. Often these will just issue a HEAD request, which returns only the headers, rather than doing a GET on the whole file.
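The two signals just mentioned (fetching robots.txt, and using HEAD rather than GET) are easy to test for once the log has been parsed. The following Python sketch is only a heuristic under assumed inputs: the function name, the tuple layout and the sample hosts are all invented for illustration.

```python
from collections import defaultdict

def likely_robots(requests):
    """Map each host to the reasons it looks automated.

    `requests` is an iterable of (host, method, path) tuples. The two
    heuristics are fetching /robots.txt and using HEAD instead of GET.
    """
    reasons = defaultdict(set)
    for host, method, path in requests:
        if path == "/robots.txt":
            reasons[host].add("read robots.txt")
        if method.upper() == "HEAD":
            reasons[host].add("HEAD request")
    return dict(reasons)

# Invented sample data; hosts use the reserved .example domain.
log = [
    ("crawler.example", "GET", "/robots.txt"),
    ("crawler.example", "GET", "/index.html"),
    ("checker.example", "HEAD", "/download.html"),
    ("visitor.example", "GET", "/index.html"),
]

flagged = likely_robots(log)
for host in sorted(flagged):
    print(host, sorted(flagged[host]))
```

A host that never appears in the result is not necessarily human; a badly behaved robot can skip robots.txt and use GET throughout.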


You can also visit our page describing the engines in some detail.


This page is regularly converted from a text file by the author's own text-to-HTML converter, AscToHTM, which is available as shareware (cost $30). The last update was on 30-Jan-2006.



Search engine robots and others


The following table lists the search engines that spider the web, the IP addresses that they use, and the robot names they send out to visit your site. Version numbers are usually included in the robot names, but are omitted here except where they imply a visit from a different IP address or (as with Inktomi) a different search engine.


Often multiple IP addresses are used, in which case we just give a flavour of the names or numbers. Inktomi is a company that offers search engine technology and is used by a number of sites (e.g. www.snap.com and www.hotbot.com).


Wherever <nn> appears, it stands for a number whose digits vary from one address to the next.
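If you want to match host names from your own logs against entries written this way, the placeholder can be turned into a regular expression. This Python sketch assumes that <n> and <nn> each stand for one or more decimal digits; the function name is made up for the example.

```python
import re

def placeholder_to_regex(pattern):
    """Compile a table entry such as 'tv<nn>.sv.av.com' into a regex.

    Assumption: <n> and <nn> each stand for one or more decimal digits.
    Everything else in the entry is matched literally.
    """
    parts = re.split(r"(<n+>)", pattern)
    body = "".join(
        r"\d+" if re.fullmatch(r"<n+>", part) else re.escape(part)
        for part in parts
    )
    return re.compile(body, re.IGNORECASE)

matcher = placeholder_to_regex("tv<nn>.sv.av.com")
print(bool(matcher.fullmatch("tv12.sv.av.com")))
print(bool(matcher.fullmatch("tvx.sv.av.com")))
```

Using fullmatch rather than search keeps a longer host name that merely contains the pattern from being counted as a hit.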


Home page/search engine Robot identifier IP address(es)
www.abacho.com AbachoBOT srv-ze-robot1.tricus.com
www.abcdatos.com abcdatos_botlink
http://www.abcdatos.com/botlink/
217.126.39.167
www.aesop.com AESOP_com_SpiderMan 209.189.115.49
www.ah-ha.com ah-ha.com crawler (crawler@ah-ha.com) c7pub-216-250-141-186.center7.com
www.alexa.com ia_archiver green.alexa.com
sarah.alexa.com
www.altavista.com


Scooter
Mercator
Scooter2_Mercator_3-1.0
roach.smo.av.com-1.0
Tv<nn>_Merc_resh_26_1_D-1.0
test-scooter.pa.alta-vista.net
brillo.pa.alta-vista.net
av-dev4.pa.alta-vista.net
scooter.aveurope.co.uk
bigip1-snat.sv.av.com
mercator.pa-x.dec.com
scooter.pa.alta-vista.net
election2000crawl-complaints-to-admin.webresearch.pa-x.dec.com
scooter.sv.av.com
avfwclient.sv.av.com
tv<nn>.sv.av.com
www.altavista.co.uk AltaVista-Intranet
jan.gelin@av.com
host-119.altavista.se
www.alltheweb.com FAST-WebCrawler
crawler@fast.no
209.67.247.154
  www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html
  Wget ext-gw.trd.fast.no
www.acoon.de Acoon Robot 194.231.42.178
www.antisearch.net antibot 62.210.155.50
www.atomz.com Atomz router-sc.atomz.com
index.atomz.com
www.axmo.com AxmoRobot 194.248.208.82
www.buscaplus.com Buscaplus Robi
http://www.buscaplus.com/robi/
 
www.canseek.ca CanSeek/
support@canseek.ca
216.168.111.111
www.christcrawler.com/search.cfm ChristCRAWLER
http://www.christcrawler.com/
207.191.111.231
www.clush.com Clushbot
http://www.clush.com/bot.html
209.249.80.242
www.crawler.de Crawler
admin@crawler.de
crawlit.crawler.de
www.daadle.com DaAdLe.com ROBOT/ 216.12.213.32
www.daum.net
RaBot
Agent-admin/ phortse@hanmail.net
contact/jylee@kies.co.kr
210.183.28.46
211.50.57.6
  RaBot
Agent-admin/ webmaster@kisco.go.kr
202.30.94.34
www.en.deepindex.com DeepIndex deepindex.net1.nerim.net
www.ditto.com DittoSpyder 65.169.94.188
domanova.co.uk Jack  
www.earthcom.info EARTHCOM.info 194.108.39.74
www.entireweb.com Speedy Spider 62.13.25.209
www.excite.com ArchitextSpider Musical instruments are used
in the name, such as viola.excite.com
cello.excite.com
piano.excite.com
kazoo.excite.com
ride.excite.com
sabian.excite.com
sax.excite.com
bugle.excite.com
snare.excite.com
ziljian.excite.com
bongos.excite.com
maturana.excite.com
mandolin.excite.com
piccolo.excite.com
kettle.excite.com
ichiban.excite.com
(and the rest of the band)
more recently first names are being
used, like philip.excite.com
peter.excite.com
perdita.excite.com
macduff.excite.com
agouti.excite.com
(excite) ArchitectSpider crimpshrine.atext.com
ichiban.atext.com
www.eurip.com EuripBot 81.169.172.30
www.euroseek.net Arachnoidea
arachnoidea@euroseek.net
212.209.54.134
www.ezresults.com EZResult 216.28.23.59
www.fastsearch.net
Fast PartnerSite Crawler
FAST Data Search Crawler
FAST Data Search Document Retriever
psprdcrw001.sac2.fastsearch.net
65.198.110.185
69.38.159.128
www.fireball.de KIT-Fireball ????
http://france.misesajour.com/ france.misesajour.com 66.98.210.71
www.fybersearch.com FyberSearch 69.49.241.9
www.galaxy.com GalaxyBot
http://www.galaxy.com/galaxybot.html
63.121.41.175
www.geckobot.com geckobot ???.rdc1.az.coxatwork.com
www.gendoor.com
(Genealogical Search Engine)
GenCrawler ????
www.geona.com GeonaBot 69.59.142.17
www.getrax.com getRAX 81.169.156.246
www.google.com Googlebot
googlebot@googlebot.com
http://googlebot.com/
c<nn>.googlebot.com
www.goo.ne.jp moget/2.0
moget@goo.ne.jp
202.229.31.13
www.girafa.com Aranha Aranha.girafa.com
(inktomi)
Slurp.so/1.0
slurp@inktomi.com
q2004.inktomisearch.com
j5006.inktomisearch.com
(inktomi)
Slurp/2.0j
slurp@inktomi.com
www.inktomisearch.com
202.212.5.34
goo313.goo.ne.jp
(inktomi) Slurp/2.0-KiteHourly
slurp@inktomi.com;
www.inktomi.com/slurp.html
y400.inktomi.com
(inktomi) Slurp/2.0-OwlWeekly
spider@aeneid.com
www.inktomi.com/slurp.html
209.185.143.198
(inktomi) Slurp/3.0-AU
slurp@inktomi.com
j6000.inktomi.com
http://hoppa.com/
(need V5 browsers to view)
Toutatis 2.5-2 tisnix.xs4all.nl
www.hubat.com Hubater 209.114.176.250
www.almaden.ibm.com
(research centre)
http://www.almaden.ibm.com/cs/crawler wfp2.almaden.ibm.com
www.iltrovatore.it IlTrovatore-Setaccio 213.26.21.8
www.incywincy.com IncyWincy 64.81.243.66
www.infoseek.com
UltraSeek
InfoSeek Sidewinder
cde2c923.infoseek.com
cde2c91f.infoseek.com
cca26215.infoseek.com
www.intags.de Mole2/1.0
webmaster@intags.de
217.160.75.10
http://mp3bot.de/ MP3Bot <..>
www.ip3000.com C-PBWF-ip3000.com-crawler
ip3000.com-crawler
www.ip3000.com
www.istarthere.com http://www.istarthere.com
spider@istarthere.com
66.220.24.80
www.knowledge.com Knowledge.com/ 213.170.2.69
www.kuloko.com kuloko-bot/0.2 66.90.81.41
www.lexis-nexis.com LNSpiderguy firewall5.lexis-nexis.com
www.linknz.co.nz Linknzbot 202.191.32.67
www.look.com lookbot magma.com
www.looksmart.com MantraAgent fjupiter.looksmart.com
www.loopimprovements.com
(see also www.incywincy.com)
NetResearchServer
www.loopimprovements.com/robot.html
leg-64-133-109-250-STK.sprinthome.com
www.lycos.com Lycos_Spider_(T-Rex) bos-spider<n>.bos.lycos.com
216.35.194.188
www.joocer.com JoocerBot 80.46.38.169
www.mirago.co.uk HenryTheMiragoRobot 194.202.39.46
www.mojeek.com MojeekBot ???
www.mozdex.com mozDex/ (within comcast.net)
http://search.msn.com/ MSNBOT/0.1
(http://search.msn.com/msnbot.htm)
131.107.163.47
www.navadoo.com Navadoo Crawler ???
www.northernlight.com Gulliver marvin.northernlight.com
taz.northernlight.com
www.objectssearch.com ObjectsSearch/0.01 68.88.244.177
http://szukaj.onet.pl/ OnetSzukaj/ ???
www.picosearch.com PicoSearch/ pipe.picosearch.com
www.portaljuice.com PJspider timber.nextopia.com
www.powerinter.net
but it won't let us in :-(
DIIbot node-d8e93393.powerinter.net
http://navi.ocn.ne.jp/
nttdirectory_robot
super-robot@super.navi.ocn.ne.jp
griffon
griffon@super.navi.ocn.ne.jp
lilis00.navi.ocn.ne.jp
lilis04.navi.ocn.ne.jp
www.maxbot.com Spider/maxbot.com
admin@maxbot.com
search.wport.com
??? various (fakes agent on each access) pool0058.cvx2-bradley.dialup.earthlink.net
???
gazz/1.0
gazz@nttrd.com
deleuze.infobee.ne.jp
derrida.infobee.ne.jp
??? ??? search-8.xift.com
www.nationaldirectory.com NationalDirectory-SuperSpider spider.nationaldirectory.com
209.116.58.143
www.naver.com dloader(NaverRobot)/
dumrobo(NaverRobot)/
211.218.151.209
www.noxtrum.com noxtrumbot/ 194.224.199.52
www.openfind.com
(Chinese language)
Openfind piranha,Shark
robot-response@openfind.com.tw
Openbot/
???

abovenet4.openfind.com
www.picsearch.org psbot
www.picsearch.org/bot.html
217.75.104.26
www.pinpoint.com CrawlerBoy Pinpoint.com nitrogen.pinpoint.com
www.petersnews.com user<n>.ip3000.com news<n>.petersnews.com
www.qweery.nl QweeryBot
(http://qweerybot.qweery.com)
84.82.133.41
www.vestris.com/alkaline AlkalineBOT host130.uv-ray.com
www.rambler.ru StackRambler/ 81.222.64.10
www.seznam.cz SeznamBot 212.80.76.87
www.search-10.com Search-10 82.41.144.99
www.searchhippo.com Fluffy the spider
(info@searchhippo.com)
208.148.122.27
www.scrubtheweb.com Scrubby/ 208.145.190.254
www.singingfish.com asterias grouper.singingfish.com
www.speedfind.de speedfind ramBot xtreme BWEB.highway.telekom.at
www.s.u-tokyo.ac.jp Kototoi/0.1 crawler-red3.is.s.u-tokyo.ac.jp
www.searchbyusa.com SearchByUsa ???
www.searchspider.com Searchspider/ 24.90.243.203
www.sightquest.com SightQuestBot/
http://www.sightquest.com/bot.htm
64.49.245.212
www.spidermonkey.ca Spider_Monkey/ 66.163.18.197
www.surfnomore.com Surfnomore Spider v1.1 165.90.194.245
www.supersnooper.com Robot@SuperSnooper.Com 207.8.212.162
www.teoma.com teoma_agent1
teoma_admin@hawkholdings.com
63.236.92.148
http://mapper.teradex.com Teradex_Mapper
mapper@teradex.com
65.110.6.26
www.travel-finder.com ESISmartSpider 202.46.33.15
www.traficdublu.ro Spider TraficDublu 81.196.*.*, 193.16.218.66
www.tutorgig.com Tutorial Crawler
http://www.tutorgig.com/crawler
216.40.225.75
www.updated.com updated/0.1beta
crawler@updated.com
38.119.96.107
www.uksearcher.co.uk UK Searcher Spider -
www.vivante.com
(coming soon)
Vivante Link Checker 216.93.167.106
www.walhello.com appie uses an address at planet.nl, a Dutch ISP
www.websmostlinked.com Nazilla -
www.webwombat.com.au www.WebWombat.com.au 202.139.99.131
www.webseek.de marvin/infoseek
marvin-team@webseek.de
arthur4.sda.t-online.de
www.webtop.com MuscatFerret ferret<nn>.webtop.com
www.whizbanglabs.com WhizBang! Lab 216.250.143.108
www.wisenut.com ZyBorg
(info@WISEnut.com)
-
www.wire.co.uk WIRE WebRefiner:
webrefiner@wire.co.uk
brighton.wire.co.uk
www.worldsearchcenter.com WSCbot ???
www.yandex.com Yandex ya.yandex.ru
www.yellowpet.com
pet-based search engine
Yellopet-Spider 212-82-36-23.ip.zeitraum.com
www.yelo.no Findexa Crawler ???
www.yourbettersearch.com YBSbot search engine indexer 12.25.90.3
<client sites> libwww-perl www.linpro.no/lwp/
http://verno.ueda.info.waseda.ac.jp/  
  Iron33 207.18.183.251



Browsers


Most browsers identify themselves with a string that begins "Mozilla...". I've chosen not to document those (as yet). Here are a few of the rarer browser identifiers that I've seen.


Browser identifier Information
AmigaVoyager
http://v3.vapor.com/
Voyager browser for the Amiga
xChaos_Arachne
http://browser.arachne.cz/
(DOS-compatible browser. Linux version under development)
IBrowse
www.hisoft.co.uk (search for IBrowse)
Amiga-based browser
ICab
www.icab.de/index.html
(Macintosh-only)
JustView
http://www3.justsystem.co.jp/download/justview/3.01win1a.html
(I think this is a browser. Site is in Japanese)
KMeleon
http://kmeleon.sourceforge.net/
(Light browser based on the Mozilla code base)
Konqueror
www.konqueror.org/konq-browser.html
(Linux KDE browser)
Lynx
http://lynx.browser.org/
(Cross-platform text based browser)
OmniWeb
www.omnigroup.com/products/omniweb/
(Macintosh-only)
Opera
www.opera.com
(Cross-platform, small, efficient and standards-led browser)
Plucker
www.plkr.org/index.pl/faq#1.1
(Palm handhelds. Written in Python)
pwWebSpeak
www.prodworks.com/issound/catalog/catalog_pwwebspeak.html
Audio Browser
QWeb
http://sunsite.auc.dk/qweb/ (Linux browser)
(see also http://browswerwatch.internet.com/news/story/qweb8.html)
retawq
http://retawq.sourceforge.net/
Text-based browser for text terminals. Runs under Linux
SlimBrowser
www.flashpeak.com/sbrowser/sbrowser.htm
Freeware tabbed browser
Sleipnir
http://sleipnir.pos.to/software/sleipnir/index.html (Japanese)
Japanese browser, apparently with an English version available.
VMS_Mosaic
http://vaxa.wvnet.edu/vmswww/vms_mosaic.html
(OpenVMS only version of Mosaic, a pre-Netscape browser)
WannaBe
http://mindstory.com/wb2/
(Macintosh text-only browser)
w3m
http://w3m.sourceforge.net/
(text-based browser)



Link Checkers, Link monitors and bookmark managers


Link checkers and bookmark managers are run by people wanting to keep their pages and bookmarks up to date. Being visited by a link checker is good news as it means that someone has linked to you, and cares that you're still alive. Link monitors regularly check your pages for changes, usually because someone has selected your page as "one to watch".





(pause for warm glow :-)


If you have access to the server log, check the referrer page to try and get the URL from which you are linked. Sometimes these URLs are inside password protected parts of sites, so you won't be able to view the page.


If you build up a list of sites that link to you, these are the guys you should tell when you move (moral: never move).
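Building that list from the referrer field can be sketched in a few lines of Python. The function name and the sample URLs below are invented; only www.jafsoft.com comes from this page, and the code assumes the referrers have already been extracted from the log.

```python
from urllib.parse import urlsplit

def external_referrers(referrers, own_host):
    """Return the sorted, de-duplicated referrer URLs that are not on `own_host`.

    Blank referrers (logged as "-" or "") are skipped, since many link
    checkers send none.
    """
    found = set()
    for url in referrers:
        if url in ("", "-"):
            continue
        host = urlsplit(url).hostname
        if host and host.lower() != own_host.lower():
            found.add(url)
    return sorted(found)

# Invented sample values; example.org and example.net are reserved names.
seen = [
    "-",
    "http://www.example.org/links.html",
    "http://www.jafsoft.com/index.html",
    "http://www.example.org/links.html",
    "http://members.example.net/private/bookmarks.html",
]

linking_pages = external_referrers(seen, "www.jafsoft.com")
for url in linking_pages:
    print(url)
```

Some of the URLs collected this way will sit behind a password, as noted above, so the list tells you who links to you even when you cannot view the linking page.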


It's also quite common for the link checker to give no indication of which URL it's coming from. Some link checkers always come from the same IP address; more usually they come from the client's site. It depends on whether the site owner has purchased a copy of the link checking software or signed up to some centralized link checking service. If the referrer URL field is blank but you have the client's IP address, you can always try visiting that address and surfing their site.


Some of these tools have names that imply they're extracting email addresses (e.g. EmailSiphon). As such they're probably unwelcome visitors, since these addresses are probably being collected for spammers.


A page listing various link checkers (and other tools) can be found at www.softwareqatest.com/qatweb1.html#LINK


Robot identifier IP address(es) Link Checker home page
ActiveBookmark <client site> http://libmaster.com/software.php
ALink
<client site>
http://www.info-pack.com/alink/
Reciprocal Link Checker, Manager and Page Generator.
AMeta
<client site>
http://www.info-pack.com/ameta/
Meta Tag Generator
ASPSearch URL Checker
<client site>
http://search.santry.com/downloads/
a site search engine/index maintenance tool
BlogBot <client site> http://sourceforge.net/projects/blogbot/
BMChecker
<client site>
www.fureai.or.jp/~yoichi37/soft/bmchecker.html
(Japanese Bookmark Checker)
Bookmark Buddy <client site> www.bookmarkbuddy.net/about.shtml
Check&Get <client site> www.checkget.com
CheckWeb <client site> www.checkweb.com
CNET_Snoop

www.download.com
(only if you have software listed at that site)
CSE HTML Validator
<client site>
www.htmlvalidator.com
HTML page validator that includes a link checker
amongst its functions.
DRKSpider <client site> www.drk.com.ar/spider/ (An Open Source project)
DISCo Watchman <client site> www.t-guild.com/gamesite/Software/Disco_w/Disco_w.htm
DoctorHTML draco.imagiware.com http://www2.imagiware.com/RxHTML/
Email Extractor
<client site>
<email collector> We don't list links to
email collectors on this site
EmailSiphon
<client site>
<email collector> We don't list links to
email collectors on this site
EmailWolf <client site> www.pixeltech.com.au/~msw/ewolf/index.html
FavOrg
<client site>
http://www.pcmag.com/article2/0,1759,1558477,00.asp
A utility written by PC Magazine to fetch icon files
(favicon.ico) for your IE favorites
Favorites Sweeper
<client site>
www.manitoolssoftware.cjb.net
Another "favorites" tidy-up utility
FreshLinks.exe <client site> www.resqpc.com/features.html
Funnel Web Profiler
<client site>
www.quest.com/funnel_web/profiler/
Profiles your site, including links to/from it
Html Link Validator <client site> www.lithopssoft.com/hlv/index.html
HTMLParser
<client site>
http://htmlparser.sourceforge.net/ an open source
HTML parser that is probably exercising its
link-checking features.
The Informant
The Intraformant
cosmo.dartmouth.edu
http://informant.dartmouth.edu/
InternetLinkAgent
<client site>
http://www1.odn.ne.jp/freeware/rank/ineternet/internetlinkagent.html
(in Japanese)
InternetPeriscope <client site> www.lokboxsoftware.com/internetperiscope.asp
javElink salix.ingetech.com www.dailydiffs.com
jdwhatsnew.cgi <client site> www.jdrowell.com/projects/jdwhatsnew/view
JRTS Check Favorites Utility <client site> www.jrtwine.com/Products/CheckFavs/
Lambda LinkCheck 195.139.70.25 www.stud.ifi.uio.no/~lmariusg/download/python/LinkCheck.html
LinkLint-checkonly -- www.goldwarp.com/bowlin/linklint/
LinkAlarm linkalarm.com www.linkalarm.com
Linkbot <client site> www.tetranetsoftware.com/products/linkbot.htm
Linkman (Mozilla...) 66.89.128.242 http://www.outertech.com/product.php?product=5
LinkProver <client site> www.tafweb.com/linkprover.html
Links
--
http://gossamer-threads.com/scripts/links/
(Link management cgi script)
LinkScan Server <client site> www.elsop.com
LinkSweeper <client site> www.lss.com.au/lss/windows/ls/linksweeper.htm
Link Valet Online 195.82.114.5 www.htmlhelp.com/tools/valet/
LinkVerify Spider frances.yourwebhost.com www.enduser.co.uk/linkverify/
LinkWalker
lw.seventwentyfour.com
209.167.50.23
www.seventwentyfour.com
Morning Paper <client site> www.boutell.com/morning/
MoveAnnouncer
--
www.moveannouncer.com
(notifies webmasters when your pages have moved)
mylinkcheck -- www.mylinkcheck.de (German)
NetLookout -- www.frugalsoft.com
NetMechanic
www.elsop.com
gamma.netmechanic2.com
www.netmechanic.com
NetMind-Minder




marvin.netmind.com (retired)
gary.netmind.com
meg.netmind.com
inyanga.netmind.com
leo.netmind.com
gemini.netmind.com
www.netmind.com




NetMonitor -- www.modemwizard.com/netmonitor.html
Netprospector JavaCrawler <client site> www.actaddons.com/products/netprospector.asp
online link validator
216.93.171.138
www.dead-links.com
(online link checker - submit your URL)
Rational SiteCheck <client site> www.rational.com/products/teamtest/prodinfo/sitecheck.jtmpl
Robozilla
h-206-<n>-<n>-<n>.netscape.com
http://dmoz.org/
(checks links in the dmoz directory)
RPT-HTTPClient
<client site>
www.purplefrog.com/~thoth/jchecklinks/
Java utility that uses the Java HTTPClient class library
SiteBar <client site> www.sitebar.org
SpurlBot ??? www.spurl.net Online bookmark agent
SurfMaster <client site> www.maskbit.com/surfmaster.htm
SyncIT <client site> www.bookmarksync.com
Watchfire WebXM <client site> www.watchfire.com/products/webxm.asp
WatzNew Agent <client site> www.watznew.com
WebSite-Watcher <client site> www.aignes.com
WebTrends Link Analyzer <client site> www.webtrends.com
Weblink Scanner <client site> www.iterix.com/products/WeblinkScanner/weblinkScanner.asp
Xenu's Link Sleuth <client site> www.snafu.de/~tilman/xenulink.html
Z-Add Link Checker <client site?> http://w3.z-add.co.uk/linkcheck/



Validators


Validators check your web pages for HTML correctness and standards compliance. Since other people are unlikely to send a validator to your site, you won't usually see many of these in your logs.


However, if you choose to validate your own site, the validation attempts will appear in your logs. The list below is therefore limited to the on-line validators I've used myself (and recommend) and a URL submission service that I use.


Robot Identifier IP address Validator home page
W3C_Validator abyss.w3.org http://validator.w3.org/
WDG_Validator/ 64.29.16.182 www.htmlhelp.com/tools/validator/
Tooter selfpromotion.com www.selfpromotion.com (used as part of a link submission agent; trebor@animeigo.com)



FTP clients and download managers


If you offer files for download, then you'll start to be visited by various FTP clients and download managers. Clients like Go!Zilla and GetRight are smart in that they can resume downloads that have been interrupted. This relies on your web server supporting partial (byte-range) requests, but that's fairly standard these days.


If your download files are over 1 MB in size (or if your server is slow), you'll often see the same IP address make multiple partial downloads of your file (look at the bytes transferred for each request). In the case of clients like Go!Zilla and GetRight, if these add up to the size of the file, then chances are the download succeeded.
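That adding-up check can be sketched in Python as follows. It is only the heuristic described above, not a reliable test: the function name, the file path, the sizes and the IP addresses are all invented, and the code assumes you already know each file's size on disk.

```python
from collections import defaultdict

def completed_downloads(entries, file_sizes):
    """Return the (host, path) pairs whose transferred bytes add up to the file size.

    `entries` is an iterable of (host, path, bytes_sent) tuples taken from
    the log; `file_sizes` maps each path to its size in bytes.
    A client that re-fetches overlapping ranges would overshoot the total
    and would not be counted.
    """
    totals = defaultdict(int)
    for host, path, bytes_sent in entries:
        totals[(host, path)] += bytes_sent
    return sorted(
        key for key, total in totals.items()
        if total == file_sizes.get(key[1])
    )

# Invented sample: a 3,000,000-byte file fetched in three pieces by one host,
# and abandoned part-way by another.
sizes = {"/files/setup.exe": 3_000_000}
log = [
    ("198.51.100.7", "/files/setup.exe", 1_200_000),
    ("198.51.100.7", "/files/setup.exe", 1_000_000),
    ("198.51.100.7", "/files/setup.exe", 800_000),
    ("203.0.113.9", "/files/setup.exe", 450_000),
]

finished = completed_downloads(log, sizes)
print(finished)
```

Grouping by host and path together matters: the same IP address may be fetching several files at once, and their byte counts should not be mixed.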


Client Identifier FTP Client home page
Alligator www.nearsoftware.com/alligator/maininfo/
BatchFTP www.dynamicnet.net/products/batchftp.htm
ChinaClaw
http://download.pchome.net/internet/download/860.html (Chinese)
(Chinese download utility)
DA
www.lidan.com
www.downloadaccelerator.com
DLExpert www.yanew.com (English and Chinese versions available)
Download Demon www.netzip.com
Download Master www.one.com.ua/dm/ (Russian)
Download Ninja www.h-fd.org/~mkro/mt/archives/000585.html (Japanese)
Download Wonder www.forty.com
Ez Auto Downloader
www.anatari.com/ezad/index.html
Downloads all files of a given type from a site, so it's
more like a site grabber
FreshDownload www.freshdevices.com/freshdown.html
Go!Zilla www.gozilla.com
GetRight
MyGetRight
www.getright.com
GetSmart http://getsmart.hypermart.net/
HiDownload www.hidownload.com
JetCar (or FlashGet) www.amazesoft.com
Kapere www.kapere.com/menu.php?lang=english
Kontiki Client www.kontiki.com/products/index.html
LeechFTP http://stud.fh-heilbronn.de/~jdebis/leechftp/
LeechGet www.leechget.de
LightningDownload www.lightningdownload.com
Mass Downloader www.geocities.com/SiliconValley/Vista/2865/md.htm
MetaProducts Download Express www.metaproducts.com/DE.html
NetZip Downloader
SmartDownload
www.netzip.com
NetAnts www.netants.com
NetButler www.webcelerator.com/netbutler/
NetPumper www.netpumper.com
Net Vampire www.netvampire.com
Nitro Downloader www.klsofttools.com/nitro.html
Octopus http://moskalyuk.com/octopus/
PuxaRapido www.puxarapido.com.br
RealDownload http://service.real.com/help/faq/rdown4/rdownfaqa01.html
SpeedDownload www.yazsoft.com (for Macintosh)
WebDownloader for X 1.30
www.krasu.ru/soft/chuchelo/features.php3
(Linux web downloader with X GUI)
WebLeacher
www.webleacher.dk (down last time I tried it)
more details at www.davecentral.com/projects/thewebleacher/
WebPictures Downloader
www.fullstrong.com
Locates and downloads pictures
X-Uploader
Can't find the home page, but it's described (in Russian)
on www.compulenta.ru/2002/1/17/24333/



Research projects


These agents come from research projects. Of course that's how Google started...


citenikbot/
http://www.citenik.co.uk/bot.html. One-man project due
for release in 2004.
CLIPS-index
http://clips-index.imag.fr/ (French)
French research robot from a linguistics project (?)
Computer_and_Automation_Research_Institute_Crawler

Robot from the research centre at the Hungarian Academy
of Sciences at www.sztaki.hu. Crawls from IP 195.111.1.93
cosmos
robot@xyleme.com




Spider from www.xyleme.com which is a project to locate
and index XML content on the web. The company i
