Internet bots are software applications that are used on the Internet
for both legitimate and malicious purposes. Because of the increasing
number of applications becoming available online, there are many
different types of Internet bots that assist with running applications
such as instant messenger and online gaming applications as well as
analysis and gathering of data files.
Bots and botnets are commonly associated with cybercriminals stealing
data, identities, credit card numbers and worse. But bots can also
serve good purposes. Separating good bots from bad can make a big
difference in how you protect your company’s website and ensure that
your site gets the Internet traffic it deserves.
Most good bots are essentially crawlers sent out from the world’s
biggest web sites to index content for their search engines and social
media platforms. You WANT those bots to visit you. They bring you more
business! Shutting them down as part of a strategy to block bad bots is a
Googlebot – Googlebot is Google’s web crawling
bot (sometimes also called a “spider”). Googlebot uses an algorithmic
process: computer programs determine which sites to crawl, how often,
and how many pages to fetch from each site. Googlebot’s crawl process
begins with a list of webpage URLs, generated from previous crawl
processes and augmented with Sitemap data provided by webmasters. As
Googlebot visits each of these websites it detects links (SRC and HREF)
on each page and adds them to its list of pages to crawl. New sites,
changes to existing sites, and dead links are noted and used to update
the Google index.
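The link-discovery step described above can be sketched in a few lines of Python. This is only an illustration of the general idea (collect HREF and SRC values from a page and append unseen URLs to the crawl list), not Googlebot's actual implementation. The names `LinkCollector` and `extend_frontier`, the seed URL and the sample HTML are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the values of HREF and SRC attributes found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extend_frontier(frontier, seen, page_url, html):
    """Add links from one fetched page to the crawl list, skipping known URLs."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    for link in collector.links:
        if link not in seen:
            seen.add(link)
            frontier.append(link)


# Hypothetical seed list and page content, for illustration only.
frontier = deque(["https://example.com/"])
seen = set(frontier)
sample_html = '<a href="/about">About</a><img src="logo.png"><a href="/about">Again</a>'
extend_frontier(frontier, seen, frontier[0], sample_html)
```

A real crawler would also fetch each page over the network, honor robots.txt and decide how often to revisit each site. Those parts are left out here to keep the sketch focused on link detection.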
Baiduspider – Baiduspider is the crawler of the Chinese search engine
Baidu. Baidu (Chinese: 百度; pinyin: Bǎidù) is the leading
Chinese search engine for websites, audio files, and images.
MSN Bot/Bingbot – MSN Bot was retired in October 2010 and
rebranded as Bingbot. It is a web-crawling robot (a type of Internet
bot) deployed by Microsoft to supply the Bing search engine. It collects
documents from the web to build a searchable index for Bing.
Yandex Bot – Yandex Bot is the crawler for Yandex’s search
engine. Yandex is a Russian Internet company which operates
the largest search engine in Russia with about 60% market share in that
country. Yandex ranked as the fifth largest search engine worldwide with
more than 150 million searches per day as of April 2012 and more than
25.5 million visitors.
Soso Spider – Soso.com is a Chinese search engine
owned by Tencent Holdings Limited, which is well known for its other
creation, QQ. As of 13 May 2012, Soso.com is ranked as the 36th most
visited website in the world and the 13th most visited website in China,
according to Alexa Internet. On average, Soso.com gets 21,064,490
page views every day.
Exabot – Exabot is the crawler for Exalead out of
France. Founded in 2000 by search engine pioneers and now part of
Dassault Systèmes, Exalead provides search and unified information
access software.
Sogou Spider – Sogou.com is a Chinese search
engine. It was launched August 4, 2004. As of April 2010, it has a rank
of 121 in Alexa’s Internet rankings. Sogou provides an index of up to 10
billion web pages.
Google Plus Share – Google Plus lets you share
recommendations with friends, contacts and the rest of the web – on
Google search. The +1 button helps initialize Google’s instant share
capabilities, and it also provides a way to give something your public
stamp of approval.
Facebook External Hit – Facebook allows its users
to send links to interesting web content to other Facebook users. Part
of how this works on the Facebook system involves the temporary display
of certain images or details related to the web content, such as the
title of the webpage or the embed tag of a video. The Facebook system
retrieves this information only after a user provides a link.
Google Feedfetcher – Used by Google to grab RSS
or Atom feeds when users choose to add them to their Google homepage or
Google Reader. Feedfetcher collects and periodically refreshes these
user-initiated feeds, but does not index them in Blog Search or Google’s
other search services (feeds appear in the search results only if
they’ve been crawled by Googlebot).
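One simple way to start separating the good bots listed above from other traffic is to look for their names in the User-Agent header of each request. The sketch below is a minimal first pass under stated assumptions. The substrings in `KNOWN_GOOD_BOT_TOKENS` are assumed to be the tokens these crawlers commonly send, and `identify_good_bot` is a hypothetical helper name. Because a User-Agent header is easy to fake, a match should be treated as a hint rather than proof.

```python
# Substrings assumed to appear in each crawler's User-Agent header.
# Check them against each vendor's current documentation before relying on them.
KNOWN_GOOD_BOT_TOKENS = {
    "googlebot": "Googlebot",
    "baiduspider": "Baiduspider",
    "bingbot": "Bingbot",
    "msnbot": "MSN Bot",
    "yandexbot": "Yandex Bot",
    "sosospider": "Soso Spider",
    "exabot": "Exabot",
    "sogou": "Sogou Spider",
    "facebookexternalhit": "Facebook External Hit",
    "feedfetcher-google": "Google Feedfetcher",
}


def identify_good_bot(user_agent):
    """Return the name of a known good bot, or None if nothing matches."""
    ua = (user_agent or "").lower()
    for token, name in KNOWN_GOOD_BOT_TOKENS.items():
        if token in ua:
            return name
    return None


# Example User-Agent strings, for illustration only.
crawler_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0"
crawler_name = identify_good_bot(crawler_ua)   # matches the "googlebot" token
browser_name = identify_good_bot(browser_ua)   # no token matches
```

Requests that match can be let through, while everything else goes through whatever bad-bot checks the site already has in place. Since bad bots can copy these names, a stricter setup would also confirm that the request comes from an address the search engine actually operates.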