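According to a recent survey from the University of Iowa, as of January 2005 the indexable Web had reached at least 11.5 billion pages. Google ranks first, covering about 8.8 billion pages; Yahoo is second with 8.0 billion pages; MSN Search covers 7.1 billion pages and Ask Jeeves 6.0 billion pages. This is broadly consistent with the index sizes announced by the major search engines. The analysis gives the following coverage for each engine: Google = 76.16%, Msn Beta = 61.90%, Ask/Teoma = 57.62%, Yahoo! = 69.32%.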
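As a quick sanity check, the coverage percentages can be converted back into page counts. The sketch below assumes each percentage is taken against the 11.5 billion lower bound quoted above; the names `TOTAL_INDEXABLE_PAGES` and `pages_covered` are illustrative and do not come from the survey.

```python
# Assumption: coverage percentages are relative to the 11.5 billion
# lower bound for the indexable Web quoted in the post.
TOTAL_INDEXABLE_PAGES = 11.5e9

# Coverage figures as quoted in the post.
coverage_percent = {
    "Google": 76.16,
    "Yahoo!": 69.32,
    "Msn Beta": 61.90,
    "Ask/Teoma": 57.62,
}


def pages_covered(percent, total=TOTAL_INDEXABLE_PAGES):
    """Convert a coverage percentage into an approximate page count."""
    return total * percent / 100.0


estimates = {name: pages_covered(p) for name, p in coverage_percent.items()}

# Print engines from largest to smallest estimated coverage.
for name, pages in sorted(estimates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} ~{pages / 1e9:.2f} billion pages")
```

Under this assumption the Google, Yahoo! and MSN percentages come out close to the page counts quoted above, while the Ask/Teoma percentage works out to roughly 6.6 billion pages.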
Estimating the size of the whole Web is quite difficult, due to its dynamic nature (according to Andrei Broder, the size of the whole Web depends strongly on whether his laptop is on the web, since it can be configured to produce links to an infinite number of URLs!). Nevertheless, it is possible to assess the size of the publicly indexable Web. The indexable Web [4] is defined as "the part of the Web which is considered for indexing by the major engines". For earlier work, see K. Bharat and A. Broder (1997), "A technique for measuring the relative size and overlap of public web search engines" [WWW1998] (via here).
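In sharp contrast to the Dogpile survey, they also found that the overlap among the URLs indexed by the major engines is about 28.85%, or roughly 2.7 billion pages. This may be related to the breadth of their sampling. Note, too, that this survey only covers the indexable web; if the invisible web were counted as well, just imagine how rich the information on the Internet would be! (: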
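The two overlap figures can be cross-checked with a little arithmetic. The post does not say which total the 28.85% is measured against, so the sketch below simply derives the base that would make both numbers agree and compares it with the 11.5 billion estimate; the variable names are illustrative only.

```python
# Figures quoted in the post.
overlap_fraction = 0.2885      # 28.85% overlap among the major engines
overlap_pages = 2.7e9          # "roughly 2.7 billion pages"
total_indexable = 11.5e9       # lower bound for the indexable Web

# Base that would make 28.85% equal 2.7 billion pages.
implied_base = overlap_pages / overlap_fraction

# What 28.85% would be if it were measured against the 11.5 billion total.
overlap_if_total = overlap_fraction * total_indexable

print(f"implied base:          {implied_base / 1e9:.2f} billion pages")
print(f"28.85% of 11.5 billion: {overlap_if_total / 1e9:.2f} billion pages")
```

The implied base is smaller than 11.5 billion, which suggests the percentage is measured against a narrower set of pages than the full estimate, although that reading is an inference and not something the post states.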