Google's Indexed Page Count Jumps to 8 Billion

Thursday, November 11, 2004

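Today, according to Google's home page, its indexed page count rose from 4,285,199,774 to 8,058,044,651. Google's official blog announced that the index has doubled, to around 8 billion pages. Coverage is an important measure of a search engine's performance: greater coverage helps you find pages that are relevant but get little attention.
Google has also substantially increased its indexing of document formats, for example PDF, PowerPoint (PPT), Flash (SWF), PostScript, and JavaScript.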
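As a quick check on the "doubled" claim, the two page counts quoted above can be compared directly. This is only a sketch of the arithmetic; the variable names are mine, and the two figures are the ones shown on Google's home page as reported in this post.

```python
# Page counts quoted in the post (as displayed on Google's home page).
old_count = 4_285_199_774
new_count = 8_058_044_651

# Absolute growth and growth factor between the two figures.
added_pages = new_count - old_count
growth_factor = new_count / old_count

print(f"added pages:   {added_pages:,}")
print(f"growth factor: {growth_factor:.2f}x")
```

By these numbers the index grew by a factor of about 1.88, so "doubled" is a rounded description.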
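A few quick tests clearly show the following:
Searches return noticeably more varied results; a typical query now returns roughly twice as many results as before;
Supplemental results also cover more ground;
Dynamic pages are handled much better as well. I searched for “loverty 大学” and, to my surprise, found many BBS posts.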

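PS: Microsoft is rumored to be launching a new search engine in the next couple of days, and Google has promptly greeted it with a show of strength.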
----------
A Detailed Look at Supplemental Results in Google Search

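Based on a study of my old website, sitehome.cn, here is what I found gets placed in supplemental results:
Only web pages are added, not standalone Flash files or images. If your page contains Flash or images, they are treated as part of the page and end up in supplemental results along with it.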

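Supplemental results have another meaning as well: pages from a very long time ago, which we can assume users do not care about, can be moved into the supplemental index. They are used when an exact match is needed to improve precision, rather than being ranked alongside ordinary results, where their age would hurt the search experience.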
http://www.google.com/search?hl=zh-CN&newwindow=1&c2coff=1&q=%22%E7%AC%AC%E5%85%AB%E7%89%88%E5%8C%97%E4%BA%AC%E4%B8%93%E7%89%88%22+%222001%E5%B9%B407%E6%9C%8803%E6%97%A5%22&lr=

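Another point is that Google does not give news a higher weight. For some queries the user is not necessarily looking for news, so Google puts the newest pages into the supplemental index to keep them from surfacing in the rankings too easily.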
http://www.google.com/search?hl=zh-CN&newwindow=1&c2coff=1&q=%E9%83%AD%E6%99%B6%E6%99%B6&btnG=%E6%90%9C%E7%B4%A2&lr=
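Pages larger than 20k may appear in the SERPs as supplemental results.
Pages whose content also appears elsewhere on the web are treated as supplemental results. There are some very similar items (different price, order number, etc.; minor details, otherwise the same or similar content). Google might treat them as duplicate content and demote their value.

Sites with very low weight: PR0 pages are always supplemental results, and on PR1/2 sites the deeper pages show up as supplemental results.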
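The observations above can be collected into a small checklist. This is only a sketch of my own notes, not Google's algorithm: the `Page` fields, the reading of "20k" as 20 * 1024 bytes, and the `deep_level` cutoff for what counts as a "deep" page are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    """Hypothetical description of a page; the fields are illustrative only."""
    size_bytes: int           # size of the HTML document
    pagerank: int             # toolbar PageRank, 0-10
    depth: int                # clicks away from the site's home page
    has_near_duplicate: bool  # same or very similar content exists elsewhere


def supplemental_reasons(page: Page,
                         size_limit: int = 20 * 1024,
                         deep_level: int = 3) -> List[str]:
    """Return which of the observed conditions a page matches.

    size_limit and deep_level are assumed values; the observations only
    say "larger than 20k" and "deep pages" without exact thresholds.
    """
    reasons = []
    if page.size_bytes > size_limit:
        reasons.append("page larger than 20k")
    if page.has_near_duplicate:
        reasons.append("same or similar content elsewhere")
    if page.pagerank == 0:
        reasons.append("PR0")
    elif page.pagerank <= 2 and page.depth >= deep_level:
        reasons.append("PR1/2 and deep in the site")
    return reasons


# Example: a small, unique PR2 page four clicks from the home page.
example = Page(size_bytes=8_000, pagerank=2, depth=4, has_near_duplicate=False)
print(supplemental_reasons(example))
```

An empty list only means none of the observed conditions matched; it says nothing about how Google would actually classify the page.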

Danny Sullivan's view: Google's supplemental results are pages that really exist on the web. If some of them cannot be opened or no longer exist, that only means Google has not yet revisited the page and updated its index accordingly.
GoogleGuy said:
Supplemental results are a new experimental feature meant to improve recall for more obscure queries (queries with many restrictions, an unclear search target, or few results). This is a new technology that can return more results for queries that have a small number of results. So it might not affect the results for a popular search, but for a researcher doing a more specific query, it can improve the recall of the results. The supplemental collection of pages has been collected from the web just like the 3.3 billion pages in Google's main index. Hope that helps, GoogleGuy
Hey, pages get added to the supplemental index using automatic algorithms. You can imagine a lot of useful criteria, including that we saw a url during the main crawl but didn't have a chance to crawl it when we first saw it. Think of this as icing on the cake. If there's an obscure search, we're willing to do extra work with this new experimental feature to turn up more results. The net outcome is more search results for people doing power searches.
"Sounds like an excuse for hiving off a chunk of the main index. Is this another pointer to capacity problems and people trying to invent ways round them?" Hi valeyard, and welcome to WebmasterWorld! The supplemental results are above and beyond the pages that we already search. So we're not taking away any docs--in fact, we're searching even more docs than before.

References:
http://www.21cnbj.com/industrynews/articles_2003/Supplemental-Result1.htm
http://www.donews.net/yizhizhu/archive/2004/07/16/44463.aspx

November 11, 2004 · loverty
