10:08PM

An Unexpected Finding About How Google Indexes Dates

My Works, by Wei.

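How do Google's result counts differ across the 366 dates of the year?

The chart below shows the number of results (web pages) Google reports for each of the 366 dates of the year, collected with a script I wrote. I expected the dates of major holidays such as Christmas to have very high counts, but the results turned out to be quite unexpected!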

[Graph: number of pages Google reports for each of the 366 dates]

Web version: http://spreadsheets.google.com/ccc?key=0AhwZjz8wu-DXdGFSN1lxUUh1S1FjczJ4aF9RMWJvbFE

Or right here:

Date	Pages
January 1	236000000
January 2	213000000
January 3	211000000
January 4	213000000
January 5	208000000
January 6	210000000
January 7	209000000
January 8	210000000
January 9	208000000
January 10	210000000
January 11	212000000
January 12	212000000
January 13	210000000
January 14	213000000
January 15	212000000
January 16	212000000
January 17	210000000
January 18	211000000
January 19	211000000
January 20	210000000
January 21	209000000
January 22	210000000
January 23	213000000
January 24	211000000
January 25	213000000
January 26	206000000
January 27	209000000
January 28	208000000
January 29	211000000
January 30	211000000
January 31	215000000
February 1	205000000
February 2	198000000
February 3	211000000
February 4	196000000
February 5	196000000
February 6	196000000
February 7	195000000
February 8	196000000
February 9	194000000
February 10	199000000
February 11	195000000
February 12	198000000
February 13	197000000
February 14	198000000
February 15	198000000
February 16	199000000
February 17	198000000
February 18	196000000
February 19	196000000
February 20	198000000
February 21	197000000
February 22	198000000
February 23	197000000
February 24	197000000
February 25	195000000
February 26	197000000
February 27	196000000
February 28	199000000
February 29	193000000
March 1	248000000
March 2	231000000
March 3	237000000
March 4	233000000
March 5	232000000
March 6	233000000
March 7	235000000
March 8	234000000
March 9	229000000
March 10	789000000
March 11	747000000
March 12	233000000
March 13	236000000
March 14	234000000
March 15	239000000
March 16	237000000
March 17	234000000
March 18	231000000
March 19	232000000
March 20	235000000
March 21	235000000
March 22	235000000
March 23	232000000
March 24	235000000
March 25	237000000
March 26	236000000
March 27	238000000
March 28	235000000
March 29	237000000
March 30	233000000
March 31	238000000
April 1	315000000
April 2	273000000
April 3	274000000
April 4	277000000
April 5	280000000
April 6	274000000
April 7	272000000
April 8	276000000
April 9	276000000
April 10	903000000
April 11	888000000
April 12	866000000
April 13	277000000
April 14	276000000
April 15	275000000
April 16	271000000
April 17	273000000
April 18	275000000
April 19	275000000
April 20	277000000
April 21	270000000
April 22	272000000
April 23	277000000
April 24	272000000
April 25	272000000
April 26	275000000
April 27	271000000
April 28	270000000
April 29	271000000
April 30	274000000
May 1	767000000
May 2	762000000
May 3	789000000
May 4	857000000
May 5	769000000
May 6	795000000
May 7	827000000
May 8	763000000
May 9	761000000
May 10	776000000
May 11	782000000
May 12	759000000
May 13	764000000
May 14	769000000
May 15	759000000
May 16	772000000
May 17	755000000
May 18	755000000
May 19	737000000
May 20	760000000
May 21	758000000
May 22	761000000
May 23	784000000
May 24	773000000
May 25	775000000
May 26	766000000
May 27	759000000
May 28	748000000
May 29	762000000
May 30	788000000
May 31	770000000
June 1	264000000
June 2	257000000
June 3	260000000
June 4	255000000
June 5	292000000
June 6	255000000
June 7	260000000
June 8	255000000
June 9	257000000
June 10	875000000
June 11	832000000
June 12	254000000
June 13	256000000
June 14	260000000
June 15	257000000
June 16	256000000
June 17	262000000
June 18	251000000
June 19	253000000
June 20	258000000
June 21	257000000
June 22	273000000
June 23	252000000
June 24	255000000
June 25	253000000
June 26	256000000
June 27	256000000
June 28	258000000
June 29	251000000
June 30	303000000
July 1	265000000
July 2	252000000
July 3	252000000
July 4	238000000
July 5	296000000
July 6	292000000
July 7	249000000
July 8	248000000
July 9	281000000
July 10	250000000
July 11	845000000
July 12	254000000
July 13	250000000
July 14	248000000
July 15	252000000
July 16	249000000
July 17	248000000
July 18	257000000
July 19	254000000
July 20	248000000
July 21	251000000
July 22	250000000
July 23	247000000
July 24	248000000
July 25	263000000
July 26	254000000
July 27	252000000
July 28	249000000
July 29	248000000
July 30	251000000
July 31	247000000
August 1	282000000
August 2	278000000
August 3	274000000
August 4	272000000
August 5	271000000
August 6	308000000
August 7	271000000
August 8	294000000
August 9	273000000
August 10	299000000
August 11	274000000
August 12	271000000
August 13	274000000
August 14	270000000
August 15	279000000
August 16	277000000
August 17	273000000
August 18	273000000
August 19	272000000
August 20	276000000
August 21	269000000
August 22	272000000
August 23	275000000
August 24	273000000
August 25	272000000
August 26	273000000
August 27	266000000
August 28	267000000
August 29	277000000
August 30	276000000
August 31	267000000
September 1	330000000
September 2	319000000
September 3	319000000
September 4	332000000
September 5	336000000
September 6	324000000
September 7	322000000
September 8	314000000
September 9	310000000
September 10	324000000
September 11	333000000
September 12	321000000
September 13	327000000
September 14	332000000
September 15	324000000
September 16	326000000
September 17	315000000
September 18	321000000
September 19	328000000
September 20	355000000
September 21	316000000
September 22	321000000
September 23	319000000
September 24	315000000
September 25	310000000
September 26	313000000
September 27	326000000
September 28	309000000
September 29	314000000
September 30	316000000
October 1	317000000
October 2	306000000
October 3	320000000
October 4	311000000
October 5	301000000
October 6	289000000
October 7	297000000
October 8	304000000
October 9	293000000
October 10	1090000000
October 11	1020000000
October 12	290000000
October 13	308000000
October 14	308000000
October 15	300000000
October 16	304000000
October 17	313000000
October 18	317000000
October 19	294000000
October 20	290000000
October 21	308000000
October 22	304000000
October 23	290000000
October 24	301000000
October 25	313000000
October 26	294000000
October 27	288000000
October 28	281000000
October 29	280000000
October 30	279000000
October 31	306000000
November 1	345000000
November 2	320000000
November 3	329000000
November 4	352000000
November 5	355000000
November 6	364000000
November 7	349000000
November 8	372000000
November 9	369000000
November 10	897000000
November 11	352000000
November 12	362000000
November 13	362000000
November 14	359000000
November 15	362000000
November 16	355000000
November 17	354000000
November 18	367000000
November 19	359000000
November 20	363000000
November 21	367000000
November 22	367000000
November 23	371000000
November 24	373000000
November 25	375000000
November 26	370000000
November 27	368000000
November 28	370000000
November 29	374000000
November 30	377000000
December 1	211000000
December 2	211000000
December 3	214000000
December 4	214000000
December 5	210000000
December 6	212000000
December 7	204000000
December 8	209000000
December 9	210000000
December 10	683000000
December 11	213000000
December 12	213000000
December 13	217000000
December 14	212000000
December 15	218000000
December 16	212000000
December 17	216000000
December 18	214000000
December 19	217000000
December 20	217000000
December 21	205000000
December 22	216000000
December 23	214000000
December 24	213000000
December 25	213000000
December 26	212000000
December 27	216000000
December 28	216000000
December 29	213000000
December 30	209000000
December 31	231000000

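Note: every search query took the form of the date itself, for example November 23.

According to the data, the following dates have noticeably higher page counts than the others: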
March 10, March 11, April 10, April 11, April 12, May 1-31, June 10, June 11, July 11, October 10, October 11, November 10, December 10
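
One way to make "noticeably higher" concrete is to flag every date whose count is more than twice the median. The sketch below is only an illustration: the method name high_count_dates and the cutoff factor of 2 are my own assumptions, not part of the original script. Applied to the full table, a cutoff of twice the median should pick out the same dates as the list above, since every listed date is at or above 683000000 and no other date exceeds 377000000.

```ruby
# Return the dates whose count exceeds `factor` times the median count.
# `counts` maps a date string to its page count; `factor` is an assumed cutoff.
def high_count_dates(counts, factor = 2.0)
  return [] if counts.empty?
  sorted = counts.values.sort
  mid = sorted.size / 2
  median = sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  counts.select { |_date, n| n > factor * median }.keys
end

# A few rows copied from the table above.
sample = {
  "March 9"     => 229_000_000,
  "March 10"    => 789_000_000,
  "March 11"    => 747_000_000,
  "March 12"    => 233_000_000,
  "December 25" => 213_000_000
}
p high_count_dates(sample)  # ["March 10", "March 11"]
```

The median is used rather than the mean so that the very large May values do not drag the cutoff upward.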

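Preliminary analysis:
The high counts for every date in May are fairly easy to explain: besides naming a month, the word "May" has many other meanings, and it is also a personal name.
The other unusual dates almost all fall on the 10th or the 11th (April 12 is the only 12th). What does that hint at?
None of these dates seems to be a well-known holiday, so why these particular days?
Why are the numbers 10 and 11 so popular?
I have not yet worked out a concrete reason. Comments and discussion are welcome!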
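
The 10th/11th pattern is easy to check by tallying the flagged dates by day of the month. This is a small sketch for illustration. The helper name tally_by_day is hypothetical, and May is left out because it is explained separately.

```ruby
# Dates flagged in the list above, leaving out May (explained separately).
flagged = ["March 10", "March 11", "April 10", "April 11", "April 12",
           "June 10", "June 11", "July 11", "October 10", "October 11",
           "November 10", "December 10"]

# Count how many flagged dates fall on each day of the month.
def tally_by_day(dates)
  tally = Hash.new(0)
  dates.each { |d| tally[d.split.last.to_i] += 1 }
  tally
end

by_day = tally_by_day(flagged)
by_day.sort.each { |day, n| puts "#{day}: #{n}" }
# 10: 6
# 11: 5
# 12: 1
```

So 11 of the 12 flagged dates outside May fall on the 10th or the 11th.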

Here is the scraping code I wrote in Ruby:

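require "rubygems"
require "mechanize"
require "cgi"

t = Time.now

# Build the 366 queries, "January 1" through "December 31" (February gets 29 days).
months = ['January','February','March','April','May','June','July','August','September','October','November','December']
dates = []
months.each_with_index do |month, i|
  days = [4,6,9,11].include?(i + 1) ? 30 : 31
  days -= 2 if i + 1 == 2
  days.times { |d| dates.push("#{month} #{d + 1}") }
end

agent = Mechanize.new  # older Mechanize releases used WWW::Mechanize.new
data = []
dates.each do |d|
  page = agent.get("http://www.google.com/search?q=" + CGI.escape(d))
  # This line was garbled when the post was published, so it is a reconstruction:
  # take the number that follows "about <strong>" and strip the commas.
  # The exact markup of Google's results page is an assumption.
  count = page.body[/about\s<strong>(.*?)</, 1]
  data.push("#{d}\t#{count ? count.delete(',') : ''}")
  puts d
end
File.open('data.txt', 'w') { |f| f.puts(data) }
puts "Completed in #{Time.now - t}"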
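
The count-extraction step of the script can be tried on its own, without sending any requests to Google. The sketch below is a guess at how that step works. The method name parse_count and the sample HTML string are made up for illustration, and the real results page may use different markup.

```ruby
# Pull the result count out of a results page body.
# Assumes the count appears as "about <strong>213,000,000</strong>";
# returns nil when no such pattern is found.
def parse_count(body)
  raw = body[/about\s<strong>([\d,]+)/, 1]
  raw && raw.delete(',').to_i
end

# Made-up snippet standing in for a real results page.
sample_html = 'Results 1 - 10 of about <strong>213,000,000</strong> for November 23'
p parse_count(sample_html)      # 213000000
p parse_count('no count here')  # nil
```

Returning nil for a page without a count makes it easy to spot dates where the request was blocked or the page layout changed.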
