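How do Google's indexed page counts differ across the 366 dates of the year?
The chart below shows the number of pages Google reports for each of the 366 dates of the year, collected with a script I wrote. I expected the dates of major holidays such as Christmas to have high counts, but the results surprised me!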
Web version: http://spreadsheets.google.com/ccc?key=0AhwZjz8wu-DXdGFSN1lxUUh1S1FjczJ4aF9RMWJvbFE
Or read the numbers here:
Date    Pages
January 1    236000000
January 2    213000000
January 3    211000000
January 4    213000000
January 5    208000000
January 6    210000000
January 7    209000000
January 8    210000000
January 9    208000000
January 10    210000000
January 11    212000000
January 12    212000000
January 13    210000000
January 14    213000000
January 15    212000000
January 16    212000000
January 17    210000000
January 18    211000000
January 19    211000000
January 20    210000000
January 21    209000000
January 22    210000000
January 23    213000000
January 24    211000000
January 25    213000000
January 26    206000000
January 27    209000000
January 28    208000000
January 29    211000000
January 30    211000000
January 31    215000000
February 1    205000000
February 2    198000000
February 3    211000000
February 4    196000000
February 5    196000000
February 6    196000000
February 7    195000000
February 8    196000000
February 9    194000000
February 10    199000000
February 11    195000000
February 12    198000000
February 13    197000000
February 14    198000000
February 15    198000000
February 16    199000000
February 17    198000000
February 18    196000000
February 19    196000000
February 20    198000000
February 21    197000000
February 22    198000000
February 23    197000000
February 24    197000000
February 25    195000000
February 26    197000000
February 27    196000000
February 28    199000000
February 29    193000000
March 1    248000000
March 2    231000000
March 3    237000000
March 4    233000000
March 5    232000000
March 6    233000000
March 7    235000000
March 8    234000000
March 9    229000000
March 10    789000000
March 11    747000000
March 12    233000000
March 13    236000000
March 14    234000000
March 15    239000000
March 16    237000000
March 17    234000000
March 18    231000000
March 19    232000000
March 20    235000000
March 21    235000000
March 22    235000000
March 23    232000000
March 24    235000000
March 25    237000000
March 26    236000000
March 27    238000000
March 28    235000000
March 29    237000000
March 30    233000000
March 31    238000000
April 1    315000000
April 2    273000000
April 3    274000000
April 4    277000000
April 5    280000000
April 6    274000000
April 7    272000000
April 8    276000000
April 9    276000000
April 10    903000000
April 11    888000000
April 12    866000000
April 13    277000000
April 14    276000000
April 15    275000000
April 16    271000000
April 17    273000000
April 18    275000000
April 19    275000000
April 20    277000000
April 21    270000000
April 22    272000000
April 23    277000000
April 24    272000000
April 25    272000000
April 26    275000000
April 27    271000000
April 28    270000000
April 29    271000000
April 30    274000000
May 1    767000000
May 2    762000000
May 3    789000000
May 4    857000000
May 5    769000000
May 6    795000000
May 7    827000000
May 8    763000000
May 9    761000000
May 10    776000000
May 11    782000000
May 12    759000000
May 13    764000000
May 14    769000000
May 15    759000000
May 16    772000000
May 17    755000000
May 18    755000000
May 19    737000000
May 20    760000000
May 21    758000000
May 22    761000000
May 23    784000000
May 24    773000000
May 25    775000000
May 26    766000000
May 27    759000000
May 28    748000000
May 29    762000000
May 30    788000000
May 31    770000000
June 1    264000000
June 2    257000000
June 3    260000000
June 4    255000000
June 5    292000000
June 6    255000000
June 7    260000000
June 8    255000000
June 9    257000000
June 10    875000000
June 11    832000000
June 12    254000000
June 13    256000000
June 14    260000000
June 15    257000000
June 16    256000000
June 17    262000000
June 18    251000000
June 19    253000000
June 20    258000000
June 21    257000000
June 22    273000000
June 23    252000000
June 24    255000000
June 25    253000000
June 26    256000000
June 27    256000000
June 28    258000000
June 29    251000000
June 30    303000000
July 1    265000000
July 2    252000000
July 3    252000000
July 4    238000000
July 5    296000000
July 6    292000000
July 7    249000000
July 8    248000000
July 9    281000000
July 10    250000000
July 11    845000000
July 12    254000000
July 13    250000000
July 14    248000000
July 15    252000000
July 16    249000000
July 17    248000000
July 18    257000000
July 19    254000000
July 20    248000000
July 21    251000000
July 22    250000000
July 23    247000000
July 24    248000000
July 25    263000000
July 26    254000000
July 27    252000000
July 28    249000000
July 29    248000000
July 30    251000000
July 31    247000000
August 1    282000000
August 2    278000000
August 3    274000000
August 4    272000000
August 5    271000000
August 6    308000000
August 7    271000000
August 8    294000000
August 9    273000000
August 10    299000000
August 11    274000000
August 12    271000000
August 13    274000000
August 14    270000000
August 15    279000000
August 16    277000000
August 17    273000000
August 18    273000000
August 19    272000000
August 20    276000000
August 21    269000000
August 22    272000000
August 23    275000000
August 24    273000000
August 25    272000000
August 26    273000000
August 27    266000000
August 28    267000000
August 29    277000000
August 30    276000000
August 31    267000000
September 1    330000000
September 2    319000000
September 3    319000000
September 4    332000000
September 5    336000000
September 6    324000000
September 7    322000000
September 8    314000000
September 9    310000000
September 10    324000000
September 11    333000000
September 12    321000000
September 13    327000000
September 14    332000000
September 15    324000000
September 16    326000000
September 17    315000000
September 18    321000000
September 19    328000000
September 20    355000000
September 21    316000000
September 22    321000000
September 23    319000000
September 24    315000000
September 25    310000000
September 26    313000000
September 27    326000000
September 28    309000000
September 29    314000000
September 30    316000000
October 1    317000000
October 2    306000000
October 3    320000000
October 4    311000000
October 5    301000000
October 6    289000000
October 7    297000000
October 8    304000000
October 9    293000000
October 10    1090000000
October 11    1020000000
October 12    290000000
October 13    308000000
October 14    308000000
October 15    300000000
October 16    304000000
October 17    313000000
October 18    317000000
October 19    294000000
October 20    290000000
October 21    308000000
October 22    304000000
October 23    290000000
October 24    301000000
October 25    313000000
October 26    294000000
October 27    288000000
October 28    281000000
October 29    280000000
October 30    279000000
October 31    306000000
November 1    345000000
November 2    320000000
November 3    329000000
November 4    352000000
November 5    355000000
November 6    364000000
November 7    349000000
November 8    372000000
November 9    369000000
November 10    897000000
November 11    352000000
November 12    362000000
November 13    362000000
November 14    359000000
November 15    362000000
November 16    355000000
November 17    354000000
November 18    367000000
November 19    359000000
November 20    363000000
November 21    367000000
November 22    367000000
November 23    371000000
November 24    373000000
November 25    375000000
November 26    370000000
November 27    368000000
November 28    370000000
November 29    374000000
November 30    377000000
December 1    211000000
December 2    211000000
December 3    214000000
December 4    214000000
December 5    210000000
December 6    212000000
December 7    204000000
December 8    209000000
December 9    210000000
December 10    683000000
December 11    213000000
December 12    213000000
December 13    217000000
December 14    212000000
December 15    218000000
December 16    212000000
December 17    216000000
December 18    214000000
December 19    217000000
December 20    217000000
December 21    205000000
December 22    216000000
December 23    214000000
December 24    213000000
December 25    213000000
December 26    212000000
December 27    216000000
December 28    216000000
December 29    213000000
December 30    209000000
December 31    231000000
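Note: every search keyword took the form "November 23" (month name followed by the day number).
According to the data, the following dates have noticeably higher page counts than the rest:
March 10, March 11, April 10, April 11, April 12, May 1-31, June 10, June 11, July 11, October 10, October 11, November 10, December 10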
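One way to pick out such dates mechanically is to compare each count with the median of all counts. The sketch below is only an illustration: the helper names `median` and `outliers` and the threshold factor of 2 are my own assumptions, not part of the original script, and the sample hash holds just a handful of values copied from the data above.

```ruby
# Median of an array of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Dates whose count exceeds `factor` times the median of all counts.
# `counts` maps "Month day" keywords to page counts.
# The factor of 2.0 is an arbitrary assumption, chosen for illustration.
def outliers(counts, factor = 2.0)
  threshold = factor * median(counts.values)
  counts.select { |_date, n| n > threshold }.keys
end

# A few values copied from the data above (not the full data set).
sample = {
  "March 9"     => 229_000_000,
  "March 10"    => 789_000_000,
  "March 11"    => 747_000_000,
  "March 12"    => 233_000_000,
  "May 1"       => 767_000_000,
  "June 1"      => 264_000_000,
  "August 1"    => 282_000_000,
  "September 1" => 330_000_000,
  "December 25" => 213_000_000
}

p outliers(sample)
```

On this small sample the median is 282000000, so the threshold is 564000000 and the flagged dates are March 10, March 11 and May 1. Running it on all 366 counts would be needed to check the full list.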
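Preliminary analysis:
First, the high counts for every date in May are easy to explain: besides naming the month, the word "May" has many other meanings, and it is also a personal name.
Apart from May, nearly all of the unusual dates fall on the 10th or the 11th (April 12 is the only 12th). What does that hint at?
None of these dates seems to be a well-known holiday, so why these particular days?
Why are the numbers 10 and 11 so popular?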
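To make the pattern concrete, the short sketch below tallies the day numbers of the unusual dates listed earlier, leaving out May because its high counts have a separate explanation. The variable names are illustrative assumptions only.

```ruby
# Unusual dates from the list above, excluding May.
outlier_dates = [
  "March 10", "March 11", "April 10", "April 11", "April 12",
  "June 10", "June 11", "July 11", "October 10", "October 11",
  "November 10", "December 10"
]

# Count how often each day of the month appears.
day_counts = Hash.new(0)
outlier_dates.each { |date| day_counts[date.split.last.to_i] += 1 }

day_counts.sort.each { |day, n| puts "#{day}: #{n}" }
# prints:
# 10: 6
# 11: 5
# 12: 1
```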
I have not yet worked out the reason. Comments and discussion are welcome!
Below is the scraping code I wrote in Ruby:
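require "rubygems"
require "mechanize"
require "cgi"

t = Time.now

# Build the 366 search keywords, "January 1" through "December 31".
date = []
index = ['January','February','March','April','May','June','July','August','September','October','November','December']
for i in 1..(index.size)
  f = [4,6,9,11].include?(i) ? 30 : 31
  f -= 2 if i == 2   # February gets 29 days so that February 29 is included
  f.times {|d| date.push(index[i-1] + " " + (d+1).to_s)}
end
#puts date

# Older mechanize releases exposed this class as WWW::Mechanize.
agent = Mechanize.new
data = []
for d in date
  page = agent.get("http://www.google.com/search?q=" + CGI.escape(d))
  # The tail of this pattern was lost from the original post; the closing
  # </strong> and the block below are a reconstruction, not the original code.
  page.body.scan(/about\s<strong>(.*?)<\/strong>/) {|m| data.push(m[0].delete(','))}
  puts d
end

# Write one count per line, in the same order as the dates.
File.open('data.txt','w') {|file| file.puts(data)}
puts "Completed in #{(Time.now-t).to_s}"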


