钱五哥 (Qian Wuge)'s 163 Space


Hadoop at Yahoo! Sets New Gray Sort Record [repost]

2013-09-26 00:54:06 | Category: IT Technology


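Last year I compiled all of the Terasort records. The fastest was Yahoo's 1 TB sort in 2009: 62 seconds (1,460 nodes, 2.5 GHz CPUs, 8 GB RAM, 1 GbE, 4 hard disks). In July this year Yahoo ran a 100 TB test that took 72 minutes (2,100 nodes, 2.3 GHz, dual-socket 6-core CPUs, 64 GB RAM, 10 GbE, 12 hard disks). If sorting time scaled linearly, growing the data from 1 TB to 100 TB would take about 103 minutes (62 seconds × 100); of course, sorting does not scale linearly. Yahoo's result this year owes something to Hadoop optimizations, but it owes more to the high-performance servers.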
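The extrapolation above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the n log n model is an assumption (a textbook comparison-sort cost on identical hardware), not something the post or Yahoo states, and the record count assumes decimal terabytes and the benchmark's 100-byte records.

```python
import math

BASE_SECONDS = 62                  # 2009 record: 1 TB in 62 seconds
SCALE = 100                        # 1 TB -> 100 TB
RECORDS_PER_TB = 10**12 // 100     # assumption: decimal TB, 100-byte records


def linear_estimate_minutes(base_seconds, scale):
    """Time if cost grew in direct proportion to the data size."""
    return base_seconds * scale / 60.0


def nlogn_estimate_minutes(base_seconds, base_records, scale):
    """Time if cost grew like n * log2(n) (assumed model, same hardware)."""
    scaled_records = base_records * scale
    factor = (scaled_records * math.log2(scaled_records)) / (
        base_records * math.log2(base_records)
    )
    return base_seconds * factor / 60.0


linear = linear_estimate_minutes(BASE_SECONDS, SCALE)
nlogn = nlogn_estimate_minutes(BASE_SECONDS, RECORDS_PER_TB, SCALE)
print(f"linear: {linear:.1f} min, n log n: {nlogn:.1f} min")
```

Neither figure accounts for the differences in node count, network, and disks between the two clusters, so they are only a rough yardstick for the 72-minute result.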

Hadoop at Yahoo! Sets New Gray Sort Record – The Yellow Elephant is Getting Faster

By Thomas Graves – Wed, Jul 3, 2013 5:27 AM EDT

We are proud to announce that we used Apache Hadoop to set a new Gray sort record in Jim Gray's Sort benchmark. We nearly doubled the rate of the previous Gray sort entry by sorting at a rate of 1.42 terabytes per minute. The previous record was 0.725 terabytes per minute.
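As a quick check of "nearly doubled", the two quoted rates can be compared directly. This is a minimal sketch using only the figures above plus the 100-terabyte Gray sort minimum; the variable names are illustrative.

```python
NEW_RATE_TB_PER_MIN = 1.42    # rate reported for the new record
OLD_RATE_TB_PER_MIN = 0.725   # rate of the previous record
MIN_GRAY_SORT_TB = 100        # Gray sort requires at least 100 TB

# Ratio of the new sort rate to the old one.
speedup = NEW_RATE_TB_PER_MIN / OLD_RATE_TB_PER_MIN

# Minutes each rate would need for exactly 100 TB.
minutes_new = MIN_GRAY_SORT_TB / NEW_RATE_TB_PER_MIN
minutes_old = MIN_GRAY_SORT_TB / OLD_RATE_TB_PER_MIN

print(f"speedup: {speedup:.2f}x")
print(f"100 TB: {minutes_new:.1f} min (new) vs {minutes_old:.1f} min (old)")
```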

Jim Gray's sort benchmark consists of a set of related benchmarks, each with its own rules. All of the sort benchmarks measure the time to sort different numbers of 100-byte records. The first 10 bytes of each record are the key and the rest is the value. The Gray sort measures the sort rate achieved while sorting at least 100 terabytes of data. The Minute sort measures the amount of data that can be sorted in less than a minute. There are two benchmark categories. The Daytona category requires the sort code to be a general-purpose sort. The Indy category only needs to sort 100-byte records with 10-byte keys. We used Hadoop Terasort with slightly different configurations in both categories.
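To make the record layout concrete, here is a minimal single-machine sketch that sorts 100-byte records by their 10-byte keys. It is not how Hadoop Terasort works internally; the function names are hypothetical and the random input merely stands in for benchmark data.

```python
import os

RECORD_SIZE = 100   # bytes per record, per the benchmark rules
KEY_SIZE = 10       # the first 10 bytes of each record are the key


def split_records(data):
    """Cut a byte string into fixed-size 100-byte records."""
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("input length must be a multiple of 100 bytes")
    return [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]


def sort_records(data):
    """Return the records concatenated in ascending key order."""
    records = split_records(data)
    records.sort(key=lambda record: record[:KEY_SIZE])
    return b"".join(records)


def is_sorted(data):
    """Check that keys appear in non-decreasing order."""
    keys = [record[:KEY_SIZE] for record in split_records(data)]
    return all(a <= b for a, b in zip(keys, keys[1:]))


sample = os.urandom(RECORD_SIZE * 1000)   # 1,000 random records
result = sort_records(sample)
print(len(result), is_sorted(result))
```

Python compares byte strings lexicographically, so sorting on the key slice orders records by the raw key bytes.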

There were some new rules this year. The biggest change is that, for the Daytona category, the sort must handle both skewed and non-skewed data, and the skewed data has to be sorted in no more than twice the elapsed time of the non-skewed data. Other changes: the input and output data must persist in the case of a single node failure, and none of the data (input, intermediate, or output) can be compressed.
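The skew rule reduces to a single comparison. The sketch below encodes it with a hypothetical helper; the timings in the usage lines are made-up placeholders, not measurements from this entry.

```python
def meets_skew_rule(non_skewed_seconds, skewed_seconds):
    """True if the skewed run took at most twice the non-skewed run."""
    if non_skewed_seconds <= 0 or skewed_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    return skewed_seconds <= 2 * non_skewed_seconds


# Hypothetical timings, for illustration only.
print(meets_skew_rule(4000, 7000))   # within the 2x limit
print(meets_skew_rule(4000, 8500))   # exceeds the 2x limit
```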

Results

Our only official entry was to Gray sort, but we also unofficially (we didn’t get the submission in by the deadline - learned that lesson!) broke the previous Minute Sort record. The full report can be found on the sort benchmark page under the Gray sort results.


Software, Hardware, and Operating System

The version of Hadoop used was Hadoop 0.23.7. Hadoop 0.23.7 is an early branch of the Hadoop 2.X line that Yahoo! has used to stabilize YARN. It is available for download at hadoop.apache.org.

The hardware and operating system details are:

  • Approximately 2100 nodes for GraySort and 2200 nodes for MinuteSort
  • System: Dell R720xd, 2 x Xeon E5-2630 2.30GHz, 62.3GB / 64GB 1333MHz DDR3, 12 x 3TB SATA
  • Processors: 2 x Xeon E5-2630 2.30GHz, 7.2GT QPI (HT enabled, 12 cores, 24 threads) - Sandy Bridge-EP C2, 64-bit, 6-core, 32nm, L3: 15MB
  • OS: RHEL Server 6.3, Linux 2.6.32-279.19.1.el6.YAHOO.20130104.x86_64 x86_64, 64-bit
  • Network: eth0 (bnx2x): 10Gb/s <full-duplex>
  • 40 nodes per rack, 160 Gbps rack to spine, 2.5:1 oversubscription
  • Oracle JDK 1.7 (u17) - 64 bit
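The 2.5:1 ratio in the network details follows from the rack figures listed above. A small sketch of that arithmetic, assuming each of the 40 nodes in a rack uses a single 10 Gb/s link:

```python
NODES_PER_RACK = 40
NIC_GBPS = 10          # assumption: one 10 Gb/s link per node
UPLINK_GBPS = 160      # rack-to-spine bandwidth

# Total bandwidth the nodes in one rack could offer to the network.
edge_gbps = NODES_PER_RACK * NIC_GBPS

# Ratio of offered edge bandwidth to the rack's uplink capacity.
oversubscription = edge_gbps / UPLINK_GBPS

# Uplink share per node if every node sends off-rack at once.
per_node_uplink_gbps = UPLINK_GBPS / NODES_PER_RACK

print(f"{oversubscription}:1 oversubscription, "
      f"{per_node_uplink_gbps} Gb/s per node off-rack")
```

This matters for a distributed sort because the shuffle phase moves most of the data between nodes, and much of that traffic crosses rack boundaries.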

Biography

Thomas Graves is a software developer at Yahoo! and a Hadoop PMC member at the Apache Software Foundation.

Nathan Roberts is a Hadoop architect at Yahoo!.

Balaji Narayanan and Rajiv Chittajallu are on the Grid Operations team at Yahoo!.


Related information:

  1. Terasort records
  2. Hadoop at Yahoo! Sets New Gray Sort Record – The Yellow Elephant is Getting Faster