
钱五哥の163空间


Does Facebook's Purchase of Vertica Overlap with Presto's Positioning?

2013-12-29 14:50:58 | Category: ICT industry


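On December 11, 2013, at HP's technology conference, George Kadifa announced in his keynote that Facebook had purchased the HP Vertica Analytics Platform. According to the news coverage, it will mainly be used for forecasting and decision support, and it was chosen for its speed and flexibility.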

Facebook selected the HP Vertica Analytics Platform as one component of its big data infrastructure. Vertica’s value to Facebook can be found in its ability to provide business insights with incredible speed and flexibility. HP Vertica supports Facebook’s business analysts and helps the company be more productive through dramatically reduced query time. It is also valuable for providing accurate forecasting and aiding data driven decisions.

According to Tim:

“Data is incredibly important: it provides the opportunity to create new product enhancements, business insights, and a significant competitive advantage by leveraging the assets companies already have. At Facebook, we move incredibly fast. It’s important for us to be able to handle massive amounts of data in a respectful way without compromising speed, which is why HP Vertica is such a perfect fit.”

As blogger Dana Gardner posted:

“HP, taking a new leap in its marathon to remake itself, has further assembled, refined and delivered the IT infrastructure, big data and cloud means by which other large enterprises can effectively remake themselves.

This week here at the HP Discover 2013 conference — despite the gulf of 70 years but originating in the same Silicon Valley byways — has found a kindred spirit in … Facebook. The social media juggernaut, also based in Palo Alto, is often pointed to with both envy and amazement at its new-found and secretive feats of IT data center scale, reach, efficiency and adaptability. It’s a mantle of technological respect that HP itself once long held.

So for Facebook’s CIO, Tim Campos, to get on stage in Europe and declare that, “A partner like HP Vertica thinks like we do” and is a “key part” of Facebook’s big data capabilities, is one the best endorsements, err … “likes,” that any modern IT infrastructure vendor could hope for. With Facebook’s data growing by 500 terabytes a day, this is quite a coup for HP’s analytics platform, which is part of its HAVEn initiative.”


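What makes this a little odd is the timing: it came only about a month after November 6, 2013, when Facebook announced that it would open-source Presto, its internally developed, Dremel-like SQL-on-Hadoop data warehouse engine. The Facebook blog claims that Presto is ten times faster than Hive, and one screenshot of a running query shows 13 nodes finishing an analysis of 150 million rows (22.4 GB) in 6 seconds. The query included count, join, order by, and limit 5.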
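To put the screenshot's numbers in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes the 22.4 GB figure is the amount of data scanned and that the work was spread evenly across the 13 nodes, neither of which the screenshot confirms. The function name `throughput` is made up for this illustration.

```python
# Figures quoted from the screenshot: 13 nodes, 6 seconds, 150 million rows, 22.4 GB.
NODES = 13
SECONDS = 6
ROWS = 150_000_000
GIGABYTES = 22.4


def throughput(total, seconds, nodes):
    """Return (cluster-wide rate, per-node rate), assuming an even split across nodes."""
    cluster_rate = total / seconds
    return cluster_rate, cluster_rate / nodes


rows_per_s, rows_per_node_s = throughput(ROWS, SECONDS, NODES)
gb_per_s, gb_per_node_s = throughput(GIGABYTES, SECONDS, NODES)

print(f"rows/s: {rows_per_s:,.0f} cluster-wide, {rows_per_node_s:,.0f} per node")
print(f"GB/s:   {gb_per_s:.2f} cluster-wide, {gb_per_node_s:.3f} per node")
```

Under those assumptions the cluster processes about 25 million rows per second, or a bit under 2 million rows and roughly 0.29 GB per node per second.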


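Presto was first revealed in June at a Facebook developer conference, where Traverso said that more than 850 people use Presto every day to query the 250 PB data warehouse, scanning about 320 TB per day. He stressed that Presto essentially fills the gap between business needs and what Hive's performance plus a few simple custom tools could deliver. Simple queries take a few hundred milliseconds and complex ones only a few minutes, with computation done almost entirely in memory and nothing written back to disk. Thinking about these problems at exabyte scale is a very different matter.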

“Historically, our data scientists and analysts have relied on Hive for data analysis,” Traverso said. “The problem with Hive is it’s designed for batch processing. We have other tools that are faster than Hive, but they’re either too limited in functionality or too simple to operate against our huge data warehouse. Over the past few months, we’ve been working on Presto to basically fill this gap.”

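On November 6, 2013, Traverso explained on the blog that development began in the fall of 2012, was carried out by a small team, and that the first version went into production in early 2013. “Presto is a distributed SQL query engine optimized for ad-hoc analysis at interactive speed. It supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions”. Presto is implemented in Java, first because it is fast and second because it integrates easily with other tools, and its design works around the JVM's memory management and GC problems. A single Presto cluster has now reached 1,000 nodes and is used by more than 1,000 people, running over 20,000 queries a day against petabyte-scale datasets. Its main uses are ad hoc and exploratory queries. The main current limitations are the size of joined tables and the cardinality of keys and groups, and query output cannot yet be saved to tables. Both items are on Presto's roadmap.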
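For readers who want a feel for the kind of ad hoc query discussed here (count, join, order by, limit 5), the sketch below runs one of that shape. It uses Python's built-in `sqlite3` as a stand-in, not Presto, and the `users` and `events` tables are invented for the example; only the query shape is taken from the post.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine; Presto itself is not involved.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'DE'), (4, 'JP');
    INSERT INTO events (user_id) VALUES (1), (1), (2), (3), (3), (3), (3), (4);
""")

# Same shape as the query in the screenshot: count + join + order by + limit 5.
QUERY = """
    SELECT u.country, COUNT(*) AS n_events
    FROM events AS e
    JOIN users AS u ON u.user_id = e.user_id
    GROUP BY u.country
    ORDER BY n_events DESC, u.country
    LIMIT 5
"""
top_countries = conn.execute(QUERY).fetchall()
print(top_countries)
```

An interactive engine matters because an analyst typically runs dozens of small variations on a query like this, which is painful when each one is a batch job.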


Facebook and many other Hadoop users still rely heavily on Hive for batch-processing jobs such as regular reporting, but there has been a demand for something letting users perform ad hoc, exploratory queries on Hadoop data similar to how they might do them using a massively parallel relational database.

It currently supports a large subset of ANSI SQL, including joins, left/right outer joins, subqueries, and most of the common aggregate and scalar functions, including approximate distinct counts (using HyperLogLog) and approximate percentiles (based on quantile digest). The main restrictions at this stage are a size limitation on the join tables and cardinality of unique keys/groups. The system also lacks the ability to write output data back to tables (currently query results are streamed to the client).
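The quote mentions approximate distinct counts based on HyperLogLog. As a rough illustration of the idea, here is a minimal textbook-style HyperLogLog sketch. It is not Presto's implementation: the class name, the choice of SHA-1 as the hash, and the register count (2^12) are all assumptions made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1 (an arbitrary choice).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(200_000):
    hll.add(f"user-{i % 50_000}")   # 200,000 values, 50,000 of them distinct
estimate = hll.count()
print(round(estimate))
```

With 4,096 registers the expected relative error is around 1.6%, and the sketch needs only a few kilobytes no matter how many values it sees. That is why this kind of estimate suits interactive queries over very large tables.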

Roadmap

We are actively working on extending Presto functionality and improving performance. In the next few months, we will remove restrictions on join and aggregation sizes and introduce the ability to write output tables. We are also working on a query “accelerator” by designing a new data format that is optimized for query processing and avoids unnecessary transformations. This feature will allow hot subsets of data to be cached from backend data store, and the system will transparently use cached data to “accelerate” queries. We are also working on a high performance HBase connector.


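At almost the same time, on December 21, 2013, Vertica released its new product, HP Vertica Analytics Platform 7 (Crane). The headline features of this release are support for semi-structured data (with fast loading), integration with Hadoop, security and performance enhancements, support for traditional BI and visualization tools, integration with HCatalog, better application integration, and a lower price. Monash had already described the product on his blog on December 5, but I have found nothing that says whether Facebook bought Vertica 7 or Vertica Classic.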

The HP Vertica Analytics Platform 7 “Crane” enables you to accelerate business value from a vastly expanded variety of data by simplifying the exploration and analysis of semi-structured and “dark data.” This release also includes the most comprehensive SQL on Hadoop, critical security and performance enhancements, and more.

The new version 7, also called Vertica Crane, dramatically simplifies the exploration and analysis of semistructured and “dark” data, provides enhanced integration with Hadoop, and offers significant security and performance enhancements. The HP Vertica Analytics Platform also supports a variety of industry-standard business intelligence and visualization tools. It delivers open “SQL-on-Hadoop” capabilities. Unlike other SQL-on-Hadoop solutions, HP Vertica works with major Hadoop distributions, ensuring high-performance analytics across the broadest range of data types and sources. It also supports direct integration with HCatalog, Hadoop’s table and storage management layer.


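I don't think FB would throw away its own R&D work, least of all a system as important as Presto: it is the only open-source project for which FB has set up a dedicated website. Vertica and Presto must therefore be positioned differently. My guess is that Vertica will replace the existing Oracle data warehouse, which is an aging system that probably cannot deliver the performance columnar analytics requires. The Vertica that FB bought should also integrate well with Hadoop, Hive, HBase, and Presto, rather than merely moving data over xDBC. If that assumption holds, FB would want to buy a system like Crane.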


Related links:

1. http://www.vertica.com/2013/12/12/welcoming-facebook-to-the-growing-family-of-hp-vertica-customers/

2. http://voltdb.com/presto-facebook-embraces-sql-everyone-at-voltdb-nods-their-head/

3. https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920

4. http://www.oschina.net/news/45706/facebook-open-sources-presto-homegrown-sql-query-engine

5. http://gigaom.com/2013/06/06/facebook-unveils-presto-engine-for-querying-250-pb-data-warehouse/

6. Scaling Apache Giraph to a trillion edges. https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920

7. Under the hood: Scheduling MapReduce jobs more efficiently with Corona. https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920

8. Video of Presto talk at Analytics@Webscale conference, June 2013. https://www.facebook.com/photo.php?v=10202463462128185

9. http://www.v3.co.uk/v3-uk/news/2307924/hp-announces-vertica-7-crane-update-for-better-big-data-insights

10. http://www.zdnet.com/vertica-7-to-nosql-dbs-drop-dead-7000023491/

11. http://www.datacenterknowledge.com/archives/2013/11/21/big-data-news-hp-sgi-and-extreme-networks-zettaset/

12. http://www.dbms2.com/2013/12/05/vertica-7/
