This post was last edited by trybestying on 2012-6-26 14:40
Chapter 2
Workloads and Software Infrastructure

The applications that run on warehouse-scale computers (WSCs) dominate many system design trade-off decisions. This chapter outlines some of the distinguishing characteristics of software that runs in large Internet services and the system software and tools needed for a complete computing platform. Here is some terminology that defines the different software layers in a typical WSC deployment:

• Platform-level software: the common firmware, kernel, operating system distribution, and libraries expected to be present in all individual servers to abstract the hardware of a single machine and provide basic server-level services.

• Cluster-level infrastructure: the collection of distributed systems software that manages resources and provides services at the cluster level; ultimately, we consider these services as an operating system for a datacenter. Examples are distributed file systems, schedulers, remote procedure call (RPC) layers, as well as programming models that simplify the usage of resources at the scale of datacenters, such as MapReduce [19], Dryad [47], Hadoop [42], Sawzall [64], BigTable [13], Dynamo [20], and Chubby [7].

• Application-level software: software that implements a specific service. It is often useful to further divide application-level software into online services and offline computations because those tend to have different requirements. Examples of online services are Google search, Gmail, and Google Maps. Offline computations are typically used in large-scale data analysis or as part of the pipeline that generates the data used in online services; for example, building an index of the Web or processing satellite images to create map tiles for the online service.

2.1 DATACENTER VS. DESKTOP
Software development in Internet services differs from the traditional desktop/server model in many ways:

• Ample parallelism: Typical Internet services exhibit a large amount of parallelism stemming from both data- and request-level parallelism. Usually, the problem is not to find parallelism but to manage and efficiently harness the explicit parallelism that is inherent in the application. Data parallelism arises from the large data sets of relatively independent records that need processing, such as collections of billions of Web pages or billions of log lines. These very large data sets often require significant computation for each parallel (sub)task, which in turn helps hide or tolerate communication and synchronization overheads. Similarly, request-level parallelism stems from the hundreds or thousands of requests per second that popular Internet services receive. These requests rarely involve read-write sharing of data or synchronization across requests. For example, search requests are essentially independent and deal with a mostly read-only database; therefore, the computation can be easily partitioned both within a request and across different requests. Similarly, whereas Web email transactions do modify user data, requests from different users are essentially independent from each other, creating natural units of data partitioning and concurrency.

• Workload churn: Users of Internet services are isolated from the service's implementation details by relatively well-defined and stable high-level APIs (e.g., simple URLs), making it much easier to deploy new software quickly. Key pieces of Google's services have release cycles on the order of a couple of weeks, compared to months or years for desktop software products. Google's front-end Web server binaries, for example, are released on a weekly cycle, with nearly a thousand independent code changes checked in by hundreds of developers. The core of Google's search services has been reimplemented nearly from scratch every 2 to 3 years. This environment creates significant incentives for rapid product innovation but makes it hard for a system designer to extract useful benchmarks even from established applications. Moreover, because Internet services are still a relatively new field, new products and services frequently emerge, and their success with users directly affects the resulting workload mix in the datacenter. For example, video services such as YouTube have flourished in relatively short periods and may present a very different set of requirements from the existing large customers of computing cycles in the datacenter, potentially affecting the optimal design point of WSCs in unexpected ways. A beneficial side effect of this aggressive software deployment environment is that hardware architects are not necessarily burdened with having to provide good performance for immutable pieces of code. Instead, architects can consider the possibility of significant software rewrites to take advantage of new hardware capabilities or devices.

• Platform homogeneity: The datacenter is generally a more homogeneous environment than the desktop as a target platform for software development. Large Internet services operations typically deploy a small number of hardware and system software configurations at any given time. Significant heterogeneity arises primarily from the incentives to deploy more cost-efficient components that become available over time. Homogeneity within a platform generation simplifies cluster-level scheduling and load balancing and reduces the maintenance burden for platform software (kernels, drivers, etc.). Similarly, homogeneity can allow more efficient supply chains and more efficient repair processes because automatic and manual repairs benefit from having more experience with fewer types of systems. In contrast, software for desktop systems can make few assumptions about the hardware or software platform it is deployed on, and its complexity and performance characteristics may suffer from the need to support thousands or even millions of hardware and system software configurations.

• Fault-free operation: Because Internet service applications run on clusters of thousands of machines, each of them not dramatically more reliable than PC-class hardware, the multiplicative effect of individual failure rates means that some type of fault is expected every few hours or less (more details are provided in Chapter 6). As a result, although it may be reasonable for desktop-class software to assume fault-free hardware operation for months or years, this is not true for datacenter-level services: Internet services need to work in an environment where faults are part of daily life. Ideally, the cluster-level system software should provide a layer that hides most of that complexity from application-level software, although that goal may be difficult to accomplish for all types of applications.

Although the plentiful thread-level parallelism and a more homogeneous computing platform help reduce software development complexity in Internet services compared to desktop systems, the scale, the need to operate under hardware failures, and the speed of workload churn have the opposite effect.
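The claim in the fault-free operation item, that individual failure rates multiply into a fault "every few hours or less", can be checked with a rough calculation. The sketch below is only an illustration: the 3-year per-machine mean time between failures, the cluster sizes, and the function name `cluster_hours_between_faults` are assumptions chosen for the example, not figures from the chapter. It also assumes machines fail independently at a constant rate.

```python
# Back-of-the-envelope sketch: how often a cluster sees *some* machine fault.
# All inputs are illustrative assumptions, not figures from the chapter.

HOURS_PER_YEAR = 24 * 365


def cluster_hours_between_faults(machine_mtbf_years: float, machines: int) -> float:
    """Expected hours between faults anywhere in the cluster.

    Assumes machines fail independently at a constant rate, so the
    per-machine failure rates add up across the cluster.
    """
    machine_rate_per_hour = 1.0 / (machine_mtbf_years * HOURS_PER_YEAR)
    cluster_rate_per_hour = machine_rate_per_hour * machines
    return 1.0 / cluster_rate_per_hour


if __name__ == "__main__":
    # Hypothetical cluster sizes; 3.0 years is an assumed per-machine MTBF.
    for machines in (1, 1_000, 10_000):
        hours = cluster_hours_between_faults(machine_mtbf_years=3.0, machines=machines)
        print(f"{machines:>6} machines: one fault every {hours:,.1f} hours on average")
```

Under these assumed inputs, a single machine goes about three years between faults, while a 10,000-machine cluster sees one roughly every 2.6 hours, which is the order of magnitude the text describes.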