Tuesday, February 26, 2013

ApacheCon LiveBlog: Object Storage with CloudStack & Hadoop


This is a live blog from ApacheCon, which I'm attending this week.  This session is with Chiradeep Vittal.

Usual Live Blog Disclaimer: This is more of a brain dump typed as fast as I can, so please excuse typos, formatting, and the lack of a coherent thought process in general.


  • How does Amazon build a cloud:
    • Commodity Hardware -> open-source Xen hypervisor -> AWS Orchestration Software -> AWS API -> Amazon eCommerce Platform
    • How would YOU build the same cloud on CloudStack? In much the same way: Hardware -> Hypervisor -> CloudStack -> API -> Customer Solution (a sketch of a signed API call follows below)
  • CloudStack is built on the concept of a Zone (much like an AWS Availability Zone)
    • Under the Zone are logical units called Pods (think of a Pod as a rack)
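(Not from the talk - my own sketch to make the "API" layer concrete. The hostname, port, and keys are placeholders; the sort-lowercase-HMAC-SHA1 signing scheme is CloudStack's documented one.)

```java
import java.net.URLEncoder;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import javax.xml.bind.DatatypeConverter;

public class ListZones {
    public static void main(String[] args) throws Exception {
        String apiKey = "YOUR-API-KEY";      // placeholder credentials
        String secretKey = "YOUR-SECRET-KEY";

        // CloudStack signs the query string: sort params, lowercase, HMAC-SHA1, Base64.
        String sortedQuery = "apikey=" + apiKey.toLowerCase()
                + "&command=listzones&response=json";
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes("UTF-8"), "HmacSHA1"));
        String signature = DatatypeConverter.printBase64Binary(
                mac.doFinal(sortedQuery.getBytes("UTF-8")));

        // Hypothetical management-server endpoint.
        String url = "http://cloudstack.example.com:8080/client/api"
                + "?command=listZones&response=json&apiKey=" + apiKey
                + "&signature=" + URLEncoder.encode(signature, "UTF-8");
        System.out.println(url); // GET this URL to enumerate the Zones
    }
}
```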
  • Secondary Storage is used for templates, snapshots, etc. (items that are stored once, change rarely, and need to be shared across Pods)
  • Cloud Style Workloads = low cost, standardized hardware, highly automated & efficient (it's the Pets vs. Cattle analogy)
  • At scale, everything breaks eventually
  • Regions and Zones - e.g., Region "West"; the hope is that one Region stays up when another Region goes down. Replication from one Region to another Region is the norm
  • Secondary Storage in CloudStack 4.0 today
    • NFS is the default - it can be mounted by any CloudStack hypervisor and is easy to set up
    • BUT - it doesn't scale well and is "chatty"; you may need WAN optimization. What if 1,000 hypervisors talk to one NFS share?
    • At large scale, NFS shows some issues
    • One solution is to use object storage for secondary storage
  • Object Storage typically has redundancy, replication, and auditing built into the technology
  • In addition, this technology enables other applications: put an API server in front of the object store and you now have "Dropbox", etc. - typically static content and archival kinds of applications
  • Object storage offers 99.9% availability and 99.999999999% (eleven 9s) durability, according to Amazon, at massive scale (1.3 trillion objects in S3 today, serving 800K requests per second)
  • Objects cannot be modified, only deleted - objects are immutable
  • Simple API with a flat namespace - think KISS principle
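(My own sketch of what "flat namespace, immutable objects" looks like through the S3 API, using the AWS SDK for Java - the bucket and key names are made up.)

```java
import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;

public class S3Basics {
    public static void main(String[] args) {
        AmazonS3 s3 = new AmazonS3Client(); // credentials come from the default chain

        // Flat namespace: just bucket + key. "backups/" is part of the key,
        // not a real directory.
        s3.putObject("example-bucket", "backups/template-1.img",
                new File("template-1.img"));

        // Objects are immutable: no in-place edit, only overwrite or delete.
        s3.deleteObject("example-bucket", "backups/template-1.img");
    }
}
```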
  • CloudStack S3 API Server - understands the Amazon S3 API and has a pluggable backend; the default backend is a POSIX filesystem (not very useful in production). Caringo was mentioned as a replacement, and HDFS can serve as the backend too (see the endpoint sketch below)
  • Question - Does CloudStack handle all the ACLs? / Answer: Yes
  • Follow-up - Does that mean the SQL server is a possible constraint? / Answer: Yes
  • Integrations are available with Riak CS and OpenStack Swift
  • Upcoming in CloudStack 4.2 - Framework to expand this much more
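(Again my own sketch, not from the talk: because the CloudStack S3 API server speaks the Amazon S3 protocol, an off-the-shelf S3 client should be able to target it just by overriding the endpoint. The endpoint URL and keys here are hypothetical.)

```java
import java.io.File;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;

public class CloudStackS3 {
    public static void main(String[] args) {
        AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("api-key", "secret-key")); // placeholders

        // Point the client at the CloudStack S3 API server instead of AWS.
        s3.setEndpoint("http://cloudstack.example.com:8080/awsapi");

        s3.createBucket("templates");
        s3.putObject("templates", "centos-6.3.qcow2", new File("centos-6.3.qcow2"));
    }
}
```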
  • Given all of this, what could we build? (Topic switch)
  • Wanted: open source, scales to 1 billion objects, reliability & durability on par with S3, and an S3 API
  • This is now a theoretical design (hasn't been tested)
  • (See picture for architecture)

  • Hadoop meets all of these requirements and is proven to work (200 million objects in 1 cluster, 100PB in 1 cluster); need to scale? Just add a node - very easy
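(A sketch of the idea, assuming an S3-style bucket/key maps onto an HDFS path written through the standard FileSystem API - the NameNode address and paths here are invented.)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsObjectPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode ("fs.default.name" on older Hadoop releases).
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        FileSystem fs = FileSystem.get(conf);

        // One object = one immutable HDFS file: /buckets/<bucket>/<key>
        Path obj = new Path("/buckets/example-bucket/backups/template-1.img");
        FSDataOutputStream out = fs.create(obj);
        try {
            out.write("object bytes".getBytes("UTF-8"));
        } finally {
            out.close();
        }
        System.out.println("stored " + fs.getFileStatus(obj).getLen() + " bytes");
    }
}
```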
  • BUT there are issues:
    • Name Node scalability - at 100's of millions of blocks, you can run into GC issues
    • The Name Node is a SPOF (Single Point of Failure) - this is being worked on currently
    • Cross-Zone replication - Hadoop has rack awareness, but what if the racks are much further apart? This isn't really tested today
    • Where do you store metadata (ACLs, for instance)?
  • Take a 1 billion object example (with a bunch of assumptions): it needs about 450GB of memory on the name node, and at 16TB / node that's about 1,000 data nodes
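(I didn't capture the speaker's exact assumptions, but here's one way the numbers could roughly pencil out - my assumptions: ~150 bytes of NameNode heap per file and per block, ~2 blocks per object, ~5MB average object, 3x replication.)

```java
public class SizingSketch {
    public static void main(String[] args) {
        long objects = 1_000_000_000L;              // 1 billion objects, one file each
        long blocks = 2L * objects;                 // assume ~2 HDFS blocks per file
        long heapBytes = (objects + blocks) * 150L; // ~150 B of heap per namespace item
        System.out.println("NameNode heap ~ "
                + heapBytes / 1_000_000_000L + " GB");            // ~450 GB

        long avgObjectBytes = 5_000_000L;           // assume ~5 MB average object
        long rawBytes = objects * avgObjectBytes * 3L;            // 3x replication
        long perNodeBytes = 16_000_000_000_000L;    // 16 TB of disk per data node
        System.out.println("data nodes ~ " + rawBytes / perNodeBytes); // ~940, call it 1,000
    }
}
```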
  • Name Node management is federated (sorry, this is vague - we're getting beyond my knowledge of Hadoop architecture at this point). Name Node HA really hasn't been tested to date
  • Namespace shards - how do you shard them? Do you need a DB just to store the shard map? What about rebalancing between name nodes?
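(For a sense of what sharding could look like, an entirely hypothetical naive scheme: hash each bucket/key onto one of the federated NameNodes. It immediately runs into the rebalancing question above, since adding a NameNode changes the modulus and moves most keys.)

```java
public class NamespaceShard {
    // Naive placement: hash the object key onto one of N federated NameNodes.
    static int shardFor(String bucketAndKey, int nameNodes) {
        return Math.abs(bucketAndKey.hashCode() % nameNodes);
    }

    public static void main(String[] args) {
        int nameNodes = 4; // hypothetical federation size
        System.out.println(shardFor("example-bucket/backups/template-1.img", nameNodes));
        // A 5th NameNode would reshuffle almost every key - hence the appeal of
        // a separate DB (or consistent hashing) to hold the shard map.
    }
}
```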
  • Replication over lossy/slower links (solution really breaks down here today)
    • Async replication - how do you handle master/slave relationships?
    • Sync - not very feasible if you lose a zone (writes are never acknowledged, so they cannot complete)
  • Where do you store Metadata?
    • Store it in HDFS along with the object - but reads become expensive and metadata is mutable (it needs to be edited), so this needs a layer on top of HDFS
    • Use another storage system (like HBase, sketched after this list) - required for Name Node federation anyway, but it's ANOTHER system to manage
    • Modify the Name Node to store the metadata
      • would be high performance (but this doesn't exist today)
      • not extensible and not easy to just "plug in"
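(To illustrate the HBase option - my sketch, not the speaker's: mutable per-object metadata such as ACLs and owner lives in a hypothetical "objects" table keyed by bucket/key, while the object bytes stay immutable in HDFS. Uses the 0.94-era client API.)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ObjectMetadata {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "objects"); // hypothetical table

        // Row key = bucket/key; the "meta" column family holds mutable attributes.
        Put put = new Put(Bytes.toBytes("example-bucket/backups/template-1.img"));
        put.add(Bytes.toBytes("meta"), Bytes.toBytes("acl"), Bytes.toBytes("private"));
        put.add(Bytes.toBytes("meta"), Bytes.toBytes("owner"), Bytes.toBytes("alice"));
        table.put(put);
        table.close();
    }
}
```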
  • What can you do with Object Store in HDFS today?
    • Viable for small-size deployments - up to 100-200 million objects (Facebook does this) with datacenters close together
    • Larger deployments need development work, and there is really no effort around this today
