Wednesday, February 27, 2013

ApacheCon LiveBlog: Powering CloudStack w/ Ceph RBD


This is a live blog from ApacheCon that I'm attending this week.  This session is with Patrick McGarry.

Usual Live Blog Disclaimer: This is more of a brain dump typing as fast as I can, please excuse typos, format, and coherent thought process in general.

(No title slide picture this time - missed it)

  • What is Ceph - storage that does object, block, and file all in one; block is thin-provisioned with snapshots and cloning; object has a REST API
  • RADOS (Google it) is the object store at the lowest level
  • Why objects at the lowest level? More useful than blocks, a single namespace, scales better, simple API, and the workload parallelizes easily
  • Because of this you define pools (from one to hundreds) - independent namespaces and object collections
  • (Topic change) - Architecture
  • Aggregate a bunch of different machines so that you have a "large enough" front end to handle a large number of incoming requests
  • In this "pile" you will have monitors. Monitors provide consensus for decisions, always run in an odd number, and do not store data - they act as traffic controllers for the storage nodes (OSD nodes)
  • On an OSD node -> physical disk -> file system -> OSD layer
  • CRUSH - pseudo-random data placement algorithm, Ceph's "secret sauce"; it gives a stable mapping and uniform distribution, with additional rule configuration (you can apply weights and topology rules) - see the toy placement sketch after this list
  • How does it work? Take an object, talk to the monitors, and CRUSH breaks it up and places it around according to the rules
  • What happens when something breaks? If an OSD node is lost, the nodes holding copies of its data replicate those objects somewhere else according to the CRUSH rules and move on
  • How to talk to it? LIBRADOS - the library for RADOS, with support for C, C++, Java, Python, Ruby, and PHP (a minimal Python sketch follows this list)
  • Also RADOSGW - a REST gateway compatible with S3 & Swift (see the boto sketch after this list)
  • CEPH FS - a POSIX-compliant distributed file system with a Linux kernel client
  • RBD - reliable and fully-distributed block device sitting on top of the object store
  • RADOS Block Device (RBD) - stores disk images in RADOS and decouples the VM from the host; images are striped across the pool, with snapshots and copy-on-write clones
  • What does this look like? VMs are now split across the cluster - great for large-capacity as well as high-I/O VM instances
  • same model as Amazon EBS
  • it is a shared environment, so you can migrate running instances across the cluster
  • Copy-On-Write Cloning (he gets lots of questions on this) - think of a golden-image master VM that you want 100 copies of: you spin up the 100 instantly, and they take up additional storage only as needed as the VMs grow (see the clone sketch after this list)
  • Question: Is there a performance impact to this? A: No, but as usual it depends on the architecture (how many devices are hitting it)
  • CloudStack 4.0 and RBD? Via KVM - no Xen or VMware support today
  • Live migrations are supported
  • No snapshots yet
  • NFS is still required for system VMs
  • Can be added easily as RBD Primary storage in CloudStack
  • Snapshot and backup support should be coming in version 4.2; cloning is coming, and secondary (backup) storage support is also coming in 4.2
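
A toy sketch of the CRUSH idea referenced above: stable, computable placement that any client can derive without a lookup table. To be clear, this is rendezvous-style hashing, not the actual CRUSH algorithm (real CRUSH also honors weights and topology rules), and the OSD names are made up:

    import hashlib

    def place(object_name, osds, replicas=2):
        # Rank every OSD by a hash of (object, osd); the top N hold the
        # replicas. The mapping is stable: the same inputs always give
        # the same placement, and losing an OSD only remaps its objects.
        ranked = sorted(
            osds,
            key=lambda osd: hashlib.md5(
                ('%s:%s' % (object_name, osd)).encode()
            ).hexdigest(),
            reverse=True,
        )
        return ranked[:replicas]

    print(place('disk-image-chunk-42', ['osd.0', 'osd.1', 'osd.2', 'osd.3']))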
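
The librados sketch promised above, in Python via the python-rados binding. The config path and the pool name 'data' are my own assumptions for illustration, not from the talk:

    import rados

    # Connect using the standard config file and keyring.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # A pool is an independent namespace of objects.
    ioctx = cluster.open_ioctx('data')
    ioctx.write_full('greeting', b'hello from librados')
    print(ioctx.read('greeting'))

    ioctx.close()
    cluster.shutdown()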
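
Since RADOSGW is S3-compatible, a stock S3 client can talk to it unmodified. A sketch using the boto library; the endpoint and credentials are placeholders:

    import boto
    import boto.s3.connection

    # Point an ordinary S3 client at the gateway instead of AWS.
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',        # placeholder
        aws_secret_access_key='SECRET_KEY',    # placeholder
        host='radosgw.example.com',            # hypothetical gateway host
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    bucket = conn.create_bucket('demo-bucket')
    key = bucket.new_key('hello.txt')
    key.set_contents_from_string('stored via the S3-compatible gateway')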
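
And the golden-image cloning workflow, sketched with the python-rbd binding. The pool, image names, and size are made up; cloning needs a format-2 image and a protected snapshot:

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')

    r = rbd.RBD()
    # A 10 GiB "golden" master image; format 2 is required for cloning.
    r.create(ioctx, 'golden', 10 * 1024 ** 3, old_format=False,
             features=rbd.RBD_FEATURE_LAYERING)

    # Clones hang off a protected snapshot of the master.
    master = rbd.Image(ioctx, 'golden')
    master.create_snap('base')
    master.protect_snap('base')
    master.close()

    # 100 copy-on-write clones, created near-instantly; each consumes
    # extra space only as its VM writes.
    for i in range(100):
        r.clone(ioctx, 'golden', 'base', ioctx, 'vm-%03d' % i,
                features=rbd.RBD_FEATURE_LAYERING)

    ioctx.close()
    cluster.shutdown()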


