Facebook's total image workload:
- 260 billion images (~20 petabytes of data)
- every week, ~1 billion new photos (~60 terabytes) are uploaded
Main characteristics of Facebook images:
- read often
- written once
- no modification
- rarely deleted
Traditional file systems are too slow for this workload (too many disk accesses per read), and external CDNs will not keep up with the growing workload in the near future, especially for the long tail of less popular photos. As a solution, Haystack is designed to provide:
- High throughput and low latency:
- keeps metadata in main memory, so at most one disk access per read (see the sketch after this list)
- Fault tolerance
- replicas are in different geographical regions
- Cost effective and simple
- compared to an equivalent NFS-based NAS appliance:
- each usable terabyte costs ~28% less
- ~4x more reads per second
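A minimal sketch of the one-disk-access idea (in Python, with hypothetical file names and fake photo bytes, not Haystack's actual on-disk format): many photos are packed into one large volume file, the per-photo metadata (offset, size) lives in an in-memory index, and serving a read then needs only a single positioned read from disk.

```python
import os
import tempfile

# Build a throwaway "volume" file for the demo: several photos appended into
# one large file. The blobs below stand in for real JPEG bytes.
volume_path = os.path.join(tempfile.mkdtemp(), "volume_07.dat")
photo_index = {}  # photo id -> (offset, size), kept entirely in RAM
with open(volume_path, "wb") as vol:
    for photo_id, blob in [(1001, b"\xff\xd8 fake jpeg A"),
                           (1002, b"\xff\xd8 fake jpeg B")]:
        photo_index[photo_id] = (vol.tell(), len(blob))
        vol.write(blob)

volume_fd = os.open(volume_path, os.O_RDONLY)  # opened once, kept open

def read_photo(photo_id: int) -> bytes:
    """Serve one photo with at most one disk access."""
    offset, size = photo_index[photo_id]      # RAM lookup, no disk I/O
    return os.pread(volume_fd, size, offset)  # the single positioned read

print(read_photo(1002))  # b'\xff\xd8 fake jpeg B'
```

Because the index is in RAM, the read path involves no directory or inode lookups, which is exactly what the NFS-based design below could not avoid.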
Design Prior to Haystack
What was learned from the NFS-based design
- more than 10 disk operations to read an image when a directory holds thousands of files
- if directory sizes are reduced, ~3 disk operations to fetch an image: read the directory metadata, load the inode, and read the file contents
- caching file handles for photos likely to be requested next requires a new kernel function, open_by_file_handle
- focusing only on caching has limited impact on reducing disk operations for the long tail
- CDNs are not effective for the long tail
- Would a GFS-like system be useful?
- the current system lacks the right RAM-to-disk ratio
- use XFS (an extent-based file system)
- reduce metadata size per picture so all metadata can fit into RAM
- store multiple photos per file
- this yields a very good price/performance point, better than buying more NAS appliances
- holding all standard-size filesystem metadata in RAM would be far too expensive
- design your own CDN (Haystack Cache)
- uses a distributed hash table
- if the requested photo cannot be found in the cache, it is fetched from the Haystack Store (see the sketch after this list)
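A minimal sketch of the Haystack Cache idea, assuming Python and a hypothetical fetch_from_store callback (for example, a function like read_photo from the earlier sketch): the photo id is hashed to pick one of N cache nodes, and a miss falls through to the Store. The real Cache's node organization, eviction policy, and its rules for deciding what to cache are not reproduced here.

```python
from typing import Callable, Dict, List

class PhotoCache:
    """Toy photo cache: hash the photo id to a node, fetch from the Store on a miss."""

    def __init__(self, num_nodes: int, fetch_from_store: Callable[[int], bytes]):
        self.nodes: List[Dict[int, bytes]] = [{} for _ in range(num_nodes)]
        self.fetch_from_store = fetch_from_store

    def _node_for(self, photo_id: int) -> Dict[int, bytes]:
        return self.nodes[hash(photo_id) % len(self.nodes)]  # stand-in for the DHT

    def get(self, photo_id: int) -> bytes:
        node = self._node_for(photo_id)
        if photo_id not in node:                              # cache miss
            node[photo_id] = self.fetch_from_store(photo_id)  # go to the Store
        return node[photo_id]

# Usage with a dict standing in for the Store:
fake_store = {1001: b"\xff\xd8 fake jpeg A", 1002: b"\xff\xd8 fake jpeg B"}
cache = PhotoCache(num_nodes=4, fetch_from_store=fake_store.__getitem__)
print(cache.get(1001))  # the first call goes to the "store", later calls are served from memory
```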
DESIGN DETAILS
needs to be updated ..
D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel. Finding a needle in Haystack: Facebook’s photo storage. In OSDI ’10