Facebook's total image workload:
- 260 billion images (~20 petabytes of data)
- every week, ~1 billion new photos (~60 terabytes) are uploaded
Main characteristics of Facebook images:
- read often
- written once
- no modification
- rarely deleted
Traditional file systems are too slow for this workload (too many disk accesses per read), and external CDNs will not keep up with the growing workload in the near future, especially for the long tail of less popular photos. As a solution, Haystack is designed to provide:
- High throughput and low latency:
- keeps metadata in main memory, so at most one disk access per read (see the sketch after this list)
- Fault tolerance
- replicas are in different geographical regions
- Cost effective and simple
- compared to an equivalent NFS-based NAS appliance:
- each usable terabyte costs ~28% less
- ~4x more reads per second
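A minimal sketch of the one-disk-access idea (in Python, with hypothetical file names and fake photo bytes, not Haystack's actual on-disk format): many photos are packed into one large volume file, the per-photo metadata (offset, size) lives in an in-memory index, and serving a read then needs only a single positioned read from disk.

```python
import os
import tempfile

# Build a throwaway "volume" file for the demo: several photos appended into
# one large file. The blobs below stand in for real JPEG bytes.
volume_path = os.path.join(tempfile.mkdtemp(), "volume_07.dat")
photo_index = {}  # photo id -> (offset, size), kept entirely in RAM
with open(volume_path, "wb") as vol:
    for photo_id, blob in [(1001, b"\xff\xd8 fake jpeg A"),
                           (1002, b"\xff\xd8 fake jpeg B")]:
        photo_index[photo_id] = (vol.tell(), len(blob))
        vol.write(blob)

volume_fd = os.open(volume_path, os.O_RDONLY)  # opened once, kept open

def read_photo(photo_id: int) -> bytes:
    """Serve one photo with at most one disk access."""
    offset, size = photo_index[photo_id]      # RAM lookup, no disk I/O
    return os.pread(volume_fd, size, offset)  # the single positioned read

print(read_photo(1002))  # b'\xff\xd8 fake jpeg B'
```

Because the index is in RAM, the read path involves no directory or inode lookups, which is exactly what the NFS-based design below could not avoid.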
Design Prior to Haystack
What was learned from the NFS-based design
- more than 10 disk operations to read an image when a directory holds thousands of files
- if directory sizes are reduced, ~3 disk operations to fetch an image: read the directory metadata, load the inode, and read the file contents
- caching file handles for photos likely to be requested next requires a new kernel function, open_by_file_handle
- focusing only on caching has limited impact on reducing disk operations for the long tail
- CDNs are not effective for the long tail
- Would a GFS-like system be useful?
- the current system lacks the right RAM-to-disk ratio
- use XFS (an extent-based file system)
- reduce metadata size per picture so all metadata can fit into RAM
- store multiple photos per file
- this yields a very good price/performance point, better than buying more NAS appliances
- holding all standard-size filesystem metadata in RAM would be far too expensive
- design your own CDN (Haystack Cache)
- uses a distributed hash table
- if the requested photo cannot be found in the cache, it is fetched from the Haystack Store (see the sketch after this list)
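A minimal sketch of the Haystack Cache idea, assuming Python and a hypothetical fetch_from_store callback (for example, a function like read_photo from the earlier sketch): the photo id is hashed to pick one of N cache nodes, and a miss falls through to the Store. The real Cache's node organization, eviction policy, and its rules for deciding what to cache are not reproduced here.

```python
from typing import Callable, Dict, List

class PhotoCache:
    """Toy photo cache: hash the photo id to a node, fetch from the Store on a miss."""

    def __init__(self, num_nodes: int, fetch_from_store: Callable[[int], bytes]):
        self.nodes: List[Dict[int, bytes]] = [{} for _ in range(num_nodes)]
        self.fetch_from_store = fetch_from_store

    def _node_for(self, photo_id: int) -> Dict[int, bytes]:
        return self.nodes[hash(photo_id) % len(self.nodes)]  # stand-in for the DHT

    def get(self, photo_id: int) -> bytes:
        node = self._node_for(photo_id)
        if photo_id not in node:                              # cache miss
            node[photo_id] = self.fetch_from_store(photo_id)  # go to the Store
        return node[photo_id]

# Usage with a dict standing in for the Store:
fake_store = {1001: b"\xff\xd8 fake jpeg A", 1002: b"\xff\xd8 fake jpeg B"}
cache = PhotoCache(num_nodes=4, fetch_from_store=fake_store.__getitem__)
print(cache.get(1001))  # the first call goes to the "store", later calls are served from memory
```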
DESIGN DETAILS
needs to be updated ..
D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel. Finding a needle in Haystack: Facebook’s photo storage. In OSDI ’10