10:45 AM Sunday Room: AD-211
Over the past few years, data has gone from the back office to center stage. Businesses realize that data, and the business intelligence gained from it, are among the most valuable assets they possess. Consequently, data has migrated from vertical big-iron systems to distributed, horizontally scalable systems with the potential for massively parallel processing.
And that is not all. Data systems, once used for MapReduce and long-running batch processing, are now processing data in near real time. The innovations in this area are foundational and seismic. Newer data systems will provide personalization, recommendations, and business insights in near real time.
However, building, maintaining, and managing these systems is non-trivial. Dealing with distributed and partitioned data requires a paradigm shift: How do we provide a consistent view of the distributed data? How do we maintain 24/7 availability? And how do we handle unexpected failures (and failures are always unexpected)?
This talk covers some of the principles and practices behind distributed data systems, from messaging systems like Kafka to in-memory file systems and distributed caches. It also suggests best-of-breed practices to apply when building your own distributed system.