Thoth: Towards Managing a Multi-System Cluster



Following the 'no one size fits all' philosophy, active research in big data platforms focuses on creating an environment in which multiple 'one-size' systems co-exist and co-operate in the same cluster. Consequently, it has become imperative to provide an integrated management solution that offers a database-centric view of the underlying multi-system environment. Our prototype implementation of DBMS+, called Thoth, adaptively chooses a best-fit system based on application requirements.

A management layer on top, with a database-centric view of the systems, can use a learning-based approach to recommend the right system for an application. This is a function of database management systems that is hard to provide with a purely OS-centric approach. Thinking in terms of database-as-a-service is more convenient for system administrators and application developers, and it enables better resource provisioning and data layout. It also relieves them of significant effort in tuning each system separately, allocating resources, and, most importantly, choosing the system most suitable for an application. In this work, we introduce Thoth, a prototype of DBMS+. Thoth provides a platform for developing various multi-system manageability applications, ranging from a dashboard that monitors running applications to a multi-system optimizer.

A demo of the platform is available at

We have used the Thoth platform for memory management in data analytics clusters. The following two major projects provide the details:

ROBUS

Systems for processing big data (e.g., Hadoop, Spark, and massively parallel databases) need to run workloads on behalf of multiple tenants simultaneously. The abundant disk-based storage in these systems is usually complemented by a smaller, but much faster, cache. Cache is a precious resource: tenants who get to use cache can see a two-orders-of-magnitude performance improvement. Cache is also a limited and hence shared resource: unlike a resource such as a CPU core, which can be used by only one tenant at a time, a cached data item can be accessed by multiple tenants at the same time. Cache therefore has to be shared across tenants by a multi-tenancy-aware policy, with each tenant having its own priorities and workload characteristics.

In this project, we develop cache allocation strategies that speed up the overall workload while being fair to each tenant. We build a novel fairness model targeted at the shared-resource setting that not only incorporates the more standard concepts of Pareto-efficiency and sharing incentive, but also defines envy-freeness via the notion of the core from cooperative game theory. Our cache management platform, ROBUS, uses randomization over small time batches, and we develop a proportionally fair allocation mechanism that satisfies the core property in expectation. We show that this algorithm and related fair algorithms can be approximated to arbitrary precision in polynomial time. We evaluate these algorithms on a ROBUS prototype implemented on Spark, with the RDD store used as cache. Our evaluation on a synthetically generated industry-standard workload shows that our algorithms provide a speedup close to that of performance-optimal algorithms while guaranteeing fairness across tenants.
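To make the idea of randomizing over cache configurations concrete, here is a minimal Python sketch. It is not the ROBUS algorithm; it assumes a small, explicitly enumerated set of cache configurations and a hypothetical utility matrix, and it uses a standard multiplicative fixed-point update to find the distribution that maximizes the sum of the logarithms of the tenants' expected utilities (the proportional fairness objective). The function names and inputs are illustrative assumptions.

```python
import random


def proportional_fair_distribution(utilities, iterations=500):
    """Return a probability distribution over cache configurations.

    utilities[i][s] is the (hypothetical) utility tenant i obtains when
    configuration s is cached. The result approximately maximizes
    sum_i log(expected utility of tenant i).
    """
    n_tenants = len(utilities)
    n_configs = len(utilities[0])
    # Start from the uniform distribution so every configuration has mass.
    x = [1.0 / n_configs] * n_configs
    for _ in range(iterations):
        # Expected utility of each tenant under the current distribution.
        expected = [sum(x[s] * u[s] for s in range(n_configs)) for u in utilities]
        if any(e <= 0.0 for e in expected):
            raise ValueError("every tenant needs positive utility in some configuration")
        # Multiplicative update: scale each configuration by the average of
        # the tenants' utility ratios. This keeps x on the probability simplex.
        x = [
            x[s] * sum(utilities[i][s] / expected[i] for i in range(n_tenants)) / n_tenants
            for s in range(n_configs)
        ]
    return x


def sample_configuration(distribution, rng):
    """Pick the configuration to cache for the next time batch."""
    return rng.choices(range(len(distribution)), weights=distribution, k=1)[0]


# Two tenants, three configurations: two private datasets and one shared one.
utilities = [
    [1.0, 0.0, 0.6],  # tenant 0
    [0.0, 1.0, 0.6],  # tenant 1
]
distribution = proportional_fair_distribution(utilities)
chosen = sample_configuration(distribution, random.Random(0))
```

In this toy instance the shared configuration attracts nearly all of the probability mass, because giving both tenants 0.6 yields a higher log-utility sum than alternating between the two private datasets. Sampling a configuration per batch is what makes fairness hold in expectation rather than in every batch.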

Memory Tuner

Allocation and usage of memory in modern data-processing platforms is based on an interplay of algorithms at multiple levels: (i) at the resource-management level across containers allocated by resource managers like Mesos and YARN, (ii) at the container level among the OS and processes such as the Java Virtual Machine (JVM), (iii) at the framework level for caching, aggregation, data shuffles, and application data structures, and (iv) at the JVM level across various pools such as the Young and Old Generation as well as the heap versus off-heap. We use the Thoth platform to build a deep understanding of the interplay among these memory management options. Through multiple memory management apps built in Thoth, we demonstrate how Thoth can deal with multiple levels of memory management.
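The arithmetic behind these levels can be illustrated with a short Python sketch. It is not part of Thoth; it assumes Spark-like defaults (an off-heap overhead of 10% of the heap with a 384 MB floor, 300 MB of reserved heap, a unified memory fraction of 0.6, and a storage fraction of 0.5) and a JVM NewRatio of 2. All names and parameter values are assumptions chosen for illustration and should be checked against the versions actually deployed.

```python
from dataclasses import dataclass


@dataclass
class MemoryPlan:
    container_mb: float   # (i)/(ii) what the resource manager must grant
    heap_mb: float        # (ii) JVM heap inside the container
    overhead_mb: float    # (ii) off-heap overhead next to the heap
    young_gen_mb: float   # (iv) Young Generation pool
    old_gen_mb: float     # (iv) Old Generation pool
    unified_mb: float     # (iii) framework memory for execution plus caching
    storage_mb: float     # (iii) part of unified memory protected for cache


def plan_memory(heap_mb, overhead_fraction=0.10, min_overhead_mb=384.0,
                new_ratio=2, reserved_mb=300.0,
                memory_fraction=0.6, storage_fraction=0.5):
    """Derive the per-level memory sizes from one heap size (assumed defaults)."""
    overhead = max(min_overhead_mb, overhead_fraction * heap_mb)
    # NewRatio = Old / Young, so Young gets 1 / (new_ratio + 1) of the heap.
    young = heap_mb / (new_ratio + 1)
    old = heap_mb - young
    unified = max(0.0, (heap_mb - reserved_mb) * memory_fraction)
    storage = unified * storage_fraction
    return MemoryPlan(
        container_mb=heap_mb + overhead,
        heap_mb=heap_mb,
        overhead_mb=overhead,
        young_gen_mb=young,
        old_gen_mb=old,
        unified_mb=unified,
        storage_mb=storage,
    )


plan = plan_memory(4096)
# Long-lived cached data tends to end up in the Old Generation, so one
# cross-level check is whether the cache region fits there.
cache_fits_in_old_gen = plan.storage_mb <= plan.old_gen_mb
```

Changing one knob shifts the others: raising `memory_fraction` enlarges the framework's cache and execution region without growing the Old Generation that has to hold it, which is the kind of cross-level interaction the text describes.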




Publications and Talks: