When we’re talking about conventional IT systems, we rarely question the idea of geo-distributed systems and redundancy. And we don’t usually challenge the notion that load balancing among servers and farms is a smart thing to do. So why don’t we routinely think this way about Hadoop?
Customers can set up multiple Hadoop clusters and use each one for a different workload. Companies can then site these clusters in different geographies for redundancy, load balancing, and/or content distribution. The data can be segregated or, using replication technology, synchronized between sites to create a "logical data lake." Is utilizing multiple Hadoop clusters in this way folly, or is it just pragmatism?
In this Gigaom Research webinar, the panel will discuss how the multi-cluster approach can be implemented in real systems, and whether and how it can be made to work. The panel will also talk about best practices for implementing the approach in organizations.
What we’ll discuss:
- Does Apache YARN make all tasks equal or does dedicating clusters to specific workloads make more sense?
- Is the data lake concept best for all, or is partitioning data between clusters right for some customers?
- Can Hadoop inter-cluster replication of data work?
- How do public and private cloud architectures impact the multi-cluster question?
- Can multiple clusters be a vector of parallelism and elasticity?
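On the replication question, Hadoop does ship with DistCp, a MapReduce-based tool for bulk copies between clusters, which is one common building block for keeping sites in sync. A minimal sketch, assuming hypothetical NameNode hostnames and paths:

```shell
# Hypothetical cluster endpoints and paths -- adjust for your environment.
# DistCp runs a MapReduce job that copies files in parallel between clusters.
# -update copies only files that differ from the target,
# -delete removes target files no longer present at the source,
# so a scheduled run keeps the second cluster roughly in sync.
hadoop distcp -update -delete \
  hdfs://nn-east.example.com:8020/data/events \
  hdfs://nn-west.example.com:8020/data/events
```

Scheduled DistCp gives batch-style, eventually consistent replication; continuous or transactional synchronization between clusters requires additional tooling.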
Who should watch:
- CIOs and CTOs
- Data scientists
- Data center managers
- DBAs, developers
- IT decision makers
- Cloud platform providers