
Apache Hadoop: Is one cluster enough?

Free Offer Published By: WANdisco
Published:  Oct 15, 2014
Type:  Free Offer

When we’re talking about conventional IT systems, we rarely question the idea of geo-distributed systems and redundancy. And we don’t usually challenge the notion that load balancing among servers and farms is a smart thing to do. So why don’t we routinely think this way about Hadoop?

Customers can set up multiple Hadoop clusters and use each one for a different workload. Companies can then site these clusters in different geographies for redundancy, load balancing, content distribution, or some combination of these. The data can be segregated or, using replication technology, synchronized between sites to create a “logical data lake.” Is using multiple Hadoop clusters in this way folly, or is it just pragmatism?

In this Gigaom Research webinar, the panel will discuss whether the multi-cluster approach can be made to work and how it can be implemented in real systems. The panel will also cover best practices for adopting the approach in organizations.

What we’ll discuss:

  • Does Apache YARN make all tasks equal, or does dedicating clusters to specific workloads make more sense?
  • Is the data lake concept best for all, or is partitioning data between clusters right for some customers?
  • Can Hadoop inter-cluster replication of data work?
  • How do public and private cloud architectures impact the multi-cluster question?
  • Can multiple clusters be a vector of parallelism and elasticity?

Who should watch

  • CIOs and CTOs
  • Data scientists
  • Data center managers
  • DBAs, developers
  • IT decision makers
  • Cloud platform providers


