Linux Clustering

What is Linux Clustering

Linux clustering is the ability to make two or more Linux systems work together in a coordinated way. The goal might be to ensure that a single system failure won't knock an important service offline, or to build a system that can handle problems bigger than any individual computer's physical resources.

There are many free and commercial software options for building and managing these systems. This page explains the different types of clustering and offers pointers to resources that will help you set up your own cluster.

How I started with Linux Clustering 

I was introduced to Linux by a good friend in college. We spent many a night breaking the system and then fixing it, or sometimes having to reinstall to figure out what we did.

At my first job out of college, I was setting up a print server and wondered how I could make it so that everyone could still print if the computer broke. That led me to what Red Hat then called its Piranha package (today that functionality has been rolled into the Red Hat Cluster Suite), my first contact with any kind of clustering in Linux. Since then, I've worked with everything from HA web servers to Top500 systems to nationally distributed grids.

Types of clusters 

When I think of clustering, I tend to classify clusters into three types: high availability, load balancing, and high performance computing.
  • High Availability: No one wants the services they offer to be unreachable, especially since downtime will almost certainly cost them money. High availability clusters keep services running when an individual system fails.
  • Load Balancing: Load balancing is almost a special subset of high availability. The idea is to have multiple systems all serving the same service at the same time because demand for the service is more than a single server can handle.
  • High Performance Computing: Clusters of this type are also known as Beowulf clusters. The idea is to hook a bunch of regular computers together so that they work in a coordinated way on calculations or jobs that are bigger than any single computer can handle. In this way, you can build a supercomputer using Commodity Off The Shelf (COTS) hardware. These systems do everything from calculating the origins of the universe to modeling weather to rendering movie special effects.
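To make the HPC idea concrete, here is a minimal Python sketch of the scatter/gather pattern such clusters rely on: one calculation (estimating π by numerically integrating 4/(1+x²) over [0, 1]) is cut into chunks, each worker computes its piece, and the partial results are summed. As an assumption for illustration, worker processes from Python's `multiprocessing` module on a single machine stand in for cluster nodes; on a real Beowulf cluster this work would typically be spread across nodes with a message-passing library such as MPI. The function names are hypothetical.

```python
import math
from multiprocessing import Pool


def make_tasks(n_intervals, n_workers):
    """Split interval indices 0..n_intervals-1 into one contiguous chunk per worker."""
    chunk = n_intervals // n_workers
    tasks = []
    for w in range(n_workers):
        start = w * chunk
        # The last worker also takes any remainder.
        stop = n_intervals if w == n_workers - 1 else start + chunk
        tasks.append((start, stop, n_intervals))
    return tasks


def partial_pi(task):
    """Midpoint-rule integral of 4 / (1 + x^2) over this worker's share of [0, 1]."""
    start, stop, n_intervals = task
    h = 1.0 / n_intervals
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * h
        total += 4.0 / (1.0 + x * x)
    return total * h


def parallel_pi(n_intervals=1_000_000, n_workers=4):
    # Scatter the chunks to worker processes, then gather and sum the partial results.
    with Pool(n_workers) as pool:
        return sum(pool.map(partial_pi, make_tasks(n_intervals, n_workers)))


if __name__ == "__main__":
    estimate = parallel_pi()
    print(f"pi ~ {estimate:.12f} (error {abs(estimate - math.pi):.1e})")
```

Each worker needs only three numbers to do its share and sends back a single number, so communication stays tiny compared with computation; that property is what lets jobs like this scale well even over commodity networks.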

High Availability Software 

High availability is a must for most businesses these days: if a failure causes downtime, you lose money. Here are some of the Linux HA options.
Corosync Cluster Engine
The Corosync Cluster Engine is an OSI Certified implementation of a complete cluster engine. It is a widely distributed de-facto standard cluster infrastructure for application high availability.
LifeKeeper
SteelEye's LifeKeeper for Linux is a software application that ensures the continuous availability of applications by maintaining system uptime. LifeKeeper maintains the high availability of Linux cluster systems by monitoring system and application health, maintaining client connectivity and providing uninterrupted data access regardless of where clients reside - on the corporate Internet, intranet or extranet.
Linux-HA Project
The Linux-HA project maintains a set of building blocks for high availability cluster systems, including a cluster messaging layer, a huge number of resource agents for a variety of applications, and a plumbing library and error reporting toolkit.

Linux-HA is perhaps the best known of all the Linux HA software, home to the Heartbeat Daemon.
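The core idea behind heartbeat-style HA can be sketched in a few lines of Python: the standby node listens for periodic heartbeats from the active node and, if none arrive within a dead time, takes over the service. This is a hypothetical illustration of the concept only; the class, method names and defaults are made up and do not reflect Heartbeat's actual code, messages or configuration. The clock is injectable so the failover logic can be exercised without waiting.

```python
import time


class FailoverMonitor:
    """Hypothetical standby-node logic: take over the service when the active peer goes silent."""

    def __init__(self, peer_name, dead_time=10.0, clock=time.monotonic):
        self.peer_name = peer_name
        self.dead_time = dead_time      # seconds of silence before the peer is declared dead
        self.clock = clock              # injectable so the logic can be tested without waiting
        self.last_heartbeat = clock()   # treat start-up as a fresh heartbeat
        self.owns_resources = False     # the standby starts without the service

    def on_heartbeat(self, sender):
        # Called whenever a heartbeat message arrives over the cluster network.
        if sender == self.peer_name:
            self.last_heartbeat = self.clock()

    def check(self):
        # Called periodically; returns True if this node should now run the service.
        silent_for = self.clock() - self.last_heartbeat
        if silent_for > self.dead_time and not self.owns_resources:
            self.take_over()
        return self.owns_resources

    def take_over(self):
        # A real cluster would start the resources here (virtual IP, shared
        # storage, the daemon itself), ideally after fencing the dead peer.
        self.owns_resources = True


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = FailoverMonitor("node-a", dead_time=10.0, clock=lambda: fake_now[0])
    for t, beat in [(5.0, True), (14.0, False), (16.0, False)]:
        fake_now[0] = t
        if beat:
            monitor.on_heartbeat("node-a")
        print(f"t={t:>4}: standby owns service? {monitor.check()}")
```

A real cluster also has to guard against split brain, where a network fault makes each node think the other has died and both start the service. HA stacks handle this with techniques such as quorum and fencing (STONITH), which this sketch leaves out.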
OpenAIS
The OpenAIS Standards Based Cluster Framework is an OSI Certified implementation of the Service Availability Forum Application Interface Specification (AIS). The AIS is a set of software APIs and policies used to develop applications that maintain service during faults. Restart and failover are also provided for applications that cannot be modified. OpenAIS is built to operate on the Corosync Cluster Engine, which allows any third party to implement plugin cluster services using the infrastructure provided.
Red Hat Cluster Suite
Red Hat Cluster Suite provides two distinct types of cluster:
     * Application/Service Failover - Create n-node server clusters for failover of key applications and services
     * IP Load Balancing - Load balance incoming IP network requests across a farm of servers
With Red Hat Cluster Suite, applications can be deployed in high availability configurations so that they are always operational, bringing "scale-out" capabilities to Enterprise Linux deployments.
Veritas Cluster Server
Veritas™ Cluster Server from Symantec is a high availability solution for reducing both planned and unplanned downtime. By monitoring the status of applications and automatically moving them to another server in the event of a fault, Cluster Server can dramatically increase the availability of an application or database.

Load Balancing Software 

Load balancing of certain services, such as mail or web, is a must for any organization that needs to service a large client base.
Keepalived
Keepalived is a userspace daemon that performs health checks on LVS cluster nodes and handles failover of LVS directors. It implements a framework that lets the daemon check the state of an LVS server pool. When one of the servers in the pool goes down, Keepalived tells the Linux kernel to remove that server's entry from the LVS topology.
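Here is a minimal Python sketch of the health-checking idea, assuming a simple TCP connect test: each configured real server is probed, and only the ones that answer stay in the active pool. `RealServerPool` and `tcp_check` are hypothetical names; this does not use Keepalived's configuration or talk to the kernel's LVS table, and it also re-adds servers once they pass the check again.

```python
import socket


def tcp_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RealServerPool:
    """Hypothetical stand-in for the list of real servers behind an LVS director."""

    def __init__(self, servers, check=tcp_check):
        self.configured = list(servers)  # every (host, port) the administrator listed
        self.active = list(servers)      # the servers currently receiving traffic
        self.check = check               # health-check function: (host, port) -> bool

    def run_checks(self):
        # Re-test every configured server: drop the ones that fail and
        # put back any that have recovered since the last round.
        self.active = [s for s in self.configured if self.check(*s)]
        return self.active
```

In practice `run_checks()` would run on a timer every few seconds, and the resulting `active` list is what the director would balance traffic across.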
Linux Virtual Server
The Linux Virtual Server is a highly scalable and highly available server built on a cluster of real servers, with the load balancer running on the Linux operating system. The architecture of the server cluster is fully transparent to end users, and the users interact as if it were a single high-performance virtual server. The Linux Virtual Server as an advanced load balancing solution can be used to build highly scalable and highly available network services, such as scalable web, cache, mail, ftp, media and VoIP services.
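The director's job is to decide which real server receives each new connection. LVS offers several scheduling algorithms, including round-robin and weighted round-robin. The Python sketch below shows one way to do weighted round-robin, the "smooth" variant popularized by the nginx web server; it illustrates the technique and is not LVS's own scheduler code. The server names are hypothetical.

```python
class SmoothWeightedRoundRobin:
    """Spread new connections across real servers in proportion to their weights."""

    def __init__(self, weights):
        # weights: mapping of server name -> positive integer weight
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("need at least one server with a positive weight")
        self.weights = dict(weights)
        self.total = sum(self.weights.values())
        self.current = {server: 0 for server in self.weights}

    def pick(self):
        # Every server earns its weight, the one with the highest running
        # score is chosen (first one wins ties), and it pays back the total.
        for server, weight in self.weights.items():
            self.current[server] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


if __name__ == "__main__":
    director = SmoothWeightedRoundRobin({"web1": 5, "web2": 1, "web3": 1})
    print([director.pick() for _ in range(7)])
```

With weights 5, 1 and 1, seven consecutive picks return `web1, web1, web2, web1, web3, web1, web1`: the heavy server gets five of seven connections, but interleaved with the others rather than five in a row.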
Red Hat Cluster Suite
Red Hat Cluster Suite provides two distinct types of cluster:
     * Application/Service Failover - Create n-node server clusters for failover of key applications and services
     * IP Load Balancing - Load balance incoming IP network requests across a farm of servers
With Red Hat Cluster Suite, applications can be deployed in high availability configurations so that they are always operational, bringing "scale-out" capabilities to Enterprise Linux deployments.

High Performance Computing Cluster Software 

Computational/High Performance Linux clusters started back in 1994 when Donald Becker and Thomas Sterling built a cluster for NASA. This cluster was made up of 16 DX4 processors connected by 10 Mbit Ethernet, and they named it Beowulf.

There are a variety of software packages to build and manage these kinds of systems.
ISLE Cluster Manager
SGI's ISLE Cluster Manager provides the necessary power and flexibility to monitor essential system metrics from a single point of control. ISLE Cluster Manager reduces the time and resources spent administering the system by improving software maintenance procedures and automating repetitive tasks. The comprehensive features of ISLE Cluster Manager help lower total cost of system ownership, increase productivity, and provide a better return on your investment.
MOSIX
MOSIX is an online management system targeted at high performance computing on Linux clusters, multi-clusters and clouds. It supports both interactive processes and batch jobs. MOSIX can be viewed as a multi-cluster operating system that incorporates automatic resource discovery and dynamic workload distribution, features commonly found on single computers with multiple processors.
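The core idea behind workload distribution, sending work to wherever there is spare capacity, can be illustrated with a greedy least-loaded placement. This Python sketch is a static, simplified illustration of that idea and is not MOSIX's algorithm, which migrates running processes between nodes; `assign_jobs`, the job costs and the node names are hypothetical.

```python
import heapq


def assign_jobs(job_costs, nodes):
    """Greedily place each job on the node with the least accumulated load."""
    heap = [(0.0, node) for node in nodes]  # (current load, node name)
    heapq.heapify(heap)
    placement = {}
    # Placing the biggest jobs first usually gives a better balance
    # (the longest-processing-time-first heuristic).
    for job, cost in sorted(job_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)   # least-loaded node right now
        placement[job] = node
        heapq.heappush(heap, (load + cost, node))
    loads = {node: load for load, node in heap}
    return placement, loads


if __name__ == "__main__":
    placement, loads = assign_jobs({"j1": 4, "j2": 3, "j3": 3, "j4": 2}, ["n1", "n2"])
    print(placement, loads)
```

A min-heap keyed on current load makes each placement decision O(log n) in the number of nodes, which matters once a cluster has hundreds of them.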
OSCAR
OSCAR allows users, regardless of their experience level with a *nix environment, to install a Beowulf type high performance computing cluster. It also contains everything needed to administer and program this type of HPC cluster.
Platform HPC Workgroup Manager
Platform HPC Workgroup Manager includes familiar easy-to-use cluster management tools such as Platform Cluster Manager, Platform LSF Workgroup Edition, Platform MPI, Platform LSF web interface, and Platform ISF Adaptive Cluster.
Rocks
Rocks is an open-source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization tiled-display walls.
Scyld ClusterWare
Scyld ClusterWare is an HPC cluster management solution. It is designed to make the deployment and management of a Linux cluster as easy as the deployment and management of a single system.
xCAT
xCAT offers complete and ideal management for HPC clusters, RenderFarms, Grids, WebFarms, Online Gaming Infrastructure, Clouds, Datacenters, and whatever tomorrow's buzzwords may be. It is agile, extendable, and based on years of system administration best practices and experience.

Miscellaneous Useful Links 

Pages/sites that you'll probably find helpful.
Cluster Builder
ClusterBuilder.org assists cluster administrators, technical evaluators and purchase evaluators to research popular information about components they may need in their current or future cluster, grid or utility hosting environments.
Cluster Monkey
A portal with news, articles, and features about Linux clustering and high performance computing.
freshmeat
freshmeat maintains the Web's largest index of Unix and cross-platform software, themes and related "eye-candy", and Palm OS software. Thousands of applications, which are preferably released under an open source license, are meticulously cataloged in the freshmeat database, and links to new applications are added daily.
Furbeowulf Cluster Computing
The name says it all.
HPC Wire
HPCwire is a news and information site covering the entire ecosystem of High Productivity Computing (HPC), targeted at an audience interested in computationally- and data-intensive computing, including infrastructure topics such as software, middleware, hardware, networking, storage, tools and applications.
IEEE Technical Committee On Scalable Computing
The IEEE Technical Committee on Scalable Computing (TCSC) addresses theoretical and experimental aspects of designing, developing, and evaluating scalable network computing systems, especially clusters and grids, and their applications. Specific topics of interest include cluster and grid interconnection networks, middleware, single-system image, resource and scheduling management policies, distributed programming environments, principles of scalable and reliable software engineering, and high-performance and high-availability computing applications. The TCSC sponsors workshops and conferences on these and related topics.
LinuxHPC.org
LinuxHPC.org is a website for system administrators, developers, and technical managers, offering recent industry news, events, mailing lists and links, etc. related to high performance technical computing and clustering with Linux.
Top500
Top500 is the home of the twice-yearly ranking of the top 500 supercomputers in the world.

Linux Clustering Books 

Books about building computational Linux clusters.

Beowulf Cluster Computing with Linux, 2nd Edition (Scientific and Engineering Computation)

Building Clustered Linux Systems

How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters (Scientific and Engineering Computation)

Linux Cluster Architecture

Linux Clustering: Building and Maintaining Linux Clusters

General High Performance Computing Books 

Books about high performance computing in general, not specifically about Linux clusters.

High Performance Cluster Computing: Architectures and Systems, Vol. 1

High-Performance Computing: Paradigm and Infrastructure

High Performance Computing (RISC Architectures, Optimization & Benchmarks)

In Search of Clusters (2nd Edition)

High Availability Books 

Books about high availability concepts and techniques.

Blueprints for High Availability: Designing Resilient Distributed Systems

Clusters for High Availability: A Primer of HP Solutions (2nd Edition)

High Availability: Design, Techniques and Processes

Reliable Linux: Assuring High Availability

Distributed/Parallel Computing Books 

Distributed Systems: Concepts and Design (4th Edition)

Distributed Systems: Principles and Paradigms (2nd Edition)

Parallel and Distributed Simulation Systems (Wiley Series on Parallel and Distributed Computing)

Parallel Processing and Parallel Algorithms: Theory and Computation

Parallel Programming Books 

The Art of Multiprocessor Programming

MPI: The Complete Reference, Vol. 1: The MPI Core (2nd Edition)

Parallel Programming with MPI

Principles of Parallel Programming

Using MPI - 2nd Edition: Portable Parallel Programming with the Message Passing Interface (Scientific and Engineering Computation)

TODO 

As this page is still quite new, here are the things that aren't in yet but will be added:
  • Software - file systems, installation, monitoring, management/administration, batch queues & schedulers
  • Documentation - useful online docs

by suprmnsdead

I'm a computer geek who's been using Linux since the mid-90s, and got interested in various forms of clustering around 10 years ago.
