What is Linux Clustering
Linux clustering is the practice of making two or more Linux systems work together in coordination. The goal could be to keep a system failure from knocking an important service offline, or to build a system that can handle problems bigger than any individual computer's physical resources.
There are many free and commercial software options for building and managing these systems. This page will try to explain the different types of clustering and offer pointers to resources that will help you on your journey to set up your cluster.
Table of Contents
- Cluster Poll
- How I started with Linux Clustering
- Types of clusters
- High Availability Software
- Load Balancing Software
- High Performance Computing Cluster Software
- Miscellaneous Useful Links
- Linux Clustering Books
- General High Performance Computing Books
- High Availability Books
- Distributed/Parallel Computing Books
- Parallel Programming Books
- TODO
Cluster Poll
How I started with Linux Clustering
I was introduced to Linux by a good friend in college. We spent many a night breaking the system and then fixing it, or sometimes having to reinstall to figure out what we did.
At my first job out of college, I was setting up a print server and wondered how I could make it so that everyone would still be able to print if the computer broke. That project, which used what Red Hat called its Piranha package at the time (today that functionality has been rolled into the Red Hat Cluster Suite), was my first contact with any kind of clustering on Linux. Since then, I've worked with everything from HA web servers to Top500 systems to nationally distributed grids.
Types of clusters
When I think of clustering, I tend to classify clusters into three types: high availability, load balancing, and high performance computing.
- High Availability: No one wants the services they offer to be unreachable, especially since downtime will almost undoubtedly cost them money. High availability clusters address this need by keeping services running even when individual systems fail.
- Load Balancing: Load balancing is almost a special subset of high availability. The idea is to create a setup where multiple systems all provide the same service at the same time, because the demand for the service is more than a single server can handle.
- High Performance Computing: Also known as Beowulf clusters, the idea here is to hook a bunch of regular computers up to act in a coordinated way so that they can do calculations or jobs that are bigger than any single computer. In this way, you can build a supercomputer using Commodity Off The Shelf (COTS) hardware. These systems do everything from calculating the origins of the universe to modeling weather to rendering movie special effects.
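The coordination behind a high performance computing cluster can be sketched in a few lines. This is a minimal illustration that assumes the problem splits into independent pieces: it estimates π with the midpoint rule, divides the intervals among workers (the "scatter" step), and adds up the partial results (the "reduce" step). Threads stand in for cluster nodes here purely for illustration; a real Beowulf-style cluster would typically run the pieces as separate processes on separate machines, for example with MPI. The function names `split`, `partial_pi`, and `cluster_pi` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n: int, workers: int) -> list[tuple[int, int]]:
    """Divide the index range [0, n) into `workers` contiguous, nearly equal chunks."""
    base, extra = divmod(n, workers)
    chunks = []
    start = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)  # spread the remainder over the first chunks
        chunks.append((start, start + size))
        start += size
    return chunks


def partial_pi(start: int, stop: int, n: int) -> float:
    """One worker's share: midpoint-rule sum of 4/(1+x^2) over intervals start..stop-1 of n."""
    h = 1.0 / n
    total = 0.0
    for i in range(start, stop):
        x = h * (i + 0.5)
        total += 4.0 / (1.0 + x * x)
    return total * h


def cluster_pi(n: int = 1_000_000, workers: int = 4) -> float:
    """Scatter the chunks to workers, then reduce the partial sums into one answer."""
    chunks = split(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: partial_pi(c[0], c[1], n), chunks)
        return sum(partials)


if __name__ == "__main__":
    print(cluster_pi(100_000, workers=4))  # close to 3.14159...
```

The integral of 4/(1+x²) from 0 to 1 is exactly π, which makes it easy to check that splitting the work did not change the answer.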
High Availability Software
High availability is a must for any business these days: if a failure causes downtime, you lose money. Here we'll list Linux HA options.
- Corosync Cluster Engine
- The Corosync Cluster Engine is an OSI Certified implementation of a complete cluster engine. It is a widely distributed de-facto standard cluster infrastructure for application high availability.
- LifeKeeper
- SteelEye's LifeKeeper for Linux is a software application that ensures the continuous availability of applications by maintaining system uptime. LifeKeeper maintains the high availability of Linux cluster systems by monitoring system and application health, maintaining client connectivity and providing uninterrupted data access regardless of where clients reside - on the corporate Internet, intranet or extranet.
- Linux-HA Project
- The Linux-HA project maintains a set of building blocks for high availability cluster systems, including a cluster messaging layer, a huge number of resource agents for a variety of applications, and a plumbing library and error reporting toolkit.
Linux-HA is perhaps the best known of all the Linux HA software, home to the Heartbeat daemon.
- OpenAIS
- The OpenAIS Standards Based Cluster Framework is an OSI Certified implementation of the Service Availability Forum Application Interface Specification (AIS). The AIS is a set of software APIs and policies used to develop applications that maintain service during faults. Restart and failover are also provided for applications that cannot be modified. The OpenAIS software is built to operate on the Corosync Cluster Engine, which allows any third party to implement plugin cluster services using the infrastructure provided.
- Red Hat Cluster Suite
- Red Hat Cluster Suite provides two distinct types of cluster:
* Application/Service Failover - Create n-node server clusters for failover of key applications and services
* IP Load Balancing - Load balance incoming IP network requests across a farm of servers
With Red Hat Cluster Suite, applications can be deployed in high availability configurations so that they are always operational, bringing "scale-out" capabilities to Enterprise Linux deployments.
- Veritas Cluster Server
- Veritas™ Cluster Server from Symantec is a high availability solution for reducing both planned and unplanned downtime. By monitoring the status of applications and automatically moving them to another server in the event of a fault, Cluster Server can dramatically increase the availability of an application or database.
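Most of the HA packages above rest on the same basic idea: nodes send periodic heartbeat messages, and when a node stops sending them within some timeout, its services are moved to a surviving node. The following is a minimal, single-process sketch of that decision logic, assuming an ordered list of preferred nodes and a fixed timeout. The `FailoverMonitor` class and its methods are hypothetical and are not the API of Heartbeat, Corosync, or any other package listed here; timestamps are passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FailoverMonitor:
    """Toy active/passive failover (hypothetical, not a real cluster manager's API):
    the service runs on the first node, in priority order, whose most recent
    heartbeat is within `timeout` seconds."""
    nodes: List[str]       # priority order; nodes[0] is the preferred primary
    timeout: float         # seconds of silence before a node is presumed dead
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        """Record that `node` reported in at time `now`."""
        self.last_seen[node] = now

    def is_alive(self, node: str, now: float) -> bool:
        seen = self.last_seen.get(node)
        return seen is not None and now - seen <= self.timeout

    def active_node(self, now: float) -> Optional[str]:
        """Return the node that should run the service, or None if all are down."""
        for node in self.nodes:
            if self.is_alive(node, now):
                return node
        return None


if __name__ == "__main__":
    mon = FailoverMonitor(nodes=["node1", "node2"], timeout=3.0)
    mon.heartbeat("node1", now=0.0)
    mon.heartbeat("node2", now=0.0)
    print(mon.active_node(now=1.0))   # node1: the primary is healthy
    mon.heartbeat("node2", now=4.0)   # node1 has gone silent
    print(mon.active_node(now=5.0))   # node2: the service fails over
```

Real cluster stacks also need quorum and fencing (for example STONITH) so that two nodes that have merely lost contact with each other don't both take over the service; this sketch leaves that out.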
Load Balancing Software
Load balancing of certain services, such as mail or web, is a must for any organization that needs to service a large client base.
- Keepalived
- Keepalived is a userspace daemon for LVS cluster node health checks and LVS director failover. Keepalived implements a framework that lets the daemon check the state of an LVS server pool. When one of the servers in the pool is down, Keepalived tells the Linux kernel to remove that server's entry from the LVS topology.
- Linux Virtual Server
- The Linux Virtual Server is a highly scalable and highly available server built on a cluster of real servers, with the load balancer running on the Linux operating system. The architecture of the server cluster is fully transparent to end users, and the users interact as if it were a single high-performance virtual server. The Linux Virtual Server as an advanced load balancing solution can be used to build highly scalable and highly available network services, such as scalable web, cache, mail, ftp, media and VoIP services.
- Red Hat Cluster Suite
- Red Hat Cluster Suite provides two distinct types of cluster:
* Application/Service Failover - Create n-node server clusters for failover of key applications and services
* IP Load Balancing - Load balance incoming IP network requests across a farm of servers
With Red Hat Cluster Suite, applications can be deployed in high availability configurations so that they are always operational, bringing "scale-out" capabilities to Enterprise Linux deployments.
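Keepalived and the Linux Virtual Server divide the work described above: LVS spreads incoming requests over a pool of real servers, and Keepalived health-checks those servers and pulls failed ones out of the pool. Round-robin is one of the scheduling algorithms LVS offers. The sketch below models both pieces in plain Python, assuming a caller-supplied health check function. `RoundRobinPool` and its methods are hypothetical and are not the real LVS or Keepalived interfaces (LVS does its scheduling inside the Linux kernel).

```python
from typing import Callable, List, Optional


class RoundRobinPool:
    """Toy load balancer (hypothetical): hands out healthy real servers in round-robin order."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)   # every configured real server
        self.healthy = list(servers)   # servers currently in rotation
        self._counter = 0

    def run_health_checks(self, is_healthy: Callable[[str], bool]) -> None:
        """Rebuild the rotation from the configured list, so failed servers drop
        out and recovered servers come back on the next check."""
        self.healthy = [s for s in self.servers if is_healthy(s)]

    def pick(self) -> Optional[str]:
        """Return the server for the next request, or None if none are healthy."""
        if not self.healthy:
            return None
        server = self.healthy[self._counter % len(self.healthy)]
        self._counter += 1
        return server


if __name__ == "__main__":
    pool = RoundRobinPool(["web1", "web2", "web3"])
    print([pool.pick() for _ in range(4)])          # ['web1', 'web2', 'web3', 'web1']
    pool.run_health_checks(lambda s: s != "web2")   # web2 fails its check
    print([pool.pick() for _ in range(4)])          # ['web1', 'web3', 'web1', 'web3']
```

Rebuilding the rotation from the full configured list, rather than only deleting entries, is a design choice that lets a repaired server rejoin automatically on the next health check.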
High Performance Computing Cluster Software
Computational/High Performance Linux clusters started back in 1994 when Donald Becker and Thomas Sterling built a cluster for NASA. This cluster was made up of 16 DX4 processors connected by 10 Mbit Ethernet, and they named it Beowulf.
There are a variety of software packages to build and manage these kinds of systems.
- ISLE Cluster Manager
- SGI's ISLE Cluster Manager provides the necessary power and flexibility to monitor essential system metrics from a single point of control. ISLE Cluster Manager reduces the time and resources spent administering the system by improving software maintenance procedures and automating repetitive tasks. The comprehensive features of ISLE Cluster Manager help lower total cost of system ownership, increase productivity, and provide a better return on your investment.
- MOSIX
- MOSIX is an on-line management system targeted at high performance computing on Linux clusters, multi-clusters and Clouds. It supports both interactive processes and batch jobs. MOSIX can be viewed as a multi-cluster operating system that incorporates automatic resource discovery and dynamic workload distribution, commonly found on single computers with multiple processors.
- OSCAR
- OSCAR allows users, regardless of their experience level with a *nix environment, to install a Beowulf type high performance computing cluster. It also contains everything needed to administer and program this type of HPC cluster.
- Platform HPC Workgroup Manager
- Platform HPC Workgroup Manager includes familiar easy-to-use cluster management tools such as Platform Cluster Manager, Platform LSF Workgroup Edition, Platform MPI, Platform LSF web interface, and Platform ISF Adaptive Cluster.
- Rocks
- Rocks is an open-source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization tiled-display walls.
- Scyld ClusterWare
- Scyld ClusterWare is an HPC cluster management solution. It is designed to make the deployment and management of a Linux cluster as easy as the deployment and management of a single system.
- xCAT
- xCAT offers complete and ideal management for HPC clusters, RenderFarms, Grids, WebFarms, Online Gaming Infrastructure, Clouds, Datacenters, and whatever tomorrow's buzzwords may be. It is agile, extendable, and based on years of system administration best practices and experience.
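Several of the tools above decide where work should run: MOSIX advertises dynamic workload distribution, and Platform HPC bundles the LSF batch scheduler. Their actual placement policies are more sophisticated and are not described on this page. As a hedged illustration of the basic problem only, the sketch below uses the classic greedy "longest processing time first" (LPT) heuristic, assuming each job's cost is known in advance. `assign_jobs` is a hypothetical helper, not part of any of these products.

```python
import heapq
from typing import Dict, List


def assign_jobs(jobs: Dict[str, float], nodes: List[str]) -> Dict[str, str]:
    """Place each job on the node with the least accumulated work.

    Jobs are handled longest-first (the LPT heuristic), which tends to keep
    the nodes' total loads close to one another."""
    heap = [(0.0, node) for node in nodes]   # (current load, node name)
    heapq.heapify(heap)
    placement = {}
    # Longest jobs first; ties broken by job name so the result is deterministic.
    for job, cost in sorted(jobs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)     # least-loaded node right now
        placement[job] = node
        heapq.heappush(heap, (load + cost, node))
    return placement


if __name__ == "__main__":
    jobs = {"render": 5.0, "sim": 4.0, "fft": 3.0, "sort": 3.0}
    print(assign_jobs(jobs, ["node1", "node2"]))
    # {'render': 'node1', 'sim': 'node2', 'fft': 'node2', 'sort': 'node1'}
```

In this example node1 ends up with 8 units of work and node2 with 7. A min-heap keeps finding the least-loaded node cheap even with many nodes.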
Miscellaneous Useful Links
Pages/sites that you'll probably find helpful.
- Cluster Builder
- ClusterBuilder.org assists cluster administrators, technical evaluators and purchase evaluators to research popular information about components they may need in their current or future cluster, grid or utility hosting environments.
- Cluster Monkey
- A portal with news, articles, and features about Linux clustering and high performance computing.
- freshmeat
- freshmeat maintains the Web's largest index of Unix and cross-platform software, themes and related "eye-candy", and Palm OS software. Thousands of applications, which are preferably released under an open source license, are meticulously cataloged in the freshmeat database, and links to new applications are added daily.
- Furbeowulf Cluster Computing
- The name says it all.
- HPC Wire
- HPCwire is a news and information site covering the entire ecosystem of High Productivity Computing (HPC), targeted at an audience interested in computationally- and data-intensive computing, including infrastructure topics such as software, middleware, hardware, networking, storage, tools and applications.
- IEEE Technical Committee On Scalable Computing
- The IEEE Technical Committee on Scalable Computing (TCSC) addresses theoretical and experimental aspects of designing, developing, and evaluating scalable network computing systems, especially clusters and grids, and their applications. Specific topics of interest include cluster and grid interconnection networks, middleware, single-system image, resource and scheduling management policies, distributed programming environments, principles of scalable and reliable software engineering, and high-performance and high-availability computing applications. The TCSC sponsors workshops and conferences on these and related topics.
- LinuxHPC.org
- LinuxHPC.org is a website for system administrators, developers, and technical managers, offering recent industry news, events, mailing lists and links, etc. related to high performance technical computing and clustering with Linux.
- Top500
- Top500 is the home of the twice-yearly ranking of the top 500 supercomputers in the world.
Linux Clustering Books
Books about building computational Linux clusters
General High Performance Computing Books
Books about high performance computing in general, not specifically about Linux clusters.
High Availability Books
Distributed/Parallel Computing Books
Parallel Programming Books
TODO
As this page is still quite new, here are the things that are not in yet but will be added.
- Software - file systems, installation, monitoring, management/administration, batch queues & schedulers
- Documentation - useful online docs
by suprmnsdead
I'm a computer geek who's been using Linux since the mid-90s, and got interested in various forms of clustering around 10 years ago. I'll fill this i...





