Open Cluster Framework FSG102-1 statement

As part of becoming a working group of the Free Standards Group, it is necessary to make some statements in conformance with their processes. The first of these statements is the FSG102-1 statement. This document is a draft FSG102-1 statement for the Open Cluster Framework effort. The next few paragraphs are taken from the FSG102 document.

FSG102-1 is an initial declaration of the problem and an abstract of the proposed solution. This document should be revised throughout the formation of the Workgroup to reflect any changes and will act as the basic justification of further work. It should include the following:

A general description of the current problem, from as many perspectives (user, developer, etc.) as a standard might help.
A very brief abstract of the proposed solution, in the form of a standard. The abstract should be sufficiently complete to fully describe the benefits of such a standard. Initially, industry or technical hurdles should be largely ignored. Later revisions may take into account more variables.
Existing software, partial solutions, etc.
Existing free software projects related to the proposed standard.
Companies and organizations in the field that would benefit from the standard. This should include general industry descriptions, as well as major potential participants.
Other parties that would benefit from the standard, including free software projects or classes of end users.

The remainder of this document is intended to address these questions. Each of these points in the FSG102-1 statement is a section in the document, named from the text of the outline above.

FSG102-1a: A general description of the current problem

There are currently at least five open source (OSS) high-availability (HA) clustering solutions, and more than 25 HA clustering products in total for Linux. There are also many high-performance (HP) clustering products and projects. These projects and products are largely incompatible with each other.

OSS clustering developers duplicate each other's effort significantly, yet have little desire to do so. History and the lack of both standards and a common framework force them to duplicate each other's effort. There is one component shared by a few of them, which serves as a proof of concept for having common APIs and components to share among OSS systems.

Proprietary clustering systems each provide interfaces to cluster-aware applications. Each of them fights for the "mindshare" of middleware vendors, such as vendors of database systems and web servers, hoping that those vendors will integrate their products with its system.

Middleware vendors like Oracle and IBM are besieged with requests from the many HA product vendors either to make their product interface with the particular clustering system, or to provide technical support so that the HA vendor can do the integration itself.

Certain OS components (cluster filesystems and volume managers) must either interface with the membership layer of the solution, or they will be incompatible with it. Dire things can happen when you try to have two cluster managers manage a cluster simultaneously. These layers must either have a standard interface they work to, or provide "n" different interfaces to the "n" cluster managers currently available.

End users have a difficult choice when looking at 25 different clustering systems and dozens of different cluster-aware components. Most but not all are mutually incompatible, and each provides different capabilities and interoperates with different components in different ways. In most proprietary OSes, the OS vendor creates de facto standards for clustering on their platform. There is no such "900-pound gorilla" in Linux - and there probably never will be.

To say this extreme fragmentation is a confusing and unworkable situation is an understatement.

FSG102-1b: A brief abstract of the proposed solution

We therefore propose to create a set of standard APIs which define some of the interfaces which most clustering systems export to their client programs. Although the architectures of the various clustering systems are quite different from each other, there are certain basic interfaces which serve the same purpose in very similar ways.

For our purposes, we have grouped the possible set of APIs into five areas:

Node services
Group services
Resource services
Lock services
External interfaces

In the first version of the standard we propose to cover APIs for certain node services and one class of resource services. The definitions of these areas are provided in detail at http://opencf.org/.

The particular sets of services which we intend to define in the initial version of the standard are:

Resource Agents
Node membership services
Node communication services

These areas were selected because they are common to most clustering systems, well-understood, and provide significant help with the problems discussed earlier.
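
As a purely illustrative sketch of what a common node membership interface could look like, the Python fragment below shows one cluster-aware component written once against a shared interface and working unchanged with two different cluster managers. Every name in it (NodeMembership, StaticMembership, HeartbeatMembership, may_mount) is hypothetical and invented for this illustration; none of it is taken from the Open Cluster Framework definitions or from any existing clustering product.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, FrozenSet, Iterable


class NodeMembership(ABC):
    """Hypothetical common interface (an assumption, not an OCF API)."""

    @abstractmethod
    def members(self) -> FrozenSet[str]:
        """Return the names of the nodes currently in the cluster."""


class StaticMembership(NodeMembership):
    """Stand-in for one cluster manager: membership is fixed at start-up."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes = frozenset(nodes)

    def members(self) -> FrozenSet[str]:
        return self._nodes


class HeartbeatMembership(NodeMembership):
    """Stand-in for another cluster manager: a node counts as a member
    while its most recent heartbeat is no older than timeout_s seconds."""

    def __init__(self, timeout_s: float, clock: Callable[[], float]) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def record_heartbeat(self, node: str) -> None:
        # Remember when this node was last heard from.
        self._last_seen[node] = self._clock()

    def members(self) -> FrozenSet[str]:
        now = self._clock()
        return frozenset(
            node
            for node, seen in self._last_seen.items()
            if now - seen <= self._timeout_s
        )


def may_mount(volume_owner: str, membership: NodeMembership) -> bool:
    """A cluster-aware component (for example a volume manager) coded once
    against the common interface instead of once per cluster manager."""
    return volume_owner in membership.members()
```

With an interface of this kind, the component needs one integration rather than "n" of them; each cluster manager supplies its own implementation behind the same calls.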

FSG102-1c,d: Existing software, partial solutions, etc. Existing free software projects related to the proposed standard.

This section combines the answers to FSG102-1c and FSG102-1d, since they overlap significantly.

There are currently five open source high-availability suites available: Linux-HA (heartbeat), Linux FailSafe, Kimberlite, COMPAQ's Cluster Infrastructure, and Red Hat's Piranha. Three of these share a common component (and hence a set of interfaces). There are at least three major high-level high-performance projects: OSCAR, NPACI Rocks and Scyld. Each of these projects has expressed some amount of interest in participating in common standards for clustering APIs. The PVM project and the MPI standards are also relevant. Each of these projects implements, uses or defines cluster APIs which we think to be of interest.

See the list of software at http://linux-ha.org/, and also see the sites http://lcic.org/ and http://foundries.sourceforge.net/clusters/

Specifically, I know these projects are directly related to 'c' above as well:

linux-ha (aka heartbeat)
COMPAQ's Cluster Infrastructure project
Linux FailSafe (SuSE, SGI)
Piranha (Red Hat)
Kimberlite (Mission Critical)

A few other closely related (cluster-aware) OSS projects:

DLM - Distributed Lock Manager
DRBD
EVMS - Enterprise Volume Management System
GFS / OpenGFS
Intermezzo
Lustre
LVM - Linux Volume Manager (with cluster extensions)
LVS - Linux Virtual Server
OSCAR
MPICH
LAM
PVM

These projects have been involved to some degree so far.

FSG102-1e: Companies and organizations in the field that would benefit from the standard.

In addition to the companies listed in FSG102-1d, there are many other companies which stand to benefit. These include Conectiva Linux, Mandrake, Debian, IBM (several divisions), Oracle, MySQL, PostgreSQL, and other database vendors, SAP, telecommunications vendors and providers, and proprietary high-availability and high-performance clustering vendors such as LinBit, MSC Linux, HighAvailability.com, Cluster File Systems Inc., Sistina, Stonesoft, Scyld, PolyServe, SteelEye, Clustra, Resonate, TurboLinux, Hewlett-Packard, Veritas and Fujitsu-Siemens. The following general kinds of companies also stand to benefit: clustering consulting firms and VARs, and any server hardware or software vendor who wants their products to work in a Linux high-availability or high-performance clustering environment.

FSG102-1f: Other parties that would benefit from the standard

This includes anyone who wants or benefits from high-performance computing or high-availability computing. End users of clustering technology, such as those listed as Linux-HA customers, also stand to benefit significantly. More broadly, it also includes anyone who has a stake in the success of Linux as a server platform. This is basically everyone who sells, has, manages, or uses any kind of Linux-based server.

In addition to the projects listed earlier, it is expected that any software which wishes to monitor a high-availability cluster, or monitor a service which is sometimes used in a high-availability cluster, may also benefit. This includes Mon, Monit, PIKT, NOCOL/SNIPS, Big Brother, Netsaint, MAT, WebRat, the OpenNMS project, and Pegasus (a CIM implementation).