Tuesday, December 2, 2008

Virtual Machine versus Computer Cluster

A virtual machine is a machine that does not exist physically, but from the perspective of the person using it, it behaves like a real one. How is that possible? A technology called virtualization allows more than one operating system to be installed on a single machine. You can find out how to do this in another post. What I want to write about here is the background of virtualization. Why do we need virtual machines? Most computers, including servers, spend most of their time idle; let's say they use only 10% of their CPU capacity, except when running CPU-intensive applications. To increase CPU utilization, we create virtual machines on a real machine; for instance, we can create four virtual machines on one machine. If utilization scaled linearly, CPU utilization would become 40%.
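
The arithmetic in this example can be written as a small sketch. It assumes, as the paragraph does, that every virtual machine adds the same load and that the loads simply add up; the function name is made up for illustration, and real consolidation rarely scales this cleanly.

```python
def consolidated_utilization(per_vm_percent, vm_count):
    """Estimate host CPU utilization (in percent) when VM loads add up linearly.

    Linear scaling is an idealization taken from the example in the text.
    """
    total = per_vm_percent * vm_count
    # A physical CPU cannot be more than 100% busy.
    return min(total, 100)


# Four virtual machines, each using 10% of the CPU.
print(consolidated_utilization(10, 4))  # 40
```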

On the other hand, for a CPU-intensive application, we may not be able to afford a machine with a very powerful processor. Rather than buying a new, expensive machine, we can use old machines to solve the problem. How? We can combine four old machines into a cluster; from the user's perspective, it appears as one very powerful machine. For how to do that, see the other post in this blog.

The point is that the motivations behind the two are opposites. So how can we make a virtual cluster? What is the motivation for it?

Wednesday, July 9, 2008

Network Virtualization

The term “network virtualization” describes the ability to refer to network resources logically rather than having to refer to specific physical network devices, configurations, or collections of related machines. There are many different levels of network virtualization, ranging from single-machine, network-device virtualization that enables multiple virtual machines to share a single physical-network resource, to enterprise-level concepts such as virtual private networks and enterprise-core and edge-routing techniques for creating subnetworks and segmenting existing networks.
Xen relies on network virtualization through the Linux bridge-utils package to enable your virtual machines to appear to have unique physical addresses (Media Access Control, or MAC, addresses) and unique IP addresses. Other server-virtualization solutions, such as UML, use the Linux virtual Point-to-Point (TUN) and Ethernet (TAP) network devices to provide user-space access to the host’s network. Many advanced network switches and routers use techniques such as Virtual Routing and Forwarding (VRF), VRF-Lite, and Multi-VRF to segregate customer traffic into separately routed network segments and support multiple virtual-routing domains within a single piece of network hardware.
Taken from : William Von Hagen "Professional Xen Virtualization" 2008
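
As a small illustration of the "unique MAC address" point, here is a sketch that generates addresses for guests. The prefix 00:16:3e is the vendor prefix registered for Xen guests; everything else (the function names and the idea of picking the remaining three bytes at random) is an assumption for illustration and is not taken from the book.

```python
import random

XEN_OUI = (0x00, 0x16, 0x3E)  # vendor prefix registered for Xen guests


def random_xen_mac(rng=random):
    """Return a MAC address string with the Xen prefix and a random tail."""
    tail = [rng.randint(0x00, 0xFF) for _ in range(3)]
    return ":".join(f"{byte:02x}" for byte in (*XEN_OUI, *tail))


def unique_macs(count, rng=random):
    """Generate `count` distinct addresses, one per virtual machine."""
    macs = set()
    while len(macs) < count:
        macs.add(random_xen_mac(rng))
    return sorted(macs)


for mac in unique_macs(3):
    print(mac)
```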

Desktop Virtualization

The term “desktop virtualization” describes the ability to display a graphical desktop from one computer system on another computer system or smart display device. This term is used to describe software such as Virtual Network Computing (VNC, http://en.wikipedia.org/wiki/VNC), thin clients such as Microsoft’s Remote Desktop (http://en.wikipedia.org/wiki/Remote_Desktop_Protocol) and associated Terminal Server products, Linux terminal servers such as the Linux Terminal Server project (LTSP, http://sourceforge.net/projects/ltsp/), NoMachine’s NX (http://en.wikipedia.org/wiki/NX_technology), and even the X Window System (http://en.wikipedia.org/wiki/X_Window_System) and its XDMCP display manager protocol. Many window managers, particularly those based on the X Window System, also provide internal support for multiple, virtual desktops that the user can switch between and use to display the output of specific applications. In the X Window System, virtual desktops were introduced in versions of Tom LaStrange’s TWM window manager (www.xwinman.org/vtwm.php, with a nice family tree at www.vtwm.org/vtwm-family.html), but are now available in almost every other window manager. The X Window System also supports desktop virtualization at the screen or display level, enabling window managers to use a display region that is larger than the physical size of your monitor.

In my opinion, desktop virtualization is more of a bandwagon use of the term “virtualization” than an exciting example of virtualization concepts. It does indeed make the graphical console of any supported system into a logical entity that can be accessed and used on different physical computer systems, but it does so using standard client/server display software. The remote console, the operating system it is running, and the applications you execute are actually still running on a single, specific physical machine — you’re just looking at them from somewhere else. Calling remote display software a virtualization technology seems to me to be equivalent to considering a telescope to be a set of virtual eyeballs because you can look at something far away using one. Your mileage may vary.
Taken from : William Von Hagen "Professional Xen Virtualization" 2008

Application Virtualization

The term “application virtualization” describes the process of compiling applications into machine-independent byte code that can subsequently be executed on any system that provides the appropriate virtual machine as an execution environment. The best known example of this approach to virtualization is the byte code produced by the compilers for the Java programming language (http://java.sun.com/), although this concept was actually pioneered by the UCSD P-System in the late 1970s (www.threedee.com/jcm/psystem), for which the most popular compiler was the UCSD Pascal compiler. Microsoft has even adopted a similar approach in the Common Language Runtime (CLR) used by .NET applications, where code written in languages that support the CLR is transformed, at compile time, into CIL (Common Intermediate Language, formerly known as MSIL, Microsoft Intermediate Language). Like any byte code, CIL provides a platform-independent instruction set that can be executed in any environment supporting the .NET Framework.

Application virtualization is a valid use of the term “virtualization” because applications compiled into byte code become logical entities that can be executed on different physical systems with different characteristics, operating systems, and even processor architectures.
Taken from : William Von Hagen "Professional Xen Virtualization" 2008
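
To make the byte code idea concrete, here is a toy sketch of a stack-based virtual machine. The instruction names (PUSH, ADD, MUL) and the `run` function are invented for illustration; they are not the Java, P-System, or CIL instruction sets. The point is only that the same instruction list runs anywhere this small interpreter runs.

```python
def run(bytecode):
    """Execute a list of (opcode, *args) tuples on a simple operand stack."""
    stack = []
    for op, *args in bytecode:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# "Compiled" form of the expression (2 + 3) * 4.
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
print(run(program))  # 20
```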

What Is Virtualization?

Virtualization is simply the logical separation of the request for some service from the physical resources that actually provide that service. In practical terms, virtualization provides the ability to run applications, operating systems, or system services in a logically distinct system environment that is independent of a specific physical computer system. Obviously, all of these have to be running on a certain computer system at any given time, but virtualization provides a level of logical abstraction that liberates applications, system services, and even the operating system that supports them from being tied to a specific piece of hardware. Virtualization’s focus on logical operating environments rather than physical ones makes applications, services, and instances of an operating system portable across different physical computer systems.

The classic example of virtualization that most people are already familiar with is virtual memory, which enables a computer system to appear to have more memory than is physically installed on that system. Virtual memory is a memory-management technique that enables an operating system to see and use noncontiguous segments of memory as a single, contiguous memory space. Virtual memory is traditionally implemented in an operating system by paging, which enables the operating system to use a file or dedicated portion of some storage device to save pages of memory that are not actively in use.

Known as a “paging file” or “swap space,” the system can quickly transfer pages of memory to and from this area as the operating system or running applications require access to the contents of those pages. Modern operating systems such as UNIX-like operating systems (including Linux, the *BSD operating systems, and Mac OS X) and Microsoft Windows all use some form of virtual memory to enable the operating system and applications to access more data than would fit into physical memory.
Taken from : William Von Hagen "Professional Xen Virtualization" 2008
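
The paging idea can be sketched with a toy model. Everything here is an assumption for illustration: the class name, the first-in-first-out eviction rule, and the use of a dictionary as the "swap space". Real operating systems use hardware page tables and more careful replacement policies.

```python
from collections import OrderedDict


class TinyVirtualMemory:
    """Toy model: a few physical frames backed by an unlimited swap area."""

    def __init__(self, physical_frames):
        self.capacity = physical_frames
        self.ram = OrderedDict()  # page number -> contents, oldest first
        self.swap = {}            # pages currently paged out
        self.page_faults = 0

    def _load(self, page):
        """Make sure `page` is in RAM, evicting the oldest page if needed."""
        if page in self.ram:
            return
        self.page_faults += 1
        if len(self.ram) >= self.capacity:
            victim, contents = self.ram.popitem(last=False)
            self.swap[victim] = contents  # page out to swap
        self.ram[page] = self.swap.pop(page, None)  # page in (or new page)

    def write(self, page, value):
        self._load(page)
        self.ram[page] = value

    def read(self, page):
        self._load(page)
        return self.ram[page]


vm = TinyVirtualMemory(physical_frames=2)
for page in range(4):          # use twice as many pages as physical frames
    vm.write(page, f"data-{page}")
print(vm.read(0))              # data-0, brought back from swap
print(vm.page_faults)          # 5
```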

Introduction to Virtualization Techniques

With server virtualization, you can create multiple virtual servers on a single physical server. Each virtual server has its own set of virtual hardware on which operating systems and applications are loaded. IBM systems with virtualization can prioritize system resources and allocate them dynamically to the virtual servers that need them most at any given time—all based on business priorities.

Virtualization was first introduced by IBM in the 1960s to allow the partitioning of large mainframe environments. IBM has continued to innovate around server virtualization and has extended it from the mainframe to the IBM Power Systems, IBM System p, and IBM System i™ product lines. In the industry-standard environment, VMware, Microsoft® Virtual Server, and Xen offerings are available for IBM System x and IBM BladeCenter systems. Today, IBM server virtualization technologies are at the forefront in helping businesses with consolidation, cost management, and business resiliency.

IBM recognized the importance of virtualization with the development of the System/360 Model 67 mainframe. The Model 67 virtualized all of the hardware interfaces through the Virtual Machine Monitor, or VMM. In the early days of computing, the operating system was called the supervisor. With the ability to run operating systems on other operating systems, the term hypervisor resulted (a term coined in the 1970s). Logical partitioning has been available on the mainframe since the 1980s. The Power team began taking advantage of the mainframe partitioning skills and knowledge about 10 years ago and brought forth Dynamic LPARs with POWER4™ and then Advanced POWER Virtualization with POWER5™ in 2004 (which was re-branded to PowerVM™ in 2008).

There are several types of virtualization. In this chapter, we describe them in order to position the relative strengths of each and relate them to the systems virtualization offerings from IBM and IBM Business Partners.

Source : IBM Systems Virtualization : System, Application, Software

Tuesday, June 3, 2008

Building Windows Clusters

Hardware
Before starting, you need the following hardware and software. You need at least two computers running Windows NT (SP6) or Windows 2000, networked with some sort of LAN equipment (hub, switch, etc.). During Windows setup, ensure that TCP/IP and NetBEUI are installed and that the network is started, with all the network cards detected and the correct drivers installed. We will call these two computers a Windows cluster. Now you need software that will help you develop, deploy, and execute applications on this cluster. This software is the core of what makes a Windows cluster possible.

Software
The Message Passing Interface (MPI) is an evolving de facto standard for supporting cluster computing based on message passing. There are several implementations of this standard. In this article, we will use MPICH, which is freely available; you can download it from here for Windows clustering and find related documentation here. Please read Quick Start.pdf and the manual before starting the following steps.
Step 1: Download and unzip nt-mpich-1.3.0-a.zip onto any folder (for example C:\NT-MPICH) and share this folder with write permission.
Step 2: Copy all files with .dll extension from C:\NT-MPICH\lib to folder C:\Windows\system32
Step 3: Install the Cluster Manager Service on each host you want to use for remote execution of MPI processes. For installation, start rcluma-install.bat (located in subdirectory C:\NT-MPICH\bin) by double-clicking from local or network-drive. You must have administrator rights on the hosts to install the service.
Step 4: Repeat steps 1 and 2 for each node in the cluster (we will call each computer in the cluster a node).
Step 5: Now Start RexecShell (from folder C:\NT-MPICH\bin) by double-clicking it. Open the configuration dialog by pressing F2. The distribution contains a precompiled example MPI program named cpi.exe (located in NT-MPICH/bin). Choose it as the actual program. Make sure that each host can reach cpi.exe at the specified path. Choose ch_wsock as active plug-in. Select the hosts to compute on. On the tab 'Account', enter your username, domain and password, which need to be valid on each host chosen. Press OK to confirm your selections. The Start Button (from Window RexecShell) is now enabled and can be pressed to start cpi.exe on all chosen hosts. The output will be displayed in separate windows.

Source : http://www.devbuilder.org/article/24
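
For readers wondering what cpi.exe does: as far as I know, the cpi example shipped with MPICH estimates pi by numerical integration, with each process summing its own share of the intervals. The sketch below mimics that division of work in plain Python, without MPI. The "ranks" are simulated in a loop on one machine, and the function names are illustrative only, not part of NT-MPICH.

```python
import math


def partial_pi(rank, size, intervals):
    """Sum the midpoint-rule terms of 4/(1+x^2) assigned to one process."""
    h = 1.0 / intervals
    total = 0.0
    # Process `rank` takes intervals rank, rank+size, rank+2*size, ...
    for i in range(rank, intervals, size):
        x = h * (i + 0.5)
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(size, intervals=10_000):
    """Combine the partial sums, as a reduce step would on a real cluster."""
    return sum(partial_pi(rank, size, intervals) for rank in range(size))


approx = estimate_pi(size=4)
print(approx, abs(approx - math.pi))
```

On a real cluster each partial sum would be computed on a different node and the results combined with a message-passing reduce operation.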

Definition and Benefits from Clustering

Greg Pfister, in his wonderful book In Search of Clusters, defines a cluster as "a type of parallel or distributed system that: consists of a collection of interconnected whole computers, and is used as a single, unified computing resource." Therefore, a cluster is a group of computers bound together into a common resource pool. A given task can be executed on all computers or on any specific computer in the cluster. Let's look at the benefits of clustering:
Scientific applications: Enterprises running scientific applications on supercomputers can benefit from migrating to a more cost-effective Linux cluster.
Large ISPs and e-commerce enterprises with large databases: Internet service providers or e-commerce web sites that require high availability, load balancing, and scalability.
Graphics rendering and animation: Linux clusters have become important in the film industry for rendering quality graphics. In the movie Titanic, a Linux cluster was used to render the background in ocean scenes. The same concept was used in the movies True Lies and Interview with the Vampire.
One may also characterize clusters by their function:
Distributed processing clusters: Tasks (small pieces of executable code) are broken down and worked on by many small systems rather than one large system, often deployed for tasks previously handled by supercomputers. This type of cluster is very suitable for scientific or financial analysis.
Fail-over clusters: Clusters are used to increase the availability and serviceability of network services. When an application or server fails, its services are migrated to another system, and the identity of the failed system is migrated as well. Failover servers are used for database servers, mail servers, or file servers.
High availability load balancing clusters: A given application can run on all computers, and a given computer can host multiple applications. The "outside world" interacts with the cluster, and individual computers are "hidden". This type supports large cluster pools, and applications do not need to be specialized. High availability clustering works best with stateless applications that can be run concurrently.


Source : http://www.devbuilder.org/article/24

Tuesday, May 27, 2008

Cluster history

The history of cluster computing is best captured by a footnote in Greg Pfister's In Search of Clusters: "Virtually every press release from DEC mentioning clusters says 'DEC, who invented clusters...'. IBM didn't invent them either. Customers invented clusters, as soon as they couldn't fit all their work on one computer, or needed a backup. The date of the first is unknown, but I'd be surprised if it wasn't in the 1960's, or even late 1950's."

The formal engineering basis of cluster computing as a means of doing parallel work of any sort was arguably invented by Gene Amdahl of IBM, who in 1967 published what has come to be regarded as the seminal paper on parallel processing: Amdahl's Law. Amdahl's Law describes mathematically the speedup one can expect from parallelizing any given otherwise serially performed task on a parallel architecture. This article defined the engineering basis for both multiprocessor computing and cluster computing, where the primary differentiator is whether or not the interprocessor communications are supported "inside" the computer (on for example a customized internal communications bus or network) or "outside" the computer on a commodity network.
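
In its usual form, Amdahl's Law says that if a fraction p of a task can be parallelized and it runs on n processors, the speedup is 1 / ((1 - p) + p / n). The sketch below is a direct transcription of that formula; the function and parameter names are my own.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's Law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# A task that is 90% parallelizable, on 10 and then on 1000 processors.
print(round(amdahl_speedup(0.9, 10), 2))    # 5.26
print(round(amdahl_speedup(0.9, 1000), 2))  # 9.91
```

Even with 1000 processors the speedup stays below 10, because the 10% serial part always has to run on one processor.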

Consequently the history of early computer clusters is more or less directly tied into the history of early networks, as one of the primary motivations for the development of a network was to link computing resources, creating a de facto computer cluster. Packet switching networks were conceptually invented by the RAND corporation in 1962. Using the concept of a packet switched network, the ARPANET project succeeded in creating in 1969 what was arguably the world's first commodity-network based computer cluster by linking four different computer centers (each of which was something of a "cluster" in its own right, but probably not a commodity cluster). The ARPANET project grew into the Internet -- which can be thought of as "the mother of all computer clusters" (as the union of nearly all of the compute resources, including clusters, that happen to be connected). It also established the paradigm in use by all computer clusters in the world today -- the use of packet-switched networks to perform interprocessor communications between processor (sets) located in otherwise disconnected frames.

The development of customer-built and research clusters proceeded hand in hand with that of both networks and the Unix operating system from the early 1970s, as both TCP/IP and the Xerox PARC project created and formalized protocols for network-based communications. The Hydra operating system was built for a cluster of DEC PDP-11 minicomputers called C.mmp at C-MU in 1971. However, it wasn't until circa 1983 that the protocols and tools for easily doing remote job distribution and file sharing were defined (largely within the context of BSD Unix, as implemented by Sun Microsystems) and hence became generally available commercially, along with a shared filesystem.

The first commercial clustering product was ARCnet, developed by Datapoint in 1977. ARCnet wasn't a commercial success and clustering per se didn't really take off until DEC released their VAXcluster product in 1984 for the VAX/VMS operating system. The ARCnet and VAXcluster products not only supported parallel computing, but also shared file systems and peripheral devices. They were supposed to give you the advantage of parallel processing, while maintaining data reliability and uniqueness. VAXcluster, now VMScluster, is still available on OpenVMS systems from HP running on Alpha and Itanium systems.

Two other noteworthy early commercial clusters were the Tandem Himalaya (a circa 1994 high-availability product) and the IBM S/390 Parallel Sysplex (also circa 1994, primarily for business use).

No history of commodity compute clusters would be complete without noting the pivotal role played by the development of Parallel Virtual Machine (PVM) software in 1989. This open source software based on TCP/IP communications enabled the instant creation of a virtual supercomputer -- a high performance compute cluster -- made out of any TCP/IP connected systems. Free form heterogeneous clusters built on top of this model rapidly achieved total throughput in FLOPS that greatly exceeded that available even with the most expensive "big iron" supercomputers. PVM and the advent of inexpensive networked PCs led, in 1993, to a NASA project to build supercomputers out of commodity clusters. In 1995 came the invention of the "beowulf"-style cluster -- a compute cluster built on top of a commodity network for the specific purpose of "being a supercomputer" capable of performing tightly coupled parallel HPC computations. This in turn spurred the independent development of Grid computing as a named entity, although Grid-style clustering had been around at least as long as the Unix operating system and the Arpanet, whether or not it, or the clusters that used it, were named.
Taken from : http://www.clusterbuilder.org/pages/encyclopedia/alphabetized/c/computer-cluster.php

Monday, May 26, 2008

Clusters Are Available on the Internet

What is Cluster Computing?

Simply put, if you need more processing power you need more CPUs. You can get the CPUs from a service provider like TTI. Recently, the industry has used confusing terminology to describe this. Sometimes called "grid computing" or "utility computing", it is simply a way of efficiently utilizing the computational power of many servers (called nodes) for one task. The earliest known term for this is "cluster computing". We use Linux clusters to deliver the processing power. Clusters allow the greatest flexibility for us so that we can deliver the best service to you. Also, unlike grid or utility computing where the nodes can be separated geographically, our nodes are managed in one location - at our facility.

What can clusters be used for?

Just about any application. Typically they are used for computationally intensive problems that require a lot of runtime. Some problems require thousands of CPU hours to complete. Examples of these problems include: Rendering, Modeling, Quantum Mechanics, Bioinformatics, Molecular Dynamics, Statistics, Economics, Genetics, OCR, Fluid Dynamics, data processing and much more.

How does it work?

Does my application need to be designed for a cluster for me to use TTI cluster computing services?

No. Almost any application can be adapted to run on our cluster. A common use for a cluster is data processing. For example, if your workstation takes 10 hours to process your data set, it might take just 1 hour to process the same set using 10 nodes on the TTI cluster. The data is partitioned into 10 smaller units, each cluster node processes one unit. Our service makes it easy to do this.
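
The "partition the data into 10 units" idea can be sketched as follows. This is only an illustration of the principle, not TTI's actual service: the function names are invented, the per-unit work (a sum of squares) is a stand-in for real processing, and the units are processed in a loop here rather than on separate nodes.

```python
def partition(items, parts):
    """Split a list into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_unit(unit):
    """Stand-in for the real work one node would do on its unit."""
    return sum(x * x for x in unit)


data = list(range(1000))
units = partition(data, 10)                       # one unit per node
partial_results = [process_unit(u) for u in units]
print(sum(partial_results) == process_unit(data))  # True
```

The tenfold speedup in the example only holds when the units can be processed independently and the work is evenly divided.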

What do I need to use your service, how do I access the cluster?

You need an executable or source code. We also have a number of compilers available if you need to compile your code. You can also use just about any open source software.

To access the service, you need a Secure Shell (ssh) client. ssh is available free for Windows and comes with most Unix/Linux distributions.

I've never used a cluster before. How do I get help?
We work with you during your free trial period to make your service as easy to use as possible. All our services come with technical support. So, if you are a first-time Linux user or an expert, we want to provide you with hassle-free cluster computing services, so you feel no pressure or obligation.

How do I start?

Apply for a no obligation account. Or give us a call. We'd be happy to discuss your computing needs.
Source : http://www.tsunamictechnologies.com/how.htm

COD: Cluster-on-Demand

Clustering inexpensive computers is an effective way to obtain reliable, scalable computing power for network services and compute-intensive applications. Since clusters have a high initial cost of ownership, including space, power conditioning, and cooling equipment, leasing or sharing access to a common cluster is an attractive solution when demands vary over time. Shared clusters offer economies of scale and more effective use of resources by multiplexing.

Users of a shared cluster should be free to select the software environments that best support their needs. Cluster-on-Demand (COD) is a system to enable rapid, automated, on-the-fly partitioning of a physical cluster into multiple independent virtual clusters. A virtual cluster (vcluster) is a group of machines (physical or virtual) configured for a common purpose, with associated user accounts and storage resources, a user-specified software environment, and a private IP address block and DNS naming domain. COD vclusters are dynamic; their node allotments may change according to demand or resource availability.

COD was inspired by Oceano, an IBM Research project to automate a Web server farm. Like Oceano, COD leverages remote-boot technology to reconfigure cluster nodes using database-driven network installs from a set of user-specified configuration templates, under the direction of a policy-based resource manager. Emulab uses a similar approach to configure groups of nodes for network emulation experiments on a shared testbed. COD is complementary to both of these efforts: it decouples cluster management functions from network emulation, and adds a hierarchical framework for dynamic resource management that generalizes to multiple classes of cluster applications.

source : http://issg.cs.duke.edu/cod/

Virtual Cluster Markup Language (VCML)

On Friday, May 23, at 3:00 pm, I knocked on the door of my supervisor's room and opened it. "Good afternoon, sir," I said, and I went in, walked to his desk, and sat down. "Sir, I have collected these research papers." I showed him my collection. He started reading the abstracts one by one, and after reading the second paper, entitled 'Virtual Clusters', he asked me a question. "What is a virtual cluster?" he said.

I drew a machine (the hardware) and explained to him that on top of that machine we can put a thin layer such as VMware, and then on top of that layer we can install more than one OS; suppose we install three different ones (say Linux, Windows, and Mac). The thin layer provides a fake processor, memory, Ethernet card, and everything else. So from the outside, others will see three independent machines. Completely independent; that is why we call those three machines virtual machines. On top of those three virtual machines we can put another thin layer, such as openMosix, as a tool to make the three machines into a cluster. Because the machines that make up the cluster are not real machines (they are virtual machines), we can call it a virtual cluster. Others will see the three machines as a single machine. That is a virtual cluster.

"So what is the purpose of the virtual machine?" he asked me. "To increase the utilization of the machine," I said. "OK," he said, and then he thought. From his face I knew that he did not 100% agree. He asked again, "What if we install the same OS on the three machines?" "Yes, we can do that, sir." He thought hard, and then he explained a real cluster to me. "Imagine that there are 100 machines and we can make a cluster out of them. The machines are already connected. At some point we need a cluster that consists of 5 machines with such-and-such specifications. Without any change to the wiring, we can get the cluster we want. There is HTML, right? So why don't we create a VCML? With that language we could make a cluster as described in the VCML without doing any wiring." "Yes, sir," I said.

"Oh God, it is a very good idea," I said to him. "So there are two definitions of a virtual cluster: the first is my definition, and the second is yours. I never thought about it, sir." "I thought of it when you came in," he said. "Oh, very fast, sir." He smiled.

"What is the purpose of that, sir, the second definition?" "You don't understand. OK, I will send you to Chennai. Do you want to visit Chennai?" "Yes, sir," I said. "OK, I will arrange your departure. There is a group working on clusters there, so you can study clusters with them. You will be there for one week. I will send a message to my friend, the Head of Department in Chennai." "OK, sir."

"OK, in that case, meet me next week on the same day at the same time, and explain to me the different definitions of a virtual cluster. And at the end of this semester, at the DRC (Doctorate Review Committee), you have to have a problem." "OK, sir, see you next week."


Wednesday, May 21, 2008

A Cluster Computer and Its Architecture

A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource.
A computer node can be a single or multiprocessor system (PCs, workstations, or SMPs) with memory, I/O facilities, and an operating system. A cluster generally refers to two or more computers (nodes) connected together. The nodes can exist in a single cabinet or be physically separated and connected via a LAN. An interconnected (LAN-based) cluster of computers can appear as a single system to users and applications. Such a system can provide a cost-effective way to gain features and benefits (fast and reliable services) that have historically been found only on more expensive proprietary shared memory systems. The typical architecture of a cluster is shown in the figure above.

The following are some prominent components of cluster computers:
Multiple High Performance Computers (PCs, Workstations, or SMPs)
State-of-the-art Operating Systems (Layered or Micro-kernel based)
High Performance Networks/Switches (such as Gigabit Ethernet and Myrinet)
Network Interface Cards (NICs)
Fast Communication Protocols and Services (such as Active and Fast Messages)
Cluster Middleware (Single System Image (SSI) and System Availability Infrastructure)
  – Hardware (such as Digital (DEC) Memory Channel, hardware DSM, and SMP techniques)
  – Operating System Kernel or Gluing Layer (such as Solaris MC and GLUnix)
  – Applications and Subsystems
      – Applications (such as system management tools and electronic forms)
      – Run-time Systems (such as software DSM and parallel file-system)
      – Resource Management and Scheduling software (such as LSF (Load Sharing Facility) and CODINE (COmputing in DIstributed Net-worked Environments))
Parallel Programming Environments and Tools (such as compilers, PVM (Parallel Virtual Machine), and MPI (Message Passing Interface))
Applications
  – Sequential
  – Parallel or Distributed
The network interface hardware acts as a communication processor and is responsible for transmitting and receiving packets of data between cluster nodes via a network switch.
Communication software offers a means of fast and reliable data communication among cluster nodes and to the outside world. Often, clusters with a special network switch like Myrinet use communication protocols such as active messages for fast communication among its nodes. They potentially bypass the operating system and thus remove the critical communication overheads providing direct user-level access to the network interface.
The cluster nodes can work collectively as an integrated computing resource, or they can operate as individual computers. The cluster middleware is responsible for offering an illusion of a unified system image (single system image) and availability out of a collection of independent but interconnected computers.
Programming environments can offer portable, efficient, and easy-to-use tools for development of applications. They include message passing libraries, debuggers, and profilers. It should not be forgotten that clusters could be used for the execution of sequential or parallel applications.

Reference/Source
Parallel Programming Models and Paradigms
Luis Moura Silva and Rajkumar Buyya

Monday, May 19, 2008

High-Performance Computing (HPC)

High-Performance Computing (HPC) is a branch of computer science that focuses on developing supercomputers, parallel processing algorithms, and related software. HPC is important because of its lower cost and because it is implemented in sectors where distributed parallel computing is needed to:

• Solve large scientific problems
– Advanced product design
– Environmental studies (weather prediction and geological studies)
– Research
• Store and process large amounts of data
– Data mining
– Genomics research
– Internet engine search
– Image processing

High Availability (HA) clusters

HA clusters are not easily categorized. Indeed, we are sure that many people can offer valid reasons for why a different logical structure of organization would be appropriate. Our logical structure of organization is based on function. For example, we would organize a database cluster or a server consolidation cluster under the heading of an HA cluster, since their paramount design consideration is usually high availability.

In a typical HA cluster, there are two or more fairly robust machines which mirror each other’s functions. Two schemes are typically used to achieve this. In the first scheme, one machine is quietly watching the other machine and waiting to take over in case of a failure.

The other scheme allows both machines to be active. In this environment, care should be taken to keep the load below 50 percent on each box, or else there could be capacity issues if a node were to fail. These two nodes typically have a shared disk array attached via either a small computer system interface (SCSI) or Fibre Channel, so both nodes talk to the same storage.
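The 50 percent rule is simple arithmetic: if one of two nodes fails, the survivor has to carry both loads. The sketch below works that out in Python. The function names are made up for this illustration, and the extension to more than two nodes is an assumption (work spread evenly over the survivors), not something the source text states.

```python
def max_safe_utilization(nodes: int, tolerated_failures: int = 1) -> float:
    """Highest per-node load fraction the surviving nodes can still absorb."""
    if nodes <= tolerated_failures:
        raise ValueError("need more nodes than tolerated failures")
    return (nodes - tolerated_failures) / nodes


def survivor_load(per_node_load: float, nodes: int, failed: int = 1) -> float:
    """Load fraction on each survivor after `failed` nodes drop out,
    assuming the total work is redistributed evenly (an assumption)."""
    if failed >= nodes:
        raise ValueError("at least one node must survive")
    return per_node_load * nodes / (nodes - failed)
```

For two nodes, max_safe_utilization(2) is 0.5. Running each box at 40 percent leaves the survivor at 80 percent after a failure, while running each at 60 percent would demand 120 percent of one machine, which is the capacity issue described above.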

Or, instead of having both nodes talking to the same array, you can have two separate arrays that constantly replicate each other to provide for fault tolerance. Within this subsystem, it is necessary to guarantee data integrity with file and/or record locking. There must also be a management system in place allowing each system to monitor and control the other in order to detect an error. If there is a problem, one system must be able to incapacitate the other machine, thus preserving data integrity.
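The monitor-and-incapacitate idea can be sketched as a heartbeat check. This is only an illustration under stated assumptions: the class PeerMonitor, the timeout value, and the fence callback are all hypothetical, and a real HA product would use its own heartbeat channel and its own mechanism for cutting off the failed machine.

```python
import time
from typing import Callable


class PeerMonitor:
    """Tracks heartbeats from the partner node and fences it when they stop."""

    def __init__(self, timeout: float, fence: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout      # seconds of silence tolerated
        self.fence = fence          # hypothetical hook, e.g. power off the peer
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = clock()
        self.active = False         # True once this node has taken over

    def heartbeat(self) -> None:
        """Record that the peer has just reported in."""
        self.last_seen = self.clock()

    def check(self) -> bool:
        """Fence the peer and take over if it has been silent too long."""
        if not self.active and self.clock() - self.last_seen > self.timeout:
            self.fence()            # incapacitate the peer before touching shared data
            self.active = True
        return self.active
```

The order inside check is the point: the peer is fenced first and only then does this node become active, so two machines never write to the shared data at the same time.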

There are many ways of designing an HA cluster and the list is growing.

2 Categories of Clusters

All clusters basically fall into two broad categories: High Availability (HA) and High-Performance Computing (HPC). HA clusters strive to provide extremely reliable services. HPC is a cluster configuration designed to provide greater computational power than one computer alone could provide.

In The Beginning of Cluster

Over the years there have been dramatic increases in computing power and capabilities, but none so dramatic as recently. Early mathematical computations were facilitated by lines drawn in the sand. This eventually led to the abacus, the first mechanical device for assisting with mathematics. Much later came punch cards, a mechanical method to assist with tabulation. Ultimately, this led to ever more complex machines, mechanical and electronic, for computation.

Today, a small handheld calculator has more computing power than that available to the Apollo missions that went to the moon. Early computers used small toroids to store hundreds or thousands of bits of information in an area the size of a broom closet. Modern computers use silicon to store billions of bits of information in a space not much larger than a postage stamp.

But even as computers become more capable, certain constraints still arise. Early computers worked with 8 bits, or a byte, to solve problems. Most modern computers work with 32 bits at a time, with many dealing with 64 bits per operation, which is similar to increasing the width of a highway. Another method for increasing performance is to increase the clock speed, which is similar to raising the speed limits. So, modern computers are the equivalent of very wide highways with very fast speed limits.

However, there are limits to the performance benefits that can be achieved by simply increasing the clock speed or bus width. In this redbook, we present an alternative approach to increasing computing power. Instead of using one computer to solve a problem, why not use many computers, in concert, to solve the same problem?

Logical functions that a node can provide

As we stated before, a cluster is two or more (often many more) computers working as a single logical system to provide services. Though from the outside the cluster may look like a single system, the internal workings to make this happen can be quite complex.

This figure presents the logical functions that a physical node in a cluster can provide. Remember, these are logical functions; in some cases, multiple logical functions may reside on the same physical node, and in other cases, a logical function may be spread across multiple physical nodes.
Compute node
The compute node is where the real computing is performed. The majority of the nodes in a cluster are typically compute nodes. In order to provide an overall solution, a compute node can execute one or more tasks, based on the scheduling system.

Management node
Clusters are complex environments, and the management of the individual components is very important. The management node provides many capabilities, including:
  •  Monitoring the status of individual nodes
  •  Issuing management commands to individual nodes to correct problems or to provide commands to perform management functions, such as power on/off
You should not underestimate the importance of cluster management. It is imperative when trying to coordinate the activities of a large number of systems.

Install node
In most clusters, the compute nodes (and other nodes) may need to be reconfigured and/or reinstalled with a new image relatively often. The install node provides the images and the mechanism for easily and quickly installing or reinstalling software on the cluster nodes.

User node
Individual nodes of a cluster are often on a private network that cannot be accessed directly from the outside or corporate network. Even if they are accessible, most cluster nodes would not necessarily be configured to provide an optimal user interface. The user node is the one type of node that is configured to provide that interface for users (possibly on outside networks) who may gain access to the cluster to request that a job be run, or to access the results of a previously run job.

Control node
Control nodes provide services that help the other nodes in the cluster work together to obtain the desired result. Control nodes can provide two sets of functions:
  •  Dynamic Host Configuration Protocol (DHCP), Domain Name System (DNS), and other similar functions for the cluster. These functions enable the nodes to easily be added to the cluster and to ensure they can communicate with the other nodes.
  •  Scheduling what tasks are to be done by what compute nodes. For instance, if a compute node finishes one task and is available to do additional work, the control node may assign that node the next task requiring work.
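The scheduling function of a control node can be sketched as a queue of waiting tasks and a queue of idle compute nodes. The class and method names below are made up for this illustration, and first-come, first-served order is an assumption; real schedulers such as LSF apply far richer policies.

```python
from collections import deque
from typing import Deque, Dict, List


class ControlNode:
    """Hands waiting tasks to idle compute nodes, first come, first served."""

    def __init__(self, compute_nodes: List[str]) -> None:
        self.pending: Deque[str] = deque()          # tasks waiting for a node
        self.idle: Deque[str] = deque(compute_nodes)
        self.running: Dict[str, str] = {}           # node name -> task name

    def submit(self, task: str) -> None:
        """Queue a task and start it right away if a node is free."""
        self.pending.append(task)
        self._dispatch()

    def task_finished(self, node: str) -> None:
        """Mark a node as free again and give it the next waiting task."""
        del self.running[node]
        self.idle.append(node)
        self._dispatch()

    def _dispatch(self) -> None:
        # Pair tasks with nodes until one of the two queues runs dry.
        while self.pending and self.idle:
            node = self.idle.popleft()
            self.running[node] = self.pending.popleft()
```

With two compute nodes and three submitted tasks, the third task waits in pending until task_finished is called for one of the nodes, at which point that node picks it up.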
Storage node
For some applications that are run in a cluster, compute nodes must have fast, reliable, and simultaneous access to the storage system. This can be accomplished in a variety of ways depending on the specific requirements of the application. Storage devices may be directly attached to the nodes or connected only to a centralized node that is responsible for hosting the storage requests.

Introduction to Virtual Cluster

In its simplest form, a cluster is two or more computers that work together to provide a solution. This should not be confused with a more common client-server model of computing where an application may be logically divided such that one or more clients request services of one or more servers. The idea behind clusters is to join the computing powers of the nodes involved to provide higher scalability, more combined computing power, or to build in redundancy to provide higher availability. So rather than a simple client making requests of one or more servers, clusters utilize multiple machines to provide a more powerful computing environment through a single system image.

A High-Performance Computing cluster typically has a large number of computers (often called nodes) and, in general, most of these nodes would be configured identically. The idea is that the individual tasks that make up a parallel application should run equally well on whatever node they are dispatched on.

However, some nodes in a cluster often have some physical and logical differences. In the following sub-sections we discuss logical node functions and then physical node types.