Tuesday, May 27, 2008

Cluster history

The history of cluster computing is best captured by a footnote in Greg Pfister's In Search of Clusters: "Virtually every press release from DEC mentioning clusters says 'DEC, who invented clusters...'. IBM didn't invent them either. Customers invented clusters, as soon as they couldn't fit all their work on one computer, or needed a backup. The date of the first is unknown, but I'd be surprised if it wasn't in the 1960's, or even late 1950's."

The formal engineering basis of cluster computing as a means of doing parallel work of any sort was arguably invented by Gene Amdahl of IBM, who in 1967 published what has come to be regarded as the seminal paper on parallel processing: Amdahl's Law. Amdahl's Law describes mathematically the speedup one can expect from parallelizing any given, otherwise serially performed, task on a parallel architecture. This article defined the engineering basis for both multiprocessor computing and cluster computing, where the primary differentiator is whether the interprocessor communications are supported "inside" the computer (on, for example, a customized internal communications bus or network) or "outside" the computer on a commodity network.
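In its usual form (the notation here is ours, not from the original article), the law reads:

```latex
% Amdahl's Law: the speedup S achievable on N processors, where p is
% the fraction of the task (0 <= p <= 1) that can be parallelized.
S(N) = \frac{1}{(1 - p) + \frac{p}{N}},
\qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

The limit is the sobering part: a task that is 95% parallelizable can never be sped up by more than a factor of 20, no matter how many processors the cluster has.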

Consequently the history of early computer clusters is more or less directly tied to the history of early networks, as one of the primary motivations for the development of a network was to link computing resources, creating a de facto computer cluster. Packet switching networks were conceptually invented by the RAND Corporation in 1962. Using the concept of a packet-switched network, the ARPANET project succeeded in creating in 1969 what was arguably the world's first commodity-network based computer cluster by linking four different computer centers (each of which was something of a "cluster" in its own right, but probably not a commodity cluster). The ARPANET project grew into the Internet -- which can be thought of as "the mother of all computer clusters" (as the union of nearly all of the compute resources, including clusters, that happen to be connected). It also established the paradigm in use by all computer clusters in the world today -- the use of packet-switched networks to perform interprocessor communications between processor (sets) located in otherwise disconnected frames.

The development of customer-built and research clusters proceeded hand in hand with that of both networks and the Unix operating system from the early 1970s, as both TCP/IP and the Xerox PARC project created and formalized protocols for network-based communications. The Hydra operating system was built for a cluster of DEC PDP-11 minicomputers called C.mmp at C-MU in 1971. However, it wasn't until circa 1983 that the protocols and tools for easily doing remote job distribution and file sharing were defined (largely within the context of BSD Unix, as implemented by Sun Microsystems) and hence became generally available commercially, along with a shared filesystem.

The first commercial clustering product was ARCnet, developed by Datapoint in 1977. ARCnet wasn't a commercial success, and clustering per se didn't really take off until DEC released their VAXcluster product in 1984 for the VAX/VMS operating system. The ARCnet and VAXcluster products not only supported parallel computing, but also shared file systems and peripheral devices. They were intended to provide the advantages of parallel processing while maintaining data reliability and uniqueness. VAXcluster, now VMScluster, is still available on OpenVMS systems from HP running on Alpha and Itanium systems.

Two other noteworthy early commercial clusters were the Tandem Himalaya (a circa 1994 high-availability product) and the IBM S/390 Parallel Sysplex (also circa 1994, primarily for business use).

No history of commodity compute clusters would be complete without noting the pivotal role played by the development of Parallel Virtual Machine (PVM) software in 1989. This open source software based on TCP/IP communications enabled the instant creation of a virtual supercomputer -- a high performance compute cluster -- made out of any TCP/IP connected systems. Free-form heterogeneous clusters built on top of this model rapidly achieved total throughput in FLOPS that greatly exceeded that available even with the most expensive "big iron" supercomputers. PVM and the advent of inexpensive networked PCs led, in 1993, to a NASA project to build supercomputers out of commodity clusters. In 1995 came the invention of the "Beowulf"-style cluster -- a compute cluster built on top of a commodity network for the specific purpose of "being a supercomputer" capable of performing tightly coupled parallel HPC computations. This in turn spurred the independent development of Grid computing as a named entity, although Grid-style clustering had been around at least as long as the Unix operating system and the ARPANET, whether or not it, or the clusters that used it, were named.
Taken from : http://www.clusterbuilder.org/pages/encyclopedia/alphabetized/c/computer-cluster.php

Monday, May 26, 2008

Clusters are available on the Internet

What is Cluster Computing?

Simply put, if you need more processing power, you need more CPUs. You can get the CPUs from a service provider like TTI. Recently, the industry has used confusing terminology to describe this. Sometimes called "grid computing" or "utility computing", it is simply a way of efficiently utilizing the computational power of many servers (called nodes) for one task. The earliest known term for this is "cluster computing". We use Linux clusters to deliver the processing power. Clusters allow the greatest flexibility for us so that we can deliver the best service to you. Also, unlike grid or utility computing where the nodes can be separated geographically, our nodes are managed in one location - at our facility.

What can clusters be used for?

Just about any application. Typically they are used for computationally intensive problems that require a lot of runtime. Some problems require thousands of CPU hours to complete. Examples of these problems include: Rendering, Modeling, Quantum Mechanics, Bioinformatics, Molecular Dynamics, Statistics, Economics, Genetics, OCR, Fluid Dynamics, data processing and much more.

How does it work?

Does my application need to be designed for a cluster for me to use TTI cluster computing services?

No. Almost any application can be adapted to run on our cluster. A common use for a cluster is data processing. For example, if your workstation takes 10 hours to process your data set, it might take just 1 hour to process the same set using 10 nodes on the TTI cluster. The data is partitioned into 10 smaller units, and each cluster node processes one unit. Our service makes it easy to do this.
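As a rough sketch of how such a partitioning could be scripted (the file name, chunk naming, and node count below are our own illustration, not part of TTI's actual service):

```python
# Sketch: split a data set into one chunk per cluster node.
# "dataset.dat" and the chunk file names are hypothetical.

def partition(input_path, num_nodes=10):
    """Split the lines of input_path into num_nodes chunk files."""
    with open(input_path) as f:
        lines = f.readlines()
    chunk_size = (len(lines) + num_nodes - 1) // num_nodes  # ceiling division
    chunk_paths = []
    for i in range(num_nodes):
        chunk_path = "chunk_%02d.dat" % i
        with open(chunk_path, "w") as out:
            out.writelines(lines[i * chunk_size:(i + 1) * chunk_size])
        chunk_paths.append(chunk_path)
    return chunk_paths

if __name__ == "__main__":
    # Each resulting chunk file would then be processed by one node.
    print(partition("dataset.dat"))
```

Each node runs the same executable over its own chunk, and the partial results are merged afterwards; that is what makes the 10-hours-to-1-hour arithmetic work for problems that split cleanly.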

What do I need to use your service, how do I access the cluster?

You need an executable or source code. We also have a number of compilers available if you need to compile your code. You can also use just about any open source software.

To access the service, you need a Secure Shell (ssh) client. ssh is available free for Windows and comes with most Unix/Linux distributions.
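For instance, a command could be run on the cluster from a script along these lines (the hostname and username are placeholders, not real TTI addresses):

```python
import subprocess

# Run one command on the cluster's login node over ssh.
# "user@cluster.example.com" is a placeholder address.
result = subprocess.run(
    ["ssh", "user@cluster.example.com", "uname -a"],
    capture_output=True,
    text=True,
)
print(result.stdout)
```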

I've never used a cluster before. How do I get help?
We work with you during your free trial period to make the service as easy to use as possible. All our services come with technical support. So, whether you are a first-time Linux user or an expert, we want to provide you with hassle-free cluster computing services, with no pressure or obligation.

How do I start?

Apply for a no obligation account. Or give us a call. We'd be happy to discuss your computing needs.
Source : http://www.tsunamictechnologies.com/how.htm

COD: Cluster-on-Demand

Clustering inexpensive computers is an effective way to obtain reliable, scalable computing power for network services and compute-intensive applications. Since clusters have a high initial cost of ownership, including space, power conditioning, and cooling equipment, leasing or sharing access to a common cluster is an attractive solution when demands vary over time. Shared clusters offer economies of scale and more effective use of resources by multiplexing.

Users of a shared cluster should be free to select the software environments that best support their needs. Cluster-on-Demand (COD) is a system to enable rapid, automated, on-the-fly partitioning of a physical cluster into multiple independent virtual clusters. A virtual cluster (vcluster) is a group of machines (physical or virtual) configured for a common purpose, with associated user accounts and storage resources, a user-specified software environment, and a private IP address block and DNS naming domain. COD vclusters are dynamic; their node allotments may change according to demand or resource availability.

COD was inspired by Oceano, an IBM Research project to automate a Web server farm. Like Oceano, COD leverages remote-boot technology to reconfigure cluster nodes using database-driven network installs from a set of user-specified configuration templates, under the direction of a policy-based resource manager. Emulab uses a similar approach to configure groups of nodes for network emulation experiments on a shared testbed. COD is complementary to both of these efforts: it decouples cluster management functions from network emulation, and adds a hierarchical framework for dynamic resource management that generalizes to multiple classes of cluster applications.

source : http://issg.cs.duke.edu/cod/

Virtual Cluster Markup Language (VCML)

Friday, May 23, at 3.00 pm, I knocked on my supervisor's door and opened it. I said "Good afternoon, Sir", went in to his table, and sat down. "Sir, I have collected these research papers," I said, and showed him my collection. He started reading the abstracts one by one, and after reading the second paper, entitled 'Virtual Clusters', he asked me a question: "What is a virtual cluster?"

I drew a machine, and I explained to him that on top of that hardware we can put a thin layer such as VMware, and then on top of it we can install more than one OS; suppose we install three different OSes (say Linux, Windows, and Mac) on top of that thin layer. In that case the thin layer provides a virtual processor, memory, Ethernet card, and everything else, so from the outside others will see the three machines as three independent machines. They are completely independent, which is why we call each of them a virtual machine. On top of those three virtual machines we can put a thin layer such as OpenMosix as a tool to make the three machines into a cluster. Because the machines that make up the cluster are not real machines (they are virtual machines), we can call it a virtual cluster. Others will see these three machines as a single machine. That is a virtual cluster.

"So what is the purpose of the virtual machine?" he asked me. "To increase the utilization of the machine" I said. OK he said and than he think. From his face I knew that he did not 100% agree. He asked again, "how if we install the same OS in the three machine". "Yes we can do it sir". He thank hard, and he explained me about real cluster. Imagine that there are 100 machine and we can do make a cluster with that machine. That machine already connected. One time we need a cluster that consist of 5 machine with specification this, this, this. Without any change of the wiring we can see the cluster as we want it. So there is a HTML right, why if wwe create a VCML, means that with that language we can make a cluster as what mentioned in the VCML without doing any wiring". "Yes sir" I said.

"Oh God, it is a very good idea", I said to him. "So there are two definition of virtual cluster, number one as my definition, and the second one is your definition, I never think about it sir". "When you came I thank about it", he said. "Oh, very fast sir", he smiled.

"What is the purpose of that sir, the second definition?". "You don't understand, OK I wil send you to Chennai, do you want to visit Chennai". "Yes sir I said". "OK I will arrange your departure, because there is a group about cluster, than you can study about cluster over there, you will be there for one week, I will send a message to my friend the Head of Department in Chennai" OK sir

"OK, in this case, meet me next week in the same day and time, and you explain me about different definition about virtual cluster. And in the and of this semester in DRC (Doctorate Review Committee) you have to have a problem". "OK sir, see you next week"


Wednesday, May 21, 2008

A Cluster Computer and Its Architecture

A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource.
A computer node can be a single or multiprocessor system (PCs, workstations, or SMPs) with memory, I/O facilities, and an operating system. A cluster generally refers to two or more computers (nodes) connected together. The nodes can exist in a single cabinet or be physically separated and connected via a LAN. An interconnected (LAN-based) cluster of computers can appear as a single system to users and applications. Such a system can provide a cost-effective way to gain features and benefits (fast and reliable services) that have historically been found only on more expensive proprietary shared memory systems. The typical architecture of a cluster is shown in the figure above.

The following are some prominent components of cluster computers:
• Multiple High Performance Computers (PCs, Workstations, or SMPs)
• State-of-the-art Operating Systems (Layered or Micro-kernel based)
• High Performance Networks/Switches (such as Gigabit Ethernet and Myrinet)
• Network Interface Cards (NICs)
• Fast Communication Protocols and Services (such as Active and Fast Messages)
• Cluster Middleware (Single System Image (SSI) and System Availability Infrastructure)
  – Hardware (such as Digital (DEC) Memory Channel, hardware DSM, and SMP techniques)
  – Operating System Kernel or Gluing Layer (such as Solaris MC and GLUnix)
  – Applications and Subsystems
    – Applications (such as system management tools and electronic forms)
    – Run-time Systems (such as software DSM and parallel file-system)
    – Resource Management and Scheduling software (such as LSF (Load Sharing Facility) and CODINE (COmputing in DIstributed Networked Environments))
• Parallel Programming Environments and Tools (such as compilers, PVM (Parallel Virtual Machine), and MPI (Message Passing Interface))
• Applications (Sequential, Parallel, or Distributed)
The network interface hardware acts as a communication processor and is responsible for transmitting and receiving packets of data between cluster nodes via a network switch.
Communication software offers a means of fast and reliable data communication among cluster nodes and to the outside world. Often, clusters with a special network switch like Myrinet use communication protocols such as active messages for fast communication among their nodes. These protocols can bypass the operating system, removing the critical communication overheads by providing direct user-level access to the network interface.
The cluster nodes can work collectively as an integrated computing resource, or they can operate as individual computers. The cluster middleware is responsible for offering an illusion of a unified system image (single system image) and availability out of a collection of independent but interconnected computers.
Programming environments can offer portable, efficient, and easy-to-use tools for development of applications. They include message passing libraries, debuggers, and profilers. It should not be forgotten that clusters could be used for the execution of sequential or parallel applications.
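As a small taste of the message-passing style these libraries offer, here is a minimal sketch using mpi4py, a Python binding for MPI (this example is ours, not from the book, and assumes mpi4py and an MPI runtime are installed):

```python
from mpi4py import MPI  # requires mpi4py and an MPI runtime

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's id within the parallel job
size = comm.Get_size()   # total number of processes in the job

if rank == 0:
    # The root process sends a greeting to every other process.
    for dest in range(1, size):
        comm.send("hello node %d" % dest, dest=dest)
else:
    msg = comm.recv(source=0)
    print("rank %d of %d received: %s" % (rank, size, msg))
```

Launched with, for example, mpirun -np 4 python hello.py, the same program text runs on every node and each process learns its own rank -- the essence of the message-passing model.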

Reference/Source
Parallel Programming Models and Paradigms
Luis Moura Silva and Rajkumar Buyya

Monday, May 19, 2008

High-Performance Computing (HPC)

High-Performance Computing (HPC) is a branch of computer science that focuses on developing supercomputers, parallel processing algorithms, and related software. HPC is important because of its lower cost relative to traditional supercomputers and because it is used in sectors where distributed parallel computing is needed to:

• Solve large scientific problems
– Advanced product design
– Environmental studies (weather prediction and geological studies)
– Research
• Store and process large amounts of data
– Data mining
– Genomics research
– Internet search engines
– Image processing

High Availability (HA) clusters

HA clusters are not easily categorized. Indeed, we are sure that many people can offer valid reasons for why a different logical structure of organization would be appropriate. Our logical structure of organization is based on function. For example, we would organize a database cluster or a server consolidation cluster under the heading of an HA cluster, since their paramount design consideration is usually high availability.

In a typical HA cluster, there are two or more fairly robust machines which mirror each other’s functions. Two schemes are typically used to achieve this. In the first scheme, one machine is quietly watching the other machine and waiting to take over in case of a failure.

The other scheme allows both machines to be active. In this environment, care should be taken to keep the load below 50 percent on each box, or else there could be capacity issues if a node were to fail. These two nodes typically share a disk drive array, attached via either a Small Computer System Interface (SCSI) bus or Fibre Channel; both nodes talk to the same disk array.

Or, instead of having both nodes talking to the same array, you can have two separate arrays that constantly replicate each other to provide for fault tolerance. Within this subsystem, it is necessary to guarantee data integrity with file and/or record locking. There must also be a management system in place allowing each system to monitor and control the other in order to detect an error. If there is a problem, one system must be able to incapacitate the other machine, thus preserving data integrity.
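As a toy illustration of the mutual-monitoring idea (the peer address, port, and thresholds below are invented, and a real HA product must also handle fencing and data integrity, which this sketch ignores):

```python
import socket
import time

PEER = ("peer.example.com", 9999)  # hypothetical partner node address
TIMEOUT_S = 2.0
MISSES_BEFORE_FAILOVER = 3

def peer_alive():
    """One heartbeat: try to open a TCP connection to the peer."""
    try:
        with socket.create_connection(PEER, timeout=TIMEOUT_S):
            return True
    except OSError:
        return False

misses = 0
while True:
    if peer_alive():
        misses = 0
    else:
        misses += 1
        if misses >= MISSES_BEFORE_FAILOVER:
            # A real cluster would fence the failed node here, then
            # take over its services and IP address.
            print("peer unreachable -- initiating takeover")
            break
    time.sleep(1.0)
```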

There are many ways of designing an HA cluster and the list is growing.

2 Categories of Clusters

All clusters basically fall into two broad categories: High Availability (HA) and High-Performance Computing (HPC). HA clusters strive to provide extremely reliable services. An HPC cluster is a configuration designed to deliver greater computational power than one computer alone could provide.

In the Beginning of Clusters

Over the years there have been dramatic increases in computing power and capabilities, but none so dramatic as recently. Early mathematical computations were facilitated by lines drawn in the sand. This eventually led to the abacus, the first mechanical device for assisting with mathematics. Much later came punch cards, a mechanical method to assist with tabulation. Ultimately, this led to ever more complex machines, mechanical and electronic, for computation.

Today, a small handheld calculator has more computing power than that available to the Apollo missions that went to the moon. Early computers used small toroids to store hundreds or thousands of bits of information in an area the size of a broom closet. Modern computers use silicon to store billions of bits of information in a space not much larger than a postage stamp.

But even as computers become more capable, certain constraints still arise. Early computers worked with 8 bits, or a byte, to solve problems. Most modern computers work with 32 bits at a time, with many dealing with 64 bits per operation, which is similar to increasing the width of a highway. Another method for increasing performance is to increase the clock speed, which is similar to raising the speed limits. So, modern computers are the equivalent of very wide highways with very fast speed limits.

However, there are limits to the performance benefits that can be achieved by simply increasing the clock speed or bus width. In this redbook, we present an alternative approach to increasing computing power. Instead of using one computer to solve a problem, why not use many computers, in concert, to solve the same problem?

Logical functions that a node can provide

As we stated before, a cluster is two or more (often many more) computers working as a single logical system to provide services. Though from the outside the cluster may look like a single system, the internal workings to make this happen can be quite complex.

The following subsections present the logical functions that a physical node in a cluster can provide. Remember, these are logical functions; in some cases, multiple logical functions may reside on the same physical node, and in other cases, a logical function may be spread across multiple physical nodes.
Compute node
The compute node is where the real computing is performed. The majority of the nodes in a cluster are typically compute nodes. In order to provide an overall solution, a compute node can execute one or more tasks, based on the scheduling system.

Management node
Clusters are complex environments, and the management of the individual components is very important. The management node provides many capabilities, including:
  •  Monitoring the status of individual nodes
  •  Issuing management commands to individual nodes to correct problems or to provide commands to perform management functions, such as power on/off
You should not underestimate the importance of cluster management. It is imperative when trying to coordinate the activities of a large number of systems.

Install node
In most clusters, the compute nodes (and other nodes) may need to be reconfigured and/or reinstalled with a new image relatively often. The install node provides the images and the mechanism for easily and quickly installing or reinstalling software on the cluster nodes.

User node
Individual nodes of a cluster are often on a private network that cannot be accessed directly from the outside or corporate network. Even if they are accessible, most cluster nodes would not necessarily be configured to provide an optimal user interface. The user node is the one type of node that is configured to provide that interface for users (possibly on outside networks) who may gain access to the cluster to request that a job be run, or to access the results of a previously run job.

Control node
Control nodes provide services that help the other nodes in the cluster work together to obtain the desired result. Control nodes can provide two sets of functions:
  •  Dynamic Host Configuration Protocol (DHCP), Domain Name System (DNS), and other similar functions for the cluster. These functions enable the nodes to easily be added to the cluster and to ensure they can communicate with the other nodes.
  •  Scheduling what tasks are to be done by what compute nodes. For instance, if a compute node finishes one task and is available to do additional work, the control node may assign that node the next task requiring work, as sketched below.
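Reduced to its essence, that scheduling function is a queue of tasks matched against a pool of idle nodes. The following deliberately simplified sketch (our own illustration, not any real scheduler's API) shows the idea:

```python
from collections import deque

# Simplified view of a control node's scheduler: tasks wait in a
# queue, and whichever compute node reports idle gets the next one.
tasks = deque(["task-A", "task-B", "task-C", "task-D"])
idle_nodes = deque(["node1", "node2"])

def dispatch():
    while tasks and idle_nodes:
        node, task = idle_nodes.popleft(), tasks.popleft()
        print("assigning %s to %s" % (task, node))  # send task to node

def node_finished(node):
    """Called when a compute node reports that it is free again."""
    idle_nodes.append(node)
    dispatch()

dispatch()              # assigns task-A and task-B
node_finished("node1")  # node1 frees up and receives task-C
node_finished("node2")  # node2 frees up and receives task-D
```
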
Storage node
For some applications that are run in a cluster, compute nodes must have fast, reliable, and simultaneous access to the storage system. This can be accomplished in a variety of ways depending on the specific requirements of the application. Storage devices may be directly attached to the nodes or connected only to a centralized node that is responsible for hosting the storage requests.

Introduction to Virtual Cluster

In its simplest form, a cluster is two or more computers that work together to provide a solution. This should not be confused with a more common client-server model of computing where an application may be logically divided such that one or more clients request services of one or more servers. The idea behind clusters is to join the computing powers of the nodes involved to provide higher scalability, more combined computing power, or to build in redundancy to provide higher availability. So rather than a simple client making requests of one or more servers, clusters utilize multiple machines to provide a more powerful computing environment through a single system image.

A High-Performance Computing cluster typically has a large number of computers (often called nodes) and, in general, most of these nodes would be configured identically. The idea is that the individual tasks that make up a parallel application should run equally well on whatever node they are dispatched on.

However, some nodes in a cluster often have some physical and logical differences. In the following sub-sections we discuss logical node functions and then physical node types.