Simple, inexpensive cloud computing

By Uwe Meding

How to build a cloud computing development environment using commercial off-the-shelf hardware and open-source software. This is a small but powerful setup for cloud-computing “simulations” and for exploring different system configurations. It provides an environment that covers most enterprise software development and deployment needs:

web server + several web services
distributed application servers
database server
shared file infrastructure
small Hadoop cluster

Design
The two critical aspects we need to keep simple are the hardware and the networking setup. Either can be upgraded later, or for a real deployment, without affecting the core purpose of the data center. The exact choice of motherboard and other hardware is not critical, but it should be desktop- or server-class. The key is to select mainstream components: they tend to be more reliable and less fussy, and they generally have better support.

Data center overview

Every system has two NICs, which gives us two independent network paths. Each network is aggregated by its own independent switch. (If your motherboard offers only one NIC, simply add a network card.)

The two network paths are useful both for separating data-access concerns and for performance.
For example:

the database server can serve data across the internal network, while internet traffic is handled on the external network.
the Hadoop cluster uses the internal network to run, and the external network to present the results.
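To make the two-path idea concrete, here is a minimal address-plan sketch. The article does not specify any addressing. The subnets (10.0.0.0/24 internal, 192.168.1.0/24 external), the host names and the numbering scheme are assumptions for illustration only. Each host gets the same host number on both networks, which makes it easy to see which two addresses belong to one machine.

```python
import ipaddress

# Assumed subnets: the article does not specify any addressing.
INTERNAL = ipaddress.ip_network("10.0.0.0/24")
EXTERNAL = ipaddress.ip_network("192.168.1.0/24")

# Hypothetical host names for the servers in the rack.
HOSTS = ["web1", "app1", "app2", "hadoop1", "hadoop2",
         "hadoop3", "hadoop4", "hadoop5", "store1", "store2"]


def address_plan(hosts, first_host=10):
    """Assign each host one address per network, using the same host number on both."""
    # Usable host numbers exclude the network (0) and broadcast (last) addresses.
    limit = min(INTERNAL.num_addresses, EXTERNAL.num_addresses) - 1
    plan = {}
    for offset, name in enumerate(hosts):
        host_number = first_host + offset
        if not 0 < host_number < limit:
            raise ValueError(f"{name}: host number {host_number} does not fit the subnets")
        plan[name] = {
            "internal": INTERNAL.network_address + host_number,
            "external": EXTERNAL.network_address + host_number,
        }
    return plan


if __name__ == "__main__":
    for name, addrs in address_plan(HOSTS).items():
        print(f"{name:8} internal={addrs['internal']}  external={addrs['external']}")
```

With these assumptions, `web1` ends up at 10.0.0.10 and 192.168.1.10, and `store2` at 10.0.0.19 and 192.168.1.19.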

Database systems
The database systems use the same hardware as the other servers, except that they have three NICs.

Database system connections

Two database systems are a sufficient (experimental) setup for most enterprise scenarios.
The easiest way to create a network between the two systems is a cross-over cable. That way we do not need an additional switch, and we do not burden the other switches with the extra network traffic.

The separation of the traffic puts us in a great position to experiment with:

high-availability (HA) database setups
redundant database/file server setups
fail-over tests
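The cross-over link needs its own small subnet, separate from both switched networks. A /30 has exactly two usable addresses, one for each database system. The sketch below assumes 10.0.99.0/30 for the link and uses placeholder internal and external subnets only to check for overlaps. None of these addresses come from the article.

```python
import ipaddress

# Assumed subnets (placeholders, not from the article).
INTERNAL = ipaddress.ip_network("10.0.0.0/24")
EXTERNAL = ipaddress.ip_network("192.168.1.0/24")
REPLICATION_LINK = ipaddress.ip_network("10.0.99.0/30")


def point_to_point_pair(link, *other_networks):
    """Return the two usable addresses of a point-to-point link subnet.

    Raises ValueError if the link overlaps another network or does not
    have exactly two usable host addresses.
    """
    for net in other_networks:
        if link.overlaps(net):
            raise ValueError(f"{link} overlaps {net}")
    usable = list(link.hosts())
    if len(usable) != 2:
        raise ValueError(f"{link} has {len(usable)} usable addresses, expected 2")
    return usable[0], usable[1]


db1_addr, db2_addr = point_to_point_pair(REPLICATION_LINK, INTERNAL, EXTERNAL)
print(db1_addr, db2_addr)  # 10.0.99.1 10.0.99.2
```

Using a separate subnet keeps the replication traffic on the cable and off both switches, which is the point of the cross-over setup.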

Software

The operating system and operating environment are based on the CentOS 5.x or 6.x Linux distributions. Depending on the role of each system in the data center, we use different software and configurations:

Web server: Apache httpd, PHP, etc.
Application server: Java, Tomcat, JBoss, etc.
Compute servers: Hadoop software
Storage server: database software (MySQL, etc.), file servers (NFS, etc.)

This is where we get to leverage the different networks to separate data and compute concerns, for example:

the storage server offers the database connection on the internal network, and the file storage on the external network
the application server uses the database connectivity on the internal network and serves the application on the external network.
the compute servers connect to the database and carry all the Hadoop-internal traffic on the internal network.
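One way to keep this split consistent is to record it as data and derive each service's listen address from it. The sketch below is illustrative only. The host addresses are hypothetical, and the ports are common defaults (3306 for MySQL, 2049 for NFS, 8080 for Tomcat's HTTP connector), not values taken from this setup. In practice the result would go into settings such as MySQL's `bind-address`.

```python
import ipaddress

# Hypothetical per-host addresses; the article does not give any.
HOST_ADDRESSES = {
    "app1": {"internal": "10.0.0.11", "external": "192.168.1.11"},
    "store1": {"internal": "10.0.0.18", "external": "192.168.1.18"},
}

# Which network each service listens on, following the split described above.
# Ports are common defaults and only an assumption about this setup.
ROLE_SERVICES = {
    "storage": [("mysql", "internal", 3306), ("nfs", "external", 2049)],
    "application": [("tomcat-http", "external", 8080)],
}


def listen_endpoints(host, role):
    """Map each of the role's services to the (address, port) it should bind to."""
    addresses = HOST_ADDRESSES[host]
    return {
        service: (ipaddress.ip_address(addresses[network]), port)
        for service, network, port in ROLE_SERVICES[role]
    }


print(listen_endpoints("store1", "storage"))
```

The application server's own database connection is a client connection, so it simply targets the storage server's internal address.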

Database systems

The database systems are configured like all the other systems in the cluster. The obvious difference is that they run the actual database software (MySQL in my case). To achieve redundancy, install and configure DRBD (Distributed Replicated Block Device) on both systems. DRBD is at the core of many highly available systems: it copies data between the two systems in near real time. We use the cross-over connection for this replication because:

the data traffic is “private” as far as the database systems are concerned
there is no need to burden one of the other switches with this traffic
a good, short cable helps the link run at close to the maximum speed of the network cards
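DRBD offers several replication protocols. With protocol C, a write completes only once it has reached the disks of both nodes. The Python sketch below is a toy model of that idea, not DRBD code, and all class and method names are hypothetical. Two dictionaries stand in for the replicated block devices. While the peer is unreachable, the primary keeps writing and records out-of-sync blocks, loosely like DRBD's bitmap, so a later resync copies only those blocks.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One database system's replicated disk: block number -> data (toy model)."""
    blocks: dict = field(default_factory=dict)
    reachable: bool = True


@dataclass
class SyncMirror:
    """Toy synchronous mirror in the spirit of DRBD protocol C (not DRBD code).

    A write counts as fully replicated only after both the primary and the
    peer hold it. While the peer is unreachable, the primary keeps accepting
    writes and records which blocks are out of sync, so a resync can copy
    just those blocks instead of the whole disk.
    """
    primary: Node
    peer: Node
    out_of_sync: set = field(default_factory=set)

    def write(self, block: int, data: bytes) -> bool:
        self.primary.blocks[block] = data
        if self.peer.reachable:
            # Stands in for "send over the cross-over link and wait for the peer's ack".
            self.peer.blocks[block] = data
            return True  # replicated on both nodes
        self.out_of_sync.add(block)
        return False  # written locally only; the mirror is degraded

    def resync(self) -> int:
        """Copy out-of-sync blocks to the peer; return how many were copied."""
        if not self.peer.reachable:
            return 0
        for block in sorted(self.out_of_sync):
            self.peer.blocks[block] = self.primary.blocks[block]
        copied = len(self.out_of_sync)
        self.out_of_sync.clear()
        return copied


if __name__ == "__main__":
    db1, db2 = Node(), Node()
    mirror = SyncMirror(primary=db1, peer=db2)
    print(mirror.write(1, b"row-1"))  # True
    db2.reachable = False             # simulate pulling the cross-over cable
    print(mirror.write(2, b"row-2"))  # False
    db2.reachable = True
    print(mirror.resync())            # 1
    print(db1.blocks == db2.blocks)   # True
```

Pulling the cross-over cable in a test like this is essentially the fail-over experiment described earlier, only without real disks.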

This is a very high-level overview that only outlines the “lay of the land” and barely scratches the surface. The configuration and setup of each item are interesting enough to deserve separate blog entries altogether.

Firewall/Router

The firewall can run on a “small” system, e.g. an Intel Atom or Jetway-sized board. The important thing to remember is that this system needs two NICs: one to serve the WAN side of the network traffic and one to serve the LAN side.

The software for this system is either m0n0wall or pfSense. Both are open-source and offer capabilities comparable to commercial firewalls. At its heart, the firewall manages network access and traffic (port forwards, traffic shaping, etc.). It also has additional features we can use to set up and deploy our software, for example:

using the load-balancing feature, we can scale application services across multiple systems
using the fail-over feature, we can keep the servers and software highly available
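The idea behind these two features can be shown with a toy round-robin pool. Requests rotate across the application servers, and a server marked down is skipped until it comes back. This is a conceptual sketch with hypothetical names, not the firewall's actual implementation. A real load balancer decides up/down status from periodic health checks.

```python
import itertools


class RoundRobinPool:
    """Toy round-robin pool that skips members currently marked down."""

    def __init__(self, members):
        self.members = list(members)
        self.down = set()
        self._cycle = itertools.cycle(self.members)

    def mark_down(self, member):
        """Take a member out of rotation (e.g. after a failed health check)."""
        self.down.add(member)

    def mark_up(self, member):
        """Put a recovered member back into rotation."""
        self.down.discard(member)

    def next_member(self):
        """Return the next healthy member, trying each member at most once."""
        for _ in range(len(self.members)):
            member = next(self._cycle)
            if member not in self.down:
                return member
        raise RuntimeError("no healthy members in the pool")


if __name__ == "__main__":
    pool = RoundRobinPool(["app1", "app2"])  # hypothetical application servers
    print([pool.next_member() for _ in range(4)])  # ['app1', 'app2', 'app1', 'app2']
    pool.mark_down("app2")                          # fail-over: app2 stops responding
    print([pool.next_member() for _ in range(3)])  # ['app1', 'app1', 'app1']
```

Load balancing and fail-over are two sides of the same mechanism here: rotation spreads the load, and skipping down members keeps the service available.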

Implementation

Front view

My implementation of this data center architecture consists of 10 systems, divided into:

1 web server
2 Java application servers
5 Hadoop compute servers
2 storage servers (database and files)

The enclosures are actually (way) too large for each server's motherboard and hardware; however, they were very easy to mount in a simple rack frame.

Wiring

This is a picture from the back of the server rack. You can clearly see the internal (yellow) versus the external (blue) network wiring. I designated the on-board NIC as part of the external network and the additional NIC as part of the internal network. Obviously, this assignment is completely arbitrary.

Shopping list

All of the items are available online:

server systems: Gigabyte motherboard, 8 GB RAM, 500 GB hard drive, 1 additional NIC
storage server: Gigabyte motherboard, 16 GB RAM, 4 TB storage, 2 additional NICs
firewall system: Jetway motherboard with 2 NICs
ancillary items such as cables, enclosures, etc.
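A quick tally of the shopping list helps with sizing switches and the budget. The sketch assumes that the eight non-storage servers (web, application, Hadoop) all use the “server systems” configuration. It also assumes each board has one on-board NIC, counts 4 TB as 4000 GB, and leaves the firewall out.

```python
from typing import NamedTuple


class Role(NamedTuple):
    name: str
    count: int
    ram_gb: int
    disk_gb: int
    nics: int  # on-board NIC plus add-in cards


# Assumption: web, application and Hadoop servers use the "server systems"
# configuration; 4 TB is counted as 4000 GB. The firewall is not included.
RACK = [
    Role("web", 1, 8, 500, 2),
    Role("application", 2, 8, 500, 2),
    Role("hadoop", 5, 8, 500, 2),
    Role("storage", 2, 16, 4000, 3),
]


def totals(roles):
    """Sum systems, RAM, disk and NICs across all roles."""
    return {
        "systems": sum(r.count for r in roles),
        "ram_gb": sum(r.count * r.ram_gb for r in roles),
        "disk_gb": sum(r.count * r.disk_gb for r in roles),
        "nics": sum(r.count * r.nics for r in roles),
    }


print(totals(RACK))  # {'systems': 10, 'ram_gb': 96, 'disk_gb': 12000, 'nics': 22}
```

Under these assumptions, 20 of the 22 NICs go to the two switches (ten ports each), and the remaining two form the cross-over link between the storage servers.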

Conclusion

Building the data center was great geeky fun, especially when things came to life.