Mar 22

The future of scalable, distributed computing is in the cloud. This is a nebulous concept that’s supposed to describe a cloud’s perpetual elasticity in providing online storage, processing and bandwidth. There are user-facing cloud services such as Google’s web-based applications, the Ulteo online desktop and Canonical’s Ubuntu One service for silent cross-computer folder synchronisation. And there are the more ambitious services of Amazon’s EC2 platform, enabling websites like Facebook to expand and contract their resource requirements in real time, catering for peaks and troughs in demand and paying only for the capacity and CPU cycles they actually use.

Tux: the ideal combination of experienced mountain climber and feathered networking expert.

This latter kind of cloud is a big, enterprise-oriented subject, and Facebook-like scenarios are way off the scale when it comes to the ordinary Linux user. But Linux is at the heart of many of these installations, and there’s plenty of enterprise cloud technology left lying around for us mortals to play with. Incredibly, considering its user-friendly approach to Linux, that includes the latest release of Ubuntu.

Ubuntu 9.10 bundles something called ‘Eucalyptus’, an open source tool for building private clouds that can dynamically connect to Amazon EC2. Eucalyptus, an acronym for ‘Elastic Utility Computing Architecture Linking Your Programs To Useful Systems’, originated as an ambitious research project run by the Computer Science Department at the University of California, Santa Barbara. But it quickly became apparent that the technology it was developing was in great demand, and Professor Rich Wolski, along with many of his students, took leave from their day jobs and founded a company called Eucalyptus. So far, they haven’t gone back.

STEP 1: Getting Started

Before trying Eucalyptus for yourself, be aware that its requirements are demanding. Installation is relatively straightforward, but you will need to use the Linux command line and, possibly, troubleshoot problems by reading the log files.

You will need at least two machines: one to manage the cloud and another to act as a single node in the cluster. Clouds like these rely on virtualisation to provide elasticity and scalability on top of the hardware doing the processing. They run virtualised instances of your chosen operating system, each started from a disk image. In the case of Eucalyptus, virtualisation is handled at the kernel level by either KVM or Xen, which is why the node machines need a CPU with hardware virtualisation support (Intel VT or AMD-V), along with plenty of horsepower, memory and storage. It also means that once your cloud is working, you will need to configure it to do something useful, just as you would a standard Ubuntu server installation.
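On Linux, a CPU that supports hardware virtualisation lists the `vmx` (Intel VT) or `svm` (AMD-V) flag in `/proc/cpuinfo`. The minimal sketch below checks for either flag before you commit a machine to node duty. The function name is our own. A missing flag means either that the CPU lacks VT or, on some systems, that VT has been disabled in the BIOS.

```python
def has_hw_virtualisation(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line lists vmx (Intel VT) or svm (AMD-V)."""
    for line in cpuinfo_text.splitlines():
        # /proc/cpuinfo repeats a "flags : ..." line for every logical CPU
        if line.startswith("flags"):
            _, _, flag_str = line.partition(":")
            # Compare whole flag names, so e.g. "vmxx" would not count
            if set(flag_str.split()) & {"vmx", "svm"}:
                return True
    return False
```

On a candidate node you would call it as `has_hw_virtualisation(open("/proc/cpuinfo").read())`.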

You’ll be given the option of installing a new cluster or adding a node to an existing one.

The machine used to control and manage your cloud is what we’ll call the backend (Ubuntu’s own documentation calls it the front end). To get started, insert the Ubuntu 9.10 Server CD into its drive and reboot. When you see the boot menu, select a language followed by ‘Install Ubuntu Enterprise Cloud’. Choose your language, location and keyboard layout, then enter a hostname (we used ubuntu1). The next step asks whether you want to create a ‘Cluster’ or a ‘Node’; select the first option, ‘Cluster’.

You will then need to work through the standard Ubuntu partitioning options. Ideally, use the entire disk for the installation unless you need to keep data on the machine, and leave most options at their default values. The installer will then create the partitions it needs and download and install a few packages. After this, it will ask for the default username and password for the machine, and whether you want your home directory encrypted. Skip the HTTP proxy question, leave automatic updates off for now, and set Postfix to ‘No configuration’.

STEP 2: Network and Nodes

We now have a couple of questions that deal specifically with the Eucalyptus configuration. The first asks for a cluster name, which can be anything you like; this is the name people will see if they access your cloud. The second asks for a range of IP addresses on your LAN that Eucalyptus can safely assign to the virtual machine instances it runs. To answer this, you need to know the range of addresses that your router is using.

Many home routers, for example, issue IP addresses in the range 192.168.1.2 to 192.168.1.100, or similar. Find this information in your router’s settings and enter a range that is on the same subnet but won’t be assigned automatically by the router. With our example, ‘192.168.1.101-192.168.1.200’ would work; starting at .100 would overlap the router’s own pool. For the sake of our experiment, though, you only need a couple of spare addresses on the same subnet.
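If you want to double-check a range before typing it into the installer, the sketch below uses Python’s standard `ipaddress` module to test the two conditions described above. Every address must sit on the router’s subnet, and the range must not overlap the router’s DHCP pool. It assumes a typical 192.168.1.0/24 home network, and the function name is hypothetical. Substitute your own router’s values.

```python
import ipaddress

def check_cloud_range(router_net, dhcp_first, dhcp_last, cloud_first, cloud_last):
    """Return a list of problems with a proposed address range (empty list = OK)."""
    net = ipaddress.ip_network(router_net)
    dhcp = (ipaddress.ip_address(dhcp_first), ipaddress.ip_address(dhcp_last))
    cloud = (ipaddress.ip_address(cloud_first), ipaddress.ip_address(cloud_last))
    problems = []
    # Both ends of the range must be on the same subnet as the router
    for addr in cloud:
        if addr not in net:
            problems.append(f"{addr} is not on {net}")
    if cloud[0] > cloud[1]:
        problems.append("range start is after range end")
    # Two inclusive ranges overlap unless one ends before the other starts
    if not (cloud[1] < dhcp[0] or cloud[0] > dhcp[1]):
        problems.append("range overlaps the router's DHCP pool")
    return problems
```

With the example values, `check_cloud_range("192.168.1.0/24", "192.168.1.2", "192.168.1.100", "192.168.1.101", "192.168.1.200")` returns an empty list. Starting the range at .100 reports the overlap.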

It’s best to assign a range of currently-free IP addresses to make up your cluster. It saves a lot of hassle later on.

After entering these details, the remainder of the packages will be installed and you’ll be asked to remove the disc and reboot the machine. When the backend restarts, log in to your account. You should see the server’s IP address displayed, along with a message about documentation; make sure this address is within your network’s range.

It’s now time to tackle the node. Boot your node machine from the same disc you used for the backend, choose ‘Install Ubuntu Enterprise Cloud’ from the boot menu again, and go through the first few questions. After you enter a new hostname, you should be told that there’s already a Eucalyptus cluster controller on your network, and ‘Node’ will be selected automatically. Just press Return. The rest of the installation is identical to the earlier one, only without any further Eucalyptus questions. Even your username and password are taken from the backend machine. At the end of the process, reboot, and the node will be up and running.

STEP 3: Configuration

On each machine, log in and type ‘sudo apt-get update’ followed by ‘sudo apt-get upgrade’ to download and install the latest package updates. You’ll probably have to reboot each machine again.

We now need to tell the backend about the existence of our single node. Log in to the backend machine and type ‘sudo euca_conf --no-rsync --discover-nodes’. You should see something like the following:

New node found on 192.168.1.62; add it? [Yn]

Just press Return, and the two machines should connect and synchronise a pair of keys that will be used to authenticate future connections. Now launch a browser on any other machine on the LAN and go to ‘https://backend_ip:8443’, replacing backend_ip with the backend’s address. You will get a security warning because the certificate used for the connection can’t be verified, and you’ll need to add an exception for this site. Firefox will step you through this process.

You should run the configuration tool as a super user.

You will then see the ‘Eucalyptus Enterprise Cloud’ login screen. Enter ‘admin’ as both the username and the password, and you’ll immediately be asked to enter a new administrator password and to check the IP address of the server so that a new certificate can be generated. After clicking Submit, you’ll find yourself at the Eucalyptus management console. Before we get stuck into the details, though, we need to download a credentials file. We can use it to authenticate our own tinkering with the server and with any other cloud service you may want to use to expand your installation.

STEP 4: Credentials

Click on the Credentials tab, followed by the ‘Download Credentials’ button. This will leave you with a zip file that you need to transfer to your user account on the backend machine. The easiest way is from the command line: use ‘sftp backend_ip’ to connect to the machine, then ‘put’ followed by the path to the zip file, for instance ‘put euca2-admin-x509.zip’. On the backend, unzip the file into a hidden folder in your home directory with ‘unzip -d ~/.euca euca2-admin-x509.zip’. Then source the script it contains, which configures the environment variables and keys used for managing the cloud: ‘. ~/.euca/eucarc’. You will need to run this script again whenever you reconnect to your backend system.
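The leading ‘.’ matters. Sourcing eucarc sets its variables in your current shell, whereas running it as a separate program would set them only in a child process that exits immediately. The sketch below checks whether the current environment looks sourced. The variable names are an assumption, based on ones the euca2ools commands commonly read, so compare them with the `export` lines in your own eucarc. The function name is hypothetical.

```python
import os

# Assumed variable names; check them against the export lines in ~/.euca/eucarc
EXPECTED_VARS = ("EC2_URL", "EC2_ACCESS_KEY", "EC2_SECRET_KEY")

def missing_euca_vars(env=None, expected=EXPECTED_VARS):
    """Return the expected variable names that are unset or empty in env."""
    if env is None:
        env = os.environ  # inherits whatever the calling shell exported
    return [name for name in expected if not env.get(name)]
```

Run from the same shell after ‘. ~/.euca/eucarc’, `missing_euca_vars()` should return an empty list. A non-empty result suggests that the script has not been sourced in this session.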

Once it’s properly installed, Eucalyptus’ web interface looks after much of the heavy lifting.

To check that everything is working as it should be, type ‘euca-describe-availability-zones verbose’. The output should look like the following:

AVAILABILITYZONE pcp_cloud 192.168.1.48
AVAILABILITYZONE |- vm types free / max cpu ram disk
AVAILABILITYZONE |- m1.small 0002 / 0002 1 128 2
AVAILABILITYZONE |- c1.medium 0002 / 0002 1 256 5

This is important information, and the critical data is in the free / max columns. These show how many instances of each VM type your cloud can run: ‘max’ is the total capacity and ‘free’ is how much of it is currently unused. That capacity comes from the CPU cores (and memory and disk) your nodes provide, so the more nodes you have on the network, the higher these numbers will be. If you see only zeros, there has been a problem with the node starting the appropriate controller, and you’ll need to check the log files in its ‘/var/log/eucalyptus’ directory.
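If you script your checks, the VM type rows can be parsed into numbers and tested for the ‘all zeros’ case. The minimal sketch below assumes the column layout shown in the sample output above, which may differ between Eucalyptus versions. Both function names are our own.

```python
def parse_vm_types(output: str) -> dict:
    """Map each VM type to (free, max, cpu, ram_mb, disk_gb)."""
    result = {}
    for line in output.splitlines():
        parts = line.split()
        # Type rows look like: AVAILABILITYZONE |- m1.small 0002 / 0002 1 128 2
        if len(parts) == 9 and parts[1] == "|-" and parts[4] == "/":
            try:
                free, total, cpu, ram, disk = (int(p) for p in [parts[3], *parts[5:9]])
            except ValueError:
                continue  # not a numeric row
            result[parts[2]] = (free, total, cpu, ram, disk)
    return result

def node_registered(vm_types: dict) -> bool:
    """All-zero max columns usually mean no node has registered."""
    return any(row[1] > 0 for row in vm_types.values())
```

You could feed it the text printed by ‘euca-describe-availability-zones verbose’, for example captured with Python’s `subprocess` module.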

STEP 5: Run an image

It’s now time to create a virtual instance of a machine to run on our node. Eucalyptus makes this very easy, because it lets you download pre-built images.

Go back to the browser, click on the ‘Store’ tab in the management console and choose a pre-built image to download. We opted for Karmic Koala (i386), which is a 174MB download. You will also have to wait a couple of minutes for the image to be installed after the download has completed, but before long you should see the message ‘How to run?’, and you’re ready to go. Our backend machine hung at this point, but after we added more memory to the system it worked without a hitch.

You’re provided with a selection of pre-configured virtual machines just begging to be run on your new cloud.

Before we go back to the command line, click on the ‘How to run?’ link next to the download and make a note of the ‘emi’ value at the end of the command. This is the unique identifier for the image, and we’ll need it to run the image. Back on the command line, type ‘touch ~/.euca/mykey.priv; chmod 0600 ~/.euca/mykey.priv; euca-add-keypair mykey > ~/.euca/mykey.priv’ to create a key pair that will authenticate the connection to your running instance. Then type ‘euca-describe-groups’ to list the security groups, followed by ‘euca-authorize default -P tcp -p 22 -s 0.0.0.0/0’ to allow SSH access to the virtualised machine through the default group. Finally, type the following to launch it:

euca-run-instances -k mykey -t c1.medium emi

Replace emi with the identifier you took from the web interface. Notice that we’ve specified ‘c1.medium’ as the instance type. This is a preset for the amount of resources the instance is given, and the presets can be viewed and modified on the Configuration page of the web management interface. The instance may take a while to initialise, as there’s a lot of data being copied across the network. You can check its status with the ‘euca-describe-instances’ command and wait for its state to change from ‘pending’ to ‘running’. On our hardware, this took about 20 minutes.
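Rather than re-typing ‘euca-describe-instances’ for 20 minutes, you could poll it. The minimal sketch below looks for the state word on the instance’s INSTANCE line. It matches known state words rather than a fixed column, because we are not assuming an exact column layout. `describe` is any callable that returns the command’s output as text. The names, the poll interval and the timeout are all hypothetical defaults.

```python
import time

STATES = {"pending", "running", "shutting-down", "terminated"}

def instance_state(describe_output: str, instance_id: str):
    """Return the state word on the INSTANCE line for instance_id, or None."""
    for line in describe_output.splitlines():
        fields = line.split()
        if fields[:1] == ["INSTANCE"] and instance_id in fields:
            for field in fields:
                if field in STATES:
                    return field
    return None

def wait_until_running(describe, instance_id, poll_seconds=30, timeout=3600,
                       sleep=time.sleep):
    """Poll describe() until the instance is 'running'.

    Returns True once it is running, or False on timeout or if it terminates.
    """
    waited = 0
    while waited <= timeout:
        state = instance_state(describe(), instance_id)
        if state == "running":
            return True
        if state == "terminated":
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return False
```

For a real cloud, `describe` could be `lambda: subprocess.run(["euca-describe-instances"], capture_output=True, text=True).stdout`, with the instance ID taken from the command’s output.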

You’ll also see the IP address of the new instance, and you can now connect to your new cloud server by typing ‘ssh -i ~/.euca/mykey.priv ubuntu@ip_address’. You’re now ready to install and run your mind-blowing web 2.0 application!
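The ‘chmod 0600’ in the key pair step matters here. ssh refuses to use a private key file that other users can read, and it warns about an unprotected private key file. The shell commands create the file with owner-only permissions before any key material is written into it. The sketch below shows the same pattern in Python, for anyone automating the setup. Both function names are our own.

```python
import os
import stat

def write_private_key(path: str, key_text: str) -> None:
    """Write key_text to path so that only the owner can read or write it."""
    # Create the file owner-only *before* writing secret data, like touch + chmod
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key_text)
    # The mode given to os.open only applies to newly created files,
    # so enforce it explicitly in case the file already existed
    os.chmod(path, 0o600)

def is_private_to_owner(path: str) -> bool:
    """True if group and others have no permissions at all on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0
```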

See also: Amazon EC2

Private clouds, of the kind we’ve started to build above, are an excellent raw resource if you need a dynamic pool of processing power. But the beauty of cloud computing is that it’s designed to be elastic, and this means you needn’t be restricted by the physical limitations of your own setup. With Eucalyptus, for example, you can take exactly the same images you’re running on your local machines, and move them to Amazon’s EC2 service as and when you need the extra horsepower.

This gives you practically unlimited resources in terms of processing power, storage and network bandwidth, and you only pay for what you use, rather than the old model of paying for a rack of servers somewhere that spend most of their time idle. Eucalyptus manages this by implementing an API that’s compatible with Amazon’s, making the images that you configure on your local network drop-in compatible with the images that run on Amazon’s servers. Also, the tools you use to manage and maintain your private cloud are the same ones you use to manage an Amazon-hosted cloud, making the transition between the two almost seamless.

There are other advantages to this compatibility too. You can create, test and experiment with private clouds before committing your concept to the costs and scrutiny of a public cloud running on Amazon’s servers. And private clouds, such as the one we create in the main text, can give you an excellent feel for how the technology works, and how you and your company may find it useful.

Troubleshooting

As you might have noticed, even though Ubuntu offers the easiest way of getting a usable cloud system, it’s still far from easy. There are a lot of aspects of the configuration that can go wrong, from the hardware you choose to the network the various machines communicate across. Another problem is that the Eucalyptus project is still in its infancy, with Ubuntu 9.10 being the first distribution to include its packages by default. This means there isn’t a great deal of support, especially if you’re new to the world of clouds.

But there are several things you can do to help solve problems yourself. First, run ‘euca-describe-availability-zones verbose’ to check whether your node machine has been detected and added to the cloud. If you have zero resources available, it hasn’t. The most common cause is that the synchronisation of the keys from the backend to the node has failed, stopping the node from registering itself. Check ‘/var/lib/eucalyptus/keys’ on both machines to make sure the keys are present. If the keys on the backend are missing, try running ‘sudo /etc/init.d/eucalyptus-cc-registration restart’ to regenerate them, then try adding the node again with ‘sudo euca_conf --no-rsync --discover-nodes’. If that still fails, try ‘sudo /etc/init.d/eucalyptus-sc-registration restart’ too.

We also had problems registering nodes after updating the system with new packages. The solution is either to add nodes to the cluster before you update any packages, or to leave your installation with the packages included in the default install. This shouldn’t matter for an experimental installation. Finally, if nothing else works, trawl through the files in the /var/log/eucalyptus directory on both machines, as these should give you some idea of where your setup is failing.
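Trawling the logs by hand is tedious, so a small filter can help. The sketch below lists lines containing words such as ERROR, FATAL or WARN from every file ending in ‘.log’ in a directory. It assumes both that naming convention and that problem lines contain one of those words, so adjust the pattern after looking at your own logs. You will probably need to run it with sudo to read /var/log/eucalyptus. The function name is hypothetical.

```python
import os
import re

# Assumption: lines that report problems contain one of these words
PROBLEM = re.compile(r"\b(ERROR|FATAL|WARN(ING)?)\b")

def problem_lines(log_dir="/var/log/eucalyptus", pattern=PROBLEM):
    """Yield (filename, line_number, line) for matching lines in *.log files."""
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".log"):
            continue
        path = os.path.join(log_dir, name)
        # errors="replace" keeps going if a log contains odd bytes
        with open(path, errors="replace") as f:
            for number, line in enumerate(f, start=1):
                if pattern.search(line):
                    yield name, number, line.rstrip("\n")
```

Running it on both machines and comparing the results is a quick way to see which side of the connection is complaining.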

Aug 07

It’s encouraging that many of the conversations we are having at the moment in relation to IT and sustainability are moving beyond power management in the data centre. It is not that optimising the use of central IT isn’t important, but it really is only one way to drive an organisation’s environmental agenda. And even before we get to the main question of how technology can enable more eco-friendly working practices, there is another place we can look for operational IT power savings: the desktop.

When looking in this direction, though, I have noticed a tendency to apply the same kind of thinking that is used on the server side of the equation. Fair enough, accelerating hardware refresh to bring in more power-efficient kit mirrors the game being played in the data centre. But once the carbon cost of manufacture and disposal is taken into account, the net gains are hard to establish. In the data centre, of course, hardware modernisation is augmented by consolidation and virtualisation to drive up average server utilisation and thus improve energy efficiency.

Virtualisation is a different game on the desktop, however. Sure, some will go down the route of running virtual PCs on the server and accessing them through thin client configurations, but it will be a long time before this is the norm. The reality is that most organisations will remain wed to their fat clients for the foreseeable future, so we need to think of the energy question a bit differently. Essentially, the challenge boils down to optimising the power consumption of desktop machines that typically idle for the majority of the time they are switched on.

To deal with this problem, we need to think less about utilisation and the inherent power efficiency of hardware and software, and more about controlling the sleep/wake cycle of machines. In practice, a configuration that is highly energy efficient at runtime but has no active policy to switch to a low-power state when idle will consume considerably more power than a less efficient machine whose state is properly managed.
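A back-of-the-envelope calculation makes the point concrete. The sketch below computes weekly energy for one desktop that is in use for a given number of hours and either idles or sleeps for the rest of the week. Every wattage used here is a hypothetical figure chosen for illustration, not a measurement of any real machine.

```python
HOURS_PER_WEEK = 7 * 24

def weekly_kwh(active_w, idle_w, sleep_w, active_hours, sleeps_when_idle):
    """Weekly energy in kWh for one desktop.

    Outside active_hours the machine either sleeps (sleep_w) or idles (idle_w).
    """
    other_hours = HOURS_PER_WEEK - active_hours
    off_watts = sleep_w if sleeps_when_idle else idle_w
    return (active_w * active_hours + off_watts * other_hours) / 1000

# Hypothetical figures: an efficient PC left idling vs a hungrier PC that sleeps
efficient_unmanaged = weekly_kwh(60, 40, 2, active_hours=40, sleeps_when_idle=False)
hungry_managed = weekly_kwh(100, 70, 3, active_hours=40, sleeps_when_idle=True)
```

With these made-up numbers, the efficient but unmanaged machine uses about 7.5 kWh a week, against roughly 4.4 kWh for the less efficient machine that sleeps. The gap comes almost entirely from the 128 hours a week spent not in use.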

This is something that Microsoft makes a big point of when talking about Vista in the green context, and indeed early adopters with large Vista estates corroborate Microsoft’s claim that Vista’s enhanced manageability translates directly into power savings. The problem, however, is that Windows XP isn’t going away in a hurry. What about all those organisations that are interested in desktop power management but will be maintaining older versions of the operating system for some time to come?

Well, the one approach that is generally acknowledged not to work that well is to educate, encourage or threaten users in an attempt to get them to keep their power configuration set in accordance with environmental policy, or to manually shut down their PCs or put them to sleep when they are not in use. IT managers relying on this kind of user discipline are probably not going to see the results they were hoping for unless they work for a thoroughly green-tinted organisation.

Fortunately, third party solutions exist that can help to enable/enforce centralised power management – a couple of examples being Verdiem and 1E. Using such technology, you can not only cure PC insomnia from a policy enforcement perspective, but also allow real-time remote control of power state so machines can be woken up for backup or software distribution purposes then put to sleep again afterwards. So, if you are serious about saving energy across a large XP estate, the options are there.

Something I haven’t had time to look into is whether similar solutions exist for alternative desktops – namely Mac OS X and Linux. Apple kit is certainly not renowned for its enterprise management friendliness, but perhaps ‘right on’ Mac users aren’t so much of a problem as they are of course more environmentally aware. As for Linux, I would be interested in any views, recommendations or experiences.

Meanwhile, it would be great to see a bit more awareness-raising from Microsoft on the availability of solutions to centrally manage power consumption by Windows XP, rather than automatically segueing from this discussion into a Vista upgrade pitch.

May 18

Don’t you just hate it when another woolly ambiguous term is forced upon us? When I was approached by yet another journalist the other day asking me my thoughts on the impact of cloud computing, I simply sighed and told them it is a bit like Web 2.0. In itself, it is difficult to pin down exactly what is meant by it. The best you can do is say that both of these terms refer to a general direction in which the industry appears to be moving.

In the case of Web 2.0, it is about the Web becoming a generally more interactive medium. This can manifest itself at a technology level through everything from Ajax through mash-ups to SOA, and at a behavioural level through social media and the simple fact that websites are generally now more geared up to a two-way dialogue than they used to be.

In the case of cloud computing, it is about the evolution of dynamic virtualised infrastructure that allows us to think more in terms of resource pools than individual IT components. This in turn opens the door to delivering computing resource on a utility basis, which is equally applicable both internally (i.e. with regard to the way you use your data centre) and externally – which takes you into the realm of utility computing and software as a service.

The point about both Web 2.0 and cloud computing is that they sprang up arbitrarily on the evolutionary timeline and seemingly embraced anything and everything that could be thrown into the mix. While the very specific phenomenon of social networking is certainly noteworthy, it bears little relationship to the evolution of rich user interfaces and composite applications; in fact, many social networking sites have appalling UIs by traditional standards. Yet Web 2.0 can mean either of these things and, confusingly, lots of other concepts too.

Similarly, we have been talking about virtualisation ultimately leading to computing grids and utility computing for years, and giving it a new name doesn’t actually change anything in terms of the underlying trend. In fact, you knew where you stood much better when you could talk about virtualisation and grid technology as the enabling stuff, and utility computing and application services as what it enables. As everyone jumps onto the cloud computing bandwagon, it all gets mixed up and confused, just like Web 2.0.

So, if you are one of those people wondering what cloud computing is really all about after listening to the IBM explanation, the Microsoft one, and the evangelical rhetoric we have heard recently from the Google and Salesforce.com camp, don’t worry, you are not alone. The trick is to think of it as a label for a trend at one level, and an industry bandwagon at another, and to keep your expectations pretty low in terms of clarity and consistency for the time being. Don’t, however, dismiss the underlying trend itself. While we are not looking at a revolution here, some of the developments in this general area are really quite interesting and valuable, though you probably knew that already, even before the marketing hype was thrust upon us.