Apr 26

No crime, no lag, no malware: 2020’s internet sounds like heaven. PC Plus checks out its foundations.

Safe, secure and speedy: that’s the internet of 2020. In a decade’s time, the web will be a very different place. There will be no crime, no malware and no fake online banking sites. Latency won’t be a problem. High-definition video will be smooth, and buffering will be a distant, nightmarish memory.

And that’s not all. The internet will have grown dramatically, making room for a new generation of connected devices: cars, phones, TVs, everything. Super-fast speeds are the rule, not the exception. To borrow a phrase, it just works.

At least, that’s what we hope the web will be like. To make it happen, engineers merely need to rethink the way the internet works and change pretty much everything. What could be simpler? Some big changes are already in progress. The explosion of internet-enabled devices means that we’re running out of IP addresses even more quickly than expected: RIPE NCC’s Managing Director Axel Pawlik noted in January that the pool of unassigned IPv4 addresses would run out as early as 2011. But the move to IPv6, which can handle around “a trillion trillion trillion” addresses – 3.4×10^38 if you’re feeling pedantic – is largely a software, not hardware, issue. “In most cases it’s very easy to reprogram connectivity software on a chip to ensure a device is IPv6 compatible,” Pawlik says.
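The address-space figure quoted above can be checked from the address widths alone: IPv4 addresses are 32 bits long and IPv6 addresses are 128 bits long, so the totals are 2^32 and 2^128. A quick Python check follows; the variable names are ours and purely illustrative.

```python
# Address widths are fixed by the protocols: 32 bits for IPv4, 128 for IPv6.
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_total = 2 ** IPV4_BITS
ipv6_total = 2 ** IPV6_BITS

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.1e}")
# How many complete IPv4 internets would fit inside the IPv6 address space
print(f"IPv6 / IPv4:    {ipv6_total // ipv4_total:.1e}")
```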

But things aren’t progressing as straightforwardly as you would think. “Despite the simplicity of ensuring compatibility, widespread IPv6 take-up has so far been slow, and many of the best known digital devices available today, including the iPhone, do not yet support the next generation of IP addressing,” warns Pawlik. That lack of urgency is disappearing fast, with big names like Google implementing IPv6 support, router firms embracing the new system and new operating systems – including Windows and OS X – supporting it.

If we’re late embracing IPv6, the internet won’t grind to a halt – existing IP addresses will keep working – but as the European Commission reports, “the growth and also the capacity for innovation in IP-based networks would be hindered”. The EU is pushing IPv6 hard, and it expects European ISPs and “the top 100 European sites” to be IPv6-enabled this year.

As a happy by-product of IPv6, widespread adoption will make the internet more secure too. The IPsec security protocol is a compulsory part of IPv6, which means all IPv6 communications can be encrypted and authenticated.

Route masters

We’re using the internet in ways its creators couldn’t possibly have imagined, from the rise of video to the sheer number of connected devices. We’re constantly pushing the internet’s capacity, stability and security, and inevitably cracks are beginning to show.

Aaron Falk is the Chair of the Internet Research Task Force (IRTF) and Engineering Lead with the Global Environment for Network Innovations (GENI). “There are many areas where the current architecture is straining to meet the needs of the users,” he says. “In particular, the areas of mobility, security, and network management were not well addressed in the original architecture, leading to a patchwork of mechanisms. The greatest concern is not so much that today’s traffic is challenged but that the ad-hoc machinery being inserted into the network will inhibit future innovations. I worry about tomorrow’s applications more than today’s.”

The IRTF is a technological trouble-shooter for internet architecture, as Falk explains: “The IRTF hosts research groups that work in areas ‘adjacent’ to the IETF (Internet Engineering Task Force). This can be pre-standards technologies, hard problems that emerge from the IETF or operations communities, technologies where the internet may be one of many possible communications strategies, or architectural issues.”

He continues: “Sometimes research groups assist IETF working groups by bringing researcher expertise or otherwise ‘pre-baking’ technologies so they are ready for standardisation. For example, the Mobility Optimizations Research Group has been working on IP mobility solutions that feed into the MIPSHOP (Mobility for IP: Performance, Signalling and Handoff Optimization) working group for standardisation. Another example is the IRTF Research Group on Internet Congestion Control (ICCRG) which evaluates new congestion control proposals that arise in the IETF.”

I dream of GENI

One of the problems with the current web is that it’s too big and too important to muck around with. That’s where GENI comes in. The Global Environment for Network Innovations is funded by the US National Science Foundation, and it’s best described as a (serious) playground where new ideas can be tested out. “GENI will support two major types of experiments,” the organisation says. “Controlled and repeatable experiments, which will greatly help improve our scientific understanding of complex, large-scale networks, and ‘in the wild’ trials of experimental services that ride atop or connect to today’s internet and that engage large numbers of human participants.”

“We’re well underway on the second year of GENI prototyping, GENI Spiral 2,” Falk says. “One of our more exciting activities is what we are calling ‘meso-scale deployments’ of virtualisable, programmable routers, switches, and WiMax base stations on 14 campuses and two national research backbone networks. Deployments like these are particularly exciting because they’ll allow experimental applications and services built on GENI to directly reach real users on university campuses. Thus researchers will have the ability to build new services – perhaps incompatible with the current internet – and test them at-scale with real end-users.”

One area of concern is routing tables, which the net’s backbone routers use to direct online traffic. The BGP (border gateway protocol) routing table has grown hugely, doubling in size between 2003 and 2009, and there are concerns that if the level of growth continues, router hardware won’t be able to cope. The IRTF’s Routing Research Group (RRG) is investigating alternatives, and its goal is to produce solid recommendations that the IETF can implement.

Another related program is Rochester Institute of Technology’s Floating Cloud initiative, which hopes to address the problem of routing table growth by moving the routing tables from inside routers to network clouds. Initial testing took place on a dozen Linux boxes, and the next step is to try it on GENI.

The BGP routing table doubled in size between 2003 and 2009, and it’s still getting bigger.

GENI isn’t the only initiative that the NSF is helping to fund. Its Future Internet Architectures (FIA) program is offering $30million to fund projects that will transform the net. As the NSF puts it: “Proposals should not focus on making the existing internet better through incremental changes, but rather should focus on designing comprehensive architectures that can meet the challenges and opportunities of the 21st century.”

FIA is a continuation of FIND, the NSF’s Future Internet Design project. FIND asked researchers to redesign the internet from scratch, and FIA will narrow around 50 FIND projects down to two, three or four serious contenders.

Safety and security

With the existing internet, security is something that’s largely been bolted on as an afterthought – but the FIA program expects security to be a key consideration from the outset. That’s leading to some interesting ideas, including one security system that takes its cues from Facebook. Davis Social Links (DSL) adds a “social control layer” to the network that identifies you not by your IP address but by your social connections. If it works – and DSL is in the very, very early stages of development – it could make a major dent in problems such as spam and denial of service attacks.

Eugene Kaspersky, CEO of Kaspersky Lab, would like to take things even further. In October, he argued that the internet’s biggest weakness was anonymity, and that everyone should have online passports. “I’d like to change the design of the internet by introducing regulation – internet passports, internet police and international agreement – about following [web] standards,” he told ZDNet Asia.

Kaspersky explained further on the Viruslist.com blog: “When I say ‘no anonymity’, I mean only ‘no anonymity for security control’,” he writes, explaining that he couldn’t care less what people posted on blogs or downloaded through BitTorrent. “The only [requirement] – you must present your ID to your internet provider when you connect.” Kaspersky argues that such requirements are inevitable, with some EU countries already introducing digital IDs. “Another prototype of e-passports is the two-factor authentication we use to access corporate networks,” he says. “The only thing missing today is a common standard.”

Security guru Bruce Schneier isn’t convinced. “Mandating universal identity and attribution is the wrong goal,” he writes on Techtarget. “Accept that there will always be anonymous speech on the internet. Accept that you’ll never truly know where a packet came from. Work on the problems you can solve: software that’s secure in the face of whatever packet it receives, identification systems that are secure enough in the face of the risks. We can do far better at these things than we’re doing, and they’ll do more to improve security than trying to fix insoluble problems.”

The quest for improved security is attracting a lot of attention – and a lot of money. The US Defense Advanced Research Projects Agency (DARPA) awarded contracts worth $56million in January to two firms as part of its National Cyber Range security programme, which will enable network infrastructure experiments, new cyber testing capabilities and realistic testing of network technology. A month previously, Raytheon BBN Technologies was awarded an $81million contract by the Army Research Laboratory to build the largest communications lab in the US, again to research network security.

David Emm is part of Kaspersky Lab’s Global Research and Analysis Team. “It would be unrealistic to expect a wholesale re-architecture of the internet, or even of some of the technologies that are used online,” he says. “If we fix the problem by removing the facility, we run the risk of damaging legitimate activity too.”

There’s also the issue of displacement: if the internet becomes tougher to compromise, villains will simply switch to social engineering instead. As Emm points out, corporate email filtering to remove attached ‘.exe’ files simply spawned the use of links rather than attachments to spread viruses and other malware. “There has always been a human dimension to PC attacks,” he says. “Patching code is fairly straightforward once you know what you need to fix. But patching humans takes longer and requires ongoing investment.”

The last mile

There’s another big piece of architecture that needs upgrading: the bit between your ISP and you. Whether that’s a wired connection or a wireless one, today’s technology needs a serious speed boost. As Tim Johnson of broadband analyst Point Topic explains, “Over the past 15 years or so we’ve seen the data speeds that typical home users get going up roughly 10 times every five years. I think that will continue over the next decade so that by 2020 many users will be getting a gigabit on their home broadband.

BT’s 21CN project is a software-driven network that aims to drive innovation.

“The big barriers that must be overcome to get there are (a) extending fibre all the way to the home, and (b) providing the backhaul capacity and the interconnect standards to make it useful,” he elaborates. “Both of those are do-able but I think it will be quite late in the teens before they are achieved.”
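Johnson’s rule of thumb is easy to turn into arithmetic. The sketch below assumes the growth is a smooth tenfold increase every five years and works backwards from his gigabit-by-2020 figure. The function name and the choice of 2010 as the starting year are our assumptions, not his.

```python
def speed_factor(years, tenfold_period=5.0):
    """Growth multiple after `years` if speed rises 10x every `tenfold_period` years."""
    return 10 ** (years / tenfold_period)

target_mbps = 1000          # "a gigabit" by 2020
decade = speed_factor(10)   # two tenfold steps in ten years

print(f"Growth over a decade: {decade:.0f}x")
print(f"Implied 2010 starting speed: {target_mbps / decade:.0f}Mbps")
print(f"Equivalent annual growth: {(speed_factor(1) - 1) * 100:.0f}% per year")
```

In other words, the prediction only holds if typical home connections were already around the 10Mbps mark at the start of the decade.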

Johnson reckons that things will get particularly interesting when 100Mbps+ connections are the norm, as they will be able to deliver immersive, high-definition environments and “a huge new space of technology, applications and lifestyle possibilities”. But he’s not convinced the internet can even handle that – not in its current form, anyway.

“This kind of application is rather different from what the internet was designed for and is good at,” he says. “From an engineering point of view it will mean provisioning capacity that will allow users to set up assured end-to-end symmetrical calls of at least 20Mbps each way. There also needs to be a huge amount of standards development and investment to support setup and switching. […] It’s possible that this could all be done across the open internet, but my own belief is that as this type of traffic grows it will create the need for more dedicated capacity. IP and intelligent multiplexing will still rule, but the basic architecture will be different.”

Going mobile

In developed countries, the internet is moving away from the desktop and onto mobile phones and other wireless devices, while in developing countries the internet is primarily a mobile medium already. In both developed and developing countries the number of mobile internet users will increase dramatically in the next decade. So if you think the mobile networks are creaky now, things could get considerably worse in a decade.

For the mobile internet at least, the future may look an awful lot like the past. As Jon Crowcroft of the University of Cambridge writes: “We are so used to networks that are ‘always there’ – so-called infrastructural networks such as the phone system, the internet, the cellular networks (GSM, CDMA, 3G) – and so on that we forget that once upon a time (why, only in the 1970s) computer communications were fraught with problems of reliability, and challenged by very high cost or availability of connectivity and capacity.”

Noting that technologies such as email coped fine in those conditions, Crowcroft suggests that, “It appears that it’s worth revisiting these ideas for a variety of reasons: it looks like we cannot afford to build a Solar System-wide internet just yet, [but] it looks like one can build effective end-to-end mobile applications out of wireless communication opportunities that arise out of infrequent and short contacts between devices carried by people in close proximity, and then wait until these people move on geographically to the next hop. It’s interesting to speculate that these systems may actually have much higher potential capacity than infrastructural wireless access networks, although they present other challenges (notably higher delay).”

Such systems – variously called Intermittent, Opportunistic or Delay Tolerant networks – have a wide range of applications. They’re useful in emergencies and in areas where there isn’t an existing network infrastructure, and they’re particularly well suited to emerging applications where a constant signal can’t be guaranteed, such as internet-enabled cars.
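The store-carry-forward idea behind these networks can be sketched in a few lines of Python. This is a toy ‘epidemic’ forwarding model rather than anything Crowcroft specifies: every device simply copies the message to whoever it meets. The contact list, device names and function name are all invented for illustration.

```python
def epidemic_delivery(contacts, source, destination):
    """Return the time at which `destination` first holds the message, or None.

    `contacts` is a list of (time, device_a, device_b) meetings. During a
    meeting, a device carrying the message hands a copy to the other device.
    """
    carriers = {source}
    for time, a, b in sorted(contacts):
        if a in carriers or b in carriers:
            carriers.update((a, b))
        if destination in carriers:
            return time
    return None

# Hypothetical meetings: alice never meets dave directly, so the message
# has to be carried by bob and carol between contacts.
contacts = [
    (1, "alice", "bob"),
    (4, "carol", "dave"),   # too early: carol has no copy yet
    (6, "bob", "carol"),
    (9, "carol", "dave"),
]

print(epidemic_delivery(contacts, "alice", "dave"))
```

The message only arrives at the final meeting, which is the ‘higher delay’ trade-off in miniature: nothing moves until people do.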

While such networks could ultimately be deployed in remote areas, for most of us the future of the mobile internet is very similar to what we’ve already got. LTE (Long Term Evolution) is a kind of 3G network with knobs on, and in the UK at least it’s generating much more interest than the rival WiMax technology. When LTE begins to roll out later this year it will deliver theoretical speeds of up to 140Mbps, rising to 340Mbps after a 2011 upgrade. An even faster version of the network, LTE Advanced, is in the works. It’s worth noting, though, that even the first version of the LTE network will take several years to roll out nationwide.

And WiMax? In February this year, Patrick Plas – Alcatel-Lucent’s Chief Operating Officer for Wireless – told reporters that the company “is not putting a lot of effort into this technology any longer” as mobile networks were showing “a clear direction taken by the industry towards LTE”. That’s an honest indication of where the mobile internet is heading.

Looking ahead

Predicting the future is a tricky business, and predicting the future of the internet is doubly so. However, it’s clear that the next decade will see some dramatic changes in the way the web works. Some changes are definite – the move to IPv6 will happen, albeit more slowly than many would like – while other developments such as opportunistic networks may never become mainstream.

What we can predict is that the internet of 2020 will be coping with user numbers and traffic volumes that we can barely imagine. To be able to cope with that, the net will probably become a hybrid: a mix of old and new. As Falk puts it: “Recent interest in ‘clean slate’ network architectures encourages researchers to consider how the internet might be designed differently if, say, we knew then what we know now about how it will be used. But that is not to say we must discard the current internet to fix the problems. The internet has tremendous value, has supported astronomical growth and changed the lives of millions of people. I believe research in new internet designs will provide insights on where the high-leverage points are on the current design thus allowing us to understand, justify, and deploy changes that will bring the greatest benefit.”

Mar 23

The future of scalable, distributed computing is in the cloud. It’s a nebulous term, meant to describe the seemingly limitless elasticity with which online storage, processing and bandwidth can be provided. There are user-facing cloud services such as Google’s web-based applications, the Ulteo online desktop and Canonical’s Ubuntu One service for silently synchronising folders between computers. And there are the more ambitious services of Amazon’s EC2 platform, which enable websites like Facebook to expand and contract their resource requirements in real time, catering for peaks and troughs in demand and paying only for the capacity and CPU cycles they actually use.

Tux: the ideal combination of experienced mountain climber and feathered networking expert.

This latter kind of cloud is a big, enterprise-oriented subject, and Facebook-like scenarios are way off the scale for the ordinary Linux user. But Linux is at the heart of many of these installations, and there’s plenty of enterprise cloud technology left lying around for us mortals to play with. Incredibly, considering its user-friendly approach to Linux, that includes the latest release of Ubuntu.

Ubuntu 9.10 bundles something called ‘Eucalyptus’, an open source tool for generating private clouds that can dynamically connect to Amazon EC2. Eucalyptus, an acronym for ‘Elastic Utility Computing Architecture Linking Your Programs To Useful Systems’, originated as an ambitious project run by the Computer Science Department at the University of California. But it quickly became apparent that the technology it was developing was in great demand, and Professor Rich Wolski and many of his students took sabbatical leave from their day jobs and founded Eucalyptus. So far, they haven’t gone back.

STEP 1: Getting Started

Before trying Eucalyptus for yourself, there are some demanding requirements. Installation is relatively straightforward, but you will need to use the Linux command line and possibly troubleshoot any problems by reading the log files.

You will need at least two machines: one for front-end management and another to act as a single node on the cluster. Clouds like these rely on virtualisation to provide the elasticity and software scalability of the hardware doing the processing. They run virtualised instances, called images, of your chosen operating system. In the case of Eucalyptus, virtualisation is handled by either KVM or Xen at the kernel level, which is why you need a VT-enabled CPU for the node machines, along with plenty of horsepower, memory and storage. It also means that you will need to configure your cloud to do something useful after you’ve got it working, just as you would a standard Ubuntu server installation.

You’ll be given the option of installing a new cluster or adding a node to an existing one.

The machine used to control and manage your cloud is usually referred to as the backend, and to get started you need to insert the Ubuntu 9.10 Server disc into its drive and reboot. When you see the boot menu, select a language followed by ‘Install Ubuntu Enterprise Cloud’. Choose your language, location and keyboard layout, then enter a host name (we used ubuntu1). The next step will ask whether you want to create a ‘Cluster’ or a ‘Node’, and you need to select the first option, ‘Cluster’.

You will then need to work through the standard Ubuntu partition options. Ideally, use the entire disk for the installation unless you need to keep data on the machine, and leave most options at their default values. The installer will then go off and create the partitions it needs, then download and install a few packages. After this, it will ask for the default username and password for the machine, and whether you need your home directory encrypting. Skip the HTTP proxy question and leave the automatic updates off for now, and set Postfix to ‘No Configuration’.

STEP 2: Network and Nodes

We now have a couple of questions that deal specifically with the Eucalyptus configuration. The first asks for a cluster name, and you can call it what you like. This is the name people will see if they access your cloud. The second question asks for a range of IP addresses on your LAN that Eucalyptus can safely use to assign to each node. To answer this, you need to know the range of addresses that your router is using.

Many routers on a home network, for example, will issue IP addresses in the range of 192.168.1.2 – 192.168.1.100, or similar. You need to find this information from your router and enter a range that won’t be assigned automatically by the router but is on the same subnet. ‘192.168.1.101-192.168.1.200’ would work with our previous example (starting at .100 would clash with the last address the router can hand out). But for the sake of our experiment, you only need to find a couple of spare addresses on the same subnet.
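Before committing to a range, it’s worth checking that it doesn’t collide with the addresses your router hands out. Here’s a minimal Python sketch using the example router pool above. The function name is ours and the ranges are only examples. Note that a range starting at 192.168.1.100 shares one address with a router pool that ends at .100.

```python
import ipaddress

def ranges_overlap(first, second):
    """True if two 'start-end' IPv4 ranges share at least one address."""
    def parse(text):
        start, end = (ipaddress.IPv4Address(p.strip()) for p in text.split("-"))
        return start, end
    a_start, a_end = parse(first)
    b_start, b_end = parse(second)
    return a_start <= b_end and b_start <= a_end

router_pool = "192.168.1.2-192.168.1.100"

# .100 is the last address in the router's pool, so starting there collides
print(ranges_overlap(router_pool, "192.168.1.100-192.168.1.200"))
# Starting one address higher keeps the two ranges apart
print(ranges_overlap(router_pool, "192.168.1.101-192.168.1.200"))
```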

It’s best to assign a range of currently-free IP addresses to make up your cluster. It saves a lot of hassle later on.

After entering these details, the remainder of the packages will be installed and you’ll be asked to reboot the machine without the disc in the drive. When the backend reboots, log in to your account. You should see the IP address for the server displayed along with a message about documentation, and you need to make sure this address is within your network’s range.

It’s now time to tackle the node. Take the same disc you used for the backend and use it to boot your node machine. Choose ‘Install Ubuntu Enterprise Cloud’ from the boot menu again, and go through the first few questions. After entering a new hostname, you should be told that there’s already a Eucalyptus cluster controller on your network, and ‘Node’ will be selected automatically. Just press return. Installation will now be identical to the earlier install, only without any further Eucalyptus questions. Even your user name and password are grabbed from the backend machine, and at the end of the process, you can reboot and the node is now running.

STEP 3: Configuration

On each machine, you should log in and type ‘sudo apt-get update’ followed by ‘sudo apt-get upgrade’ to download and install the latest package updates for the system. You’ll probably have to reboot each machine again.

We now need to tell the backend about the existence of our single node. Log in to the backend machine, and type ‘sudo euca_conf --no-rsync --discover-nodes’. You should see something like the following:

New node found on 192.168.1.62; add it? [Yn]

Just press return, and you should see that the two machines connect and synchronise a pair of keys that will be used to authenticate future connections. Now launch a browser from any other machine on the LAN and go to ‘https://backend_ip:8443’. You will get a security warning about the unverified nature of the certificate used for the connection, but you’ll need to add an exception for this site. Firefox will step you through this process automatically.
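If the browser can’t reach the console at all, it helps to check that the port is actually open before blaming the certificate. Here’s a small Python sketch that assumes only that the backend listens on TCP port 8443, as in the URL above. The function name is ours.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

From another machine on the LAN, calling port_is_open with your backend’s IP address and 8443 should return True once the management console is up.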

You should run the configuration tool as a super user.

You will then see the ‘Eucalyptus Enterprise Cloud’ login screen. Enter ‘admin’ for both the username and the password, and you’ll immediately be asked to enter a new administrator password and to check the IP address of the server so that a new certificate can be generated. After clicking submit, you’ll find yourself at the Eucalyptus management console. Before we can get stuck into the details, though, we need to download something called a credentials file, which we can use to authenticate our own tinkering with the server, as well as any other cloud service you may want to use to expand your installation.

STEP 4: Credentials

Click on the Credentials tab followed by the ‘Download Credentials’ button. This will leave you with a zip file that you will need to transfer to your user account on the backend machine. The easiest way is from the command line, using ‘sftp backend_ip’ to connect to the machine, followed by the command ‘put’ with the path to the zip file (‘put euca2-admin-x509.zip’, for instance). On the backend, you then need to unzip the file into a hidden folder in your home directory with ‘unzip -d ~/.euca euca2-admin-x509.zip’ and run the script it contains, which configures various environment variables and keys for managing the cloud: ‘. ~/.euca/eucarc’. This script will need to be run whenever you reconnect to your backend system.

Once it’s properly installed, Eucalyptus’ web interface looks after much of the heavy lifting.

To check that everything is working as it should be, type ‘euca-describe-availability-zones verbose’. The output should look like the following:

AVAILABILITYZONE  pcp_cloud     192.168.1.48
AVAILABILITYZONE  |- vm types   free / max   cpu   ram   disk
AVAILABILITYZONE  |- m1.small   0002 / 0002    1   128      2
AVAILABILITYZONE  |- c1.medium  0002 / 0002    1   256      5

This is important information. The critical data is beneath the free/max columns. This shows the CPU cores available on your cloud for immediate use. The more nodes you have on the network, the higher the number here. If you see only zeros, then there has been a problem with the node starting the appropriate controller, and you’ll need to check the log files in its ‘/var/log/eucalyptus’ directory.
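If you’d rather not eyeball the columns each time, the check is easy to script. The Python sketch below parses the sample output shown above and flags the all-zeros case. It assumes the rows always follow that layout, and the function and pattern names are ours.

```python
import re

SAMPLE = """\
AVAILABILITYZONE pcp_cloud 192.168.1.48
AVAILABILITYZONE |- vm types free / max cpu ram disk
AVAILABILITYZONE |- m1.small 0002 / 0002 1 128 2
AVAILABILITYZONE |- c1.medium 0002 / 0002 1 256 5
"""

# Matches rows such as "|- m1.small 0002 / 0002 1 128 2"
ROW = re.compile(r"\|-\s+(\S+)\s+(\d+)\s*/\s*(\d+)\s+\d+\s+\d+\s+\d+")

def free_slots(output):
    """Map each VM type to its (free, max) counts."""
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in ROW.finditer(output)}

slots = free_slots(SAMPLE)
print(slots)
if not any(free for free, _ in slots.values()):
    print("No free capacity: check the node's logs")
```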

STEP 5: Run an image

It’s now time to create a virtual instance of a machine to run on our node. Eucalyptus makes this very easy because it allows you to download pre-built packages.

Go back to the browser and click on the ‘Store’ tab in the management console and choose a pre-built image to download. We opted for Karmic Koala (i386), which is a 174MB download. You will also have to wait a couple of minutes for the image to be installed after the download has completed, but before long you should see the message ‘How to run?’, and you’re ready to go. We did have problems with our backend machine hanging at this point, but we added more memory to the system and it worked without a hitch.

You’re provided with a selection of pre-configured virtual machines just begging to be run on your new cloud.

Before we go back to the command line, click on the ‘How to run’ link next to the download, and make a note of the ‘emi’ value at the end of the command. This is the unique identifier for the image, and we’ll need to use this when we run it. Back on the command line, type ‘touch ~/.euca/mykey.priv; chmod 0600 ~/.euca/mykey.priv; euca-add-keypair mykey > ~/.euca/mykey.priv’ to create a key pair to authenticate the connection to your running image. Then type ‘euca-describe-groups’ followed by ‘euca-authorize default -P tcp -p 22 -s 0.0.0.0/0’ to enable SSH access to the virtualised machine, and finally, type the following to launch it:

euca-run-instances -k mykey -t c1.medium emi

Replace the emi value with the identifier you took from the web interface. You should also notice that we’ve specified ‘c1.medium’ as the VM type, and this is a preset for the amount of resources the image is given. These can be viewed and modified using the Configuration page of the web management interface. It may take a while to initialise as there’s a lot of data being copied across the network, but you can check its status with the ‘euca-describe-instances’ command, and wait for its state to switch from ‘pending’ to ‘running’. On our hardware, this took about 20 minutes.
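Twenty minutes is a long time to keep retyping a command, so a polling loop is handy. The Python sketch below is deliberately generic: it takes any function that returns the current state as a string. We haven’t shown how to extract that state from ‘euca-describe-instances’, because the exact output layout isn’t covered here. The helper name and the canned states are our own stand-ins.

```python
import time

def wait_for_state(get_state, wanted="running", interval=30.0, max_checks=60):
    """Call get_state() until it returns `wanted`; return the number of checks made.

    Raises TimeoutError if the state never matches within `max_checks` attempts.
    """
    for attempt in range(1, max_checks + 1):
        if get_state() == wanted:
            return attempt
        time.sleep(interval)
    raise TimeoutError(f"still not {wanted!r} after {max_checks} checks")

# Stand-in for a function that would run 'euca-describe-instances'
# and pick out the state; here it simply replays canned states.
canned = iter(["pending", "pending", "running"])
print(wait_for_state(lambda: next(canned), interval=0))
```

With the default settings the loop gives up after half an hour, which leaves some headroom over the 20 minutes we saw.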

You’ll also see the IP address of the new instance, and you can now connect to your new cloud server by typing ‘ssh -i ~/.euca/mykey.priv ubuntu@ip_address’. You’re now ready to install and run your mind-blowing web 2.0 application!

See also: Amazon EC2

Private clouds, of the kind we’ve started to build above, are an excellent raw resource if you need a dynamic pool of processing power. But the beauty of cloud computing is that it’s designed to be elastic, and this means you needn’t be restricted by the physical limitations of your own setup. With Eucalyptus, for example, you can take exactly the same images you’re running on your local machines, and move them to Amazon’s EC2 service as and when you need the extra horsepower.

This gives you virtually unlimited resources in terms of processing power, storage and network bandwidth, and you only have to pay for what you need, rather than the old model of paying for a rack of servers somewhere that spend most of their time idle. Eucalyptus manages this by building an API that’s compatible with Amazon’s, making the images that you configure on your local network drop-in compatible with the images that run on Amazon’s servers. Also, the tools that you use to manage and maintain your private cloud are the same ones you use to manage an Amazon-hosted cloud, making the transition between the two almost seamless.

There are other advantages to this compatibility too. You can create, test and experiment with private clouds before committing your concept to the costs and scrutiny of a public cloud running on Amazon’s servers. And private clouds, such as the one we create in the main text, can give you an excellent feel for how the technology works, and how you and your company may find it useful.

Troubleshooting

As you might have noticed, despite the Ubuntu installation being the easiest way of getting a usable cloud system, it’s still far from easy. There are a lot of aspects to the configuration that can go wrong, from the hardware that you choose, to the network that the various machines are communicating across. Another problem is that the Eucalyptus project is only in its infancy, with Ubuntu 9.10 being the first distribution to include packages by default. This means there isn’t a great deal of support, especially if you’re new to the world of clouds.

But there are several things you can do to help solve problems yourself. Firstly, you should run ‘euca-describe-availability-zones verbose’ to check whether your node machine has been detected and added to the cloud. If you have zero resources available, then it hasn’t. The most common cause of this problem is that the synchronisation of the keys from the backend to the node has failed, stopping the node from registering itself. Check ‘/var/lib/eucalyptus/keys’ on both machines to make sure the keys are available. If the keys on the backend are missing, try running ‘sudo /etc/init.d/eucalyptus-cc-registration restart’ to regenerate them, then try to add the node again using ‘sudo euca_conf --no-rsync --discover-nodes’. If that doesn’t help, try ‘sudo /etc/init.d/eucalyptus-sc-registration restart’ too.

We also had problems trying to register nodes after updating the system with new packages. The solution to this problem is to either add nodes to the cluster before you update any packages, or leave your installation with the packages included with the default install. This shouldn’t matter for an experimental installation. Finally, if all else fails, trawl through the files within the ‘/var/log/eucalyptus’ directory on both machines, as this should give you some idea of where your setup may be failing.

Tags: acronym, API, application, cell, Computer, Computing, cores, CPU, desktop, directory, Environment, facebook, google, Hardware, information, interface, ip address, ip addresses, iss, Jobs, linux, memory, network, Networking, partition, processing power, requirement, rms, Science, security, Server, servers, sftp, Software, storage, system, Technology, tools, type, utility computing, virtual machine, virtualisation, web, XP
Mar 22

The future of scalable, distributed computing is in the cloud. It’s a nebulous term, meant to describe a service’s perpetual elasticity in providing online storage, processing and bandwidth. There are user-facing cloud services such as Google’s web-based applications, the Ulteo online desktop and Canonical’s Ubuntu One service for silently synchronising folders between computers. And there are the more ambitious services of Amazon’s EC2 platform, which enable websites like Facebook to expand and contract their resource requirements in real time, catering for peaks and troughs in demand and paying only for the capacity and CPU cycles they actually use.

Tux: the ideal combination of experienced mountain climber and feathered networking expert.

This latter kind of cloud is a big, enterprise-oriented subject, and Facebook-like scenarios are way off the scale when it comes to the ordinary Linux user. But Linux is at the heart of many of these installations, and there’s plenty of enterprise cloud technology left lying around for us mortals to play with. Incredibly, considering its user-friendly approach to Linux, that includes the latest release of Ubuntu.

Ubuntu 9.10 bundles something called ‘Eucalyptus’, an open source tool for generating private clouds that can dynamically connect to Amazon EC2. Eucalyptus, an acronym for ‘Elastic Utility Computing Architecture Linking Your Programs To Useful Systems’, originated as an ambitious project run by the Computer Science Department at the University of California, Santa Barbara. But it quickly became apparent that the technology it was developing was in great demand, and Professor Rich Wolski, along with many of his students, took sabbatical leave from their day jobs and founded Eucalyptus. So far, they haven’t gone back.

STEP 1: Getting Started

Before trying Eucalyptus for yourself, be aware that it has some demanding requirements. Installation is relatively straightforward, but you will need to use the Linux command line and possibly troubleshoot any problems by reading the log files.

You will need at least two machines: one for front-end management and another to act as a single node on the cluster. Clouds like these rely on virtualisation to provide the elasticity and software scalability of the hardware doing the processing. They run virtualised instances, called images, of your chosen operating system. In the case of Eucalyptus, virtualisation is handled by either KVM or Xen at the kernel level, which is why you need a VT-enabled CPU for the node machines, along with plenty of horsepower, memory and storage. It also means that you will need to configure your cloud to do something useful after you’ve got it working, just as you would a standard Ubuntu server installation.
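Before installing anything on a prospective node, it can help to confirm that its CPU advertises hardware virtualisation. The sketch below is one way to do that from any Linux live session, assuming the usual ‘/proc/cpuinfo’ flag names (‘vmx’ for Intel VT-x, ‘svm’ for AMD-V). The function name ‘has_vt_flags’ is our own and not part of Ubuntu or Eucalyptus.

```shell
# Succeeds if a cpuinfo-style file lists a hardware virtualisation flag.
# vmx = Intel VT-x, svm = AMD-V. Defaults to /proc/cpuinfo.
# has_vt_flags is a hypothetical helper name used for this illustration.
has_vt_flags() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    # -E: extended regex, -q: no output, -w: match whole words only
    grep -Eqw 'vmx|svm' "$cpuinfo_file" 2>/dev/null
}

if has_vt_flags; then
    echo "CPU advertises VT extensions"
else
    echo "No vmx/svm flag found; check the BIOS settings or the CPU model"
fi
```

A missing flag doesn’t always mean the CPU lacks the feature, as some BIOSes ship with VT disabled.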

You’ll be given the option of installing a new cluster or adding a node to an existing one.

The machine used to control and manage your cloud is usually referred to as the backend. To get started, insert the Ubuntu 9.10 Server disc into its drive and reboot. When you see the boot menu, select a language followed by ‘Install Ubuntu Enterprise Cloud’. Choose your language, location and keyboard layout, then enter a host name (we used ubuntu1). The next step will ask whether you want to create a ‘Cluster’ or a ‘Node’, and you need to select the first option, ‘Cluster’.

You will then need to work through the standard Ubuntu partition options. Ideally, use the entire disk for the installation unless you need to keep data on the machine, and leave most options at their default values. The installer will then go off and create the partitions it needs, then download and install a few packages. After this, it will ask for the default username and password for the machine, and whether you want your home directory encrypted. Skip the HTTP proxy question, leave automatic updates off for now, and set Postfix to ‘No Configuration’.

STEP 2: Network and Nodes

We now have a couple of questions that deal specifically with the Eucalyptus configuration. The first asks for a cluster name, and you can call it what you like. This is the name people will see if they access your cloud. The second question asks for a range of IP addresses on your LAN that Eucalyptus can safely use to assign to each node. To answer this, you need to know the range of addresses that your router is using.

Many routers on a home network, for example, will issue IP addresses in the range of 192.168.1.2 – 192.168.1.100, or similar. You need to find this information from your router and enter a range that isn’t going to be assigned automatically by the router but is on the same subnet. ‘192.168.1.101-192.168.1.200’ would work with our previous example. But for the sake of our experiment, you only need to find a couple of spare addresses on the same subnet.
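If you want to sanity-check a candidate range before typing it into the installer, the comparison can be sketched in a few lines of shell. This is a simplified illustration that assumes a /24 home network, so only the last octet varies. The helper names (‘last_octet’, ‘net_prefix’, ‘range_is_safe’) are made up for the example and are not part of any Ubuntu or Eucalyptus tool.

```shell
# Last octet of a dotted-quad address, e.g. 192.168.1.100 -> 100
last_octet() { echo "${1##*.}"; }

# First three octets, e.g. 192.168.1.100 -> 192.168.1
net_prefix() { echo "${1%.*}"; }

# range_is_safe DHCP_START DHCP_END RANGE_START RANGE_END
# Succeeds if the candidate range is on the same /24 as the router's
# DHCP pool and does not overlap it. Assumes a /24 network.
range_is_safe() {
    dhcp_start=$1; dhcp_end=$2; start=$3; end=$4
    [ "$(net_prefix "$start")" = "$(net_prefix "$dhcp_start")" ] || return 1
    [ "$(net_prefix "$end")" = "$(net_prefix "$dhcp_start")" ] || return 1
    [ "$(last_octet "$start")" -le "$(last_octet "$end")" ] || return 1
    # Disjoint means entirely above or entirely below the DHCP pool
    [ "$(last_octet "$start")" -gt "$(last_octet "$dhcp_end")" ] ||
        [ "$(last_octet "$end")" -lt "$(last_octet "$dhcp_start")" ]
}

# Example with a router pool of 192.168.1.2 to 192.168.1.100
if range_is_safe 192.168.1.2 192.168.1.100 192.168.1.101 192.168.1.200; then
    echo "range is clear of the DHCP pool"
fi
```

The check says nothing about devices with manually assigned addresses, so it’s still worth confirming that nothing on your LAN already uses the range.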

It’s best to assign a range of currently-free IP addresses to make up your cluster. It saves a lot of hassle later on.

After entering these details, the remainder of the packages will be installed and you’ll be asked to reboot the machine without the disc in the drive. When the backend reboots, log in to your account. You should see the IP address for the server displayed, along with a message about documentation, and you need to make sure this address is within your network’s range.

It’s now time to tackle the node. Take the same disc you used for the backend and use it to boot your node machine. Choose ‘Install Ubuntu Enterprise Cloud’ from the boot menu again, and go through the first few questions. After entering a new hostname, you should be told that there’s already a Eucalyptus cluster controller on your network, and ‘Node’ will be selected automatically. Just press return. Installation will now be identical to the earlier install, only without any further Eucalyptus questions. Even your user name and password are grabbed from the backend machine, and at the end of the process, you can reboot and the node is now running.

STEP 3: Configuration

On each machine, you should log in and type ‘sudo apt-get update’ followed by ‘sudo apt-get upgrade’ to download and install the latest package updates for the system. You’ll probably have to reboot each machine again.

We now need to tell the backend about the existence of our single node. Log in to the backend machine, and type ‘sudo euca_conf --no-rsync --discover-nodes’. You should see something like the following:

New node found on 192.168.1.62; add it? [Yn]

Just press return, and you should see that the two machines connect and synchronise a pair of keys that will be used to authenticate future connections. Now launch a browser from any other machine on the LAN and go to ‘https://backend_ip:8443’. You will get a security warning about the unverified nature of the certificate used for the connection, so you’ll need to add an exception for this site. Firefox will step you through this process automatically.

You should run the configuration tool as a super user.

You will then see the ‘Eucalyptus Enterprise Cloud’ login screen. Enter ‘admin’ as both the username and the password, and you’ll be immediately asked to enter a new administrator password and to check the IP address of the server so that a new certificate can be generated. After clicking submit, you’ll find yourself at the Eucalyptus management console. But before we can get stuck into the details, we need to download something called a credentials file, which we can use to authenticate our own tinkering with the server, as well as any other cloud service you may want to use to expand your installation.

STEP 4: Credentials

Click on the Credentials tab followed by the ‘Download Credentials’ button. This will leave you with a zip file that you will need to transfer to your user account on the backend machine. The easiest way is from the command line, using ‘sftp backend_ip’ to connect to the machine, followed by the command ‘put’ with the path to the zip file, ‘put euca2-admin-x509.zip’ for instance. On the backend, you then need to unzip the file into a hidden folder in your home directory with ‘unzip -d ~/.euca euca2-admin-x509.zip’ and run the script it contains, which configures various environment variables and keys for managing the cloud: ‘. ~/.euca/eucarc’. This script will need to be run whenever you reconnect to your backend system.
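Because the eucarc script has to be sourced in every new session, you may find it convenient to wrap that step in a small function that fails with a readable message when the file is missing. The following is only a sketch. The function name ‘load_euca_credentials’ is our own, and the ‘~/.euca’ location simply follows the unzip command used above.

```shell
# Source the eucarc script from the given directory (default: ~/.euca).
# Prints a hint and returns non-zero if the file is not there.
# load_euca_credentials is a hypothetical helper, not a Eucalyptus command.
load_euca_credentials() {
    euca_dir="${1:-$HOME/.euca}"
    if [ ! -f "$euca_dir/eucarc" ]; then
        echo "No eucarc in $euca_dir; unzip the credentials archive there first" >&2
        return 1
    fi
    # Source rather than execute, so the variables land in this shell
    . "$euca_dir/eucarc"
}

# Typical use at the start of a session on the backend:
load_euca_credentials || echo "credentials not loaded"
```

Placing the function in your shell start-up file and calling it there would save retyping the command, at the cost of loading the cloud credentials into every shell you open.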

Once it’s properly installed, Eucalyptus’ web interface looks after much of the heavy lifting.

To check that everything is working as it should be, type ‘euca-describe-availability-zones verbose’. The output should look like the following:

AVAILABILITYZONE  pcp_cloud     192.168.1.48
AVAILABILITYZONE  |- vm types   free / max   cpu   ram   disk
AVAILABILITYZONE  |- m1.small   0002 / 0002    1   128      2
AVAILABILITYZONE  |- c1.medium  0002 / 0002    1   256      5

This is important information. The critical data is beneath the free/max columns, which show how many instances of each VM type your cloud can start right now and in total, a figure that depends on the CPU cores your nodes provide. The more nodes you have on the network, the higher the numbers here. If you see only zeros, then there has been a problem with the node starting the appropriate controller, and you’ll need to check its ‘/var/log/eucalyptus’ directory for the log files.
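If you find yourself running this check repeatedly, the free/max figures can be pulled out with awk. The sketch below assumes the output is laid out like the sample above (a ‘|-’ marker in the second field, then the VM type, the free count, a slash and the max count). The function names ‘slots_by_type’ and ‘node_registered’ are invented for this example.

```shell
# Read the verbose availability-zone listing on stdin and print
# "type free max" for each VM type row, skipping the header rows.
# Assumes the column layout of the sample output in the article.
slots_by_type() {
    awk '$2 == "|-" && $4 ~ /^[0-9]+$/ { print $3, $4 + 0, $6 + 0 }'
}

# Succeeds if at least one VM type reports a non-zero max, which
# suggests that a node has registered with the cluster.
node_registered() {
    slots_by_type | awk '$3 > 0 { found = 1 } END { exit !found }'
}

# Usage on the backend, once eucarc has been sourced:
#   euca-describe-availability-zones verbose | slots_by_type
#   euca-describe-availability-zones verbose | node_registered || echo "no node resources"
```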

STEP 5: Run an image

It’s now time to create a virtual instance of a machine to run on our node. Eucalyptus makes this very easy because it allows you to download pre-built packages.

Go back to the browser and click on the ‘Store’ tab in the management console and choose a pre-built image to download. We opted for Karmic Koala (i386), which is a 174MB download. You will also have to wait a couple of minutes for the image to be installed after the download has completed, but before long you should see the message ‘How to run?’, and you’re ready to go. We did have problems with our backend machine hanging at this point, but we added more memory to the system and it worked without a hitch.

You’re provided with a selection of pre-configured virtual machines just begging to be run on your new cloud.

Before we go back to the command line, click on the ‘How to run’ link next to the download, and make a note of the ‘emi’ value at the end of the command. This is the unique identifier for the image, and we’ll need to use this when we run it. Back on the command line, type ‘touch ~/.euca/mykey.priv; chmod 0600 ~/.euca/mykey.priv; euca-add-keypair mykey > ~/.euca/mykey.priv’ to create a key pair to authenticate the connection to your running image. Then type ‘euca-describe-groups’ followed by ‘euca-authorize default -P tcp -p 22 -s 0.0.0.0/0’ to enable SSH access to the virtualised machine, and finally, type the following to launch it:

euca-run-instances -k mykey -t c1.medium emi

Replace the emi value with the identifier you took from the web interface. You should also notice that we’ve specified ‘c1.medium’ as the type of image, and this is a preset for the amount of resources the image is given. These can be viewed and modified using the Configuration page of the web management interface. It may take a while to initialise as there’s a lot of data being copied across the network, but you can check its status with the ‘euca-describe-instances’ command, and wait for its state to switch to ‘running’ rather than ‘pending’. On our hardware, this took about 20 minutes.
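Rather than rerunning the status command by hand for 20 minutes, you could poll it in a loop. The sketch below takes the status command as its arguments and waits until a line beginning with ‘INSTANCE’ contains the word ‘running’. That line format is an assumption about what ‘euca-describe-instances’ prints on your system, so check it against real output first. The function name and the ‘POLL_INTERVAL’ and ‘MAX_TRIES’ variables are our own.

```shell
# wait_until_running CMD [ARGS...]
# Reruns CMD until one of its INSTANCE lines mentions "running".
# POLL_INTERVAL (seconds) and MAX_TRIES are hypothetical tuning knobs;
# the defaults give up after roughly 30 minutes.
wait_until_running() {
    interval="${POLL_INTERVAL:-30}"
    max_tries="${MAX_TRIES:-60}"
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # Assumes status lines start with INSTANCE and include the state word
        if "$@" | grep '^INSTANCE' | grep -qw 'running'; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}

# Usage on the backend, once eucarc has been sourced:
#   wait_until_running euca-describe-instances && echo "instance is up"
```

With more than one instance in the listing, this returns as soon as any of them is running, so it’s best suited to the single-instance experiment described here.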

You’ll also see the IP address of the new instance, and you can now connect to your new cloud server by typing ‘ssh -i ~/.euca/mykey.priv ubuntu@ip_address’. You’re now ready to install and run your mind-blowing web 2.0 application!

See also: Amazon EC2

Private clouds, of the kind we’ve started to build above, are an excellent raw resource if you need a dynamic pool of processing power. But the beauty of cloud computing is that it’s designed to be elastic, and this means you needn’t be restricted by the physical limitations of your own setup. With Eucalyptus, for example, you can take exactly the same images you’re running on your local machines, and move them to Amazon’s EC2 service as and when you need the extra horsepower.

This gives you virtually unlimited resources in terms of processing power, storage and network bandwidth, and you only have to pay for what you need, rather than following the old model of paying for a rack of servers somewhere that spends most of its time idle. Eucalyptus manages this by building an API that’s compatible with Amazon’s, making the images that you configure on your local network drop-in compatible with the images that run on Amazon’s servers. Also, the tools that you use to manage and maintain your private cloud are the same ones you use to manage an Amazon-hosted cloud, making the transition between the two almost seamless.

There are other advantages to this compatibility too. You can create, test and experiment with private clouds before committing your concept to the costs and scrutiny of a public cloud running on Amazon’s servers. And private clouds, such as the one we create in the main text, can give you an excellent feel for how the technology works, and how you and your company may find it useful.

Troubleshooting

As you might have noticed, despite the Ubuntu installation being the easiest way of getting a usable cloud system, it’s still far from easy. There are a lot of aspects to the configuration that can go wrong, from the hardware that you choose to the network that the various machines are communicating across. Another problem is that the Eucalyptus project is only in its infancy, with Ubuntu 9.10 being the first distribution to include packages by default. This means there isn’t a great deal of support, especially if you’re new to the world of clouds.

But there are several things you can do to help solve problems yourself. Firstly, you should run ‘euca-describe-availability-zones verbose’ to check whether your node machine has been detected and added to the cloud. If you have zero resources available, then it hasn’t. The most common cause for this problem is that the synchronisation of the keys from the backend to the node has failed, stopping the node from registering itself. Check ‘/var/lib/eucalyptus/keys’ on both machines to make sure the keys are available. If the keys on the backend are missing, try running ‘sudo /etc/init.d/eucalyptus-cc-registration restart’ to regenerate the keys, then try to add the node again using ‘sudo euca_conf --no-rsync --discover-nodes’. If all else fails, try ‘sudo /etc/init.d/eucalyptus-sc-registration restart’ too.

We also had problems trying to register nodes after updating the system with new packages. The solution to this problem is to either add nodes to the cluster before you update any packages, or leave your installation with the packages included with the default install. This shouldn’t matter for an experimental installation. Finally, if all else fails, trawl through the files within the /var/log/eucalyptus directory on both machines, as this should give you some idea of where your setup may be failing.
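Trawling the logs is quicker with a filter. The sketch below prints the most recent lines mentioning ‘error’ from every file in the log directory. The function name ‘recent_errors’ is ours, and matching on the word ‘error’ is only a guess at how problems are reported, so widen the pattern if it turns up nothing.

```shell
# recent_errors [LOG_DIR] [COUNT]
# Prints the last COUNT lines containing "error" (any case) from the
# files in LOG_DIR, each prefixed with the file it came from.
# recent_errors is a hypothetical helper, not a Eucalyptus tool.
recent_errors() {
    log_dir="${1:-/var/log/eucalyptus}"
    count="${2:-20}"
    # -i: ignore case, -H: always print the file name
    grep -iH 'error' "$log_dir"/* 2>/dev/null | tail -n "$count"
}

# Usage (the log files may only be readable by root):
#   recent_errors /var/log/eucalyptus 40
```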
