Mar 23

The future of scalable, distributed computing is in the cloud. This is a nebulous concept that’s supposed to describe a cloud’s perpetual elasticity in providing online storage, processing and bandwidth. There are user-facing cloud services such as Google’s web-based applications, the Ulteo online desktop and Canonical’s Ubuntu One service for silent cross-computer folder synchronisation. And there are the more ambitious services of Amazon’s EC2 platform, enabling websites like Facebook to expand and contract their resource requirements in real time, catering for peaks and troughs in demand and only paying for the capacity and the CPU cycles they actually use.

Tux: the ideal combination of experienced mountain climber and feathered networking expert.

This latter kind of cloud is a big, enterprise-oriented subject, and Facebook-like scenarios are way off the scale when it comes to the ordinary Linux user. But Linux is at the heart of many of these installations, and there’s plenty of enterprise cloud technology left lying around for us mortals to play with. Incredibly, considering its user-friendly approach to Linux, that includes the latest release of Ubuntu.

Ubuntu 9.10 bundles something called ‘Eucalyptus’, an open source tool for generating private clouds that can dynamically connect to Amazon EC2. Eucalyptus, an acronym for ‘Elastic Utility Computing Architecture Linking Your Programs To Useful Systems’, originated as an ambitious project run by the Computer Science Department at the University of California, Santa Barbara. But it quickly became apparent that the technology it was developing was in great demand, and Professor Rich Wolski, along with many of his students, took sabbatical leave from their day jobs and founded Eucalyptus Systems. So far, they haven’t gone back.

STEP 1: Getting Started

Before you try Eucalyptus for yourself, be aware that it has some demanding requirements. Installation is relatively straightforward, but you will need to use the Linux command line and possibly troubleshoot any problems by reading the log files.

You will need at least two machines: one for front-end management and another to act as a single node on the cluster. Clouds like these rely on virtualisation to provide the elasticity and software scalability of the hardware doing the processing. They run virtualised instances, called images, of your chosen operating system. In the case of Eucalyptus, virtualisation is handled by either KVM or Xen at the kernel level, which is why you need a VT-enabled CPU for the node machines, along with plenty of horsepower, memory and storage. It also means that you will need to configure your cloud to do something useful after you’ve got it working, just as you would a standard Ubuntu server installation.

You’ll be given the option of installing a new cluster or adding a node to an existing one.

The machine used to control and manage your cloud is usually referred to as the backend, and to get started you need to insert the Ubuntu 9.10 Server disc into its drive and reboot. When you see the boot menu, select a language followed by ‘Install Ubuntu Enterprise Cloud’. Choose language, location and keyboard layout, then enter a host name (we used ubuntu1). The next step will ask whether you want to create a ‘Cluster’ or a ‘Node’, and you need to select the first option, ‘Cluster’.

You will then need to work through the standard Ubuntu partition options. Ideally, use the entire disk for the installation unless you need to keep data on the machine, and leave most options at their default values. The installer will then go off and create the partitions it needs, then download and install a few packages. After this, it will ask for the default username and password for the machine, and whether you need your home directory encrypting. Skip the HTTP proxy question and leave the automatic updates off for now, and set Postfix to ‘No Configuration’.

STEP 2: Network and Nodes

We now have a couple of questions that deal specifically with the Eucalyptus configuration. The first asks for a cluster name, and you can call it what you like. This is the name people will see if they access your cloud. The second question asks for a range of IP addresses on your LAN that Eucalyptus can safely use to assign to each node. To answer this, you need to know the range of addresses that your router is using.

Many routers on a home network, for example, will issue IP addresses in the range of 192.168.1.2 – 192.168.1.100, or similar. You need to find this information from your router and enter a range that isn’t going to be assigned automatically by the router but is on the same subnet. ‘192.168.1.101-192.168.1.200’ would work with our previous example, as it starts just above the last address the router hands out. But for the sake of our experiment, you only need to find a couple of spare addresses on the same subnet.
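Before typing a range into the installer, it can be worth checking it against the router’s settings. The following is a minimal Python sketch of that check, not part of Eucalyptus: the function name range_is_safe is hypothetical, and the /24 network and pool boundaries are assumed example values for a typical home LAN.

```python
import ipaddress


def range_is_safe(network, dhcp_first, dhcp_last, cloud_first, cloud_last):
    """Return True if the cloud range lies inside `network` and
    does not overlap the router's DHCP pool."""
    net = ipaddress.ip_network(network)
    dhcp_lo = ipaddress.ip_address(dhcp_first)
    dhcp_hi = ipaddress.ip_address(dhcp_last)
    cloud_lo = ipaddress.ip_address(cloud_first)
    cloud_hi = ipaddress.ip_address(cloud_last)

    if cloud_lo > cloud_hi:
        return False  # the range was entered back to front

    in_network = cloud_lo in net and cloud_hi in net
    # Two closed ranges overlap when each one starts before the other ends.
    overlaps = cloud_lo <= dhcp_hi and dhcp_lo <= cloud_hi
    return in_network and not overlaps


# Assumed example: a 192.168.1.0/24 LAN whose router hands out .2 to .100.
print(range_is_safe("192.168.1.0/24",
                    "192.168.1.2", "192.168.1.100",
                    "192.168.1.101", "192.168.1.200"))
```

The check only compares addresses; it cannot tell whether another machine has been given a static address inside the range, so it is still worth pinging a couple of the addresses first.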

It’s best to assign a range of currently-free IP addresses to make up your cluster. It saves a lot of hassle later on.

After entering these details, the remainder of the packages will be installed and you’ll be asked to reboot the machine without the disc in the drive. When the backend reboots, log in to your account. You should see the IP address for the server displayed along with a message about documentation, and you need to make sure this address is within your network’s range.

It’s now time to tackle the node. Take the same disc you used for the backend and use it to boot your node machine. Choose ‘Install Ubuntu Enterprise Cloud’ from the boot menu again, and go through the first few questions. After entering a new hostname, you should be told that there’s already a Eucalyptus cluster controller on your network, and ‘Node’ will be selected automatically. Just press return. Installation will now be identical to the earlier install, only without any further Eucalyptus questions. Even your user name and password are grabbed from the backend machine, and at the end of the process, you can reboot and the node is now running.

STEP 3: Configuration

On each machine, you should log in and type ‘sudo apt-get update’ followed by ‘sudo apt-get upgrade’ to download and install the latest package updates for the system. You’ll probably have to reboot each machine again.

We now need to tell the backend about the existence of our single node. Log in to the backend machine, and type ‘sudo euca_conf --no-rsync --discover-nodes’. You should see something like the following:

New node found on 192.168.1.62; add it? [Yn]

Just press return, and you should see that the two machines connect and synchronise a pair of keys that will be used to authenticate future connections. Now launch a browser from any other machine on the LAN and go to ‘https://backend_ip:8443’. You will get a security warning about the unverified nature of the certificate used for the connection, so you’ll need to add an exception for this site. Firefox will step you through this process automatically.

You should run the configuration tool as a super user.

You will then see the ‘Eucalyptus Enterprise Cloud’ login screen. Enter ‘admin’ as both the username and the password, and you’ll be immediately asked to enter a new administrator password, and to check the IP address of the server so that a new certificate can be generated. After clicking submit, you’ll find yourself at the Eucalyptus management console. But before we can get stuck into the details, we need to download something called a credentials file, which we can use to authenticate our own tinkering with the server, as well as any other cloud service you may want to use to expand your installation.

STEP 4: Credentials

Click on the Credentials tab followed by the ‘Download Credentials’ button. This will leave you with a zip file that you will need to transfer to your user account on the backend machine. The easiest way is from the command line, using ‘sftp backend_ip’ to connect to the machine, followed by the command ‘put’ with the path to the zip file, ‘put euca2-admin-x509.zip’ for instance. On the backend, you then need to unzip the file into a hidden folder in your home directory with ‘unzip -d ~/.euca euca2-admin-x509.zip’ and run the script it contains that configures various environment variables and keys for managing the cloud, ‘. ~/.euca/eucarc’. This script will need to be run whenever you re-connect to your backend system.

Once it’s properly installed, Eucalyptus’ web interface looks after much of the heavy lifting.

To check that everything is working as it should be, type ‘euca-describe-availability-zones verbose’. The output should look like the following:

AVAILABILITYZONE  pcp_cloud     192.168.1.48
AVAILABILITYZONE  |- vm types   free / max   cpu   ram   disk
AVAILABILITYZONE  |- m1.small   0002 / 0002    1   128      2
AVAILABILITYZONE  |- c1.medium  0002 / 0002    1   256      5

This is important information. The critical data is beneath the free/max columns. This shows how many instances of each type your cloud can start right now (free) and in total (max), which depends on the CPU cores available on your nodes. The more nodes you have on the network, the higher the number here. If you see only zeros, then there has been a problem with the node starting the appropriate controller, and you’ll need to check its ‘/var/log/eucalyptus’ directory for the log files.
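If you want to check these numbers from a script rather than by eye, the free/max columns are easy to pick out. The Python sketch below parses text in the layout shown above; the function names parse_capacity and node_registered are hypothetical, and the sample text is copied from the listing rather than captured live, so treat the layout as an assumption about your own installation’s output.

```python
import re

# Sample text in the layout shown above (not captured from a live system).
SAMPLE = """\
AVAILABILITYZONE pcp_cloud 192.168.1.48
AVAILABILITYZONE |- vm types free / max cpu ram disk
AVAILABILITYZONE |- m1.small 0002 / 0002 1 128 2
AVAILABILITYZONE |- c1.medium 0002 / 0002 1 256 5
"""

# A data row looks like "|- <type> <free> / <max> ...". The header row does
# not match because "vm" is followed by the word "types", not by a number.
ROW = re.compile(r"\|-\s+(\S+)\s+(\d+)\s*/\s*(\d+)")


def parse_capacity(text):
    """Map each VM type to a (free, max) tuple of integers."""
    capacity = {}
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            vm_type, free, maximum = match.groups()
            capacity[vm_type] = (int(free), int(maximum))
    return capacity


def node_registered(capacity):
    """True if at least one VM type reports a non-zero maximum."""
    return any(maximum > 0 for _free, maximum in capacity.values())


capacity = parse_capacity(SAMPLE)
print(capacity)
print("node registered:", node_registered(capacity))
```

In practice you would feed the function the text printed by ‘euca-describe-availability-zones verbose’ instead of the sample string.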

STEP 5: Run an image

It’s now time to create a virtual instance of a machine to run on our node. Eucalyptus makes this very easy because it allows you to download pre-built images.

Go back to the browser and click on the ‘Store’ tab in the management console and choose a pre-built image to download. We opted for Karmic Koala (i386), which is a 174MB download. You will also have to wait a couple of minutes for the image to be installed after the download has completed, but before long you should see the message ‘How to run?’, and you’re ready to go. We did have problems with our backend machine hanging at this point, but we added more memory to the system and it worked without a hitch.

You’re provided with a selection of pre-configured virtual machines just begging to be run on your new cloud.

Before we go back to the command line, click on the ‘How to run’ link next to the download, and make a note of the ‘emi’ value at the end of the command. This is the unique identifier for the image, and we’ll need to use this when we run it. Back on the command line, type ‘touch ~/.euca/mykey.priv; chmod 0600 ~/.euca/mykey.priv; euca-add-keypair mykey > ~/.euca/mykey.priv’ to create a key pair to authenticate the connection to your running image. Then type ‘euca-describe-groups’ followed by ‘euca-authorize default -P tcp -p 22 -s 0.0.0.0/0’ to enable SSH access to the virtualised machine, and finally, type the following to launch it:

euca-run-instances -k mykey -t c1.medium emi

Replace the emi value with the identifier you took from the web interface. You should also notice that we’ve specified ‘c1.medium’ as the type of instance, and this is a preset for the amount of resources the instance is given. These can be viewed and modified using the Configuration page of the web management interface. It may take a while to initialise as there’s a lot of data being copied across the network, but you can check its status with the ‘euca-describe-instances’ command, and wait for its state to switch to ‘running’ rather than ‘pending’. On our hardware, this took about 20 minutes.
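Rather than re-running the status command by hand for 20 minutes, the wait can be automated. The Python sketch below shows only the polling loop: wait_until_running is a hypothetical helper, and get_state stands in for whatever you write to extract the state word from the output of ‘euca-describe-instances’, which is left out here because the full column layout of that output isn’t shown in this article.

```python
import time


def wait_until_running(get_state, interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Poll get_state() until it returns 'running'.

    Returns the number of seconds spent waiting. Raises RuntimeError if the
    instance enters any state other than 'pending' or 'running', and
    TimeoutError if it is still pending after `timeout` seconds.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state == "running":
            return waited
        if state != "pending":
            raise RuntimeError(f"unexpected instance state: {state!r}")
        if waited >= timeout:
            raise TimeoutError(f"still pending after {waited:.0f} seconds")
        sleep(interval)
        waited += interval


# Demonstration with a fake state source and a no-op sleep, so nothing
# actually waits: two 'pending' answers, then 'running'.
fake_states = iter(["pending", "pending", "running"])
print(wait_until_running(lambda: next(fake_states), sleep=lambda seconds: None))
```

Passing the sleep function in as a parameter is a design choice that keeps the loop testable; in real use you would leave it at its default and supply a get_state that calls the command-line tool.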

You’ll also see the IP address of the new instance, and you can now connect to your new cloud server by typing ‘ssh -i ~/.euca/mykey.priv ubuntu@ip_address’. You’re now ready to install and run your mind-blowing web 2.0 application!

See also: Amazon EC2

Private clouds, of the kind we’ve started to build above, are an excellent raw resource if you need a dynamic pool of processing power. But the beauty of cloud computing is that it’s designed to be elastic, and this means you needn’t be restricted by the physical limitations of your own setup. With Eucalyptus, for example, you can take exactly the same images you’re running on your local machines, and move them to Amazon’s EC2 service as and when you need the extra horsepower.

This gives you virtually unlimited resources in terms of processing power, storage and network bandwidth, and you only have to pay for what you need, rather than the old model of paying for a rack of servers somewhere that spend most of their time idle. Eucalyptus manages this by building an API that’s compatible with Amazon’s, making the images that you configure on your local network drop-in compatible with the images that run on Amazon’s servers. Also, the tools that you use to manage and maintain your private cloud are the same ones you use to manage an Amazon-hosted cloud, making the transition between the two almost seamless.

There are other advantages to this compatibility too. You can create, test and experiment with private clouds before committing your concept to the costs and scrutiny of a public cloud running on Amazon’s servers. And private clouds, such as the one we create in the main text, can give you an excellent feel for how the technology works, and how you and your company may find it useful.

Troubleshooting

As you might have noticed, despite the Ubuntu installation being the easiest way of getting a usable cloud system, it’s still far from easy. There are a lot of aspects of the configuration that can go wrong, from the hardware that you choose to the network that the various machines are communicating across. Another problem is that the Eucalyptus project is only in its infancy, with Ubuntu 9.10 being the first distribution to include packages by default. This means there isn’t a great deal of support, especially if you’re new to the world of clouds.

But there are several things you can do to help solve problems yourself. Firstly, you should run ‘euca-describe-availability-zones verbose’ to check whether your node machine has been detected and added to the cloud. If you have zero resources available, then it hasn’t. The most common cause for this problem is that the synchronisation of the keys from the backend to the node has failed, stopping the node from registering itself. Check ‘/var/lib/eucalyptus/keys’ on both machines to make sure the keys are available. If the keys on the backend are missing, try running ‘sudo /etc/init.d/eucalyptus-cc-registration restart’ to regenerate the keys, then try to add the node again using ‘sudo euca_conf --no-rsync --discover-nodes’. If all else fails, try ‘sudo /etc/init.d/eucalyptus-sc-registration restart’ too.

We also had problems trying to register nodes after updating the system with new packages. The solution to this problem is to either add nodes to the cluster before you update any packages, or leave your installation with the packages included with the default install. This shouldn’t matter for an experimental installation. Finally, if all else fails, trawl through files within the /var/log/eucalyptus directory on both machines, as this should give you some idea of where your setup may be failing.

Feb 23

Magnetic force microscopes may be able to read overwritten data on a platter, even if it can’t spin.

Data loss: we’ve all experienced it. Maybe you emptied the Recycle Bin milliseconds before realising that you’d deleted the wrong folder the day before; or perhaps your hard disk simply packed up, leaving behind it nothing more than an odd clicking sound and a system error screen. It might have been that the dog really did eat your homework by mistaking your newly burnt DVD for its latest toy. Whatever happened in your case, it resulted in your precious data ascending to that great filesystem in the sky.

Fortunately, dead disks, trigger-happy fingers and scratched CDs don’t always mean that you have to wave goodbye to your data. While not every problem has a happy ending, there are many that do. So when disaster strikes, don’t reach for the proverbial pistol: grab our guide to recovering data instead.

Lost files

Disaster rating: 1/5

Losing a file is a disconcerting experience. If you’re looking for a document but can’t find it anywhere, stay calm. Stop any programs that are writing huge amounts of data to your hard disk and exit as many applications as possible. Deleted files aren’t really deleted: they’re merely marked as disk space ripe for reuse. This means that you want to stop anything that’s writing to disk in case it overwrites the lost data.

When everything has been stopped, have another search for your lost file using Windows’ Search utility. If it’s really not there, your first port of call should be the Recycle Bin. If it’s not in there either, or you habitually use the more brutal [Shift]+[Delete] option, you should advance to the ‘Truly deleted files’ section below.

However, some programs may have already come to your rescue. Word and other Office applications, for example, store temporary versions of files as you’re working on them. Finding and extracting usable data from these temporary files can be complex, but it can enable you to retrieve missing data. Visit Microsoft’s support site for a very comprehensive guide.

Truly deleted files

Disaster rating: 3/5

Although files deleted normally (and therefore simply sent to the Recycle Bin) are easily restored, those ‘properly’ deleted are trickier to recover. However, it’s sometimes possible to resurrect these files using an undelete utility. If you’ve accidentally truly deleted a file, reach for one of these first.

To understand how these tools work their apparent magic, you need to know how a filesystem works and what happens when you delete a file. Each disk (or each partition, if your disk is divided into more than one) has a system area that holds directory information about all of its files. For each file, the directory records the name of that file and the number of its first cluster (the smallest usable area of a disk). Another important system area is the File Allocation Table (FAT), which contains information about all the other clusters associated with a file. The FAT entry for a file’s first cluster is the number of the second cluster; the entry for the second cluster is the number of the third cluster, and so on until the entry for the last cluster in the file, which is an end-of-file indicator.
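To make the chain idea concrete, here's a minimal Python sketch of a toy allocation table. The table layout, the `EOF` and `FREE` values and the filename are all made up for illustration; a real FAT uses different marker values and is stored in binary on the disk.

```python
# Toy model of a File Allocation Table: the list index is a cluster number,
# the value is the next cluster in the same file.
EOF = -1   # hypothetical end-of-file indicator for this sketch
FREE = 0   # hypothetical value for an unused cluster

def cluster_chain(fat, first_cluster):
    """Return the clusters that make up a file, in order."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != EOF:
        # A chain that loops back on itself or runs into a free
        # cluster means the table is damaged.
        if cluster in seen or cluster == FREE:
            raise ValueError("corrupt chain at cluster %d" % cluster)
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

# A fragmented file occupying clusters 2, 3 and 6.
fat = [FREE, FREE, 3, 6, FREE, FREE, EOF, FREE]
directory = {"REPORT.DOC": 2}   # filename -> first cluster

print(cluster_chain(fat, directory["REPORT.DOC"]))  # [2, 3, 6]
```

The directory supplies only the starting point; every later cluster is found by looking up the previous one in the table, which is why losing the table entries matters so much for fragmented files.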

When a file is deleted, Windows does two things. First, it overwrites the first character of the filename in the directory with a question mark to indicate that the entry in the directory can be reused. Second, it overwrites the entries for all the file’s clusters in the FAT with zeros to indicate that those clusters are free to be reused. None of the data in the file is actually overwritten or erased, so, if you move quickly, it may be possible to recover the file. Undelete utilities start by looking in the directory for any filenames that start with a question mark. If the user opts to undelete any such file, the utility goes to the first cluster, as shown in the directory, and reads data from that and subsequent clusters until an end-of-file marker is found. Note that this method will work only if a file is sequential – if it’s fragmented then it can’t be recovered since the information in the file allocation table that is needed to find non-sequential clusters will have been overwritten. It’s also important to recognise that although it’s possible to successfully recover a file if you act immediately, the longer you leave it the more likely it is that Windows will have overwritten one or more clusters.
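The delete-then-undelete sequence can be sketched in Python along the following lines. This is a toy model, not how any particular utility is written: the four-byte cluster size, the `FREE` and `EOF` values and the dictionary standing in for a directory entry are all invented for illustration. It also assumes that the directory entry records the file's length, which is how this sketch knows when to stop reading.

```python
import math

CLUSTER_SIZE = 4   # bytes per cluster; unrealistically small, for illustration
FREE, EOF = 0, -1  # hypothetical table values for this sketch

def delete(entry, fat):
    """Mimic deletion: zero the file's table entries and flag the filename."""
    cluster = entry["first"]
    while cluster != EOF:
        next_cluster = fat[cluster]
        fat[cluster] = FREE
        cluster = next_cluster
    entry["name"] = "?" + entry["name"][1:]

def undelete(entry, fat, clusters, first_char):
    """Recover a deleted file, assuming its clusters were contiguous."""
    if not entry["name"].startswith("?"):
        raise ValueError("entry is not marked as deleted")
    count = math.ceil(entry["size"] / CLUSTER_SIZE)
    wanted = range(entry["first"], entry["first"] + count)
    # If any of these clusters is in use again, the data has been lost.
    if any(fat[c] != FREE for c in wanted):
        raise ValueError("a cluster has already been reused")
    data = b"".join(clusters[c] for c in wanted)[:entry["size"]]
    # The user supplies the lost first character of the filename.
    entry["name"] = first_char + entry["name"][1:]
    return data

clusters = [b"....", b"....", b"Dear", b" Sir", b"!\x00\x00\x00", b"...."]
fat = [FREE, FREE, 3, 4, EOF, FREE]
entry = {"name": "LETTER.TXT", "first": 2, "size": 9}

delete(entry, fat)
print(entry["name"], fat)                    # ?ETTER.TXT [0, 0, 0, 0, 0, 0]
print(undelete(entry, fat, clusters, "L"))   # b'Dear Sir!'
```

Note that `undelete` simply reads consecutive clusters starting from the first one. Had the file been fragmented, it would have returned the wrong data, because the zeroed table no longer says where the other pieces are. A real utility would also rebuild the table entries; this sketch only returns the data.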

Corrupted filesystem

Disaster rating: 3/5

If your PC is shut down improperly, perhaps due to a power cut or system failure while Windows is in the process of writing to the system areas of the disk, the filesystem could become corrupted. This could result in data being present on your disk that Windows has no knowledge of – not an ideal situation to be in.

To understand exactly how this is corrected, we’d have to get into the intricacies of the filesystem. To cut a long story short, let’s say that software is used to analyse the filesystem and spot inconsistencies. Having found some, and based on information regarding the most likely ways in which a filesystem can become corrupted, the software attempts to rebuild the filesystem so that Windows can access those lost files. The technique is intended to recover data, not mend Windows and its intricate data structures. When your data has been rescued, you’ll need to reformat the disk and reinstall Windows.

Once it has found the all-important inconsistencies, the software works out where the lost files are. Once identified, the lost files can be restored to a separate drive. If you want to have a go, try GetDataBack or R-Studio NTFS (http://www.r-tt.com).
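As a rough illustration of what 'spotting inconsistencies' can mean, the Python sketch below looks for one simple kind: clusters that the allocation table marks as in use but that no directory entry leads to. The table layout and marker values are hypothetical, and this isn't a description of how GetDataBack or R-Studio work internally.

```python
FREE, EOF = 0, -1  # hypothetical table values for this sketch

def lost_clusters(fat, directory):
    """Return clusters marked as in use that no directory entry leads to."""
    reachable = set()
    for first in directory.values():
        cluster = first
        # Walk each file's chain, stopping at its end or at a repeat.
        while cluster != EOF and cluster not in reachable:
            reachable.add(cluster)
            cluster = fat[cluster]
    return [c for c in range(len(fat))
            if fat[c] != FREE and c not in reachable]

# Clusters 4 and 5 hold a chain of data, but the directory only
# knows about the file that starts at cluster 2.
fat = [FREE, FREE, 3, EOF, 5, EOF, FREE]
directory = {"NOTES.TXT": 2}

print(lost_clusters(fat, directory))  # [4, 5]
```

Chains found this way are candidates for lost files: the data is on the disk, but nothing in the directory points to it.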

Sometimes this process – and the undelete process described above – isn’t successful even though the data is still present on the disk. In these cases, you can recover files by analysing clusters to determine what type of data they contain. This takes a fair amount of research into the clusters of different types of files. However, if your disk isn’t particularly fragmented, it could be easier to reassemble a lost file than you might think.
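One common way of telling what a cluster contains is to check its first few bytes against the signatures that many file formats start with. Here's a minimal Python sketch of the idea. The signatures listed are the standard ones for those formats, but the list is only a small sample, and the function names and example clusters are our own invention.

```python
# Leading byte sequences ("magic numbers") for a few common file types.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}

def identify(cluster):
    """Return the file type whose signature starts this cluster, if any."""
    for magic, kind in SIGNATURES.items():
        if cluster.startswith(magic):
            return kind
    return None

def scan(clusters):
    """List (cluster number, file type) for every recognised cluster."""
    found = []
    for number, cluster in enumerate(clusters):
        kind = identify(cluster)
        if kind is not None:
            found.append((number, kind))
    return found

clusters = [
    b"\x00" * 16,
    b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01",
    b"more picture data",
    b"%PDF-1.4\n1 0 obj",
]
print(scan(clusters))  # [(1, 'JPEG image'), (3, 'PDF document')]
```

A signature only identifies the first cluster of a file. If the disk isn't fragmented, the clusters that follow are likely to belong to the same file, which is why reassembly is more practical on a tidy disk.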

Mechanical failure

Disaster rating: 4/5

A hard disk drive consists of a spindle motor, a voice coil motor, a read/write head, a circuit board and a platter, of which only the platter stores any data. So if your drive fails, it’s possible that the data is still present on the platter and the problem is due to a failure of one of the other components. Even an evidently mechanical sound might be nothing more sinister than a failure of the servo circuitry.

Replacing parts is a delicate operation that must be carried out in the safety of a clean-room environment.

If something other than the platter has failed, then the solution is obvious – replace the offending part. You might be tempted to try this yourself, but be warned: it’s tricky. Firstly, unless you’re an expert you won’t know for sure which part needs replacing. Secondly, attempting the operation in anything other than a spotless room is doomed to failure. Finally, you’d have to buy an identical disk drive from which to salvage the parts. We therefore highly recommend employing the services of a data recovery company.

Software like the custom-written application by MjM Data Recovery enables the company’s engineers to diagnose and correct logical errors in the filesystem.

This isn’t a cheap option, so you’ll have to decide whether your lost data is worth it. The good news is that some companies offer a ‘no fix – no fee’ guarantee on repairs, or will first carry out a diagnosis and then provide you with a report that will specify how much data they can be sure of salvaging, should you accept the quotation. If you want to attempt this, try contacting Xytron.

Overwritten files

Disaster rating: 5/5

If your file has been truly deleted and then overwritten, the sad news is that you probably won’t be able to retrieve it. However, there are two theoretical methods that seem promising, so perhaps the situation will be more hopeful in the future.

The first method is to do with variations in magnetic flux. When data is written to disk, the resultant magnetic flux depends mostly on the value written. However, the flux is also provided with a tiny contribution from the overwritten data. So, if you replace the normal read electronics that decide whether a bit is a 1 or a 0 with circuitry that can extract the analogue value from the head, subtracting the known contribution of the most recently written data should make it possible to determine what value was in place before it.

This method of extracting overwritten data is commonly reported and academic papers have been written on the subject, but we’ve been unable to find any organisation that claims to have done it successfully. The problem is that the signal from the previously written data is so small that it effectively gets lost in the random electrical fluctuations commonly referred to as noise. However, Western Digital’s Gerardo Bertero – while questioning whether the technique is really a practical proposition – did suggest one possible solution. By reading each bit thousands if not millions of times and averaging all those signals, electrical noise, being random in nature, would average out to zero – whereas the genuine signal would build up and become visible. The snag is that this would be hugely time-consuming and costly – which is why nobody offers such a service. Whether it becomes feasible when national security is at risk is another question entirely, and one we’re not likely to get an answer to from the Secret Service.
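The averaging argument is easy to try out in a simulation. The Python sketch below is purely illustrative: the size of the leftover signal and of the noise are numbers we've made up, and the noise is assumed to be random with an average of zero. Under that assumption, the uncertainty in the average shrinks in proportion to the square root of the number of reads, so 10,000 reads cut the noise by a factor of 100.

```python
import random

def read_once(rng, residual, noise_sd):
    """One analogue read after subtracting the newly written value:
    a faint leftover from the old data, buried in random noise."""
    return residual + rng.gauss(0.0, noise_sd)

def averaged_read(rng, residual, noise_sd, reads):
    """Average many reads of the same bit so the noise tends to cancel."""
    total = sum(read_once(rng, residual, noise_sd) for _ in range(reads))
    return total / reads

RESIDUAL = 0.05   # assumed leftover signal from the overwritten bit
NOISE_SD = 1.0    # assumed noise level, 20 times larger than the signal

rng = random.Random(1)  # fixed seed so the run is repeatable
for reads in (1, 100, 200_000):
    estimate = averaged_read(rng, RESIDUAL, NOISE_SD, reads)
    print(reads, round(estimate, 3))
```

With only a handful of reads the estimate bears little relation to the true value of 0.05; with 200,000 it settles close to it. That's 200,000 reads for a single bit, which gives some idea of why the approach would be so slow and expensive for an entire disk.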

The second theoretical technique for recovering data is to use a Magnetic Force Microscope (MFM). It’s claimed that this method offers an additional benefit – it can read data, overwritten or not, from the surface of a platter that is no longer able to spin because of damage. For the normal read operation of a hard disk, the platter must be able to rotate because an electrical signal can only be produced in the read head’s coil if it’s moving with respect to the magnetic field. However, an MFM is able to read a static magnetic field – so the platter doesn’t have to be able to spin.

An MFM is a laboratory instrument that has an ultra-fine magnetised tip suspended on a cantilever. As the tip moves over the surface of the object being imaged, the magnetic field of that object exerts a force on the tip, which in turn causes a displacement of the cantilever. This is measured using optical techniques. It’s self-evident how an MFM could be used to read data from a platter that can no longer rotate, but what isn’t as obvious is how the technique lends itself to recovering data that’s been overwritten – at least in theory.

Data is written to a hard disk’s platter in concentric circles known as tracks. A highly accurate servo control system is used to position the read/write head over the required track, but, even so, the head isn’t always positioned at exactly the same radial position. What this means is that if a track is overwritten but the head wasn’t at exactly the same position as it was on the previous occasion, a narrow band of the previously written data might remain intact at the edge of the track. Again, despite hearing so much about this technique, and despite the fact that hard disk drive manufacturers already use MFMs for research and development, we haven’t found any company that offers this technique for recovering overwritten data.

We’d have to speculate on whether it’s in use by the military, although the fact that military standards require disks to be physically destroyed at the end of their lives might just suggest that they know it’s feasible.

Scratched discs

Disaster rating: 1/5

Methods such as undeleting files, repairing logical errors to the file structure and reassembling files also work with media such as memory cards and pen drives.

However, there are certain methods of data recovery that apply only to optical discs. Except in the most extreme cases, when a CD or DVD won’t read it’s usually scratches in the outer plastic layer that are causing the problem. The plastic layer is there to provide protection to the layer of data underneath, but scratches can impair the passage of the light used to read the data. The solution is to remove the troublesome scratches using a mild abrasive so that light will pass through the plastic layer without obstruction.

Scratches may prevent CDs from being read, but devices such as Digital Innovations’ SkipDr could restore them.

Although there are reports of this being done successfully using household substances such as Brasso, a more reliable solution is to use a product designed for this job. Such products range from kits comprising a bottle of suitably mild abrasive and a lint-free cloth to mechanical disc polishing devices such as Digital Innovations’ SkipDr. The latter claims a greater chance of success since the polishing process is more uniform and controlled.

Shattered discs

Disaster rating: 4/5

Researchers at the University of Arizona’s Optical Sciences Center have published a peer-reviewed paper in which they describe how they used an optical microscope to extract data from fragments of broken CDs or DVDs. However, the process is time-consuming and so isn’t likely to be viable unless the rewards are huge. What’s more, any data in the vicinity of the breaks is totally unrecoverable, so in many cases we’d be talking of recovering fragments of data rather than files in their entirety. A shattered disc still means lost data – although the future may hold more hope for broken optical discs.