May 13

Linux is blessed with a variety of methods for accessing, expanding and building dynamic, scalable storage solutions. And they’re not just for industry experts either – even everyday Linux users can get access to a great deal of functionality by sharing and accessing data across a network.

Set up iSCSI and your data will stream from your PC like a magical rainbow. Perhaps. If you’re very lucky.

But there’s a new generation of solutions that can increase performance, offer more features and improve security – one of which is iSCSI. As its name implies, iSCSI is a version of the SCSI protocol, which is responsible for shuffling data between various SCSI devices. The key difference with iSCSI is that instead of transporting data across local cables and buses, the ‘i’ takes that data across the internet or your local network, attaching remote storage devices to your local system. It’s perfect for storage area networks (SANs) in particular, and you’ll find network administrators in enterprises of all sizes singing the praises of iSCSI’s ability to combine big-business fibre channel commands with generic networking hardware, saving their departments thousands of pounds in both hardware and infrastructure costs. iSCSI is also a great general-purpose remote storage solution that could easily replace the NFS protocol.

Understanding iSCSI

You can now find iSCSI in many network-attached storage (NAS) devices for the home and even standard Linux installations. This is because it’s a great storage protocol for an expanding network. It might be hard work to master, but you can build a working configuration with relative ease. The great advantage that iSCSI has over some of the alternatives is that drives are exported as block devices, just as blocks of data would be transported over an old SCSI cable. This means that, to the Linux kernel, these drives are handled exactly like local block devices. This makes it perfect for connecting the storage area of a database to the client application or, more recently, the virtual storage devices used by VMware and VirtualBox virtual machines. It’s also ideal if you want to run a machine without any local storage – traditionally the domain of NFS.

Use ‘apt-get install iscsitarget’ to add the appropriate packages to your installation.

Before getting started, your first step should be to familiarise yourself with some of the concepts that are used by iSCSI. These appear to take their names from the Terminator films, with the two most important being the Initiator and the Target. The Initiator is the iSCSI equivalent of the client. It’s the machine you want to have access to the remote data – the one that’s running the applications, or your desktop. The Target is the place the Initiator grabs the data from – another machine running the iSCSI server software and managing the requests to and from the storage medium. You’ll often find NAS drives running the Target server, for example, and you’ll only need to run the Initiator on your local machine in order to access the Target drive.

Set up the Target

It’s possible to share a large variety of different storage types over iSCSI, but the easiest to configure are entire drives. iSCSI connects devices at the block level, which means the job of partitioning and formatting can be left to the Initiator rather than the machine that’s attached directly to the hard drive. On our system, the drive we used was listed as /dev/sdb, and we’ll stick with this example throughout this tutorial. The system drive is normally /dev/sda. To find out what yours is, type fdisk -l on the command line for an overview of what’s connected to your system.

Edit the configuration file in nano. You’ll need to add a line which ensures iSCSI is switched on.

On the Target machine, you first need to install the ‘iscsitarget’ package. You can do this either from the Synaptic package manager or by typing sudo apt-get install iscsitarget on the command line. This will also install several configuration files, and the first of these that we need to look at is ‘/etc/default/iscsitarget’. You can type sudo nano followed by the path to the file name to edit it on the command line, or use your favourite desktop editor if you prefer. We only need to make a single edit here – making sure ‘ISCSITARGET_ENABLE=true’ is the only line in the file. Nano users need to press [Ctrl]+[X], then [Y] to save the changes.

Edit configuration files

The next file we need to edit is ‘/etc/ietd.conf’, which you’ll need to open in an editor. This document contains a working configuration, with lines using the # symbol designated as comments and therefore out of action.

The first line we need to edit defines what’s known as the iSCSI qualified name, or IQN for short. Just search for Target, followed by iqn. As with pages on the net, this IQN name has to be unique to your installation, and it takes the form of ‘iqn’, followed by the year and month, then a reversed version of your domain name. This is then followed by a colon and a reference to whatever you’re going to call the target storage device. This can be anything you like. Here’s what we chose:

iqn.2010-01.com.example:storage.disk2

Next, we need to define the drive that’s going to be shared over iSCSI. Remove the # symbol from the line starting with ‘Lun’ and modify it to read Lun 0 Path=/dev/sdb,Type=fileio. You need to change ‘/dev/sdb’ to the location of the drive that you’ve decided to share. Next, uncomment the Alias line at the bottom of the current section and save the file.

You’ll need to do some deeper editing to the ‘/etc/ietd.conf’ file.

You may have noticed that there are two lines available for a username and password, but we’re going to leave these untouched for now because we’re running our iSCSI device over a trusted network. You can always come back to this point and change the configuration after you’ve got the basic connection working, if you think that you need to.

To enable the device to be shared, open ‘/etc/initiators.allow’ and add iqn.2010-01.com.example:storage.disk2 ALL. When you get the connection working, you’ll need to change ‘ALL’ to the IP address of the machines allowed to access your iSCSI device, but for now we’re trying to remove all obstacles that could stop us getting the connection working. Start the Target server by typing sudo /etc/init.d/iscsitarget start.

Set up the Initiator

It’s now the turn of the other machine, the Initiator. To begin, you need to install the ‘open-iscsi’ package. Once that’s done, open the ‘/etc/iscsi/iscsid.conf’ configuration file and look for the line starting with ‘node.startup’. Change the default value of ‘manual’ to automatic, save the file and restart the service by typing sudo /etc/init.d/open-iscsi restart.

Make sure you change the node startup option to ‘manual’.

Almost everything is now configured and ready to go. Our next step is to probe the Target machine to see what storage services it offers, hopefully listing our drive in the process. Type iscsiadm -m discovery -t st -p, followed by the IP address of the Target. You can find this by typing ipconfig on the Target machine. If everything is set up correctly, you should see the IQN of your drive as output to the iscsiadm. This is the output we received:

iscsiadm -m discovery -t st -p 192.168.1.61
192.168.1.61:3260,1 iqn.2010-01.com.example:storage.disk2

Now you know that the Target machine is correctly configured and that iSCSI can see the remote storage device, you need to type iscsiadm -m node. This will automatically create configuration files within /var/lib/iscsi/nodes for the storage unit on your local machine, which will allow the Initiator to mount the device automatically when the service is restarted. You may need to do this manually by typing sudo /etc/init.d/open-iscsi restart.

The remote drive should now be mounted locally. The best way to check is to type fdisk -l to list all the storage devices attached to your machine. The iSCSI drive should be part of the output, but you won’t see any indication that it’s being connected over a network. That’s the great advantage of using iSCSI.

Mount the drive

iSCSI passes the block information, so you need to format the remote drive from the Initiator machine before it can be used. This process is exactly the same as partitioning and formatting any other drive on your system. You could use fdisk on the command line, for example, or you could take the easy route and use a graphical partitioning tool like GParted, which you can install through your distro’s package manager.

After launching GParted, you first need to select the remote drive from the dropdown list in the top-right of the main window. As with fdisk, the iSCSI drive looks exactly like a local drive, so you need to take special care to select the correct device. You can lose data from other drives permanently with GParted, so proceed with caution.

If you’re hooked up to your target drive you should be able to treat it like any locally connected module – including partitioning it.

With the remote drive selected, click on the large grey area in the middle of the window marked ‘unallocated’. This is the unpartitioned area on your remote drive. Click on ‘New’. By default GParted will use the entire disk, but you’re free to subdivide the remote drive in exactly the same way you would a local one if you prefer. Leave ‘Primary Partition’ selected, then select ‘ext4’ as the filesystem and give your drive a meaningful label before clicking ‘Add’. Click on ‘Apply’ to make the changes to the remote drive, and then simply sit back and wait while the formatting process finishes.

Now that the drive is correctly formatted and partitioned, it’s time to mount it onto the local filesystem so you can start reading and writing data to it. You can use the ‘mount’ command to attach the drive just as you would any other, but there’s one small difference that you have to consider whenever you use multiple iSCSI devices – they might not always have the same /dev path. The solution is to navigate to the device using the /dev/disk/by-path/ nodes, so you can be sure you’re getting the same disk every time.

If you type ls /dev/disk/by-path, for example, you can easily see which devices are using the IQN address. In our example, we mounted the remote drive onto the local /mnt/iscsi folder with the following commands:

mkdir /mnt/iscsi
mount /dev/disk/by-path/ip-192.168.1.61\:3260-iscsi-iqn.2010-01.com.example\:storage.disk2-lun-0-part1 /mnt/iscsi

You can now read and write files to the /mnt/iscsi mount point, and these are passed directly to the remote drive just as if they were connected using an extremely long SCSI cable.

iSCSI on NAS boxes

Two of the trickiest parts of using iSCSI are finding the hardware and configuring the Target machine, but luckily there may be another option if you have a Linux-based network-attached storage box. Many of these will offer up a chunk of the storage inside the box over an iSCSI connection, and you can usually activate and configure this facility with just a couple of clicks. Some of these will let multiple Initiators access a single target and enable you to add CHAP password authentication. Then it’s simply a case of running the iSCSI discovery procedure on your Initiator hardware. If you’ve chosen to use a password and username, you’ll need to run the ‘iscsiadm’ command on the Initiator to add those values to the configuration file for that Target. The command takes the following format:

iscsiadm -m node --targetname IQN --portal IP AND PORT OF TARGET --op=update --name node.session.auth.

You need to run this three times, changing the end parameters each time. With the first execution, add authmethod –value=CHAP to the authentication check. For the second, add username –value=username to specify the username for the connection, and for the third execution add password –value=password to specify the password.

Create a virtual hard drive

Many more technical distributions, such as Fedora, use logical volume management (LVM) for local storage and filesystems. One of the advantages of LVM is that you can shrink, expand and create partitions on the fly from the pool of storage on your machine. Logical volumes can be used for all sorts of tasks. Virtual machines often use them to keep data from the main system, and they’re handy if you enjoy playing with filesystems. They’re also useful if you want to try iSCSI, because you don’t need to have a spare hard drive – you just need the space within your logical volume pool.

The key to adding new virtual drives is the ‘lvcreate’ command. We used the following command to create a 10GB logical volume: lvcreate -L10G -n vdrive vg. This creates a volume called ‘vdrive’ in a volume group called ‘vg’. You’ll need to take a look in your /dev directory to discover the name of the logical volume group used by your installation. After creating the drive, it appears in the /dev volume tree just like any other device and you can share it across the iSCSI connection like a real hard drive.

Even without LVM, there are other options for dynamically shared storage. You could create an image file, for example, by typing dd if=/dev/zero of=/mnt/iscsi.img bs=1024k count=1000. The ‘count’ value is the size of the image, while ‘/mnt/iscsi.img’ is the file that’s created. You can use that path as the source for the iSCSI Target on the ‘Lun 0’ line in the ‘/etc/ietd.conf’ configuration file, and use it like a real partition.

Virtualisation

iSCSI is commonly used by cloud applications and the world of virtualisation. This is so you can access the same hard drive data regardless of where the virtual machine is running. If you’re in the market for a major, enterprise-grade virtualisation solution that uses iSCSI, take a look at VMware’s ESXi solution, which is available for free from www.vmware.com.

Boot options

If you want the Target iSCSI device to be available each time you boot the Initiator machine, you will need to add the remote device to the ‘/etc/fstab’ file. The quickest way to do this is to copy another line in the file and change the parameters to suit the iSCSI device. Make sure the path uses the ‘by-name’ format so that you can be sure you get the same drive at each boot.

Apr 27

Profiting from Linux doesn’t involve an obvious winning formula. There are as many different business models as there are distributions, and you seldom find much overlap between those that are working. Instead, you find something more like the world of medieval patronage. It’s a place where the great distribution families fight for favour, sponsoring masked balls and conferences, while trying to attract geek heroes to work under their flags.

Yet despite this somewhat unconventional system, there’s no doubt that profit drives development, progress and ambition. You only have to look as far as the kernel at the heart of the system to see the proof. There may be terabytes of projected idealism in there, but a significant proportion is more prosaically driven and developed by business. When kernel supremo Greg Kroah-Hartman last collated contributions in 2008, he found some surprising statistics. For example, there may be over a thousand developers working on the latest versions, but a mere 10 individuals were responsible for almost 15 per cent of the code added to the kernel in the three years up to 2008. A further 20 individuals take that total to 30 per cent.

This is remarkable when you consider that version 2.6.33 of the kernel – released in February – more than doubles the number of lines of code found in the vanilla kernel from December 2003. That’s 13 million lines, an assemblage that has recently been estimated to be worth over £900million by researchers at the University of Oviedo in Spain.

What’s even more interesting is the list of who’s paying for these contributions. Top of this chart are independent developers, accounting for almost 14 per cent of the work, and these are closely followed by a group whose company sponsors are unknown. Third in the chart is Red Hat, followed by Novell, IBM and Intel. These four accounted for almost a third of kernel development in 2008, and there can be no doubt that each company’s motivation is far from altruistic.

If you had any doubts, note that Microsoft joined the party last July by contributing 20,000 lines of its own code to the kernel to help people running virtualized Linux from their Windows machines. Even though it’s more likely that this act was crafted to avoid legal action, rather than a flush of generosity, it still illustrates that the kernel has become a cut-throat gladiatorial arena in the fight for big business.

However, there’s one influential distribution that is completely absent from Greg’s Hall of Fame. Like the others, it’s a commercial endeavour. Unlike them, it neither makes a profit nor has a clear business model. This is despite it being the number one distro. Ubuntu, developed by Canonical, is the perfect example of what happens when Linux ideology and business meet.

Canonical is a global enterprise, funded by Mark Shuttleworth’s sale of Thawte for $575million in 1999. The charismatic figurehead combined coding skills with a penchant for space flight. He selected his early Ubuntu team by taking six months’ of Debian mailing-list archives with him on an ice-breaker to Antarctica, and choosing those whose comments and contributions he judged most worthy.

Despite its market share, Canonical is desperately trying to find a revenue stream from somewhere. It’s a sign of the sober, business-oriented times that Shuttleworth’s successor, Jane Silbur, isn’t an ex-astronaut, but ex-Vice President of Command and Control Systems at General Dynamics C4 Systems.

At the enterprise-end of the business, it’s chasing clouds by bundling an Amazon EC2 compatible server/client solution and selling support. At the user-end, Canonical has just launched its own online music store. This is a service heavily tied to its own remote storage service, which is used to download and store the tracks you buy, available only in the patent-riddled and legally ambiguous MP3 format.

Mark Shuttleworth’s parting blog post proclaimed he’ll be concentrating on “product design, partnerships and customers”, and the proof seems visible in the new release of Ubuntu. But Mark has also signalled another change for Ubuntu users. In a recent mailing list post dealing with community criticism of recent design changes, Mark wrote, “This is not a democracy. Good feedback, good data, are welcome. But we are not voting on design decisions.” These aren’t the words of a geek with his head in the stars, but of a man who has fallen to earth. And things have changed.

Jul 12

While enjoying holidays, I read the book “Programming Amazon Web Services” by James Murty. As explained in my earlier post, I was most interested to learn how cloud computing could be leveraged for developing integration solutions.

The book discusses 5 Amazon Web Services (AWS):

  • Simple Storage Service (S3)
  • Elastic Cloud Computing (EC2), virtual Linux servers on demand
  • Simple Queue Service (SQS), to deliver short messages
  • Flexible Payment Service
  • SimpleDB – simple database with no SQL support

The book goes into quite some technical detail and has code snippets showing in detail how to interact with the Amazon services. All the samples are written in Ruby. I don’t know Ruby, but the code is quite readable (should read Enterprise Integration with Ruby some day). The author prefers the REST and the Query API. Unfortunately, he does not show anywhere the use of the SOAP API to access Amazon WS.

The 1st chapter is introductory and e.g. explains how to use self-signed certificates to connect with AWS, explains how AWS were developed for internal use by Amazon and later turned into a products, come without an SLA (except for S3) and without real support.

In the 2nd chapter, the author builds up a library of Ruby code to access the Amazon Web Services. This is very well written and gives an immediate feeling for some aspects to take into account, e.g. clock differences.

S3 is covered in chapters 3 and 4. No standard file access but the use of buckets and objects through a non-standard API (REST or SOAP); no FTP, WebDAV or SFTP. And objects cannot be modified: only deleted and re-created (after the deletion has propagated). Ruby code is shown for all the options the API offers: bucket creation/lookup/deletion, object creation/listing/deletion, ACL update/retrieval and access logging file retrieval. Tricks with HTTP header fields (object metadata), posting data through forms, alternative hostnames and BitTorrent are discussed. The last part discusses signed URI’s: this is a neat trick to make S3 resources temporarily accessible to users without Amazon account.

Chapter 4 shows some applications of the S3 service: large file transfer, backup, turning S3 into a file system (with FTP or WebDAV). Interesting to note that the author has his doubts wrt. exposing S3 as a file system. The author also discusses his own Java open source application: JetS3t. This application is a “gatekeeper” for S3 resources and authorizes local agent applications after acquiring signed URL to upload files to S3 and download files from S3.

Chapter 5, 6 and 7 dive into EC2 and how virtual Linux systems (based on Xen) can be configured using Amazon Machine Images. Ruby code is shown for every available API: keypairs (for SSH access), network security (dynamically configure the firewall), images and instances. Chapter 6 explains instances in more detail and discusses how to create new images. This involves quite some commands and scripts at the Linux command prompt. Chapter 7 discusses some sample applications: VPN server, web photo album thereby backing up data on S3. Chapter 7 also discusses issues around dynamically assigned IP addresses and the use of dynamic DNS.

The Simple Queue Service (SQS) is discussed in chapters 8 and 9. Because of the small message size, SQS is clearly meant for events with actual data stored on S3 (or elsewhere). Again Ruby code to manipulate queues and messages. Chapter 9 describes a Messaging Simulator application, not that relevant in my opinion. The 2nd application – leveraging a video conversion tool – shows how to build generic service for implementing “batch” services (Command Message pattern). The 3rd application – LifeGuard – leverages SQS to manage EC2 instance pools and dynamically scale the number of EC2 instances.

The chapter on payment service I skipped and I only skimmed through the SimpleDB chapter. Enough to learn that SimpleDB is not an RDBMS but a basic storage mechanism (no data types) with proprietary query facilities (no SQL).

The author writes fluently and gives a non-biased view on the Amazon Web Services. Sometimes the code goes into too much detail, showing how to invoke every available method of the API. Although the book is very recent (March 2008), important new features such as elastic IP addresses, persistent storage for EC2 and availability zones weren’t yet available at the time of writing. The book definitely taught me that AWS is quite proprietary and not that trivial. And to use Amazon’s cloud computing and AWS, you’d better “think like Amazon”.