Jun 01

Any operating system that contains the letters B, S and D usually conjures images of geeky elitism, arcane interfaces and the undead world of UNIX. Despite its similarity, this is an image Linux has largely been able to shake off, thanks to its friendly graphical installers and configuration tools. But BSDs can offer a unique insight into what has made Linux popular, as well as an opportunity to hone your command-line and troubleshooting skills in a world that might be getting too easy. And while you do need a little technical confidence to get any BSD system up and running, it’s not half as difficult as it first appears.

FreeBSD is not as demonic as its logo might suggest. Honest.

FreeBSD is the most popular implementation of version 4.4 of the Berkeley Software Distribution. This was the original BSD, a version of UNIX that was developed between the late 70s and the mid-90s and used a famously liberal licence. This licence has meant that anyone can use, copy, redistribute and re-implement its code and APIs, which is exactly what FreeBSD attempts to do, alongside other projects like OpenBSD and NetBSD. In turn, there are many projects like Apple’s OS X that build upon the foundations in FreeBSD, all thanks to the liberal licences of the original.

Step 1: Prologue

There are several important differences between FreeBSD and Linux, but the most fundamental is the kernel. The term ‘Linux’ is most often used to refer to the entire operating system, from the boot code and drivers to the desktop and the applications. We’d call Ubuntu, Fedora and OpenSUSE different versions of Linux, for instance. But this definition isn’t accurate. ‘Linux’ should only really refer to the kernel – the chunk of code at the heart of the system that deals with hardware, networking, drivers, storage, CPU and process management, and the BSD kernel is entirely different.

The Linux kernel was originally developed by Linus Torvalds, and it’s still the only part of the whole operating system he has control over. The remainder of what makes a complete operating system (the windowing environment, the desktops and the applications) is pulled from open source projects that mostly use one of the GNU General Public Licences. Hence, the official name for the entire Linux operating system is really GNU/Linux, to show that there are two parts to the whole project. This is why replacing the kernel isn’t a trivial operation.

Don’t be scared of text mode. It’s part of Linux’s legacy, and serves a very useful purpose.

Any new kernel needs to be broadly compatible with Linux so that the remainder of the software stack can be ported without too much difficulty. Fortunately, both FreeBSD and Linux are UNIX-alike, which means there are many similarities, and the result is that with a standard installation, you’ll find many of the tools you’re already familiar with, albeit in a different configuration.

Many system administrators feel FreeBSD has been a more stable choice for servers over the years, and that it can out-perform its cousin on certain tasks. It’s also a great choice if you want to run a server on limited hardware, as the requirements for a BSD-based system are often lower than for the Linux equivalent. FreeBSD, for example, lists its minimum requirements as a 486 CPU with 24MB RAM, which is quite staggering in today’s world of terabytes and quad-core processors. FreeBSD is also an i386-based platform. There are ports to other processors, but the project’s focus has always been compatibility with Intel’s standard architecture and, as a result, it could be better suited to the majority of machines than parts of the Linux kernel.

Step 2: Installation

While there are Live CD versions of FreeBSD, the traditional install disc is still the most common medium for getting hold of the latest version. But you’ll need to steel yourself against its antiquated installation mechanism.

When you boot your machine with the disc in the drive, the first thing you’ll see is the black and white ASCII art of the install menu. You won’t see any other graphical embellishment until you’re able to boot successfully into a working desktop. For most installs, you’ll need to choose option 1 from the menu, but if you’re using an older machine, you may want to try 2 (with ACPI disabled) to avoid any potential problems.

Unlike trial Linux installation packages like Wubi, installing BSD has a tendency to be destructive. Make sure you’re not going to obliterate anything important before continuing.

You’ll then have to wait a few moments while various kernel messages scroll by before you’ll see the text-based installation and configuration screen. If you’ve used Debian, this kind of text installer will feel familiar. There’s no linearity to the install process. You can move backwards and forwards through the various options, and continue to make adjustments to the installation until you quit the installer and restart the system.

For a basic, working environment, you will need to do at least the following. Select the second option in the menu to initiate a ‘Standard Install’ and read the information on the screen that follows. The next page displays ‘fdisk’, the disk partitioning and formatting tool. Press F1 for documentation, but if you’re using your entire hard drive for this installation, press ‘a’ to select everything, followed by ‘q’ to apply the changes. This will delete everything currently on the drive, so be careful.

The next page will ask if you want to install a boot loader, which is the menu that lets you choose between whatever operating systems you have installed. Choose the second option (BootMGR), and on the following page, you need to create the various partitions used by FreeBSD using the same fdisk-like interface we’ve just seen.

Once again, if you’re using an entire drive for the installation, you can just press ‘a’ to let the installer create the most appropriate array of partitions followed by ‘q’ to make the changes permanent.

The next page will ask you to choose a distribution. Unlike a Linux distribution, FreeBSD uses the term to refer to the default selection of packages that are to be installed. Select ‘Custom’ and add ‘base’ and ‘kernels > GENERIC’ to your installation.

This will give you enough packages to get a working system, and we’ll need to add the desktop environment at a later stage. Return to the ‘Exit’ option at the top of the list and press space to jump into the package installation routine.

Step 3: Post-Install

After all the preliminary configuration has completed, you’ll be asked whether you want to configure any Ethernet or SLIP/PPP devices. Select ‘Yes’ if you are connecting to the internet through your machine’s Ethernet port, and you should see your adaptor listed at the top of the connections list. Choose the adaptor, say ‘no’ to IPv6, say ‘yes’ to DHCP and skip through the configuration page to the OK button. Say ‘no’ to your machine being a network gateway, ‘no’ to enabling any inetd services or running SSH, FTP and NFS servers and clients, and don’t edit the console settings. You can safely set up a timezone for your machine and enable the PS/2 mouse emulation if you’re using one.

Don’t worry too much about your initial selection of packages. You can easily add more later.

Say ‘Yes’ to the next question, and you’ll now be looking at the package manager. This is where you choose what applications you want to be installed on top of the default option we chose earlier, and there are thousands of packages to choose from. For a simple setup, jump into the ‘x11’ menu and select the ‘kde4-4.3.1’ package. Its exact name will depend on the version of FreeBSD you’re playing with. If you’re not a fan of KDE, you could choose ‘gnome2-2’ from the same list of packages instead. Selecting either will also mark their dependencies for installation. You also need to select xorg-7, and any other packages you know you’re going to need.

When you’re ready to go, jump back to the top package list, select ‘Install’ and press space. You’ll need to wait a while for all the packages to install. The next step is to create a user account. You can do this by saying ‘Yes’ to the option, then selecting ‘Add User’, and entering a user name in the page that follows. Select OK to make the change permanent and exit from the users and groups menu. You’ll then be asked for the system manager’s password, and you’ll need to type this twice.

After that, you can say ‘No’ to the post-install configuration request and wait for your machine to reboot. You’re now at the point where you should have a basic, working installation, and you can quit from the installation menu and restart your system.

Step 4: Configuration

When your machine reappears, you’ll be greeted by the sombre monochrome of the command line. Log in as ‘root’ with your system manager’s password. For both Gnome and KDE, you need to add the following two lines to the ‘/etc/rc.conf’ configuration file:

dbus_enable="YES"
hald_enable="YES"

Unfortunately, you’re going to need to use the ‘vi’ text editor. Type ‘vi /etc/rc.conf’ to load the file. Press ‘i’ to enter insert mode, move to a new line and type the two lines shown above. Press Escape to exit insert mode, followed by ‘:wq’ (without quotes) to save the changes and quit the editor. Next, type ‘reboot’ to restart your system.

FreeBSD doesn’t come with a desktop activated by default, but it’s simple enough to change.

When you get back to the login screen, enter your user account details this time, and when you get dropped back to the command line, type ‘vi .xinitrc’ and add the following line to the file:

exec /usr/local/kde4/bin/startkde4

This is telling your system that when the X.org graphical system starts, you want KDE to be used as your desktop environment. Save and exit vi.

Step 5: Launch Desktop

Usually, at this point, you need to create an ‘xorg.conf’ file to define the display properties for your machine. But recent releases of the X server are able to create a working configuration without any further editing, which means typing ‘startx’ is all you need to do to launch a graphical environment running KDE. If this doesn’t work, then you will need to create a working /etc/X11/xorg.conf file.

But with FreeBSD 8, it’s more likely that you are now looking at KDE running through its Akonadi porting routines as it builds up a configuration for your desktop. After a couple of minutes, this will leave you with a KDE desktop running on top of FreeBSD, and you’ve just earned another trophy for your awards cabinet.

At long last: a GUI! And one of the more stable interfaces you’ll find. Here’s hoping BSD serves you well.

This is exactly the same KDE you’ll find on Linux, and you’ll be hard pressed to find any difference between the way it works on FreeBSD and the way it works with Kubuntu. It’s only when it comes to system configuration that you’ll notice a difference: FreeBSD doesn’t have any graphical configuration tools, which means if you need to change anything, you’ve got to be prepared to go back to the command line. But that’s another story.

See also: PC-BSD 8.0

If you’ve followed the main text to install a shiny new version of FreeBSD, you might have noticed that the install mechanism really wasn’t all that shiny or new. In reality, it feels ancient. But this doesn’t mean that the operating system has been languishing unloved and undeveloped. It just means that making the installer easier to use is low on the priority list.

Fortunately, this being open source, demand for a better way of doing things has led to several alternatives, the best of which is PC-BSD, which you’ll find at www.pcbsd.org. It does several impressive things. Firstly, it replaces the tepid monochrome installer of FreeBSD with a graphical application much more in line with its Linux counterparts. It will also automatically install and configure a recent version of the KDE desktop, which should mean you can get productive with a FreeBSD system as quickly as possible, without touching the command line. There’s also a wonderful wiki full of helpful documentation.

This means you can install PC-BSD by placing the disc in the drive, rebooting your machine, answering the questions that appear and waiting for the operating system to install. You won’t even need to worry about manually partitioning your drive unless you want to create a custom configuration.

Another important difference is that PC-BSD doesn’t use the same package management as FreeBSD, although you can still get to it if you need to. Instead, packages are available as single files with the ‘.pbi’ file extension, which can then be installed with a simple click. It’s more like how packages are handled on OS X, and is far better than the weird world of dependencies you find on Linux.

May 11

When we need to select an algorithm for a particular purpose, we should pay attention to its runtime characteristics: how fast it is; how much memory it uses; whether there’s a worst case for the algorithm’s execution speed; and so on. All these answers are expressed with the big-Oh notation, which I’ll describe later.

A common abstract data structure that’s used all the time in programming is the dictionary or associative array, which is sometimes known as a map. I call it an abstract structure because it can be implemented in myriad different ways, but it always has a specific interface. We’ll use the dictionary to investigate the runtime efficiency of various algorithms that can be used to implement it.

OK, so it’s not this kind of dictionary. We’re referring to a digital one. An associative array.

But first, a definition: a dictionary is a structure that holds name-value pairs. A name-value pair is an object that has a name – that’s used both to describe its value and as a key to find it – and a value, which can be anything at all. The classic example is a real-world dictionary, where the name is a word and its value is the word’s definition. However, don’t limit yourself to assuming the name is always some kind of text string. In reality, names can be integer values, bit strings, 128-bit GUIDs, dates or anything at all. That said, it’s helpful to assume that they’re text strings for now.

The dictionary has various operations that define its external interface. There’s the ‘Create’ operation, which creates a new dictionary, and the ‘Destroy’ operation, which releases any resources the dictionary is using and destroys the structure. A dictionary can only be used after ‘Create’ has been called, and once ‘Destroy’ is executed, it no longer exists. Since these operations are only used once each per dictionary, they won’t have much effect on the overall runtime and so we won’t discuss them any further.

When given a name, ‘Find’ will search for the name-value pair that matches and return its value or an error if the name is not found. ‘Exists’ will do the same, except it will merely return true or false according to whether the name is present or not. Since they’re virtually identical, apart from what they return, we’ll ignore ‘Find’ from now on.

Finally we have ‘Insert’ and ‘Delete’, which do what you’d expect: add a new name-value pair to the dictionary (returning an error if the name already exists), and remove the name-value pair that matches a given name, respectively. In general, ‘Delete’ won’t return an error if the name is not found, and sometimes ‘Insert’ will merely replace the value if the name already exists.

Now that we have our abstract data structure, let’s investigate first how to implement it and second analyse the efficiency of our implementations. We’ll look at a total of four implementations.

Name-value pairs

The first implementation is the most obvious: use an array of name-value pairs. ‘Exists’ is the first operation to think about. In essence, to see whether the given name is present, you would check every pair in the dictionary sequentially and stop when you found it. If the given name isn’t present, you would compare the name of every name-value pair to the given name. The more pairs there are, the longer it would take, but you can be even more precise than that. Suppose there were N pairs in the dictionary and each comparison took the same (constant) length of time – say t. Then it would take tN time units to find out the given name wasn’t present. Another way of putting this is that the time taken for the nonexistence check is proportional to N. In computer science, without going into too much rigorous mathematics, we say the runtime efficiency is O(N), pronounced ‘big-Oh of N’, although you can read it as ‘is proportional to N’.

So if it took so many seconds to find out that a given name wasn’t in a dictionary of 1,000 pairs, it would take twice as long for a dictionary of 2,000 pairs, and 10 times as long for a dictionary of 10,000 pairs.

What if the given name was in the dictionary? What could we say then? Well, it could be that the matching pair was the first item checked. In that scenario, we say the best case efficiency for ‘Exists’ is O(1), which you read as ‘is constant’ (in other words, it doesn’t depend at all on the number of items in the dictionary). But, of course, for that to happen, you’d have to be extremely lucky. You could be completely unlucky and be looking for the final item. Here the worst case efficiency is O(N) – the time taken would be proportional to the number of items in the dictionary.

On average, though, if you searched for every name in the dictionary, the efficiency would be O(N/2). Now comes the fun bit with big-Oh notation: since it essentially means ‘is proportional to’, you can take the 1/2 (a constant) out of the parentheses into the implied proportionality constant and say that the efficiency is O(N). We say that searching through the dictionary-as-array is O(N): twice as many items, twice as long.

‘Insert’ is simple: we add the new name-value pair to the end of the array, a constant O(1) operation. Hold on there though – we first have to search the array to find out if the name is already present or not. ‘Insert’ then degenerates to O(N), just like ‘Exists’. We get no benefits at all from the constant, quick, add-it-to-the-end operation; we still have to search.

‘Delete’, as I’m sure you can see, is at least O(N) as well – we have to do the search. There’s something else about ‘Delete’ that we have to take into account: we have to physically remove the name-value pair from the array. The simplest way of doing this is to simply take the final pair in the array and put it in the slot vacated by the pair that was removed: a constant O(1) operation. So, overall, ‘Delete’ is O(N); the search time will swamp the move-an-item time.
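As a rough illustration of the analysis above, here is a minimal Python sketch of the dictionary-as-array. The class and method names are invented for this example, and the choice to raise an error on a duplicate ‘Insert’ is just one of the two behaviours described earlier.

```python
class ArrayDict:
    """Dictionary stored as an unsorted list of (name, value) pairs."""

    def __init__(self):
        self.pairs = []

    def _index_of(self, name):
        # Sequential search: up to N comparisons, so O(N).
        for i, (key, _) in enumerate(self.pairs):
            if key == name:
                return i
        return -1

    def exists(self, name):
        return self._index_of(name) != -1

    def insert(self, name, value):
        # The append itself is O(1), but the duplicate check is O(N).
        if self._index_of(name) != -1:
            raise KeyError(f"{name!r} already exists")
        self.pairs.append((name, value))

    def delete(self, name):
        i = self._index_of(name)        # O(N) search
        if i == -1:
            return                      # not found: no error
        self.pairs[i] = self.pairs[-1]  # move the final pair into the hole: O(1)
        self.pairs.pop()


d = ArrayDict()
for word, meaning in [("a", 1), ("b", 2), ("c", 3)]:
    d.insert(word, meaning)
d.delete("a")
print(d.pairs)  # the final pair has moved into the vacated slot
```

Note that the move-the-last-item trick only works because this array has no ordering to preserve.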

Sorted pairs

Let’s move on to the second implementation. This one is again an array, except this time we maintain the pairs in sorted order. This has the assumed requirement that the names are sortable and that, given any two unequal names, we can say that the first is smaller or greater than the second.

We’ll start off by analysing ‘Exists’ again. The array is in sorted order, so we can use binary search to try and find the name-value pair that matches. With binary search, we look at the middle item in the array. If it’s the one we want, we stop. If the one we want is less than this middle item, we know that, if it’s present at all, it’ll be in the first half of the array. If the one we want is greater than the middle item, we know it will be in the second half. We repeat this process with the half array we selected. We’ll either find the item immediately again, or we’ll have reduced the number of items we have to search to a quarter of the array. Ditto the next step, except we reduce the space we have to search to an eighth of the original array. And so on.

Again, consider the doesn’t-exist case. Say we start out with an array with 1,023 items. After one step, we’ll have discarded one item and will have identified a subarray of 511 items for the next step. After this next step, we’ll have reduced the search space to 255 items, and so on. At the 10th step we’ll have a tiny array of just one item, which we can easily compare. So all in all, we’ll have made 10 comparisons to find out that the given name is not present. What’s so special about 10? Well, it’s the logarithm to base two of 1024 (that is, 2^10 = 1024). Again, without being too rigorous mathematically, we say ‘Exists’ is O(logN) when the name isn’t present.

Think of O(logN) this way: if it takes a particular length of time to find out that a given name isn’t present in a sorted array of 1,000 items, it will only take twice as long for an array of 1,000,000 items. If you square the number of items, you double the time taken. This is an extremely significant result, showing the importance of binary search. What if the given name is present? We can make the same analysis as before: best case is O(1), worst case is going to be the same as not finding it: O(logN), and so we say that, overall, ‘Exists’ is O(logN).
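To make the halving concrete, the following Python sketch runs a binary search that also counts how many items it looks at. The function name and the step counter are additions for illustration only; they are not part of the dictionary interface.

```python
def exists_sorted(names, target):
    """Binary search over a sorted list; returns (found, steps)."""
    lo, hi = 0, len(names) - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1                 # one middle item examined per step
        if names[mid] == target:
            return True, steps
        if target < names[mid]:
            hi = mid - 1           # keep the first half
        else:
            lo = mid + 1           # keep the second half
    return False, steps


# 1,023 even numbers, so every odd number is guaranteed to be absent.
names = list(range(0, 2046, 2))
print(exists_sorted(names, 1001))  # (False, 10): ten steps to rule it out
print(exists_sorted(names, 1022))  # (True, 1): the middle item, found at once
```

Integers are used as names here purely so that an absent value is easy to pick; any sortable names would behave the same way.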

What about ‘Insert’ and ‘Delete’? Again, we have to search for the name, so it would seem that they’re both O(logN). But this time, consider what we must do to add (or remove) the name-value pair. For ‘Insert’, we have to make a hole in the array to put the new pair in, shuffling all the items greater than it along by one. For ‘Delete’, we have to shuffle the remaining pairs to close up the hole vacated by the removed pair. If we’re lucky, in both cases, we don’t have to move any items (that is, best case is O(1)); if we’re unlucky we have to move all of the remaining pairs (that is, worst case is O(N)). On average, it’s O(N) for all the shuffling we need to do. Since O(N) is bigger than O(logN) – for very large values of N the (in)efficiency of the moving of the items will swamp the efficiency of the search – we ignore the smaller proportionality and just use the larger one. We say ‘Insert’ and ‘Delete’ are both O(N).
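A sketch of the sorted-array version, assuming Python’s standard bisect module for the O(logN) search. The class name and the use of two parallel lists are choices made for this example; the point is that the list insert and delete are where the O(N) shuffling happens.

```python
import bisect


class SortedArrayDict:
    """Dictionary stored as two parallel lists kept in name order."""

    def __init__(self):
        self.names = []    # always sorted
        self.values = []   # values[i] belongs to names[i]

    def _find(self, name):
        i = bisect.bisect_left(self.names, name)   # binary search: O(log N)
        found = i < len(self.names) and self.names[i] == name
        return i, found

    def exists(self, name):
        return self._find(name)[1]

    def insert(self, name, value):
        i, found = self._find(name)
        if found:
            raise KeyError(f"{name!r} already exists")
        self.names.insert(i, name)     # shuffles later items along: O(N)
        self.values.insert(i, value)

    def delete(self, name):
        i, found = self._find(name)
        if found:
            del self.names[i]          # closes up the hole: O(N)
            del self.values[i]


d = SortedArrayDict()
for word in ["pear", "apple", "fig"]:
    d.insert(word, len(word))
print(d.names)  # ['apple', 'fig', 'pear']
```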

Hash table

Now for the next implementation: the hash table. Without going into full detail, we have an array as the basic data structure. Again, we analyse ‘Exists’ first. To find an item in a hash table, we hash the given name to produce an index into the array. The hash is produced by a randomising type function that takes the name, chops it up and combines the parts to produce an integer value. That integer value is then reduced to a possible array index value by use of the mod operator. The hash function is designed so that similar names produce very different hash values.

Best case is that ‘Exists’ is O(1). That is, we create the hash for the given name, convert it to an index, go to that element in the array, and the pair we need is there and matches. No matter how many items are in the array, that process is constant. (Actually, the hash function is usually O(k) where k is the length of the name, but we’re ignoring that for now.)

What about worst case? Well, in practice we’ll find that many names will hash to the same array index value. These are called collisions and we need to implement a collision resolution strategy to deal with them. The simplest is known as chaining, where we chain the name-value pairs as, say, a linked list at each array element. In this case, once we’ve calculated the index, we then do a sequential search through the chain at that index.

To ensure that the chain is never too long, hash tables grow themselves periodically when their load factor (the number of pairs present divided by the number of array elements) reaches a particular value. To do this, a new array is created, and all the pairs are rehashed and inserted into the new array. This ensures that chains never grow beyond a few items, say five or 10. Since this isn’t dependent on the total number of items, it’s still constant and we say ‘Exists’ in a hash table is O(1) on average.

‘Insert’ is a more difficult operation to analyse. On the face of it, it’s O(1) – both the ‘Search’ and ‘Add’ functions are constant time operations in general – but every now and then, a reorganisation will take place on an insertion operation. In general, hash tables are written such that they double in size when they grow. This is an O(N) operation, but we can amortise it over all previous insertion operations, so that, overall, ‘Insert’ remains O(1). Best case then is O(1), worst case is O(N), amortised case is O(1).

The same types of arguments can be made about ‘Delete’, although in general we tend not to shrink a hash table anywhere near as often as we make it bigger. ‘Delete’ is then O(1), meaning that the amortised use of a hash table over all its operations is O(1). There is, of course, still that warning that every now and then you will hit the O(N) worst case on an insertion.
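The following is a minimal Python sketch of a chained hash table that doubles when it grows. Several details are assumptions made for the example rather than anything the analysis requires: Python’s built-in hash() stands in for the hash function, Python lists stand in for the linked-list chains, and the initial capacity of 8 and the load-factor threshold of 0.75 are arbitrary.

```python
class ChainedHashTable:
    MAX_LOAD = 0.75   # assumed threshold: pairs / array elements

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]
        self.count = 0

    def _chain(self, name):
        # Hash the name, then reduce it to an array index with mod.
        return self.buckets[hash(name) % len(self.buckets)]

    def exists(self, name):
        # Sequential search, but only through one short chain.
        return any(key == name for key, _ in self._chain(name))

    def insert(self, name, value):
        if self.exists(name):
            raise KeyError(f"{name!r} already exists")
        if (self.count + 1) / len(self.buckets) > self.MAX_LOAD:
            self._grow()               # the occasional O(N) reorganisation
        self._chain(name).append((name, value))
        self.count += 1

    def delete(self, name):
        chain = self._chain(name)
        for i, (key, _) in enumerate(chain):
            if key == name:
                del chain[i]
                self.count -= 1
                return

    def _grow(self):
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]   # double in size
        for chain in old:
            for key, value in chain:                       # rehash every pair
                self._chain(key).append((key, value))


table = ChainedHashTable()
for i in range(100):
    table.insert(f"name{i}", i)
print(table.count, len(table.buckets))
```

This sketch never shrinks the array on ‘Delete’, which matches the observation that shrinking is done far less often than growing.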

Binary tree

The next data structure we can use is a balanced binary search tree, such as a red-black tree. This, like the sorted array version, makes the assumption that names can be sorted.

In a binary tree, the efficiency of search operations is O(d), where d is the maximum depth of the tree (the number of levels from the root of the tree to the furthest leaf). Since a perfectly balanced binary search tree is equivalent to binary search on a sorted array (every link you decide to follow will enable you to ignore a whole chunk of the tree), ‘Exists’ is on average O(logN). Best case is still O(1), but what about worst case? That depends on the algorithm used to balance the binary tree. Balancing is never perfect but, using red-black trees as an example, we can prove that they’re constructed such that the longest path is a maximum of twice the length of the shortest path. If you like, O(2logN). Since 2 is a constant, we can take it out, making red-black trees O(logN) in the worst case for ‘Exists’.

For ‘Insert’ and ‘Delete’, there’s a lot of mathematics that can prove that they’re both O(logN) as well. In essence, the search is O(logN), and the addition of the new node or removal of the old node is O(1) on average.

So, overall, a red-black tree is O(logN) in all its operations. Perhaps more importantly, it has guaranteed O(logN) time even in the worst case. This means that some people will prefer to use a red-black tree for their dictionary instead of a hash table because they don’t want to hit the possibility of O(N) insertion.

Figure 1: Graphing some common big-Oh expressions (O(N^2) is cut off so we can see the others).

From this discussion, you should now have a basic understanding of how to read and understand big-Oh expressions and how to evaluate algorithms and data structures based on them. Figure 1, above, illustrates the runtime for various common big-Oh expressions.

Radix trees

Radix trees offer a further data structure that can be used for a dictionary. A radix tree stores prefixes to keys rather than complete keys in its nodes, and each node can have many children. A key is then found as a complete path through the tree from root to leaf – at each step down the tree, you compare another small part of the name to the next node.

Figure 2, below, shows an example radix tree storing a small set of words. In searching for ‘hostess’, we follow the left link from the root, matching host, then follow the middle link matching the ‘e’ and finally matching the ‘ss’ in the right node.

Unlike the other data structures we’ve looked at, the efficiency of a radix tree doesn’t depend on the number of name-value pairs, but instead on the length of the keys. All operations are essentially O(k), where k is the maximum name length in the radix tree. This can be greater than the balanced binary tree’s O(logN), for example, but in practice we find that the comparisons needed in a binary tree are also significant, so the radix tree can be a viable alternative.

Figure 2: A small radix tree, using middle dot to indicate end of word.
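The O(k) behaviour can be seen in a much simpler cousin of the radix tree: a plain trie that stores one character per node instead of a compressed prefix. The Python sketch below is that simplification, not a true radix tree, and it assumes names never contain the end-of-word marker. The words inserted are chosen for illustration and are not taken from the figure.

```python
class Trie:
    END = "·"   # end-of-word marker, echoing the middle dot in Figure 2

    def __init__(self):
        self.root = {}

    def insert(self, name, value):
        node = self.root
        for ch in name:                      # one step per character: O(k)
            node = node.setdefault(ch, {})
        node[self.END] = value               # mark a complete name

    def exists(self, name):
        node = self.root
        for ch in name:                      # again O(k), whatever N is
            node = node.get(ch)
            if node is None:
                return False
        return self.END in node


t = Trie()
for word in ["host", "hostess", "hosted"]:
    t.insert(word, len(word))
print(t.exists("hostess"), t.exists("hos"))  # True False
```

A real radix tree would merge each run of single-child nodes into one node holding the whole run, which saves memory but does not change the O(k) bound.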

Ternary trees

Back in issue 282, I cited ternary search trees as a strong candidate for the data structure behind a dictionary. Ternary trees, like radix trees, have a runtime efficiency that’s dictated by the length of the keys rather than their number, but are much easier to implement. Ternary search trees and radix trees also have a further benefit: using them means you can easily produce a sorted list of names in the dictionary, as well as produce a prefix list (a list of names with a particular prefix).
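Here is a minimal Python sketch of a ternary search tree, including the sorted listing and the prefix list mentioned above. All names are invented for this example, names are assumed to be non-empty strings, and ‘Delete’ is left out to keep it short.

```python
class Node:
    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None   # smaller / next char / larger
        self.is_end = False
        self.value = None


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, name, value):
        self.root = self._insert(self.root, name, 0, value)

    def _insert(self, node, name, i, value):
        ch = name[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, name, i, value)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, name, i, value)
        elif i + 1 < len(name):
            node.eq = self._insert(node.eq, name, i + 1, value)
        else:
            node.is_end, node.value = True, value
        return node

    def _find(self, name):
        # Returns the node matching the last character of name, or None.
        node, i = self.root, 0
        while node is not None:
            ch = name[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(name):
                node, i = node.eq, i + 1
            else:
                return node
        return None

    def exists(self, name):
        node = self._find(name)
        return node is not None and node.is_end

    def names(self, prefix=""):
        """Sorted list of all names, or only those starting with prefix."""
        out = []
        if not prefix:
            self._walk(self.root, "", out)
            return out
        node = self._find(prefix)
        if node is not None:
            if node.is_end:
                out.append(prefix)
            self._walk(node.eq, prefix, out)
        return out

    def _walk(self, node, path, out):
        if node is None:
            return
        self._walk(node.lo, path, out)            # smaller characters first
        if node.is_end:
            out.append(path + node.ch)
        self._walk(node.eq, path + node.ch, out)  # then longer names
        self._walk(node.hi, path, out)            # then larger characters


tree = TernarySearchTree()
for word in ["host", "hostess", "hosted", "hot", "apple"]:
    tree.insert(word, len(word))
print(tree.names())         # ['apple', 'host', 'hosted', 'hostess', 'hot']
print(tree.names("host"))   # ['host', 'hosted', 'hostess']
```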

Profiling

All of the efficiency results quoted in this article are theoretical. They are all of the form ‘for large values of N the efficiency is proportional to some expression in N’, but make no mention of the size of the constant of proportionality. Therefore, when deciding on which data structure to use in your dictionary, you should profile actual code running on your actual data. It’s pointless worrying, for example, about the efficiency of millions of items in a dictionary when you’ll only have 100.
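One way to start profiling is a small timing harness like the Python sketch below, which uses the standard timeit module to time an unsuccessful lookup in a list (a sequential scan) and in Python’s built-in dict (a hash table). The function names, sizes and repeat count are arbitrary choices for this example, and the numbers it prints will vary from machine to machine, so treat them as a prompt to measure your own data rather than as results.

```python
import timeit


def time_missing_lookup(container, repeats=200):
    """Seconds taken to look up an absent name `repeats` times."""
    return timeit.timeit(lambda: "absent" in container, number=repeats)


def profile(sizes=(100, 10_000)):
    results = {}
    for n in sizes:
        names = [f"name{i}" for i in range(n)]
        results[n] = {
            "list scan": time_missing_lookup(names),
            "hash table": time_missing_lookup(dict.fromkeys(names)),
        }
    return results


if __name__ == "__main__":
    for n, timings in profile().items():
        for label, seconds in timings.items():
            print(f"N={n:>6}  {label:<10}  {seconds:.6f}s")
```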

May 08

Businesses, organizations and government bodies all collect large amounts of data for research and development. A database this large keeps information on hand for when it is required, but finding the important information in it takes a great deal of time. “If you want to grow rapidly, you must take quick and accurate decisions to grab timely available opportunities.”

Data mining lets you extract and filter the information you need from that data. It is a process of refining data and extracting important information, and it is mainly divided into three stages: pre-processing, mining and validation. In pre-processing, large amounts of relevant data are collected. The mining stage includes data classification, clustering, error correction and linking information. The last stage, validation, is just as important, because without it you cannot trust the information. In short, data mining is a process of converting data into authentic information.

Let’s have a look at how data mining is useful to companies.

Fast and Feasible Decisions: Searching for information in a huge bundle of data takes time, and it irritates the person doing it. An annoyed mind cannot take accurate decisions, that’s for sure. With the help of data mining, one can easily get information and make fast decisions. It also helps to compare information against various factors, so the decisions become more reliable. Data mining helps make every decision quick and feasible.

Powerful Strategies: After data mining, information becomes precise and easy to understand. While making strategies, one can easily analyze information in various dimensions. This analysis helps to give a realistic idea of how a strategy will work when implemented. Management bodies can then implement powerful strategies effectively to expand business boundaries.

Competitive Advantage: Because the information is precise and easily available, one can compare it with competitors’ information. This comparison is essential; without it, the business will suffer. After doing competitive analysis, one can make corrective decisions to get ahead of competitors. This way, a company can gain a competitive advantage.

Your business can get all the benefits of data mining at cut rates through outsourcing.

Article Source: http://www.articlesnatch.com

About the Author:
Bea Arthur invites you to Data Entry India, which provides Data Entry Services, Data Conversion Services and Data Processing Services. They have vast experience in data mining.