dd if=/dev/random of=/dev/blog

5. March 2010

AMD RAID-on-Chip: A valid technology? Or is it just too late in the game?

Filed under: RAID, Storage, File Systems, SCSI — admin @ 14:33

Back in December I came across this article on an AMD RoC (RAID-on-Chip) that will be embedded into servers to provide uninterrupted RAID functionality. A quick question came to mind as I was reading this: “Considering today’s storage capabilities and low cost equipment, who will be using this?” And honestly I was not able to come up with an answer.

In an earlier blog post I had mentioned the rise in usage of software RAID. Small to medium-sized businesses (SMBs) have been moving to these low-cost solutions. And why not? You are able to get more bang for your buck. For instance, by running OpenSolaris, one is able to use the redundancy of the ZFS file system (with single/double parity or mirrored RAID), file system level snapshots, data deduplication, and more. On top of that, every block is checksummed so that data corruption (noisy and silent) can be detected and, where redundancy exists, repaired. Take these ZFS pools and share them via NFS/CIFS or over FTP/HTTP, or even map them over the iSCSI, Fibre Channel, AoE or FCoE protocols. The operating system (with all bells and whistles) is freely distributed under the CDDL. The only costs will be the hardware (a server or two and, if external storage is needed, a JBOD) and the storage administrator. For years, servers have been equipped with LSI Logic (or other) RAID controllers that have proven to be just as efficient as anything else at handling local storage. Now when you look at larger enterprise-scale companies, they are not going to want a server to manage their RAID. Instead they will keep the external storage managed externally, with special-purpose RAID controllers managing hundreds of terabytes to petabytes of data storage, separate from all the nodes in a cluster accessing that equipment.
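To make the single/double parity trade-off a little more concrete, here is a rough back-of-the-envelope sketch in shell. The helper name raidz_usable_gb is hypothetical, and the arithmetic assumes equally sized disks and ignores metadata and padding overhead, so treat the result as an upper bound rather than what a real pool would report.

```shell
# Rough usable capacity of one parity group: (disks - parity) * disk size.
# Hypothetical helper for illustration; ignores ZFS metadata and padding overhead.
raidz_usable_gb() {
    disks=$1      # number of equally sized disks in the group
    size_gb=$2    # capacity of one disk in GB
    parity=$3     # 1 for single parity, 2 for double parity
    if [ "$disks" -le "$parity" ]; then
        echo "need more disks than parity devices" >&2
        return 1
    fi
    echo $(( (disks - parity) * size_gb ))
}

raidz_usable_gb 6 1000 1   # six 1000 GB disks, single parity: prints 5000
raidz_usable_gb 6 1000 2   # same disks, double parity: prints 4000
```

Double parity costs one more disk of capacity in exchange for surviving a second failure during a rebuild.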

But going back to the server, how practical is it to have an implemented RoC? With today’s level of high speed computing, does it make that much of a difference if the RAID is accomplished on the chipset as opposed to the operating system? If so, how easy is it to recover from data corruption or any other error? Unless you are setting up a small home or small business server, what if you wanted additional functionality such as snapshots, data deduplication and checksum validation? You still have to go to the operating system and have some sort of volume manager on top of the RoC grouped volumes. No offense to Dot Hill, even though they were a direct competitor to one of my previous employers (Xyratex). According to their numbers posted on Google Finance, they have been struggling financially for at least the past 5 to 6 years, and this is a great opportunity for them. In my opinion, though, this would have been a valid technology back in 2001, not 2010.

26. November 2009

Linux Magazine Article: Three Simple Tweaks for Better SSD Performance

Filed under: Storage, Red Hat, File Systems, Ubuntu, Linux — admin @ 13:23

Earlier today I came across this interesting article on tuning your SSD to achieve greater performance. It is worth noting that this article is intended for Linux, but when it mentions setting your file system's mount options with noatime, the same advice applies to any file system that supports such an option.
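As a quick way to see whether a mount already carries noatime, here is a minimal sketch that reads a file in /proc/mounts format. The function names mount_options and has_noatime are my own, not something from the article.

```shell
# Print the mount options for a mount point, reading a file in /proc/mounts
# format (fields: device, mount point, fs type, options, dump, pass).
mount_options() {
    mounts_file=$1
    mount_point=$2
    awk -v mp="$mount_point" '$2 == mp { print $4 }' "$mounts_file"
}

# Succeed if the mount point carries the noatime option.
has_noatime() {
    mount_options "$1" "$2" | tr ',' '\n' | grep -qx 'noatime'
}

if [ -r /proc/mounts ]; then
    if has_noatime /proc/mounts /; then
        echo "/ is mounted with noatime"
    else
        echo "/ is not mounted with noatime"
    fi
fi
```

If the option is missing, it can be added to the options column in /etc/fstab, or applied to a live mount with something like mount -o remount,noatime / (as root).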

I would also take the time to read the comments. There are some distribution specific responses to the author’s notes.

3. November 2009

Recently integrated into ZFS: Data Deduplication

Filed under: OpenSolaris, Storage, File Systems, Solaris, UNIX — admin @ 09:26

I just stumbled onto this blog entry on the implementation of data deduplication into Sun Microsystems’ ZFS file system. It is implemented in such a nice and clean way that I am looking forward to testing it. For instance, just like any other feature of the ZFS file system, data dedup can be enabled or disabled at any path from the ZFS root mount point. Examples taken from Jeff Bonwick’s blog post cited above:

zfs set dedup=on tank
zfs set dedup=off tank/home
zfs set dedup=on tank/vm
zfs set dedup=on tank/src

It is that simple (man 1 zfs).
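Like other ZFS properties, dedup is inherited by child datasets unless it is set locally. The following is only a simplified model of that lookup, written in plain shell so it runs without ZFS. The function effective_dedup and the child dataset names are hypothetical, and on a real system zfs get -r dedup tank would report the same thing along with a SOURCE column.

```shell
# Local settings as "dataset=value" lines, mirroring the zfs set commands above.
settings='tank=on
tank/home=off
tank/vm=on
tank/src=on'

# Resolve the effective value for a dataset by walking up to the nearest
# ancestor that has a local setting (a simplified model of ZFS inheritance).
effective_dedup() {
    ds=$1
    while :; do
        value=$(printf '%s\n' "$settings" | awk -F= -v d="$ds" '$1 == d { print $2 }')
        if [ -n "$value" ]; then
            echo "$value"
            return 0
        fi
        case $ds in
            */*) ds=${ds%/*} ;;            # strip the last path component
            *)   echo "off"; return 0 ;;   # dedup defaults to off when nothing is set
        esac
    done
}

effective_dedup tank/home/petros   # prints "off", inherited from tank/home
effective_dedup tank/vm/images     # prints "on", inherited from tank/vm
effective_dedup tank/docs          # prints "on", inherited from tank
```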

27. October 2009

Apple discontinues port of Sun’s ZFS file system.

Filed under: OpenSolaris, BSD, File Systems, Solaris, UNIX — admin @ 14:22

On 23 October 2009 it was announced on MacOSForge that Apple had decided to discontinue any and all development on the porting of the ZFS file system. I know that I am not the only one to say this, but I am not surprised. Supposedly there were legal reasons behind this action but in the end, who cares? They are the ones losing out by continuing with an outdated and still limiting file system.

Now Apple has recently been hiring file system developers to develop a next generation file system to replace the traditional HFS+ but (as Robin Harris has previously stated) how long will it take before it becomes stable and accepted by the general public? Traditionally it takes 5+ years before a file system is considered somewhat stable and ready for production use. It wasn’t until recently that ZFS started to make its impact in the enterprise scene. My question, though, is to whom will this next generation file system cater? I assume that it will be for the general end user utilizing Mac devices that “don’t require the weight of the ZFS features and functionality”; or so it has been said regarding the topic of Apple abandoning the ZFS project. If that is the case and is the primary focus of the new file system, how will this impact their server market share? We already know that there is no such thing as a perfect file system that will perform ideally in every arena it is thrown into. Some will excel more than others, and how well each performs depends entirely on its implementation and workload.

In past posts, I have always stressed the importance of the file system and what is integrated within the file system. I routinely point out the numerous drawbacks and limitations of the NTFS driver. Sure, Microsoft compensates for the “lack of features” with applications, services and additional APIs to fill in all those gaps. A good example is VSS (shadow copy). This can impact performance, as it takes file system concepts out of kernel mode and into userland, consuming user mode resources. All these features should be, and need to be, incorporated into the file system driver. That way we can ensure that there is stability and consistency across all functions the file system performs. Even the general layout is not ideal for traditional computing over large storage media, as the long, fragmented seeks between the MFT and the file data can put a lot of stress on a magnetic device. Going back to HFS+ and sort of on the same topic (although the concept is a bit different), the same could be said about Apple’s Time Machine, which runs as an application on top of the driver.

One thing that I hold to heart when it comes to file systems is the ability and flexibility to tune it even without taking the mounted device(s) off-line. Most modern UNIX and Linux file systems offer a lot of tunable features (built into the driver!). For instance (through the ZFS character device node) I can dynamically alter file system variables (man 1 zfs). For this example I will focus on access times. Let us say I am using an SSD and decide that it would be more cell friendly and better performing to disable file access times on the root mount.

atime=on | off
Controls whether the access time for  files  is  updated
when  they  are  read.

To view current settings and disable this feature you would type the following in the command-line terminal:

petros@opensolaris:~$ pfexec zfs get atime rpool/export/home
NAME               PROPERTY  VALUE  SOURCE
rpool/export/home  atime     on     default
petros@opensolaris:~$ pfexec zfs set atime=off rpool/export/home
petros@opensolaris:~$ pfexec zfs get atime rpool/export/home
NAME               PROPERTY  VALUE  SOURCE
rpool/export/home  atime     off    local
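In a script, the same get-then-set sequence can be made idempotent. This is a minimal sketch, assuming the caller already has the privileges to run zfs set (for example via pfexec); the function name ensure_atime_off is my own.

```shell
# Disable atime on a dataset only if it is not already off.
# Assumes the caller has the privileges needed for "zfs set".
ensure_atime_off() {
    dataset=$1
    # -H suppresses the header, -o value prints only the value column
    current=$(zfs get -H -o value atime "$dataset") || return 1
    if [ "$current" = "off" ]; then
        echo "$dataset: atime already off"
    else
        zfs set atime=off "$dataset" && echo "$dataset: atime disabled"
    fi
}

# Example (requires a system with ZFS):
#   ensure_atime_off rpool/export/home
```

Because the change is applied to the mounted dataset, nothing needs to be unmounted or remounted for it to take effect.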

I just hope that Apple is prepared for the journey they are about to embark on. They obviously have file system development experience, and I have no doubts that they have the talent. Do they have the patience and time to invest?

8. October 2009

FlexTk article: NAS Performance Comparison

Filed under: Red Hat, Storage, OpenSolaris, File Systems, Ubuntu, Microsoft, Linux, UNIX — admin @ 14:11

Linked from linuxtoday.com, I found an interesting article posted on FlexTk regarding NAS Performance Comparisons between Linux, Windows and OpenSolaris. The results are very interesting. Under each category, comparisons are drawn between:

  • Red Hat Enterprise Linux 5.3 (64-bit)
  • Ubuntu Server 9.04 (64-bit)
  • OpenSolaris 2009.06 (64-bit)
  • Windows Server 2003 (64-bit)
  • Windows Server 2008 (64-bit)
  • Windows Storage Server 2008 (64-bit)

I assume that each operating system is utilizing the default file system with default settings for that specific release. Red Hat and Ubuntu should be using ext3, Windows obviously uses NTFS, while OpenSolaris is built on top of ZFS. The CIFS/NFS exported share(s) in turn are running on top of these default file systems. Either way, in average overall performance, OpenSolaris seemed to really shine. It also did well in some of the other categories, which makes sense given the design of the ZFS file system.

2. October 2009

LWN article: Log-structured file systems: There’s one in every SSD

Filed under: Storage, File Systems, Linux, UNIX — admin @ 08:39

Yesterday I came across this excellent article on log-structured file systems and their implementation on SSD technologies. It is worth the read.

Opinion: On pramfs and RAM based Linux file systems

Filed under: Storage, File Systems, Linux — admin @ 08:36

A few days ago I received the latest issue of Linux Journal Magazine. I must admit that one of the sections I look forward to reading is diff -u. This section summarizes the latest updates and discussions of the Linux kernel development community. It becomes much easier to read a summary as opposed to signing up for the mailing list because you will just get bombarded with e-mails which can be overwhelming the majority of the time.

While reading I came across a Montavista developed project called pramfs. In summary pramfs is a non-volatile RAM based file system, similar to your ramfs and tmpfs with a few differences to distinguish it from the others and in turn adapted for an embedded environment. Two obvious differences are that it is persistent like a traditional disk-based file system and does not reside in volatile DRAM. Pramfs is not new. It was originally announced back in 2004. It is designed to be a simplified file system that does not carry the same weight of the journal-based file systems.

Apparently there had been some problems with the patch being merged into the Linux kernel, for a number of reasons. (1) Montavista was attempting to patent some of the concepts and algorithms used in the file system (in 2004). (2) Even after they dropped the idea of patenting their code, there was some discussion on the redundancy of having yet another file system implemented into the Linux kernel (in 2009). What that means is that the Linux kernel already has two commonly used RAM file systems and a large number of other file systems. So why was there a need to write another one? Why couldn’t Montavista patch already existing code? (3) It is also not a full featured file system in that it does not support symbolic links.

I agree with this logic. Please do not misunderstand me. Montavista is a very respectable company that has done an excellent job in supporting embedded Linux. I am also glad to see them contribute to the kernel and in turn the community. But truth be told, tmpfs was built on top of the ramfs code. Why couldn’t pramfs follow the same course of development? The GPL makes it easy to not have to re-invent the wheel.

The two most noteworthy goals of pramfs are (1) to work with NVRAM and (2) to provide an interface that does not utilize the kernel page caching mechanism. By utilizing the DIRECTIO flag available in the 2.6 kernel, Montavista claims that I/O performance is increased significantly on an already high performing interface. Pramfs also allows the user to specify regions of memory for file system usage.

mount -t pramfs -o physaddr=0x1e000000,init=0x2000000 none /mnt/pramfs

With it working in non-volatile memory, the data contents will remain intact even after an expected/unexpected power cycle.
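The hex values in that mount line are easier to reason about once converted. A small sketch, assuming (as I understand the pramfs documentation) that physaddr is the physical start address of the region and init is its size in bytes:

```shell
physaddr=0x1e000000   # start of the reserved region, from the mount example
init=0x2000000        # assumed to be the region size in bytes

size_mib=$(( init / 1024 / 1024 ))
start_mib=$(( physaddr / 1024 / 1024 ))
end_addr=$(printf '0x%x' $(( physaddr + init )))

echo "pramfs region: ${size_mib} MiB starting at ${start_mib} MiB, ending at ${end_addr}"
# prints: pramfs region: 32 MiB starting at 480 MiB, ending at 0x20000000
```

In other words, the example carves a 32 MiB file system out of the physical range just below the 512 MiB mark, and that range has to be kept out of the kernel's normal memory allocation for the contents to survive.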

This concept got me thinking a bit. How difficult would it be to add some of these features to ramfs? Ramfs already offers a similar, minimal interface, although it is built directly on the kernel’s page cache rather than bypassing it. Tmpfs was designed to offer that functionality along with additional file system controls and limits. Ramfs also has a slightly similar general file system layout. Sure, a few structures and routines would need to be redefined, but that isn’t a big deal in the grand scheme of things.

I mention this in light of some of the latest headlines circulating through the internet regarding Linus Torvalds’ comments on the kernel being bloated. Does the kernel leave room for additional “bloat” or would it be wiser to build on top of current features/functionality? I would love to read some of your opinions.

For more blog posts relating to RAM-based file systems and RAM Disk device drivers, you can find them posted here, here and here.

4. September 2009

IBM Article: Anatomy of the Linux virtual file system switch

Filed under: File Systems, Linux — admin @ 08:56

Two days ago, on one of the Linux news feeds that I frequent, I saw this interesting article, which serves as a great basic introduction to the VFS layer of the Linux kernel. For those interested in Linux file systems, check it out.

26. August 2009

Some exciting updates expected for Linux kernel 2.6.31

Filed under: Storage, File Systems, SCSI, Linux — admin @ 11:03

Recently I came across this article on h-online.com discussing some of the new features and functionality that are to be expected in the 2.6.31 Linux kernel. As I am usually more interested in data storage technologies, it was the file system and other storage concepts that drew my attention. I will only cover a few of the listed topics. You can read a full list of these patches in the h-online link I posted above.

Some updates include a large patch for the btrfs file system which tunes the file system to achieve greater performance. It is also noted that in this release btrfs will be less memory hungry and the SSD mode has been improved. Early benchmarks comparing both standard and SSD modes have shown the early implementation of SSD mode to be less than ideal. I am interested to see this improvement, especially as Flash-based SSDs increase in usage and popularity.

During the development of btrfs I have been spending more time observing the development process as opposed to taking it for early test spins. So when I make the following comment(s), I am not speaking directly from experience, and if I make any errors in my statement(s), I hope the reader will correct me. As we are still early in the development stage and it is still too soon to tell, I wonder if btrfs will offer tuning with on-line volumes (as can be seen with ZFS). Most (if not all) modern Linux file systems are only capable of processing file system options at mount time and, in some cases, with the remount option when invoking mount again. In ZFS, by contrast, a character device node is available for management applications which are capable of pulling real time data and altering file system options on-the-fly. Here is a document with the diagram (reference page 10; sorry, I could only find a German copy of it; explanation found in last bullet point of section). If I wanted to disable/enable atime, compression, checksums, or alter quotas on the entire storage pool and/or on specific mounted volumes, I can do so on the fly with a simple zpool/zfs command. I am curious whether btrfs will implement a similar feature, which can be extremely advantageous in storage administration environments.

Other patches include support for ext4 online defragmentation. I am surprised to see that ext4 is really starting to gain some ground. Fedora currently implements it as the default file system in their latest release while Ubuntu provides it as an option during installation. It usually takes a while for a new file system to gain public trust and support.

Some other exciting patches include Fibre Channel Pass-Through support. I am curious to learn more about this functionality and if there is any relation (in functionality) to the SCST project hosted on SourceForge.

30. July 2009

OpenSolaris: GRUB and the Boot Environment

Filed under: OpenSolaris, File Systems, Solaris — admin @ 12:26

Ever since I started working with OpenSolaris (release 2008.05 to build 118: 2010.02), I have been suffering through some of the longer load times. While the distribution is maturing fairly well and quickly, the boot times are just horrible. And to my understanding the culprit is ZFS. OpenSolaris utilizes ZFS as its default file system. On top of that, one thing that I still cannot understand is why GRUB defaults its timeout value to 60 seconds. 60 seconds! Why!?! Who needs these 60 seconds and/or who wants to be constantly annoyed by having to hit enter on the default kernel image to initiate the boot process? Either way, this can be modified. On OpenSolaris, editing the GRUB boot options is a little different from your traditional UNIX/Linux operating system. Note that this article is for Intel architectures and not SPARC.

Traditionally we find the appropriate files for editing in the /boot path, specifically in /boot/grub; and depending on your distribution the configuration file can vary (grub.conf or menu.lst). In OpenSolaris and on the ZFS file system, while the /boot/grub/ path exists, it does not contain the menu.lst file that we need. Instead, it is located in the /rpool/boot/grub/ path.
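Since the location differs between platforms, a script that wants to inspect the GRUB menu has to look in more than one place. Here is a small sketch that checks the paths mentioned above in order. The function name find_grub_menu and the optional root prefix argument are my own additions, the latter only to make the function easy to try against a scratch directory.

```shell
# Print the first GRUB configuration file that exists under a root directory.
# The optional root prefix is only there to make testing easy.
find_grub_menu() {
    root=${1:-}
    for candidate in \
        /rpool/boot/grub/menu.lst \
        /boot/grub/menu.lst \
        /boot/grub/grub.conf
    do
        if [ -f "$root$candidate" ]; then
            echo "$root$candidate"
            return 0
        fi
    done
    return 1
}

find_grub_menu || echo "no GRUB menu file found in the usual places"
```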

When you really start using Solaris/OpenSolaris, you may notice one thing that sticks out when compared to the GNU/Linux counterpart: Solaris/OpenSolaris tends to be better polished when it comes to using the command line for editing system configuration files. For example, two separate tools exist for managing boot configuration files and the boot environment: bootadm and beadm. I know that certain Linux distributions have their own sets of tools for managing such stuff (e.g. QGRUBEditor as I have seen in Ubuntu, among others) but when I hop back onto a Solaris machine, it just seems simpler and a lot more straightforward. It is also standardized across both operating platforms as opposed to each distribution having its own. Historically I have always opened up the menu.lst or grub.conf file with vim and made my modifications right there. While this can still be done, the development team behind Solaris/OpenSolaris has decided to standardize it within the two applications.

bootadm

As mentioned earlier, bootadm is used to list and/or redefine specific values in your menu.lst file. Its usage is as follows (man bootadm):

petros@opensolaris:~$ bootadm
bootadm: a command option must be specified
USAGE:
bootadm update-archive [-vn] [-R altroot [-p platform]]
bootadm list-archive [-R altroot [-p platform]]
bootadm set-menu [-R altroot] key=value
bootadm list-menu [-R altroot]

If I wanted to list my current configuration I would type at the command line:

petros@opensolaris:~$ bootadm list-menu
the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 0
timeout 3
0 Solaris Development snv_118 X86

I can easily modify a parameter such as the timeout with the following command:

petros@opensolaris:~$ pfexec bootadm set-menu timeout=2
petros@opensolaris:~$ bootadm list-menu
the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 0
timeout 2
0 Solaris Development snv_118 X86
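If you want to pull individual values out of that listing in a script, a little awk goes a long way. This is only a sketch that assumes the output layout shown above; the function names menu_timeout and default_entry_title are my own.

```shell
# Read "bootadm list-menu" style output on stdin and print the timeout value.
menu_timeout() {
    awk '$1 == "timeout" { print $2; exit }'
}

# Read the same output and print the title of the default menu entry.
default_entry_title() {
    awk '
        $1 == "default" { def = $2 }
        $1 ~ /^[0-9]+$/ { n = $1; sub(/^[0-9]+ +/, ""); title[n] = $0 }
        END { if (def in title) print title[def] }
    '
}

# Sample output copied from the listing above.
sample='the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 0
timeout 2
0 Solaris Development snv_118 X86'

printf '%s\n' "$sample" | menu_timeout          # prints 2
printf '%s\n' "$sample" | default_entry_title   # prints Solaris Development snv_118 X86

# On a live system:
#   bootadm list-menu | menu_timeout
```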

beadm

The beadm tool is used to create and enable new boot environments. What beadm can do is take a snapshot of your current environment. This routinely occurs (transparent to the user) after a system update. Ideally a snapshot should also be made whenever applications are installed or removed, or even when configuration files are modified. beadm then appends the new environment to the menu.lst file. This way, if the new image ends up bringing down the system, you can revert to the previous image (snapshot). Such are the advantages of a file system like ZFS that incorporates its own native snapshot mechanism. Basic usage for this utility is extremely simple (man beadm).

petros@opensolaris:~$ beadm

Usage:
beadm subcommand cmd_options

subcommands:
beadm activate beName
beadm create [-a] [-d description]
[-e non-activeBeName | beName@snapshot]
[-o property=value] ... [-p zpool] beName
beadm create beName@snapshot
beadm destroy [-fF] beName | beName@snapshot
beadm list [[-a] | [-d] [-s]] [-H] [beName]
beadm mount beName mountpoint
beadm rename beName newBeName
beadm unmount [-f] beName

To list all boot environments you would type the following on the command line:

petros@opensolaris:~$ beadm list
BE          Active Mountpoint Space Policy Created
--          ------ ---------- ----- ------ -------
opensolaris NR     /          6.10G static 2009-02-18 08:35

Let us say you made some changes to a configuration file or two, installed some applications, or enabled/disabled services. You may want to create a new image so that if something goes wrong with it, you can always revert to the previous one.

petros@opensolaris:~$ pfexec beadm create 22Feb09
petros@opensolaris:~$ beadm list
BE          Active Mountpoint Space Policy Created
--          ------ ---------- ----- ------ -------
22Feb09     -      -          92.0K static 2009-02-22 11:54
opensolaris NR     /          6.10G static 2009-02-18 08:35

A new entry is created in GRUB, although the older image is still the default.

petros@opensolaris:~$ bootadm list-menu
the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 0
timeout 2
0 Solaris Development snv_118 X86
1 22Feb09

To activate the new image and have it default in GRUB, you can invoke beadm as so:

petros@opensolaris:~$ pfexec beadm activate 22Feb09
petros@opensolaris:~$ bootadm list-menu
the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 1
timeout 2
0 Solaris Development snv_118 X86
1 22Feb09

After reboot beadm list will look like this:

petros@opensolaris:~$ beadm list
BE          Active Mountpoint Space Policy Created
--          ------ ---------- ----- ------ -------
22Feb09     NR     /          6.24G static 2009-02-22 11:54
opensolaris -      -          5.25M static 2009-02-18 08:35
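In the Active column, N marks the boot environment that is active now and R the one that will be active on reboot. A short sketch for picking those out of the listing, assuming the table layout shown above (the function name be_with_flag is my own):

```shell
# Print the boot environment(s) whose Active column contains the given flag
# ("N" = active now, "R" = active on reboot). Reads "beadm list" output on stdin.
be_with_flag() {
    awk -v flag="$1" 'NR > 2 && index($2, flag) { print $1 }'
}

# Sample output copied from the listing above.
be_sample='BE          Active Mountpoint Space Policy Created
--          ------ ---------- ----- ------ -------
22Feb09     NR     /          6.24G static 2009-02-22 11:54
opensolaris -      -          5.25M static 2009-02-18 08:35'

printf '%s\n' "$be_sample" | be_with_flag N   # prints 22Feb09
printf '%s\n' "$be_sample" | be_with_flag R   # prints 22Feb09

# To roll back, activate the older environment again and reboot:
#   pfexec beadm activate opensolaris
```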