dd if=/dev/random of=/dev/blog

2. October 2009

Opinion: On pramfs and RAM based Linux file systems

Filed under: Storage, File Systems, Linux — admin @ 08:36

A few days ago I received the latest issue of Linux Journal Magazine. I must admit that one of the sections I look forward to reading is diff -u, which summarizes the latest updates and discussions in the Linux kernel development community. Reading a summary is much easier than subscribing to the mailing list, where the sheer volume of e-mail is overwhelming most of the time.

While reading I came across a MontaVista-developed project called pramfs. In summary, pramfs is a non-volatile RAM-based file system, similar to ramfs and tmpfs but with a few differences that adapt it to embedded environments. The two obvious differences are that it is persistent, like a traditional disk-based file system, and that it does not reside in volatile DRAM. Pramfs is not new; it was originally announced back in 2004. It is designed to be a simplified file system that does not carry the weight of journal-based file systems.

Apparently the patch has had trouble being merged into the Linux kernel, for a number of reasons: (1) MontaVista was attempting to patent some of the concepts and algorithms used in the file system (in 2004); (2) even after they dropped the idea of patenting their code, there was discussion about the redundancy of adding yet another file system to the Linux kernel (in 2009); and (3) it is not a full-featured file system, in that it does not support symbolic links. On the second point, the Linux kernel already has two commonly used RAM file systems and a large number of other file systems. So why was there a need to write another one? Why couldn’t MontaVista patch already existing code?

I agree with this logic. Please do not misunderstand me: MontaVista is a very respectable company that has done an excellent job of supporting embedded Linux, and I am glad to see them contribute to the kernel and in turn the community. But truth be told, tmpfs was built on top of the ramfs code. Why couldn’t pramfs follow the same course of development? The GPL makes it easy to avoid re-inventing the wheel.

The two most noteworthy goals achieved by pramfs are (1) to work with NVRAM and (2) to provide an interface that does not use the kernel page caching mechanism. By utilizing the DIRECTIO flag available in the 2.6 kernel, MontaVista claims that I/O performance is increased significantly on an already high-performing interface. Pramfs also allows the user to specify regions of memory for file system usage.

mount -t pramfs -o physaddr=0x1e000000,init=0x2000000 none /mnt/pramfs

With it working in non-volatile memory, the data contents will remain intact even after an expected/unexpected power cycle.
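To make the numbers in the mount command above concrete, here is a small sketch that converts the two hexadecimal values to MiB. It assumes that physaddr is the starting physical address of the region and init is the size of the file system to initialize; that reading is my understanding of the pramfs options and not something confirmed here. The helper name hex_to_mib is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count or address (hex is accepted by
# shell arithmetic) to whole MiB.
hex_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

physaddr=0x1e000000   # value taken from the mount example above
init=0x2000000        # value taken from the mount example above

# Assumption: physaddr = start of the region, init = size to initialize.
echo "region starts at $(hex_to_mib "$physaddr") MiB, size $(hex_to_mib "$init") MiB"
```

Under that assumption, the example carves a 32 MiB file system out of memory starting at the 480 MiB mark.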

This concept got me thinking a bit. How difficult would it be to add some of these features to ramfs? Ramfs offers somewhat similar functionality in that it has no backing store: file data lives only in memory and is never written back to a block device. Tmpfs was designed to offer that functionality along with additional file system controls and limits. Ramfs also has a broadly similar file system layout. Sure, a few structures and routines would need to be redefined, but that isn’t a big deal in the grand scheme of things.

I mention this in light of some of the latest headlines circulating through the internet regarding Linus Torvalds’ comments on the kernel being bloated. Does the kernel leave room for additional “bloat”, or would it be wiser to build on top of current features and functionality? I would love to read some of your opinions.

For more blog posts relating to RAM-based file systems and RAM Disk device drivers, you can find them posted here, here and here.

18. September 2009

Enterprise Storage Forum Article: RAID’s Days May Be Numbered

Filed under: Storage, SCSI — admin @ 07:50

Earlier this morning I ran into an interesting article written by Henry Newman, “RAID’s Days May Be Numbered.” While Mr. Newman highlights different reasons for his prediction, I tend to feel the same way. It is worth the read.

4. September 2009

IBM Article: Anatomy of the Linux virtual file system switch

Filed under: File Systems, Linux — admin @ 08:56

Two days ago, on one of the Linux news feeds that I frequent, I saw this interesting article, which serves as a great basic introduction to the VFS layer of the Linux kernel. For those interested in Linux file systems, check it out.

26. August 2009

Some exciting updates expected for Linux kernel 2.6.31

Filed under: Storage, File Systems, SCSI, Linux — admin @ 11:03

Recently I came across this article on h-online.com discussing some of the new features and functionality expected in the 2.6.31 Linux kernel. As I am usually more interested in data storage technologies, it was the file system and other storage topics that drew my attention. I will only cover a few of the listed topics; the full list of patches is in the h-online article linked above.

Some updates include a large patch for the btrfs file system which tunes it for greater performance. It is also noted that in this release btrfs will be less memory hungry and that the SSD mode has been improved. Early benchmarks comparing standard and SSD modes showed the early implementation of SSD mode to be less than ideal. I am interested to see this improvement, especially as Flash-based SSDs increase in usage and popularity.

During the development of btrfs I have spent more time observing the development process than taking it for early test spins. So when I make the following comments, I am not speaking directly from experience, and if I make any errors in my statements, I hope the reader will correct me. As we are still early in the development stage, it is too soon to tell, but I wonder if btrfs will offer tuning of on-line volumes (as can be seen with ZFS). Most (if not all) modern Linux file systems are only capable of processing file system options at mount time, and in some cases through the remount option when invoking mount again. In ZFS, by contrast, a character device node is available to management applications, which can pull real-time data and alter file system options on the fly. Here is a document with the diagram (reference page 10; sorry, I could only find a German copy of it; the explanation is in the last bullet point of the section). If I wanted to disable/enable atime, compression or checksums, or alter quotas, on the entire storage pool and/or on specific mounted volumes, I could do so on the fly with a simple zpool/zfs command. I am curious whether btrfs will implement a similar feature, which can be extremely advantageous in storage administration environments.

Other patches include support for ext4 online defragmentation. I am surprised to see that ext4 is really starting to gain ground. Fedora currently uses it as the default file system in their latest release, while Ubuntu provides it as an option during installation. It usually takes a while for a new file system to gain public trust and support.

Some other exciting patches include Fibre Channel Pass-Through support. I am curious to learn more about this functionality and if there is any relation (in functionality) to the SCST project hosted on SourceForge.

30. July 2009

OpenSolaris: GRUB and the Boot Environment

Filed under: OpenSolaris, File Systems, Solaris — admin @ 12:26

Ever since I started working with OpenSolaris (release 2008.05 to build 118: 2010.02), I have been suffering through some of the longer load times. While the distribution is maturing well and quickly, the boot times are just horrible, and to my understanding the culprit is ZFS, which OpenSolaris uses as its default file system. On top of that, one thing that I still cannot understand is why GRUB defaults its timeout value to 60 seconds. 60 seconds! Why!?! Who needs these 60 seconds, and who wants to be constantly annoyed by having to hit Enter on the default kernel image to initiate the boot process? Either way, this can be modified. On OpenSolaris, editing the GRUB boot options is a little different from your traditional UNIX/Linux operating system. Note that this article is for Intel architectures and not SPARC.

Traditionally we find the appropriate files for editing in the /boot path, specifically in /boot/grub; and depending on your distribution the configuration file can vary (grub.conf or menu.lst). In OpenSolaris and on the ZFS file system, while the /boot/grub/ path exists, it does not contain the menu.lst file that we need. Instead, it is located in the /rpool/boot/grub/ path.

When you really start using Solaris/OpenSolaris, you may notice one thing that sticks out when compared to the GNU/Linux counterpart: Solaris/OpenSolaris tends to be better polished when it comes to using the command line for editing system configuration files. For example, two separate tools exist for managing boot configuration files and the boot environment: bootadm and beadm. I know that certain Linux distributions have their own sets of tools for managing such things (e.g. QGRUBEditor, as I have seen in Ubuntu, among others), but when I hop back onto a Solaris machine, it just seems simpler and a lot more straightforward. It is also standardized across both operating platforms, as opposed to each distribution having its own. Historically I have always opened up the menu.lst or grub.conf file with vim and made my modifications right there. While this can still be done, the development team behind Solaris/OpenSolaris has decided to standardize it within the two applications.

bootadm

As mentioned earlier, bootadm is used to list and/or redefine specific values in your menu.lst file. Its usage is as follows (man bootadm):

petros@opensolaris:~$ bootadm
bootadm: a command option must be specified
USAGE:
bootadm update-archive [-vn] [-R altroot [-p platform>]]
bootadm list-archive [-R altroot [-p platform>]]
bootadm set-menu [-R altroot] key=value
bootadm list-menu [-R altroot]

If I wanted to list my current configuration I would type at the command line:

petros@opensolaris:~$ bootadm list-menu
the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 0
timeout 3
0 Solaris Development snv_118 X86

I can easily modify a parameter such as the timeout with the following command:

petros@opensolaris:~$ pfexec bootadm set-menu timeout=2
petros@opensolaris:~$ bootadm list-menu
the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 0
timeout 2
0 Solaris Development snv_118 X86
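If you want to check the timeout from a script before changing it, the list-menu output shown above is easy to parse. The following is a minimal sketch; the function name menu_timeout is made up for this illustration, and the sample text is simply the output from the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: read "bootadm list-menu" style text on stdin and
# print the value of the first "timeout" line.
menu_timeout() {
    awk '$1 == "timeout" { print $2; exit }'
}

# Sample text copied from the transcript above, so the sketch can be tried
# on a machine without bootadm.
sample='the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 0
timeout 2
0 Solaris Development snv_118 X86'

printf '%s\n' "$sample" | menu_timeout

# On a live OpenSolaris system the same function would be fed directly:
#   bootadm list-menu | menu_timeout
```

A wrapper script could compare that value with the one it wants and only call pfexec bootadm set-menu when they differ.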

beadm

The beadm tool is used to create and enable new boot environments. What beadm can do is take a snapshot of your current environment; this routinely occurs (transparent to the user) after a system update. Ideally, such snapshots should also be made when applications are installed or removed, or even when configuration files are modified. beadm then appends an entry for the new environment to the menu.lst file. This way, if the new image ends up bringing down the system, you can revert to the previous image (snapshot). Such are the advantages of a file system like ZFS incorporating its own native snapshot mechanism. Basic usage for this utility is extremely simple (man beadm).

petros@opensolaris:~$ beadm

Usage:
beadm subcommand cmd_options

subcommands:
beadm activate beName
beadm create [-a] [-d description]
[-e non-activeBeName | beName@snapshot]
[-o property=value] ... [-p zpool] beName
beadm create beName@snapshot
beadm destroy [-fF] beName | beName@snapshot
beadm list [[-a] | [-d] [-s]] [-H] [beName]
beadm mount beName mountpoint
beadm rename beName newBeName
beadm unmount [-f] beName

To list all boot environments you would type the following on the command line:

petros@opensolaris:~$ beadm list
BE          Active Mountpoint Space Policy Created
--          ------ ---------- ----- ------ -------
opensolaris NR     /          6.10G static 2009-02-18 08:35

Let us say you made some changes to a configuration file or two, or maybe installed some applications or enabled/disabled services. You may want to create a new image so that if something goes wrong with it, you can always revert to the previous one.

petros@opensolaris:~$ pfexec beadm create 22Feb09
petros@opensolaris:~$ beadm list
BE          Active Mountpoint Space Policy Created
--          ------ ---------- ----- ------ -------
22Feb09     -      -          92.0K static 2009-02-22 11:54
opensolaris NR     /          6.10G static 2009-02-18 08:35

A new entry is created in GRUB, although the older image is still the default.

petros@opensolaris:~$ bootadm list-menu
the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 0
timeout 2
0 Solaris Development snv_118 X86
1 22Feb09

To activate the new image and make it the default in GRUB, you can invoke beadm like so:

petros@opensolaris:~$ pfexec beadm activate 22Feb09
petros@opensolaris:~$ bootadm list-menu
the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 1
timeout 2
0 Solaris Development snv_118 X86
1 22Feb09

After reboot beadm list will look like this:

petros@opensolaris:~$ beadm list
BE          Active Mountpoint Space Policy Created
--          ------ ---------- ----- ------ -------
22Feb09     NR     /          6.24G static 2009-02-22 11:54
opensolaris -      -          5.25M static 2009-02-18 08:35
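In the Active column, N marks the environment that is active now and R the one that will be active on the next reboot. A script that wants to know which boot environment comes up next can pick that out of the listing. This is a small sketch that parses the human-readable table exactly as shown above; the function name active_on_reboot is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "beadm list" style text on stdin and print the
# name of the boot environment flagged R (active on reboot).
# The first two lines are the column header and the dashed rule.
active_on_reboot() {
    awk 'NR > 2 && $2 ~ /R/ { print $1 }'
}

# Sample text copied from the transcript above.
sample='BE          Active Mountpoint Space Policy Created
--          ------ ---------- ----- ------ -------
22Feb09     NR     /          6.24G static 2009-02-22 11:54
opensolaris -      -          5.25M static 2009-02-18 08:35'

printf '%s\n' "$sample" | active_on_reboot

# On a live system:
#   beadm list | active_on_reboot
```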

22. July 2009

Playing with RAM disks on OpenSolaris 2009.06

Filed under: OpenSolaris, Storage, File Systems, Solaris — admin @ 11:11

After writing my article on The Linux RAM Disk for Linux+ Magazine, and also after writing a very generic Linux RAM disk block device module, I decided to play around with the concept of RAM disks on OpenSolaris 2009.06. I must admit that this was a great learning experience, one that I wish to share with the reader. Note that this post is separated into two sections: (1) tmpfs and (2) ramdiskadm.

TMPFS

While the tmpfs module exists across multiple operating systems, including Linux, the Solaris/OpenSolaris version differs quite a bit from its Linux counterpart. It is also not as flexible as the one found in Linux. For instance, the Linux version supports the following user-defined mount options (for more details, please reference the Documentation/filesystems/tmpfs.txt file in the Linux kernel source tree):

  • size - limit of allocated bytes for the size of the volume
  • mode - volume permission (once mounted)
  • nr_blocks - same as in size, but in blocks
  • nr_inodes - maximum number of inodes

The Solaris/OpenSolaris version only defines size, and when you list the mounted device with the df command, it is labeled as swap. When it comes to volume permissions, you can always work around this (read below). In all cases, the advantage of tmpfs lies in the fact that it works off of virtual memory and can swap to disk when necessary. It runs like a normal file system, but when the power goes out or the volume is unmounted, all data contents disappear, as is the case with any RAM disk that has no method of synchronization employed. By default, a normal installation of Solaris/OpenSolaris uses tmpfs for various computing needs, such as mounting the /tmp directory. This module can also be used to improve a user’s computing experience and security: for example, instead of having Firefox cache all of its data to the physical hard drive, you can have it write that content to a tmpfs-mounted volume. The file system can also serve well where database queries or other web services are cached and routinely accessed. These are a few of the many scenarios in which it can be used.

Despite the few differences between each operating system, the tmpfs module is still easy to work with. Once a directory for the mount point is created, you can then mount the tmpfs RAM-based volume:

petros@opensolaris:~$ pfexec mkdir /mnt/rdisk
petros@opensolaris:~$ pfexec mount -F tmpfs -o size=96m tmpfs /mnt/rdisk

You can even configure the /etc/vfstab to automount the tmpfs volume at bootup:

#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#
/devices        -               /devices        devfs   -       no      -
/proc           -               /proc           proc    -       no      -
ctfs            -               /system/contract ctfs   -       no      -
objfs           -               /system/object  objfs   -       no      -
sharefs         -               /etc/dfs/sharetab       sharefs -       no      -
fd              -               /dev/fd         fd      -       no      -
swap            -               /tmp            tmpfs   -       yes     -
/dev/zvol/dsk/rpool/swap        -               -               swap    -       no      -
swap            -               /mnt/rdisk      tmpfs   -       yes     size=96m

The only major drawback is that at this point only root or someone with superuser privileges (e.g. via sudo/pfexec) can work with the volume or change the mount point permissions so that other users can access it. If you want these permissions altered at bootup, it may be advantageous to automate it in a startup script; I am referring to the /etc/rc3.d/S99local file, which you can create if it does not already exist. Some, if not all, of you may already be familiar with the concept of run levels, and if you are coming from a Linux environment, you may realize that the run levels are not the same between Linux and Solaris. But this is a topic for another day. Notice how, under the start case, I use pfexec (similar to sudo) to change the permissions of the tmpfs mount. Also note that you can skip the vfstab step shown earlier and just mount the file system within the same script file.

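#!/bin/bash
# /etc/rc3.d/S99local: local startup tasks.

# Exit quietly if the previous command reported a failure.
if [ $? -ne 0 ]; then
    exit 0
fi

case "$1" in
'start')
    # Open the tmpfs mount point to all users.
    pfexec chmod 777 /mnt/rdisk
    ;;
'stop')
    # Nothing to do on shutdown.
    ;;
*)
    echo "Usage: $0 { start | stop }"
    exit 1
    ;;
esac
exit 0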

As can be seen, tmpfs is very easy to work with and can be applied to many things to bring forth many advantages and additional securities (a result of having the contents disappear at power down of the system). If one wanted to employ some basic method of synchronization, a script can be called during scheduled periods to perform an rsync to another mount point.
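As a rough sketch of the synchronization idea above, the script below copies the contents of the tmpfs volume to a directory on persistent storage. The function name sync_rdisk, the script path and the backup directory used in the examples are assumptions made for illustration only; rsync is used when it is installed, and a plain recursive copy is used otherwise.

```shell
#!/bin/sh
# Hypothetical helper: mirror the contents of a RAM-based volume ($1) into
# a directory on persistent storage ($2).
sync_rdisk() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    if command -v rsync >/dev/null 2>&1; then
        # -a preserves permissions and times; --delete removes files from
        # the copy that no longer exist on the RAM disk.
        rsync -a --delete "$src"/ "$dst"/
    else
        # Fallback without rsync: recursive copy (stale files are not removed).
        cp -Rp "$src"/. "$dst"/
    fi
}

# Run only when both paths are given on the command line, for example:
#   sync_rdisk.sh /mnt/rdisk /export/rdisk-backup
if [ "$#" -eq 2 ]; then
    sync_rdisk "$1" "$2"
fi
```

Saved as, say, /usr/local/bin/sync_rdisk.sh (a made-up path), it could be scheduled from root's crontab with an entry such as 0,15,30,45 * * * * /usr/local/bin/sync_rdisk.sh /mnt/rdisk /export/rdisk-backup to run every fifteen minutes.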

RAMDISKADM

I find this module to be an excellent tool, capable of being utilized both in production and for teaching purposes. To utilize the ramdisk module you need to invoke the ramdiskadm command on the command line. For example, if I want to create a memory-based volume 100 MB in size, I would type:

petros@opensolaris:~$ pfexec ramdiskadm -a ramdisk1 100m

A device node would be created at /dev/ramdisk/ramdisk1. You can format this node with a file system and mount it locally, export it as an NFS share, or even configure it as an iSCSI target.

To destroy the RAM disk, you would need to type:

petros@opensolaris:~$ pfexec ramdiskadm -d ramdisk1

If you are interested, you can even create multiple ramdisks and pool them in a ZFS pool. The advantages of this approach are the checksum feature, which guards against corruption of contents stored in volatile memory, and the ability to grow the pool by adding further RAM disks to it.

petros@opensolaris:~$ pfexec ramdiskadm -a ramdisk1 100m
/dev/ramdisk/ramdisk1
petros@opensolaris:~$ pfexec ramdiskadm -a ramdisk2 100m
/dev/ramdisk/ramdisk2
petros@opensolaris:~$ pfexec zpool create rampool mirror /dev/ramdisk/ramdisk1 /dev/ramdisk/ramdisk2

I now have a zfs pool mirroring two 100 MB ramdisks. Obviously there are no real advantages to mirroring two RAM disks but this just serves as an example. I can check the status of the rampool device with the following command:

petros@opensolaris:~$ zpool status
pool: rampool
state: ONLINE
scrub: none requested
config:
NAME                       STATE     READ WRITE CKSUM
rampool                    ONLINE       0     0     0
mirror                   ONLINE       0     0     0
/dev/ramdisk/ramdisk1  ONLINE       0     0     0
/dev/ramdisk/ramdisk2  ONLINE       0     0     0
errors: No known data errors

Listing the zfs device will provide the following information (note that I cropped out the unrelated material):

petros@opensolaris:~$ zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rampool                   71.5K  63.4M    19K  /rampool

To destroy the rampool I can invoke the following on the command line:

petros@opensolaris:~$ pfexec zpool destroy rampool

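Let us create a RAIDZ volume (after first removing the two 100 MB RAM disks with ramdiskadm -d). Three RAM disks are created here, but only the first two go into the pool for now: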

petros@opensolaris:~$ pfexec ramdiskadm -a ramdisk1 64m
/dev/ramdisk/ramdisk1
petros@opensolaris:~$ pfexec ramdiskadm -a ramdisk2 64m
/dev/ramdisk/ramdisk2
petros@opensolaris:~$ pfexec ramdiskadm -a ramdisk3 64m
/dev/ramdisk/ramdisk3
petros@opensolaris:~$ pfexec zpool create rampool raidz /dev/ramdisk/ramdisk1 /dev/ramdisk/ramdisk2

zpool status gives us:

pool: rampool
state: ONLINE
scrub: none requested
config:
NAME                       STATE     READ WRITE CKSUM
rampool                    ONLINE       0     0     0
raidz1                     ONLINE       0     0     0
/dev/ramdisk/ramdisk1      ONLINE       0     0     0
/dev/ramdisk/ramdisk2      ONLINE       0     0     0
errors: No known data errors

A zfs list will show:

petros@opensolaris:~$ zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rampool                     70K  27.4M    19K  /rampool

I now want to add another device to the rampool. Notice the differences when I list the zpool status and the zfs device: ramdisk3 appears in the configuration and the available space grows from 27.4M to 86.9M.

petros@opensolaris:~$ pfexec zpool add -f rampool /dev/ramdisk/ramdisk3
petros@opensolaris:~$ zpool status
pool: rampool
state: ONLINE
scrub: none requested
config:
NAME                       STATE     READ WRITE CKSUM
rampool                    ONLINE       0     0     0
raidz1                     ONLINE       0     0     0
/dev/ramdisk/ramdisk1      ONLINE       0     0     0
/dev/ramdisk/ramdisk2      ONLINE       0     0     0
/dev/ramdisk/ramdisk3      ONLINE       0     0     0
errors: No known data errors
petros@opensolaris:~$ zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rampool                     73K  86.9M    19K  /rampool

CONCLUSION

As one can see, working with RAM disks on Solaris/OpenSolaris is fairly simple, and they can be configured in just a couple of steps. Who knows, you may find situations in your current environment which could benefit from the use of a RAM disk. Note: do not forget to destroy the zpool and the RAM disks when they are not in use. The last thing you want to do is waste much-needed memory.

15. July 2009

Opinion: On the Future of Data Storage and RAID Technologies

Filed under: File Systems, Solaris, SCSI, Linux, Microsoft — admin @ 13:44

Please note that this is only a personal opinion of mine, formed while observing the growth and decline of various storage concepts within the data storage industry. The views of the reader may differ from my own, which is why I invite you to post your opinions as a comment to this post.

One of the most volatile and yet needed industries is the data storage industry. As computing technologies become more cloud-centric and rely upon the web for business, productivity, education and even recreation, there is a constant push to increase capacities and, even more so, I/O throughput. As a result of recent demands, our approach to these technologies needs to be re-evaluated. The primary focus of this article is on the future of data storage concepts and the limited life and functionality of RAID.

Back in 1987 when the idea of RAID was first conceived, the goal or vision was to be able to scale multiple drives into a single volume which was represented to a host as such while also offering a form of redundancy with a more sensitive magnetic platter-based disk technology. Flash forward to the present and we are still reliant upon the same technologies. Is that because RAID is so perfect or have we just grown too comfortable and are too afraid of change?

Hardware Vs. Software RAID 

There was a time when processing power was limited and it became advantageous to utilize external methods for creating and managing arrays of data storage, but as time progressed, this approach became increasingly insignificant, at least for the Small-to-Medium sized Business (SMB). For the last decade, a lot of effort has been placed toward increasing the reliability, stability and features of software-based RAID. This has slowly been eating away at the hardware vendors, although it has rarely been noticeable.

These software implementations are integrated with methods of Logical Volume Management (with built in redundancy via RAID 1-6), Load Balancing/Multipathing capabilities, data encryption, along with the abilities to utilize incremental snapshot(s) over designated volumes. These software implementations include dynamic resizing, quota/permission management, enhanced copy-on-write file systems that perform very well along with routine checksums to correct noisy and silent data corruption; almost all of which can be managed while volumes are on-line. Some of these volume managers have the capability to export iSCSI & FCoE targets and can also be tuned to support FC targets.

To name a few, you have ZFS (an all-in-one solution), Btrfs (still in development and under test), device-mapper / LVM2 / multipath-tools, mdadm, DRBD, etc. The list goes on. What is to stop an SMB from setting up an array of JBODs and (if more redundancy is needed) clustering a couple of Solaris / OpenSolaris or Linux servers to manage their software RAID while also exporting it via a file server or into a SAN? Note that Lustre support for ZFS is still in development. Realistically, most entry-level modular external RAID solutions don’t run on the latest and greatest hardware components (as they are intended for a limited purpose and not to provide other hosting services). You will most likely achieve much greater performance with the software approach while also utilizing a much more efficient virtual memory manager (for enhanced caching) alongside a finely tuned scheduler.

On the enterprise end of computing you will find some very impressive storage solutions that are intended to take the workload of the enterprise environment. Companies such as Hitachi Data Systems (HDS) have been doing an excellent job of providing high-quality, well-performing storage solutions that are also easily manageable. Other companies have resorted to being a little creative in order to gain some market share with the SMB and larger companies. Notable examples are NetApp, Data Domain and even Cleversafe.

Earlier I found an interesting link differentiating the positives and negatives of both hardware and software RAID implementations. It should be noted that times have changed and some of the key points highlighted are no longer an issue. For instance, under the category /boot partition, this seems to no longer be an issue with at least ZFS.

Enter the SSD

In more recent years, the Flash-based Solid State Drive (SSD) has been entering enterprise markets, thanks to such notable providers as Sun Microsystems. Currently SSD usage in the enterprise is somewhat minimal, as there is a limit on the maximum capacities of the drives. This may soon change: in Q3 of 2009, PureSilicon will release their Nitro 1TB SSD. The throughput looks ideal for arenas where greater speeds are needed, but the technology introduces additional handicaps (in the form of write performance and a limited cell life) which most environments and some manufacturers have a difficult time accommodating. To combat the limited cell life, vendors have implemented their own methods of wear leveling, transparent to the host. With this concept, data written repeatedly to the same logical location does not land on the exact same cell each time; instead, “intelligent” built-in firmware directs each write to another cell on the drive. To the operating system, it is still the same “sector” location. While there is very little seek latency (for both sequential and random access), write operations take a huge hit, especially with smaller I/O transfer sizes, since the flash medium typically erases and rewrites a 128K page at a time.
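To put a rough number on that last point, the sketch below computes how much flash is rewritten for a single small write if the drive must always erase and rewrite a full 128K unit. This is a deliberately simplified model that ignores controller caching, write combining and wear-leveling details; the function name is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: ratio of data physically rewritten to data requested,
# assuming every write forces one whole erase unit to be rewritten.
#   $1 = I/O size in KiB, $2 = erase unit in KiB (default 128)
write_amplification() {
    echo $(( ${2:-128} / $1 ))
}

for io_kib in 4 16 64 128; do
    echo "${io_kib}K write -> 128K rewritten, amplification $(write_amplification "$io_kib")x"
done
```

Under this model a 4K write costs 32 times its own size in flash writes, while a full 128K write costs exactly what was requested, which is why small transfers suffer the most.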

SSD Tuning

With the recent hype around Flash-based SSDs, many vendors and UNIX/Linux distributions have been writing file systems tuned to perform extremely well on SSDs (and to limit the impact of these handicaps). For example, Sun Microsystems’ ZFS (available on Solaris, OpenSolaris, MacOS X [read-only], FreeBSD and Linux [over FUSE]) recently added tunable support for SSDs in its release versions for Solaris & OpenSolaris, while the development of Btrfs for Linux has done the same. In contrast, the Microsoft-developed NTFS does not offer such features or functionality. In fact, the file system has remained somewhat unchanged over the course of the years and is just as inferior now as it was when it was first released as a replacement for the FAT series of file systems. I wrote an entire post explaining why the NTFS file system is not well suited for today’s methods of computing here.

It should be noted that Microsoft’s Windows 7 has reportedly been tuned for the SSDs that are to be provided on netbooks. What this means, and what exactly has been tuned, is still unclear to me. You can read some of that information here. The only reason for the lack of changes in NTFS is to preserve backwards compatibility. This approach limits the ability to update the NTFS module of an existing server (if it is not running Windows 7) should it need to serve backend storage on SSD media.

The Impact on RAID Technologies

As SSDs become more popular, the advantages of using RAID are reduced; the only benefits are gained from a simple stripe in a RAID 0, or from mirroring to a backup array within a SAN or other form of network using RAID 01 (not to be confused with RAID 10), just in case access to the first array fails for whatever reason. This is where DRBD would come in really handy. As I briefly mentioned earlier, the whole concept of this form of redundancy was dependent upon the problematic nature of the magnetic disk device, where failures were imminent. And for those who are concerned with error detection for both silent and noisy data corruption, the majority of RAID implementations (both hardware and software) do not validate the data the way the ZFS or Btrfs checksum implementations do.

Changes in Protocol Layers?

With the popularity of SSD technologies growing and their costs falling, the one drawback that is setting manufacturers and consumers back is the limitation imposed by the protocols they are working with. Today, Fibre Channel, SAS and SATA are not capable of handling full SSD speeds and serve only as a bottleneck to the technology. There have been recent attempts from vendors such as Fusion-io and even PureSilicon to rely on other interfaces such as PCI Express (PCI-E). Since it is capable of handling up to 1 GB per second, it only seems natural for these vendors to move in that direction. I anticipate that shortly, others will follow. Fibre Channel and SAS may continue to serve the SAN (and with the appropriate load balancing mechanisms configured, they will perform well), but when it comes to the drive within the chassis, I expect to see more PCI Express in the near future. But who knows, with the recent drop in prices for 10Gb Ethernet or the high throughput offered by InfiniBand, things may be moving in another direction altogether.

In conclusion, I predict that in five years’ time we will start to see some huge and very interesting changes. I am looking forward to it.

11. July 2009

Updates to my Linux RAM disk module

Filed under: Storage, Linux — admin @ 06:14

In my last post I linked to a *.tar.gz file containing the source code to both a very generic Linux RAM disk module that I wrote for the Linux 2.6.26.x kernel (tested on Fedora 8 and Debian 5.0.1) and a single-purpose user binary that obtains the total device block count (number of sectors) via an ioctl() and also performs some write(), lseek() and read() operations. This way, if the user were to monitor the /var/log/messages file, they would see exactly where the process is in the driver code (I had placed many printk() messages for learning and tracking purposes).

I have made some updates to the device driver. They are so minor that the revision level of the driver only incremented from 0.1 to 0.1.1. The updates are as follows:

  1. The user can now specify a size (in MB) for the RAM disk when inserting the module.
  2. I also added a second ioctl case. The only one supported in v.0.1 was BLKGETSIZE, which returned the total number of sectors for the default 64 MB size. As of v.0.1.1, that case calculates the sector count from the specified size instead of returning a fixed value. The new case is BLKSSZGET, which returns the sector size.
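The arithmetic behind those two cases can be sketched in user space as follows. This is a model of the logic only, not the driver code. The 512 byte sector size, the helper names and the -1 return for unsupported commands are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/fs.h>   /* BLKGETSIZE, BLKSSZGET */

#define RXD_SECTOR_SIZE 512UL   /* assumed sector size */

/* Number of sectors in a RAM disk of size_mb megabytes. */
static unsigned long rxd_total_sectors(unsigned long size_mb)
{
    return (size_mb * 1024UL * 1024UL) / RXD_SECTOR_SIZE;
}

/*
 * User-space model of the two ioctl cases: stores the answer in *out and
 * returns 0, or returns -1 for any command it does not handle.
 */
static int rxd_ioctl_model(unsigned int cmd, unsigned long size_mb,
                           unsigned long *out)
{
    assert(out != NULL);

    switch (cmd) {
    case BLKGETSIZE:    /* total sectors, derived from the requested size */
        *out = rxd_total_sectors(size_mb);
        return 0;
    case BLKSSZGET:     /* sector size in bytes */
        *out = RXD_SECTOR_SIZE;
        return 0;
    default:
        return -1;
    }
}
```

Under these assumptions the default 64 MB disk works out to 131072 sectors, and a 96 MB disk to 196608.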

Shortly I will add support for later kernel revisions as the field for unlocked_ioctl() has been removed; at least from what I can tell when I view the block_device_operations structure in the include/linux/blkdev.h file of the source tree. Note that I am seeing this in the 2.6.28.x kernel (confirmed at lxr.linux.no). The block_device_operations structure was modified and relocated from include/linux/fs.h to the location specified above.

When monitoring the /var/log/messages, the user may notice that when invoking my test application (rxio), despite the fact that the application writes before it reads to/from the disk, the trace log will report the opposite (read then write). This is a result of the kernel scheduler. When a request (reference rx_request() function in the source code) is made to the block device for a read/write operation and multiple operations are sent to it, the first request to return is what the scheduler believes would be the best to execute next. So there is nothing wrong here and nothing to be concerned about.

To specify the RAM disk size during insertion, type the following. Without the sizemb parameter the module will continue to default to 64 MB:

# insmod rxd.ko sizemb=96

Version 0.1.1 can be downloaded from here. As I update this driver I will post the details on those updates.

5. July 2009

New Article: The Linux RAM Disk

Filed under: Storage, File Systems, Linux — admin @ 06:39

Back in April, I wrote about Linux+ magazine publishing my article, Linux Storage Management, in their 3/2009 issue. It hit the shelves earlier this month. This post is to note that Linux+ is publishing another article of mine, The Linux RAM Disk, in their 4/2009 issue. For those interested, it may be worth the read. The article is broken down into:

  • the original Linux RAM disk
  • Linux RAM-based file systems (ramfs and tmpfs)
  • tips for using tmpfs to tune your system (including Firefox caching to RAM)
  • data synchronization techniques
  • hardware implementations of RAM disks and DRAM Solid State Disks (SSD)

While not part of the original article, I also decided to write a generic RAM disk block device module which can be written to, formatted and mounted like a normal Linux block device. Note that you can download a *.tar.gz file containing the source code and Makefile here (included is a README with basic instructions for compilation and initialization). This exercise was an excellent learning experience on how Linux block devices function.

The block device driver is fixed at a size of 64 MB, which can be altered within the source file rxd.c. You can write directly to the device node /dev/rxdev, or create an ext2 file system on it and mount it locally so that you can read and write files. All files will remain on the RAM disk until the module is removed.

26. June 2009

Hard Rectangular Drives (HRD)

Filed under: Storage, SCSI, Misc. — admin @ 13:55

I do not know all the details on this but I found the concept extremely interesting. It is a Hard Rectangular Drive (HRD), which is unique in terms of design and functionality. You can read more about it here and here. This technology is being developed by DataSlide. The first article states the following:

DataSlide says the new technology would find first use in a PCIe-based card format designed for use in Oracle database applications. The PCIe format is necessitated by the extremely high performance of HRD; like RAMDisks and high-end NAND SSDs, HRD would overwhelm a SATA or SAS interface. The cost of such a device is unknown, but its capacity would be comparable to that of a modern HDD.