dd if=/dev/random of=/dev/blog

17. March 2009

Linux File Systems: Ext4. Btrfs. Do we understand what we need?

Filed under: Storage, File Systems, Linux — admin @ 11:10

Lately I have been reading a lot about Ext4, Btrfs and the next generation of Linux file systems in general. What frustrates me about all these articles is the lack of understanding of what each file system is attempting to accomplish. Some file systems are catered to Flash devices, others to desktop usage or enterprise-class computing, while still others resort to compression as a space-saving technique. As an end user or administrator, you will most likely use the file system that is most appropriate for your environment. What are you trying to solve? What is the I/O profile you are catering to? How is this storage medium going to be accessed? All of these are important questions to ask when deploying a file system to manage a specific type of data.
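One way to start answering "what is the I/O profile?" is to summarize a trace of requests. The Python below is a minimal, hypothetical sketch. It assumes each request is recorded as (operation, byte offset, length). It counts a request as sequential when it begins exactly where the previous one ended. Real traces would come from a tool such as blktrace, and a real analysis would look at far more than these two ratios.

```python
from dataclasses import dataclass


@dataclass
class Request:
    op: str      # "R" for read, "W" for write (assumed trace format)
    offset: int  # starting byte offset on the device
    length: int  # number of bytes transferred


def io_profile(trace):
    """Return (read fraction, sequential fraction) for a list of Requests."""
    if not trace:
        return 0.0, 0.0
    reads = sum(1 for r in trace if r.op == "R")
    sequential = 0
    for prev, cur in zip(trace, trace[1:]):
        # Assumption: "sequential" means this request starts where the last one ended.
        if cur.offset == prev.offset + prev.length:
            sequential += 1
    pairs = len(trace) - 1
    return reads / len(trace), (sequential / pairs if pairs else 0.0)
```

A trace dominated by reads and sequential requests points toward a different file system (and storage medium) than one made of small random writes.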

Let us first start with a brief history of GNU/Linux. Linux began as a hobby project intended as a desktop alternative. Almost immediately, software was ported to the GNU/Linux environment, and it became able to serve web, print and other services. It did not take much longer for it to be adopted in business environments, where it started running things behind the scenes.

From the beginning, most of the file systems in use were intended for desktop use. That includes ext2/ext3, ReiserFS (and Reiser4), etc. Companies such as IBM and SGI had developed JFS and XFS (and CXFS) as higher-performing and more easily scalable solutions. These addressed more of the requirements of enterprise-class computing, chiefly performance. Redundancy was still important, but it was judged on a different scale and usually handled by external methods or by other applications.

Flash forward to the present and let us take a look at Ext4-fs and Btrfs. Many journalists and bloggers have been comparing the two, but unless I am mistaken, they attempt to solve two different problems. Ext4-fs (an upgrade to the Ext3-fs desktop file system) will still be a “desktop file system”, while Btrfs is meant to compete with Sun Microsystems' enterprise-capable ZFS. The two are meant for completely different arenas. One will run on a local disk, while the other can be deployed and run efficiently on external storage in a data center, some of it managed by external RAID controllers with their own features and accessed via Fibre Channel, Serial Attached SCSI (SAS), iSCSI and similar protocols, as is common in ZFS deployments. I can attest to this personally.

Again, we are talking about a totally different playing field. And if that is the case, why are we treating these two Linux file systems as competitors? Why are we treating Ext4-fs as a short term file system solution until Btrfs can replace it (after it has become stable)?

I am excited to see what the future holds for both file systems, but I am more intrigued to see the path that Btrfs follows, at least once it has become stable and engineered to truly rival ZFS. ZFS is an extremely well designed file system that offers features and functionality that Btrfs is only now catching up to. But that is a topic for another time.

17 Comments »

  1. Wow, your font is horrible to read.

    Comment by noone — 17. March 2009 @ 12:01

  2. Here you go.

    Comment by admin — 17. March 2009 @ 12:06

  3. Thank you. I had exactly the same questions you answered here.

    Comment by Santosh — 18. March 2009 @ 01:00

  4. Ted Ts’o talks about this in a video interview:

    http://www.linux-magazine.com/online/news/video_ted_ts_o_on_ext4_btrfs_and_first_steps_with_linux

    Comment by rich — 18. March 2009 @ 03:31

  5. Even though you are correct in saying that ext4 is best suited to desktop deployment, btrfs is so rich that it will be an excellent choice not only for the enterprise but also for the desktop.

    Logical volume management is such an awesome feature that I can’t believe everyone isn’t using it. Even if you don’t think you need LVM, you need LVM. And with btrfs, volume management is baked in.

    Personally, I am fanging for btrfs. I would have loved to see ZFS available in-kernel, but in the meantime I am making do with the cobbled-together solution of RAID10, LVM2 and various filesystems (usually ext3 and XFS, depending on need) to meet my storage needs. It works remarkably well considering they are discrete concepts, but that has always been the UNIX philosophy.

    EXT4 will be one of the shortest-lived filesystems despite the fact it will solve a very large problem.

    McPop.

    Comment by McPop — 18. March 2009 @ 06:18

  6. “…Logical volume management is such an awesome feature that I can’t believe everyone isn’t using it. Even if you don’t think you need LVM, you need LVM…” .
    ...uhm... try recovering a damaged disk with LVM and without LVM, and you will see why a lot of people don’t use it.

    Without LVM, you boot with, let’s say, a Knoppix live CD, mount and start recovering. With LVM, you boot a live CD and... pray; after some LVM brainstorming you may be able to recover the original LVM structure, and only AFTER that can you start the recovery process. My 2 cents.
    Steve from Italy.

    Comment by steve Italy — 18. March 2009 @ 08:14

  7. Steve,

    I understand where you are coming from, because I have been in that same situation myself. One thing to consider, though, is that if the volume manager is integrated with the file system itself, recovery could be much easier, as long as the file system’s superblock(s) are still readable (for example, as Btrfs). I guess only time will tell as development of these next-generation file systems progresses.

    Comment by admin — 18. March 2009 @ 08:32

  8. Now that IBM is sniffing around SUN, maybe they’d open source ZFS when/if IBM did buy SUN.

    Comment by Ian — 18. March 2009 @ 09:36

  9. Ian,

    The sad reality is that Sun did open source ZFS, but under the Common Development and Distribution License (CDDL), which is not compatible with the GNU General Public License version 2 (GPLv2) that the Linux kernel is licensed under. This is why there are efforts to port it to Linux through the FUSE (Filesystem in Userspace) interface.

    I only recently saw the articles about a possible acquisition of Sun by IBM, and I guess we will have to see whether any relicensing takes place.

    Comment by admin — 18. March 2009 @ 09:42

  10. > Why are we treating Ext4-fs as a short term file system solution
    > until Btrfs can replace it (after it has become stable)?

    Perhaps because that’s what Ted Ts’o is quoted as saying.

    Comment by Daniel — 18. March 2009 @ 09:43

  11. >Perhaps because that’s what Ted Ts’o is quoted as saying.

    It is difficult to argue otherwise when the key developer hints at it too. Thanks for pointing that out.

    Comment by admin — 18. March 2009 @ 09:44

  12. Let’s look at ext2, ext3, ext4, LVM and Btrfs from another angle: the size of the Linux system.

    Home/laptop computer
    If I have a small desktop computer with perhaps 512 MB of memory and an 80 GB drive, what do I want loaded in memory so as to leave the maximum amount of free memory for my applications? As a second part of this question, am I running a time-sharing service with financial data? Does every small computer limit itself to one user running a small database and a few other things, with the swap file kept inactive? To answer, perhaps ext3 is ideal.

    Small business system
    Now let’s look at the intermediate-sized system (lots of memory, with all data on one disk). Here we can safely say that we want some speed, and we want disk I/O requests re-sequenced to improve response times. This option is great (assume 2008 PC technology, with 4 GB of memory at most). Some caching would be advantageous, would it not? But again, we want to preserve as much RAM as possible for applications and leave as little extra as possible for the kernel. Would ext4 be best? That is my thought.

    Next case (I did not exhaust all the cases; there are many I left out):

    The humongous data store.

    We have terabytes of data, we require fast response time, and we require data to be shared over several hard disks. What do we do?

    Do we create links from directories under root to the other hard disks? Do we use logical volume management (LVM) to allow one directory to be mapped to several hard disks, with LVM’s restrictions, or do we use Btrfs? Btrfs is the file system geared to larger systems with very large data storage requirements, and it offers simpler control over where extents will be placed. Btrfs in theory will replace LVM. LVM is currently used with ext3.

    So, now, let’s look at SSD devices, which we believe will replace hard disks as the active store, and which, I believe, will in the near future use the hard disk as a log device for recovery or as an archive device. Do we yet know whether we require Btrfs, ext4 or another, “not yet designed” set of software drivers?

    For very large data stores, would it be reasonable to have a dedicated file server? Would it actually be a battery-powered CPU acting as both SSD and disk controller? We would communicate with the data store via this CPU (device). It would be responsible for safely storing and retrieving data, and for allowing multiple clusters of systems to interface to the common set of data.

    In a banking or other critical application, such as airline controllers at a busy airport,
    we need the backup system to take over in a second or two after a main system failure.
    Which file system would permit this activity?

    Well, from what I read, none of ext3, ext4, or even Btrfs allows this. So we need something new.

    For what it is worth, IBM solved this problem around 1980, or before, with their mainframe systems.

    Finally, in the articles I read, the new hard disks come with 16 MB or even 32 MB of cache memory. Not one author has written about using ext[234], LVM or Btrfs with these devices.

    Your thoughts please.

    Leslie

    PS. Some IBM RAID controllers contain battery-backed cache memory. If there is a restart following a crash, the contents of the controller’s cache take precedence over the hard disk contents. The logic behind this is to permit the use of caching hard disk devices with full recoverability. Now, with the cache at the hard disk level but battery-backed, we realize that we don’t require 4 or 5 second delays and large caches in the Linux disk drivers.

    Comment by Leslie Satenstein — 22. March 2009 @ 12:25

  13. Leslie,

    Thank you for the reply. I agree with you 100% that, in the grand scheme of things, the only thing that matters is the application. The file system must appropriately accommodate the environment it is deployed in. If the equipment is somewhat limited, or the volumes you work with are smaller than typical, then you may want to consider something that is not so processor intensive. If quick recovery is not your concern, or you need something that compresses all data in real time or runs on a flash device, then again, you must be careful to pick the right one. That is why I urge people to understand the I/O profile they are accommodating.

    I also agree that the Ext[2-4] series of file systems would be ideal in a home/laptop or small business environment. Unfortunately, that may not remain the case once Btrfs arrives, even though it is intended for a completely different environment. I guess it can be thought of this way: when Windows XP came out, a lot of people rushed out to get the Professional Edition instead of the Home Edition, which is what they should have settled for. They did not understand the differences between the two, but Professional had to be better, so they had to have it. I feel the same will happen with Ext4 and Btrfs. Distributions will start supporting Btrfs and possibly offering it as their default file system, killing two birds with one stone with regard to volume management.

    >>Btrfs in theory will replace LVM. LVM is currently used with ext3.

    I have experienced problems in the past while trying to get XFS to run on top of LVM2, so the frustration is there, and it is about time we got something new and up to date.

    >So, now, let’s look at SSD devices, which we believe will replace hard disks as the active store,

    Be careful what you say here. The SSD market has been around for decades but has only recently gained momentum. Most uses are in home environments, and those are strictly Flash-based SSDs. Storage companies have great marketing, and they have started to mislead people about SSD sequential write performance, which is horrible after the first write to a cell. With regard to seeking, there is no seek latency, which is great. Read performance is through the roof, while random write performance is significantly better than that of the magnetic counterpart, the hard disk drive.

    Going back to marketing, huge companies such as Sun Microsystems, Hitachi and Texas Memory Systems are starting to push Flash-based SSD technologies out the door, and I believe this may hurt the market more than help it. As I just mentioned, sequential write performance is horrible after the first write to a cell. These drives use 2KB pages that can only be erased in blocks of 64, that is, 128KB at a time. So whether I wish to update 1 byte or 128KB, the drive has to locate the 128KB block, read its data, erase the block and write back the new data. Writing to NAND Flash is not all that quick either, which is why most companies resort to extra caching when using these drives.

    With regard to enterprise-class computing and SSDs, I strongly believe that DRAM-based technology will catch us by surprise shortly. Some companies have started working with this approach: no seek latency and extraordinary sequential and random read and write speeds. It is all in memory and backed up by battery (some approaches sync the contents to a local Flash module).

    As for the type of file system to use, I know that Jörn Engel has done some work on LogFS. Btrfs is supposedly going to be optimized for SSDs, but that will just keep pushing it into an environment (the end user’s) for which it is too much, as opposed to serving terabytes of storage and such.

    >In a banking or other critical application, such as airline controllers at a busy airport,
    >we need the backup system to take over in a second or two after a main system failure.
    >Which file system would permit this activity?
    >Well, from what I read, none of ext3, ext4, or even Btrfs allows this. So we need something new.

    Here again I agree with you: I do not think Ext4 or Btrfs would do well. We would need something that supports clustering, possibly GFS (Global File System).

    >Finally, in the articles I read, the new hard disks come with 16 MB or even 32 MB of cache memory.
    >Not one author has written about using ext[234], LVM or Btrfs with these devices.

    Let us also not forget SCSI-based technologies such as Fibre Channel and Serial Attached SCSI drives, which are dual-ported, run full duplex and reach speeds double those of their SATA and other ATA counterparts. This is on top of the drive cache and a native, optimized command-queuing mechanism that reduces the cost of seeking.

    Sometimes I feel that we put too much thought into areas of the file system that we have already evolved beyond, instead of focusing on other areas where we can make the device truly shine.

    Petros

    Comment by admin — 22. March 2009 @ 14:51

  14. I recently came across your blog and have been reading along. I thought I would leave my first comment. I don’t know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.

    Joannah

    http://linuxmemory.net

    Comment by Joannah — 26. March 2009 @ 08:29

  15. Joannah,

    Thank you very much for the encouraging words. I enjoy hearing such things because they motivate me to continue. ;-)

    Petros

    Comment by admin — 26. March 2009 @ 09:01

  16. With the steadily decreasing reliability of hard disks, safety measures such as an MD5 hash per file will become as important as speed, even for private and small company users!

    PS: With cookies disabled your silly system says that one does not know math. Fantastic :-((

    Comment by tricky — 27. March 2009 @ 09:10

  17. Ext3 will be around for a while, as it is a proven technology.
    Ext4 will be “good enough” that it will be the mainstay for a while.
    Btrfs is a complete unknown so may not even last the course, it *could* be a contender but we will see.
    ZFS is the possible game changer: if IBM buys Sun and decides to license it under GPLv2 (or even GPLv3!), then all bets are off. Linus said he would even consider GPL3ing Linux if that’s what it took to get ZFS.

    Interesting times :)

    Comment by SilverWave — 5. April 2009 @ 16:43
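Comment 7 says recovery could be easier when the file system’s own superblocks are still readable. A quick way to check for them can be sketched in Python. The sketch assumes the Btrfs on-disk layout: superblock copies at 64 KiB, 64 MiB and 256 GiB, with the 8-byte magic `_BHRfS_M` at offset 0x40 inside each copy. `find_btrfs_superblocks` is a hypothetical helper, not part of any Btrfs tool. A real recovery would also verify checksums and use the official Btrfs utilities.

```python
# Assumed Btrfs layout: byte offsets of the superblock copies on a device,
# and the position of the 8-byte magic inside each copy.
BTRFS_SUPERBLOCK_OFFSETS = (64 * 1024, 64 * 1024**2, 256 * 1024**3)
BTRFS_MAGIC = b"_BHRfS_M"
MAGIC_OFFSET = 0x40


def find_btrfs_superblocks(path):
    """Return offsets of superblock copies whose magic is still intact."""
    found = []
    with open(path, "rb") as dev:
        for sb_offset in BTRFS_SUPERBLOCK_OFFSETS:
            dev.seek(sb_offset + MAGIC_OFFSET)
            # Past the end of a small image, read() returns b"" and the copy is skipped.
            if dev.read(len(BTRFS_MAGIC)) == BTRFS_MAGIC:
                found.append(sb_offset)
    return found
```

Run against a disk image (or, with root privileges, a block device), an empty result suggests the metadata needed to reassemble the volume is gone. One surviving copy may be enough to start recovery.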
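Comment 12 mentions "disk I/O re-sequencing", and the reply in comment 13 mentions the drive’s native command queuing. Both come down to reordering pending requests to cut head movement. As a simplified illustration, here is an elevator-style (SCAN) ordering. It is not the Linux I/O scheduler’s actual code, and sector numbers stand in for real request addresses.

```python
def elevator_order(head, requests):
    """Order sector requests SCAN-style: sweep upward from head, then back down."""
    upward = sorted(r for r in requests if r >= head)
    downward = sorted((r for r in requests if r < head), reverse=True)
    return upward + downward


def total_seek(head, order):
    """Total head movement (in sectors) when serving requests in this order."""
    distance = 0
    for sector in order:
        distance += abs(sector - head)
        head = sector
    return distance
```

Take the head at sector 50 and requests `[95, 10, 60, 20, 80]`. Serving them in arrival order moves the head 280 sectors, while the elevator order `[60, 80, 95, 20, 10]` moves it 130.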
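Comment 12 asks whether to use LVM to map one directory onto several hard disks. The simplest form of this is a linear (concatenated) logical volume, whose logical extents are laid end to end across the physical disks. The toy model below is written in the spirit of LVM. The class name, disk names and extent counts are hypothetical, and real LVM metadata is considerably richer.

```python
from bisect import bisect_right
from itertools import accumulate


class LinearVolume:
    """Toy linear logical volume: extents concatenated across disks in order."""

    def __init__(self, disks):
        # disks: list of (name, extent_count) pairs (illustrative values only)
        self.names = [name for name, _ in disks]
        # First logical extent on each disk: [0, n0, n0 + n1, ..., total]
        self.starts = [0] + list(accumulate(count for _, count in disks))

    def map_extent(self, logical):
        """Translate a logical extent number to (disk name, physical extent)."""
        if not 0 <= logical < self.starts[-1]:
            raise IndexError("logical extent out of range")
        i = bisect_right(self.starts, logical) - 1
        return self.names[i], logical - self.starts[i]
```

With `LinearVolume([("sda", 100), ("sdb", 50)])`, logical extent 99 lives on `sda` and extent 100 is the first extent of `sdb`. The file system above sees one contiguous device. The volume manager, or in Btrfs’s case the file system itself, decides where each extent really lives.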
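The erase-block arithmetic in comment 13 can be made concrete. The Python below is a minimal sketch under two assumptions. First, the geometry described there: 2 KB units, 64 per erase block, so 128 KB per block. Second, a naive drive with no flash translation layer, so every erase block an update touches is read, erased and rewritten in full. Real drives remap writes to pre-erased blocks precisely to avoid this. The function names are hypothetical.

```python
PAGE_SIZE = 2 * 1024                       # bytes per NAND page (assumed)
PAGES_PER_BLOCK = 64                       # pages per erase block (assumed)
ERASE_BLOCK = PAGE_SIZE * PAGES_PER_BLOCK  # 128 KiB


def naive_update_cost(offset, length):
    """Bytes a naive drive must rewrite to update the byte range [offset, offset + length).

    Every erase block the range touches is read in full, erased, and
    written back with the new data merged in.
    """
    if length <= 0:
        return 0
    first = offset // ERASE_BLOCK
    last = (offset + length - 1) // ERASE_BLOCK
    return (last - first + 1) * ERASE_BLOCK


def write_amplification(offset, length):
    """Physical bytes written per logical byte updated."""
    if length <= 0:
        return 0.0
    return naive_update_cost(offset, length) / length
```

Under these assumptions, updating a single byte costs a full 131072-byte rewrite. An aligned 128 KB write costs exactly one block. A 2-byte write that straddles a block boundary costs two blocks. This is why the extra caching mentioned in that comment matters so much.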
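Comment 16 suggests an MD5 hash per file. That can be sketched with Python’s standard `hashlib`: build a manifest of digests once, then re-hash later to detect silent corruption. MD5 is used only because the comment names it. It is adequate for spotting accidental bit rot but not for resisting deliberate tampering. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (as a relative path) to its MD5 hex digest."""
    root = Path(root)
    return {str(p.relative_to(root)): md5_of(p) for p in root.rglob("*") if p.is_file()}


def find_corrupted(root, manifest):
    """Return files whose digest no longer matches the manifest, or that are missing."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or md5_of(p) != expected:
            bad.append(rel)
    return sorted(bad)
```

Saving the manifest alongside a backup and re-running `find_corrupted` before restoring is one low-cost way to get this kind of safety on any file system, ext3 included.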
