Difference between pages "ZFS Fun" and "Zope HOWTO"

(Playing with ZFS)
 
(First Steps)
 
Line 1: Line 1:
'''WARNING: This tutorial is under heavy revision to be switched from ZFS Fuse to ZFS on Linux.'''
+
This page documents how to use Zope with Funtoo Experimental, which currently has good Zope support thanks to [[Progress Overlay Python]] integration.
  
= Introduction =
+
== About Zope ==
  
== ZFS features and limitations ==
+
Zope is an Open Source application server framework written in Python. It has an interesting history which you should familiarize yourself with before starting Zope development, as it contains several interesting twists and turns.
  
ZFS offers an impressive number of features, even putting aside its hybrid nature (both a filesystem and a volume manager -- zvol), which is covered in detail on [http://en.wikipedia.org/wiki/ZFS Wikipedia]. One of the most fundamental points to keep in mind about ZFS is that it '''targets a legendary reliability in terms of preserving data integrity'''. ZFS uses several techniques to detect and repair (self-healing) corrupted data: simply speaking, it makes aggressive use of checksums and relies on data redundancy. The price to pay is that it requires a bit more CPU processing power than traditional filesystems and RAID solutions. However, the [http://en.wikipedia.org/wiki/ZFS Wikipedia article about ZFS] also mentions that using ZFS on top of classic RAID arrays is strongly discouraged, as ZFS then cannot control the data redundancy, which ruins most of its benefits.
+
=== Zope History ===
  
In short, ZFS has the following features (not exhaustive):
+
There are two versions of Zope, Zope 2 and Zope 3. One might assume that Zope 3 is the version that people should use for new software development projects by default, but this is not the case. Most Zope-based projects continue to use Zope 2. Zope 3 was an attempt to redesign Zope 2 from scratch, and is completely different from Zope 2, but it was not adopted by the community.
  
* Storage pool (if you are used to BTRFS volumes, this should be familiar)
+
There is also something called [http://codespeak.net/z3/five/ Five] (named because it is "2 + 3") that backports many of the new features of Zope 3 into the Zope 2 framework. Several projects will use Zope 2 plus Five in order to use some of the newer features in Zope. Five was merged into mainline Zope 2 in early 2010, and first appeared in Zope 2.8.
* Plenty of space:
+
** 256 zettabytes per storage pool (2^64 storage pools max in a system).
+
** 16 exabytes max for a single file
+
** 2^48 entries max per directory
+
* Virtual block-device support over a ZFS pool (zvol) - (extremely cool when used jointly with a RAID-Z volume)
+
* Read-only Snapshot support (it is possible to get a read-write copy of them, those are named clones)
+
* Encryption support (supported only at ZFS version 30 and later; ZFS version 31 is shipped with Oracle Solaris 11, so that version is mandatory if you plan to encrypt your ZFS datasets/pools)
+
* Built-in''' RAID-5-like-on-steroids capabilities known as [http://en.wikipedia.org/wiki/Non-standard_RAID_levels#RAID-Z RAID-Z] and RAID-6-like-on-steroids capabilities known as RAID-Z2'''. RAID-Z3 (triple parity) also exists.
+
* Copy-on-Write transactional filesystem
+
* Meta-attributes support (properties) allowing you to easily drive the show with statements like "that directory is encrypted", "that directory is limited to 5GiB", "that directory is exported via NFS" and so on. Depending on what you define, ZFS takes the appropriate actions!
+
* Dynamic striping to optimize data throughput
+
* Variable block length 
+
* Data deduplication
+
* Automatic pool re-silvering
+
* Transparent data compression / encryption (the latter requires Solaris 11)
+
  
Most notable limitations are:
+
You can learn more about the history of Zope 2, 3 and Five in the [http://svn.zope.org/Zope/trunk/src/Products/Five/README.txt?view=markup Five README].
  
* Lack of a feature ZFS developers know as "Block Pointer rewrite functionality" (planned to be developed). Without it, ZFS currently does not support:
+
To make things even more interesting, work on [http://docs.zope.org/zope2/releases/4.0/ Zope 4] is underway, and it will be based on 2.13 rather than 3.x. It includes a number of [http://docs.zope.org/zope2/releases/4.0/CHANGES.html#restructuring incompatible changes] with prior versions.
** Pool defragmentation (COW techniques used in ZFS mitigate the problem)
+
** Pool resizing
+
** Data compression (re-applying)
+
** Adding an additional device to a RAID-Z/Z2/Z3 pool to increase its size (however, it is possible to replace, in sequence, each one of the disks composing a RAID-Z/Z2/Z3)
+
* '''NOT A CLUSTERED FILESYSTEM''' like Lustre, GFS or OCFS2
+
* No data healing if used on a single device (corruption can still be detected); the workaround is to force data duplication on the drive
+
* No support of TRIMming (SSD devices)
+
  
== ZFS on well known operating systems ==
+
{{fancynote|This HOWTO targets Zope 2.13, which includes Five. It is typically the version you should be using for new Zope projects.}}
  
=== Linux ===
+
=== Zope Resources ===
  
Although the source code of ZFS is open, its license (Sun CDDL) is incompatible with the license governing the Linux kernel (GNU GPL v2), which prevents its direct integration. A couple of ports exist, but they suffer from a lack of maturity and features. As of writing (February 2014) two known implementations exist:
+
Now that you understand what version of Zope you should be targeting (2.13), we can point you towards the correct documentation :)
  
* [http://zfs-fuse.net ZFS-fuse]: a totally userland implementation relying on FUSE. This implementation can now be considered defunct (as of February 2014). The original site of ZFS FUSE seems to have disappeared; nevertheless, the source code is still available on [http://freecode.com/projects/zfs-fuse http://freecode.com/projects/zfs-fuse]. ZFS FUSE stalled at version 0.7.0 in 2011 and has never really evolved since then.
+
; [http://docs.zope.org/zope2/zope2book/ The Zope 2 Book]: This book provides a general introduction to Zope concepts and ZMI. It is a good place to start, but doesn't provide a direct introduction to Zope development. It's recommended that you skim through this book to familiarize yourself with Zope. It generally does not assume much prior knowledge about Web development or Python.
* [http://zfsonlinux.org ZFS on Linux]: a kernel-mode implementation of ZFS which supports a lot of ZFS features. The implementation is not as complete as it is under Solaris and its siblings like OpenIndiana (e.g. SMB integration is still missing, no encryption support...) but a lot of functionality is there. This is the implementation used for this article. As ZFS on Linux is an out-of-tree Linux kernel implementation, patches must be awaited after each Linux kernel release. ZFS on Linux currently supports zpool version 28.
+
; [http://docs.zope.org/zope2/zdgbook/ Zope Developer's Guide]: This guide will give you a better introduction to Zope development. It assumes you already know Python. Skip chapters 1 and 2 and start in [http://docs.zope.org/zope2/zdgbook/ComponentsAndInterfaces.html chapter 3], which covers components and interfaces. [http://docs.zope.org/zope2/zdgbook/Products.html Chapter 5] covers the creation of your first product.
 +
; Five: We're not done yet. There is a bunch of stuff in Zope 2.13 that is not in the official documentation. Namely, the stuff in Five. Check out [http://codespeak.net/z3/five/manual.html The Five Manual].
 +
; ZTK: [http://docs.zope.org/ztkpackages.html ZTK Documentation]  
 +
; ZCA: [http://www.muthukadan.net/docs/zca.html A Comprehensive Guide to Zope Component Architecture] offers a good introduction to the programming concepts of ZCA. We also have a new page on [[Zope Component Architecture]] which will help you to understand the big picture of ZCA and why it is useful. ZCML ("Z-camel") is a part of ZCA and  was introduced in Zope 3, so typically you will find ZCML documented within Zope 3 documentation and book.
 +
; Content Components: Views and Viewlets: [http://docs.zope.org/zope.viewlet/index.html This tutorial on viewlets] also contains some viewlet-related ZCML examples near the end. The "Content Component way" of developing in Zope seems to be a Zope 3 thing and tied to ZCML. Chapter 13+ of Stephan Richter's ''Zope 3 Developer's Handbook'' (book) seems to cover this quite well. You will probably also want to check out Philipp Weitershausen's ''Web Component Development with Zope 3'' (book).
 +
; [http://wiki.zope.org/zope2/Zope2Wiki Zope 2 Wiki]: Main wiki page for all things related to Zope 2.
 +
; [http://docs.zope.org docs.zope.org]: This is the main site for Zope documentation.
  
=== Solaris/OpenIndiana ===
+
== First Steps ==
  
* '''Oracle Solaris:''' remains the de facto reference platform for ZFS: ZFS on this platform is now considered mature and usable on production systems. Solaris 11 uses ZFS even for its "system" pool (aka ''rpool''). A great advantage of this: it is now quite easy to revert the effect of a patch, provided a snapshot has been taken just before applying it. In the "good old" days of Solaris 10 and before, reverting a patch could be tricky and complex, when it was possible at all. ZFS is far from new in Solaris: it has its roots in 2005 and was then integrated in Solaris 10 6/06, released in June 2006.
+
First, you will need to emerge Zope:
  
* '''[http://openindiana.org OpenIndiana]:''' is based on the illumos kernel (a derivative of the now defunct OpenSolaris), which aims to provide absolute binary compatibility with Sun/Oracle Solaris. It is worth mentioning that the Solaris kernel and the [https://www.illumos.org illumos kernel] once shared the same code base; however, they have followed different paths since Oracle announced the discontinuation of OpenSolaris (August 13th 2010). Like Oracle Solaris, OpenIndiana uses ZFS for its system pool. The illumos kernel ZFS support lags a bit behind Oracle: it supports zpool version 28 whereas Oracle Solaris 11 has zpool version 31 support, data encryption being supported at zpool version 30.
+
<console>
 +
# ##i## emerge --jobs=10 zope
 +
</console>
  
=== *BSD ===
+
Zope is now installed.
  
* '''FreeBSD''': ZFS has been present in FreeBSD since FreeBSD 7 (zpool version 6) and FreeBSD can boot from a ZFS volume (zfsboot). ZFS support has been vastly enhanced in FreeBSD 8.x (8.2 supports zpool version 15, version 8.3 supports version 28), FreeBSD 9 and FreeBSD 10 (both support zpool version 28). ZFS in FreeBSD is now considered fully functional and mature. FreeBSD derivatives such as the popular [http://www.freenas.org FreeNAS] take advantage of ZFS and have integrated it in their tools. FreeNAS, for example, supports zvols through its Web management interface (FreeNAS >= 8.0.1).
+
== Project Skeleton ==
  
* '''NetBSD''': The ZFS port started as a GSoC project in 2007 and has been present in the NetBSD mainstream since 2009 (zpool version 13).
+
{{fancynote|Zope should be used by a regular user account, not as the root user.}}
  
* '''OpenBSD''': No ZFS support yet and not planned until Oracle changes some policies according to the project FAQ.
+
The first step in using Zope is to ensure that you are using a regular user account. Create a new directory called <tt>zope_test</tt>:
 
+
== ZFS alternatives ==
+
 
+
* WAFL seems to have severe limitations [http://unixconsult.org/wafl/ZFS%20vs%20WAFL.html] (document is not dated); an interesting article can also be found [http://blogs.netapp.com/dave/2008/12/is-wafl-a-files.html here]
+
* BTRFS is advancing every week but it still lacks features such as the capability of emulating a virtual block device over a storage pool (zvol), and built-in support for RAID-5/6 is not complete yet (cf. [https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg29169.html Btrfs mailing list]). At the date of writing, it is still experimental whereas ZFS is used on big production servers.
+
* VxFS has also been targeted by comparisons like [http://blogs.oracle.com/dom/entry/zfs_v_vxfs_iozone this one] (a bit [http://www.symantec.com/connect/blogs/suns-comparision-vxfs-and-zfs-scalability-flawed controversial]). VxFS has been known in the industry since 1993 and is known for its legendary flexibility. Symantec acquired VxFS and offers a basic version of it (no clustering, for example) under the name [http://www.symantec.com/enterprise/sfbasic/index.jsp Veritas Storage Foundation Basic]
+
* An interesting discussion about modern filesystems can be found on [http://www.osnews.com/story/19665/Solaris_Filesystem_Choices OSNews.com]
+
 
+
== ZFS vs BTRFS ==
+
 
+
BTRFS and ZFS are siblings in their concepts and of course have differences:
+
* both are transactional filesystems (in BTRFS a transaction is a sequence of low level operations)
+
* both implement for example the pool concept (called a "volume" in BTRFS)
+
* both can do snapshots although in ZFS a snapshot is a read only thing and its attributes can't be modified. BTRFS on the other hand has writable snapshots (known as clones in ZFS)
+
* both can organize their storage pool in several logical divisions (called datasets in ZFS and subvolumes in BTRFS).
+
* Like their equivalent in BTRFS (subvolumes), ZFS datasets appear as directories
+
* Whereas a ZFS snapshot is "hidden" in a sub-directory (named .zfs), BTRFS snapshots appear as visible directories
+
* While ZFS manages rollback in a transparent manner (the filesystem knows where and how to rollback the data), rolling back data in BTRFS requires a bit more work as the system administrator must umount/remount a BTRFS subvolume.
+
* ZFS has a kind of sophisticated RAID-5 called RAID-Z (and now RAID-Z2 ~ RAID-6); similar capabilities are planned for BTRFS but are not yet available as of February 2014 because some work has to be done on parity logging (see [http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg30103.html]).
+
* Both support the concept of sending and receiving a snapshot of a dataset/subvolume; however, those are not compatible (a BTRFS snapshot cannot be restored as a ZFS dataset and vice-versa)
+
* Whereas ZFS makes aggressive use of properties to govern the behaviour of the different datasets (quotas, sharing over NFS, encryption, compression and so on), BTRFS does not use this notion, or only in a much lighter manner and only through the ''mount'' command.
+
* '''ZFS has no journal (!)''', this is not a design flaw but an interesting intrinsic feature :) See page 7 of [http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/zfslast.pdf ''"ZFS The last word on filesystems"''].
+
* While BTRFS subvolume data can be defragmented, ZFS, even in Solaris/OpenIndiana, has no such capability. On the other hand, ZFS has deduplication capabilities. Deduplication is possible with BTRFS, but not with its "stock" userland software: a third party tool known as [https://github.com/g2p/bedup bedup] is required.
+
 
+
= ZFS resource naming restrictions =
+
 
+
Before going further, you must be aware of restrictions concerning the names you can use on a ZFS filesystem. The general rule is: you can use all of the alphanumeric characters plus the following special characters:
+
* Underscore (_)
+
* Hyphen (-)
+
* Colon (:)
+
* Period (.)
+
 
+
The name used to designate a ZFS pool has no particular restriction except:
+
* it can't use one of the reserved words, in particular:
+
** ''mirror''
+
** ''raidz'' (''raidz2'', ''raidz3'' and so on)
+
** ''spare''
+
** ''cache''
+
** ''log''
+
* names must begin with an alphanumeric character (same for ZFS datasets).
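
The naming rules listed above can be expressed as a small check. The following is only a sketch of the rules as stated in this section, not a reimplementation of what the real ZFS tools enforce (they may apply further restrictions, such as length limits). The function name <tt>is_valid_pool_name</tt> is hypothetical, and rejecting any name that starts with ''raidz'' is an assumption made to cover ''raidz2'', ''raidz3'' and so on.

```python
import re

# Reserved words listed in the text above; the real tools may reject more.
RESERVED_POOL_WORDS = ("mirror", "raidz", "spare", "cache", "log")

# First character alphanumeric, then alphanumerics plus _ - : .
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_\-:.]*")


def is_valid_pool_name(name):
    """Return True if `name` satisfies the rules stated in this section."""
    if _NAME_RE.fullmatch(name) is None:
        return False
    # Assumption: "raidz" is tested as a prefix so raidz2, raidz3... are caught.
    if name.startswith("raidz"):
        return False
    # Assumption: the other reserved words are only rejected on an exact match.
    return name not in RESERVED_POOL_WORDS
```

With these rules, <tt>myfirstpool</tt> is accepted, whereas <tt>mirror</tt>, <tt>raidz2</tt> and <tt>_hidden</tt> are rejected.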
+
 
+
= ZFS concepts =
+
 
+
 
+
 
+
= Playing with ZFS  =
+
== Requirements ==
+
* ZFS userland tools installed (package ''sys-fs/zfs'')
+
* ZFS kernel modules built and installed (package ''sys-fs/zfs-kmod''), there is a known issue with kernel 3.13 series see [http://forums.funtoo.org/viewtopic.php?id=2442 this thread on Funtoo's forum]
+
* Disk size of 64 Mbytes as a bare minimum (128 Mbytes is the minimum size of a pool). Multiple disks will be simulated through the use of several raw images accessed via the Linux loopback devices.
+
* At least 512 MB of RAM
+
 
+
== Preparing ==
+
Once you have emerged ''sys-fs/zfs'' and ''sys-fs/zfs-kmod'', the latter being brought in by dependency, launch the startup script ''/etc/init.d/zfs'':
+
<pre># rc-service zfs start</pre>
+
 
+
This will load all required kernel modules (zfs, spl, zunicode...) and will mount all known ZFS datasets, provided their ''canmount'' attribute is not set to ''noauto''. At this stage of the tutorial, you can just manually load the kernel module ''zfs'', all others being loaded by dependency for you:
+
 
+
<pre># modprobe zfs
+
# lsmod | grep -E 'zfs|spl'
+
zfs                  874072  0
+
zunicode              328120  1 zfs
+
zavl                  12997  1 zfs
+
zcommon                35739  1 zfs
+
znvpair                48570  2 zfs,zcommon
+
spl                    58011  5 zfs,zavl,zunicode,zcommon,znvpair
+
</pre>
+
 
+
== Your first ZFS pool ==
+
 
+
To start with, four raw disks (2 GB each) are created:
+
  
 
<pre>
 
<pre>
# for i in 0 1 2 3; do dd if=/dev/zero of=/tmp/zfs-test-disk0${i}.img bs=2G count=1; done
+
$ cd
0+1 records in
+
$ mkdir zope_test
0+1 records out
+
2147479552 bytes (2.1 GB) copied, 40.3722 s, 53.2 MB/s
+
...
+
 
</pre>
 
</pre>
  
Then let's see what loopback devices are in use and which is the first free:
+
Now, enter the directory, and create an "instance", which is a set of files and directories that are used to contain a Zope project:
  
 
<pre>
 
<pre>
# losetup -a
+
$ cd zope_test
# losetup -f
+
$ /usr/lib/zope-2.13/bin/mkzopeinstance
/dev/loop0
+
 
</pre>
 
</pre>
  
In the above example nothing is used and the first available loopback device is /dev/loop0. Now associate each of the disk images with a loopback device (/tmp/zfs-test-disk00.img -> /dev/loop0, /tmp/zfs-test-disk01.img -> /dev/loop1 and so on):
+
You will see the following output, and will be prompted to answer a few questions:
  
 
<pre>
 
<pre>
# for i in 0 1 2 3; do losetup /dev/loop${i} /tmp/zfs-test-disk0${i}.img; done
+
Please choose a directory in which you'd like to install
# losetup -a
+
Zope "instance home" files such as database files, configuration
/dev/loop0: [000c]:781455 (/tmp/zfs-test-disk00.img)
+
files, etc.
/dev/loop1: [000c]:806903 (/tmp/zfs-test-disk01.img)
+
/dev/loop2: [000c]:807274 (/tmp/zfs-test-disk02.img)
+
/dev/loop3: [000c]:781298 (/tmp/zfs-test-disk03.img)
+
</pre>
+
  
=== Pool creation ===
+
Directory: instance
 +
Please choose a username and password for the initial user.
 +
These will be the credentials you use to initially manage
 +
your new Zope instance.
  
It is now time to create our first ZFS data pool, and this is accomplished by one of the two commands you have to remember: '''zpool'''. For now, we will ask it to do a simple job: take all of the devices we just created and build an aggregated pool:
+
Username: admin
 +
Password: ****
 +
Verify password: ****
  
<pre>
 
# zpool create myfirstpool /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
 
# mount
 
...
 
kstat on /zfs-kstat type fuse (rw,nosuid,nodev,allow_other)
 
myfirstpool on /myfirstpool type fuse (rw,allow_other,default_permissions)
 
 
</pre>
 
</pre>
  
Note that the pool has also been mounted on /myfirstpool! Forget kstat for now; it is mounted automatically by zfs-fuse and contains some performance statistics. Oh, by the way, we have used block devices (loopback devices are block devices) to create our ZFS pool; however, ZFS can also deal directly with files, and the taxonomy used in the ZFS world retains the term '''vdev''' (virtual device). Let's be a bit curious and see what df reports:
+
Now, we will start our Zope instance:
  
 
<pre>
 
<pre>
# df -h
+
$ cd instance
myfirstpool                          7.9G  21K  7.9G  1% /myfirstpool
+
$ bin/runzope
 
</pre>
 
</pre>
  
Cool! About 8GB are reported, which is roughly the sum of our four ''vdevs'' minus some metadata. What can we do with 8 GB of free storage space? Copy some files in it of course!
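
The "about 8 GB" figure can be checked by hand. The sketch below simply sums the four image sizes reported by '''dd''' earlier (2147479552 bytes each) and converts the total to GiB. The helper name <tt>raw_pool_size_gib</tt> is hypothetical, and the calculation assumes a plain aggregated pool with no redundancy, as created in this tutorial.

```python
# Size of one image as reported by dd in the transcript above.
BYTES_PER_IMAGE = 2147479552
IMAGE_COUNT = 4
GIB = 1024 ** 3


def raw_pool_size_gib(bytes_per_vdev, vdev_count):
    """Upper bound on an aggregated pool's size: the plain sum of its vdevs."""
    return bytes_per_vdev * vdev_count / float(GIB)


# Prints the raw capacity before any ZFS metadata is subtracted.
print("raw size: %.2f GiB" % raw_pool_size_gib(BYTES_PER_IMAGE, IMAGE_COUNT))
```

The raw sum is just under 8 GiB; the gap between that value and the 7.94G that '''zpool list''' reports further down is the space ZFS keeps for itself.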
+
Now that Zope is running, you can visit <tt>localhost:8080</tt> in your Web browser. You will see a nice introductory page to Zope.
  
=== Some file operations ===
+
If you now go to the <tt>localhost:8080/manage</tt> URL, you will be prompted to log in. Enter the username and password you specified. You are now logged in to the ZMI (Zope Management Interface.)
  
<pre>
+
You can stop your application by pressing Control-C. In the future, you can start and stop your Zope instance using the following commands:
# cp -a  /usr/src/linux-3.1-rc4 /myfirstpool
+
# df -h
+
myfirstpool                          7.9G  662M  7.2G  9% /myfirstpool
+
# cd /myfirstpool/linux-3.1-rc4
+
# ls -l /myfirstpool
+
total 3
+
drwxrwxr-x 24 root root 56 Aug 29 08:41 linux-3.1-rc4
+
# ls -l /myfirstpool/linux-3.1-rc4
+
total 29
+
-rw-rw-r--  1 root root    18693 Aug 29 00:16 COPYING
+
-rw-rw-r--  1 root root    94790 Aug 29 00:16 CREDITS
+
drwxrwxr-x 94 root root      222 Aug 29 00:16 Documentation
+
-rw-rw-r--  1 root root    2464 Aug 29 00:16 Kbuild
+
-rw-rw-r--  1 root root      252 Aug 29 00:16 Kconfig
+
-rw-rw-r--  1 root root  200918 Aug 29 00:16 MAINTAINERS
+
-rw-rw-r--  1 root root    53537 Aug 29 00:16 Makefile
+
-rw-r--r--  1 root root  364907 Aug 29 08:41 Module.symvers
+
-rw-rw-r--  1 root root    17459 Aug 29 00:16 README
+
....
+
drwxrwxr-x 22 root root      41 Aug 29 08:41 sound
+
drwxrwxr-x  9 root root        9 Aug 29 00:16 tools
+
drwxrwxr-x  2 root root      11 Aug 29 08:38 usr
+
drwxrwxr-x  3 root root        3 Aug 29 00:16 virt
+
-rwxr-xr-x  1 root root 13126551 Aug 29 08:41 vmlinux
+
-rw-r--r--  1 root root 14771911 Aug 29 08:41 vmlinux.o
+
# make clean
+
# df -h
+
Filesystem                          Size  Used Avail Use% Mounted on
+
...
+
myfirstpool                          7.9G  444M  7.4G  6% /myfirstpool
+
</pre>
+
 
+
In fact nothing magic, a ZFS pool is acting just like any other existing filesystem :)
+
 
+
=== Unmounting/remounting the pool ===
+
 
+
If ZFS behaves just like any other filesystem, can we unmount it?
+
  
 
<pre>
 
<pre>
# umount /myfirstpool
+
$ zopectl start
# mount | grep myfirstpool
+
$ zopectl stop
#
+
 
</pre>
 
</pre>
  
No more /myfirstpool in our line of sight. So yes, it is possible to unmount a ZFS pool just like any other filesystem. But... how can we remount it then? Simple! First check the list of all ZFS pools known by the system:
+
<tt>zopectl start</tt> will cause your instance to run in the background rather than consuming a shell console.
  
<pre>
+
== First Project ==
# zpool list
+
NAME          SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
+
myfirstpool  7.94G  444M  7.50G    5%  1.00x  ONLINE  -
+
</pre>
+
  
Then mount it again:
+
We will create a single very primitive Zope package, consisting of an Interface for a TODO class, and a TODO class.
  
<pre>
+
Create the following files and directories relative to your project root:
# zpool list
+
NAME          SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
+
myfirstpool  7.94G  444M  7.50G    5%  1.00x  ONLINE  -
+
# zfs mount myfirstpool
+
</pre>
+
  
Oh! Did you notice? We used the '''zfs''' command instead of the '''zpool''' command. You will understand the reason for using '''zfs''' instead of '''zpool''' a bit later; for now just remember that '''zfs''' and '''zpool''' are the only two commands used to interact with the ZFS universe. Also note that '''zfs mount...''' is the one and only way to remount a ZFS pool in the VFS tree, so you can't get confused or make mistakes.
+
* Create the directory <tt>lib/python/example</tt>.
 +
* Create the file <tt>lib/python/example/__init__.py</tt> by typing <tt>touch lib/python/example/__init__.py</tt>.
 +
* Create these files:
  
{{fancynote|The missing leading / ahead of myfirstpool '''is not a typo'''. When a pool is created, ZFS writes in the pool metadata where it must be mounted. Unless overridden, it is assumed that the pool is to be mounted directly under the VFS root in a mountpoint which has the same name as the pool.}}
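
The default rule described in the note above is simple enough to write down. This is a minimal sketch, assuming the mountpoint has not been overridden; the function name <tt>default_mountpoint</tt> is hypothetical and nothing here queries a real pool.

```python
def default_mountpoint(zfs_name):
    """Default mountpoint for a ZFS name such as 'myfirstpool'.

    Assumes no mountpoint override: the name is mounted under the VFS root.
    """
    if zfs_name.startswith("/"):
        # ZFS names are not paths, which is why the leading / is absent.
        raise ValueError("ZFS names have no leading slash: %r" % zfs_name)
    return "/" + zfs_name
```

For example, <tt>default_mountpoint("myfirstpool")</tt> gives <tt>/myfirstpool</tt>, which matches what '''mount''' reports in this tutorial.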
+
=== <tt>etc/package-includes/example-configure.zcml</tt> ===
  
Let's check what happened:
+
This file registers the <tt>example</tt> directory you created in <tt>lib/python</tt> as a ''package'', so that it is seen by Zope:
  
 
<pre>
 
<pre>
# mount | grep myfirstpool
+
<include package="example" />
myfirstpool on /myfirstpool type fuse (rw,allow_other,default_permissions)
+
# ls -l /myfirstpool
+
total 3
+
drwxrwxr-x 23 root root 33 Sep  4 18:18 linux-3.1-rc4
+
 
</pre>
 
</pre>
  
Everything is back again!
+
=== <tt>lib/python/example/interfaces.py</tt> ===
  
== ZFS datasets ==
+
The following file defines the <tt>ITODO</tt> interface, and also uses some Zope Schema functions to define what kind of data we expect to store in objects that implement <tt>ITODO</tt>:
 
+
Just like your house is a kind of big container subdivided into many other containers (rooms), a ZFS pool can be divided into several logical containers known as ''datasets''. Basically, the role of a dataset is to fulfill the well-known adage ''divide and conquer'', as datasets define the boundaries within which all ZFS operations take place: it is '''only''' possible, for example, to take a snapshot or do a rollback of a dataset '''taken as a whole'''.
+
 
+
=== Creating and destroying datasets ===
+
 
+
Creating a dataset in a pool is pretty easy to achieve: you invoke the '''zfs''' command and give it the name of the pool to divide and the name of the dataset to create. To create three datasets named ''myfirstDS, mysecondDS, mythirdDS'' in ''myfirstpool'' (again, the missing / ahead of ''myfirstpool'' is '''not''' a typo):
+
  
 
<pre>
 
<pre>
# zfs create myfirstpool/myfirstDS
+
from zope.interface import Interface
# zfs create myfirstpool/mysecondDS
+
from zope.schema import List, Text, TextLine, Int
# zfs create myfirstpool/mythirdDS
+
# ls -l /myfirstpool
+
total 7
+
drwxrwxr-x 23 root root 33 Sep  4 18:18 linux-3.1-rc4
+
drwxr-xr-x  2 root root  2 Sep  4 23:34 myfirstDS
+
drwxr-xr-x  2 root root  2 Sep  4 23:34 mysecondDS
+
drwxr-xr-x  2 root root  2 Sep  4 23:34 mythirdDS
+
</pre>
+
  
Datasets are appearing just as if they were regular directories. Are they? Try to remove one of those:
+
class ITODO(Interface):
 
+
    name = TextLine(title=u'Name', required=True)
<pre>
+
    todo = List(title=u"TODO Items", required=True, value_type=TextLine(title=u'TODO'))
# rmdir /myfirstpool/myfirstDS
+
    daysleft = Int(title=u'Days left to complete', required=True)
rmdir: failed to remove `/myfirstpool/myfirstDS': Device or resource busy
+
    description = Text(title=u'Description', required=True)
 
</pre>
 
</pre>
  
This behavior is absolutely normal: datasets are special entities and must be managed via ZFS commands. The trouble is: how can a regular directory with files opened by a running process be distinguished from a ZFS dataset? Both look similar! Here again, the '''zfs''' command comes to our rescue:
+
=== <tt>lib/python/example/TODO.py</tt> ===
  
<pre>
+
Now, we define <tt>TODO</tt> to be a ''persistent'' object, meaning it can be stored in the ZODB. We specify that it implements our previously-defined <tt>ITODO</tt> interface, and provide reasonable defaults for all values when we create a new TODO object:
# zfs list
+
NAME                    USED  AVAIL  REFER  MOUNTPOINT
+
myfirstpool              444M  7.38G  444M  /myfirstpool
+
myfirstpool/myfirstDS    21K  7.38G    21K  /myfirstpool/myfirstDS
+
myfirstpool/mysecondDS    21K  7.38G    21K  /myfirstpool/mysecondDS
+
myfirstpool/mythirdDS    21K  7.38G    21K  /myfirstpool/mythirdDS
+
</pre>
+
 
+
 
+
It is not obvious, but '''zfs list''' also reveals a great secret: '''we lied to you''' in the previous paragraphs. It is not possible to mount a ZFS pool in the VFS tree, as '''only''' datasets can be mounted. So where is the trick? Our ''myfirstpool'' had been mounted in the VFS and you never defined any datasets in it. How is that possible? Is there some ZFS black magic lying behind? No. When you created the ZFS pool ''myfirstpool'', a special dataset was also created in the pool automatically for you: the ''root dataset''. When you typed '''zfs mount myfirstpool''', you in fact interacted with this root dataset and not with the pool itself. The operation was transparent for you and you never noticed its presence, although using the zfs command instead of zpool could have given you a hint about what lies under the hood. You can see that root dataset in the first line of what zfs list reported in the example above.
+
 
+
So the root dataset (myfirstpool) is mounted on /myfirstpool, myfirstDS is then mounted inside (/myfirstpool/myfirstDS) ditto for mysecondDS and mythirdDS. ''Mounted'' is the exact term because if we have a look at what the '''mount''' command reports we can see that those datasets have been '''''effectively''''' mounted:
+
  
 
<pre>
 
<pre>
# mount
+
from persistent import Persistent
rootfs on / type rootfs (rw)
+
from zope.interface import implements
...
+
from example.interfaces import ITODO
myfirstpool on /myfirstpool type fuse (rw,allow_other,default_permissions)
+
myfirstpool/myfirstDS on /myfirstpool/myfirstDS type fuse (rw,allow_other,default_permissions)
+
myfirstpool/mysecondDS on /myfirstpool/mysecondDS type fuse (rw,allow_other,default_permissions)
+
myfirstpool/mythirdDS on /myfirstpool/mythirdDS type fuse (rw,allow_other,default_permissions)
+
</pre>
+
  
As we did before, we can copy some files in the newly created datasets just like they were regular directories:
+
class TODO(Persistent):
 
+
    implements(ITODO)
<pre>
+
    name = u''
# cp -a /usr/portage /myfirstpool/mythirdDS
+
    todo = []
# ls -l /myfirstpool/mythirdDS/*
+
    daysleft = 0
total 438
+
    description = u''
drwxr-xr-x  45 root root      46 Aug 31 07:37 app-accessibility
+
drwxr-xr-x  202 root root     203 Sep  2 07:21 app-admin
+
drwxr-xr-x    3 root root      4 Aug 18 18:13 app-antivirus
+
drwxr-xr-x  93 root root      94 Aug 18 18:13 app-arch
+
drwxr-xr-x  38 root root      39 Aug 18 18:13 app-backup
+
drwxr-xr-x  30 root root      31 Aug 18 18:13 app-benchmarks
+
drwxr-xr-x  66 root root      67 Aug 18 18:13 app-cdr
+
drwxr-xr-x  96 root root      97 Aug 18 18:13 app-crypt
+
drwxr-xr-x  358 root root     359 Aug 18 18:13 app-dicts
+
...
+
# df -h | grep DS               
+
myfirstpool/myfirstDS                5.6G  21K  5.6G  1% /myfirstpool/myfirstDS
+
myfirstpool/mysecondDS              5.6G  21K  5.6G  1% /myfirstpool/mysecondDS
+
myfirstpool/mythirdDS                7.4G  1.9G  5.6G  25% /myfirstpool/mythirdDS
+
 
</pre>
 
</pre>
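
One point about the <tt>TODO</tt> class defined above deserves a short illustration: <tt>todo = []</tt> is a class-level default, and in plain Python a class-level list is a single object shared by every instance until an instance rebinds the attribute. The sketch below uses two hypothetical plain classes (<tt>SharedTodo</tt> and <tt>OwnTodo</tt>) with no Zope imports, so it only demonstrates the Python behavior and makes no claim about how <tt>Persistent</tt> objects are stored.

```python
class SharedTodo(object):
    todo = []  # one list object, shared by every instance


class OwnTodo(object):
    def __init__(self):
        self.todo = []  # a fresh list for each instance


a, b = SharedTodo(), SharedTodo()
a.todo.append(u"write docs")
shared = b.todo  # same list as a.todo, so the item shows up here too

c, d = OwnTodo(), OwnTodo()
c.todo.append(u"write docs")
own = d.todo  # d has its own list, which is still empty
```

Assigning a new list to the attribute (for example <tt>obj.todo = [u"item"]</tt>) rather than appending to the default avoids the sharing.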
  
Notice what '''df''' returns: our four datasets (don't forget the root dataset!) share the same storage capacity. Logical indeed: as they are all contained in the same pool, they cannot exceed its storage capacity. Is it possible to cap the maximum capacity of a dataset? Yes, but for now just remember that datasets:
+
=== <tt>lib/python/example/configure.zcml</tt> ===
# are logical containers where ZFS operations take place
+
# are affected as a whole by ZFS operations (again: you cannot snapshot/rollback a particular directory located in a dataset, '''you can only operate at the dataset level''')
+
  
We have three datasets, but the third is pretty useless and contains a lot of garbage. Is it possible to remove it with a simple '''rm -rf'''? Let's try:
+
Create an empty <tt>configure.zcml</tt> configuration file:
  
 
<pre>
 
<pre>
# rm -rf /myfirstpool/mythirdDS
+
<configure xmlns="http://namespaces.zope.org/zope"
rm: cannot remove `/myfirstpool/mythirdDS': Device or resource busy
+
    xmlns:five="http://namespaces.zope.org/five"
 +
    xmlns:browser="http://namespaces.zope.org/browser">
 +
</configure>
 
</pre>
 
</pre>
  
This is perfectly normal: remember that datasets are special entities that require special care and cannot be deleted with regular shell commands. It is, however, possible to destroy them, and here again the '''zfs''' command comes to our rescue:
+
== Debug Mode ==
  
<pre>
+
We can test our first project by entering debug mode:
# zfs destroy myfirstpool/mythirdDS
+
# zfs list
+
NAME                    USED  AVAIL  REFER  MOUNTPOINT
+
myfirstpool              444M  7.38G  444M  /myfirstpool
+
myfirstpool/myfirstDS    21K  7.38G    21K  /myfirstpool/myfirstDS
+
myfirstpool/mysecondDS    21K  7.38G    21K  /myfirstpool/mysecondDS
+
</pre>
+
 
+
''Et voila''! No more third dataset. :)
+
 
+
A slightly more subtle case: let's recreate ''mythirdDS'', put a nested dataset in it, then try to destroy ''mythirdDS'' again:
+
  
 
<pre>
 
<pre>
# zfs create myfirstpool/mythirdDS
+
$ bin/zopectl debug
# zfs create myfirstpool/mythirdDS/nestedDS
+
Starting debugger (the name "app" is bound to the top-level Zope object)
# zfs list
+
NAME                            USED  AVAIL  REFER  MOUNTPOINT
+
myfirstpool                      444M  7.38G  444M  /myfirstpool
+
myfirstpool/myfirstDS            21K  7.38G    21K  /myfirstpool/myfirstDS
+
myfirstpool/mysecondDS            21K  7.38G    21K  /myfirstpool/mysecondDS
+
myfirstpool/mythirdDS            42K  7.38G    21K  /myfirstpool/mythirdDS
+
myfirstpool/mythirdDS/nestedDS    21K  7.38G    21K  /myfirstpool/mythirdDS/nestedDS
+
# zfs destroy myfirstpool/mythirdDS
+
cannot destroy 'myfirstpool/mythirdDS': filesystem has children
+
use '-r' to destroy the following datasets:
+
myfirstpool/mythirdDS/nestedDS
+
 
</pre>
 
</pre>
  
'''zfs''' tells us that it has found other datasets located in ''mythirdDS'' and is therefore unable to delete it unless you consent to a recursive destruction (-r parameter). Before trying to destroy the dataset again, let's create some more nested datasets plus a couple of directories inside ''mythirdDS'':
+
Now, let's try creating a new TODO object and writing it out to a ZODB database:
  
 
<pre>
 
<pre>
# zfs create myfirstpool/mythirdDS/nestedSD
+
>>> from ZODB import FileStorage, DB
# zfs create myfirstpool/mythirdDS/nestedSD2
+
>>> storage = FileStorage.FileStorage('mydatabase.fs')
# zfs create myfirstpool/mythirdDS/nestedSD3
+
>>> db = DB(storage)
# mkdir /myfirstpool/mythirdDS/dir1
+
>>> connection = db.open()
# mkdir /myfirstpool/mythirdDS/dir2
+
>>> import transaction
# mkdir /myfirstpool/mythirdDS/dir3
+
>>> root = connection.root()
# zfs list
+
>>> from example.TODO import TODO
NAME                                USED  AVAIL  REFER  MOUNTPOINT
+
>>> a = TODO()
myfirstpool                        444M  7.38G  444M  /myfirstpool
+
>>> a.name = u'My TODOs'
myfirstpool/myfirstDS                21K  7.38G    21K  /myfirstpool/myfirstDS
+
>>> a.todo = [ u'Do Laundry', u'Wash Dishes' ]
myfirstpool/mysecondDS              21K  7.38G    21K  /myfirstpool/mysecondDS
+
>>> a.daysleft = 1
myfirstpool/mythirdDS                84K  7.38G    21K  /myfirstpool/mythirdDS
+
>>> a.description = u'Things I need to do today.'
myfirstpool/mythirdDS/mynestedDS    21K  7.38G    21K  /myfirstpool/mythirdDS/mynestedDS
+
>>> root[u'today'] = a
myfirstpool/mythirdDS/mynestedDS2    21K  7.38G    21K  /myfirstpool/mythirdDS/mynestedDS2
+
>>> transaction.commit()
myfirstpool/mythirdDS/mynestedDS3    21K  7.38G    21K  /myfirstpool/mythirdDS/mynestedDS3
+
# zfs destroy -r myfirstpool/mythirdDS
+
 
</pre>
 
</pre>
  
Now what happens if we try to destroy ''mythirdDS'' again, this time with '-r'?
+
[[Category:HOWTO]]
 
+
[[Category:Python]]
<pre>
+
[[Category:Web]]
# zfs destroy -r myfirstpool/mythirdDS       
+
[[Category:Zope]]
cannot destroy 'myfirstpool/mythirdDS/mynestedDS': dataset is busy
+
[[Category:Developer]]
+
[[Category:Featured]]
</pre>
+
 
+
This is not quite the expected behavior and seems to be a bug in zfs-fuse: '''zfs''' should automatically unmount any dataset contained inside ''mythirdDS'', then destroy them all, including ''mythirdDS'' itself. The same kind of operation on a Solaris machine with a similar dataset structure gives:
+
 
+
<pre>
+
# zfs list
+
NAME                              USED  AVAIL  REFER  MOUNTPOINT
+
....
+
rpool1/swap                      4.04G  23.2G  123M  -
+
testpool/test                    55.4K  3.76T  55.4K  /testpool/test
+
testpool/test/ds1                44.9K  3.76T  44.9K  /testpool/test/ds1
+
testpool/test/ds2                44.9K  3.76T  44.9K  /testpool/test/ds2
+
testpool/test/ds3                44.9K  3.76T  44.9K  /testpool/test/ds3
+
testpool/test2                  44.9K  3.76T  44.9K  /testpool/test2
+
# mkdir /testpool/test/dir1
+
# mkdir /testpool/test/dir2
+
# mkdir /testpool/test/dir3
+
# zfs destroy -r testpool/test
+
# zfs list
+
NAME                              USED  AVAIL  REFER  MOUNTPOINT
+
....
+
rpool1/swap                      4.04G  23.2G  123M  -
+
testpool/test2                  44.9K  3.76T  44.9K  /testpool/test2
+
</pre>
+
 
+
Back on ZFS Fuse, just retry a few times and ''mythirdDS'' should vanish (you may also have to run an explicit '''zfs destroy myfirstpool/mythirdDS''' at the end).
+
 
+
=== Snapshotting and rolling back a dataset ===
+
 
+
This is, by far, one of the coolest features of ZFS: you can literally take a photograph of a dataset, do whatever you want with the dataset, then restore it to the '''exact''' same state, just as if nothing had ever happened in between. To start with, let's copy some files into ''mysecondDS'':
+
 
+
<pre>
+
# cp -a /usr/portage /myfirstpool/mysecondDS
+
# ls -l /myfirstpool/mysecondDS/portage
+
total 200
+
drwxr-xr-x  45 root root      46 Aug 31 07:37 app-accessibility
+
drwxr-xr-x  202 root root    203 Sep  2 07:21 app-admin
+
drwxr-xr-x    3 root root      4 Aug 18 18:13 app-antivirus
+
drwxr-xr-x  93 root root      94 Aug 18 18:13 app-arch
+
...
+
drwxr-xr-x  57 root root      58 Aug 22 08:56 x11-wm
+
drwxr-xr-x  16 root root      17 Aug 18 18:13 xfce-base
+
drwxr-xr-x  54 root root      55 Aug 18 18:13 xfce-extra
+
</pre>
+
 
+
Now, let's take a snapshot of ''mysecondDS''. Because we manipulate a dataset and not the pool, we rely on the '''zfs''' command:
+
 
+
<pre>
+
# zfs snapshot myfirstpool/mysecondDS@Charlie
+
</pre>
+
 
+
{{fancynote|The syntax is always ''pool/dataset@snapshot-name''. The name of the snapshot is left to your discretion; however, '''you must use an at sign (@)''' to separate the snapshot name from the rest of the path.}}
+
 
+
After running that command, let's look at the dataset contents:
+
 
+
<pre>
+
# ls -la /myfirstpool/mysecondDS
+
total 9
+
drwxr-xr-x  3 root root  3 Sep  5 16:49 .
+
drwxr-xr-x  6 root root  6 Sep  5 15:43 ..
+
drwxr-xr-x 164 root root 169 Aug 18 18:25 portage
+
</pre>
+
 
+
You were not thinking you would see something like ''@Charlie'' or ''Charlie'' lying in /myfirstpool/mysecondDS were you? Of course not, this is obvious ;-) Can '''zfs''' be of any help this time? It has rescued us several times in the past:
+
 
+
<pre>
+
# zfs list
+
NAME                              USED  AVAIL  REFER  MOUNTPOINT
+
myfirstpool                      2.27G  5.54G  444M  /myfirstpool
+
myfirstpool/myfirstDS              21K  5.54G    21K  /myfirstpool/myfirstDS
+
myfirstpool/mysecondDS            1.84G  5.54G  1.84G  /myfirstpool/mysecondDS
+
#
+
</pre>
+
 
+
''So where the heck'' is Charlie? And how on earth can we use it if '''*nothing*''' is visible to us? Again, the answer is '''zfs'''! This time we invoke it with the -t parameter set to 'all', meaning "list all datasets '''including snapshots'''":
+
 
+
<pre>
+
# zfs list -t all
+
NAME                              USED  AVAIL  REFER  MOUNTPOINT
+
myfirstpool                      2.27G  5.54G  444M  /myfirstpool
+
myfirstpool/myfirstDS              21K  5.54G    21K  /myfirstpool/myfirstDS
+
myfirstpool/mysecondDS            1.84G  5.54G  1.84G  /myfirstpool/mysecondDS
+
myfirstpool/mysecondDS@Charlie      37K      -  1.84G  -
+
#
+
</pre>
+
 
+
Notice that ''Charlie'' is not mounted and, although ''mysecondDS'' holds nearly 2 GB of data, ''Charlie'' takes only a few tens of kilobytes in the dataset. This is a consequence of ZFS being a copy-on-write filesystem: duplicating all of the data blocks is not required. When a data block is about to change, ZFS writes the new version elsewhere, leaving intact the original block referenced by the snapshot. At the time they are taken, snapshots occupy very little space in the dataset; however, as time goes on, they keep more and more old data blocks in use. It is wise to delete snapshots once they are no longer needed.
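
The copy-on-write behaviour described above can be illustrated with a small toy model. This is only a sketch for intuition, not how ZFS is implemented: the <tt>ToyDataset</tt> class and its method names are invented for this example, and "blocks" are simply dictionary entries.

```python
class ToyDataset:
    """Toy copy-on-write dataset (hypothetical model, not real ZFS)."""

    def __init__(self):
        self.live = {}       # block name -> content currently visible
        self.snapshots = {}  # snapshot name -> frozen {block name: content}

    def write(self, name, content):
        # Only the live mapping changes; snapshot references stay untouched.
        self.live[name] = content

    def delete(self, name):
        self.live.pop(name, None)

    def snapshot(self, snap):
        # A snapshot copies references only, never the data itself.
        self.snapshots[snap] = dict(self.live)

    def held_only_by(self, snap):
        """Blocks that still exist only because the snapshot references them."""
        frozen = self.snapshots[snap]
        return {n for n, c in frozen.items() if self.live.get(n) != c}

    def rollback(self, snap):
        # The live view becomes the frozen view again.
        self.live = dict(self.snapshots[snap])


ds = ToyDataset()
ds.write("portage", "1.84G of ebuilds")
ds.snapshot("Charlie")
print(ds.held_only_by("Charlie"))   # set(): the snapshot costs nothing yet

ds.delete("portage")
ds.write("hello.txt", "Hello, world")
print(ds.held_only_by("Charlie"))   # {'portage'}: the snapshot now pins old data

ds.rollback("Charlie")
print(ds.live)                      # {'portage': '1.84G of ebuilds'}
```

The model shows why a fresh snapshot is almost free and why it "grows" only as the live dataset diverges from it.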
+
 
+
{{fancynote|'''OpenIndiana''' and '''Oracle Solaris''' support an interesting feature not available in ZFS Fuse: a kind of secret door in the form of a virtual directory named ''.zfs'' (notice the leading dot). It really is secret: you cannot see it ''even'' with '''ls -la'''. However, ''.zfs'' is present in every one of your datasets and holds some very interesting clues:
+
 
+
<pre>
+
# zfs list -t all
+
...
+
testpool/test2                    205K  3.76T  70.3K  /testpool/test2
+
testpool/test2@snap1                  0      -  70.3K  -
+
# cd /testpool/test2
+
# ls -la
+
total 22
+
drwxr-xr-x  11 root root 11 2011-09-05 17:34 .
+
drwxr-xr-x  6 root root  6 2011-09-05 16:13 ..
+
drwxr-xr-x  2 root root  2 2011-09-05 17:34 .sometest
+
drwxr-xr-x  2 root root  2 2011-09-05 17:34 .xyz
+
drwxr-xr-x  2 root root  2 2011-09-05 16:13 dir1
+
drwxr-xr-x  2 root root  2 2011-09-05 16:13 dir2
+
...
+
# cd /testpool/test2/.zfs
+
# pwd
+
/testpool/test2/.zfs
+
# ls -l
+
total 2
+
dr-xr-xr-x 2 root root 2 2011-09-05 16:13 shares
+
dr-xr-xr-x 3 root root 3 2011-09-05 17:19 snapshot
+
# cd snapshot
+
# ls -l
+
total 2
+
drwxr-xr-x 9 root root 9 2011-09-05 17:19 snap1
+
# cd snap1
+
# ls -l
+
total 22
+
drwxr-xr-x  11 root root 11 2011-09-05 17:34 .
+
drwxr-xr-x  6 root root  6 2011-09-05 16:13 ..
+
drwxr-xr-x  2 root root  2 2011-09-05 17:34 .sometest
+
drwxr-xr-x  2 root root  2 2011-09-05 17:34 .xyz
+
drwxr-xr-x  2 root root  2 2011-09-05 16:13 dir1
+
drwxr-xr-x  2 root root  2 2011-09-05 16:13 dir2
+
...
+
</pre>
+
 
+
Although you cannot change the contents of a snapshot, you can examine them without having to roll back. An extremely nifty design choice from the ZFS designers!
+
}}
+
 
+
Now that we have found Charlie, let's make some changes in ''mysecondDS'':
+
 
+
<pre>
+
# rm -rf /myfirstpool/mysecondDS/portage
+
# echo "Hello, world" >  /myfirstpool/mysecondDS/hello.txt
+
# ls -l  /myfirstpool/mysecondDS
+
total 1
+
-rw-r--r-- 1 root root 13 Sep  5 18:07 hello.txt
+
# cat /myfirstpool/mysecondDS/hello.txt
+
Hello, world
+
</pre>
+
 
+
Whoops... removing portage was not the best idea, and we do not care about hello.txt. We will have to move back to checkpoint Charlie!
+
 
+
<pre>
+
# zfs rollback myfirstpool/mysecondDS@Charlie
+
# ls -l /myfirstpool/mysecondDS
+
total 6
+
drwxr-xr-x 164 root root 169 Aug 18 18:25 portage
+
</pre>
+
 
+
Again, ZFS handled everything for you and you now have the contents of ''mysecondDS'' exactly as they were at the time the snapshot ''Charlie'' was taken. It is no more complicated than that. Hang on to your hat, we have not finished.
+
 
+
=== Dealing with several snapshots (time-traveling machine) ===
+
 
+
So far we have used only a single snapshot, just to keep things simple. However, a dataset can hold several snapshots, and you can even compute a delta between two snapshots. None of this is much more complicated than what you have seen so far.
+
 
+
Let's consider myfirstDS this time. This dataset should be empty as we did nothing in it so far:
+
 
+
<pre>
+
# ls -la /myfirstpool/myfirstDS
+
total 3
+
drwxr-xr-x 2 root root 2 Sep  4 23:34 .
+
drwxr-xr-x 6 root root 6 Sep  5 15:43 ..
+
</pre>
+
 
+
Now generate some contents, take a snapshot (snapshot-1), add more content, take a snapshot again (snapshot-2), do some more modifications and take a third snapshot (snapshot-3):
+
 
+
<pre>
+
# echo "Hello, world" >  /myfirstpool/myfirstDS/hello.txt
+
# cp /usr/src/linux-3.1-rc4.tar.bz2 /myfirstpool/myfirstDS
+
# ls -l /myfirstpool/myfirstDS
+
total 75580
+
-rw-r--r-- 1 root root      13 Sep  5 22:38 hello.txt
+
-rw-r--r-- 1 root root 77220912 Sep  5 22:38 linux-3.1-rc4.tar.bz2
+
# zfs snapshot myfirstpool/myfirstDS@snapshot-1
+
# echo "Goodbye, world" >  /myfirstpool/myfirstDS/goodbye.txt
+
# echo "Are you there?" >> /myfirstpool/myfirstDS/hello.txt
+
# cp /usr/src/linux-3.0.tar.bz2 /myfirstpool/myfirstDS
+
# rm /myfirstpool/myfirstDS/linux-3.1-rc4.tar.bz2
+
# zfs snapshot myfirstpool/myfirstDS@snapshot-2
+
# echo "Still there?" >> /myfirstpool/myfirstDS/goodbye.txt
+
# rm /myfirstpool/myfirstDS/hello.txt
+
# cp /proc/config.gz /myfirstpool/myfirstDS
+
# zfs snapshot myfirstpool/myfirstDS@snapshot-3
+
# zfs list -t all
+
NAME                              USED  AVAIL  REFER  MOUNTPOINT
+
myfirstpool                      2.41G  5.40G  444M  /myfirstpool
+
myfirstpool/myfirstDS              147M  5.40G  73.3M  /myfirstpool/myfirstDS
+
myfirstpool/myfirstDS@snapshot-1  73.8M      -  73.8M  -
+
myfirstpool/myfirstDS@snapshot-2    20K      -  73.3M  -
+
myfirstpool/myfirstDS@snapshot-3      0      -  73.3M  -
+
</pre>
+
 
+
Wow, a nice demonstration of how a copy-on-write filesystem like ZFS works. What do we observe? First, ''snapshot-1'' is quite big. Could such a big snapshot be the consequence of removing /myfirstpool/myfirstDS/linux-3.1-rc4.tar.bz2? Absolutely. Remember that a snapshot is a photograph of what a dataset contains at a given time: the original data is retained by the snapshot even if you later delete it from the dataset or modify it. If you look again at the command history between snapshot-2 and snapshot-3, you will notice that we removed a small file and changed another small file a bit, so there is only a small delta between what the dataset contained at snapshot-2 and what it contains now, leading to a very small snapshot. The third snapshot is an exact copy of what the dataset currently contains, thus its size is very close to zero (shown as zero in the listing).
+
 
+
$100 question: "Can I see what changed between snapshots?" Answer: ''yes, you can!'' The catch is that ZFS Fuse does not support it yet :( Nevertheless, here is what snapshot diffing looks like on an OpenIndiana/Solaris machine:
+
 
+
<pre>
+
# zfs create testpool/test2
+
# cd /testpool/test2
+
# wget http://www.kernel.org/pub/linux/kernel/v3.0/testing/patch-3.1-rc4.bz2
+
# echo "Hello,world" > hello.txt
+
# zfs snapshot testpool/test2@s1
+
 
+
# rm patch-3.1-rc4.bz2
+
# echo 'Goodbye!' > goodbye.txt
+
# echo 'Still there?' >> hello.txt
+
# zfs snapshot testpool/test2@s2
+
 
+
# echo 'Hello, again' >> hello.txt
+
# ln -s goodbye.txt goodbye2.txt
+
# mv hello.txt hello-new.txt
+
# zfs snapshot testpool/test2@s3
+
 
+
# zfs list -t all | grep test2
+
testpool/test2                    8.49M  3.76T  47.9K  /testpool/test2
+
testpool/test2@s1                8.41M      -  8.42M    -
+
testpool/test2@s2                29.2K      -  46.4K    -
+
testpool/test2@s3                    0      -  47.9K    -
+
+
# zfs diff testpool/test2@s1 testpool/test2@s2
+
M      /testpool/test2/
+
-      /testpool/test2/patch-3.1-rc4.bz2
+
M      /testpool/test2/hello.txt
+
+      /testpool/test2/goodbye.txt
+
 
+
# zfs diff testpool/test2@s2 testpool/test2@s3
+
M      /testpool/test2/
+
R      /testpool/test2/hello.txt -> /testpool/test2/hello-new.txt
+
+      /testpool/test2/goodbye2.txt
+
 
+
# zfs diff testpool/test2@s1 testpool/test2@s3
+
M      /testpool/test2/
+
-      /testpool/test2/patch-3.1-rc4.bz2
+
R      /testpool/test2/hello.txt -> /testpool/test2/hello-new.txt
+
+      /testpool/test2/goodbye.txt
+
+      /testpool/test2/goodbye2.txt
+
 
+
# zfs diff testpool/test2@s3 testpool/test2@s1
+
Unable to obtain diffs:
+
  Not an earlier snapshot from the same fs
+
</pre>
+
 
+
Where M, R, + and - stand for:
+
 
+
* M: item has been modified
+
* R: item has been renamed
+
* +: item has been added
+
* -: item has been removed
+
 
+
Observe the output of each diff and draw your own conclusions about what we did at each step and what appears in the diff. It is not possible to get a detailed diff similar to what Git and similar tools give, but you get the big picture of what changed between snapshots.
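
If you want to post-process such a report in a script, the four markers are easy to parse. The following is a minimal sketch, assuming the output format shown above (a one-character marker, whitespace, then a path, with <tt>-></tt> separating old and new names for renames); the function name <tt>parse_zfs_diff</tt> is made up for this example and paths containing <tt> -> </tt> are not handled.

```python
def parse_zfs_diff(text):
    """Turn 'zfs diff'-style text into (marker, path, new_path) tuples."""
    changes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        marker, rest = line.split(None, 1)
        if marker == "R":
            # Renames carry both the old and the new path.
            old, new = rest.split(" -> ", 1)
            changes.append((marker, old, new))
        else:
            changes.append((marker, rest, None))
    return changes


sample = """\
M      /testpool/test2/
R      /testpool/test2/hello.txt -> /testpool/test2/hello-new.txt
+      /testpool/test2/goodbye2.txt
"""

for marker, path, new_path in parse_zfs_diff(sample):
    print(marker, path, new_path)
```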
+
 
+
Although ZFS-Fuse does not (yet) implement snapshot diffing, it can deal with several snapshots and is able to jump several steps backwards. Suppose we want ''myfirstDS'' to go back to exactly what it was when we took the dataset photograph named ''snapshot-1'':
+
 
+
<pre>
+
# zfs rollback myfirstpool/myfirstDS@snapshot-1
+
cannot rollback to 'myfirstpool/myfirstDS@snapshot-1': more recent snapshots exist
+
use '-r' to force deletion of the following snapshots:
+
myfirstpool/myfirstDS@snapshot-3
+
myfirstpool/myfirstDS@snapshot-2
+
</pre>
+
 
+
This is not a bug; it is absolutely normal. The '''zfs''' command asks you for explicit permission to remove the two other snapshots, as they become useless once ''snapshot-1'' is restored (restoring them afterwards would make no sense). Second attempt:
+
 
+
<pre>
+
# zfs rollback -r myfirstpool/myfirstDS@snapshot-1
+
# ls -l /myfirstpool/myfirstDS
+
total 75580
+
-rw-r--r-- 1 root root      13 Sep  5 22:38 hello.txt
+
-rw-r--r-- 1 root root 77220912 Sep  5 22:38 linux-3.1-rc4.tar.bz2
+
# zfs list -t all
+
                                                       
+
NAME                              USED  AVAIL  REFER  MOUNTPOINT
+
myfirstpool                      2.34G  5.47G  444M  /myfirstpool
+
myfirstpool/myfirstDS            73.8M  5.47G  73.8M  /myfirstpool/myfirstDS
+
myfirstpool/myfirstDS@snapshot-1      0      -  73.8M  -
+
myfirstpool/mysecondDS            1.84G  5.47G  1.84G  /myfirstpool/mysecondDS
+
myfirstpool/mysecondDS@snapshot1    37K      -  1.84G  -
+
</pre>
+
 
+
''myfirstDS'' effectively returned to its state when ''snapshot-1'' was taken, and the snapshots ''snapshot-2'' and ''snapshot-3'' vanished.
+
 
+
{{fancynote|You can leap several steps backward at the cost of '''losing''' your subsequent modifications forever. }}
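
The rule that '''zfs''' enforces here can be summarised with a small toy function. This is only an illustrative model: the function name <tt>rollback</tt>, its <tt>recursive</tt> flag (standing in for -r) and the list representation of snapshots are assumptions made for this sketch.

```python
def rollback(snapshots, target, recursive=False):
    """Return the snapshots that survive a rollback to `target`.

    `snapshots` is ordered from oldest to newest. Without `recursive`
    the rollback is refused if more recent snapshots exist.
    """
    idx = snapshots.index(target)
    newer = snapshots[idx + 1:]
    if newer and not recursive:
        raise RuntimeError(
            "more recent snapshots exist: " + ", ".join(reversed(newer))
        )
    # Everything taken after the target is destroyed.
    return snapshots[: idx + 1]


snaps = ["snapshot-1", "snapshot-2", "snapshot-3"]

try:
    rollback(snaps, "snapshot-1")
except RuntimeError as err:
    print("refused:", err)

print(rollback(snaps, "snapshot-1", recursive=True))  # ['snapshot-1']
```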
+
 
+
=== Streaming datasets over the network ===
+
 
+
{{fancyimportant|'''Nothing in an infrastructure is as critical as having reliable backups of the data''' used by an organization. Whereas a server can be rebuilt from scratch, the data it contains is very likely to be lost '''forever''' when a disaster occurs. Of course, as data is the lifeblood of an organization's business processes, its '''integrity''' and '''confidentiality''' must be preserved in all cases. }}
+
 
+
Do you find ZFS snapshots useful? Well, you have seen just a small part of their potential. As a snapshot is a photograph of what a dataset contains, frozen in time, snapshots can be seen as a form of data backup. Like any backup, they must not stay on the local machine but must be stored elsewhere, and common sense tells us to keep backups in a safe place and to make them travel through a secure channel. By "secure channel" we mean something like a trusted person in your organization whose job consists of bringing a box of tapes off-site to a secure location, but we also mean a secure communication channel, like an SSH tunnel between two hosts, without any human intervention.
+
 
+
The ZFS designers had the same vision and made it possible to send a dataset over a network. How is that possible? Simple: the process involves two peers that communicate through a channel like the one established by '''netcat''' (OpenSSH supports similar functionality but with an encrypted communication channel). For the sake of the demonstration, we will use two Solaris boxes, one at each end-point.
+
 
+
How do we stream some ZFS bits over the network? Here again, '''zfs''' is the answer. A nifty move from the designers was to use ''stdin'' and ''stdout'' as transmission/reception channels, thus allowing great flexibility in processing the ZFS stream. You can, for instance, compress your stream, then encrypt it, then encode it in base64, then sign it, and so on. It sounds a bit overkill but it is possible, and in the general case you can use in your plumbing any tool that swallows data from ''stdin'' and spits it out through ''stdout''.
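
To make the plumbing idea concrete, here is a minimal sketch that chains two such filters (compression, then base64 encoding) and undoes them in reverse order on the other side. It uses an arbitrary byte string as a stand-in for a real ZFS stream, and the function names are invented for this example; no '''zfs''' command is involved.

```python
import base64
import gzip


def sender_pipeline(stream: bytes) -> bytes:
    """Compress, then base64-encode: two chained stdin-to-stdout style filters."""
    return base64.b64encode(gzip.compress(stream))


def receiver_pipeline(payload: bytes) -> bytes:
    """Apply the inverse filters in the opposite order."""
    return gzip.decompress(base64.b64decode(payload))


# Stand-in for the bytes that a real stream would carry (assumption).
fake_stream = b"pretend this is a ZFS stream " * 100

wire = sender_pipeline(fake_stream)
assert receiver_pipeline(wire) == fake_stream
print(len(fake_stream), "bytes in,", len(wire), "bytes on the wire")
```

The only requirement is that the receiver applies the inverse of each filter in the opposite order, whatever tools are used.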
+
 
+
{{fancynote|The rest of this section has been done entirely on two Solaris 11 machines.}}
+
 
+
1. Sender side:
+
 
+
<pre>
+
# zfs create testpool2/zfsstreamtest
+
# echo 'Hello, world!' > /testpool2/zfsstreamtest/hello.txt
+
# echo 'Goodbye, world' > /testpool2/zfsstreamtest/goodbye.txt
+
# zfs snapshot testpool2/zfsstreamtest@s1
+
# zfs list -t snapshot
+
NAME                              USED  AVAIL  REFER  MOUNTPOINT
+
testpool2/zfsstreamtest@s1            0      -    32K          -
+
</pre>
+
 
+
2. Receiver side (the dataset ''zfs-stream-test'' will be created and should not be present):
+
<pre>
+
# nc -l -p 7000 | zfs receive testpool/zfs-stream-test
+
</pre>
+
 
+
At this point the receiver is waiting for data.
+
 
+
3. Sender side:
+
<pre>
+
# zfs send testpool2/zfsstreamtest@s1 | nc 192.168.aaa.bbb.ccc 7000
+
</pre>
+
 
+
4. Receiver side:
+
<pre>
+
# zfs list -t snapshot
+
NAME                          USED  AVAIL  REFER
+
...
+
testpool/zfs-stream-test@s1       0      -  46.4K  -
+
</pre>
+
 
+
Note that we did not set an explicit snapshot name in the second step, so the default was used: the name of the snapshot sent over the network. It is possible to choose another name, but in that case the dataset that will contain the snapshot needs to be created first:
+
<pre>
+
# nc -l -p 7000 | zfs receive testpool/zfs-stream-test@mysnapshot01
+
</pre>
+
 
+
Once received you would get:
+
 
+
<pre>
+
# zfs list -t snapshot
+
NAME                                      USED  AVAIL  REFER
+
...
+
testpool/zfs-stream-test@mysnapshot01       0      -  46.4K  -
+
</pre>
+
 
+
5. Just for the sake of curiosity, let's do a rollback on the receiver side:
+
 
+
<pre>
+
# zfs rollback testpool/zfs-stream-test@s1
+
# ls -l /testpool/zfs-stream-test
+
total 2
+
-rw-r--r-- 1 root root 15 2011-09-06 23:54 goodbye.txt
+
-rw-r--r-- 1 root root 13 2011-09-06 23:53 hello.txt
+
# cat /testpool/zfs-stream-test/hello.txt
+
Hello, world
+
</pre>
+
 
+
Because ZFS streaming operates using the standard input and output (''stdin'' / ''stdout''), you can build slightly more complex pipelines, like:
+
 
+
<pre>
+
# zfs send testpool2/zfsstreamtest@s1 | gzip | nc 192.168.aaa.bbb.ccc 7000
+
</pre>
+
+
The above example used two hosts, but a simpler setup is also possible: you are not required to send your data over the network with '''netcat'''; you can store it in a regular file, then mail it or store it on a USB key. By the way, we have not finished! We took only a simple case here: it is absolutely possible to do the exact same operation with the difference between two snapshots (incremental). Just like an incremental backup takes only what has changed, ZFS can determine the difference between two snapshots and stream that difference instead of streaming a whole snapshot. Although ZFS can detect and act on differentials, it does not operate (yet) at the block level: if only a few bytes of a very big file have changed, the whole file will be taken into consideration (operating at data block level is possible with some tools like the well-known '''rsync''').
+
 
+
Consider the following:
+
 
+
* A dataset snapshot (S1) contains two files:
+
** A -> 10 MB
+
** B -> 4 GB
+
* A bit later, some files (named C, D and E) are added to the dataset and another snapshot (S2) is taken. S2 contains:
+
** A -> 10 MB
+
** B -> 4 GB
+
** C -> 3 MB
+
** D -> 500 KB
+
** E -> 1GB
+
 
+
With a full transfer of S2, A, B, C, D and E would all be streamed, whereas with an incremental transfer (S2-S1), zfs would only process C, D and E. The next $100 question: ''"How can we stream a difference between snapshots? '''zfs''' again?"'' Yes! This time with a subtle difference: a special option on the command line tells it to use a difference rather than a full snapshot. Assuming a few more files have been added to the ''testpool2/zfsstreamtest'' dataset and a snapshot (s2) has been taken, the delta between s2 and s1 (s2-s1) can be sent like this (notice the presence of the -i option; on the receiver side the same command as shown above is used, nothing special is required):
+
 
+
* Sender:
+
<pre>
+
# zfs send -i testpool2/zfsstreamtest@s1 testpool2/zfsstreamtest@s2 | nc 192.168.aaa.bbb.ccc 7000
+
</pre>
+
 
+
* Receiver:
+
<pre>
+
# nc -l -p 7000 | zfs receive testpool/zfs-stream-test
+
# zfs list -t snapshot
+
testpool/zfs-stream-test@s1      28.4K      -  46.4K  -
+
testpool/zfs-stream-test@s2          0      -  47.1K  -
+
</pre>
+
 
+
Note that although we did not specify any snapshot name to use on the receiver side, ZFS used by default the name of the second snapshot involved in the delta (''s2'' here).
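
To put numbers on the A to E example given earlier, here is a small worked sketch. It is a toy model only: snapshots are represented as dictionaries mapping file names to sizes, a file counts as "changed" when it is new or its size differs, and the helper names are invented for this example.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# File sizes taken from the S1/S2 example.
s1 = {"A": 10 * MB, "B": 4 * GB}
s2 = {"A": 10 * MB, "B": 4 * GB, "C": 3 * MB, "D": 500 * KB, "E": 1 * GB}


def full_stream_size(snapshot):
    """A full transfer carries everything the snapshot contains."""
    return sum(snapshot.values())


def incremental_stream_size(old, new):
    """An incremental transfer carries only what is new or changed (toy rule)."""
    return sum(size for name, size in new.items() if old.get(name) != size)


full = full_stream_size(s2)
incr = incremental_stream_size(s1, s2)
print("full transfer of S2: %.2f GB" % (full / GB))
print("incremental S2-S1:   %.2f GB" % (incr / GB))
```

In this model the incremental stream carries only C, D and E, roughly one fifth of what a full transfer of S2 would send.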
+
 
+
 
+
$200 question: suppose we delete all of the snapshots received so far on the receiver side and then try to send the difference between s2 and s1. What would happen? ZFS will protest on the receiver side, although no error message will be visible on the sender side:
+
<pre>
+
cannot receive incremental stream: destination testpool/zfs-stream-test has been modified
+
since most recent snapshot
+
</pre>
+
 
+
It is even worse if we remove the dataset used to receive the data:
+
 
+
<pre>
+
cannot receive incremental stream: destination 'testpool/zfs-stream-test' does not exist
+
</pre>
+
 
+
{{fancyimportant|ZFS streaming over a network has '''no underlying protocol''', therefore the sender just assumes the data has been successfully received and processed. It '''does not care''' whether a processing error occurs.}}
+
 
+
=== Govern a dataset by attributes ===
+
 
+
So far, most filesystem capabilities have been driven by separate and scattered command-line tools (e.g. tune2fs, edquota, rquota, quotacheck...) which all have their own way of handling tasks, sometimes a tricky one, especially the quota-related management utilities. Moreover, there was no easy way to put a limit on a directory other than placing it on a dedicated partition or logical volume, implying downtime when additional space had to be added. Quota management is, however, only one of the many facets of disk space management.
+
 
+
In the ZFS world, many aspects are now managed by simply setting or clearing a property attached to a ZFS dataset through the by now well-known '''zfs''' command. You can, for example:
+
 
+
* put a size limit on a dataset
+
* reserve space for a dataset (that space is ''guaranteed'' to be available in the future, although it is not allocated at the time the reservation is made)
+
* control if new files are encrypted and/or compressed
+
* define a quota per user or group of users
+
* control checksum usage  => '''never turn that property off unless having very good reasons you are likely to never have''' (no checksums = no silent data corruption detection)
+
* share a dataset by NFS/CIFS
+
* control automatic data deduplication
+
 
+
Not all dataset properties are settable: some of them are set and managed by the operating system in the background and thus cannot be modified.
+
 
+
{{fancynote|Solaris/OpenIndiana users: ZFS has a tight integration with the NFS/CIFS server, thus it is possible to share a zfs dataset by setting adequate attributes. ZFS on Linux (native kernel mode port) also has a tight integration with the built-in Linux NFS server, the same for ZFS fuse although still experimental. Under FreeBSD ZFS integration has been done both with NFS and Samba (CIFS).}}
+
 
+
Like any other action concerning datasets, properties are set and unset via the '''zfs''' command. On our Funtoo box running zfs-fuse we can, for example, start by displaying the value of all properties for the dataset ''myfirstpool/myfirstDS'':
+
 
+
<pre>
+
# zfs get all myfirstpool/myfirstDS
+
NAME                  PROPERTY              VALUE                  SOURCE
+
myfirstpool/myfirstDS  type                  filesystem              -
+
myfirstpool/myfirstDS  creation              Sun Sep  4 23:34 2011  -
+
myfirstpool/myfirstDS  used                  73.8M                  -
+
myfirstpool/myfirstDS  available            5.47G                  -
+
myfirstpool/myfirstDS  referenced            73.8M                  -
+
myfirstpool/myfirstDS  compressratio        1.00x                  -
+
myfirstpool/myfirstDS  mounted              yes                    -
+
myfirstpool/myfirstDS  quota                none                    default
+
myfirstpool/myfirstDS  reservation          none                    default
+
myfirstpool/myfirstDS  recordsize            128K                    default
+
myfirstpool/myfirstDS  mountpoint            /myfirstpool/myfirstDS  default
+
myfirstpool/myfirstDS  sharenfs              off                    default
+
myfirstpool/myfirstDS  checksum              on                      default
+
myfirstpool/myfirstDS  compression          off                    default
+
myfirstpool/myfirstDS  atime                on                      default
+
myfirstpool/myfirstDS  devices              on                      default
+
myfirstpool/myfirstDS  exec                  on                      default
+
myfirstpool/myfirstDS  setuid                on                      default
+
myfirstpool/myfirstDS  readonly              off                    default
+
myfirstpool/myfirstDS  zoned                off                    default
+
myfirstpool/myfirstDS  snapdir              hidden                  default
+
myfirstpool/myfirstDS  aclmode              groupmask              default
+
myfirstpool/myfirstDS  aclinherit            restricted              default
+
myfirstpool/myfirstDS  canmount              on                      default
+
myfirstpool/myfirstDS  xattr                on                      default
+
myfirstpool/myfirstDS  copies                1                      default
+
myfirstpool/myfirstDS  version              4                      -
+
myfirstpool/myfirstDS  utf8only              off                    -
+
myfirstpool/myfirstDS  normalization        none                    -
+
myfirstpool/myfirstDS  casesensitivity      sensitive              -
+
myfirstpool/myfirstDS  vscan                off                    default
+
myfirstpool/myfirstDS  nbmand                off                    default
+
myfirstpool/myfirstDS  sharesmb              off                    default
+
myfirstpool/myfirstDS  refquota              none                    default
+
myfirstpool/myfirstDS  refreservation        none                    default
+
myfirstpool/myfirstDS  primarycache          all                    default
+
myfirstpool/myfirstDS  secondarycache        all                    default
+
myfirstpool/myfirstDS  usedbysnapshots      18K                    -
+
myfirstpool/myfirstDS  usedbydataset        73.8M                  -
+
myfirstpool/myfirstDS  usedbychildren        0                      -
+
myfirstpool/myfirstDS  usedbyrefreservation  0                      -
+
myfirstpool/myfirstDS  logbias              latency                default
+
myfirstpool/myfirstDS  dedup                off                    default
+
myfirstpool/myfirstDS  mlslabel              off                    -
+
</pre>
+
 
+
How can we set a limit that prevents ''myfirstpool/myfirstDS'' from using more than 1 GB of space in the pool? Simple, just set the ''quota'' property:
+
 
+
<pre>
+
# zfs set quota=1G myfirstpool/myfirstDS
+
# zfs get quota myfirstpool/myfirstDS
+
NAME                  PROPERTY  VALUE  SOURCE
+
myfirstpool/myfirstDS  quota    1G    local
+
</pre>
+
 
+
Maybe something piqued your curiosity: ''what does "SOURCE" mean?'' "SOURCE" describes how the property value has been determined for the dataset and can have several values:
+
* '''local''': the property has been explicitly set for this dataset
+
* '''default''': a default value has been assigned by the operating system because the property was not explicitly set by the system administrator (e.g. SUID allowed or not in the above example).
+
* '''dash (-)''': not modifiable intrinsic property (e.g. dataset creation time, whether the dataset is currently mounted or not, dataset space usage in the pool, average compression ratio...)
+
 
+
Before copying some files to the dataset, let's set a binary (on/off) property:
+
<pre>
+
# zfs set compression=on myfirstpool/myfirstDS
+
</pre>
+
 
+
Now try to put more than 1GB of data in the dataset:
+
 
+
<pre>
+
# dd if=/dev/zero of=/myfirstpool/myfirstDS/one-GB-test bs=2G count=1
+
dd: writing `/myfirstpool/myfirstDS/one-GB-test': Disk quota exceeded
+
</pre>
+
 
+
=== Permission delegation ===
+
 
+
ZFS brings a feature known as delegated administration. Delegated administration enables ordinary users to handle administrative tasks on a dataset without being administrators. '''It is however not a sudo replacement as it covers only ZFS related tasks''' such as sharing/unsharing, disk quota management and so on. Permission delegation shines in flexibility because such delegation can be handled by inheritance through nested datasets. Permission delegation is handled via '''zfs''' through its '''allow''' and '''unallow''' subcommands.
+
 
+
= Data redundancy with ZFS =
+
 
+
Nothing is perfect, and storage media (even in datacenter-class equipment) are prone to failure and do fail on a regular basis. Data redundancy is mandatory to help prevent single points of failure (SPoF). Over the past decades RAID technologies have been powerful; however, their power is precisely their weakness: because they operate at the block level, they do not care about what is stored in the data blocks and have no way to interact with the filesystems stored on them to ensure that data integrity is properly handled.
+
 
+
== Some statistics ==
+
 
+
It is no secret that a general trend in the IT industry is the exponential growth of data quantities. Just think about the amount of data YouTube, Google or Facebook generate every day. Taking the case of the first, [http://www.website-monitoring.com/blog/2010/05/17/youtube-facts-and-figures-history-statistics some statistics] give:
+
* 24 hours of video generated every ''minute'' as of March 2010 (May 2009 - 20h / October 2008 - 15h / May 2008 - 13h)
+
* More than 2 ''billion'' views a day
+
* More video is produced on YouTube every 60 days than the 3 major US broadcasting networks produced in the last 60 years
+
 
+
Facebook is also impressive (Facebook's own stats):
+
 
+
* over 900 million objects that people interact with (pages, groups, events and community pages)
+
* The average user creates 90 pieces of content each month (750 million active users)
+
* More than 2.5 million websites have integrated with Facebook
+
 
+
What is true for Facebook and YouTube is also true in many other cases (think for one minute about the amount of data stored in iTunes), especially with the growing popularity of cloud computing infrastructures. Despite the progress of technology a "bottleneck" still exists: storage reliability has remained nearly the same over the years. If there is one organization in the world that generates huge quantities of data, it is [http://public.web.cern.ch CERN] (''Conseil Européen pour la Recherche Nucléaire'', now officially known as the ''European Organization for Nuclear Research''), as its experiments can generate spikes of many terabytes of data within a few seconds. A study done in 2007, quoted by a [http://www.zdnet.com/blog/storage/data-corruption-is-worse-than-you-know/191 ZDNet article], reveals that:
+
 
+
* Even ECC memory cannot always help: 3 double-bit (uncorrectable) errors occurred in 3 months on 1300 nodes. Bad news: it should be '''zero'''.
+
* RAID systems cannot protect in all cases: monitoring 492 RAID controllers for 4 weeks showed an average error rate of 1 per ~10^14 bits, giving roughly 300 errors for every 2.4 petabytes
+
* Magnetic storage is still not reliable, even on high-end datacenter-class drives: 500 errors were found over 100 nodes while writing a 2 GB file to 3000+ nodes every 2 hours, then reading it again and again for 5 weeks.
+
 
+
Overall this means: 22 corrupted files (1 in every 1500 files) out of a grand total of 33700 files holding 8.7 TB of data. And this study is 5 years old...
+
 
+
== Sources of silent data corruption ==
+
 
+
http://www.zdnet.com/blog/storage/50-ways-to-lose-your-data/168
+
 
+
This is not an exhaustive list, but we can mention:
+
 
+
* Cheap controllers or buggy drivers that do not report errors/pre-failure conditions to the operating system
+
* "Bit-leaking": a hard drive consists of many concentric magnetic tracks. When the hard drive's magnetic head writes bits on the magnetic surface, it generates a magnetic field that is very weak but still sufficient to "leak" onto the next track and change some bits. Drives can generally compensate for those situations because they also record some error correction data on the magnetic surface
+
* Magnetic surface defects (weak sectors)
+
* Hard drive firmware bugs
+
* Cosmic rays hitting your RAM chips or the hard drive's cache memory/electronics
+
+
 
+
== Building a mirrored pool ==
+
 
+
 
+
== ZFS RAID-Z ==
+
 
+
=== ZFS/RAID-Z vs RAID-5 ===
+
 
+
RAID-5 is very commonly used nowadays because of its simplicity, efficiency and fault tolerance. Although the technology has proven itself over decades, it has a major drawback known as "the RAID-5 write hole". If you are familiar with RAID-5 you already know that it consists of spreading the stripes across all of the disks within the array and interleaving them with a special stripe called the parity. Several schemes for distributing stripes and parity between disks exist in the wild, each with its own pros and cons; however, the "standard" one (also known as ''left-asymmetric'') is:
+
 
+
<pre>
+
Disk_0  | Disk_1  | Disk_2  | Disk_3
+
[D0_S0] | [D0_S1] | [D0_S2] | [D0_P]
+
[D1_S0] | [D1_S1] | [D1_P]  | [D1_S2]
+
[D2_S0] | [D2_P]  | [D2_S1] | [D2_S2]
+
[D3_P]  | [D3_S0] | [D3_S1] | [D3_S2]
+
</pre>
+
 
+
The parity is simply computed by XORing the stripes of the same "row", thus giving the general equation:
+
* [Dn_S0] XOR [Dn_S1] XOR ... XOR [Dn_Sm] XOR [Dn_P] = 0
+
This equation can be rewritten in several ways:
+
* [Dn_S0] XOR [Dn_S1] XOR ... XOR [Dn_Sm] = [Dn_P]
+
* [Dn_S1] XOR [Dn_S2] XOR ... XOR [Dn_Sm] XOR [Dn_P] = [Dn_S0]
+
* [Dn_S0] XOR [Dn_S2] XOR ... XOR [Dn_Sm] XOR [Dn_P] = [Dn_S1]
+
* ...and so on!
+
 
+
Because the equations are combinations of exclusive-ors, it is easy to compute any one term if it is missing. Let's say we have 3 stripes plus one parity, composed of 4 bits each, but one of them is missing due to a disk failure:
+
 
+
* D0_S0 = 1011
+
* D0_S1 = 0010
+
* D0_S2 = <missing>
+
* D0_P  = 0110
+
 
+
However we know that:
+
* D0_S0 XOR D0_S1 XOR D0_S2 XOR D0_P = 0000 also rewritten as:
+
* D0_S2 = D0_S0 XOR D0_S1 XOR D0_P
+
 
+
Applying boolean algebra gives: '''D0_S2 = 1011 XOR 0010 XOR 0110 = 1111'''.
+
Proof: '''1011 XOR 0010 XOR 1111 = 0110''', which is the same as '''D0_P'''.
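
The reconstruction above can be checked with a few lines of Python. This is only a minimal sketch of the XOR arithmetic on 4-bit integers, not of how a real RAID-5 controller is implemented; the function names <code>parity</code> and <code>reconstruct</code> are made up for the illustration.

```python
from functools import reduce
from operator import xor


def parity(stripes):
    """XOR all data stripes of a row together to obtain the parity stripe."""
    return reduce(xor, stripes, 0)


def reconstruct(surviving_stripes, parity_stripe):
    """Rebuild the single missing stripe from the survivors and the parity."""
    return reduce(xor, surviving_stripes, parity_stripe)


# Values from the example: D0_S2 is lost, the rest survived.
d0_s0, d0_s1, d0_p = 0b1011, 0b0010, 0b0110

d0_s2 = reconstruct([d0_s0, d0_s1], d0_p)
print(format(d0_s2, "04b"))                    # 1111
print(parity([d0_s0, d0_s1, d0_s2]) == d0_p)   # True
```

The same <code>reconstruct</code> call works for any single missing member of the row, because XOR is its own inverse.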
+
 
+
'''So what's the deal?'''
+
Okay, now the funny part: forget the above hypothesis and imagine we have this:
+
 
+
* D0_S0 = 1011
+
* D0_S1 = 0010
+
* D0_S2 = 1101
+
* D0_P  = 0110
+
 
+
Applying boolean algebra magic gives 1011 XOR 0010 XOR 1101 = 0100. Problem: this is different from D0_P (0110). Can you tell which one (or which ONES) of the four terms lies? If you find a mathematically acceptable solution, found a company, because you have just solved a big computer science problem. If humans cannot answer the question, imagine how hard it is for the poor little RAID-5 controller to determine which stripe is right and which one lies, and picture the resulting "datageddon" (i.e. massive data corruption on the RAID-5 array) when the RAID-5 controller detects the error and starts to rebuild the array.
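
To see why the question has no answer, the following Python sketch takes the inconsistent row above and, for each of the four members in turn, assumes that it alone is wrong and recomputes it from the other three. It is an illustration under the simplifying assumption that exactly one member is bad; the helper names are hypothetical.

```python
from functools import reduce
from operator import xor


def row_is_consistent(stripes, parity_stripe):
    """A row is consistent when all stripes XORed with the parity give 0."""
    return reduce(xor, stripes, parity_stripe) == 0


def candidate_repairs(stripes, parity_stripe):
    """Blame each member in turn and recompute it from the other three."""
    members = {"S%d" % i: s for i, s in enumerate(stripes)}
    members["P"] = parity_stripe
    repairs = {}
    for suspect in members:
        others = [v for name, v in members.items() if name != suspect]
        repairs[suspect] = reduce(xor, others, 0)
    return repairs


stripes = [0b1011, 0b0010, 0b1101]
p = 0b0110

print(row_is_consistent(stripes, p))  # False: something is wrong
for name, value in candidate_repairs(stripes, p).items():
    print(name, format(value, "04b"))
```

Every one of the four "repairs" yields a perfectly consistent row, so parity alone can tell that an error exists but not where it is. Some extra information, such as a per-block checksum of the kind ZFS keeps, is needed to pick the right candidate.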
+
 
+
This is not science fiction, this is pure reality, and the weakness lies in RAID-5's simplicity. Here is how it can happen: an urban legend about RAID-5 arrays is that they update stripes in an atomic transaction (either all of the stripes plus parity are written, or none of them). Too bad, this is just not true: the data is written on the fly, and if for one reason or another the machine hosting the RAID-5 array suffers a power outage or a crash, the RAID-5 controller will simply have no idea what it was doing, nor which stripes are up to date and which are not. Of course, RAID controllers in servers do have a replaceable on-board battery, and most of the time the server they reside in is connected to an auxiliary power source such as a battery-based UPS or a diesel/gas generator. However, Murphy's law and unpredictable hazards can sometimes strike...
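
The write hole can be simulated in a few lines of Python. The sketch below models one row as a plain dictionary, which is an assumption made purely for illustration: a data stripe is updated, the "crash" happens before the parity is rewritten, and a later disk failure is then "repaired" from the stale parity.

```python
from functools import reduce
from operator import xor


def xor_all(values):
    """XOR an iterable of integers together."""
    return reduce(xor, values, 0)


# A consistent row: three data stripes and their parity.
row = {"S0": 0b1011, "S1": 0b0010, "S2": 0b1111}
row["P"] = xor_all(row.values())          # 0110, computed from the 3 stripes

# Step 1: the controller overwrites S0 with new data...
row["S0"] = 0b0100
# ...and the machine crashes here, before the parity is updated.

# Step 2: later, the disk holding S2 dies and is rebuilt from the rest.
true_s2 = 0b1111
rebuilt_s2 = xor_all([row["S0"], row["S1"], row["P"]])

print(format(rebuilt_s2, "04b"))          # 0000
print(rebuilt_s2 == true_s2)              # False: silent corruption
```

Note that S2 was never touched by the interrupted write, yet it is the stripe that ends up corrupted after the rebuild.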
+
 
+
Another funny scenario: imagine a machine with a RAID-5 array (on a UPS this time) but with non-ECC memory. The RAID-5 controller splits the data buffer into stripes, computes a parity stripe and starts to write them to the different disks of the array. But... but... but... for some odd reason, a single bit in one of the stripes flips (cosmic rays, RFI...) after the parity calculation. Too bad, too sad: one of the written stripes contains corrupted data and it is silently written to the array. Datageddon in sight!
+
 
+
Not to scare you: storage units have sophisticated error correction capabilities (a magnetic or optical recording surface is not perfect, and read/write errors do occur) that mask most cases. However, some established statistics estimate that even with error correction mechanisms, one bit out of every 10^16 bits transferred is incorrect. 10^16 is really huge, but at the beginning of the 21st century, with datacenters churning through massive amounts of data on several hundred, if not thousands of, servers, this number starts to give headaches: '''a big datacenter can face silent data corruption every 15 minutes''' (Wikipedia). No typo here: a potential disaster may silently appear 4 times an hour, every single day of the year. Detection techniques exist, but traditional RAID-5 arrays in themselves can be a problem. Ironic for such a popular and widely used solution :)
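
To get a feel for the 10^16 figure, here is a back-of-the-envelope calculation in Python. The only number taken from the text is the error rate of one bit per 10^16 bits transferred; the 10 GB/s aggregate throughput is an arbitrary assumption chosen for the example.

```python
BITS_PER_ERROR = 10 ** 16
BYTES_PER_ERROR = BITS_PER_ERROR / 8      # 1.25e15 bytes, i.e. 1.25 PB


def seconds_between_errors(throughput_bytes_per_s):
    """Average time between bad bits at a given aggregate throughput."""
    return BYTES_PER_ERROR / throughput_bytes_per_s


def throughput_for_interval(seconds):
    """Aggregate throughput at which one bad bit shows up every `seconds`."""
    return BYTES_PER_ERROR / seconds


# Assumed example: a site moving 10 GB/s in aggregate.
hours = seconds_between_errors(10e9) / 3600
print(round(hours, 1), "hours between errors at 10 GB/s")

# Throughput needed to hit one error every 15 minutes.
tb_per_s = throughput_for_interval(15 * 60) / 1e12
print(round(tb_per_s, 2), "TB/s for one error every 15 minutes")
```

In other words, one bad bit per 1.25 PB transferred is invisible on a home machine but becomes a routine event once the aggregate traffic of thousands of servers is added up.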
+
 
+
If RAID-5 was an acceptable trade-off in the past decades, it has simply had its day. RAID-5 is dead? '''*Hooray!*'''
+
 
+
= More advanced topics =
+
 
+
== ZFS Intention Log (ZIL) ==
+
 
+
= Final words and lessons learned =
+
+
ZFS surpasses by far (as of September 2011) every one of the well-known filesystems out there: none of them offers such an integration of features, and certainly not with this management simplicity and robustness. However, in the Linux world it is definitely a no-go in the short term, especially for production systems. The two known implementations are not ready for production environments and lack some important features or behave in a clunky manner. This is fair enough, as neither of them claims to be at this level of maturity, and the licensing incompatibility between the code opened by Sun Microsystems some years ago and the GNU GPL does not help the cause. However, both look '''very promising''' once their rough edges have been smoothed.
+
 
+
For a Linux system, the nearest plan B, if you are looking for a ZFS-like filesystem covering some of the functionality offered by ZFS, is BTRFS (still considered experimental, so be prepared for a disaster sooner or later, although BTRFS has been used by some Funtoo core team members for 2 years and has proved to be quite stable in practice). BTRFS, however, does not push the limits as far as ZFS does: it has no built-in snapshot differentiation tool, does not implement built-in filesystem streaming capabilities, and rolling back a BTRFS subvolume is a bit more manual than in ''"the ZFS way of life"''.
+
 
+
 
+
= Footnotes & references =
+
Source: [http://docs.huihoo.com/opensolaris/solaris-zfs-administration-guide/html/index.html solaris-zfs-administration-guide]
+
[[Category:Labs]]
+
[[Category:Articles]]
+
[[Category:Filesystems]]
+

Revision as of 19:06, January 11, 2014

This page documents how to use Zope with Funtoo Experimental, which currently has good Zope support thanks to Progress Overlay Python integration.

About Zope

Zope is an Open Source application server framework written in Python. It has an interesting history which you should familiarize yourself with before starting Zope development, as it contains several interesting twists and turns.

Zope History

There are two versions of Zope, Zope 2 and Zope 3. One might assume that Zope 3 is the version that people should use for new software development projects by default, but this is not the case. Most Zope-based projects continue to use Zope 2. Zope 3 was an attempt to redesign Zope 2 from scratch, and is completely different from Zope 2, but it was not adopted by the community.

There is also something called Five (named because it is "2 + 3") that backports many of the new features of Zope 3 into the Zope 2 framework. Several projects will use Zope 2 plus Five in order to use some of the newer features in Zope. Five was merged into mainline Zope 2 in early 2010, and first appeared in Zope 2.8.

You can learn more about the history of Zope 2, 3 and Five in the Five README.

To make things even more interesting, work on Zope 4 is underway, and it will be based on 2.13 rather than 3.x. It includes a number of incompatible changes with prior versions.

Note

This HOWTO targets Zope 2.13, which includes Five. It is typically the version you should be using for new Zope projects.

Zope Resources

Now that you understand what version of Zope you should be targeting (2.13), we can point you towards the correct documentation :)

The Zope 2 Book
This book provides a general introduction to Zope concepts and ZMI. It is a good place to start, but doesn't provide a direct introduction to Zope development. It's recommended that you skim through this book to familiarize yourself with Zope. It generally does not assume much prior knowledge about Web development or Python.
Zope Developer's Guide
This guide will give you a better introduction to Zope development. It assumes you already know Python. Skip chapters 1 and 2 and start in chapter 3, which covers components and interfaces. Chapter 5 covers the creation of your first product.
Five
We're not done yet. There is a bunch of stuff in Zope 2.13 that is not in the official documentation. Namely, the stuff in Five. Check out The Five Manual.
ZTK
ZTK Documentation
ZCA
A Comprehensive Guide to Zope Component Architecture offers a good introduction to the programming concepts of ZCA. We also have a new page on Zope Component Architecture which will help you to understand the big picture of ZCA and why it is useful. ZCML ("Z-camel") is a part of ZCA and was introduced in Zope 3, so typically you will find ZCML documented within Zope 3 documentation and book.
Content Components
Views and Viewlets: This tutorial on viewlets also contains some viewlet-related ZCML examples near the end. The "Content Component way" of developing in Zope seems to be a Zope 3 thing and tied to ZCML. Chapter 13+ of Stephan Richter's Zope 3 Developer's Handbook (book) seems to cover this quite well. You will probably also want to check out Philipp Weitershausen's Web Component Development with Zope 3 (book).
Zope 2 Wiki
Main wiki page for all things related to Zope 2.
docs.zope.org
This is the main site for Zope documentation.

First Steps

First, you will need to emerge Zope:

#  emerge --jobs=10 zope

Zope is now installed.

Project Skeleton

Note

Zope should be used by a regular user account, not as the root user.

The first step in using Zope is to ensure that you are using a regular user account. Create a new directory called zope_test:

$ cd
$ mkdir zope_test

Now, enter the directory, and create an "instance", which is a set of files and directories that are used to contain a Zope project:

$ cd zope_test
$ /usr/lib/zope-2.13/bin/mkzopeinstance

You will see the following output, and will be prompted to answer a few questions:

Please choose a directory in which you'd like to install
Zope "instance home" files such as database files, configuration
files, etc.

Directory: instance
Please choose a username and password for the initial user.
These will be the credentials you use to initially manage
your new Zope instance.

Username: admin
Password: ****
Verify password: **** 

Now, we will start our Zope instance:

$ cd instance
$ bin/runzope

Now that Zope is running, you can visit localhost:8080 in your Web browser. You will see a nice introductory page to Zope.

If you now go to the localhost:8080/manage URL, you will be prompted to log in. Enter the username and password you specified. You are now logged in to the ZMI (Zope Management Interface).

You can stop your application by pressing Control-C. In the future, you can start and stop your Zope instance using the following commands:

$ bin/zopectl start
$ bin/zopectl stop

zopectl start will cause your instance to run in the background rather than consuming a shell console.

First Project

We will create a single very primitive Zope package, consisting of an Interface for a TODO class, and a TODO class.

Create the following files and directories relative to your project root:

  • Create the directory lib/python/example.
  • Create the file lib/python/example/__init__.py by typing touch lib/python/example/__init__.py.
  • Create these files:

etc/package-includes/example-configure.zcml

This file registers the example directory you created in lib/python as a package, so that it is seen by Zope:

<include package="example" />

lib/python/example/interfaces.py

The following file defines the ITODO interface, and also uses some Zope Schema functions to define what kind of data we expect to store in objects that implement ITODO:

from zope.interface import Interface
from zope.schema import List, Text, TextLine, Int

class ITODO(Interface):
    name = TextLine(title=u'Name', required=True)
    todo = List(title=u"TODO Items", required=True, value_type=TextLine(title=u'TODO'))
    daysleft = Int(title=u'Days left to complete', required=True)
    description = Text(title=u'Description', required=True)

lib/python/example/TODO.py

Now, we define TODO to be a persistent object, meaning it can be stored in the ZODB. We specify that it implements our previously-defined ITODO interface, and provide reasonable defaults for all values when we create a new TODO object:

from persistent import Persistent
from zope.interface import implements
from example.interfaces import ITODO

class TODO(Persistent):
    implements(ITODO)
    name = u''
    todo = []
    daysleft = 0
    description = u''
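
One thing worth knowing about class-level defaults such as todo = [] above: in Python, a mutable object assigned at class level is shared by every instance until an instance rebinds the attribute. The following plain-Python sketch (it deliberately does not use Persistent, and the two class names are invented for the illustration) shows the difference between a shared default and a per-instance default:

```python
class SharedDefault(object):
    todo = []                # one list object shared by all instances


class PerInstanceDefault(object):
    def __init__(self):
        self.todo = []       # a fresh list for each instance


a, b = SharedDefault(), SharedDefault()
a.todo.append(u'Do Laundry')     # mutates the list stored on the class
print(b.todo)                    # b sees the item added through a

c, d = PerInstanceDefault(), PerInstanceDefault()
c.todo.append(u'Do Laundry')     # only touches c's own list
print(d.todo)                    # d's list is still empty
```

Assigning a whole new list to the attribute (a.todo = [...]) is not affected by this, since it creates an instance attribute; only in-place changes such as append() hit the shared list.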

lib/python/example/configure.zcml

Create an empty configure.zcml configuration file:

<configure xmlns="http://namespaces.zope.org/zope"
     xmlns:five="http://namespaces.zope.org/five"
     xmlns:browser="http://namespaces.zope.org/browser">
</configure>

Debug Mode

We can test our first project by entering debug mode:

$ bin/zopectl debug
Starting debugger (the name "app" is bound to the top-level Zope object)

Now, let's try creating a new TODO object and writing it out to a ZODB database:

>>> from ZODB import FileStorage, DB
>>> storage = FileStorage.FileStorage('mydatabase.fs')
>>> db = DB(storage)
>>> connection = db.open()
>>> import transaction
>>> root = connection.root()
>>> from example.TODO import TODO
>>> a = TODO()
>>> a.name = u'My TODOs'
>>> a.todo = [ u'Do Laundry', u'Wash Dishes' ]
>>> a.daysleft = 1
>>> a.description = u'Things I need to do today.'
>>> root[u'today'] = a
>>> transaction.commit()