# Copyright (C) 2005-2014 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

Lookup in a Branch
----------------------------------------------------------------------
Since aufs has the character of a sub-VFS (see Introduction), it performs
lookups on its branches as the VFS does, which may be heavy work.
Generally speaking, struct nameidata is a large structure that carries a
lot of information. But almost every lookup operation in aufs is the
simplest case, i.e. looking up only an entry directly connected to its
parent. Digging down the directory hierarchy is unnecessary.

The VFS has a function lookup_one_len() for that use, but it cannot be
used for a branch filesystem which requires a struct nameidata. So aufs
implements a simple lookup wrapper function. When a branch filesystem
allows NULL as the nameidata, the wrapper calls lookup_one_len().
Otherwise it builds the simplest nameidata and calls lookup_hash().
Here aufs applies "a principle in NFSD", i.e. if a filesystem supports
NFS export, then it has to accept NULL as the nameidata parameter for
->create(), ->lookup() and ->d_revalidate(). So the lookup wrapper in
aufs tests whether ->s_export_op of the branch is NULL or not.
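
The decision the wrapper makes can be sketched as a user-space model.
This is not the aufs code: the structures are minimal stand-ins (only
the s_export_op field mirrors the real struct super_block), the 'last'
field of the mock nameidata and all sketch_* names are hypothetical, and
the real lookup_one_len() and lookup_hash() calls appear only in
comments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins. Only s_export_op mirrors a real field of the kernel's
 * struct super_block; everything else here is hypothetical. */
struct export_operations { int placeholder; };
struct super_block { const struct export_operations *s_export_op; };
struct nameidata { const char *last; unsigned int flags; };

enum lookup_path { PATH_LOOKUP_ONE_LEN, PATH_LOOKUP_HASH };

/*
 * "A principle in NFSD": an NFS-exportable filesystem must accept a NULL
 * nameidata in ->create(), ->lookup() and ->d_revalidate(), so the
 * wrapper treats a non-NULL s_export_op as permission to use
 * lookup_one_len().
 */
static enum lookup_path sketch_choose_lookup(const struct super_block *h_sb)
{
	return h_sb->s_export_op != NULL ? PATH_LOOKUP_ONE_LEN
					 : PATH_LOOKUP_HASH;
}

/* Hypothetical wrapper for one entry directly under its parent. */
static enum lookup_path sketch_lookup_one(const struct super_block *h_sb,
					  const char *name,
					  struct nameidata *nd)
{
	enum lookup_path path;

	assert(h_sb != NULL && name != NULL && nd != NULL);
	path = sketch_choose_lookup(h_sb);
	if (path == PATH_LOOKUP_HASH) {
		/* The simplest nameidata: only the final component,
		 * no flags and no path-walk state. */
		memset(nd, 0, sizeof(*nd));
		nd->last = name;
		/* ...the real wrapper would now call lookup_hash() */
	}
	/* otherwise: lookup_one_len(name, parent, strlen(name)) */
	return path;
}
```

On the lookup_one_len() path the nameidata is never touched, which is
the whole saving: no structure has to be built for exportable branches.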

When a branch is a remote filesystem, aufs basically trusts its
->d_revalidate(), and also forces the strictest revalidation tests on
it.
For d_revalidate, aufs implements three levels of revalidation tests.
See "Revalidate Dentry and UDBA" for details.


Test Only the Highest One for the Directory Permission (dirperm1 option)
----------------------------------------------------------------------
Let's look at a case study.
- aufs has two branches, upper readwrite and lower readonly.
  /au = /rw + /ro
- "dirA" exists under /ro but not under /rw, and its mode is 0700.
- a user invokes "chmod a+rx /au/dirA".
- the internal copy-up is activated, "/rw/dirA" is created, and its
  permission bits are set to world readable.
- does "/au/dirA" then become world readable?

In this case, /ro/dirA is still 0700 since it exists in a readonly
branch, which may even be a natively readonly filesystem. If aufs
respects the lower branch, it should not respond to readdir requests
from other users. But the user allowed it by chmod. Should aufs really
refuse to show the entries under /ro/dirA?

To be honest, I don't have a best solution for this case. So aufs
implements 'dirperm1' and 'nodirperm1' and leaves the choice to users.
When dirperm1 is specified, aufs checks the directory permission only on
the highest branch where the dir exists, and shows the entries.
Otherwise, as usual, aufs checks the dir on every branch where it exists
and rejects the request.

As a side effect, the dirperm1 option improves the performance of aufs,
because the number of permission checks is reduced when there are many
branches.
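
The two behaviours can be modelled as follows, assuming a simplified
permission test that looks only at the "other" r and x bits (a real
implementation would use the caller's credentials). The struct br_dir
type and the function names are hypothetical. Index 0 is the highest
branch, and the counter shows the reduced number of checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of one directory as seen on one branch. */
struct br_dir {
	bool exists;    /* does the dir exist on this branch? */
	mode_t mode;    /* its permission bits, e.g. 0700 */
};

/* Simplification: a non-owner needs "other" r and x to list the dir. */
static bool other_may_list(mode_t mode)
{
	return (mode & 0005) == 0005;
}

/* br[0] is the highest branch. Returns whether access is granted and
 * stores the number of per-branch checks in *nchecks. */
static bool dir_permission(const struct br_dir *br, size_t nbr,
			   bool dirperm1, size_t *nchecks)
{
	assert(nchecks != NULL);
	*nchecks = 0;
	for (size_t i = 0; i < nbr; i++) {
		if (!br[i].exists)
			continue;
		++*nchecks;
		if (!other_may_list(br[i].mode))
			return false;   /* one failing check rejects */
		if (dirperm1)
			return true;    /* dirperm1: the highest dir decides */
	}
	return *nchecks > 0;            /* no such dir anywhere: reject */
}
```

With the case study (/rw/dirA 0755 after copy-up, /ro/dirA still 0700),
dir_permission() grants access after one check when dirperm1 is true and
rejects the request after two checks otherwise.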


Loopback Mount
----------------------------------------------------------------------
Basically aufs supports any type of filesystem and block device as a
branch (actually there are some exceptions). But it is prohibited to add
a loopback-mounted filesystem whose backing file exists in a filesystem
which is already added to aufs. The reason is to protect aufs from a
recursive lookup. If it were allowed, an aufs lookup operation might
re-enter a lookup for the loopback-mounted branch in the same context,
which would cause a deadlock.
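
A user-space approximation of the overlap test, assuming "same
filesystem" can be judged by comparing st_dev from stat(2). The function
name is hypothetical, and aufs performs its own check inside the kernel.
The backing file of a loop device can be read from
/sys/block/loopN/loop/backing_file before calling it. Bind mounts share
st_dev and are therefore caught; a loop device stacked on another loop
device is not followed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: 1 if backing_file lives on the same filesystem
 * (same st_dev) as any branch root, 0 if it does not, and -1 if any path
 * cannot be stat'ed (the caller should then refuse, to stay safe).
 */
static int loop_backing_overlaps(const char *backing_file,
				 const char *const *branch_roots, size_t nbr)
{
	struct stat st_file, st_br;

	assert(backing_file != NULL && (branch_roots != NULL || nbr == 0));
	if (stat(backing_file, &st_file) != 0)
		return -1;
	for (size_t i = 0; i < nbr; i++) {
		if (stat(branch_roots[i], &st_br) != 0)
			return -1;
		if (st_br.st_dev == st_file.st_dev)
			return 1;       /* overlap: refuse the new branch */
	}
	return 0;
}
```

For example, a result of 1 for a hypothetical /rw/disk.img with /rw
among the branch roots means the loop mount of disk.img must not be
added as a branch.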


Revalidate Dentry and UDBA (User's Direct Branch Access)
----------------------------------------------------------------------
Generally, VFS helpers revalidate a dentry as part of a lookup:
0. dig down the directory hierarchy.
1. lock the parent dir by its i_mutex.
2. look up the final (child) entry.
3. revalidate it.
4. call the actual operation (create, unlink, etc.)
5. unlock the parent dir.
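
For reference, steps 1-5 can be modelled in user space for an
unlink-like operation. This is a single-threaded sketch: the struct dir
type and the helpers are hypothetical, the 'locked' flag stands in for
i_mutex, step 0 is assumed to be done by the caller, and revalidate() is
a placeholder for ->d_revalidate(). What matters is that steps 2, 3 and
4 all run while the same parent lock is held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-threaded model of a directory. */
struct dir {
	bool locked;                    /* stands in for i_mutex */
	const char *names[8];
	size_t n;
};

static void lock_dir(struct dir *d)   { assert(!d->locked); d->locked = true; }
static void unlock_dir(struct dir *d) { assert(d->locked); d->locked = false; }

/* Step 2: look up the final component; returns its slot or -1. */
static int lookup_child(const struct dir *d, const char *name)
{
	for (size_t i = 0; i < d->n; i++)
		if (strcmp(d->names[i], name) == 0)
			return (int)i;
	return -1;
}

/* Step 3: placeholder for ->d_revalidate(). In this model nothing can
 * change the dir while it is locked, so it only fails on a miss. */
static bool revalidate(const struct dir *d, int slot, const char *name)
{
	return slot >= 0 && (size_t)slot < d->n &&
	       strcmp(d->names[slot], name) == 0;
}

/* Steps 1-5; step 0 (walking down to 'parent') is done by the caller. */
static int do_unlink(struct dir *parent, const char *name)
{
	int err = 0;
	int slot;

	lock_dir(parent);                                   /* 1 */
	slot = lookup_child(parent, name);                  /* 2 */
	if (!revalidate(parent, slot, name)) {              /* 3 */
		err = -ENOENT;
		goto out;
	}
	parent->names[slot] = parent->names[--parent->n];   /* 4 */
out:
	unlock_dir(parent);                                 /* 5 */
	return err;
}
```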

If the filesystem implements ->d_revalidate() (step 3), it is called.
Actually, aufs implements it and checks that the dentry on a branch is
still valid.
But that is not enough, because aufs has to release the lock on the
parent dir on a branch at the end of ->lookup() (step 2) and
->d_revalidate() (step 3), while the i_mutex of the aufs dir is still
held by the VFS.
If the file on a branch is changed directly (bypassing aufs) after aufs
released the lock, then the subsequent operation may produce an
unpleasant result.

This situation is a result of the VFS architecture, in which ->lookup()
and ->d_revalidate() are separated. But I am not saying it is wrong. It
is a good design from the VFS's point of view. It is just not suitable
for the sub-VFS character of aufs.

Aufs handles such cases with three levels of revalidation, selectable by
the user.
1. Simple Revalidate
   In addition to the native VFS flow, aufs confirms the child-parent
   relationship on the branch just after locking the parent dir on the
   branch in the "actual operation" (step 4). When this validation
   fails, aufs returns EBUSY. ->d_revalidate() (step 3) in aufs still
   checks the validity of the dentry on branches.
2. Monitor Changes Internally by Inotify/Fsnotify
   In addition to the above, in the "actual operation" (step 4) aufs
   looks up the dentry on the branch again, and returns EBUSY if it
   finds a different dentry.
   Additionally, aufs sets an inotify/fsnotify watch on every dir on the
   branches while it is in the cache. When an event is notified, aufs
   registers a function with the kernel 'events' thread by
   schedule_work(). That function sets a special status in the private
   data of the cached aufs dentry and inode. If they are not cached,
   aufs has nothing to do. When the same file is later accessed through
   aufs (steps 0-3), aufs detects the status and refreshes all
   necessary data.
   In this mode, aufs has to ignore events fired by aufs itself.
3. No Extra Validation
   This is the simplest test: it adds no extra revalidation tests and
   skips the revalidation in step 4. It is useful and improves aufs
   performance when the system reliably hides the aufs branches from
   users, by over-mounting something (or by another method).
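
The three levels can be sketched as a user-space model of the step-4
check plus the deferred notification handling of level 2. Everything
here is hypothetical and simplified: struct h_dentry and struct h_dir
stand in for branch dentries and dirs, the level-1 child-parent check is
modelled as "still hashed and still under this parent", the enum names
are invented, an intrusive list stands in for schedule_work() and the
'events' thread, and refresh_if_stale() only clears the flag where aufs
would look the branches up again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry on a branch. */
struct h_dentry {
	const char *name;
	struct h_dentry *parent;
	bool unhashed;                  /* dropped, e.g. unlinked */
};

/* Hypothetical branch dir: its own dentry plus its children. */
struct h_dir {
	struct h_dentry self;
	bool locked;                    /* stands in for the dir's i_mutex */
	struct h_dentry *children[8];
	size_t n;
};

/* Invented names for levels 1, 2 and 3 in the text. */
enum udba_level { UDBA_REVAL, UDBA_NOTIFY, UDBA_NONE };

static struct h_dentry *h_lookup(const struct h_dir *dir, const char *name)
{
	for (size_t i = 0; i < dir->n; i++)
		if (strcmp(dir->children[i]->name, name) == 0)
			return dir->children[i];
	return NULL;
}

/* Start of step 4: lock the branch parent, then verify the dentry that
 * ->lookup()/->d_revalidate() produced while the lock was not held.
 * On success the parent stays locked for the actual operation. */
static int h_lock_and_verify(struct h_dir *dir, struct h_dentry *h_d,
			     enum udba_level lvl)
{
	assert(!dir->locked);
	dir->locked = true;
	if (lvl == UDBA_NONE)
		return 0;               /* level 3: no extra validation */
	if (h_d->unhashed || h_d->parent != &dir->self)
		goto busy;              /* level 1: child-parent check */
	if (lvl == UDBA_NOTIFY && h_lookup(dir, h_d->name) != h_d)
		goto busy;              /* level 2: re-lookup differs */
	return 0;
busy:
	dir->locked = false;
	return -EBUSY;
}

/* Level 2, notification side: private data of a cached aufs dentry. */
struct cached_info {
	bool stale;                     /* the "special status" */
	bool queued;                    /* already pending, like queued work */
	struct cached_info *next;
};

static struct cached_info *pending;   /* stands in for the work queue */

/* Notifier: only queue work, as schedule_work() defers it. */
static void on_branch_event(struct cached_info *cached, bool fired_by_aufs)
{
	if (fired_by_aufs || cached == NULL || cached->queued)
		return;                 /* own event, not cached, or pending */
	cached->queued = true;
	cached->next = pending;
	pending = cached;
}

/* The deferred work: set the special status on each queued entry. */
static void run_pending_work(void)
{
	while (pending != NULL) {
		struct cached_info *ci = pending;

		pending = ci->next;
		ci->queued = false;
		ci->stale = true;
	}
}

/* A later access through aufs (steps 0-3) detects the status. */
static bool refresh_if_stale(struct cached_info *ci)
{
	if (!ci->stale)
		return false;
	ci->stale = false;              /* aufs would re-look-up here */
	return true;
}
```

In the sketch, a rename made directly on the branch after ->lookup()
leaves the dentry's parent pointing elsewhere, so levels 1 and 2 make
h_lock_and_verify() return -EBUSY while level 3 lets the operation go
ahead; that is why level 3 only fits setups where users cannot reach the
branches.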