buvt-filter file format
Intro
The file is called ".buvt-filter" by default.
It can be placed anywhere in the source tree to add "rules" (a "rule" is a "control string" plus a pattern; see below). These rules instruct the parser on which parts of the tree (and which files) are to be included or excluded.
Notations
"rule row": A row in the .buvt-filter file that is "trimmed" (surrounding white space removed).
"current folder": Refers to the folder where the program is as it parses down the tree.
"rule folder": The folder in which a rule was declared (the folder containing the ".buvt-filter" file that contained the rule).
"pattern": The second part of each "rule row". Can be a simple string or a regular expression (Perl-like syntax, interpreted by Python).
"candidate": The part (substring) of the file (or folder) path that is matched against the pattern (marked green in the example below).
Parsing
As the program runs, it recursively goes through the tree.
The first thing that happens when a folder is entered is that a ".buvt-filter" file is looked for, and its rules are added to the beginning of the "rule array" (the first rule in the file goes to the first position of the array; see the example below).
The rules are ignored as sub folders are entered, unless the "s" control character is specified.
The rules are removed from the "rule array" as the last step when the folder is exited (going back to the parent folder).
The start position of the candidate string is determined by the "r" control character (see below).
As the tree is parsed, if the candidate string matches the pattern, the entry is included or excluded based on the "+"/"-" control character. As soon as a match is found, the remaining rules are skipped.
The default (if no rule matches) is that the entry is included.
Included folders are further examined, while excluded folders are skipped.
The target will be a subset of the source tree. (It is not possible to keep files on the target tree that are not found in the source tree.)
Check the source code of "[BUVT_FOLDER]/buvtPyScript/buvtPyScript.py" (TreeParser.getBranch()) for more details.
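The parsing steps above can be sketched as follows. This is a minimal illustration and not the code of TreeParser.getBranch(): the names Rule and walk, the in-memory tree (nested dicts instead of a real file system) and the restriction to plain leaf-name patterns are all assumptions made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical, simplified rule: plain-string pattern matched on the leaf name."""
    include: bool          # True for "+", False for "-"
    pattern: str           # compared with the name of the entry
    folders: bool = False  # True: rule applies to folders ("F"); False: to files ("f")
    sub: bool = False      # True if the "s" control character was given


def walk(tree, filters, path="", depth=0, rules=None, out=None):
    """Return the paths of all included files.

    tree:    nested dict; a folder maps to a dict, a file maps to None.
    filters: maps a folder path to the rules of its .buvt-filter file.
    """
    rules = [] if rules is None else rules
    out = [] if out is None else out

    # Entering the folder: its rules go to the beginning of the rule array,
    # keeping the order they have in the file.
    local = filters.get(path, [])
    rules[0:0] = [(depth, rule) for rule in local]

    for name, child in sorted(tree.items()):
        is_folder = isinstance(child, dict)
        included = True  # default if no rule matches
        for rule_depth, rule in rules:
            if rule.folders != is_folder:
                continue  # rule is for the other entry type
            if rule_depth != depth and not rule.sub:
                continue  # rule from a parent folder without "s"
            if rule.pattern == name:
                included = rule.include
                break  # first match wins, remaining rules are skipped
        if not included:
            continue  # excluded folders are not entered
        full = f"{path}/{name}"
        if is_folder:
            walk(child, filters, full, depth + 1, rules, out)
        else:
            out.append(full)

    # Leaving the folder: remove the rules that were added on entry.
    del rules[:len(local)]
    return out
```

For example, with the rules "-Fs d" and "- a.txt" declared in the top folder, a file "a.txt" in a sub folder is still included (the second rule has no "s"), while every folder called "d" is skipped.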
Comments and empty rows
Empty rows and comments (rows starting with "#") are ignored.
Rule rows
Each rule consists of two parts, separated by the first white space character:
1) The first part consists of control characters for the rule
2) The second part consists of the actual pattern.
By default a file/folder is included (if none of the rules' patterns match).
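As an illustration of how a row could be split, here is a minimal sketch. The function name parse_rule_row is hypothetical, and returning an empty pattern for a row without a second part is an assumption, not documented behaviour.

```python
def parse_rule_row(row):
    """Split one row of a .buvt-filter file into (control string, pattern).

    Returns None for empty rows and comment rows.
    """
    row = row.strip()  # "trim" the row
    if not row or row.startswith("#"):
        return None
    # Split at the first run of white space only, so that the pattern
    # itself may contain white space.
    parts = row.split(None, 1)
    control = parts[0]
    pattern = parts[1] if len(parts) > 1 else ""  # assumption: missing pattern -> ""
    return control, pattern
```

With this sketch, the row "+fsr A/a.txt" gives the control string "+fsr" and the pattern "A/a.txt".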
Control string (control characters)
The control string must always start with "+" or "-" (or "#"). All others can be left out, and the default value is used.
- Comment:
- "#": comment (everything remaining on the line is ignored)
- Include/exclude character:
- "+": include-rule: Files will be included in the array of files that are addressed (synced to database or synced to target tree). Folders will be entered, looking for more files.
- "-": exclude-rule: Opposite of "+" (files are not added to the array of files). Folders are not entered. On the target side, files/folders are deleted. (The target will be a subset of the source tree; it is not possible to keep files on the target tree that are not found in the source tree.)
Note: soft links (sym links) are treated as files (even soft links to folders) (see more below)
- Entry type:
- "f": the pattern is used on files (but not folders) (Default)
- "F": the pattern is used on folders (but not files)
- "B": the pattern is used on Both files and folders
Note 1: Having more than one of the above is not allowed.
Note 2: Soft links (sym links) are treated as files (even soft links to folders) (see more below)
- Usage of rule in sub folders:
- "s": The rule (pattern) is used in sub folders (as well as in the "rule folder").
- Default: the rule is ignored in sub folders.
- Starting point of the "candidate string": (See the green text in the example below.)
- "r": the starting point is in the "rule folder", (where the rule was added). (See example)
- Default: the starting point is in the "leaf-part" (the part of the entry path furthest to the right) of the entry that is examined. (See example)
- Pattern type:
- "R": the pattern is a regular expression (Perl style, interpreted by Python; see the Python documentation for the syntax).
- "G": the pattern is a glob-style regular expression (not really tested, so bugs may exist).
- Default: normal string matching.
Note: Having more than one of the above is not allowed.
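The control characters and their defaults could be decoded along these lines. This is only a sketch: the names Control and decode_control are hypothetical, and rejecting unknown characters with an error is an assumption about how strict the real parser is.

```python
from dataclasses import dataclass


@dataclass
class Control:
    include: bool                   # "+" -> True, "-" -> False
    entry_type: str = "f"           # "f" (default), "F" or "B"
    sub_folders: bool = False       # "s"
    from_rule_folder: bool = False  # "r"
    pattern_type: str = ""          # "" (plain string, default), "R" or "G"


def decode_control(control):
    """Decode a control string such as "+fsr" into a Control object."""
    if not control or control[0] not in "+-":
        raise ValueError('control string must start with "+" or "-"')
    result = Control(include=(control[0] == "+"))
    seen_entry_type = False
    for ch in control[1:]:
        if ch in ("f", "F", "B"):
            if seen_entry_type:
                raise ValueError("more than one entry type")
            seen_entry_type = True
            result.entry_type = ch
        elif ch in ("R", "G"):
            if result.pattern_type:
                raise ValueError("more than one pattern type")
            result.pattern_type = ch
        elif ch == "s":
            result.sub_folders = True
        elif ch == "r":
            result.from_rule_folder = True
        else:
            # Assumption: unknown characters are treated as an error.
            raise ValueError(f"unknown control character: {ch!r}")
    return result
```

A bare "-" therefore means: exclude, files only, not used in sub folders, candidate starts at the leaf part, plain string matching.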
Example to show how the starting point of the candidate string works:
Consider the following file tree with four files:
/home/john/A/.buvt-filter
/home/john/A/a.txt
/home/john/A/A/a.txt
/home/john/A/A/A/a.txt
Assume we have two rules in /home/john/A/.buvt-filter
+fsr A/a.txt
-fs a.txt
- The program loops over the files in an outer loop and the rules in an inner loop:
| fileCounter | File path (candidate string in green) | ruleCounter | Pattern | Candidate==Pattern | Comment                                                                | Included         |
| 0           | /home/john/A/.buvt-filter             | 0           | A/a.txt | No                 |                                                                        | Yes (by default) |
|             | /home/john/A/.buvt-filter             | 1           | a.txt   | No                 |                                                                        |                  |
| 1           | /home/john/A/a.txt                    | 0           | A/a.txt | No                 |                                                                        | No               |
|             | /home/john/A/a.txt                    | 1           | a.txt   | Yes                | The rule's "-" sign => exclusion.                                      |                  |
| 2           | /home/john/A/A/a.txt                  | 0           | A/a.txt | Yes                | The rule's "+" sign => inclusion.                                      | Yes              |
|             | /home/john/A/A/a.txt                  | (1)         | a.txt   | (Yes)              | Not actually tested (as the algorithm bails out after the first match) |                  |
| 3           | /home/john/A/A/A/a.txt                | 0           | A/a.txt | No                 |                                                                        | No               |
|             | /home/john/A/A/A/a.txt                | 1           | a.txt   | Yes                | The rule's "-" sign => exclusion.                                      |                  |
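The outcome of this example can be reproduced with a small sketch. It assumes that with "r" the candidate is the entry path relative to the rule folder, and without "r" it is the leaf part only; the names candidate, is_included, RULE_FOLDER and RULES are made up for the illustration. Both rules carry "s", so they apply in the sub folders as well.

```python
import posixpath

RULE_FOLDER = "/home/john/A"

# (sign, "r" given?, pattern) for the two rules of the example, in file order.
RULES = [
    ("+", True, "A/a.txt"),   # +fsr A/a.txt
    ("-", False, "a.txt"),    # -fs a.txt
]


def candidate(entry_path, rule_folder, from_rule_folder):
    """Return the part of entry_path that is compared with the pattern."""
    if from_rule_folder:
        # Assumption: "r" means the path relative to the rule folder.
        return posixpath.relpath(entry_path, rule_folder)
    # Default: only the leaf part (furthest to the right).
    return posixpath.basename(entry_path)


def is_included(entry_path):
    """Apply the rules in order; the first match decides."""
    for sign, from_rule_folder, pattern in RULES:
        if candidate(entry_path, RULE_FOLDER, from_rule_folder) == pattern:
            return sign == "+"
    return True  # default: included


for path in (
    "/home/john/A/.buvt-filter",
    "/home/john/A/a.txt",
    "/home/john/A/A/a.txt",
    "/home/john/A/A/A/a.txt",
):
    print(path, "included" if is_included(path) else "excluded")
```

Only /home/john/A/A/a.txt yields the candidate "A/a.txt" for the first rule; the other two a.txt files fall through to the second rule and are excluded.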
More examples (rsync vs buvt)
(I hope these examples are right, I'm not all that good on how rsync rules work)
.rsync-filter (to the left) vs .buvt-filter (to the right)
+ /a/              +F a                   // Matching a folder (called "a") in the same directory as the filter-file is located (but not sub folders)
+ /b.txt           + b.txt                // Like above but matching a file (called "b.txt") (see also exception for soft links below)
+ /c*                                     // Like above but using rsync's own regular-expression
                   +G c*                  // ... using a glob-style regular-expression
                   +R c.*                 // ... using perl-style regular expression (interpreted by python)
- d/               -Fs d                  // Matching folders called "d" in the same directory as the filter-file is located AND sub folders
- /*               -BR                    // Matching anything

+ /a/              +F a
+ /a/b/            +Fsr a/b
+ /a/b/c/          +Fsr a/b/c
+ /a/b/c/d/        +Fsr a/b/c/d
+ /a/b/c/d/e**
                   +BsrG a/b/c/d/e*
                   +BsrR a/b/c/d/e.*
- /**              -BsG
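To illustrate the difference between the three pattern types used in these examples, here is a hedged sketch. The function name matches is hypothetical, and it is an assumption that a regular expression has to match the whole candidate (re.fullmatch) and that glob patterns behave like Python's fnmatch; the real program may match differently.

```python
import fnmatch
import re


def matches(candidate, pattern, pattern_type=""):
    """Compare a candidate string with a pattern of the given type."""
    if pattern_type == "R":
        # Assumption: the regular expression must match the whole candidate.
        return re.fullmatch(pattern, candidate) is not None
    if pattern_type == "G":
        # Assumption: glob matching as done by fnmatch (case-sensitive).
        return fnmatch.fnmatchcase(candidate, pattern)
    # Default: normal string matching.
    return candidate == pattern
```

Under these assumptions "+G c*" and "+R c.*" both match a file called "cat.txt", while the plain pattern "c*" only matches a file that is literally called "c*".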
Soft links (sym links) are treated as files (even soft links to folders)
A soft link is in fact a small file containing the path to the target.
Thus it is matched against "file"-patterns (even soft links to folders).
A soft link to a folder (being treated as a file) is not parsed, even if it is matched by an include-rule.
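A check of this kind could look like the sketch below. The function name entry_kind is hypothetical; the point is only that the link test has to come before the folder test, because os.path.isdir() follows soft links.

```python
import os


def entry_kind(path):
    """Classify an entry as "file" or "folder" for rule matching."""
    if os.path.islink(path):
        return "file"    # soft links count as files, even links to folders
    if os.path.isdir(path):
        return "folder"
    return "file"
```

An entry classified as "file" is matched against "f" and "B" rules and is never entered.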