buvt, parsing and filter files

buvt-filter file format

Intro

The file is called ".buvt-filter" by default.

Filter files can be placed anywhere in the source tree to add rules (a rule is a "control string" followed by a pattern, see below). These rules instruct the parser which parts of the tree (folders and files) are to be included or excluded.

Parsing

As the program parses the tree, rules are added to the beginning of the "rule array" as they are found.

The first thing that happens when a folder is entered is that a ".buvt-filter" file is looked for. If one is found, its rules are added to the beginning of the "rule array".

The rules are kept in the array as sub folders are entered (although a rule may or may not be used in sub folders depending on control character 3 (see below)).

The rules are removed as the last step when the folder is exited (going back to the parent folder).
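
The enter/exit bookkeeping described above can be sketched roughly as follows. This is only an illustration, not the code in buvtPyScript.py: the names RuleArray, enter_folder and exit_folder are hypothetical, rules are kept as plain strings, and the "-F tmp" rule is made up for the demonstration.

```python
class RuleArray:
    """Hypothetical sketch of the "rule array" bookkeeping."""

    def __init__(self):
        self.rules = []    # current rules; the innermost folder's rules come first
        self._counts = []  # number of rules contributed by each open folder

    def enter_folder(self, folder_rules):
        # Rules from the folder's filter file go to the beginning of the array,
        # so they are tested before the rules inherited from parent folders.
        self.rules[0:0] = folder_rules
        self._counts.append(len(folder_rules))

    def exit_folder(self):
        # Last step on the way back to the parent: drop this folder's rules.
        count = self._counts.pop()
        del self.rules[:count]


rule_array = RuleArray()
rule_array.enter_folder(["+fsr A/a.txt", "-fs a.txt"])  # rules found in /home/john/A
rule_array.enter_folder(["-F tmp"])                     # a sub folder with its own filter file
print(rule_array.rules)   # ['-F tmp', '+fsr A/a.txt', '-fs a.txt']
rule_array.exit_folder()
print(rule_array.rules)   # ['+fsr A/a.txt', '-fs a.txt']
```

A folder without a filter file would simply call enter_folder([]), so that every enter has a matching exit.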


As the tree is parsed, each entry (folder/file) is matched against the current rule array from start to end. As soon as a match is found, the remaining rules are skipped.

The default (if no rule matches) is that the entry is included.

Included folders are further examined, while excluded folders are skipped.
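
A minimal sketch of the "first match wins, include by default" behaviour follows. It is deliberately simplified: the tree is a nested dict instead of a real file system, the rule array is fixed instead of growing per folder, and each rule is a hypothetical (include, predicate) pair rather than a parsed rule row. The names decide and walk are not taken from buvt.

```python
def decide(entry_name, rules):
    """Return True if the entry is included, False if it is excluded."""
    for include, matches in rules:
        if matches(entry_name):
            return include  # first match wins, remaining rules are skipped
    return True  # no rule matched: included by default


def walk(tree, rules, prefix=""):
    """Yield the paths of all included entries (a value of None marks a file)."""
    for name, children in tree.items():
        if not decide(name, rules):
            continue  # excluded files are dropped, excluded folders are not entered
        path = prefix + name
        yield path
        if children is not None:
            yield from walk(children, rules, path + "/")


tree = {
    "src": {"main.py": None, "main.pyc": None},
    "cache": {"x.bin": None},
    "README": None,
}
rules = [
    (True, lambda name: name == "main.pyc"),       # include-rule, tested first
    (False, lambda name: name.endswith(".pyc")),   # never reached for main.pyc
    (False, lambda name: name == "cache"),         # excluded folder, not entered
]
print(list(walk(tree, rules)))
# ['src', 'src/main.py', 'src/main.pyc', 'README']
```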


The target will be a subset of the source tree. (It is not possible to keep files on the target tree that are not found in the source tree.)


Check the source code of "[BUVT_FOLDER]/buvtPyScript/buvtPyScript.py" (TreeParser.getBranch()) for more details.

Notations

"rule row": A row in the .buvt-filter file, that is "trimmed" (surrounding white space removed)

"current folder": refers to the folder where the program is as it parses down the tree.

"rule folder": the folder in which a rule was declared. (The folder containing the ".buvt-filter"-file containing the rule.)

"pattern": the second part of each "rule row".

"candidate": the part (sub string) of the file- (folder-) path that is matched to the pattern (marked green in the example below).

Comments and empty rows

Empty rows and comments (rows starting with "#") are ignored.

Rule rows

Each rule consists of two parts, separated by the first space character:

1) The first part consists of control characters for the rule

2) The second part consists of the actual pattern.
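
Reading a filter file into (control string, pattern) pairs could look roughly like this. The function name parse_filter_text is hypothetical; the sketch only applies what is stated above (rows are trimmed, empty rows and rows starting with "#" are ignored, and the row is split at the first space).

```python
def parse_filter_text(text):
    """Return a list of (control_string, pattern) pairs, in file order."""
    rules = []
    for row in text.splitlines():
        row = row.strip()                    # a "rule row" is the trimmed row
        if not row or row.startswith("#"):
            continue                         # empty rows and comments are ignored
        control, _, pattern = row.partition(" ")  # split at the first space only
        rules.append((control, pattern))
    return rules


example = """
# rules for /home/john/A
+fsr A/a.txt
   -fs a.txt
+f my file.txt
"""
print(parse_filter_text(example))
# [('+fsr', 'A/a.txt'), ('-fs', 'a.txt'), ('+f', 'my file.txt')]
```

Because only the first space separates the two parts, a pattern may itself contain spaces.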


By default a file/folder is included (if none of the rule patterns match).

Control string (control characters)

Summary
Char number   Possible values
1             # + -
2             f F B
3             _ s
4             _ r
5             _ r

("_" means the character can be left out if it is at the end of the control string.)

Characters 3, 4 and 5 are converted to lower case before being interpreted.


The control characters are interpreted as below:

1st character

"#": comment (everything remaining on the line is ignored)

"+": include-rule: Files will be "synced". Folders will be entered, looking for more files.

"-": exclude-rule: Files will not be "synced". Folders will not be entered. On the target side, excluded files/folders are deleted, because the target is always a subset of the source tree (it is not possible to keep files on the target tree that are not found in the source tree).

Note: soft links (symlinks) are treated as files, even soft links to folders (see more below).

2nd character

"f": the pattern is used on files (but not folders) (Default)

"F": the pattern is used on folders (but not files)

"B": the pattern is used on Both files and folders

Note: soft links (symlinks) are treated as files, even soft links to folders (see more below).

3rd character (Use rule in sub folders)

"_" (default): The rule (pattern) IS NOT used in sub folders (is ignored in sub folders).

"s": The rule (pattern) IS used in sub folders (as well as in the "rule folder")
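
The effect of the 3rd character can be expressed as a small check. This is a sketch with a hypothetical function name (rule_is_active); it assumes POSIX-style paths and that "sub folders" means sub folders at any depth below the "rule folder".

```python
from pathlib import PurePosixPath


def rule_is_active(rule_folder, current_folder, use_in_subfolders):
    """Return True if a rule declared in rule_folder is used in current_folder."""
    rule_folder = PurePosixPath(rule_folder)
    current_folder = PurePosixPath(current_folder)
    if current_folder == rule_folder:
        return True  # a rule is always used in its own "rule folder"
    # With "s", the rule is also used anywhere below the rule folder.
    return use_in_subfolders and rule_folder in current_folder.parents


print(rule_is_active("/home/john/A", "/home/john/A", False))      # True
print(rule_is_active("/home/john/A", "/home/john/A/A", False))    # False
print(rule_is_active("/home/john/A", "/home/john/A/A/A", True))   # True
```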

4th character (Starting point of the "candidate string")

"_" (default): the starting point is in the "current folder". (where the algorithm is) (See the green text in the example below.)

"r": the starting point is in the "Rule folder". (where the rule was added) (See the green text in the example below.)

Example:

Consider the following file tree with four files:

/home/john/A/┬A/───────────────┬A/────────┐
             ├.buvt-filter     └a.txt     └a.txt
             └a.txt
Assume we have two rules in /home/john/A/.buvt-filter
+fsr A/a.txt
-fs a.txt
The program loops over the files in an outer loop and the rules in an inner loop:

fileCounter  File path                  ruleCounter  Pattern  Candidate (the "green" text)  Match  Included  Comment
0            /home/john/A/.buvt-filter  0            A/a.txt  .buvt-filter                  No     Yes
0            /home/john/A/.buvt-filter  1            a.txt    .buvt-filter                  No     Yes       Included by default
1            /home/john/A/a.txt         0            A/a.txt  a.txt                         No     No
1            /home/john/A/a.txt         1            a.txt    a.txt                         Yes    No        Excluded since the rule has a "-" sign.
2            /home/john/A/A/a.txt       0            A/a.txt  A/a.txt                       Yes    Yes       Included since the rule has a "+" sign.
2            /home/john/A/A/a.txt       (1)          a.txt    a.txt                         (Yes)  Yes       Not tested (as the algorithm bails out after the first match)
3            /home/john/A/A/A/a.txt     0            A/a.txt  A/A/a.txt                     No     No
3            /home/john/A/A/A/a.txt     1            a.txt    a.txt                         Yes    No        Excluded since the rule has a "-" sign.
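
The example can be replayed with a few lines of Python. This is a sketch, not the buvt implementation: the helper names candidate and is_included are hypothetical, and it assumes that a plain-string pattern matches only when it is equal to the whole candidate string (which agrees with the results above but is not stated explicitly). Both rules carry an "s", so they are active in every folder of the example.

```python
from pathlib import PurePosixPath

RULE_FOLDER = "/home/john/A"  # folder containing the .buvt-filter file

# (include, start_in_rule_folder, pattern)
RULES = [
    (True, True, "A/a.txt"),   # +fsr A/a.txt
    (False, False, "a.txt"),   # -fs a.txt
]

FILES = [
    "/home/john/A/.buvt-filter",
    "/home/john/A/a.txt",
    "/home/john/A/A/a.txt",
    "/home/john/A/A/A/a.txt",
]


def candidate(file_path, start_in_rule_folder):
    """Return the part of the path that is compared with the pattern."""
    path = PurePosixPath(file_path)
    if start_in_rule_folder:
        return str(path.relative_to(RULE_FOLDER))  # 4th character "r"
    return path.name  # default: start in the "current folder"


def is_included(file_path):
    for include, start_in_rule_folder, pattern in RULES:
        # Assumption: plain-string patterns must equal the whole candidate.
        if candidate(file_path, start_in_rule_folder) == pattern:
            return include  # first match wins
    return True  # included by default


for file_path in FILES:
    print(file_path, is_included(file_path))
```

Only "/home/john/A/A/a.txt" produces the candidate "A/a.txt" for the first rule, which is why it is the only a.txt that survives the second rule.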

5th character (Regular expression)

"_" (default): the pattern is a normal string.

"r": the pattern is a regular expression (javascript).
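
How the 5th character changes the comparison might be sketched as below. Several things here are assumptions: the function name pattern_matches is hypothetical, plain strings are compared for equality, and the regular expression is applied as an unanchored search (similar to test() in JavaScript). Python's re module is used for illustration; its syntax differs from JavaScript's in some details.

```python
import re


def pattern_matches(pattern, candidate, is_regex):
    """Return True if the candidate string matches the pattern."""
    if is_regex:
        # Assumption: unanchored search; use ^ and $ to anchor explicitly.
        return re.search(pattern, candidate) is not None
    # Assumption: a normal string must equal the whole candidate.
    return pattern == candidate


print(pattern_matches("c.*", "c.txt", is_regex=False))  # False, "c.*" is taken literally
print(pattern_matches("c.*", "c.txt", is_regex=True))   # True
print(pattern_matches("^c.*", "abc", is_regex=True))    # False, anchored at the start
print(pattern_matches("", "anything", is_regex=True))   # True, an empty expression matches everything
```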

Default characters at the end of the control string can be left out.
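
Putting the five control characters together, a parser for the control string could look like the sketch below. The names Control and parse_control are hypothetical. It assumes that a missing 2nd character falls back to "f" (documented above as the default) and that control strings longer than five characters are rejected; neither detail is confirmed by the text.

```python
from dataclasses import dataclass


@dataclass
class Control:
    include: bool               # 1st character: "+" (True) or "-" (False)
    applies_to: str             # 2nd character: "f", "F" or "B"
    use_in_subfolders: bool     # 3rd character: "s"
    start_in_rule_folder: bool  # 4th character: "r"
    is_regex: bool              # 5th character: "r"


def parse_control(control):
    if not control or control[0] not in "+-" or len(control) > 5:
        raise ValueError(f"bad control string: {control!r}")
    padded = control.ljust(5, "_")  # trailing defaults may be left out
    applies_to = "f" if padded[1] == "_" else padded[1]  # assumed fallback
    tail = padded[2:].lower()       # characters 3 to 5 are case-insensitive
    if (applies_to not in ("f", "F", "B") or tail[0] not in "_s"
            or tail[1] not in "_r" or tail[2] not in "_r"):
        raise ValueError(f"bad control string: {control!r}")
    return Control(
        include=padded[0] == "+",
        applies_to=applies_to,
        use_in_subfolders=tail[0] == "s",
        start_in_rule_folder=tail[1] == "r",
        is_regex=tail[2] == "r",
    )


print(parse_control("+fsr"))
print(parse_control("-B__r"))
```

Rows starting with "#" are comments and are expected to be filtered out before this function is called.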

More examples (rsync vs buvt)

(I hope these examples are right; I am not all that familiar with how rsync rules work.)

.rsync-filter (to the left) vs .buvt-filter (to the right)

+ /a/            +F a         // Matching a folder (called "a") in the same directory as the filter-file is located (but not sub folders)
+ /b.txt         +f b.txt     // Like above but matching a file (called "b.txt") (see also exception for soft links below)
+ /c*            +f__r c.*    // Like above but using a glob-regular-expression (in case of rsync) or regular expression (in case of buvt)
- d/             -Fs d        // Matching folders called "d" in the same directory as the filter-file is located AND sub folders
- /*             -B__r        // Matching anything

+ /a/            +F a
+ /a/b/          +Fsr a/b
+ /a/b/c/        +Fsr a/b/c
+ /a/b/c/d/      +Fsr a/b/c/d
+ /a/b/c/d/e**   +Bsrr a/b/c/d/e.*
- /**            -Bs_r

Soft links (sym links) are treated as files (even soft links to folders)

A soft link is in fact a small file containing the path to the target.

Thus it is matched against "file"-patterns (even soft links to folders).

A soft link to a folder (being treated as a file) is not parsed, even if it is matched by an include-rule.
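
In Python, the distinction can be made by checking for a link before checking for a directory. The function name entry_kind is hypothetical, and this is only a sketch of the idea, not necessarily how buvtPyScript.py does it. The demonstration creates a temporary folder and a soft link to it, which may require extra privileges on some platforms (for example Windows).

```python
import os
import tempfile


def entry_kind(path):
    """Return "file" or "folder"; soft links always count as files."""
    # islink() must be tested first, because isdir() follows soft links
    # and would report a link to a folder as a folder.
    if os.path.islink(path):
        return "file"
    if os.path.isdir(path):
        return "folder"
    return "file"


with tempfile.TemporaryDirectory() as root:
    real_dir = os.path.join(root, "real")
    os.mkdir(real_dir)
    os.symlink(real_dir, os.path.join(root, "link"))  # soft link to a folder
    kinds = {name: entry_kind(os.path.join(root, name))
             for name in sorted(os.listdir(root))}

print(kinds)  # {'link': 'file', 'real': 'folder'}
```

A tree walker that only descends into entries of kind "folder" will therefore never follow a soft link, which matches the behaviour described above.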