Hashbert – A file verification tool that calculates hashcodes differentially.

Synchronizing (creating / updating) the hashcode-file

Assume you have a hierarchy of files, such as an external hard drive used as a backup.

Go to the root level of that hierarchy and run:

hashbert sync

A file called hashcodes.txt will be created with four columns (hashcode, modTime, filesize and filename). Example:

d404401c8c6495b206fc35c95e55a6d5 1474896174 3 a.txt
bfcc9da4f2e1d313c63cd0a4ee7604e9 1474896180 3 dir1/b.txt
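To make the row format concrete, here is a minimal Python sketch that parses one row of hashcodes.txt. It is not taken from hashbert itself: the names Entry and parse_line are made up for illustration, and it assumes the columns are separated by single spaces, as in the example above.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One row of hashcodes.txt (hypothetical helper type)."""
    hashcode: str
    mod_time: int
    size: int
    filename: str


def parse_line(line: str) -> Entry:
    # Assumption: columns are separated by single spaces.
    # Split on the first three spaces only, so that a filename
    # containing spaces stays in one piece.
    hashcode, mod_time, size, filename = line.rstrip("\n").split(" ", 3)
    return Entry(hashcode, int(mod_time), int(size), filename)


entry = parse_line("bfcc9da4f2e1d313c63cd0a4ee7604e9 1474896180 3 dir1/b.txt")
print(entry.filename, entry.size)  # prints: dir1/b.txt 3
```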

As files are added to, changed in, or deleted from the hierarchy, run hashbert sync again to update hashcodes.txt. This time it will go much faster, since every file whose modTime and filesize haven't changed is skipped.
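The skip rule described above can be sketched in Python as follows. This is an illustration of the idea, not hashbert's actual code: the function name is_unchanged is hypothetical, and it assumes modTime is stored as whole seconds since the epoch, as the example rows suggest.

```python
import os


def is_unchanged(path, recorded_mod_time, recorded_size):
    """Return True if the file can be skipped during a sync.

    Assumption: modTime in hashcodes.txt is whole seconds since the epoch.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        # A file that no longer exists is never "unchanged".
        return False
    # Only when both values match is the recorded hashcode reused;
    # otherwise the file would have to be hashed again.
    return int(st.st_mtime) == recorded_mod_time and st.st_size == recorded_size
```

With a check like this, only new or modified files need their content read, which is where the time saving on a large backup drive comes from.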

Checking hashcodes

Later, when you want to check the files, run:

hashbert check

All the files listed in hashcodes.txt then have their hashcodes recalculated and compared with the recorded values, to see if any files have become corrupt.
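As a rough Python sketch of what such a check involves (not hashbert's implementation): read each row, recompute the hash of the named file and collect the files that differ. The function names file_hash and find_mismatches are hypothetical. The use of MD5 is an assumption, based only on the 32 hex digit hashcodes in the example rows; the algorithm is not stated here.

```python
import hashlib
import os


def file_hash(path, chunk_size=1 << 20):
    # Assumption: MD5, guessed from the 32 hex digit example hashcodes.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(hashcode_file="hashcodes.txt", directory="."):
    """Return the filenames whose recomputed hash differs from the record."""
    mismatches = []
    with open(hashcode_file, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            # Assumption: single-space separated columns, filename last.
            recorded, _mod_time, _size, filename = line.rstrip("\n").split(" ", 3)
            path = os.path.join(directory, filename)
            # A missing file is reported as a mismatch too.
            if not os.path.isfile(path) or file_hash(path) != recorded:
                mismatches.append(filename)
    return mismatches
```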

How to obtain:

(Note: only tested on Linux)

Download the files.
Compile with g++ -o hashbert hashbert.cc -lcrypto -std=gnu++17 -O3
- Note: the -lcrypto option requires an extra library to be installed:
  - On Debian 8.3 I needed to run sudo apt-get install libssl-dev
  - On Manjaro 17 I needed to install the boost package.
Make sure the hashbert executable can be found, i.e. that it is in a folder on the execution path. (On Debian/Manjaro I copy the file with the command: sudo cp hashbert /usr/local/bin/)

Usage

usage: hashbert <command> [<args>]

command:

sync: Synchronize (or create) hashcodes.txt.

check: Goes through hashcodes.txt row by row, recalculates each hashcode and reports any mismatch.

args:

--help Display link to this page

-f HASHCODEFILE Specify HASHCODEFILE (default hashcodes.txt)

-d DIRECTORY Specify the directory containing the files that will end up in hashcodes.txt (default: '.'). If you use relative paths (not starting with "/"), make sure you are in the right directory when you run the program. (If you are uncertain, open hashcodes.txt and see where the filenames start.)

args (available for "check" only):

--start N Start on row N.

Example of how I personally use the program (in combination with rsync)

Other notes

Resume interrupted process

If the "sync" process is interrupted, there will be a file with the suffix ".new.tmp" (hashcodes.txt.new.tmp) containing the hashcodes that had been calculated when the process was interrupted. You can rename that file to hashcodes.txt and restart the process, and it will continue where it left off.
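The rename step can also be scripted. The following Python sketch is only an illustration: the function name promote_partial_result is hypothetical, and it assumes the temporary file sits next to the hashcode file with ".new.tmp" appended to its name, as described above. Note that it overwrites any existing hashcodes.txt.

```python
import os


def promote_partial_result(hashcode_file="hashcodes.txt"):
    """Rename the leftover .new.tmp file so the next sync can resume from it.

    Returns True if a partial file was found and renamed, False otherwise.
    """
    partial = hashcode_file + ".new.tmp"
    if not os.path.exists(partial):
        return False  # nothing to resume from
    # os.replace overwrites an existing hashcode_file.
    os.replace(partial, hashcode_file)
    return True
```

After calling it, run hashbert sync again as usual.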