My archive-and-compression format with json compressed meta data
Intro
The actual files in archive-and-compression formats are compressed with well-established algorithms:
- Text files (like txt, html ...) use DEFLATE.
- Images and other binary files are generally left untouched because they are already compressed with their own formats (jpeg, gif...).
What remains is the meta data. By using a standardized data exchange format (like JSON) for the meta data, it becomes almost trivial to add your own meta data (or to change the whole JSON object entirely).
Of course, the other archive-and-compression formats (see List_of_archive_formats) have, as far as I can tell (I haven't checked them all), custom formats that most likely use less space for the meta data. But today the actual data is so big (images, videos ...) that the size of the meta data becomes negligible.
So to summarize: I use established compression algorithms and json for the meta data.
Format description
Name | Offset (bytes) | Size (bytes) | Description
strID | 0 | 4 | String to identify the file type (in my implementation: "mZip")
intVersion | 4 | 2 | Version number
intSerializationFormat | 6 | 1 | Serialization method of objMeta (0=json ...)
intSize | 7 | 8 | Archive size (total including headers and footers) (set to 0 if the size is unknown)
File 0 | 15 | |
File 1 | | |
... | | |
objMeta | iMeta | (archive size)-iMeta-8 | Serialized object of all the meta data of the files
iMeta | (archive size)-8 | 8 | objMeta start position
The binary fields (intVersion, intSize and iMeta) are big endian (see my view on big or little endian).
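As a minimal sketch of the layout in the table (not taken from mZipper), the fixed header and the trailing iMeta field can be packed and unpacked with Python's struct module. The function names and the returned dictionary are my own assumptions; only the field names, sizes, offsets and byte order come from the table.

```python
import struct

# Fixed 15-byte header, big endian (">", no padding):
#   4s = strID, H = intVersion (2 bytes),
#   B  = intSerializationFormat (1 byte), Q = intSize (8 bytes).
HEADER = struct.Struct(">4sHBQ")
# Footer: iMeta, the 8-byte start position of objMeta.
FOOTER = struct.Struct(">Q")


def pack_header(version, serialization_format, archive_size=0, magic=b"mZip"):
    """Build the header; archive_size=0 means the size is unknown."""
    return HEADER.pack(magic, version, serialization_format, archive_size)


def parse_header(data):
    """Split the first 15 bytes of an archive into named fields."""
    magic, version, fmt, size = HEADER.unpack_from(data, 0)
    return {
        "strID": magic.decode("ascii"),
        "intVersion": version,
        "intSerializationFormat": fmt,
        "intSize": size,
    }


def parse_footer(data):
    """Return iMeta, which is stored in the last 8 bytes of the archive."""
    (i_meta,) = FOOTER.unpack_from(data, len(data) - FOOTER.size)
    return i_meta
```

Because the header is exactly 15 bytes long, the first file always starts at offset 15.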
intSerializationFormat (how objMeta is serialized)
- 0: JSON (does not allow binary data)
- 1: ...
Several other data exchange formats exist.
Using a binary format would allow storing, for example, thumbnails.
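To illustrate the limitation, here is a small sketch showing that Python's json module rejects raw bytes. Base64 is a common workaround (my suggestion, not part of the format) that costs roughly a third more space. The "Thumbnail" field name is hypothetical.

```python
import base64
import json

# 768 arbitrary bytes standing in for a small thumbnail image.
thumbnail = bytes(range(256)) * 3

# JSON has no binary type, so serializing raw bytes fails.
try:
    json.dumps({"Thumbnail": thumbnail})
    json_accepts_bytes = True
except TypeError:
    json_accepts_bytes = False

# Workaround: encode the bytes as base64 text (4 characters per 3 bytes).
encoded = base64.b64encode(thumbnail).decode("ascii")
text = json.dumps({"Thumbnail": encoded})

# Reading it back restores the original bytes.
restored = base64.b64decode(json.loads(text)["Thumbnail"])
```

A binary serialization format would avoid both the extra encoding step and the size overhead.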
Standard fields of objMeta
Exactly how objMeta should look is really a question of its own.
The structure that I use in my implementation (link below):
{
IStart:[15, 565], // Pointers to the files (byte offsets from the start of the archive; the first file follows the 15-byte header)
StrName:["oak.txt", "oak.jpg"], // File names
IntCompMethod:[1, 0], // Compression method (like zip, see more below)
IntTMod:[1474476071, 1474476071], // Last modification times (in unix time)
IntSize:[2010, 4043], // File sizes (uncompressed)
IntCompSize:[550, 4043], // File sizes (compressed)
StrSha1:["fb4a947efc3d959858e32ee856b4011a7e01e4f6", "ff45ba2084497bc0cdf49dc0aac52816f3c48143"] // SHA-1 hash codes (hex)
}
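The following sketch shows one way to assemble a whole archive in memory and to read objMeta back through the trailing iMeta pointer. It is not the mZipper code. The function names are hypothetical, and storing Deflate data as a raw stream (without the zlib wrapper) is my assumption, since the description does not say which framing is used. IntTMod and StrSha1 are left out to keep it short.

```python
import json
import struct
import zlib


def build_archive(files, version=1):
    """files: list of (name, data, comp_method); 0 = none, 1 = Deflate."""
    meta = {"IStart": [], "StrName": [], "IntCompMethod": [],
            "IntSize": [], "IntCompSize": []}
    body = bytearray(15)  # placeholder for the 15-byte header
    for name, raw, method in files:
        if method == 1:
            # Assumption: raw DEFLATE stream (wbits=-15, no zlib header).
            comp = zlib.compressobj(wbits=-15)
            stored = comp.compress(raw) + comp.flush()
        else:
            stored = raw
        meta["IStart"].append(len(body))
        meta["StrName"].append(name)
        meta["IntCompMethod"].append(method)
        meta["IntSize"].append(len(raw))
        meta["IntCompSize"].append(len(stored))
        body += stored
    i_meta = len(body)                         # objMeta start position
    body += json.dumps(meta).encode("utf-8")   # objMeta (format 0 = JSON)
    body += struct.pack(">Q", i_meta)          # iMeta footer
    # The total size is now known, so fill in the real header.
    body[0:15] = struct.pack(">4sHBQ", b"mZip", version, 0, len(body))
    return bytes(body)


def read_meta(archive):
    """Locate objMeta through the last 8 bytes and decode it."""
    (i_meta,) = struct.unpack(">Q", archive[-8:])
    return json.loads(archive[i_meta:-8].decode("utf-8"))
```

Since objMeta sits at the end, files can be appended in a single pass and the meta data written last; a reader only needs the final 8 bytes to find it.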
Other fields that one might want to use:
- File permissions etc.
- Thumbnails (A binary data exchange format would be recommended)
- Newer and better hash codes (new ones tend to appear every now and then).
IntCompMethod:
(one method code per file, like in the zip format; note that zip itself numbers Deflate as 8, not 1)
- 0: No compression
- 1: Deflate
- ...
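A reader could dispatch on IntCompMethod roughly as follows. This is a sketch under assumptions: the function name is hypothetical, Deflate data is assumed to be a raw stream (no zlib wrapper), and the size and SHA-1 checks are optional extras built from the standard objMeta fields.

```python
import hashlib
import zlib


def extract(archive, meta, index):
    """Return the uncompressed bytes of file number `index`."""
    start = meta["IStart"][index]
    stored = archive[start:start + meta["IntCompSize"][index]]
    method = meta["IntCompMethod"][index]
    if method == 0:      # 0: no compression
        raw = stored
    elif method == 1:    # 1: Deflate (assumed raw stream, wbits=-15)
        raw = zlib.decompress(stored, wbits=-15)
    else:
        raise ValueError("unsupported IntCompMethod: %d" % method)
    # Sanity checks against the stored meta data.
    if len(raw) != meta["IntSize"][index]:
        raise ValueError("size mismatch for file %d" % index)
    if "StrSha1" in meta:
        if hashlib.sha1(raw).hexdigest() != meta["StrSha1"][index]:
            raise ValueError("SHA-1 mismatch for file %d" % index)
    return raw
```

Unknown method codes raise an error instead of returning the stored bytes, so that a newer archive is not silently misread by an older reader.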
My implementation (mZipper)