My archive-and-compression format with json compressed meta data

Intro

The actual files in archive-and-compression formats are compressed with pretty much established algorithms:

Text files (like txt, html ...) use DEFLATE.
Images and other binary files are generally left untouched because they are already compressed with their own formats (jpeg, gif...).

What remains is the meta data. By using a standardized data exchange format (like JSON) for the meta data, it becomes almost trivial to add your own meta data (or to change the whole JSON object entirely).

Of course, the other archive-and-compression formats (see List_of_archive_formats) use, as far as I can tell (I haven't checked them all), custom formats that most likely need less space for the meta data. But today the actual data (images, videos ...) is so big that the size of the meta data becomes negligible.

So to summarize, I use: established compression algorithms for the actual data; JSON for the meta data.
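To illustrate why text is deflated while images are left untouched, here is a minimal Python sketch. It assumes Python's standard zlib module (which wraps DEFLATE in a small zlib header) and uses random bytes as a stand-in for already-compressed data such as a JPEG; the sample text and the helper name ratio are made up for the illustration.

```python
import os
import zlib

# Repetitive text compresses well; random bytes behave like already-compressed data.
text = b"The oak is a tree in the genus Quercus. " * 50
noise = os.urandom(2000)  # stand-in for JPEG/GIF content (an assumption)

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (zlib-wrapped DEFLATE, level 9)."""
    return len(zlib.compress(data, 9)) / len(data)

text_ratio = ratio(text)
noise_ratio = ratio(noise)
print(f"text:  {text_ratio:.2f}")
print(f"noise: {noise_ratio:.2f}")
```

The text ratio should come out far below 1, while the ratio for the random bytes stays at roughly 1, so spending time on compressing such data gains nothing.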

Format description


Name	Offset (bytes)	Size (bytes)	Description
strID	0	4	String to identify the file type (In my implementation: "mZip")
intVersion	4	2	Version number
intSerializationFormat	6	1	Serialization method of objMeta (0=json ...)
intSize	7	8	Archive size (total including headers and footers) (set to 0 if the size is unknown)
File 0	15
File 1
...
objMeta	iMeta	(archive size)-iMeta-8	Serialized object of all the meta data of the files
iMeta	(archive size)-8	8	objMeta start position

The binary fields (intVersion, intSize and iMeta) are big-endian (see My view on Big or little endian).
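The layout above can be sketched in Python with the standard struct module. This is only a sketch under assumptions: the function names parse_header and locate_meta are hypothetical, the version number 1 is arbitrary, and intSize = 0 (unknown size) is not handled specially because the footer is located from the actual buffer length.

```python
import struct

# Big-endian, no padding:
# strID (4 bytes), intVersion (2), intSerializationFormat (1), intSize (8)
HEADER = struct.Struct(">4sHBQ")
FOOTER = struct.Struct(">Q")  # iMeta: start position of objMeta

def parse_header(buf: bytes) -> dict:
    """Decode the fixed header at the start of the archive."""
    str_id, version, ser_format, size = HEADER.unpack_from(buf, 0)
    if str_id != b"mZip":
        raise ValueError(f"unexpected strID {str_id!r}")
    return {"intVersion": version,
            "intSerializationFormat": ser_format,
            "intSize": size}

def locate_meta(buf: bytes):
    """Return (start, end) of the serialized objMeta inside buf."""
    end = len(buf) - FOOTER.size          # objMeta ends where the footer begins
    (i_meta,) = FOOTER.unpack_from(buf, end)
    if not HEADER.size <= i_meta <= end:
        raise ValueError("iMeta points outside the archive")
    return i_meta, end

# Tiny hand-built archive: header, 5 bytes of file data, a 2-byte objMeta, footer.
body = b"hello"
meta = b"{}"
i_meta = HEADER.size + len(body)
total = i_meta + len(meta) + FOOTER.size
archive = HEADER.pack(b"mZip", 1, 0, total) + body + meta + FOOTER.pack(i_meta)

header = parse_header(archive)
start, end = locate_meta(archive)
print(HEADER.size, header["intSize"], archive[start:end])
```

Because iMeta sits in the last 8 bytes, a reader can find objMeta by seeking to the end of the archive without scanning the file data.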

intSerializationFormat (how objMeta is serialized)

0: JSON (does not allow binary data)
1: ...

Several other Data exchange formats exist.

Using a binary format would allow storing, for example, thumbnails.

Standard fields of objMeta

Exactly how objMeta should look is really a question of its own.

The structure that I use in my implementation (link below):

{
  IStart:[15, 565],                 // Pointers to the files (byte offsets from the start of the archive).
  StrName:["oak.txt", "oak.jpg"],   // File names
  IntCompMethod:[1, 0],             // Compression method (like zip, see more below)
  IntTMod:[1474476071, 1474476071], // Last modification times (in unix time)
  IntSize:[2010, 4043],             // File sizes (uncompressed)
  IntCompSize:[550, 4043],          // File sizes (compressed)
  StrSha1:["fb4a947efc3d959858e32ee856b4011a7e01e4f6", "ff45ba2084497bc0cdf49dc0aac52816f3c48143"] // SHA-1 hashes
}
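As a rough illustration of how such an objMeta could be produced, the Python sketch below builds the column arrays while appending the file data, then adds header and footer. Several things here are assumptions and not taken from mZipper: the function names, the version number 1, the use of raw DEFLATE (no zlib header, as zip does), IStart being an absolute offset from the start of the archive, and the SHA-1 being computed over the uncompressed data.

```python
import hashlib
import json
import struct
import zlib

HEADER = struct.Struct(">4sHBQ")  # strID, intVersion, intSerializationFormat, intSize
FOOTER = struct.Struct(">Q")      # iMeta

def deflate_raw(data: bytes) -> bytes:
    """Raw DEFLATE stream (negative wbits = no zlib header); an assumption."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()

def build_archive(files):
    """files: iterable of (name, data, mtime, comp_method) tuples."""
    keys = ("IStart", "StrName", "IntCompMethod", "IntTMod",
            "IntSize", "IntCompSize", "StrSha1")
    meta = {key: [] for key in keys}
    body = bytearray()
    for name, data, mtime, method in files:
        if method == 0:
            stored = data                  # 0: no compression
        elif method == 1:
            stored = deflate_raw(data)     # 1: Deflate
        else:
            raise ValueError(f"unsupported IntCompMethod {method}")
        meta["IStart"].append(HEADER.size + len(body))
        meta["StrName"].append(name)
        meta["IntCompMethod"].append(method)
        meta["IntTMod"].append(mtime)
        meta["IntSize"].append(len(data))
        meta["IntCompSize"].append(len(stored))
        meta["StrSha1"].append(hashlib.sha1(data).hexdigest())
        body += stored
    i_meta = HEADER.size + len(body)
    obj_meta = json.dumps(meta).encode("utf-8")
    total = i_meta + len(obj_meta) + FOOTER.size
    archive = (HEADER.pack(b"mZip", 1, 0, total) + bytes(body)
               + obj_meta + FOOTER.pack(i_meta))
    return archive, meta

archive, meta = build_archive([
    ("oak.txt", b"An oak is a tree. " * 100, 1474476071, 1),
    ("oak.jpg", bytes(range(256)), 1474476071, 0),  # placeholder bytes, not a real JPEG
])
print(meta["IStart"], meta["IntSize"], meta["IntCompSize"])
```

Keeping one array per field means that adding a new field is just one more key in the object, and readers that do not know the key can ignore it.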

Other fields that one might want to use:

File permissions etc.
Thumbnails (A binary data exchange format would be recommended)
Newer hash codes (newer and better ones tend to come every now and then)

IntCompMethod:

(like in the zip-format)

0: No compression
1: Deflate
...
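A reader could dispatch on IntCompMethod with a small lookup table, as in the Python sketch below. The names DECODERS and decode_file are hypothetical, and treating method 1 as raw DEFLATE (no zlib header, as in zip) is an assumption; only methods 0 and 1 from the list above are covered.

```python
import zlib

def inflate_raw(data: bytes) -> bytes:
    """Decode a raw DEFLATE stream (negative wbits = no zlib header)."""
    return zlib.decompress(data, -15)

# IntCompMethod -> decoder; further methods would be added here.
DECODERS = {
    0: lambda raw: raw,   # 0: No compression
    1: inflate_raw,       # 1: Deflate
}

def decode_file(raw: bytes, method: int, expected_size: int) -> bytes:
    """Decode one stored file and check it against IntSize."""
    try:
        decoder = DECODERS[method]
    except KeyError:
        raise ValueError(f"unsupported IntCompMethod {method}") from None
    data = decoder(raw)
    if len(data) != expected_size:
        raise ValueError("decoded size does not match IntSize")
    return data

# Round trip with a raw-deflated sample.
original = b"acorn " * 200
comp = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = comp.compress(original) + comp.flush()
restored = decode_file(raw, 1, len(original))
print(len(raw), len(restored))
```

An unknown method number fails loudly here, so an old reader does not silently hand out compressed bytes as if they were file content.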

My implementation (mZipper)

Last mod: 2024-11-01
