========================================== Blockfile and Hosts Database Specification ========================================== Category: Formats Last updated: 2023-11 Accurate for: 0.9.59 Overview ======== This document specifies the I2P blockfile file format and the tables in the hostsdb.blockfile used by the Blockfile Naming Service [NAMING]. The blockfile provides fast Destination lookup in a compact format. While the blockfile page overhead is substantial, the destinations are stored in binary rather than in Base 64 as in the hosts.txt format. In addition, the blockfile provides the capability of arbitrary metadata storage (such as added date, source, and comments) for each entry. The metadata may be used in the future to provide advanced addressbook features. The blockfile storage requirement is a modest increase over the hosts.txt format, and the blockfile provides approximately 10x reduction in lookup times. A blockfile is simply on-disk storage of multiple sorted maps (key-value pairs), implemented as skiplists. The blockfile format is adopted from the Metanotion Blockfile Database [METANOTION]. First we will define the file format, then the use of that format by the BlockfileNamingService. Blockfile Format ================ The original blockfile spec was modified to add magic numbers to each page. The file is structured in 1024-byte pages. Pages are numbered starting from 1. The "superblock" is always at page 1, i.e. starting at byte 0 in the file. The metaindex skiplist is always at page 2, i.e. starting at byte 1024 in the file. All 2-byte integer values are unsigned. All 4-byte integer values (page numbers) are signed and negative values are illegal. All integer values are stored in network byte order (big endian). The database is designed to be opened and accessed by a single thread. The BlockfileNamingService provides synchronization. Superblock format: Byte Contents 0-5 Magic number 0x3141de493250 ("1A" 0xde "I2P") 6 Major version 0x01 7 Minor version 0x02 8-15 File length Total length in bytes 16-19 First free list page 20-21 Mounted flag 0x01 = yes 22-23 Span size Max number of key/value pairs per span (16 for hostsdb) Used for new skip lists. 24-27 Page size As of version 1.2. Prior to 1.2, 1024 is assumed. 28-1023 unused Skip list block page format: Byte Contents 0-7 Magic number 0x536b69704c697374 "SkipList" 8-11 First span page 12-15 First level page 16-19 Size (total number of keys - may only be valid at startup) 20-23 Spans (total number of spans - may only be valid at startup) 24-27 Levels (total number of levels - may only be valid at startup) 28-29 Span size - As of version 1.2. Max number of key/value pairs per span. Prior to that, specified for all skiplists in the superblock. Used for new spans in this skip list. 30-1023 unused Skip level block page format is as follows. All levels have a span. Not all spans have levels. Byte Contents 0-7 Magic number 0x42534c6576656c73 "BSLevels" 8-9 Max height 10-11 Current height 12-15 Span page 16- Next level pages ('current height' entries, 4 bytes each, lowest first) remaining bytes unused Skip span block page format is as follows. Key/value structures are sorted by key within each span and across all spans. Key/value structures are sorted by key within each span. Spans other than the first span may not be empty. Byte Contents 0-3 Magic number 0x5370616e "Span" 4-7 First continuation page or 0 8-11 Previous span page or 0 12-15 Next span page or 0 16-17 Max keys (16 for hostsdb) 18-19 Size (current number of keys) 20-1023 key/value structures Span Continuation block page format: Byte Contents 0-3 Magic number 0x434f4e54 "CONT" 4-7 Next continuation page or 0 8-1023 key/value structures Key/value structure format is as follows. Key and value lengths must not be split across pages, i.e. all 4 bytes must be on the same page. If there is not enough room the last 1-3 bytes of a page are unused and the lengths will be at offset 8 in the continuation page. Key and value data may be split across pages. Max key and value lengths are 65535 bytes. Byte Contents 0-1 key length in bytes 2-3 value length in bytes 4- key data value data Free list block page format: Byte Contents 0-7 Magic number 0x2366724c69737423 "#frList#" 8-11 Next free list block or 0 if none 12-15 Number of valid free pages in this block (0 - 252) 16-1023 Free pages (4 bytes each), only the first (valid number) are valid Free page block format: Byte Contents 0-7 Magic number 0x7e2146524545217e "~!FREE!~" 8-1023 unused The metaindex (located at page 2) is a mapping of US-ASCII strings to 4-byte integers. The key is the name of the skiplist and the value is the page index of the skiplist. Blockfile Naming Service Tables =============================== The tables created and used by the BlockfileNamingService are as follows. The maximum number of entries per span is 16. Properties Skiplist ------------------- "%%__INFO__%%" is the main database skiplist with String/Properties key/value entries containing only one entry: info a Properties (UTF-8 String/String Map), serialized as a [Mapping] : version "4" created Java long time (ms) upgraded Java long time (ms) (as of database version 2) lists Comma-separated list of host databases, to be searched in-order for lookups. Almost always "privatehosts.txt,userhosts.txt,hosts.txt". listversion_* The version of each database in lists, for example: listversion_hosts.txt=4. Used to identify partial or aborted upgrade of individual lists. (as of database version 4) Reverse Lookup Skiplist ----------------------- "%%__REVERSE__%%" is the reverse lookup skiplist with Integer/Properties key/value entries (as of database version 2): * The skiplist keys are 4-byte Integers, the first 4 bytes of the hash of the [Destination]. * The skiplist values are each a Properties (a UTF-8 String/String Map) serialized as a [Mapping] * There may be multiple entries in the properties, each one is a reverse mapping, as there may be more than one hostname for a given destination, or there could be collisions with the same first 4 bytes of the hash. * Each property key is a hostname. * Each property value is the empty string. hosts.txt, userhosts.txt, and privatehosts.txt Skiplists -------------------------------------------------------- For each host database, there is a skiplist containing the hosts for that database. Note that the version 4 format supports multiple Destinations per hostname. This format was introduced in I2P release 0.9.26. Version 3 databases are automatically migrated to verrsion 4. The keys/values in these skiplists are as follows: key a UTF-8 String (the hostname) value Database version 4: A DestEntry, which is: A one-byte number of Properties/Destination pairs to follow That number of pairs of: A Properties (a UTF-8 String/String Map) serialized as a [Mapping] followed by a binary [Destination] (serialized as usual). Database version 3: a DestEntry, which is a Properties (a UTF-8 String/String Map) serialized as a [Mapping] followed by a binary [Destination] (serialized as usual). The DestEntry Properties typically contains: "a" The time added (Java long time in ms) "m" The time last modified (Java long time in ms) "notes" User-supplied comments "s" The original source of the entry (typically a file name or subscription URL) "v" If the signature of the entry was verified, "true" or "false" Hostname keys are stored in lower-case and always end in ".i2p". References ========== [Destination] http://i2p.net./spec/common-structures#struct-destination [Mapping] http://i2p.net./spec/common-structures#type-mapping [METANOTION] http://www.metanotion.net/software/sandbox/block.html [NAMING] http://i2p.net./en/docs/naming