The Format Specification Language¶
bread
reads binary data according to a format specification. Format
specifications are just lists. Each element in the list is called a field
descriptor. Field descriptors describe how to consume a piece of binary data
from the input, usually to create a field in the resulting object. Field
descriptors are consumed from the format specification one at a time until the
entire list has been consumed.
Field Descriptors¶
Most field descriptors will consume a certain amount of binary data and produce a value of a certain basic type.
Integers¶
intX(num_bits, signed)
- the next num_bits
bits represent an integer. If
signed
is True
, the integer is interpreted as a signed, twos-complement
number.
For convenience and improved readability, the following shorthands are defined:
Field Descriptor | Width (Bits) | Signed |
bit |
1 | no |
semi_nibble |
2 | no |
nibble |
4 | no |
byte |
8 | no |
uint8 |
8 | no |
uint16 |
16 | no |
uint32 |
32 | no |
uint64 |
64 | no |
int8 |
8 | yes |
int16 |
16 | yes |
int32 |
32 | yes |
int64 |
64 | yes |
Strings¶
string(length, encoding)
- the next length
bytes represent a string of the given length. You can pick an encoding for the strings to encode and decode in; the default is utf-8
.
Booleans¶
boolean
- the next bit represents a boolean value. 0 is False
, 1 is True
Enumerations¶
enum(length, values, default=None)
- the next length
bits represent one
of a set of values, whose values are given by the dictionary values
. If
default
is specified, it will be returned if the bits do not correspond to
any value in values
. Otherwise, it raises a ValueError
.
Here is an example of a 2-bit field representing a card suit:
import bread as b
("suit", b.enum(2, {
0: "diamonds",
1: "hearts",
2: "spades",
3: "clubs"
}))
Arrays¶
array(count, field_or_struct)
- the next piece of data is count
occurrences of field_or_struct
which, as the name might imply, can be
either a field (including another array) or a format specification.
Here’s an example way of representing a deck of playing cards:
import bread as b
# A card is made up of a 2-bit suit and a 4-bit card number
card = [
("suit", b.enum(2, {
0: "diamonds",
1: "hearts",
2: "spades",
3: "clubs"
})),
("number", b.intX(4))]
# A deck consists of 52 cards, for a total of 312 bits or 39 bytes of data
deck = [("cards", b.array(52, card))]
Field Options¶
A dictionary of field options can be specified as the last argument to any field. A dictionary of global field options can also be defined at the beginning of the format spec (before any fields). Options defined on fields override these global options.
The following field options are defined:
str_format
- the function that should be used to format a field in the structure’s human-readable representation. For example:>>> import bread as b # Format spec without str_format ... >>> simple_spec = [('addr', b.uint8)] >>> parsed_data = b.parse(bytearray([42]), simple_spec) >>> print parsed_data addr: 42 # ... and with str_format >>> simple_spec_hex = [('addr', b.uint8, {"str_format": hex})] >>> parsed_data = b.parse(bytearray([42]), simple_spec_hex) >>> print parsed_data addr: 0x2a
endianness
- for integer types, the endianness of the bytes that make up that integer. Can either beLITTLE_ENDIAN
orBIG_ENDIAN
. Default is little-endian.A simple example:
endianness_test = [ ("big_endian", b.uint32, {"endianness" : b.BIG_ENDIAN}), ("little_endian", b.uint32, {"endianness" : b.LITTLE_ENDIAN}), ("default_endian", b.uint32)] data = bytearray([0x01, 0x02, 0x03, 0x04] * 3) test = b.parse(data, endianness_test) >>> test.big_endian == 0x01020304 True >>> test.little_endian == 0x04030201 True >>> test.default_endian == test.little_endian True
offset
- for integer types, the amount to add to the number after it has been parsed. Specifying a negative number will subtract that amount from the number.
Conditionals¶
Conditionals allow the format specification to branch based on the value of a previous field. Conditional field descriptors are specified as follows:
(CONDITIONAL "field_name", options)
where field_name
is the name of the field whose value determines the course
of the conditional, and options
is a dictionary giving format
specifications to evaluate based on the field’s value.
This is perhaps best illustrated by example:
import bread as b
# There are three kinds of widgets: type A, type B and type C. Each has
# its own format spec.
widget_A = [...]
widget_B = [...]
widget_C = [...]
# A widget may be of any of the three types, determined by its type field
widget = [
("type", b.string(1)),
(b.CONDITIONAL, "type", {
"A": widget_A,
"B": widget_B,
"C": widget_C
})]
Padding¶
padding(num_bits)
- indicates that the next num_bits
bits should be
ignored. Useful in situations where only the first few bits of a byte are
meaningful, or where the format skips multiple bits or bytes.