.. role:: py(code) :language: python :class: highlight The Format Specification Language ================================= ``bread`` reads binary data according to a *format specification*. Format specifications are just lists. Each element in the list is called a *field descriptor*. Field descriptors describe how to consume a piece of binary data from the input, usually to create a *field* in the resulting object. Field descriptors are consumed from the format specification one at a time until the entire list has been consumed. Field Descriptors ----------------- Most field descriptors will consume a certain amount of binary data and produce a value of a certain basic type. Integers ~~~~~~~~ ``intX(num_bits, signed)`` - the next ``num_bits`` bits represent an integer. If ``signed`` is ``True``, the integer is interpreted as a signed, twos-complement number. For convenience and improved readability, the following shorthands are defined: ==================== ================ ========== **Field Descriptor** **Width (Bits)** **Signed** -------------------- ---------------- ---------- ``bit`` 1 no ``semi_nibble`` 2 no ``nibble`` 4 no ``byte`` 8 no ``uint8`` 8 no ``uint16`` 16 no ``uint32`` 32 no ``uint64`` 64 no ``int8`` 8 yes ``int16`` 16 yes ``int32`` 32 yes ``int64`` 64 yes ==================== ================ ========== Strings ~~~~~~~ ``string(length, encoding)`` - the next ``length`` bytes represent a string of the given length. You can pick an encoding for the strings to encode and decode in; the default is ``utf-8``. Booleans ~~~~~~~~ ``boolean`` - the next bit represents a boolean value. 0 is ``False``, 1 is ``True`` Enumerations ~~~~~~~~~~~~ ``enum(length, values, default=None)`` - the next ``length`` bits represent one of a set of values, whose values are given by the dictionary ``values``. If ``default`` is specified, it will be returned if the bits do not correspond to any value in ``values``. Otherwise, it raises a ``ValueError``. Here is an example of a 2-bit field representing a card suit: :: import bread as b ("suit", b.enum(2, { 0: "diamonds", 1: "hearts", 2: "spades", 3: "clubs" })) Arrays ~~~~~~ ``array(count, field_or_struct)`` - the next piece of data is ``count`` occurrences of ``field_or_struct`` which, as the name might imply, can be either a field (including another array) or a format specification. Here's an example way of representing a deck of playing cards: :: import bread as b # A card is made up of a 2-bit suit and a 4-bit card number card = [ ("suit", b.enum(2, { 0: "diamonds", 1: "hearts", 2: "spades", 3: "clubs" })), ("number", b.intX(4))] # A deck consists of 52 cards, for a total of 312 bits or 39 bytes of data deck = [("cards", b.array(52, card))] Field Options ------------- A dictionary of field options can be specified as the last argument to any field. A dictionary of global field options can also be defined at the beginning of the format spec (before any fields). Options defined on fields override these global options. The following field options are defined: * ``str_format`` - the function that should be used to format a field in the structure's human-readable representation. For example: :: >>> import bread as b # Format spec without str_format ... >>> simple_spec = [('addr', b.uint8)] >>> parsed_data = b.parse(bytearray([42]), simple_spec) >>> print parsed_data addr: 42 # ... and with str_format >>> simple_spec_hex = [('addr', b.uint8, {"str_format": hex})] >>> parsed_data = b.parse(bytearray([42]), simple_spec_hex) >>> print parsed_data addr: 0x2a * ``endianness`` - for integer types, the endianness of the bytes that make up that integer. Can either be ``LITTLE_ENDIAN`` or ``BIG_ENDIAN``. Default is little-endian. A simple example: :: endianness_test = [ ("big_endian", b.uint32, {"endianness" : b.BIG_ENDIAN}), ("little_endian", b.uint32, {"endianness" : b.LITTLE_ENDIAN}), ("default_endian", b.uint32)] data = bytearray([0x01, 0x02, 0x03, 0x04] * 3) test = b.parse(data, endianness_test) >>> test.big_endian == 0x01020304 True >>> test.little_endian == 0x04030201 True >>> test.default_endian == test.little_endian True * ``offset`` - for integer types, the amount to add to the number after it has been parsed. Specifying a negative number will subtract that amount from the number. Conditionals ------------ Conditionals allow the format specification to branch based on the value of a previous field. Conditional field descriptors are specified as follows: :: (CONDITIONAL "field_name", options) where ``field_name`` is the name of the field whose value determines the course of the conditional, and ``options`` is a dictionary giving format specifications to evaluate based on the field's value. This is perhaps best illustrated by example: :: import bread as b # There are three kinds of widgets: type A, type B and type C. Each has # its own format spec. widget_A = [...] widget_B = [...] widget_C = [...] # A widget may be of any of the three types, determined by its type field widget = [ ("type", b.string(1)), (b.CONDITIONAL, "type", { "A": widget_A, "B": widget_B, "C": widget_C })] Padding ------- ``padding(num_bits)`` - indicates that the next ``num_bits`` bits should be ignored. Useful in situations where only the first few bits of a byte are meaningful, or where the format skips multiple bits or bytes.