Struct Packing and Unpacking

The struct module converts between Python values and C structs represented as Python bytes objects.

This is useful for handling binary data stored in files or received over network connections.

Format Specifiers

Packing converts multiple values, such as chars, integers, and strings, into a single bytes object. You can think of it as printing those values into a binary stream instead of onto a screen. Like printing, packing needs a format string that tells it how each value should be encoded.

Here are the format specifiers:

b : signed char
B : unsigned char
h : short 
H : unsigned short
i : int
I : unsigned int
l : long
L : unsigned long
d : double
s : char[]
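
As a quick check of how many bytes each specifier occupies, the sketch below uses struct.calcsize. The "<" prefix is an assumption made here to select the standard sizes, which are the same on every platform; without a prefix the native sizes are used, and l and L may be 8 bytes on some systems.

```python
import struct

# Standard sizes (selected by the "<" prefix) for the specifiers listed above.
specifiers = ["b", "B", "h", "H", "i", "I", "l", "L", "d"]
sizes = {c: struct.calcsize("<" + c) for c in specifiers}

for c, size in sizes.items():
    print(f"{c} : {size} byte(s)")

# A count in front of "s" gives the length of the byte string.
string_size = struct.calcsize("<5s")
print("5s :", string_size, "byte(s)")
```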

Packing: Objects to Bytes

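Using the format specifiers above, we can pack multiple objects with struct.pack as follows. With no byte-order prefix, struct.pack uses the machine's native byte order; the output below is from a little-endian machine.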

>>> import struct
>>> data = struct.pack("iii", 1, 2, 3)
>>> data
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'

Endian

Suppose we have a bytes object of length 4: [a, b, c, d]. Read as a big-endian integer, a is the most significant byte; read as a little-endian integer, a is the least significant byte.
Data sent over a network is generally big-endian, so we sometimes need to pack with a byte order different from the machine's native one. Fortunately, the struct module lets us specify the byte order as the first character of the format string:
<  little-endian
>  big-endian
Now, consider the following code:

>>> import struct
>>> data = struct.pack(">iii", 1, 2, 3)     ## Focus here on ">"
>>> data
b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03'
As expected, the numbers have been encoded in big-endian order: the most significant byte of each integer comes first.
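
To make the difference easier to see, here is a small sketch that packs one value with distinct bytes in both orders. The value 0x0A0B0C0D is chosen only for illustration, and the "!" prefix (network byte order) is an additional struct prefix not listed above.

```python
import struct

value = 0x0A0B0C0D  # four distinct bytes, so the order is easy to see

little = struct.pack("<I", value)
big = struct.pack(">I", value)
network = struct.pack("!I", value)  # "!" selects network (big-endian) order

print(little.hex())         # 0d0c0b0a
print(big.hex())            # 0a0b0c0d
print(big == network)       # True
print(little == big[::-1])  # True: same bytes, reversed
```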

Packing a string

You can pack a string using the format specifier s, preceded by the length of the string in bytes. For example, to pack the 5-byte string "hello", use 5s. Note that the value must be a bytes object (b"hello"), not a str.

>>> data = struct.pack(">BBHH5si", 1, 2, 3, 4, b"hello", 5)   ## Note: 5s for "hello" 
>>> data
b'\x01\x02\x00\x03\x00\x04hello\x00\x00\x00\x05'
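
It may help to see what happens when the length in the format does not match the length of the bytes object. The sketch below relies on the documented behavior of the s specifier (padding with null bytes, or truncating); the sample values are illustrative only.

```python
import struct

padded = struct.pack("5s", b"hi")        # shorter than 5 bytes: padded with null bytes
truncated = struct.pack("3s", b"hello")  # longer than 3 bytes: truncated
encoded = struct.pack("5s", "hello".encode("ascii"))  # a str must be encoded first

print(padded)     # b'hi\x00\x00\x00'
print(truncated)  # b'hel'
print(encoded)    # b'hello'
```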

Unpacking

Similarly, you can unpack a bytes object using the function struct.unpack, which returns a tuple.

>>> import struct
>>> data = b'\x01\x02\x00\x03\x00\x04hello\x00\x00\x00\x05'
>>> unpacked = struct.unpack(">BBHH5si", data)
>>> unpacked
(1, 2, 3, 4, b'hello', 5)
>>> for v in unpacked: print( type(v), ":", v )
...
<class 'int'> : 1
<class 'int'> : 2
<class 'int'> : 3
<class 'int'> : 4
<class 'bytes'> : b'hello'
<class 'int'> : 5
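
One detail worth checking is that struct.unpack requires the buffer to be exactly as long as the format demands. The following sketch uses struct.calcsize to compute that length and shows the struct.error raised for a short buffer; the field names (version, flags, and so on) are hypothetical and only illustrate tuple unpacking.

```python
import struct

fmt = ">BBHH5si"
data = struct.pack(fmt, 1, 2, 3, 4, b"hello", 5)

expected_size = struct.calcsize(fmt)  # 1 + 1 + 2 + 2 + 5 + 4
print(expected_size, len(data))       # 15 15

# The returned tuple can be unpacked directly into (hypothetical) field names.
version, flags, src, dst, name, count = struct.unpack(fmt, data)
print(name, count)

try:
    struct.unpack(fmt, data[:-1])     # one byte too short
    failed = False
except struct.error as exc:
    failed = True
    print("unpack failed:", exc)
```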

Activity

Pack the IP address 10.3.2.4 into 4 bytes. The packed data should look as follows:
b'\n\x03\x02\x04'

Practice Question

Question

What is the output of the following code? Try to do it manually (with a calculator). You will see this kind of problem in the exam.
import struct
data = struct.pack("<H", 1000)
for d in data:
  print(f"{d:02x}", end=" ")

Answer

We walk through the solution step by step.
  1. "<" means little endian, and the format specifier H means unsigned short, which is 2 bytes long.
  2. First, find a and b (each between 0 and 255) such that
     1000 = a*256 + b
     This gives a = 3 and b = 232, so the two 8-bit integers 3 and 232 represent 1000.
  3. Next, write 3 and 232 in hex (do you remember what f"{d:02x}" means?). 3 is 0x03. For 232, find x and y (each between 0 and 15) such that
     232 = x*16 + y
     This gives x = 14 (0xe) and y = 8 (0x8), so 232 = 0xe8.
  4. So far, we know 1000 = 0x03e8.
  5. The less-than sign "<" means little endian, so the least significant byte comes first. The final answer is
     e8 03
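
The manual steps above can be double-checked with a short sketch that uses divmod for the two divisions. The variable names follow the ones in the walkthrough.

```python
import struct

a, b = divmod(1000, 256)  # step 2: high byte and low byte
x, y = divmod(b, 16)      # step 3: the two hex digits of the low byte
print(a, b)               # 3 232
print(x, y)               # 14 8

data = struct.pack("<H", 1000)
answer = " ".join(f"{d:02x}" for d in data)
print(answer)             # e8 03
```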
    

Conversions: Int, Bytes, String, Hex String

Int → String

To convert a number into a string, simply call the str() function.

>>> n = 10
>>> n
10
>>> s = str(n)
>>> s
'10'
Note that s is a two-character string: s[0] is the character '1' and s[1] is the character '0'.

Int → Bytes

Sometimes, you may want to pack a number into raw bytes. As we saw above, you can use struct.pack; alternatively, you can use the int.to_bytes() method.

>>> n = 12345678901234567890
>>> b = n.to_bytes(10, 'big')
>>> b
b'\x00\x00\xabT\xa9\x8c\xeb\x1f\n\xd2'
  • The first argument to to_bytes() is the number of bytes to use for the number. In the above example, the number is encoded in 10 bytes, so the result starts with two zero bytes of padding.
  • The second argument is the byte order: 'big' or 'little'.
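
As a rough illustration of both arguments, the sketch below encodes the number 1000 from the practice question in each byte order, compares the result with struct.pack, and shows the OverflowError raised when the requested length is too small.

```python
import struct

n = 1000
big = n.to_bytes(2, "big")
little = n.to_bytes(2, "little")
print(big, little)  # b'\x03\xe8' b'\xe8\x03'

# For a 2-byte unsigned value, this matches struct.pack with the H specifier.
same_as_struct = big == struct.pack(">H", n)
print(same_as_struct)  # True

try:
    n.to_bytes(1, "big")  # 1000 does not fit in a single byte
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)  # True
```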

Int → Hex String

You may want to encode a number into a hex string. In that case, you can use the hex() function.

>>> n = 12345678901234567890
>>> s = hex(n)
>>> s
'0xab54a98ceb1f0ad2'
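
If the "0x" prefix is unwanted or a fixed width is needed, format specifications offer an alternative to hex(). A minimal sketch, using 1000 as a sample value:

```python
n = 1000

plain = hex(n)             # '0x3e8'
no_prefix = f"{n:x}"       # '3e8'
padded = f"{n:04x}"        # '03e8' (zero-padded to 4 hex digits)
with_prefix = f"{n:#06x}"  # '0x03e8' (the width of 6 includes the prefix)

print(plain, no_prefix, padded, with_prefix)
```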

Bytes → Int

Sometimes, you may want to decode raw bytes into a number. For this, you can use the int.from_bytes() method.
  • The first argument to from_bytes() is a bytes object.
  • The second argument is the byte order: 'big' or 'little'.

>>> b = b'\x00\x00\xabT\xa9\x8c\xeb\x1f\n\xd2'
>>> n = int.from_bytes(b, 'big')
>>> n
12345678901234567890
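
The byte order matters when decoding as well. The following sketch decodes the same two bytes both ways, and also shows the optional signed keyword argument of int.from_bytes(); the sample bytes are illustrative only.

```python
raw = b"\xe8\x03"

as_little = int.from_bytes(raw, "little")  # 0x03e8
as_big = int.from_bytes(raw, "big")        # 0xe803
print(as_little, as_big)  # 1000 59395

# By default the bytes are treated as unsigned; signed=True uses two's complement.
unsigned = int.from_bytes(b"\xff\xff", "big")
signed = int.from_bytes(b"\xff\xff", "big", signed=True)
print(unsigned, signed)   # 65535 -1
```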

Bytes → String

You can call the decode() method to decode a bytes object into a string. With no argument, it assumes UTF-8 encoding.

>>> b = b'hello world'
>>> s = b.decode()
>>> s
'hello world'
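
For text outside ASCII, the encoding starts to matter. Here is a small sketch, assuming UTF-8 data, that round-trips a string containing one non-ASCII character and shows the error raised when the wrong codec is used.

```python
text = "h\u00e9llo"  # "héllo": 5 characters, one of them non-ASCII
raw = text.encode("utf-8")
print(raw)                  # b'h\xc3\xa9llo'
print(len(text), len(raw))  # 5 6

back = raw.decode("utf-8")
print(back == text)         # True

try:
    raw.decode("ascii")     # 0xc3 is not a valid ASCII byte
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)             # False
```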

Bytes → Hex String

You can use the hex() method of a bytes object to convert it into a hex string.

>>> d = b"\x01\xfa\x03"
>>> type(d)
<class 'bytes'>
>>> d.hex()
'01fa03'
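
When the bytes should be separated for readability, as in the practice question, there are two options. This sketch assumes Python 3.8 or later for the separator argument of hex(); the join version works on older versions too.

```python
d = b"\x01\xfa\x03"

compact = d.hex()                         # '01fa03'
spaced = d.hex(" ")                       # '01 fa 03' (separator needs Python 3.8+)
manual = " ".join(f"{x:02x}" for x in d)  # same result, built byte by byte

print(compact)
print(spaced)
print(manual == spaced)  # True
```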

String → Int

To convert a string into a number, simply call the int() function. It also accepts a bytes object containing ASCII digits.

>>> s = "12334567890"
>>> n = int(s)
>>> n
12334567890
>>> b = b"12334567890"
>>> n = int(b)
>>> n
12334567890
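
Input strings are not always clean, so it can be useful to know how int() reacts. A minimal sketch with made-up inputs: surrounding whitespace is ignored, while any other stray character raises ValueError.

```python
n = int("  42\n")  # leading and trailing whitespace is ignored
print(n)           # 42

try:
    int("12a")     # not a valid decimal number
    parsed = True
except ValueError:
    parsed = False
print(parsed)      # False
```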

String → Bytes

Conversely, you can use the encode() method to convert a string into a bytes object.

>>> s = '10'
>>> b = s.encode()
>>> b
b'10'
>>> b[0], b[1]
(49, 48)
Note that b[0] contains the ASCII code for character '1', and b[1] for character '0'.

Hex String → Int

You can call the int() function with the base (16) as a second argument to convert a hex string into a number.

>>> s = '0xab54a98ceb1f0ad2'
>>> n = int(s, 16)
>>> n
12345678901234567890
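
With base 16, the "0x" prefix is optional. As a small sketch with sample values, the code below also shows base 0, which asks int() to infer the base from the prefix.

```python
with_prefix = int("0xff", 16)
without_prefix = int("ff", 16)
auto = int("0xff", 0)      # base 0: the base is inferred from the "0x" prefix
binary = int("0b1010", 0)  # the same mechanism handles "0b" (binary)

print(with_prefix, without_prefix, auto, binary)  # 255 255 255 10
```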

Hex String → Bytes

You can call bytes.fromhex() to convert a hex string into a bytes object.

>>> s = '0xab54a98ceb1f0ad2'
>>> b = bytes.fromhex(s[2:])
>>> b
b'\xabT\xa9\x8c\xeb\x1f\n\xd2'
Note: bytes.fromhex() does not accept the "0x" prefix, so we skip it with s[2:].
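
Since s[2:] would silently drop two real digits from a string that has no prefix, one possible approach is to strip the prefix only when it is present. The helper name hex_to_bytes below is hypothetical, not part of the standard library.

```python
def hex_to_bytes(s):
    """Hypothetical helper: convert a hex string, with or without "0x", to bytes."""
    if s.startswith(("0x", "0X")):
        s = s[2:]
    return bytes.fromhex(s)

b1 = hex_to_bytes("0xab54a98ceb1f0ad2")
b2 = hex_to_bytes("ab54a98ceb1f0ad2")
print(b1 == b2)             # True

round_trip = "0x" + b1.hex()
print(round_trip)           # 0xab54a98ceb1f0ad2

# bytes.fromhex() also tolerates spaces between byte pairs.
spaced = bytes.fromhex("01 fa 03")
print(spaced)               # b'\x01\xfa\x03'
```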