Struct Packing and Unpacking
The struct module converts between Python values and C structs represented as
Python bytes objects. It is useful for handling binary data stored in files or
received over network connections.
Format Specifiers
Packing converts multiple values (chars, integers, and strings) into a single
bytes object. You can think of it as printing those objects to a binary stream
instead of the screen. Like printing, packing needs a format string that tells
it how each object should be encoded.
Here are some commonly used format specifiers:
b : signed char
B : unsigned char
h : short
H : unsigned short
i : int
I : unsigned int
l : long
L : unsigned long
d : double
s : char[]
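As a quick check of the table above, the sketch below asks struct.calcsize how many bytes each specifier occupies. It uses the "<" prefix so that the standard sizes are reported; without a prefix, native sizes apply and some of them (notably l and L) vary by platform. The names codes and sizes are only illustrative.

```python
import struct

# Standard sizes (in bytes) of each format character when a byte-order
# prefix such as "<" is used. "5s" is a 5-byte char[] as an example.
codes = ["b", "B", "h", "H", "i", "I", "l", "L", "d", "5s"]
sizes = {code: struct.calcsize("<" + code) for code in codes}

for code, size in sizes.items():
    print(f"{code:>2} : {size} byte(s)")
```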
Packing: Objects to Bytes
Now, using the format specifiers above, we can pack multiple objects using
struct.pack as follows.
>>> import struct
>>> data = struct.pack("iii",1,2,3)
>>> data
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
- In struct.pack("iii",1,2,3), the format string "iii" means packing 3 integers.
- The first four bytes of the packed bytes are b"\x01\x00\x00\x00", which
encodes 1. The integer 1 has been encoded in little-endian order. With no
prefix in the format string, struct uses the machine's native byte order, which
is little endian on most machines.
- Likewise, the 2nd and 3rd groups of four bytes are
b"\x02\x00\x00\x00" and b"\x03\x00\x00\x00".
Endian
Suppose we have a bytes object of length 4: [a, b, c, d].
- Big Endian: the four bytes would represent a number a*256^3 + b*256^2
+ c*256 + d.
- Little Endian: the four bytes would represent a number d*256^3 + c*256^2
+ b*256 + a.
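The two formulas above can be checked directly. The sketch below picks four arbitrary example byte values (the concrete numbers are an assumption for illustration), evaluates both formulas, and compares the results with int.from_bytes.

```python
# Four example byte values; the names a, b, c, d follow the text above.
a, b, c, d = 0x12, 0x34, 0x56, 0x78
raw = bytes([a, b, c, d])

big = a * 256**3 + b * 256**2 + c * 256 + d
little = d * 256**3 + c * 256**2 + b * 256 + a

print(hex(big))     # 0x12345678
print(hex(little))  # 0x78563412

# The built-in conversion agrees with both formulas.
print(big == int.from_bytes(raw, "big"))        # True
print(little == int.from_bytes(raw, "little"))  # True
```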
Generally, data on the network is represented in big endian (also called
network byte order). So, we sometimes want to pack using a byte order other
than the machine's own. Fortunately, the struct module lets us specify which
byte order to use when we pack objects.
< little-endian
> big-endian
Now, consider the following code:
>>> import struct
>>> data = struct.pack(">iii", 1, 2, 3) ## Focus here on ">"
>>> data
b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03'
As expected, the numbers have been encoded in big-endian order.
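The earlier "iii" example had no prefix. In that case struct uses the machine's native byte order, which is why its output looked little endian. Here is a minimal sketch to check this on your own machine, assuming a platform where a native int is 4 bytes (true of most current systems):

```python
import struct
import sys

# With no prefix, struct uses the machine's native byte order.
native = struct.pack("i", 1)
little = struct.pack("<i", 1)
big = struct.pack(">i", 1)

print(sys.byteorder)  # 'little' on x86 and most ARM systems

# The native result should match whichever explicit order the machine uses.
expected = little if sys.byteorder == "little" else big
print(native == expected)
```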
Packing a string
You can pack a string using the format specifier s. The value must be a bytes
object, and you need to put the length of the string in front of s. For
example, to pack the 5-byte string b"hello", you need to specify 5s.
>>> data = struct.pack(">BBHH5si", 1, 2, 3, 4, b"hello", 5) ## Note: 5s for "hello"
>>> data
b'\x01\x02\x00\x03\x00\x04hello\x00\x00\x00\x05'
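It may help to see what happens when the count in front of s does not match the length of the bytes object. The sketch below shows that struct pads short values with null bytes and truncates long ones, and that a str (as opposed to bytes) is rejected. The variable names are only illustrative.

```python
import struct

exact = struct.pack("5s", b"hello")      # fits exactly
padded = struct.pack("8s", b"hello")     # too short: padded with null bytes
truncated = struct.pack("3s", b"hello")  # too long: silently truncated

print(exact)      # b'hello'
print(padded)     # b'hello\x00\x00\x00'
print(truncated)  # b'hel'

# A str must be encoded first; passing it directly raises struct.error.
try:
    struct.pack("5s", "hello")
except struct.error as exc:
    str_rejected = True
    print("pack failed:", exc)
else:
    str_rejected = False
```

Because truncation is silent, it is worth checking the length yourself when the data must not be cut off.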
Unpacking
You can unpack a bytes object using the function struct.unpack similarly.
>>> import struct
>>> data = b'\x01\x02\x00\x03\x00\x04hello\x00\x00\x00\x05'
>>> unpacked = struct.unpack(">BBHH5si", data)
>>> unpacked
(1, 2, 3, 4, b'hello', 5)
>>> for i in range(len(unpacked)): print( type(unpacked[i]), ":", unpacked[i] )
...
<class 'int'> : 1
<class 'int'> : 2
<class 'int'> : 3
<class 'int'> : 4
<class 'bytes'> : b'hello'
<class 'int'> : 5
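One detail worth knowing is that struct.unpack requires the buffer to be exactly as long as the format describes. The sketch below uses struct.calcsize to show the expected size for the format above, then shows the struct.error raised when one byte is missing. The names fmt and size_mismatch are only illustrative.

```python
import struct

fmt = ">BBHH5si"
data = b'\x01\x02\x00\x03\x00\x04hello\x00\x00\x00\x05'

# 1 + 1 + 2 + 2 + 5 + 4 = 15 bytes
print(struct.calcsize(fmt), len(data))  # 15 15

# Dropping the last byte makes the buffer too short for the format.
try:
    struct.unpack(fmt, data[:-1])
except struct.error as exc:
    size_mismatch = True
    print("unpack failed:", exc)
else:
    size_mismatch = False
```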
Activity
Pack the IP address 10.3.2.4 into 4 bytes:
- Using the "BBBB" specifier.
- This time, using the "i" specifier.
The packed data should look as follows:
b'\n\x03\x02\x04'
Practice Question
Question
What is the output of the following code? Try to do it manually
(with a calculator). You will see this kind of problem in the exam.
import struct
data = struct.pack("<H", 1000)
for d in data:
    print(f"{d:02x}", end=" ")
Answer
We walk through the solution step by step.
- "<" means little endian, and the format specifier H means unsigned
short, which is 2 bytes long.
- First, find a and b (each between 0 and 255) such that
1000 = a*256 + b
This gives a = 3 and b = 232, since 3*256 = 768 and 1000 - 768 = 232.
So the two 8-bit integers 3 and 232 are used to represent 1000.
- Now, we need to write 3 and 232 in hex (do you remember
what f"{d:02x}" means?). For 3, it's 0x03. For 232, we again find
x and y (each between 0 and 15) such that
232 = x*16 + y
This gives x = 14 (0xe) and y = 8 (0x8), so 232 = 0xe8.
- So far, we know 1000 = 0x03e8.
- The less-than sign "<" means little endian, so the low byte comes first.
The final answer is
e8 03
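Once you have worked the answer out by hand, the same steps can be checked in code. This is only a sketch of the manual procedure using divmod; the variable names follow the walkthrough above.

```python
import struct

# Step 1: split 1000 into a high byte and a low byte.
a, b = divmod(1000, 256)
print(a, b)  # 3 232

# Step 2: split the low byte into two hex digits.
x, y = divmod(b, 16)
print(x, y)  # 14 8

# Step 3: little endian puts the low byte (0xe8) first.
data = struct.pack("<H", 1000)
result = " ".join(f"{byte:02x}" for byte in data)
print(result)  # e8 03
```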
Conversions: Int, Bytes, String, Hex String
Int → String
To convert a number into a string, simply call the str() function.
>>> n = 10
>>> n
10
>>> s = str(n)
>>> s
'10'
Note that s is a string of digit characters, not a number: s[0] is the
character '1' and s[1] is the character '0'.
Int → Bytes
Sometimes, you may want to pack a number into raw bytes. As we saw above,
you can use struct.pack, or you can use the to_bytes() method of int.
>>> n = 12345678901234567890
>>> b = n.to_bytes(10, 'big')
>>> b
b'\x00\x00\xabT\xa9\x8c\xeb\x1f\n\xd2'
- The first argument to to_bytes() tells it how many bytes to use for packing
the number. In the above example, the number is encoded in 10 bytes.
- The second argument is the byte order, either 'big' or 'little'.
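To make the two arguments concrete, the sketch below packs the number 1000 from the practice question in both byte orders, compares the result with struct.pack, and shows the OverflowError raised when the requested length is too small. The variable names are only illustrative.

```python
import struct

n = 1000  # 0x03e8

big = n.to_bytes(2, "big")
little = n.to_bytes(2, "little")
print(big)     # b'\x03\xe8'
print(little)  # b'\xe8\x03'

# Same bytes as struct.pack with an unsigned short.
print(big == struct.pack(">H", n))     # True
print(little == struct.pack("<H", n))  # True

# 1000 does not fit in a single byte.
try:
    n.to_bytes(1, "big")
except OverflowError as exc:
    too_small = True
    print("to_bytes failed:", exc)
else:
    too_small = False
```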
Int → Hex String
You may want to encode a number into a hex string. In that case, you can use
the hex() function.
>>> n = 12345678901234567890
>>> s = hex(n)
>>> s
'0xab54a98ceb1f0ad2'
Bytes → Int
Sometimes, you may want to decode raw bytes into a number. For this, you can
use the int.from_bytes() function.
- The first argument to from_bytes() is a bytes object.
- The second argument is the byte order, either 'big' or 'little'.
>>> b = b'\x00\x00\xabT\xa9\x8c\xeb\x1f\n\xd2'
>>> n = int.from_bytes(b, 'big')
>>> n
12345678901234567890
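The byte order changes the result, and so does the optional signed keyword argument of int.from_bytes(), which interprets the bytes as a two's-complement number. A short sketch with an arbitrary two-byte example value:

```python
import struct

raw = b"\xff\xfe"

as_big = int.from_bytes(raw, "big")
as_little = int.from_bytes(raw, "little")
as_signed = int.from_bytes(raw, "big", signed=True)

print(as_big)     # 65534  (0xfffe)
print(as_little)  # 65279  (0xfeff)
print(as_signed)  # -2

# Same results as unpacking an unsigned or signed short.
print(as_big == struct.unpack(">H", raw)[0])     # True
print(as_signed == struct.unpack(">h", raw)[0])  # True
```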
Bytes → String
You can call the decode() method to decode a bytes object into
a string.
>>> b = b'hello world'
>>> s = b.decode()
>>> s
'hello world'
Bytes → Hex String
You can use the hex() method of a bytes object to convert it into a hex
string.
>>> d = b"\x01\xfa\x03"
>>> type(d)
<class 'bytes'>
>>> d.hex()
'01fa03'
String → Int
To convert a string into a number, simply call the int() function. It also
accepts a bytes object containing ASCII digits.
>>> s = "1234567890"
>>> n = int(s)
>>> n
1234567890
>>> b = b"12334567890"
>>> n = int(b)
>>> n
12334567890
String → Bytes
You can use the encode() method to convert
a string into a bytes object.
>>> s = '10'
>>> b = s.encode()
>>> b
b'10'
>>> b[0], b[1]
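(49, 48)
Here b[0] holds the ASCII code for the character '1' (49), and b[1] the code
for the character '0' (48).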
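For plain ASCII text, one character becomes one byte, but that is not true in general. The sketch below uses a hypothetical example string with one non-ASCII character to show that encode() (UTF-8 by default) can produce more bytes than characters, and that decoding with the wrong codec fails.

```python
s = "café"             # 4 characters, one of them non-ASCII
b = s.encode("utf-8")  # UTF-8 is also the default for encode()

print(b)               # b'caf\xc3\xa9'
print(len(s), len(b))  # 4 5

# Decoding with the same codec restores the original string.
print(b.decode("utf-8") == s)  # True

# Decoding as ASCII fails because 0xc3 is not an ASCII byte.
try:
    b.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_failed = True
    print("decode failed:", exc)
else:
    ascii_failed = False
```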
Hex String → Int
You can call the int() function with a base of 16 as the second argument to
convert a hex string into a number.
>>> s = '0xab54a98ceb1f0ad2'
>>> n = int(s, 16)
>>> n
12345678901234567890
Hex String → Bytes
You can call bytes.fromhex() to convert
a hex string into a bytes object.
>>> s = '0xab54a98ceb1f0ad2'
>>> b = bytes.fromhex(s[2:])
>>> b
b'\xabT\xa9\x8c\xeb\x1f\n\xd2'
Note: bytes.fromhex() does not accept the "0x" prefix and raises a ValueError
if it is present. So, using s[2:], we skip that prefix.
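The prefix handling can be wrapped in a small helper. The function hex_to_bytes below is hypothetical (it is not part of the standard library). It strips an optional "0x" prefix before calling bytes.fromhex(), and the sketch also confirms that the unstripped string is rejected.

```python
def hex_to_bytes(s):
    """Hypothetical helper: accept a hex string with or without '0x'."""
    if s.startswith(("0x", "0X")):
        s = s[2:]
    return bytes.fromhex(s)

raw = hex_to_bytes("0xab54a98ceb1f0ad2")
print(raw)        # b'\xabT\xa9\x8c\xeb\x1f\n\xd2'
print(raw.hex())  # ab54a98ceb1f0ad2

# Without stripping, fromhex() raises ValueError on the 'x'.
try:
    bytes.fromhex("0xab54a98ceb1f0ad2")
except ValueError as exc:
    prefix_rejected = True
    print("fromhex failed:", exc)
else:
    prefix_rejected = False
```

Note that bytes.fromhex() needs an even number of hex digits, so a string such as hex(10)[2:] (which is 'a') would have to be padded to '0a' first.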