Pickling in Python

DhineshSunder Ganapathi
Published in Nerd For Tech
3 min read · Apr 3, 2022

According to the Python documentation, pickling is the process whereby a Python object hierarchy is converted into a byte stream, and un-pickling is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.

Pickling and un-pickling are alternatively known as “serialization”, “marshalling”, or “flattening”. Also, note the following warning from the documentation:

Warning: The pickle module is not secure. Only un-pickle data you trust.

It is possible to construct malicious pickle data which will execute arbitrary code during un-pickling. Never un-pickle data that could have come from an untrusted source, or that could have been tampered with.
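To make the warning concrete, here is a minimal and deliberately harmless sketch. The class name Payload is hypothetical and exists only for illustration. It uses the `__reduce__` hook to tell pickle which callable to invoke at load time; the built-in `sum` stands in for whatever an attacker would put there instead.

```python
import pickle


class Payload:
    """Hypothetical class used only for illustration."""

    def __reduce__(self):
        # Tell pickle: "to rebuild me, call sum([1, 2, 3])".
        # An attacker could name any importable callable here instead.
        return (sum, ([1, 2, 3],))


data = pickle.dumps(Payload())

# Un-pickling calls sum([1, 2, 3]); no Payload instance is ever created.
result = pickle.loads(data)
print(result)  # 6
```

The point is that `pickle.loads` ran a function chosen by whoever produced the bytes, which is why the source of the data has to be trusted.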

In this article, let’s dive deep into pickling concepts.

Photo by Mika Baumeister on Unsplash

Introduction to serialisation

Serialising an object turns it into a linear stream of bytes. This can be done to save an object to a file or to transmit it to another process. The byte stream can later be deserialised (un-marshalled) to reconstruct the original object.

The most common way to serialise Python objects is called pickling. Python can also use JSON and XML for serialisation.
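As a rough illustration of how the two approaches differ, the sketch below round-trips the same dictionary through JSON and through pickle. The variable names and the sample record are made up for this example.

```python
import json
import pickle

record = {"name": "widget", "size": (3, 4)}

# JSON is text-based and language-neutral, but only knows a few basic types:
# the tuple comes back as a list.
via_json = json.loads(json.dumps(record))

# Pickle is a Python-specific binary format and preserves the tuple.
via_pickle = pickle.loads(pickle.dumps(record))

print(via_json["size"])    # [3, 4]
print(via_pickle["size"])  # (3, 4)
```

JSON is usually the better fit when other languages or untrusted parties are involved; pickle is convenient when both ends are Python code you control.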

General process

Most Python values and built-in objects can be pickled, including user-defined classes defined at the top level of a module. Recursive and interconnected objects can also be pickled. However, generators, lambda functions, database connections, and threads are a few of the things that cannot be pickled.
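A quick way to check this for yourself is to try pickling a value and see whether it fails. The helper can_pickle below is a hypothetical name, not part of the pickle module, and the exact exception raised can vary with the object and the Python version, so the sketch catches the usual candidates.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        # The exact exception type depends on the object and Python version.
        return False


print(can_pickle({"a": [1, 2.5, None]}))     # True
print(can_pickle(lambda n: n + 1))           # False
print(can_pickle(i * i for i in range(3)))   # False
```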

Pickling and un-pickling methods

To pickle or un-pickle objects, you first have to import the pickle module with the import pickle statement. You can then use the following functions:

  • pickle.dump(obj, file) writes obj to file, which must be opened in wb (write binary) mode.
  • obj = pickle.load(file) reconstructs the object previously written to file, which must be opened in rb (read binary) mode.
  • data = pickle.dumps(obj) returns the pickled form of obj as a bytes object instead of writing it to a file.
  • obj = pickle.loads(data) reconstructs the object from the bytes object data.
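The in-memory pair, dumps and loads, can be sketched as follows. The sample dictionary and variable names are made up for illustration.

```python
import pickle

original = {"id": 7, "tags": ["a", "b"]}

# dumps() returns a bytes object rather than writing to a file.
data = pickle.dumps(original)
print(type(data))  # <class 'bytes'>

# loads() rebuilds an equal, but distinct, object from those bytes.
restored = pickle.loads(data)
print(restored == original)  # True
print(restored is original)  # False
```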

The following example will help you better understand how these functions work.

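import pickle

# Build a recursive, interconnected structure: y holds x twice, and x holds y.
x = {'a': 1, 'b': 2}
y = [x, 3, x]
x['c'] = y
print('x: {}\nBefore pickling: {}'.format(x, y))

# Serialise y to a file opened in write binary mode.
with open('ptest', 'wb') as fi:
    pickle.dump(y, fi)

# Read it back from the same file opened in read binary mode.
with open('ptest', 'rb') as fi:
    z = pickle.load(fi)
print('After un-pickling:', z)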

Advantages of using the pickle module:

  1. Recursive objects (objects containing references to themselves): Pickle keeps track of the objects it has already serialised, so later references to the same object won’t be serialised again.
  2. Object sharing (references to the same object in different places): This is similar to self-referencing objects; pickle stores the object once, and ensures that all other references point to the master copy. Shared objects remain shared, which can be very important for mutable objects.
  3. User-defined classes and their instances: The marshal module does not support these at all, but pickle can save and restore class instances transparently. The class definition must be importable and live in the same module as when the object was stored.
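The first two points can be checked with a short sketch. The variable names are made up for this example, and it covers only recursion and sharing, not class instances.

```python
import pickle

shared = {"a": 1}
outer = [shared, 3, shared]   # the same dict appears twice
shared["loop"] = outer        # and refers back to the list that holds it

restored = pickle.loads(pickle.dumps(outer))

# Both slots still point at one dict, not two equal copies.
print(restored[0] is restored[2])        # True
# The cycle is rebuilt as well.
print(restored[0]["loop"] is restored)   # True
# Mutating through one reference is visible through the other.
restored[0]["a"] = 99
print(restored[2]["a"])                  # 99
```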

As I said earlier, pickling is not secure. Pickled data can instruct the un-pickler to call arbitrary functions, which can be used to attack your system. Make sure that anything you un-pickle comes from a trusted source and has not been tampered with in transit. Until next time, adios!

DhineshSunder Ganapathi
Nerd For Tech

Data Engineer, Published Author, Book Worm, Tech Blogger, Intrigued to learn new things