Programming/Python | Posted by 알 수 없는 사용자 2009. 4. 9. 22:35

pickle in binary file

Pickler Protocols and cPickle

In recent Python releases, the pickler introduced the notion of protocolsstorage formats for pickled data. Specify the desired protocol by passing an extra parameter to the pickling calls (but not to unpickling calls: the protocol is automatically determined from the pickled data):

pickle.dump(object, file, protocol)

Pickled data may be created in either text or binary protocols. By default, the storage protocol is text (also known as protocol 0). In text mode, the files used to store pickled objects may be opened in text mode as in the earlier examples, and the pickled data is printable ASCII text, which can be read (it's essentially instructions for a stack machine).

The alternative protocols (protocols 1 and 2) store the pickled data in binary format and require that files be opened in binary mode (e.g., rb, wb). Protocol 1 is the original binary format; protocol 2, added in Python 2.3, has improved support for pickling of new-style classes. Binary format is slightly more efficient, but it cannot be inspected. An older option to pickling calls, the bin argument, has been subsumed by using a pickling protocol higher than 0. The pickle module also provides a HIGHEST_PROTOCOL variable that can be passed in to automatically select the maximum value.

One note: if you use the default text protocol, make sure you open pickle files in text mode later. On some platforms, opening text data in binary mode may cause unpickling errors due to line-end formats on Windows:

>>> f = open('temp', 'w')                  # text mode file on Windows
>>> pickle.dump(('ex', 'parrot'), f) # use default text protocol
>>> f.close( )
>>>
>>> pickle.load(open('temp', 'r')) # OK in text mode
('ex', 'parrot')
>>> pickle.load(open('temp', 'rb')) # fails in binary
Traceback (most recent call last):
File "<pyshell#337>", line 1, in -toplevel-
pickle.load(open('temp', 'rb'))
...lines deleted...
ValueError: insecure string pickle

One way to sidestep this potential issue is to always use binary mode for your files, even for the text pickle protocol. Since you must open files in binary mode for the binary pickler protocols anyhow (higher than the default 0), this isn't a bad habit to get into:

>>> f = open('temp', 'wb')                 # create in binary mode
>>> pickle.dump(('ex', 'parrot'), f) # use text protocol
>>> f.close( )
>>>
>>> pickle.load(open('temp', 'rb'))
('ex', 'parrot')
>>> pickle.load(open('temp', 'r'))
('ex', 'parrot')