Python Basics: Serialization Modules - 蒲公英云

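Serialization

Concept

Serialization is the process of converting in-memory data such as a dict or a list into a string. For example, turning a Python object into text makes it easy to move between programs; that conversion to text is serialization.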

Purpose

Persist custom objects in some storage format;
Transfer objects so they are easy to carry and port;
Make programs easier to maintain.

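json

To use the json functions, first import the json module: import json

dump and dumps: serialization methods

dump: requires a file object opened for writing, and writes the serialized data to that file.
dumps: converts a data structure directly into a JSON string (the more commonly used of the two).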
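Since the example that follows only covers dumps, here is a minimal sketch of dump writing to a file. The temporary directory and the file name data.json are assumptions made for illustration; any file object opened in text mode should work the same way.

```python
import json
import os
import tempfile

data = {'a': 1, 'b': 2}

# Hypothetical file name inside a temporary directory, for illustration only.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.json')

    # dump() writes the JSON text to a file object opened in text mode.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f)

    # Read the file back as plain text to see what was written.
    with open(path, encoding='utf-8') as f:
        written_text = f.read()

print(written_text)  # {"a": 1, "b": 2}
```

The text stored in the file is the same string that dumps would return for the same data.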

Encode the following data as JSON:

import json
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
json_str = json.dumps(data)  # do not name this variable json, or it shadows the module
print(json_str, type(json_str))

Result:

{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5} <class 'str'>

Process finished with exit code 0

As you can see, the output is a str object. The table below shows how Python types are converted to JSON types (Python 3):

Python	json
dict	object
list, tuple	array
str	string
int, float	number
True	true
False	false
None	null
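The mapping in the table can be checked with one call to dumps. This is a small sketch; the dictionary sample and its keys are made up for illustration.

```python
import json

# One value per row of the table above (the keys are arbitrary labels).
sample = {
    'list': [1, 2],
    'tuple': (3, 4),      # tuples are written as JSON arrays
    'text': 'hi',
    'number': 1.5,
    'yes': True,
    'no': False,
    'nothing': None,
}

encoded = json.dumps(sample)
print(encoded)
# {"list": [1, 2], "tuple": [3, 4], "text": "hi", "number": 1.5, "yes": true, "no": false, "nothing": null}
```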

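load and loads: deserialization methods

load: takes a file object, reads its contents, and deserializes them.
loads: converts a JSON string directly into a Python data structure (deserialization; the more commonly used of the two).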
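As a minimal sketch of load, the snippet below reads JSON from a file-like object. io.StringIO is used here as an in-memory stand-in for a real file opened in text mode; that substitution is an assumption made to keep the example self-contained.

```python
import io
import json

# io.StringIO behaves like a text file that already contains JSON.
fake_file = io.StringIO('{"a": 1, "b": [true, null]}')

# load() reads the whole file object and deserializes its contents.
loaded = json.load(fake_file)
print(loaded, type(loaded))  # {'a': 1, 'b': [True, None]} <class 'dict'>
```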

import json

json_data = '{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}'
content = json.loads(json_data)
print(content, type(content))

Result:

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5} <class 'dict'>

Process finished with exit code 0

Note on quotes: keys and strings inside a JSON object must use double quotes; with single quotes, loads raises an error.
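A short sketch of the quoting rule: the first string below is a valid Python dict literal but not valid JSON, so loads rejects it. The variable names are illustrative only.

```python
import json

bad_json = "{'a': 1}"   # single quotes: fine as a Python literal, invalid as JSON

try:
    json.loads(bad_json)
    parse_failed = False
except json.JSONDecodeError as exc:
    # JSONDecodeError is the exception json raises for malformed input.
    parse_failed = True
    print('invalid JSON:', exc)

good = json.loads('{"a": 1}')  # double quotes parse without error
print(good)
```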

In the other direction, this table shows how JSON types are converted to Python types (Python 3):

json	Python
object	dict
array	list
string	str
number (int)	int
number (real)	float
true	True
false	False
null	None
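Because the two tables are not exact mirrors of each other, a round trip through JSON can change types. The sketch below, with made-up sample data, shows two such cases: a tuple comes back as a list, and a non-string dict key comes back as a string.

```python
import json

original = {1: 'one', 'point': (2, 3)}

# Encode to JSON text and decode again.
restored = json.loads(json.dumps(original))

print(restored)  # {'1': 'one', 'point': [2, 3]}
```

If the exact Python types matter after the round trip, they have to be rebuilt by hand, or a Python-specific format can be used instead.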

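json and demjson

demjson is a third-party Python module for encoding and decoding JSON data; it also includes the formatting and validation features of jsonlint.

GitHub repository: https://github.com/dmeranda/demjson

Official site: http://deron.meranda.us/python/demjson/

Installation guide: http://deron.meranda.us/python/demjson/install

encode and decode functions

encode: encodes a Python object into a JSON string.
decode: decodes an encoded JSON string into a Python object.

(The code is used in much the same way as json, so it is not demonstrated here.)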

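pickle

Like json, pickle is a serialization module. The two differ as follows:

json: converts between strings and Python data types.
pickle: converts between a Python-specific format and Python objects; pickled data is binary (bytes).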
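The difference can be sketched with a value that JSON has no representation for. The sample dictionary below is hypothetical; it contains a set, which json cannot encode but pickle can.

```python
import json
import pickle

value = {'tags': {'x', 'y'}, 'size': (3, 4)}  # contains a set and a tuple

try:
    json.dumps(value)
    json_ok = True
except TypeError:
    json_ok = False       # a set is not JSON serializable

blob = pickle.dumps(value)        # bytes in a Python-specific format
restored = pickle.loads(blob)     # set and tuple come back unchanged

print(json_ok, type(blob), restored == value)  # False <class 'bytes'> True
```

One caveat when choosing between the two: unpickling can execute arbitrary code, so pickle.loads should only be given data from a trusted source.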

pickle offers the same four functions as json: load, loads, dump, and dumps. The following demonstrates dumps and loads:

import pickle
dict_data_one = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
pickle_data = pickle.dumps(dict_data_one)  # serialization
print(pickle_data)  # binary
dict_data_two = pickle.loads(pickle_data)  # deserialization
print(dict_data_two)

Result:

b'\x80\x03}q\x00(X\x01\x00\x00\x00aq\x01K\x01X\x01\x00\x00\x00bq\x02K\x02X\x01\x00\x00\x00cq\x03K\x03X\x01\x00\x00\x00dq\x04K\x04X\x01\x00\x00\x00eq\x05K\x05u.'
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

Process finished with exit code 0

The following demonstrates dump and load:

import time
import pickle
structure_time_one = time.localtime()
print(structure_time_one)
with open('pickle_file', 'wb') as f:  # pickle data is binary, so open with 'wb'
    pickle.dump(structure_time_one, f)
with open('pickle_file', 'rb') as f:  # with closes the file even if load fails
    structure_time_two = pickle.load(f)
print(structure_time_two.tm_year,
      structure_time_two.tm_mon,
      structure_time_two.tm_mday)

Result:

time.struct_time(tm_year=2018, tm_mon=8, tm_mday=13, tm_hour=21, tm_min=41, tm_sec=50, tm_wday=0, tm_yday=225, tm_isdst=0)
2018 8 13

Process finished with exit code 0

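shelve

shelve is another serialization tool that ships with Python, and it is simpler to use than pickle. It exposes a single open function, and the object it returns is accessed by key, much like a dict.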

import shelve
f = shelve.open('shelve_file')
# operate on the shelf object to store the data
f['key'] = {'int': 10, 'float': 9.5, 'string': 'Sample data'}
f.close()
f = shelve.open('shelve_file')
content = f['key']
f.close()
print(content)

Result:

{'int': 10, 'float': 9.5, 'string': 'Sample data'}

Process finished with exit code 0

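This module has one limitation: it does not support several applications writing to the same DB at the same time. So when we know an application only needs to read, we can have shelve open the DB in read-only mode (flag='r').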
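A minimal sketch of read-only access, assuming the shelf is created first in a temporary directory (the path is hypothetical). The flag argument of shelve.open has the same meaning as in dbm.open, where 'r' opens an existing database for reading only.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'shelve_file')  # hypothetical location

    # Create the shelf first; the default flag 'c' opens it read-write.
    with shelve.open(path) as db:
        db['key'] = {'int': 10}

    # flag='r' opens the existing shelf for reading only.
    with shelve.open(path, flag='r') as db:
        content = db['key']

print(content)  # {'int': 10}
```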

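By default, shelve does not track modifications made to a stored object after it has been read back, so we need to change a default argument of shelve.open() (writeback=True); otherwise changes to the object are not saved.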

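import shelve
f1 = shelve.open('shelve_file')  # default: writeback=False
print(f1['key'])
f1['key']['new_value'] = 'this was not here before'  # changes a temporary copy; not saved
f1.close()
f2 = shelve.open('shelve_file', writeback=True)
print(f2['key'])  # 'new_value' is missing: the change made through f1 was lost
f2['key']['new_value'] = 'this was not here before'  # cached, then written on close()
f2.close()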

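The writeback approach has both advantages and drawbacks. The advantage is that it reduces the chance of mistakes and makes persistence more transparent to the user. It is not needed in every situation, however. With writeback enabled, the shelf keeps every entry it has accessed in an in-memory cache, which costs extra memory, and on close() every cached entry is written back to the DB, which adds waiting time. shelve has no way of knowing which cached objects were modified and which were not, so all of them are written.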
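When the cost of writeback is a concern, one common alternative is to leave writeback off and store the modified object back under its key explicitly. The sketch below assumes a shelf created in a temporary directory; the path and variable names are illustrative.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'shelve_file')  # hypothetical location

    with shelve.open(path) as db:       # writeback stays False
        db['key'] = {'int': 10}

    with shelve.open(path) as db:
        entry = db['key']               # reading returns a copy of the stored dict
        entry['new_value'] = 'this was not here before'
        db['key'] = entry               # assigning to the key persists the change

    with shelve.open(path) as db:
        saved = db['key']

print(saved)  # {'int': 10, 'new_value': 'this was not here before'}
```

Only the entries that are explicitly reassigned get rewritten, so no cache is needed and close() has nothing extra to flush.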