Using base64 encoding to fix the error raised when JSON-serializing zlib-compressed data
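As the title says, I use Python requests to crawl web pages. Because the page source is fairly large, I wanted to compress it with zlib, but json raises an error when it tries to serialize the zlib-compressed data.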
This post is not especially rigorous, so criticism is welcome. It may also be updated later; please check the original post for updates.
Here is the error message.
#blog: xiaorui.cc
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/tornado/web.py", line 1332, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "mock_server.py", line 77, in get
    self.write(json.dumps(self.mock(data['id'], data, res)))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 1: invalid start byte
The call fails with UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 1: invalid start byte. Errors like this are usually encoding problems and are fairly common. Let's narrow down where it goes wrong.
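The error looks odd at first, but the cause is simple. In Python 2, json.dumps converts every str value to unicode (decoding it as UTF-8 by default) before building the JSON string. zlib output is binary data, not UTF-8 text, so that decode fails. A zlib stream compressed at the default level starts with the bytes 0x78 0x9c, and 0x9c cannot start a UTF-8 sequence, which matches the byte and position reported in the error.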
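The failing decode can be reproduced on its own. This is a minimal sketch written for Python 3, which is an assumption on my part (the traceback above comes from Python 2.7, where json.dumps performs the decode implicitly). The sample page content is made up for illustration.

```python
import zlib

# Hypothetical page source standing in for the crawled HTML.
page = b"<html><body>" + b"<p>hello</p>" * 100 + b"</body></html>"

z = zlib.compress(page)  # default compression level

# The first two bytes of the stream are the zlib header.
header = z[:2]

# Try to treat the compressed bytes as UTF-8 text, which is what
# Python 2's json.dumps attempts on every str value.
try:
    z.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(header, decode_error)
```

On Python 3, json.dumps rejects bytes values with a TypeError rather than attempting the decode, but the underlying reason is the same: compressed bytes are not text.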
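The fix is to add a layer of base64 encoding, which turns the compressed bytes into plain ASCII text that json can serialize.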
In [56]: json.dumps({'data':base64.b64encode(z)})
Out[56]: '{"data": "eJxtkttOGlEUhl/FTBqjgTCHDcwQGRpL8SwKWkO9mcyZQWCGYXOwV2prK42pbaw18dDUpG16UZW2RhOhPg17gCtfoYOioDbZ2cn+1vrnX7PW8ktavkeTWCytQ0XPpSUs0OOPUwHr7S/rsHRZXbd+H1h7Jat0gdbK9Z/f0fs1a/+gVjmt77xCW6/R5vJlddmP2wpbBgKoct44OrJWflxWd2wKbJpL2ldSC5Cu5tIK+ruJ3myg8of64W5zqXT9KdsFbRzXKt8a787Qxba1e4K+7tQrm9bnPT9uK6/klKtRfmltndQqZ9aXKlo/b26fdKLAhcr7jeMttHpaq3xCq3+aH4/Q4XY7Ab+qQQj4caH1sn854M+KpmbAwISuTvNmyiXnWYzABm6epixycNGQ70Ajq5uQxUAX4VX5ni6bS0JOtDsJ7wQUU0+3eNbGZAcLvLjQpl3JmmG7eFyA6qCsprJYKDHizQjzWaaQm5b4vAMfB7PP8NjM0CAjjIG5FIjpk+kEyOhhXpGnisKcPkYXg5Q0vhAumPlhZjEe51MG7TUlZvpFOMKPwYQn+FzLOaIwKgVTMzyxaCiFJ/qkMjFSLHgTKTUyH9F0ORPkM5ORSCwmOGbd0dBCDoQ8o0OCAL18TBx9Go6OFzKqOufA1egoP5U0CkWG7e5JkhN5SLHYAwa6WVJXudYmkm4PoGkP6QYur5sifdiAH29P62ZqM6HBaHDEpcqQ46U8p6UVvY9wEk67tU7s+nAU1v9Q+Kil6cM0e3Rx47HJkgztAz6CAlQvZLuMacrt6xWzrEh6CUGhBdorMYRAu3mKoSVGcfsEWaAIyvc/j3ZxUDe41n5wcZhKXlVHOMn+gXZU0CHUU/cSOuHWpl1ze91uaWcBb03/AQwMah0="}'
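The receiving side has to undo the same layers in reverse order. Below is a rough round-trip sketch for Python 3; the function names pack_page and unpack_page and the "data" key layout are my own choices for illustration, not part of any library. Note that on Python 3 base64.b64encode returns bytes, so the result is decoded to an ASCII str before it goes into json.dumps.

```python
import base64
import json
import zlib


def pack_page(html: bytes) -> str:
    """Compress the page, base64-encode it, and wrap it in a JSON document."""
    compressed = zlib.compress(html)
    # b64encode returns bytes on Python 3; JSON needs text.
    encoded = base64.b64encode(compressed).decode("ascii")
    return json.dumps({"data": encoded})


def unpack_page(payload: str) -> bytes:
    """Reverse pack_page: parse JSON, base64-decode, then decompress."""
    encoded = json.loads(payload)["data"]
    return zlib.decompress(base64.b64decode(encoded))


# Hypothetical page source used only to exercise the round trip.
sample = b"<html><body>" + b"<p>hello</p>" * 100 + b"</body></html>"
payload = pack_page(sample)
restored = unpack_page(payload)
print(len(sample), len(payload), restored == sample)
```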
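Alternatively, change the serializer and use msgpack in place of json. msgpack can carry binary data directly, so no base64 step is needed.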
In [54]: msgpack.packb({'data':z})
Out[54]: '\x81\xa4data\xda\x02\x9fx\x9cm\x92\xdbN\x1aQ\x14\x86_\xc5L\x1a\xa3\x810\x87\r\xcc\x10\x19\x1aK\xf1,\nZC\xbd\x99\xcc\x99A`\x86as\xb0Wjk+\x8d\xa9m\xac5\xf1\xd0\xd4\xa4mzQ\x95\xb6F\x13\xa1>\r{\x80+_\xa1\x83\xa2\xa06\xd9\xd9\xc9\xfe\xd6\xfa\xe7_\xb3\xd6\xf2KZ\xbeG\x93X,\xadCE\xcf\xa5%,\xd0\xe3\x8fS\x01\xeb\xed/\xeb\xb0tY]\xb7~\x1fX{%\xabt\x81\xd6\xca\xf5\x9f\xdf\xd1\xfb5k\xff\xa0V9\xad\xef\xbcB[\xaf\xd1\xe6\xf2eu\xd9\x8f\xdb\n[\x06\x02\xa8r\xde8:\xb2V~\\Vwl\nl\x9aK\xdaWR\x0b\x90\xae\xe6\xd2\n\xfa\xbb\x89\xdel\xa0\xf2\x87\xfa\xe1ns\xa9t\xfd)\xdb\x05m\x1c\xd7*\xdf\x1a\xef\xce\xd0\xc5\xb6\xb5{\x82\xbe\xee\xd4+\x9b\xd6\xe7=?n+\xaf\xe4\x94\xabQ~im\x9d\xd4*g\xd6\x97*Z?on\x9ft\xa2\xc0\x85\xca\xfb\x8d\xe3-\xb4zZ\xab|B\xab\x7f\x9a\x1f\x8f\xd0\xe1v;\x01\xbf\xaaA\x08\xf8q\xa1\xf5\xb2\x7f9\xe0\xcf\x8a\xa6f\xc0\xc0\x84\xaeN\xf3f\xca%\xe7Y\x8c\xc0\x06n\x9e\xa6,rp\xd1\x90\xef@#\xab\x9b\x90\xc5@\x17\xe1U\xf9\x9e.\x9bKBN\xb4;\t\xef\x04\x14SO\xb7x\xd6\xc6d\x07\x0b\xbc\xb8\xd0\xa6]\xc9\x9aa\xbbx\\\x80\xea\xa0\xac\xa6\xb2X(1\xe2\xcd\x08\xf3Y\xa6\x90\x9b\x96\xf8\xbc\x03\x1f\x07\xb3\xcf\xf0\xd8\xcc\xd0 #\x8c\x81\xb9\x14\x88\xe9\x93\xe9\x04\xc8\xe8a^\x91\xa7\x8a\xc2\x9c>F\x17\x83\x944\xbe\x10.\x98\xf9af1\x1e\xe7S\x06\xed5%f\xfaE8\xc2\x8f\xc1\x84\'\xf8\\\xcb9\xa20*\x05S3<\xb1h(\x85\'\xfa\xa421R,x\x13)52\x1f\xd1t9\x13\xe43\x93\x91H,&8f\xdd\xd1\xd0B\x0e\x84<\xa3C\x82\x00\xbd|L\x1c}\x1a\x8e\x8e\x172\xaa:\xe7\xc0\xd5\xe8(?\x954\nE\x86\xed\xeeI\x92\x13yH\xb1\xd8\x03\x06\xbaYRW\xb9\xd6&\x92n\x0f\xa0i\x0f\xe9\x06.\xaf\x9b"}\xd8\x80\x1foO\xebfj3\xa1\xc1hp\xc4\xa5\xca\x90\xe3\xa5<\xa7\xa5\x15\xbd\x8fp\x12N\xbb\xb5N\xec\xfap\x14\xd6\xffP\xf8\xa8\xa5\xe9\xc34{tq\xe3\xb1\xc9\x92\x0c\xed\x03>\x82\x02T/d\xbb\x8ci\xca\xed\xeb\x15\xb3\xacHz\tA\xa1\x05\xda+1\x84@\xbby\x8a\xa1%Fq\xfb\x04Y\xa0\x08\xca\xf7?\x8fvqP7\xb8\xd6~pq\x98J^UG8\xc9\xfe\x81vT\xd0!\xd4S\xf7\x12:\xe1\xd6\xa6]s{\xddnig\x01oM\xff\x01\x0c\x0cj\x1d'
END. If this is not a web API, you can simply use msgpack, because running both zlib and base64 costs more CPU than msgpack alone.
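The extra base64 layer also costs space, not just CPU. The sketch below does not measure CPU time at all; it only shows the size side of the trade-off, using the fact that base64 emits 4 output characters for every 3 input bytes (rounded up). It is written for Python 3 and the sample page is made up for illustration.

```python
import base64
import json
import math
import zlib

# Hypothetical page source standing in for the crawled HTML.
raw = b"<html><body>" + b"<p>hello</p>" * 200 + b"</body></html>"

compressed = zlib.compress(raw)
b64 = base64.b64encode(compressed)
json_payload = json.dumps({"data": b64.decode("ascii")})

# base64 output length is always 4 * ceil(n / 3) for n input bytes.
expected_b64_len = 4 * math.ceil(len(compressed) / 3)

print("raw:", len(raw))
print("zlib:", len(compressed))
print("zlib + base64:", len(b64))
print("JSON payload:", len(json_payload))
```

So roughly a third of the space saved by zlib is given back once the data is wrapped for JSON, which is another reason to prefer a binary-safe format when the consumer does not require JSON.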