Python - Compress Ascii String
Solution 1:
Using compression will not always reduce the length of a string!
Consider the following code;
import zlib
import bz2

def comptest(s):
    # Python 2 syntax: print the original size and the size after each compressor.
    print 'original length:', len(s)
    print 'zlib compressed length:', len(zlib.compress(s))
    print 'bz2 compressed length:', len(bz2.compress(s))
Let's try this on an empty string;
In [15]: comptest('')
original length: 0
zlib compressed length: 8
bz2 compressed length: 14
So zlib produces an extra 8 characters, and bz2 14. Compression methods usually put a 'header' in front of the compressed data for use by the decompression program. This header increases the length of the output.
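The fixed overhead can be measured on its own by compressing empty input. This is only a minimal sketch, assuming Python 3 (where the compressors take bytes rather than str, unlike the Python 2 session above); the helper name overhead is hypothetical.

```python
import bz2
import zlib


def overhead(compress):
    """Return how many bytes a compressor emits when given no data at all."""
    return len(compress(b""))


zlib_overhead = overhead(zlib.compress)  # framing bytes zlib adds to empty input
bz2_overhead = overhead(bz2.compress)    # framing bytes bz2 adds to empty input
print("zlib:", zlib_overhead, "bz2:", bz2_overhead)
```

Whatever these numbers are, every compressed string pays at least that much before any savings can show up.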
Let's test a single word;
In [16]: comptest('test')
original length: 4
zlib compressed length: 12
bz2 compressed length: 40
Even if you subtract the length of the header, the compression hasn't made the word shorter at all. That is because in this case there is little to compress. Most of the characters in the string occur only once. Now for a short sentence;
In [17]: comptest('This is a compression test of a short sentence.')
original length: 47
zlib compressed length: 52
bz2 compressed length: 73
Again the compression output is larger than the input text. Due to the limited length of the text, there is little repetition in it, so it won't compress well.
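To see how much repetition matters, you can compare a highly repetitive input with a short word. This is a rough sketch, assuming Python 3 and bytes input; compression_ratio is a hypothetical helper, and the sample inputs are made up for illustration.

```python
import zlib


def compression_ratio(data):
    """Compressed size divided by original size; below 1.0 means it shrank.

    data must be non-empty, otherwise the division fails.
    """
    return len(zlib.compress(data)) / len(data)


repetitive = b"abc" * 500   # 1500 bytes built from one repeated pattern
short_word = b"test"        # 4 bytes with almost no repetition

print("repetitive:", compression_ratio(repetitive))
print("short word:", compression_ratio(short_word))
```

The repetitive input should come out far below 1.0, while the short word ends up above 1.0 because the overhead dominates.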
You need a fairly long block of text for compression to actually work;
In [22]: rings = '''
....: Three Rings for the Elven-kings under the sky,
....: Seven for the Dwarf-lords in their halls of stone,
....: Nine for Mortal Men doomed to die,
....: One for the Dark Lord on his dark throne
....: In the Land of Mordor where the Shadows lie.
....: One Ring to rule them all, One Ring to find them,
....: One Ring to bring them all and in the darkness bind them
....: In the Land of Mordor where the Shadows lie.'''
In [23]: comptest(rings)
original length: 410
zlib compressed length: 205
bz2 compressed length: 248
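Since compression can make short strings longer, one option is to keep the compressed form only when it is actually smaller. The following is a sketch of that idea, not a standard API: it assumes Python 3, and maybe_compress and restore are hypothetical helpers that pass a flag alongside the payload so the reader knows whether to decompress.

```python
import zlib


def maybe_compress(data):
    """Return (was_compressed, payload), keeping the original if zlib does not help."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return True, packed
    return False, data


def restore(was_compressed, payload):
    """Invert maybe_compress using the stored flag."""
    return zlib.decompress(payload) if was_compressed else payload


# A short word is left alone; a longer repetitive text gets compressed.
flag, payload = maybe_compress(b"test")
long_text = b"One Ring to rule them all, " * 20
long_flag, long_payload = maybe_compress(long_text)
print(flag, len(payload), long_flag, len(long_payload))
```

The flag has to be stored or transmitted somewhere, so in practice it costs a little space of its own.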
Solution 2:
You don't even need your data to be ASCII; you can feed zlib anything.
>>> import zlib
>>> a = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'  # + any binary data you want
>>> print zlib.compress(a)
x�KL$
�
>>>
What you probably want is for the compressed data itself to be an ASCII string. If so, you should know that you then have a much smaller alphabet in which to encode the compressed data, so you will need more symbols to represent it.
For example, you can encode the binary data in base64 to get an ASCII string, but that uses about 33% more space (4 output characters for every 3 input bytes).
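Combining the two steps might look like the sketch below. It assumes Python 3; compress_to_ascii and decompress_from_ascii are hypothetical helper names, and base64 is just one possible ASCII encoding. Standard base64 emits 4 characters for every 3 bytes, so the encoded text is always longer than the compressed bytes it wraps.

```python
import base64
import zlib


def compress_to_ascii(data):
    """Compress bytes with zlib, then base64-encode the result as ASCII text."""
    return base64.b64encode(zlib.compress(data)).decode("ascii")


def decompress_from_ascii(text):
    """Invert compress_to_ascii: base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(text))


raw = b"a" * 37                        # sample input: a run of repeated bytes
encoded = compress_to_ascii(raw)       # printable ASCII string
binary_size = len(zlib.compress(raw))  # size before the base64 step
print(binary_size, len(encoded))
```

Whether the end result is shorter than the original still depends on how well the input compresses in the first place.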