UnicodeEncodeError In Python3 When Redirection Is Used
Solution 1:
Redirection causes an error because the default encoding used by Python does not support one of the characters you're trying to output. In your case you're trying to output the bullet character •
using the GBK codec. This probably means you're using a Chinese version of Windows.
A version of Python 3.6 or later will work fine outputting to the terminal window on Windows, because character encoding is bypassed completely using Unicode. It's only when redirecting the output to a file that the Unicode must be encoded to a byte stream.
You can set the environment variable PYTHONIOENCODING
to change the encoding used for stdio. If you use UTF-8 it will be guaranteed to work with any Unicode character.
set PYTHONIOENCODING=utf-8
pdf2txt.py file.pdf > output.txt
Solution 2:
You seem to have somehow obtained unicode characters from the raw bytes but you need to encode it. I recommend you to use UTF-8 encoding for txt files.
Making the encoding parameter more explicit is probably what you want.
def gbk_to_utf8(source, target):
with open(source, "r", encoding="gbk") as src:
with open(target, "w", encoding="utf-8") as dst:
for line in src.readlines():
dst.write(line)
Post a Comment for "UnicodeEncodeError In Python3 When Redirection Is Used"