Convert bytes to a string in Python 3

Jaromanda X, 1 month ago

I captured the standard output of an external program into a bytes object:

```
>>> from subprocess import *
>>> stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'
```

I want to convert it to a normal Python string, so that I can print it like this:

```
>>> print(stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2
```

How do I convert the bytes object to a str with Python 3?

See "Best way to convert string to bytes in Python 3?" for the other way around.

Birendra Singh, 1 month ago (reply #14)
If you don't know the encoding, then to read binary input into a string in a way that works on both Python 3 and Python 2, use the ancient MS-DOS CP437 encoding:
```
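import sys

PY3K = sys.version_info >= (3, 0)

# Sample input; in practice this is any iterable of byte strings.
stream = [b'\x80abc']

lines = []
for line in stream:
    if not PY3K:
        # Python 2: str is already a byte string, so keep it unchanged.
        lines.append(line)
    else:
        # Python 3: cp437 maps every byte, so decoding cannot fail.
        lines.append(line.decode('cp437'))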
```
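Because the encoding is unknown, expect non-English symbols to be translated to cp437 characters (English characters are not translated, because they match in most single-byte encodings and in UTF-8).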
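A quick check of why CP437 works as a fallback, as a minimal sketch assuming Python 3: Python's `cp437` codec assigns a character to each of the 256 byte values, so decoding cannot fail and encoding the result gives back the original bytes. The sample data is made up for illustration.

```python
# Every possible byte value, 0x00 through 0xff.
all_bytes = bytes(range(256))

# Decoding never raises, because cp437 has a character for every byte.
text = all_bytes.decode('cp437')

# The mapping is one-to-one, so the original bytes can be recovered.
round_trip = text.encode('cp437')
print(round_trip == all_bytes)

# High bytes become cp437 symbols; ASCII characters stay as they are.
sample = b'\x80abc'.decode('cp437')
print(ascii(sample))
```

The decoded text is only meaningful if the input really was CP437; for any other encoding the non-ASCII characters are placeholders that happen to preserve the byte values.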

Decoding arbitrary binary input as UTF-8 is unsafe, because you may get this:
```
>>> b'\x00\x01\xffsd'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte
```
The same applies to latin-1, which was popular (the default?) in Python 2. See the code page layout: it is where Python chokes with the infamous "ordinal not in range" error.

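Update 20150604: There are rumors that Python 3 has the surrogateescape error strategy for encoding arbitrary content into binary data without data loss or crashes, but it needs conversion tests, [binary] -> [str] -> [binary], to validate both performance and reliability.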
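The [binary] -> [str] -> [binary] round trip mentioned here can be tried directly. This is a minimal sketch assuming Python 3, where `surrogateescape` is a built-in error handler (introduced by PEP 383); it checks reliability only, not performance, and the sample bytes are taken from the traceback example above.

```python
# Bytes that are not valid UTF-8 (0xff can never start a UTF-8 sequence).
raw = b'\x00\x01\xffsd'

# Undecodable bytes become lone surrogates in the range U+DC80..U+DCFF.
text = raw.decode('utf-8', 'surrogateescape')
print(ascii(text))

# Encoding with the same handler turns the surrogates back into bytes.
restored = text.encode('utf-8', 'surrogateescape')
print(restored == raw)

# The same round trip over every possible byte value.
all_bytes = bytes(range(256))
all_restored = all_bytes.decode('utf-8', 'surrogateescape').encode('utf-8', 'surrogateescape')
print(all_restored == all_bytes)
```

Note that a string containing lone surrogates cannot be encoded with the default strict handler, so printing it directly or writing it to a UTF-8 file may raise `UnicodeEncodeError`; `ascii()` is used above for that reason.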

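Update 20170116: Thanks to the comment by Nearoo, there is also the option of slash-escaping all unknown bytes with the backslashreplace error handler. This works only on Python 3, so even with this workaround you will still get inconsistent output across Python versions: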
```
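import sys

PY3K = sys.version_info >= (3, 0)

# Sample input; in practice this is any iterable of byte strings.
stream = [b'\x80abc']

lines = []
for line in stream:
    if not PY3K:
        # Python 2: str is already a byte string, so keep it unchanged.
        lines.append(line)
    else:
        # Python 3: invalid bytes are replaced with \xNN escapes.
        lines.append(line.decode('utf-8', 'backslashreplace'))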
```
See Python's Unicode Support.
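To see what backslashreplace produces, here is a small sketch. It assumes Python 3.5 or later, the first release in which this handler can be used for decoding; the sample bytes are made up for illustration.

```python
# A mix of ASCII and bytes that are invalid as UTF-8.
raw = b'caf\xe9 \xff'

# Each undecodable byte is replaced by a literal backslash escape.
text = raw.decode('utf-8', 'backslashreplace')
print(text)

# Valid UTF-8 input is decoded normally and is not escaped.
valid = b'caf\xc3\xa9'.decode('utf-8', 'backslashreplace')
print(valid)
```

The result is an ordinary printable string, but the conversion is one-way: once decoded, an escaped byte cannot be told apart from input that already contained the text `\xe9`.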

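Update 20170119: I decided to implement a slash-escaping decode that works on both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.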
```
# --- preparation

import codecs

def slashescape(err):
    """codecs error handler. err is a UnicodeDecodeError instance. Return
    a tuple with a replacement for the undecodable part of the input
    and the position where decoding should continue."""
    # bytearray yields integers on both Python 2 and Python 3.
    thebytes = bytearray(err.object[err.start:err.end])
    # Escape every byte in the failing span as \xNN.
    repl = u''.join(u'\\x%02x' % b for b in thebytes)
    return (repl, err.end)

codecs.register_error('slashescape', slashescape)

# --- processing

stream = [b'\x80abc']

lines = []
for line in stream:
    lines.append(line.decode('utf-8', 'slashescape'))
```
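To check the handler's output, the sketch below registers a `slashescape` handler like the one above (written here so that it escapes every byte of the failing span) and compares it with the built-in `backslashreplace`. It assumes Python 3.5 or later, where the built-in handler supports decoding; the sample inputs are made up for illustration.

```python
import codecs

def slashescape(err):
    # Escape each undecodable byte as \xNN and resume after the bad span.
    bad = bytearray(err.object[err.start:err.end])
    return (u''.join(u'\\x%02x' % b for b in bad), err.end)

codecs.register_error('slashescape', slashescape)

# Invalid start byte, truncated multi-byte sequence, two bad bytes, plain ASCII.
samples = [b'\x80abc', b'abc\xe2\x82', b'\xff\xfe ok', b'plain ascii']

results = []
for raw in samples:
    custom = raw.decode('utf-8', 'slashescape')
    builtin = raw.decode('utf-8', 'backslashreplace')
    # Record the custom output and whether it matches the built-in handler.
    results.append((custom, custom == builtin))

for custom, same in results:
    print(repr(custom), same)
```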
