Extracting DOCX comments

Louise Sørensen, 1 month ago

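I'm a teacher. I want a list of all the students who commented on an essay I assigned, along with what they said. The Drive API was too hard for me, but I figured I could download the documents as a zip file and parse the XML.

Comments are marked with w:comment tags, with w:t holding the comment text. This should be easy, but XML (etree) is giving me a headache.

Following the tutorial (and the official Python docs):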

z = zipfile.ZipFile('test.docx')
x = z.read('word/comments.xml')
tree = etree.XML(x)

Then I do this:

children = tree.getiterator()
for c in children:
    print(c.attrib)

Which results in:

{}
{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}author': 'Joe Shmoe', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}id': '1', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}date': '2017-11-17T16:58:27Z'}
{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rsidR': '00000000', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rsidDel': '00000000', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rsidP': '00000000', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rsidRDefault': '00000000', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rsidRPr': '00000000'}
{}
{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val': '0'}
{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val': '0'}
{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val': '0'}

After this, I am completely stuck. I have tried element.get() and element.findall(), with no luck. Even when I copy/paste the value ( '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val' ), I get None in return.

Can anyone help?

clrscr_clr, 1 month ago

If you also want the document text that each comment refers to:
```
import zipfile
from lxml import etree

# WordprocessingML namespace; the 'w' prefix is used by the XPath expressions below
ooXMLns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}

def get_document_comments(docxFileName):
    comments_dict = {}     # comment id -> comment text
    comments_of_dict = {}  # comment id -> document text the comment refers to
    with zipfile.ZipFile(docxFileName) as docx_zip:
        comments_xml = docx_zip.read('word/comments.xml')
        comments_of_xml = docx_zip.read('word/document.xml')
    et_comments = etree.XML(comments_xml)
    et_comments_of = etree.XML(comments_of_xml)
    comments = et_comments.xpath('//w:comment', namespaces=ooXMLns)
    comments_of = et_comments_of.xpath('//w:commentRangeStart', namespaces=ooXMLns)
    for c in comments:
        comment = c.xpath('string(.)', namespaces=ooXMLns)
        comment_id = c.xpath('@w:id', namespaces=ooXMLns)[0]
        comments_dict[comment_id] = comment
    for c in comments_of:
        comments_of_id = c.xpath('@w:id', namespaces=ooXMLns)[0]
        # Runs that sit between the matching range start and range end markers
        parts = et_comments_of.xpath(
            "//w:r[preceding-sibling::w:commentRangeStart[@w:id=" + comments_of_id + "] and following-sibling::w:commentRangeEnd[@w:id=" + comments_of_id + "]]",
            namespaces=ooXMLns)
        comment_of = ''
        for part in parts:
            comment_of += part.xpath('string(.)', namespaces=ooXMLns)
        # Store once per comment, after all runs have been joined
        comments_of_dict[comments_of_id] = comment_of
    return comments_dict, comments_of_dict
```
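
For the original goal (a list of who commented and what they said), the author is stored in the w:author attribute of each w:comment element. That would explain the None results in the question: get() only looks at the element it is called on, and the attribute name has to be given in the full '{namespace}name' form. Below is a minimal sketch using only the standard library's xml.etree.ElementTree. The function name list_comment_authors and the (author, date, text) tuple layout are assumptions made for illustration, not part of the answer above.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, as seen in the attribute names printed in the question
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
NS = {'w': W_NS}


def list_comment_authors(docx_file):
    """Return a list of (author, date, text) tuples, one per comment.

    docx_file may be a path or a binary file-like object.
    Hypothetical helper, for illustration only.
    """
    with zipfile.ZipFile(docx_file) as docx_zip:
        root = ET.fromstring(docx_zip.read('word/comments.xml'))

    results = []
    # w:comment elements are direct children of the w:comments root
    for comment in root.findall('w:comment', NS):
        # Attribute names are not resolved through the prefix map,
        # so they are written in '{namespace}name' form
        author = comment.get('{%s}author' % W_NS)
        date = comment.get('{%s}date' % W_NS)
        # Join every w:t text node inside this comment
        text = ''.join(t.text or '' for t in comment.iter('{%s}t' % W_NS))
        results.append((author, date, text))
    return results
```

Calling list_comment_authors('test.docx') and printing each tuple gives one entry per comment. If the archive has no word/comments.xml part (which may be the case for a document without comments), zipfile raises KeyError, so a caller may want to catch that.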
