How to read a .txt file in transposed format into a pandas DataFrame

Gabriele Gatti · 1 month ago

I'm trying to read a dataset into a pandas DataFrame. The dataset is currently in a .txt file that looks like this:

name: hello_world
rating: 5
description: basic program

name: python
rating: 10
description: programming language

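As you can see, each line starts with the column name, followed by the data. Rows of the DataFrame are separated by an extra blank line. Is there a simple way to read this kind of file into pandas, or do I have to do it manually?

Thanks!

Edit: Thanks everyone for the help. It looks like the answer is yes, you have to do it manually. I've posted the way I did it manually below, but I'm sure there are other, more efficient ways.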

#2 · M.S.WebFustion · 1 month ago

I have a solution... I just need some time to write it up on my laptop. I'll try to post it in a while.

#3 · Roy Rodney · 1 month ago

I think you have to do this manually. If you check the pandas I/O API ( https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html ), there is no way to define a custom reading procedure.
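
Even without a custom reader, pandas can do the reshaping once the raw lines are loaded. The sketch below assumes records are separated by blank lines and every non-blank line has the form `key: value`. A running count of blank lines gives each line a record id, `str.split(":", n=1, expand=True)` separates keys from values, and `DataFrame.pivot` turns the keys into columns. A record that lacks a field gets NaN; a key repeated inside one record would make `pivot` raise.

```python
import pandas as pd

text = """\
name: hello_world
rating: 5
description: basic program

name: python
rating: 10
description: programming language
"""

lines = pd.Series(text.splitlines())
blank = lines.str.strip() == ""
# Each blank line starts a new record, so a running count of blank lines is a record id.
record_id = blank.cumsum()

# Split every non-blank line on its first ":" only.
kv = lines[~blank].str.split(":", n=1, expand=True)
kv.columns = ["key", "value"]
kv["key"] = kv["key"].str.strip()
kv["value"] = kv["value"].str.strip()
kv["record"] = record_id[~blank]

df = (
    kv.pivot(index="record", columns="key", values="value")
    .reindex(columns=kv["key"].unique())  # keep the field order from the file
    .rename_axis(index=None, columns=None)
    .reset_index(drop=True)  # renumber rows 0..n-1 (extra blank lines leave gaps)
)
print(df)
```

All values stay strings here; convert numeric columns afterwards, for example with `pd.to_numeric(df["rating"])`.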

#4 · MaDudeeK · 1 month ago

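This is a poor (incomplete/unhelpful) answer, because there are many ways to do this. Yes, pandas is a great tool, but it is not the only tool. This calls for some thinking outside pandas, at a lower level.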
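
A minimal pure-Python sketch of that lower-level route, with pandas used only at the end. The helper name `parse_records` is hypothetical. `itertools.groupby` splits the stream of lines into runs of blank and non-blank lines, each non-blank run becomes one dict, and `pd.DataFrame` builds the table from the list of dicts. A record that lacks a field simply gets NaN in that column.

```python
from itertools import groupby

import pandas as pd


def parse_records(lines):
    """Yield one dict per blank-line-separated block of "key: value" lines."""
    stripped = (line.strip() for line in lines)
    # groupby yields alternating runs of blank and non-blank lines.
    for is_blank, block in groupby(stripped, key=lambda s: s == ""):
        if is_blank:
            continue
        record = {}
        for line in block:
            key, value = line.split(":", 1)  # split on the first ":" only
            record[key.strip()] = value.strip()
        yield record


text = """\
name: hello_world
rating: 5
description: basic program

name: python
rating: 10
description: programming language
"""

# With a real file, pass the file object instead:
#   with open("data.txt") as f:
#       df = pd.DataFrame(list(parse_records(f)))
df = pd.DataFrame(list(parse_records(text.splitlines())))
print(df)
```

The column order follows the first record's keys, since pandas keeps dict insertion order when building a frame from a list of dicts.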

#5 · ShadowRanger · 1 month ago

My take, again as part of my learning pandas:
```
import pandas as pd
from io import StringIO

data = '''\
name: hello_world
rating: 5
description: basic program

name: python
rating: 10
description: programming language

name: foo
rating: 20
description: bar
'''
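buffer = StringIO()
buffer.write('field: value\n')  # add column headers
buffer.write(data)
buffer.seek(0)

# skipinitialspace=True drops the space after each ':' (in the header too);
# blank lines are skipped by default. A value containing ':' would break this.
df = pd.read_csv(buffer, delimiter=':', skipinitialspace=True)

# After transposing, row 'field' holds the keys and row 'value' the values.
transposed = df.T

_, col_count = transposed.shape

x = []
# Take the columns three at a time: assumes every record has exactly 3 fields.
for i in range(0, col_count, 3):
    tmp = transposed[[i, i + 1, i + 2]]
    columns = tmp.iloc[0]   # the keys become the column names
    tmp = tmp[1:]           # keep only the values row
    tmp.columns = columns
    x.append(tmp)

out = pd.concat(x)
print(out.to_string(index=False))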
```
I'd really appreciate it if someone with more pandas experience could show me a better way.
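
One shorter pandas route, sketched under two assumptions: every record has the same fields in the same order, and no value contains a colon. `read_csv` splits each line into a key and a value (blank lines are skipped by default), and the value column is then reshaped row-major with NumPy so that every `n_fields` consecutive values form one record. Records with a missing field would shift the reshape, hence the length check.

```python
import io

import pandas as pd

data = """\
name: hello_world
rating: 5
description: basic program

name: python
rating: 10
description: programming language
"""

# Each "key: value" line becomes one row with two columns.
kv = pd.read_csv(
    io.StringIO(data),
    sep=":",
    header=None,
    names=["key", "value"],
    skipinitialspace=True,  # drop the space after ":"
)

n_fields = kv["key"].nunique()
if len(kv) % n_fields:
    raise ValueError("some record is missing a field")

# Row-major reshape: every n_fields consecutive values form one record.
df = pd.DataFrame(
    kv["value"].to_numpy().reshape(-1, n_fields),
    columns=kv["key"].iloc[:n_fields].tolist(),
)
df["rating"] = pd.to_numeric(df["rating"])  # the value column was read as strings
print(df)
```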

#6 · KillerCube · 1 month ago

data.txt:
```
name: hello_world
rating: 5
description: basic program

name: python
rating: 10
description: programming language
```
Code:
```
import pandas as pd
with open('data.txt', 'rt') as fin:
    # rstrip('\n') instead of line[:-1]: slicing would cut a real character off
    # a last line that has no trailing newline. Blank lines are dropped.
    lst = [line.rstrip('\n') for line in fin if line.strip()]
print(lst)

# Soln 1
# split(':', 1) splits on the first colon only; strip() removes the space after it.
d = dict()
d['name'] = [ele.split(':', 1)[1].strip() for ele in lst if ele.startswith('name:')]
d['rating'] = [ele.split(':', 1)[1].strip() for ele in lst if ele.startswith('rating:')]
d['description'] = [ele.split(':', 1)[1].strip() for ele in lst if ele.startswith('description:')]
df = pd.DataFrame(data=d)
print(df)
```
Or:
```
# Soln 2: take the lines three at a time (assumes every record has exactly
# name, rating and description, in that order).
data_tuples_lst = [tuple(ele.split(':', 1)[1].strip() for ele in lst[i:i + 3])
                   for i in range(0, len(lst), 3)]
df1 = pd.DataFrame(data=data_tuples_lst, columns=['name', 'rating', 'description'])
print(df1)
```
Output:
```
['name: hello_world', 'rating: 5', 'description: basic program', 'name: python', 'rating: 10', 'description: programming language']
          name rating           description
0  hello_world      5         basic program
1       python     10  programming language
          name rating           description
0  hello_world      5         basic program
1       python     10  programming language
```

#7 · anvar · 1 month ago

In the first solution, you can avoid using a dictionary. I've added an answer with a third solution that starts from the list you already have.

#8 · authorizeduser · 1 month ago

Here is one way to handle a "sideways" dataset. This code was edited from the previous answer to make it more efficient.

Sample code:
```
import pandas as pd
from collections import defaultdict

# Read the text file into a list.
with open('prog.txt') as f:
    text = [i.strip() for i in f]

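# Split each line into a key, value pair (on the first ':' only, so values may contain colons).
d = [i.split(':', 1) for i in text if i]
# Create a data container.
data = defaultdict(list)
# Store the data in a DataFrame-ready dict (one list per key).
# Every record needs every key, otherwise the lists end up with different lengths.
for k, v in d:
    data[k.strip()].append(v.strip())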

# Load the DataFrame.
df = pd.DataFrame(data)
```
Output:
```
          name rating           description
0  hello_world      5         basic program
1       python     10  programming language
```
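
#9 · John Winston · 1 month ago

In case anyone comes here later, this is what I did. I simply converted the input file into a CSV (except that I used "|" as the separator, because the dataset contains strings). Thanks everyone for your input, but I forgot to mention that this is a 2 GB data file, so I didn't want to do anything too intensive on my poor overworked laptop.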
```
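# Write one "|"-separated line per record (no header row, no trailing "|").
with open("in_file.txt", "r", encoding="cp1252") as ifile, \
        open("out_file.csv", "w", encoding="utf-8") as ofile:
    record = []
    for line in ifile:
        line = line.strip()
        if not line:
            # A blank line ends the current record.
            if record:
                ofile.write("|".join(record) + "\n")
                record = []
        else:
            # Keep everything after the first ":" so values may contain colons.
            record.append(line.split(":", 1)[1].strip())
    # Flush the last record if the file does not end with a blank line.
    if record:
        ofile.write("|".join(record) + "\n")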
```
Then I loaded the DataFrame with:
```
import pandas as pd
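# The converted file has no header row, so name the columns explicitly
# (names taken from the sample in the question).
df = pd.read_csv('out_file.csv', sep="|", skipinitialspace=True, index_col=False,
                 header=None, names=['name', 'rating', 'description'])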
```
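
For a file of this size, one alternative is to skip the custom "|" separator and let Python's `csv` module quote any field that contains a comma, then read the result back with `read_csv(..., chunksize=...)`, which returns an iterator of smaller DataFrames so only part of the file is in memory at a time. This is a sketch under stated assumptions: `convert_records` is a hypothetical helper, every record is assumed to have the same keys in the same order, and the sample text (including the comma in the first description) is made up to show the quoting.

```python
import csv
import io

import pandas as pd


def convert_records(src, dst):
    """Stream blank-line-separated "key: value" blocks from src into CSV rows in dst.

    Hypothetical helper: assumes every record has the same keys in the same order.
    """
    writer = csv.writer(dst)  # quotes any field that contains a comma
    record = {}
    header_written = False

    def flush():
        nonlocal header_written
        if not record:
            return
        if not header_written:
            writer.writerow(record.keys())
            header_written = True
        writer.writerow(record.values())
        record.clear()

    for line in src:  # one line at a time, so memory use stays small
        line = line.strip()
        if not line:
            flush()  # a blank line ends the current record
        else:
            key, value = line.split(":", 1)
            record[key.strip()] = value.strip()
    flush()  # last record, if the input does not end with a blank line


# Made-up sample; with real files, open dst with newline="" as the csv docs advise.
src = io.StringIO(
    "name: hello_world\nrating: 5\ndescription: basic program, v1\n\n"
    "name: python\nrating: 10\ndescription: programming language\n"
)
dst = io.StringIO()
convert_records(src, dst)
dst.seek(0)

# chunksize makes read_csv yield DataFrames of at most that many rows.
row_count = 0
rating_total = 0
descriptions = []
for chunk in pd.read_csv(dst, chunksize=1):
    row_count += len(chunk)
    rating_total += int(chunk["rating"].sum())
    descriptions.extend(chunk["description"].tolist())

print(row_count, rating_total, descriptions)
```

A real run would use a much larger `chunksize` (tens of thousands of rows, say); `1` here only makes the chunking visible on two records.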

#10 · Lorraine · 1 month ago

After getting the list with the following code proposed by @aaj-kaal:
```
import pandas as pd
with open('data.txt', 'rt') as fin:
    lst = [line.rstrip('\n') for line in fin if line.strip()]  # rstrip, not [:-1], keeps the last char of an unterminated final line
```
you can get the DataFrame directly with:
```
dict_df=pd.DataFrame()
dict_df['name'] = [ele.split(':')[1] for ele in lst if ele.startswith('name:')]
dict_df['rating'] = [ele.split(':')[1] for ele in lst if \
                    ele.startswith('rating:')]
dict_df['description'] = [ele.split(':')[1] for ele in lst\
                         if ele.startswith('description:')]
dict_df
```
Output:
```
           name rating            description
0   hello_world      5          basic program
1        python     10   programming language
```

#11 · Claudiu Creanga · 1 month ago

A general-purpose suggestion:
```
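import pandas as pd


def from_txt_transposed_to_pandas(file):
    """
    take a txt file like this:

    "
    name: hello_world
    rating: 5
    description: basic program

    name: python
    rating: 10
    description: programming language 
    "

    -of any length- and returns a dataframe.
    """
    # header=None: the first line is data, not column names.
    # Blank lines between records are skipped by default.
    tabla = pd.read_table(file, header=None)
    # Split each "key: value" line on the first colon only.
    pairs = [x.split(":", 1) for x in tabla[0]]
    # dict.fromkeys keeps the field names in first-seen order (a set would not).
    cols = list(dict.fromkeys(k.strip() for k, _ in pairs))
    # Collect each field's values in file order; compare whole keys, not
    # prefixes. Assumes every record contains every field.
    return pd.DataFrame(
        {col: [v.strip() for k, v in pairs if k.strip() == col] for col in cols}
    )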
```