[Python-pandas] How to change the format of some elements in a dataframe? - 未名空间 archive, January 12, 2021

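More than 5 years ago

Original poster (未名空间)

A question for the experts on this board: I use pandas, the wonderful Python package, to load raw data in csv or xlsx/xlsm (Excel) format into a dataframe with read_csv or read_excel. Because the raw data mixes strings and numbers, everything ends up as str in the dataframe. I need to turn the numeric strings into float while the text cells stay str. In other words, I need to convert str to float for certain rows, certain columns, or parts of certain rows or columns of the two-dimensional dataframe. How can this be done? Ideally someone could show a couple of lines of key code as a demonstration. Thanks!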

iloveYolanda

More than 5 years ago

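def convert(x):
    # Try to parse the cell as a number; leave it unchanged if that fails.
    try:
        return float(x)
    except (ValueError, TypeError):
        return x

# Apply convert to every cell (pandas 2.1+ offers DataFrame.map as the newer name for applymap).
df = df.applymap(convert)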
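
Since the question also asks about converting only some columns, here is a minimal sketch of that variant. The sample CSV text and the column names (name, price, qty) are made up for illustration, and dtype=object with keep_default_na=False is only an assumption used to mimic the "everything is str" situation described in the question.

```python
import io
import pandas as pd

def convert(x):
    """Return float(x) when x parses as a number, otherwise x unchanged."""
    try:
        return float(x)
    except (ValueError, TypeError):
        return x

# Hypothetical sample data; dtype=object keeps every cell as a Python str,
# and keep_default_na=False stops 'n/a' from being read as a missing value.
csv_text = "name,price,qty\napple,1.5,3\npear,n/a,4\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=object, keep_default_na=False)

# Convert only the chosen columns; col.map applies convert cell by cell.
cols = ["price", "qty"]
df[cols] = df[cols].apply(lambda col: col.map(convert))

print(df["price"].tolist())  # [1.5, 'n/a']
print(df["qty"].tolist())    # [3.0, 4.0]
```

One thing to keep in mind: float() also accepts strings such as 'nan' and 'inf', so text cells with exactly those values would be converted as well.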

greydog

More than 4 years ago

Beautiful and succinct code!

I did something different, shown below, and it seems to work too:

import re

f = lambda x: float(x) if re.sub(r'(^-|\.|e\+|e\-)', '', x).isdigit() else x
df = df.applymap(f)

Note:
The line below is what decides whether a string x holds an int/float value or ordinary text:

f = lambda x: float(x) if re.sub(r'(^-|\.|e\+|e\-)', '', x).isdigit() else x

test:

l = ['123', '12.3', '-12.3', '--1.2','1a', 'a1', '-1.a', '12-', '-0.08e+3', '1.2e-3']

for item in l:
    print('Original: %10s'%repr(item), '\t', '==>', type(f(item)), f(item))

output:

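Original:      '123'   ==> <class 'float'> 123.0
Original:     '12.3'   ==> <class 'float'> 12.3
Original:    '-12.3'   ==> <class 'float'> -12.3
Original:    '--1.2'   ==> <class 'str'> --1.2
Original:       '1a'   ==> <class 'str'> 1a
Original:       'a1'   ==> <class 'str'> a1
Original:     '-1.a'   ==> <class 'str'> -1.a
Original:      '12-'   ==> <class 'str'> 12-
Original: '-0.08e+3'   ==> <class 'float'> -80.0
Original:   '1.2e-3'   ==> <class 'float'> 0.0012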
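
One caveat with the stripping approach: it removes every '.', 'e+' and 'e-' wherever they occur, so a string such as '1.2.3' passes the .isdigit() check and float() then raises ValueError, while '1e5' (no sign in the exponent) stays a str. A minimal sketch of a stricter check follows, assuming the cells are plain strings. The names NUMERIC and to_float_if_numeric are made up for illustration, and the pattern is just one reasonable choice.

```python
import re

# Hypothetical stricter pattern: optional sign, digits with an optional
# decimal part (or a leading-dot form like '.5'), then an optional exponent.
NUMERIC = re.compile(r'[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?')

def to_float_if_numeric(x):
    """Convert x to float only when the whole string looks like a number."""
    if isinstance(x, str) and NUMERIC.fullmatch(x.strip()):
        return float(x)
    return x  # non-strings and non-numeric text pass through unchanged

for item in ['1.2.3', '1e5', '-0.08e+3', '--1.2', '12-']:
    print(repr(item), '==>', repr(to_float_if_numeric(item)))
```

Matching the whole string with fullmatch, rather than deleting pieces of it, means the order and count of the sign, the decimal point and the exponent are all checked.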

Explaining the pattern r'(^-|\.|e\+|e\-)': it matches
1) "^-" (the minus sign of a negative number, only when "-" is at the beginning; a string with "-" in the middle is treated as a str),
2) the decimal point "\.", and
3) the scientific notation markers "e\+" or "e\-",
and replaces each match with '' (nothing). If .isdigit() is then True, the string is treated as a number and converted to float; if not, it stays a str.

Note: the thousands separator "," is not handled yet. It is better to deal with it before this step (either while reading the csv or with a .str.replace), because the builtin float() raises ValueError on such strings.
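
For the thousands separator, here is a minimal sketch of the cell-level equivalent of the .str.replace idea. It assumes every "," in a numeric cell is a thousands separator (so '1,2,3' would become 123.0). The function name convert_with_thousands and the sample data are made up for illustration.

```python
import pandas as pd

def convert_with_thousands(x):
    """Try float() after dropping ',' from a string; otherwise return x unchanged."""
    if isinstance(x, str):
        try:
            return float(x.replace(',', ''))
        except ValueError:
            return x  # keep the original text, commas included
    return x

# Hypothetical sample data mixing numbers with thousands separators and text.
df = pd.DataFrame(
    {'a': ['1,234.5', 'abc', '7'], 'b': ['x,y', '-2,000', '3e2']},
    dtype=object,
)
df = df.apply(lambda col: col.map(convert_with_thousands))

print(df['a'].tolist())  # [1234.5, 'abc', 7.0]
print(df['b'].tolist())  # ['x,y', -2000.0, 300.0]
```

Text cells that contain commas are returned untouched, because the comma-free copy is only used for the float() attempt.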