Haunted while learning Python programming late at night - 北美华人网 archive, September 10, 2024

2 months ago

Original poster (北美华人网)

Please look at the code below. It runs without any errors, yet no matter what I do, I cannot convert the type of x['C'] from pd.Series to pd.Categorical. Python experts, what is going on here? Did I run into a ghost late at night, or is this a bug in "Master Pan" (pandas V2.2.2)?
import pandas as pd
x = pd.DataFrame({"A": pd.Categorical(["A", "B", "B"], categories=["A", "B"]), "B": [1, 2, 3], "C": ["D", "E", "E"]}, index=["A", "B", "C"])
print(type(x['C']))  # <class 'pandas.core.series.Series'>
x['C'] = pd.Categorical(x['C'], categories=["D", "E"])
print(type(x['C']))  # Still a Series: <class 'pandas.core.series.Series'>

qt62

2 months ago

Reply to microsat's post (#1)
GPT's answer:
Based on the code you've provided, here's the analysis of the types for x['C']:
After the initial DataFrame creation:
print(type(x['C']))
This will output: <class 'pandas.core.series.Series'>
Initially, x['C'] is a regular pandas Series containing string values.
After the conversion:
x['C'] = pd.Categorical(x['C'], categories=["D", "E"])
print(type(x['C']))
This will still output: <class 'pandas.core.series.Series'>
However, the type of the data within the Series has changed. The important distinction here is: the type of x['C'] itself remains a pandas Series in both cases. What changes is the dtype of the Series. To see this change more clearly, you could print:
print(x['C'].dtype)
Before the conversion, this would show: object
After the conversion, it would show: category
So, to answer your question directly: the type of x['C'] should be pandas.core.series.Series in both cases. However, the dtype of the Series changes from 'object' (string) to 'category' after the conversion.
If you want to see the categorical nature of the data after conversion, you could use:
print(x['C'].cat.categories)
This would show: Index(['D', 'E'], dtype='object')
This confirms that it is now a categorical Series with categories 'D' and 'E'.
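The type-versus-dtype distinction in the answer above can be checked directly. This is only a minimal sketch, assuming a pandas 2.x install; the helper names type_before, dtype_before, type_after and dtype_after are made up for illustration and do not appear in the thread.

```python
import pandas as pd

x = pd.DataFrame(
    {
        "A": pd.Categorical(["A", "B", "B"], categories=["A", "B"]),
        "B": [1, 2, 3],
        "C": ["D", "E", "E"],
    },
    index=["A", "B", "C"],
)

# Record the container class and the element dtype before the assignment.
type_before = type(x["C"])
dtype_before = x["C"].dtype

x["C"] = pd.Categorical(x["C"], categories=["D", "E"])

# Record both again after assigning a Categorical to the column.
type_after = type(x["C"])
dtype_after = x["C"].dtype

# The container class is unchanged; only the dtype of the column differs.
print(type_before is type_after)
print(dtype_before, "->", dtype_after)
print(isinstance(dtype_after, pd.CategoricalDtype))
```

The point of the sketch is that the assignment did take effect: it shows up in the dtype, not in type().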

crichris

2 months ago

In reply to microsat (2024-09-08 02:21):

https://gprivate.com/6d5ou
Jokes aside, isn't Categorical a dtype?
Try print(x["C"].dtype) before and after the conversion and see.

microsat

2 months ago

In reply to crichris (2024-09-08 08:04):

If I want to know which class a variable belongs to, and therefore which functions I can call on it,
should I check type(x['C']) or x['C'].dtype?
With x['C'].dtype the result is category, but I still cannot use the categories attribute of Categorical.
import pandas as pd
x = pd.DataFrame({"A": pd.Categorical(["A", "B", "B"], categories=["A", "B"]), "B": [1, 2, 3], "C": ["D", "E", "E"]}, index=["A", "B", "C"])
print(type(x['C']))
print(x["C"].dtype)
x['C'] = pd.Categorical(x['C'], categories=["D", "E"])
print(type(x['C']))
print(x["C"].dtype)

>>> print(x["C"].dtype)
category
>>> x['C'].categories
AttributeError: 'Series' object has no attribute 'categories'

crichris

2 months ago

In reply to microsat (2024-09-08 16:39):

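I... how about I recommend a set of Python courses to you? I'm not a shill.
I recommended it in another thread last month and got turned down.
https://www.udemy.com/user/fredbaptiste/
All of the Python courses listed there. As I recall they should be cheap for new members, and if you browse Coursera a few times, the courses on Udemy may well get even cheaper.
Once you finish a full series you should have a basic grasp of the concepts.
As for your specific question:
https://pandas.pydata.org/docs/reference/api/pandas.Series.cat.html
Also, to see which class an instance belongs to, besides type you can also use x['C'].__class__
Remember that both return the class object.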
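To make the two pointers above concrete, here is a small sketch, assuming pandas 2.x. It shows that type() and __class__ give the same class object, and that the Categorical-specific attributes of a Series are reached through the .cat accessor from the linked documentation. The variable names (s, plain, plain_has_cat and so on) are invented for this example.

```python
import pandas as pd

s = pd.Series(pd.Categorical(["D", "E", "E"], categories=["D", "E"]))

# type(obj) and obj.__class__ refer to the same class object.
same_class = type(s) is s.__class__

# A Series has no .categories attribute of its own; categorical
# attributes are exposed through the .cat accessor instead.
categories = s.cat.categories.tolist()
codes = s.cat.codes.tolist()
ordered = s.cat.ordered

# On a Series whose dtype is not category, accessing .cat raises AttributeError.
plain = pd.Series(["D", "E", "E"])
try:
    plain.cat
    plain_has_cat = True
except AttributeError:
    plain_has_cat = False

print(same_class)
print(categories)
print(codes)
print(ordered)
print(plain_has_cat)
```

So to decide which methods are available, type() tells you the container (Series), and dtype tells you which accessor (.cat here) applies.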

cheeelgo

2 months ago

In reply to crichris (2024-09-09 00:43):

👍 Very professional.

microsat

2 months ago

In reply to crichris (2024-09-09 00:43):

Thank you, master. But could even the instructor of the course you recommend answer the following question?
Why, in the example above, is the type of neither x['A'] nor x['C'] pd.Categorical, but pd.Series for both?
Here is the code again; copy it and it runs as-is.
import pandas as pd
x = pd.DataFrame({"A": pd.Categorical(["A", "B", "B"], categories=["A", "B"]), "B": [1, 2, 3], "C": ["D", "E", "E"]}, index=["A", "B", "C"])
print(type(x['C']))
print(x["C"].dtype)
print(x['C'].__class__)
x['C'] = pd.Categorical(x['C'], categories=["D", "E"])
print(type(x['C']))
print(x["C"].dtype)
print(x['C'].__class__)
print(x["A"].dtype)
print(x['A'].__class__)
# Why is the type of neither x['A'] nor x['C'] pd.Categorical, but pd.Series for both?
# Python experts, is this a bug in Python's pandas library?
# I want to forcibly, permanently, and truly convert the type of x['A'] and x['C'] from pd.Series to pd.Categorical. How should I do that?

crichris

2 months ago

In reply to microsat (2024-09-09 15:22):

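Judging from the questions you ask and how you are using things, my main impression is that your understanding of Python is not very deep; taking that course would help.
Whether the course can answer this specific question, though, I'm not sure. It has a chapter dedicated to pandas, which I skipped.
As for your actual question, I'm not that fluent with pandas.
But you can try the array property of the Series instance. Try it and you'll find that x['C'].array is already a pandas.core.arrays.categorical.Categorical.
As for insisting on turning x['C'] itself into a pandas.core.arrays.categorical.Categorical, I'm not sure whether that is possible.
Even if you add a new column and assign a Categorical to it, what you get back is still a pd.Series:
------------------------------------------------------------------------------------------------------------------------------
DD = pd.Categorical([1, 2, 3])
print(type(DD))
x['D'] = DD
print(type(x['D']))
print(type(x['D'].array))
------------------------------------------------------------------------------------------------------------------------------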
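The .array suggestion above can be tried as follows. This is a rough sketch, assuming pandas 2.x, with the names column and backing chosen purely for illustration: the column you index out of the DataFrame is a Series wrapper, and the Categorical object is what sits inside it.

```python
import pandas as pd

x = pd.DataFrame({"C": ["D", "E", "E"]}, index=["A", "B", "C"])
x["C"] = pd.Categorical(x["C"], categories=["D", "E"])

column = x["C"]          # the Series wrapper returned by column indexing
backing = column.array   # the Categorical stored inside that Series

print(type(column).__name__)
print(type(backing).__name__)

# On the Categorical itself, .categories is a direct attribute.
print(list(backing.categories))
```

If a standalone Categorical is what the downstream code needs, taking it from .array (rather than trying to change what x['C'] returns) seems to be the practical route.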

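If you look at the definition of pd.DataFrame.__getitem__, then when your key is a single key, judging by this comment at least, it should always return a pd.Series.
If you're interested, you can read through how this class is written.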

------------------------------------------------------------------------------------------------------------------------------

if is_single_key:
    # What does looking for a single key in a non-unique index return?
    # The behavior is inconsistent. It returns a Series, except when
    # - the key itself is repeated (test on data.shape, #9519), or
    # - we have a MultiIndex on columns (test on self.columns, #21309)
    if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):
        # GH#26490 using data[key] can cause RecursionError
        return data._get_item_cache(key)

return data
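The behaviour that comment describes can be observed without reading the source. A hedged sketch, assuming pandas 2.x; the dictionary names and the small dup frame are invented for this example. Indexing with a single column label gives a Series whatever the column's dtype is, and the repeated-label case mentioned in the comment is the exception.

```python
import pandas as pd

x = pd.DataFrame(
    {
        "A": pd.Categorical(["A", "B", "B"], categories=["A", "B"]),
        "B": [1, 2, 3],
        "C": ["D", "E", "E"],
    },
    index=["A", "B", "C"],
)

# Every column comes back as a Series; the dtype is what differs per column.
column_types = {name: type(x[name]).__name__ for name in x.columns}
column_dtypes = {name: str(x[name].dtype) for name in x.columns}
print(column_types)
print(column_dtypes)

# Exception noted in the source comment: a repeated column label
# selects several columns, so the result is a DataFrame.
dup = pd.DataFrame([[1, 2]], columns=["k", "k"])
duplicate_type = type(dup["k"]).__name__
print(duplicate_type)
```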

Hope this helps

microsat

2 months ago

In reply to crichris (2024-09-09 18:22):

Thanks! This is a pandas bug.
Since x['C'] = pd.Categorical(x['C'], categories=["D", "E"]) was applied to a pd.Series,
its type should have changed. It did not.

crichris

2 months ago

In reply to microsat (2024-09-10 12:14):

I strongly recommend that you take a course.
Good luck!