Basic Usage of Series
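1. Definition
A Series is similar to a one-dimensional array: it consists of a sequence of data (similar to a one-dimensional NumPy array) and an associated set of labels (the index).
Creation
Series(one-dimensional iterable [, index=list of index labels [, dtype=data type]])
Note: the data can be a one-dimensional list, a one-dimensional NumPy array, or a dict (with a dict, the keys become the index). The data must be one-dimensional; otherwise an error is raised: Data must be 1-dimensional
Example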
import numpy as np
from pandas import Series

print(Series(range(3)))
print("#" * 30)
print(Series(range(3), index = ["first", "second", "third"]))
print("#" * 30)
print(Series(range(3), index = ["first", "second", "third"], dtype=int))
print("#" * 30)
print(Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int))
print("#" * 30)
print(Series({"first": 1, "second": 2, "third": 3}, dtype=int))
0 0
1 1
2 2
dtype: int64
##############################
first 0
second 1
third 2
dtype: int64
##############################
first 0
second 1
third 2
dtype: int32
##############################
first 0
second 1
third 2
dtype: int32
##############################
first 1
second 2
third 3
dtype: int32
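As a small illustration of the note on one-dimensional input, the sketch below builds a Series from a dict and then passes a two-dimensional NumPy array to the constructor. The variable names are made up for this example, and it assumes a reasonably recent pandas version, where the failure is reported as a ValueError.

```python
import numpy as np
import pandas as pd

# With a dict, the keys become the index labels.
from_dict = pd.Series({"first": 1, "second": 2, "third": 3})

# A 2-D NumPy array is rejected by the constructor.
try:
    pd.Series(np.zeros((2, 2)))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(list(from_dict.index))
print(error_message)
```

The exact wording of the message may differ between versions, but it should mention that the data must be 1-dimensional.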
Attributes
A Series object has the attributes dtype, index, values, and name
Series.index in turn has the attribute name
series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
series0.index = ["语文", "数学", "英语"]  # subject names: Chinese, Math, English
print(series0.dtype)
print("##############################")
print(series0.index)
print("##############################")
print(series0.values)

series0.name = "Series0"
series0.index.name = "idx"
print("##############################")
print(series0)
int32
##############################
Index(['语文', '数学', '英语'], dtype='object')
##############################
[0 1 2]
##############################
idx
语文 0
数学 1
英语 2
Name: Series0, dtype: int32
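The attributes listed above can also be read back programmatically. The following is a minimal sketch; the Series name "demo" and the explicit dtype="int64" are choices made for this example so that the result does not depend on the platform's default integer size.

```python
import pandas as pd

s = pd.Series(
    [0, 1, 2],
    index=["first", "second", "third"],
    dtype="int64",
    name="demo",
)
s.index.name = "idx"

labels = s.index.tolist()   # index labels as a plain Python list
values = s.to_numpy()       # underlying data as a NumPy array

print(s.dtype, labels, values, s.name, s.index.name)
```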
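Adding, Deleting, Querying, and Modifying Series Elements
Querying a Series
Basic lookup
Elements can be looked up by index label or by integer position. Recent pandas versions deprecate positional lookup through plain [] on a label-indexed Series; series0.iloc[1] is the explicit positional form.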
series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
print(series0[1])
print("#" * 30)
print(series0["first"])
1
##############################
0
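The two kinds of lookup can be made explicit with the .loc (label-based) and .iloc (position-based) accessors. This is a sketch of that alternative spelling, not the form used in the example above.

```python
import pandas as pd

s = pd.Series([0, 1, 2], index=["first", "second", "third"])

by_label = s.loc["first"]    # lookup by index label
by_position = s.iloc[1]      # lookup by integer position

print(by_label, by_position)
```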
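Slice queries
1. Slicing by index label: closed interval (both endpoints are included)
2. Slicing by position: half-open interval (start included, end excluded)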
series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
print(series0["second": "third"])
print("#" * 30)
print(series0[1:2])
second 1
third 2
dtype: int32
##############################
second 1
dtype: int32
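The difference between the two slicing rules is easy to check with .loc and .iloc, which make the intent explicit. A minimal sketch:

```python
import pandas as pd

s = pd.Series([0, 1, 2], index=["first", "second", "third"])

label_slice = s.loc["second":"third"]   # label slice: both endpoints included
position_slice = s.iloc[1:2]            # position slice: end excluded

print(list(label_slice.index))
print(list(position_slice.index))
```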
Conditional queries
series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
print(series0[series0 > 0])
second 1
third 2
dtype: int32
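Conditions can also be combined. The sketch below assumes the same three-element Series and uses & (and) and | (or) on boolean masks; each comparison needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

s = pd.Series([0, 1, 2], index=["first", "second", "third"])

# Keep values strictly between 0 and 2.
selected = s[(s > 0) & (s < 2)]

# Keep values equal to 0 or equal to 2.
either = s[(s == 0) | (s == 2)]

print(selected)
print(either)
```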
Adding
series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
series0["fourth"] = 3
print(series0)
first 0
second 1
third 2
fourth 3
dtype: int64
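To add several elements at once, one option is pd.concat, which returns a new Series and leaves the original untouched. A sketch, with the labels "fourth" and "fifth" chosen only for illustration:

```python
import pandas as pd

s = pd.Series([0, 1, 2], index=["first", "second", "third"])
extra = pd.Series([3, 4], index=["fourth", "fifth"])

# concat builds a new Series; s itself is not modified.
combined = pd.concat([s, extra])

print(combined)
```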
Deleting
Elements can only be deleted by index label; there is no direct way to delete by value
series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
series0 = series0.drop("third")
print(series0)
first 0
second 1
dtype: int32
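Although drop works on labels only, a boolean filter gives the effect of deleting by value. The sketch below shows that workaround next to dropping several labels in one call.

```python
import pandas as pd

s = pd.Series([0, 1, 2], index=["first", "second", "third"])

# "Delete by value": keep every element whose value is not 2.
without_value = s[s != 2]

# drop also accepts a list of labels.
without_labels = s.drop(["first", "third"])

print(without_value)
print(without_labels)
```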
Modifying
series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
series0["first"] = "first-modify"
print(series0)
series0[1] = "second-modify"
print(series0)
first first-modify
second 1
third 2
dtype: object
first first-modify
second second-modify
third 2
dtype: object
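In the example above, assigning strings into an integer Series changes its dtype to object. When the new values have the same type as the existing ones, the dtype is preserved. A minimal sketch using the explicit .loc and .iloc accessors (the values 10 and 20 are arbitrary):

```python
import pandas as pd

s = pd.Series([0, 1, 2], index=["first", "second", "third"], dtype="int64")

s.loc["first"] = 10   # modify by label
s.iloc[1] = 20        # modify by position

print(s)
```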
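Missing-value detection
1. When a Series is rebuilt with a new index, the existing label -> value pairs are unchanged, and labels that did not exist before get the value np.nan, displayed as NaN
2. Detecting missing values: pd.isnull(series) or series.isnull(); pd.notnull(series) or series.notnull()
3. Filtering out missing values: series[pd.notnull(series)]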
import pandas as pd

scores = Series({"Tom": 90, "Jim": 98, "Zera": 59})
print(scores)
print("#" * 30)
new_index = ["Joe", "Tom", "Jim", "Zera"]
scores = Series(scores, index=new_index)
print(scores)
print("#" * 30)
print(pd.isnull(scores))
print("#" * 30)
print(pd.notnull(scores))
print("#" * 30)
print("scores[pd.isnull(scores)] \n", scores[pd.isnull(scores)])
print("scores[scores.isnull()] \n", scores[scores.isnull()])
print("#" * 30)
print("scores[pd.notnull(scores)] \n", scores[pd.notnull(scores)])
print("scores[scores.notnull()] \n", scores[scores.notnull()])
Jim 98
Tom 90
Zera 59
dtype: int64
##############################
Joe NaN
Tom 90.0
Jim 98.0
Zera 59.0
dtype: float64
##############################
Joe True
Tom False
Jim False
Zera False
dtype: bool
##############################
Joe False
Tom True
Jim True
Zera True
dtype: bool
##############################
scores[pd.isnull(scores)]
Joe NaN
dtype: float64
scores[scores.isnull()]
Joe NaN
dtype: float64
##############################
scores[pd.notnull(scores)]
Tom 90.0
Jim 98.0
Zera 59.0
dtype: float64
scores[scores.notnull()]
Tom 90.0
Jim 98.0
Zera 59.0
dtype: float64
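The same steps can be written with dedicated methods: reindex for applying a new index, dropna for filtering out missing values, and fillna for replacing them. This is a sketch of that alternative; the fill value 0 is an arbitrary choice for illustration.

```python
import pandas as pd

scores = pd.Series({"Tom": 90, "Jim": 98, "Zera": 59})

# "Joe" is not in the original index, so it gets NaN.
reindexed = scores.reindex(["Joe", "Tom", "Jim", "Zera"])

dropped = reindexed.dropna()   # remove missing entries
filled = reindexed.fillna(0)   # replace missing entries with 0

print(reindexed)
print(dropped)
print(filled)
```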
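Automatic alignment in Series
When two Series whose indexes are not aligned (the labels are in a different order) are combined in an operation, they are aligned automatically: values with the same label are paired up. Labels that appear in only one of the two Series yield NaN.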
product_num = Series([1, 2, 3, 4], index=['p1', 'p2', 'p3', 'p4'])
product_price = Series([3, 2, 1, 6], index=['p3', 'p2', 'p5', 'p1'])
product_sum = product_num * product_price
print(product_sum)
p1 6.0
p2 4.0
p3 9.0
p4 NaN
p5 NaN
dtype: float64
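If NaN is not wanted for labels that appear on one side only, two common options are to substitute a default through the fill_value parameter of Series.mul, or to keep only the labels present in both Series. The sketch below reuses the data from the example; treating a missing quantity or price as 0 is an assumption made purely for illustration.

```python
import pandas as pd

product_num = pd.Series([1, 2, 3, 4], index=["p1", "p2", "p3", "p4"])
product_price = pd.Series([3, 2, 1, 6], index=["p3", "p2", "p5", "p1"])

# Treat a value missing on one side as 0 before multiplying.
total = product_num.mul(product_price, fill_value=0)

# Alternatively, keep only labels that exist in both Series.
common = (product_num * product_price).dropna()

print(total)
print(common)
```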
Operations do not change the label-to-value mapping
series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
print(series0/5)
print("#" * 30)
print(np.exp(series0))
first 0.0
second 0.2
third 0.4
dtype: float64
##############################
first 1.000000
second 2.718282
third 7.389056
dtype: float64