Basic usage of Series

1. Definition

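A Series is similar to a one-dimensional array. It consists of a sequence of values (similar to a one-dimensional NumPy array) and an associated sequence of labels (the index).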

Creating a Series

Series(one-dimensional iterable data [, index=list of labels [, dtype=data type]])

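Note: the data can be a one-dimensional list, a one-dimensional NumPy array, or a dict. For a dict, the keys become the index. The data must be one-dimensional. Otherwise pandas raises a ValueError whose message begins with "Data must be 1-dimensional".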

Examples

import numpy as np
from pandas import Series

print(Series(range(3)))  # default index: 0, 1, 2
print("#" * 30)
print(Series(range(3), index=["first", "second", "third"]))  # custom labels
print("#" * 30)
print(Series(range(3), index=["first", "second", "third"], dtype=int))  # explicit dtype
print("#" * 30)
print(Series(np.array(range(3)), index=["first", "second", "third"], dtype=int))  # from a NumPy array
print("#" * 30)
print(Series({"first": 1, "second": 2, "third": 3}, dtype=int))  # from a dict: keys become the index

Output:
0    0
1    1
2    2
dtype: int64
##############################
first     0
second    1
third     2
dtype: int64
##############################
first     0
second    1
third     2
dtype: int32
##############################
first     0
second    1
third     2
dtype: int32
##############################
first     1
second    2
third     3
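dtype: int32

(The int32 results come from the author's platform. On Windows with NumPy versions before 2.0, dtype=int means a 32-bit integer. On Linux and macOS, or with NumPy 2.x, the same code prints int64.)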
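You can check the note on dict input and the one-dimensional requirement directly. The sketch below assumes a recent pandas, and the variable names are illustrative. When a dict is combined with an explicit index, pandas looks up each label as a key. Labels that are missing from the dict become NaN:

```python
import numpy as np
import pandas as pd

# dict + explicit index: values are looked up by key; "fourth" is not a key,
# so it becomes NaN and the dtype changes to float64.
s = pd.Series({"first": 1, "second": 2, "third": 3}, index=["third", "first", "fourth"])
print(s)

# Two-dimensional input is rejected with a ValueError.
error_message = None
try:
    pd.Series(np.zeros((2, 2)))
except ValueError as err:
    error_message = str(err)
print(error_message)
```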

Attributes

A Series object has the attributes dtype, index, values and name.
Series.index in turn has its own name attribute.

series0 = Series(np.array(range(3)), index=["first", "second", "third"], dtype=int)
series0.index = ["语文", "数学", "英语"]  # replace the labels (Chinese, Math, English)
print(series0.dtype)
print("##############################")
print(series0.index)
print("##############################")
print(series0.values)  # the underlying NumPy array
series0.name = "Series0"  # name of the Series itself
series0.index.name = "idx"  # name of the index
print("##############################")
print(series0)

Output:
int32
##############################
Index(['语文', '数学', '英语'], dtype='object')
##############################
[0 1 2]
##############################
idx
语文    0
数学    1
英语    2
Name: Series0, dtype: int32
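You can also pass the name when you build the Series. For getting the data as a NumPy array, pandas now recommends to_numpy() over .values. A short sketch with illustrative English labels:

```python
import pandas as pd

# name can be passed to the constructor instead of being assigned afterwards.
s = pd.Series([0, 1, 2], index=["Chinese", "Math", "English"], name="Series0")
s.index.name = "idx"

print(s.name)        # Series0
print(s.index.name)  # idx
print(s.to_numpy())  # [0 1 2]
```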

Adding, deleting, querying and modifying Series entries

Querying a Series

Basic selection

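You can select an element by index label or by integer position. Plain [] accepts both. However, pandas 2.1 and later deprecate using [] with an integer position on a label index. Use .iloc for positions and .loc (or []) for labels.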

series0 = Series(np.array(range(3)), index=["first", "second", "third"], dtype=int)
print(series0.iloc[1])   # by integer position
print("#" * 30)
print(series0["first"])  # by index label (series0.loc["first"] is equivalent)

Output:
1
##############################
0
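Two related lookups are often handy. The in operator tests index labels, not values. Series.get returns a default instead of raising KeyError when a label is missing. A minimal sketch; the default -1 is an arbitrary choice:

```python
import pandas as pd

series0 = pd.Series([0, 1, 2], index=["first", "second", "third"])

has_first = "first" in series0       # True: "first" is an index label
has_zero = 0 in series0              # False: 0 is a value, not a label
missing = series0.get("fourth", -1)  # -1: the label does not exist

print(has_first, has_zero, missing)  # True False -1
```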

Slicing

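1. Slicing by label includes both endpoints (closed interval).

2. Slicing by position includes the start but excludes the end (half-open), as in ordinary Python slicing.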

series0 = Series(np.array(range(3)), index=["first", "second", "third"], dtype=int)
print(series0.loc["second":"third"])  # label slice: "third" is included
print("#" * 30)
print(series0.iloc[1:2])  # position slice: position 2 is excluded

Output:
second    1
third     2
dtype: int32
##############################
second    1
dtype: int32
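You can confirm the closed vs half-open rule by counting the results. This sketch uses .loc and .iloc, the explicit label-based and position-based indexers, instead of plain []. Negative positions count from the end, as with Python lists:

```python
import pandas as pd

series0 = pd.Series([0, 1, 2], index=["first", "second", "third"])

by_label = series0.loc["first":"second"]  # both endpoints included -> 2 entries
by_position = series0.iloc[0:1]           # end position excluded -> 1 entry
last_two = series0.iloc[-2:]              # negative positions count from the end

print(len(by_label), len(by_position))    # 2 1
print(list(last_two.index))               # ['second', 'third']
```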

Boolean (conditional) selection

series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
print(series0[series0 > 0])  # keep only entries whose value is greater than 0

Output:
second    1
third     2
dtype: int32
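You can combine several conditions with & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because & and | bind more tightly than > and <. For the "value is one of" case, use isin(). A sketch:

```python
import pandas as pd

series0 = pd.Series([0, 1, 2], index=["first", "second", "third"])

middle = series0[(series0 > 0) & (series0 < 2)]   # only "second"
outer = series0[(series0 == 0) | (series0 == 2)]  # "first" and "third"
not_zero = series0[~(series0 == 0)]               # "second" and "third"
in_set = series0[series0.isin([0, 2])]            # same entries as outer

print(middle)
print(outer)
```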

Adding

Assigning a value to a label that is not yet in the index appends a new entry:

series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
series0["fourth"] = 3
print(series0)
first     0
second    1
third     2
fourth    3
dtype: int64
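To add several entries at once, one option is to concatenate a second Series with pd.concat (Series.append was removed in pandas 2.0). Unlike the item assignment above, this returns a new Series and leaves series0 unchanged. A sketch with illustrative labels:

```python
import pandas as pd

series0 = pd.Series([0, 1, 2], index=["first", "second", "third"])

extra = pd.Series({"fourth": 3, "fifth": 4})
combined = pd.concat([series0, extra])  # new Series; series0 is not modified

print(combined)
```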

Deleting

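Entries are deleted by index label. You cannot drop an element directly by its value. drop() returns a new Series and leaves the original unchanged, so assign the result back.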

series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
series0 = series0.drop("third")
print(series0)
first     0
second    1
dtype: int32
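drop() also accepts a list of labels. Because values cannot be dropped directly, removing by value is usually done with a boolean mask that keeps everything else. A minimal sketch:

```python
import pandas as pd

series0 = pd.Series([0, 1, 2], index=["first", "second", "third"])

dropped = series0.drop(["first", "third"])  # several labels in one call
without_two = series0[series0 != 2]         # "delete" the value 2 by filtering

print(dropped)
print(without_two)
```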

Modifying

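series0 = Series(np.array(range(3)), index=["first", "second", "third"], dtype=int)
# Storing strings in an integer Series changes its dtype to object; pandas 2.1+
# warns when this happens implicitly, so convert to object explicitly first.
series0 = series0.astype(object)
series0["first"] = "first-modify"  # modify by label
print(series0)
series0.iloc[1] = "second-modify"  # modify by position
print(series0)

Output: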
first     first-modify
second               1
third                2
dtype: object
first      first-modify
second    second-modify
third                 2
dtype: object
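You can modify several entries in one assignment, either through a boolean mask or through .loc with a list of labels. The new values here are integers, so the Series keeps its integer dtype instead of becoming object. A sketch:

```python
import pandas as pd

series0 = pd.Series([0, 1, 2], index=["first", "second", "third"])

series0[series0 > 0] = 10                 # every value > 0 becomes 10
series0.loc[["first", "third"]] = [-1, 30]  # update several labels at once

print(series0)  # first -1, second 10, third 30
```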

Detecting missing values

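1. When a Series is rebuilt with a new index, the existing label -> value pairs are preserved. Labels that did not exist before get the value np.nan, which is displayed as NaN.

2. To detect missing values, use pd.isnull(series) or series.isnull(), and pd.notnull(series) or series.notnull().

3. To filter out missing values, use series[pd.notnull(series)].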

import pandas as pd
from pandas import Series

scores = Series({"Tom": 90, "Jim": 98, "Zera": 59})
print(scores)
print("#" * 30)
new_index = ["Joe", "Tom", "Jim", "Zera"]
scores = Series(scores, index=new_index)  # "Joe" is a new label, so its value is NaN
print(scores)
print("#" * 30)
print(pd.isnull(scores))
print("#" * 30)
print(pd.notnull(scores))
print("#" * 30)
print("scores[pd.isnull(scores)] \n", scores[pd.isnull(scores)])
print("scores[scores.isnull()] \n", scores[scores.isnull()])
print("#" * 30)
print("scores[pd.notnull(scores)] \n", scores[pd.notnull(scores)])
print("scores[scores.notnull()] \n", scores[scores.notnull()])

Output:
Tom     90
Jim     98
Zera    59
dtype: int64
##############################
Joe      NaN
Tom     90.0
Jim     98.0
Zera    59.0
dtype: float64
##############################
Joe      True
Tom     False
Jim     False
Zera    False
dtype: bool
##############################
Joe     False
Tom      True
Jim      True
Zera     True
dtype: bool
##############################
scores[pd.isnull(scores)] 
 Joe   NaN
dtype: float64
scores[scores.isnull()] 
 Joe   NaN
dtype: float64
##############################
scores[pd.notnull(scores)] 
 Tom     90.0
Jim     98.0
Zera    59.0
dtype: float64
scores[scores.notnull()] 
 Tom     90.0
Jim     98.0
Zera    59.0
dtype: float64
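pandas also provides method forms of these steps. reindex() does the same relabelling as Series(scores, index=new_index). dropna() gives the same result as scores[scores.notnull()]. fillna() replaces NaN instead of removing it. A sketch; the fill value 0 is an arbitrary choice:

```python
import pandas as pd

scores = pd.Series({"Tom": 90, "Jim": 98, "Zera": 59})
scores = scores.reindex(["Joe", "Tom", "Jim", "Zera"])  # "Joe" -> NaN

missing_count = scores.isnull().sum()  # 1
kept = scores.dropna()                 # Tom, Jim, Zera
filled = scores.fillna(0)              # Joe becomes 0.0

print(missing_count)
print(kept)
print(filled)
```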

Automatic alignment of Series

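Two Series may have unaligned indexes: the labels appear in a different order, or they only partly overlap. When you combine such Series in an arithmetic operation, pandas aligns them automatically. Values with the same label are combined. A label present in only one operand produces NaN in the result, which is why the dtype below becomes float64.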

product_num = Series([1, 2, 3, 4], index=['p1', 'p2', 'p3', 'p4'])
product_price = Series([3, 2, 1, 6], index=['p3', 'p2', 'p5', 'p1'])
product_sum = product_num * product_price
print(product_sum)

Output:
p1    6.0
p2    4.0
p3    9.0
p4    NaN
p5    NaN
dtype: float64
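Sometimes a missing label should count as a neutral value rather than produce NaN. For that, the method form mul() accepts a fill_value that stands in for the absent side. In this sketch, fill_value=0 is an assumption chosen for illustration. A label missing from both operands would still give NaN:

```python
import pandas as pd

product_num = pd.Series([1, 2, 3, 4], index=["p1", "p2", "p3", "p4"])
product_price = pd.Series([3, 2, 1, 6], index=["p3", "p2", "p5", "p1"])

# mul() is the method form of *; fill_value replaces a value missing on one side.
product_sum = product_num.mul(product_price, fill_value=0)
print(product_sum)  # p4 and p5 become 0.0 instead of NaN
```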

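Operations do not change the label -> value mapping. Arithmetic with a scalar and NumPy functions such as np.exp are applied element-wise, and each result stays attached to its original label.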

series0 = Series(np.array(range(3)), index = ["first", "second", "third"], dtype=int)
print(series0 / 5)
print("#" * 30)
print(np.exp(series0))  # NumPy ufuncs also keep the index

Output:
first     0.0
second    0.2
third     0.4
dtype: float64
##############################
first     1.000000
second    2.718282
third     7.389056
dtype: float64
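The same holds for element-wise functions applied with map() or other NumPy ufuncs. The result has the same index, and each label maps to the transformed value of its original entry. A short sketch:

```python
import numpy as np
import pandas as pd

series0 = pd.Series([0, 1, 2], index=["first", "second", "third"])

squared = series0.map(lambda v: v ** 2)  # 0, 1, 4 under the same labels
roots = np.sqrt(series0)                 # a ufunc on a Series returns a Series

print(squared)
print(roots)
```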