Data Analysis
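Numpy
What is Numpy?
- Numpy stands for Numerical Python: a Python toolkit for numerical analysis.
- Numpy is an open-source scientific computing library.
- Numpy compensates for the weak and slow numerical computation of Python as a general-purpose language.
- Numpy offers a rich set of mathematical functions, powerful multidimensional arrays, and excellent computing performance.
- Numpy works well alongside other scientific computing libraries such as Scipy, scikit, and matplotlib.
- Numpy can stand in for tools such as MATLAB, allowing rapid development together with interactive prototyping.
Code: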
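    import datetime as dt
    import numpy as np

    n = 100000

    # Pure-Python version: build the lists with explicit loops
    start = dt.datetime.now()
    A, B = [], []
    for i in range(n):
        A.append(i ** 2)
        B.append(i ** 3)
    C = []
    for a, b in zip(A, B):
        C.append(a + b)
    # total_seconds() covers the whole interval; .microseconds alone drops whole seconds
    print((dt.datetime.now() - start).total_seconds())

    # Numpy version: the same sums as vectorized array operations
    # int64 avoids overflow of i ** 3 where the default integer is 32 bits wide
    start = dt.datetime.now()
    C = np.arange(n, dtype=np.int64) ** 2 + np.arange(n, dtype=np.int64) ** 3
    print((dt.datetime.now() - start).total_seconds())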
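Multidimensional arrays
- A multidimensional array in numpy is an object of the class numpy.ndarray. It can represent an array with any number of dimensions.
- Creating multidimensional array objects:
  numpy.arange(start, stop, step) -> one-dimensional array. The first element is the start value, the last element is the last value before stop, and step is the common difference between consecutive elements. start defaults to 0 and step defaults to 1.
  numpy.array(any container that can be interpreted as an array)
- The memory is contiguous and the elements are homogeneous.
- The ndarray.dtype attribute gives the data type of the elements. The dtype parameter sets the element type at creation, and the astype() method returns a copy converted to another type.
- The ndarray.shape attribute gives the dimensions of the array:
  (size of the highest dimension, …, size of the lowest dimension)
Code: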
    import numpy as np

    a = np.arange(10)
    print(a)
    b = np.arange(1, 10)
    print(b)
    c = np.arange(1, 10, 2)
    print(c)
    d = np.array([])
    print(d)
    e = np.array([10, 20, 30, 40, 50])
    print(e)
    f = np.array([
        [1, 2, 3],
        [4, 5, 6]])
    print(f)
    print(type(f))
    print(type(f[0][0]))
    print(f.dtype)
    g = np.array(['1', '2', '3'], dtype=np.int32)
    print(type(g[0]))
    print(g.dtype)
    h = g.astype(np.str_)
    print(type(h[0]))
    print(h.dtype)
    print(e.shape)
    print(f.shape)
    i = np.array([
        [np.arange(1, 5), np.arange(5, 9), np.arange(9, 13)],
        [np.arange(13, 17), np.arange(17, 21), np.arange(21, 25)]])
    print(i.shape)
    print(i)
Element indexing (indices start at 0)
- array[index]
- array[row index][column index]
- array[page index][row index][column index]
- array[page index, row index, column index]
Code:
    import numpy as np

    a = np.array([[[1, 2], [3, 4]],
                  [[5, 6], [7, 8]]])
    print(a)
    print(a[0])
    print(a[0][0])
    print(a[0][0][0])
    # Chained indexing and tuple indexing reach the same element
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            for k in range(a.shape[2]):
                print(a[i][j][k], a[i, j, k])
    # int -> the platform default integer (np.int64 on most systems, np.int32 on some)
    b = np.array([1, 2, 3], dtype=int)
    print(b.dtype)
    c = b.astype(float)  # float -> np.float64
    print(c.dtype)
    d = c.astype(str)  # str -> np.str_
    print(d.dtype)
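Numpy built-in types and custom types
Numpy built-in types:
- bool_: 1-byte boolean, True (1) / False (0)
- int8: 1-byte signed integer, -128 to 127
- int16: 2-byte signed integer
- int32: 4-byte signed integer
- int64: 8-byte signed integer
- uint8: 1-byte unsigned integer, 0 to 255
- uint16: 2-byte unsigned integer
- uint32: 4-byte unsigned integer
- uint64: 8-byte unsigned integer
- float16: 2-byte floating point
- float32: 4-byte floating point
- float64: 8-byte floating point
- complex64: 8-byte complex
- complex128: 16-byte complex
- str_: string
Custom types: dtype can combine several numpy built-in types, identical or different, into a compound type that serves as the element type of an array.
Besides the full names of the built-in types, type-code strings offer a shorter way to specify a type.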
- numpy.int8 -> i1
- numpy.int16 -> i2
- numpy.uint32 -> u4
- numpy.float64 -> f8
- numpy.complex128 -> c16
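Multi-byte integers can carry a byte-order prefix:
- <: little-endian, the low-order byte sits at the low address. For 0x1234:
      low address    high address
      0x34           0x12
- =: the native byte order of the processor
- >: big-endian, the low-order byte sits at the high address. For 0x1234:
      low address    high address
      0x12           0x34
Other type codes:
- numpy.str_ -> U followed by the number of characters
- numpy.bool_ -> b1 (or ?); a bare b means int8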
Code:
    import numpy as np

    # Comma-separated type codes; the fields get the default names f0 and f1
    a = np.array([('ABC', [1, 2, 3])], dtype='U3, 3i4')
    print(a)
    print(a[0]['f0'])
    print(a[0]['f1'][0])
    print(a[0]['f1'][1])
    print(a[0]['f1'][2])
    # List of (name, type, size) tuples
    b = np.array([('ABC', [1, 2, 3])], dtype=[
        ('name', np.str_, 3),
        ('scores', np.int32, 3)])
    print(b)
    print(b[0]['name'])
    print(b[0]['scores'][0])
    print(b[0]['scores'][1])
    print(b[0]['scores'][2])
    # Dictionary with parallel lists of names and formats
    c = np.array([('ABC', [1, 2, 3])], dtype={
        'names': ['name', 'scores'],
        'formats': ['U3', '3i4']})
    print(c)
    print(c[0]['name'])
    print(c[0]['scores'][0])
    print(c[0]['scores'][1])
    print(c[0]['scores'][2])
    # Dictionary mapping each field name to (format, byte offset)
    d = np.array([('ABC', [1, 2, 3])], dtype={
        'name': ('U3', 0),
        'scores': ('3i4', 12)})
    print(d)
    print(d[0]['name'])
    print(d[0]['scores'][0])
    print(d[0]['scores'][1])
    print(d[0]['scores'][2])
    # A big-endian 2-byte integer that can also be read byte by byte
    e = np.array([0x1234], dtype=(
        '>u2', {'lo': ('u1', 0), 'hi': ('u1', 1)}))
    print('{:x}'.format(e[0]))
    print('{:x} {:x}'.format(e['lo'][0], e['hi'][0]))
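The byte-order prefixes described above can be checked by looking at the raw bytes of an array. The following is a minimal sketch, not part of the original notes; the variable names are illustrative, and tobytes() is used only to expose the memory contents from the lowest address upward.

```python
import numpy as np

value = 0x1234

# The same 16-bit value stored with an explicit byte order
little = np.array([value], dtype='<u2')
big = np.array([value], dtype='>u2')

# tobytes() returns the raw memory, lowest address first
little_bytes = little.tobytes()
big_bytes = big.tobytes()
print(little_bytes.hex(), big_bytes.hex())  # 3412 1234

# Numerically the two arrays are equal; only the storage layout differs
print(little[0] == big[0])  # True

# Type codes are shorthand for the full type names
print(np.dtype('i1') == np.int8, np.dtype('f8') == np.float64)  # True True
```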
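Slicing
- array[start:stop:step, start:stop:step, …]
- Default start: the first element (positive step) or the last element (negative step)
- Default stop: one past the last element (positive step) or one before the first element (negative step)
- Default step: 1
- When one or more consecutive dimensions at either end use the default slice, they can be written as "…".
Code: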
    import numpy as np

    a = np.arange(1, 10)
    print(a)
    print(a[:3])  # 1 2 3
    print(a[3:6])  # 4 5 6
    print(a[6:])  # 7 8 9
    print(a[::-1])  # 9 8 7 6 5 4 3 2 1
    print(a[:-4:-1])  # 9 8 7
    print(a[-4:-7:-1])  # 6 5 4
    print(a[-7::-1])  # 3 2 1
    print(a[::])  # 1 2 3 4 5 6 7 8 9
    print(a[...])  # 1 2 3 4 5 6 7 8 9
    print(a[:])  # 1 2 3 4 5 6 7 8 9
    # print(a[])  # error
    print(a[::3])  # 1 4 7
    print(a[1::3])  # 2 5 8
    print(a[2::3])  # 3 6 9
    b = np.arange(1, 25).reshape(2, 3, 4)
    print(b)
    print(b[:, 0, 0])  # 1 13
    print(b[0, :, :])
    print(b[0, ...])
    print(b[0, 1, ::2])  # 5 7
    print(b[..., 1])
    print(b[:, 1])
    print(b[-1, 1:, 2:])
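Changing dimensions
- View reshaping: obtain a view of the same array object with different dimensions
  array.reshape(new shape) -> a view of the array with the new shape
  array.ravel() -> a one-dimensional view of the array
- Copy reshaping: obtain a copy of the array object with different dimensions
  array.flatten() -> a one-dimensional copy of the array
- In-place reshaping
  array.shape = (new shape)
  array.resize(new shape)
- View transposition
  array.transpose() -> a transposed view of the array
  array.T: the transposed-view attribute
  Transposition only has an effect on arrays with at least two dimensions.
Code: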
    import numpy as np

    a = np.arange(1, 9)
    print(a)
    b = a.reshape(2, 4)
    print(b)
    c = b.reshape(2, 2, 2)
    print(c)
    d = c.ravel()
    print(d)
    e = c.flatten()
    print(e)
    f = b.reshape(2, 2, 2).copy()
    print(f)
    # b, c and d are views of a and follow the change; e and f are copies and do not
    a += 10
    print(a, b, c, d, e, f, sep='\n')
    a.shape = (2, 2, 2)
    print(a)
    a.resize(2, 4)
    print(a)
    # g = a.transpose()
    # g = a.reshape(4, 2)
    g = a.T
    print(g)
    # print(np.array([e]).T)
    print(e.reshape(-1, 1))
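Whether a reshaped array is a view or a copy can be checked directly. The sketch below is an added illustration that assumes np.shares_memory as the test; the variable names are hypothetical.

```python
import numpy as np

a = np.arange(1, 9)
view = a.reshape(2, 4)       # a view, because a is contiguous
flat_view = view.ravel()     # also a view here
flat_copy = view.flatten()   # always a copy

print(np.shares_memory(a, view))       # True
print(np.shares_memory(a, flat_view))  # True
print(np.shares_memory(a, flat_copy))  # False

# A write through the original shows up in the views only
a[0] = 100
print(view[0, 0], flat_view[0], flat_copy[0])  # 100 100 1
```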
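Combining and splitting
- Vertical combining/splitting
  numpy.vstack((top, bottom))
  numpy.vsplit(array, number of parts) -> list of sub-arrays
- Horizontal combining/splitting
  numpy.hstack((left, right))
  numpy.hsplit(array, number of parts) -> list of sub-arrays
- Depth combining/splitting
  numpy.dstack((front, back))
  numpy.dsplit(array, number of parts) -> list of sub-arrays
- Row/column combining
  numpy.row_stack((top, bottom)), an alias of numpy.vstack that is deprecated in NumPy 2.0
  numpy.column_stack((left, right))
Code: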
    import numpy as np

    line = "\n---------------------\n"
    a = np.arange(11, 20).reshape(3, 3)
    b = np.arange(21, 30).reshape(3, 3)
    print(a, b, sep='\n', end=line)
    c = np.vstack((a, b))
    print("vstack:", c, end=line)
    a, b = np.vsplit(c, 2)
    print("vsplit:", a, b, sep='\n', end=line)
    c = np.hstack((a, b))
    print("hstack:", c, end=line)
    a, b = np.hsplit(c, 2)
    print("hsplit:", a, b, sep='\n', end=line)
    c = np.dstack((a, b))
    print("dstack:", c, end=line)
    a, b = np.dsplit(c, 2)
    print("dsplit:", a.T[0].T, b.T[0].T, sep='\n', end=line)
    a = a.ravel()
    b = b.ravel()
    print("ravel:", a, b, sep='\n', end=line)
    # np.row_stack((a, b)) gives the same result but is deprecated in NumPy 2.0
    c = np.vstack((a, b))
    print("row_stack:", c, end=line)
    # c = np.column_stack((a, b))
    # c = np.hstack((a, b))
    c = np.c_[a, b]
    print("c_:", c, end=line)
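The three stacking directions can also be expressed through an explicit axis argument. The sketch below is an addition rather than part of the original notes: np.concatenate and np.stack are swapped in to show which axis each of vstack, hstack, and dstack works along for two-dimensional inputs.

```python
import numpy as np

a = np.arange(11, 20).reshape(3, 3)
b = np.arange(21, 30).reshape(3, 3)

# Vertical: join along axis 0 (rows)
v = np.concatenate((a, b), axis=0)
# Horizontal: join along axis 1 (columns)
h = np.concatenate((a, b), axis=1)
# Depth: for 2-D inputs, place the arrays along a new third axis
d = np.stack((a, b), axis=2)

print(np.array_equal(v, np.vstack((a, b))))  # True
print(np.array_equal(h, np.hstack((a, b))))  # True
print(np.array_equal(d, np.dstack((a, b))))  # True
print(v.shape, h.shape, d.shape)  # (6, 3) (3, 6) (3, 3, 2)
```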
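Attributes of the ndarray class
- dtype: element type
- shape: array dimensions
- T: transposed view
- ndim: number of dimensions
- size: number of elements; equal to len() only for one-dimensional arrays
- itemsize: bytes per element
- nbytes: total bytes = size x itemsize
- flat: flat iterator
- real: array of the real parts
- imag: array of the imaginary parts
- array.tolist() -> list object
Code: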
    import numpy as np

    a = np.array([
        [1 + 1j, 2 + 4j, 3 + 7j],
        [4 + 2j, 5 + 5j, 6 + 8j],
        [7 + 3j, 8 + 6j, 9 + 9j]])
    print("dtype:", a.dtype, a.dtype.str, a.dtype.char)
    print("shape:", a.shape)
    print("ndim:", a.ndim)
    print("size,len:", a.size, len(a))
    print("itemsize:", a.itemsize)
    print("nbytes:", a.nbytes)
    print("T:", a.T)
    print("real:", a.real, a.imag, sep='\n')
    for elem in a.flat:
        print(elem)
    print(a.flat[[1, 3, 5]])
    a.flat[[2, 4, 6]] = 0
    print(a)


    def fun(a, b):
        a.append(b)
        return a


    x = np.array([10, 20, 30])
    y = 40
    # Arrays have no append() method, so go through a list
    x = np.array(fun(x.tolist(), y))
    print("tolist:", x)
    x = np.append(x, 50)
    print(x)
Data visualization with matplotlib
Default style
    import numpy as np
    import matplotlib.pyplot as mp

    # Horizontal coordinates of the points on the curves
    x = np.linspace(-np.pi, np.pi, 1000)
    cos_y = np.cos(x) / 2
    sin_y = np.sin(x)
    h_y = x / 2
    # Connect the points on each curve with straight line segments
    mp.plot(x, cos_y)
    mp.plot(x, sin_y)
    mp.plot(x, h_y)
    # Show the figure
    mp.show()
Setting line style, line width, and color
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    # Horizontal coordinates of the points on the curves
    x = np.linspace(-np.pi, np.pi, 1000)
    # Vertical coordinates computed from the curve functions
    cos_y = np.cos(x) / 2
    sin_y = np.sin(x)
    # Connect the points on each curve with straight line segments
    mp.plot(x, cos_y, linestyle='-', linewidth=1, color='dodgerblue')
    mp.plot(x, sin_y, linestyle='-', linewidth=1, color='orangered')
    # Show the figure
    mp.show()

Setting the axis ranges
- Horizontal axis range: mp.xlim(minimum, maximum)
- Vertical axis range: mp.ylim(minimum, maximum)
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    # Horizontal coordinates of the points on the curves
    x = np.linspace(-np.pi, np.pi, 2000)
    # Vertical coordinates computed from the curve functions
    cos_y = np.cos(x) / 2
    sin_y = np.sin(x)
    # Set the axis ranges, leaving a 10% margin
    mp.xlim(x.min() * 1.1, x.max() * 1.1)
    mp.ylim(min(cos_y.min(), sin_y.min()) * 1.1,
            max(cos_y.max(), sin_y.max()) * 1.1)
    # Connect the points on each curve with straight line segments
    mp.plot(x, cos_y, linestyle='-', linewidth=1, color='dodgerblue')
    mp.plot(x, sin_y, linestyle='-', linewidth=1, color='orangered')
    # Show the figure
    mp.show()
Setting the axis tick labels
- mp.xticks(tick positions, tick label texts)
- mp.yticks(tick positions, tick label texts)
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    # Horizontal coordinates of the points on the curves
    x = np.linspace(-np.pi, np.pi, 2000)
    # Vertical coordinates computed from the curve functions
    cos_y = np.cos(x) / 2
    sin_y = np.sin(x)
    # Set the axis ranges
    mp.xlim(x.min() * 1.1, x.max() * 1.1)
    mp.ylim(min(cos_y.min(), sin_y.min()) * 1.1,
            max(cos_y.max(), sin_y.max()) * 1.1)
    # Six positions for six labels; the position 0 was missing from the list
    mp.xticks(
        [-np.pi, -np.pi / 2, 0, np.pi / 2, np.pi * 3 / 4, np.pi],
        [r'$-\pi$', r'$-\frac{\pi}{2}$', r'$0$',
         r'$\frac{\pi}{2}$', r'$\frac{3\pi}{4}$', r'$\pi$'])
    mp.yticks([-1, -0.5, 0.5, 1])
    # Connect the points on each curve with straight line segments
    mp.plot(x, cos_y, linestyle='-', linewidth=1, color='dodgerblue')
    mp.plot(x, sin_y, linestyle='-', linewidth=1, color='orangered')
    # Show the figure
    mp.show()
Turning the box-style axes into cross-style axes
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    # Horizontal coordinates of the points on the curves
    x = np.linspace(-np.pi, np.pi, 1000)
    # Vertical coordinates computed from the curve functions
    cos_y = np.cos(x) / 2
    sin_y = np.sin(x)
    # Set the axis ranges
    mp.xlim(x.min() * 1.1, x.max() * 1.1)
    mp.ylim(min(cos_y.min(), sin_y.min()) * 1.1,
            max(cos_y.max(), sin_y.max()) * 1.1)
    # Set the axis tick labels
    mp.xticks(
        [-np.pi, -np.pi / 2, 0, np.pi / 2, np.pi * 3 / 4, np.pi],
        [r'$-\pi$', r'$-\frac{\pi}{2}$', r'$0$',
         r'$\frac{\pi}{2}$', r'$\frac{3\pi}{4}$', r'$\pi$'])
    mp.yticks([-1, -0.5, 0.5, 1])
    # Turn the box-style axes into cross-style axes
    # Get the current axes object
    ax = mp.gca()
    # Put the vertical ticks on the left spine
    ax.yaxis.set_ticks_position('left')
    # Move the left spine to the origin of the data coordinates
    ax.spines['left'].set_position(('data', 0))
    # Put the horizontal ticks on the bottom spine
    ax.xaxis.set_ticks_position('bottom')
    # Move the bottom spine to the origin of the data coordinates
    ax.spines['bottom'].set_position(('data', 0))
    # Make the right and top spines colorless
    ax.spines['right'].set_color('none')
    ax.spines['top'].set_color('none')
    # Connect the points on each curve with straight line segments
    mp.plot(x, cos_y, linestyle='-', linewidth=1, color='dodgerblue')
    mp.plot(x, sin_y, linestyle='-', linewidth=1, color='orangered')
    # Show the figure
    mp.show()
Showing a legend
- mp.plot(…, label=legend text)
- mp.legend(loc=legend position)
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    # Horizontal coordinates of the points on the curves
    x = np.linspace(-np.pi, np.pi, 1000)
    # Vertical coordinates computed from the curve functions
    cos_y = np.cos(x) / 2
    sin_y = np.sin(x)
    # Set the axis ranges
    mp.xlim(x.min() * 1.1, x.max() * 1.1)
    mp.ylim(min(cos_y.min(), sin_y.min()) * 1.1,
            max(cos_y.max(), sin_y.max()) * 1.1)
    # Set the axis tick labels
    mp.xticks(
        [-np.pi, -np.pi / 2, 0, np.pi / 2, np.pi * 3 / 4, np.pi],
        [r'$-\pi$', r'$-\frac{\pi}{2}$', r'$0$',
         r'$\frac{\pi}{2}$', r'$\frac{3\pi}{4}$', r'$\pi$'])
    mp.yticks([-1, -0.5, 0.5, 1])
    # Turn the box-style axes into cross-style axes
    ax = mp.gca()
    ax.yaxis.set_ticks_position('left')
    ax.spines['left'].set_position(('data', 0))
    ax.xaxis.set_ticks_position('bottom')
    ax.spines['bottom'].set_position(('data', 0))
    ax.spines['right'].set_color('none')
    ax.spines['top'].set_color('none')
    # Plot the curves; label supplies the legend text
    mp.plot(x, cos_y, linestyle='-', linewidth=1,
            color='dodgerblue', label=r'$y=\frac{1}{2}cos(x)$')
    mp.plot(x, sin_y, linestyle='-', linewidth=1,
            color='orangered', label=r'$y=sin(x)$')
    # Show the legend
    mp.legend(loc='upper left')
    # Show the figure
    mp.show()
Adding special points
- mp.scatter(array of horizontal coordinates, array of vertical coordinates, …)
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    # Horizontal coordinates of the points on the curves
    x = np.linspace(-np.pi, np.pi, 1000)
    # Vertical coordinates computed from the curve functions
    cos_y = np.cos(x) / 2
    sin_y = np.sin(x)
    # Coordinates of the special points
    xo = np.pi * 3 / 4
    yo_cos = np.cos(xo) / 2
    yo_sin = np.sin(xo)
    # Set the axis ranges
    mp.xlim(x.min() * 1.1, x.max() * 1.1)
    mp.ylim(min(cos_y.min(), sin_y.min()) * 1.1,
            max(cos_y.max(), sin_y.max()) * 1.1)
    # Set the axis tick labels
    mp.xticks(
        [-np.pi, -np.pi / 2, 0, np.pi / 2, np.pi * 3 / 4, np.pi],
        [r'$-\pi$', r'$-\frac{\pi}{2}$', r'$0$',
         r'$\frac{\pi}{2}$', r'$\frac{3\pi}{4}$', r'$\pi$'])
    mp.yticks([-1, -0.5, 0.5, 1])
    # Turn the box-style axes into cross-style axes
    ax = mp.gca()
    ax.yaxis.set_ticks_position('left')
    ax.spines['left'].set_position(('data', 0))
    ax.xaxis.set_ticks_position('bottom')
    ax.spines['bottom'].set_position(('data', 0))
    ax.spines['right'].set_color('none')
    ax.spines['top'].set_color('none')
    # Plot the curves
    mp.plot(x, cos_y, linestyle='-', linewidth=1,
            color='dodgerblue', label=r'$y=\frac{1}{2}cos(x)$')
    mp.plot(x, sin_y, linestyle='-', linewidth=1,
            color='orangered', label=r'$y=sin(x)$')
    # Draw the special points and the dashed line joining them
    mp.plot([xo, xo], [yo_cos, yo_sin], linestyle='--',
            linewidth=1, color='limegreen')
    mp.scatter([xo, xo], [yo_cos, yo_sin], s=60,
               edgecolor='limegreen', facecolor='white', zorder=3)
    mp.legend(loc='upper left')
    # Show the figure
    mp.show()

Adding annotations
mp.annotate(
    annotation text,
    xy=target position,
    xytext=text position,
    textcoords=coordinate system of the text position,
    fontsize=font size,
    arrowprops=arrow properties)
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    # Horizontal coordinates of the points on the curves
    x = np.linspace(-np.pi, np.pi, 1000)
    # Vertical coordinates computed from the curve functions
    cos_y = np.cos(x) / 2
    sin_y = np.sin(x)
    # Coordinates of the special points
    xo = np.pi * 3 / 4
    yo_cos = np.cos(xo) / 2
    yo_sin = np.sin(xo)
    # Set the axis ranges
    mp.xlim(x.min() * 1.1, x.max() * 1.1)
    mp.ylim(min(cos_y.min(), sin_y.min()) * 1.1,
            max(cos_y.max(), sin_y.max()) * 1.1)
    # Set the axis tick labels
    mp.xticks(
        [-np.pi, -np.pi / 2, 0, np.pi / 2, np.pi * 3 / 4, np.pi],
        [r'$-\pi$', r'$-\frac{\pi}{2}$', r'$0$',
         r'$\frac{\pi}{2}$', r'$\frac{3\pi}{4}$', r'$\pi$'])
    mp.yticks([-1, -0.5, 0.5, 1])
    # Turn the box-style axes into cross-style axes
    ax = mp.gca()
    ax.yaxis.set_ticks_position('left')
    ax.spines['left'].set_position(('data', 0))
    ax.xaxis.set_ticks_position('bottom')
    ax.spines['bottom'].set_position(('data', 0))
    ax.spines['right'].set_color('none')
    ax.spines['top'].set_color('none')
    # Plot the curves
    mp.plot(x, cos_y, linestyle='-', linewidth=1,
            color='dodgerblue', label=r'$y=\frac{1}{2}cos(x)$')
    mp.plot(x, sin_y, linestyle='-', linewidth=1,
            color='orangered', label=r'$y=sin(x)$')
    # Draw the special points
    mp.plot([xo, xo], [yo_cos, yo_sin], linestyle='--',
            linewidth=1, color='limegreen')
    mp.scatter([xo, xo], [yo_cos, yo_sin], s=60,
               edgecolor='limegreen', facecolor='white', zorder=3)
    # Add the annotations; xytext is an offset in points from xy
    mp.annotate(
        r'$\frac{1}{2}cos(\frac{3\pi}{4})=-\frac{\sqrt{2}}{4}$',
        xy=(xo, yo_cos), xycoords='data',
        xytext=(-90, -40), textcoords='offset points',
        fontsize=14,
        arrowprops=dict(arrowstyle='->',
                        connectionstyle='arc3, rad=0.2'))
    mp.annotate(
        r'$sin(\frac{3\pi}{4})=\frac{\sqrt{2}}{2}$',
        xy=(xo, yo_sin), xycoords='data',
        xytext=(20, 20), textcoords='offset points',
        fontsize=14,
        arrowprops=dict(arrowstyle='->',
                        connectionstyle='arc3, rad=0.2'))
    # Show the legend
    mp.legend(loc='upper left')
    # Show the figure
    mp.show()
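Figure objects
Description: a figure object can be regarded as a window that displays a plot. Besides the figure window that is created by default, windows can be created by hand through function calls and given special properties.
Functions:
- mp.figure(object name (title text), figsize=window size, dpi=resolution, facecolor=window color)
- mp.title(title text, fontsize=font size)
- mp.xlabel(horizontal axis label text, fontsize=font size)
- mp.ylabel(vertical axis label text, fontsize=font size)
- mp.tick_params(labelsize=font size of the tick labels)
- mp.grid(linestyle=grid line style)
- Note: if the object name passed to figure() does not exist yet, a new figure window is created and made the current window. If the object name already exists, no new window is created and the corresponding window simply becomes the current window. All plotting calls made after figure() draw into the current window.
Code: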
    import numpy as np
    import matplotlib.pyplot as mp

    x = np.linspace(-np.pi, np.pi, 1000)
    cos_y = np.cos(x) / 2
    sin_y = np.sin(x)
    # Create the first window and set its size, resolution, and color
    mp.figure('Figure Object 1', figsize=(8, 6), dpi=60,
              facecolor='lightgray')
    mp.title('Figure Object 1', fontsize=20)  # title
    mp.xlabel('x', fontsize=14)  # horizontal axis label
    mp.ylabel('y', fontsize=14)  # vertical axis label
    mp.tick_params(labelsize=10)  # font size of the tick labels
    mp.grid(linestyle=':')  # grid line style
    # Create the second window
    mp.figure('Figure Object 2', figsize=(8, 6), dpi=60,
              facecolor='lightgray')
    mp.title('Figure Object 2', fontsize=20)
    mp.xlabel('x', fontsize=14)
    mp.ylabel('y', fontsize=14)
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=':')
    # An existing name switches the current window instead of creating one
    mp.figure('Figure Object 1')
    mp.plot(x, cos_y, color='dodgerblue',
            label=r'$y=\frac{1}{2}cos(x)$')
    mp.figure('Figure Object 2')
    mp.plot(x, sin_y, color='orangered',
            label=r'$y=sin(x)$')
    mp.legend()
    mp.figure('Figure Object 1')
    mp.legend()
    mp.show()

Subplots
- mp.subplot(number of rows, number of columns, plot number)
Code:
Matrix layout with mp.subplot:
    import matplotlib.pyplot as mp

    mp.figure(facecolor='lightgray')
    # Draw the four cells of a 2 x 2 layout, numbered row by row
    for number in range(1, 5):
        mp.subplot(2, 2, number)
        mp.xticks(())
        mp.yticks(())
        mp.text(0.5, 0.5, str(number), ha='center', va='center',
                size=36, alpha=0.5)
    mp.tight_layout()
    mp.show()

Grid layout with GridSpec:
    import matplotlib.pyplot as mp
    import matplotlib.gridspec as mg

    mp.figure(facecolor='lightgray')
    gs = mg.GridSpec(3, 3)
    # Each subplot covers a slice of the 3 x 3 grid
    cells = [gs[0, :2], gs[1:, 0], gs[2, 1:], gs[:2, 2], gs[1, 1]]
    for number, cell in enumerate(cells, start=1):
        mp.subplot(cell)
        mp.xticks(())
        mp.yticks(())
        mp.text(0.5, 0.5, str(number), ha='center', va='center',
                size=36, alpha=0.5)
    mp.tight_layout()
    mp.show()

Free layout with mp.axes:
    import matplotlib.pyplot as mp

    mp.figure(facecolor='lightgray')
    # [left, bottom, width, height] as fractions of the figure size
    mp.axes([0.03, 0.038, 0.94, 0.924])
    mp.xticks(())
    mp.yticks(())
    mp.text(0.5, 0.5, '1', ha='center', va='center',
            size=36, alpha=0.5)
    mp.axes([0.63, 0.076, 0.31, 0.308])
    mp.xticks(())
    mp.yticks(())
    mp.text(0.5, 0.5, '2', ha='center', va='center',
            size=36, alpha=0.5)
    mp.show()
Setting the axis tick locators
How to set them:
ax = mp.gca()
ax.xaxis.set_major_locator(tick locator object)
ax.xaxis.set_minor_locator(tick locator object)
ax.yaxis.set_major_locator(tick locator object)
ax.yaxis.set_minor_locator(tick locator object)
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    mp.figure()
    # Locator constructors kept as strings so they can double as captions
    locators = [
        'mp.NullLocator()',
        'mp.MaxNLocator(nbins=3, steps=[1, 3, 5, 7, 9])',
        'mp.FixedLocator(locs=[0, 2.5, 7.5, 10])',
        'mp.AutoLocator()',
        'mp.IndexLocator(offset=0.5, base=1.5)',
        'mp.MultipleLocator()',
        'mp.LinearLocator(numticks=21)',
        'mp.LogLocator(base=2, subs=[1.0])']
    n_locators = len(locators)
    for i, locator in enumerate(locators):
        mp.subplot(n_locators, 1, i + 1)
        mp.xlim(0, 10)
        mp.ylim(-1, 1)
        mp.yticks(())
        ax = mp.gca()
        ax.spines['left'].set_color('none')
        ax.spines['top'].set_color('none')
        ax.spines['right'].set_color('none')
        ax.spines['bottom'].set_position(('data', 0))
        ax.xaxis.set_major_locator(eval(locator))
        ax.xaxis.set_minor_locator(mp.MultipleLocator(0.1))
        mp.plot(np.arange(11), np.zeros(11), color='none')
        # locator[3:] drops the leading 'mp.' from the caption
        mp.text(5, 0.3, locator[3:], ha='center', size=12)
    mp.tight_layout()
    mp.show()

Scatter plots
mp.scatter(array of horizontal coordinates, array of vertical coordinates, s=size, c=color, cmap=color map, alpha=transparency)
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    n = 1000
    x = np.random.normal(0, 1, n)
    y = np.random.normal(0, 1, n)
    # Distance from the origin, used to color each point
    d = np.sqrt(x ** 2 + y ** 2)
    mp.figure('Scatter', facecolor='lightgray')
    mp.title('Scatter', fontsize=20)
    mp.xlabel('x', fontsize=14)
    mp.ylabel('y', fontsize=14)
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=':')
    mp.scatter(x, y, s=6, c=d, cmap='jet_r', alpha=0.5)
    mp.show()
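Filling
mp.fill_between(horizontal coordinates of the scan lines, vertical coordinates where the scan lines start, vertical coordinates where the scan lines end, color=color, alpha=transparency)
Code: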
    import numpy as np
    import matplotlib.pyplot as mp

    n = 1000
    x = np.linspace(0, 8 * np.pi, n)
    sin_y = np.sin(x)
    cos_y = np.cos(x / 2) / 2
    mp.figure('Fill', facecolor='lightgray')
    mp.title('Fill', fontsize=20)
    mp.xlabel('x', fontsize=14)
    mp.ylabel('y', fontsize=14)
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=':')
    mp.plot(x, sin_y, c='dodgerblue', label=r'$y=sin(x)$')
    mp.plot(x, cos_y, c='orangered',
            label=r'$y=\frac{1}{2}cos(\frac{x}{2})$')
    # The fourth argument is a boolean mask that selects where to fill
    mp.fill_between(x, cos_y, sin_y, cos_y < sin_y,
                    color='dodgerblue', alpha=0.5)
    mp.fill_between(x, cos_y, sin_y, cos_y > sin_y,
                    color='orangered', alpha=0.5)
    mp.legend()
    mp.show()

Bar charts
mp.bar(horizontal coordinates of the bars, heights of the bars, ec=edge color, fc=fill color, label=legend label)
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    n = 12
    x = np.arange(n)
    y1 = np.random.uniform(0.5, 1.0, n) * (1 - x / n)
    y2 = np.random.uniform(0.5, 1.0, n) * (1 - x / n)
    mp.figure('Bar', facecolor='lightgray')
    mp.title('Bar', fontsize=20)
    mp.xlabel('x', fontsize=14)
    mp.ylabel('y', fontsize=14)
    mp.xticks(x, x + 1)
    mp.ylim(-1.25, 1.25)
    mp.tick_params(labelsize=10)
    mp.grid(axis='y', linestyle=':')
    # Sample 1 grows upward, with its value printed above each bar
    mp.bar(x, y1, ec='white', fc='dodgerblue', label='Sample 1')
    for _x, _y in zip(x, y1):
        mp.text(_x, _y, '%.2f' % _y, ha='center', va='bottom', size=8)
    # Sample 2 grows downward, with its value printed below each bar
    mp.bar(x, -y2, ec='white', fc='dodgerblue', alpha=0.5,
           label='Sample 2')
    for _x, _y in zip(x, y2):
        mp.text(_x, -_y - 0.015, '%.2f' % _y, ha='center', va='top',
                size=8)
    mp.legend()
    mp.show()
Contour plots
- mp.contour(x, y, z, number of levels, colors=color, linewidths=line width)
- mp.contourf(x, y, z, number of levels, cmap=color map)
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    n = 1000
    x, y = np.meshgrid(np.linspace(-3, 3, n),
                       np.linspace(-3, 3, n))
    z = (1 - x / 2 + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2)
    mp.figure('Contour', facecolor='lightgray')
    mp.title('Contour', fontsize=20)
    mp.xlabel('x', fontsize=14)
    mp.ylabel('y', fontsize=14)
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=':')
    # Filled bands first, then the contour lines and their labels on top
    mp.contourf(x, y, z, 8, cmap='jet')
    cntr = mp.contour(x, y, z, 8, colors='black', linewidths=0.5)
    mp.clabel(cntr, inline_spacing=1, fmt='%.1f', fontsize=8)
    mp.show()
Heat maps
- mp.imshow(two-dimensional array of depth values, cmap=color map, origin=direction of the vertical axis)
- Code: hot.py
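The file hot.py is not reproduced in these notes. As a stand-in, here is a minimal sketch of a heat map drawn with mp.imshow; it assumes the same surface function as the contour example, and the figure name, grid size, and color bar are illustrative choices rather than the original script.

```python
import numpy as np
import matplotlib.pyplot as mp

n = 200  # assumed grid resolution
x, y = np.meshgrid(np.linspace(-3, 3, n),
                   np.linspace(-3, 3, n))
z = (1 - x / 2 + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2)

mp.figure('Hot', facecolor='lightgray')
mp.title('Hot', fontsize=20)
mp.xlabel('x', fontsize=14)
mp.ylabel('y', fontsize=14)
mp.tick_params(labelsize=10)
# origin='lower' puts row 0 of z at the bottom, so y grows upward
image = mp.imshow(z, cmap='jet', origin='lower')
mp.colorbar(image)
```

Calling mp.show() afterwards displays the window. With origin='upper', the default in a standard configuration, the picture would appear flipped vertically.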
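3D surface and wireframe plots
How to draw them:
from mpl_toolkits.mplot3d import axes3d
ax = mp.axes(projection='3d')
(Older code calls mp.gca(projection='3d'), which recent matplotlib releases no longer accept.)
ax.plot_surface(x, y, z, rstride=row stride, cstride=column stride, cmap=color map)
ax.plot_wireframe(x, y, z, rstride=row stride, cstride=column stride, color=color, linewidth=line width)
Code: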
    import numpy as np
    import matplotlib.pyplot as mp
    from mpl_toolkits.mplot3d import axes3d

    n = 1000
    x, y = np.meshgrid(np.linspace(-3, 3, n),
                       np.linspace(-3, 3, n))
    z = (1 - x / 2 + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2)
    mp.figure('3D Surface')
    # mp.axes(projection='3d') replaces mp.gca(projection='3d'),
    # which recent matplotlib releases reject
    ax = mp.axes(projection='3d')
    mp.title('3D Surface', fontsize=20)
    ax.set_xlabel('x', fontsize=14)
    ax.set_ylabel('y', fontsize=14)
    ax.set_zlabel('z', fontsize=14)
    mp.tick_params(labelsize=10)
    ax.plot_surface(x, y, z, rstride=10, cstride=10, cmap='jet')
    mp.figure('3D Wireframe')
    ax = mp.axes(projection='3d')
    mp.title('3D Wireframe', fontsize=20)
    ax.set_xlabel('x', fontsize=14)
    ax.set_ylabel('y', fontsize=14)
    ax.set_zlabel('z', fontsize=14)
    mp.tick_params(labelsize=10)
    ax.plot_wireframe(x, y, z, rstride=20, cstride=20,
                      linewidth=0.5, color='orangered')
    mp.show()

Pie charts
- mp.pie(values, gaps between the slices, labels, colors, percentage format)
Code:
    import matplotlib.pyplot as mp

    values = [26, 17, 21, 29, 11]
    spaces = [0.05, 0.01, 0.01, 0.01, 0.01]
    labels = ['Python', 'JavaScript', 'C++', 'C', 'PHP']
    colors = ['dodgerblue', 'orangered', 'limegreen', 'violet', 'gold']
    mp.figure('Pie', facecolor='lightgray')
    mp.title('Pie', fontsize=20)
    # Keyword arguments name each role explicitly
    mp.pie(values, explode=spaces, labels=labels, colors=colors,
           autopct='%d%%', shadow=True, startangle=90)
    mp.axis('equal')
    mp.show()
Grid lines
- ax = mp.gca()
- ax.grid(which=major or minor ticks, axis=horizontal or vertical axis, linewidth=line width, linestyle=line style, color=color)
Code:
    import numpy as np
    import matplotlib.pyplot as mp

    x = np.linspace(-5, 5, 1000)
    y = 8 * np.sinc(x)
    mp.figure('Grid', facecolor='lightgray')
    mp.title('Grid', fontsize=20)
    mp.xlabel('x', fontsize=14)
    mp.ylabel('y', fontsize=14)
    ax = mp.gca()
    ax.xaxis.set_major_locator(mp.MultipleLocator())
    ax.xaxis.set_minor_locator(mp.MultipleLocator(.1))
    ax.yaxis.set_major_locator(mp.MultipleLocator())
    ax.yaxis.set_minor_locator(mp.MultipleLocator(.1))
    mp.tick_params(labelsize=10)
    # Thicker lines on the major ticks, thinner lines on the minor ticks
    ax.grid(which='major', axis='both', linewidth=0.75,
            linestyle='-', color='lightgray')
    ax.grid(which='minor', axis='both', linewidth=0.25,
            linestyle='-', color='lightgray')
    mp.plot(x, y, c='dodgerblue', label=r'$y=8sinc(x)$')
    mp.legend()
    mp.show()
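Polar coordinates
- ax = mp.axes(projection='polar') (older code calls mp.gca(projection='polar'), which recent matplotlib releases no longer accept)
- mp.plot(polar angle, polar radius, …)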
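The notes give no code for polar plots, so the following is a minimal sketch under stated assumptions: the spiral, the figure name, and the colors are illustrative, and the polar axes are created with mp.axes(projection='polar').

```python
import numpy as np
import matplotlib.pyplot as mp

theta = np.linspace(0, 4 * np.pi, 1000)  # polar angle in radians
rho = 0.8 * theta                        # polar radius: an Archimedean spiral

mp.figure('Polar', facecolor='lightgray')
ax = mp.axes(projection='polar')
mp.title('Polar', fontsize=20)
# On polar axes the first argument is the angle, the second the radius
mp.plot(theta, rho, color='dodgerblue', label=r'$\rho=0.8\theta$')
mp.legend()
```

Calling mp.show() afterwards displays the window.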
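General-purpose numpy functions
Reading a text file
numpy.loadtxt(
    file name,
    delimiter=separator,
    usecols=selected columns,
    unpack=whether to unpack,
    dtype=target type,
    converters=converters) -> a two-dimensional array (unpack=False) or a set of one-dimensional column arrays (unpack=True)
Saving a text file
numpy.savetxt(
    file name,
    two-dimensional array,
    delimiter=separator,
    fmt=format)
Code: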
    import numpy as np

    path = 'C:/Users/Administrator/Desktop/test.csv'
    a = np.arange(1, 10).reshape(3, 3)
    print(a)
    np.savetxt(path, a, delimiter=',', fmt='%d')
    b = np.loadtxt(path, delimiter=',', dtype='i4')
    print(b)
    c = np.loadtxt(path, delimiter=',', usecols=(0, 2), dtype='i4')
    print(c)
    d, e = np.loadtxt(path, delimiter=',', usecols=(0, 2),
                      unpack=True, dtype='i4, f8')
    print(d, e)

Reading dates through a converter and drawing a candlestick chart:
    import datetime as dt
    import numpy as np
    import matplotlib.pyplot as mp
    import matplotlib.dates as md


    def dmy2ymd(dmy):
        # Depending on the numpy version, a converter receives bytes or str
        if isinstance(dmy, bytes):
            dmy = dmy.decode('utf-8')
        date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
        return date.strftime('%Y-%m-%d')


    dates, opening_prices, highest_prices, \
        lowest_prices, closing_prices = np.loadtxt(
            './aapl.csv', delimiter=',',
            usecols=(1, 3, 4, 5, 6), unpack=True,
            dtype='M8[D], f8, f8, f8, f8',
            converters={1: dmy2ymd})
    mp.figure('Candlestick', facecolor='lightgray')
    mp.title('Candlestick', fontsize=20)
    mp.xlabel('Date', fontsize=14)
    mp.ylabel('Price', fontsize=14)
    ax = mp.gca()
    ax.xaxis.set_major_locator(md.WeekdayLocator(byweekday=md.MO))
    ax.xaxis.set_minor_locator(md.DayLocator())
    ax.xaxis.set_major_formatter(md.DateFormatter('%d %b %Y'))
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=':')
    dates = dates.astype(dt.datetime)
    rise = closing_prices - opening_prices >= 0.01
    fall = opening_prices - closing_prices >= 0.01
    # Fill and edge colors per day: hollow red for a rise, solid green for a fall
    fc = np.zeros(dates.size, dtype='3f4')
    ec = np.zeros(dates.size, dtype='3f4')
    fc[rise], fc[fall] = (1, 1, 1), (0, 0.5, 0)
    ec[rise], ec[fall] = (1, 0, 0), (0, 0.5, 0)
    # Zero-width bars draw the high-low wicks, wide bars the open-close bodies
    mp.bar(dates, highest_prices - lowest_prices, 0,
           lowest_prices, color=fc, edgecolor=ec)
    mp.bar(dates, closing_prices - opening_prices, 0.8,
           opening_prices, color=fc, edgecolor=ec)
    mp.gcf().autofmt_xdate()
    mp.show()
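The loadtxt and savetxt parameters can be tried without touching the file system. The sketch below is an added illustration that assumes an in-memory buffer (io.StringIO) in place of a file on disk; the sample text is made up for the example.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for a file on disk
text = "1,2,3\n4,5,6\n7,8,9\n"

# unpack=False (the default): one two-dimensional array
whole = np.loadtxt(io.StringIO(text), delimiter=',', dtype='i4')
print(whole.shape)  # (3, 3)

# usecols picks columns 0 and 2; unpack=True returns one array per column
first, third = np.loadtxt(io.StringIO(text), delimiter=',',
                          usecols=(0, 2), unpack=True, dtype='i4')
print(first, third)  # [1 4 7] [3 6 9]

# savetxt writes the array back in the same layout
buffer = io.StringIO()
np.savetxt(buffer, whole, delimiter=',', fmt='%d')
print(buffer.getvalue() == text)  # True
```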

Arithmetic mean
- Samples: S = [s1, s2, …, sn]
- Arithmetic mean: m = (s1+s2+…+sn)/n
- numpy.mean(sample array) -> arithmetic mean
- Code:
    import numpy as np

    closing_prices = np.array([100, 20, 1000, 300, 28, 91])
    # By hand: sum the samples, then divide by their count
    mean = 0
    for closing_price in closing_prices:
        mean += closing_price
    mean /= closing_prices.size
    print(mean)
    # With numpy
    mean = np.mean(closing_prices)
    print(mean)
Weighted average
- Samples: S = [s1, s2, …, sn]
- Weights: W = [w1, w2, …, wn]
- Weighted average: a = (s1w1+s2w2+…+snwn)/(w1+w2+…+wn)
- numpy.average(sample array, weights=weight array) -> weighted average
Volume-weighted average price (VWAP)
Code:
    import numpy as np

    closing_prices = np.array([100, 98, 10, 20])
    volumes = np.array([10, 2, 3, 4])
    # By hand: sum of price x volume divided by the total volume
    vwap, vsum = 0, 0
    for closing_price, volume in zip(closing_prices, volumes):
        vwap += closing_price * volume
        vsum += volume
    vwap /= vsum
    print(vwap)
    # With numpy
    vwap = np.average(closing_prices, weights=volumes)
    print(vwap)
Time-weighted average price (TWAP)
Code:
    import datetime as dt
    import numpy as np


    def dmy2days(dmy):
        # Depending on the numpy version, a converter receives bytes or str
        if isinstance(dmy, bytes):
            dmy = dmy.decode('utf-8')
        date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
        # Days elapsed since the earliest representable date
        return (date - dt.date.min).days


    days, closing_prices = np.loadtxt(
        './aapl.csv', delimiter=',',
        usecols=(1, 6), unpack=True,
        converters={1: dmy2days})
    # By hand: later days carry a larger weight
    twap, tsum = 0, 0
    for closing_price, day in zip(closing_prices, days):
        twap += closing_price * day
        tsum += day
    twap /= tsum
    print(twap)
    # With numpy
    twap = np.average(closing_prices, weights=days)
    print(twap)
Maximum and minimum
- max/min: the largest/smallest element of one array
  a:
  9 7 5
  3 1 8
  6 6 1
  numpy.max(a) -> 9
  numpy.min(a) -> 1
- maximum/minimum: an array built from the larger/smaller of each pair of corresponding elements in two arrays
  b:
  6 1 9
  7 1 7
  4 4 5
  numpy.maximum(a, b) ->
  9 7 9
  7 1 8
  6 6 5
Code:
    import numpy as np

    a = np.random.randint(10, 100, 9).reshape(3, 3)
    print(a)
    print(np.max(a), a.max())
    print(np.min(a), a.min())
    # argmax/argmin return the index into the flattened array
    print(np.argmax(a), a.argmax())
    print(np.argmin(a), a.argmin())
    b = np.random.randint(10, 100, 9).reshape(3, 3)
    print(b)
    print(np.maximum(a, b))
    print(np.minimum(a, b))
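Because the code above uses random data, its output changes from run to run. The short sketch below replays the fixed arrays a and b from the description so that the results can be compared with the text; only the variable names for the results are added.

```python
import numpy as np

a = np.array([[9, 7, 5],
              [3, 1, 8],
              [6, 6, 1]])
b = np.array([[6, 1, 9],
              [7, 1, 7],
              [4, 4, 5]])

print(np.max(a), np.min(a))        # 9 1
# Flat indices of the first occurrence of the maximum and minimum
print(np.argmax(a), np.argmin(a))  # 0 4

larger = np.maximum(a, b)
smaller = np.minimum(a, b)
print(larger)
print(smaller)
```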
Price range = the highest of the high prices - the lowest of the low prices
Code:
    import numpy as np

    highest_prices, lowest_prices = np.loadtxt(
        './aapl.csv', delimiter=',',
        usecols=(4, 5), unpack=True)
    # By hand: track the running extremes
    max_highest_price, min_lowest_price = \
        highest_prices[0], lowest_prices[0]
    for highest_price, lowest_price in zip(
            highest_prices, lowest_prices):
        if highest_price > max_highest_price:
            max_highest_price = highest_price
        if lowest_price < min_lowest_price:
            min_lowest_price = lowest_price
    # Named price_range so that the built-in range() is not shadowed
    price_range = max_highest_price - min_lowest_price
    print(price_range)
    # With numpy
    price_range = highest_prices.max() - lowest_prices.min()
    print(price_range)
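ptp: peak to peak, the difference between the maximum and the minimum of an array
numpy.ptp(array) -> array.max() - array.min()
Price spread = the peak-to-peak difference of one kind of price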
Code:
    import numpy as np

    highest_prices, lowest_prices = np.loadtxt(
        './aapl.csv', delimiter=',',
        usecols=(4, 5), unpack=True)
    # By hand: track the running maximum and minimum of each series
    max_highest_price, min_highest_price, \
        max_lowest_price, min_lowest_price = \
        highest_prices[0], highest_prices[0], \
        lowest_prices[0], lowest_prices[0]
    for highest_price, lowest_price in zip(
            highest_prices, lowest_prices):
        if highest_price > max_highest_price:
            max_highest_price = highest_price
        if highest_price < min_highest_price:
            min_highest_price = highest_price
        if lowest_price > max_lowest_price:
            max_lowest_price = lowest_price
        if lowest_price < min_lowest_price:
            min_lowest_price = lowest_price
    high_spread = max_highest_price - min_highest_price
    low_spread = max_lowest_price - min_lowest_price
    print(high_spread, low_spread)
    # With numpy
    high_spread = np.ptp(highest_prices)
    low_spread = np.ptp(lowest_prices)
    print(high_spread, low_spread)
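Median: when the samples are sorted by size, the element in the middle position is the median.
Description:
Odd number of samples:  12 23 45 67 89 -> the middle element, 45
Even number of samples: 12 23 45 67 -> the mean of the two middle elements, (23 + 45) / 2 = 34
A: sorted sample set
L: number of samples
M = (A[(L-1)/2] + A[L/2]) / 2, where the divisions inside the indices are integer divisions
numpy.median(array) -> median
Code: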
    import numpy as np

    closing_prices = np.loadtxt(
        './aapl.csv', delimiter=',',
        usecols=(6), unpack=True)
    # np.sort replaces np.msort, which NumPy 2.0 removed
    sorted_prices = np.sort(closing_prices)
    l = sorted_prices.size
    # By hand: average the two middle elements (they coincide when l is odd)
    median = (sorted_prices[(l - 1) // 2] +
              sorted_prices[l // 2]) / 2
    print(median)
    # With numpy
    median = np.median(closing_prices)
    print(median)
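The index formula for the median can be checked against the two small sample sets from the description. This is a minimal sketch; the helper name median_by_formula is hypothetical and not part of numpy.

```python
import numpy as np


def median_by_formula(samples):
    """Median as (A[(L-1)//2] + A[L//2]) / 2 on the sorted samples."""
    a = np.sort(samples)
    l = a.size
    return (a[(l - 1) // 2] + a[l // 2]) / 2


odd = np.array([45, 12, 89, 23, 67])
even = np.array([67, 12, 45, 23])
print(median_by_formula(odd), np.median(odd))    # 45.0 45.0
print(median_by_formula(even), np.median(even))  # 34.0 34.0
```

For an odd sample count both indices point at the same element, so the formula covers both cases without a branch.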
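Standard deviation
- Description:
  Samples: S = [s1, s2, …, sn]
  Mean: m = (s1+s2+…+sn)/n
  Deviations: D = [s1-m, s2-m, …, sn-m]
  Variance: v = ((s1-m)^2 + (s2-m)^2 + … + (sn-m)^2)/n
  Standard deviation: std = sqrt(v) (the root mean square of the deviations)
  numpy.std(array, ddof=delta degrees of freedom) -> standard deviation
  Population variance and population standard deviation: …/n (ddof=0, the default)
  Sample variance and sample standard deviation: …/(n-1) (ddof=1)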
10
50
25 25
- Code:
    import numpy as np

    closing_prices = np.loadtxt(
        './aapl.csv', delimiter=',',
        usecols=(6), unpack=True)
    mean = np.mean(closing_prices)
    devs = closing_prices - mean
    # Population standard deviation: divide by n
    pvar = (devs ** 2).mean()
    pstd = np.sqrt(pvar)
    print(pstd)
    pstd = np.std(closing_prices)
    print(pstd)
    # Sample standard deviation: divide by n - 1
    svar = (devs ** 2).sum() / (devs.size - 1)
    sstd = np.sqrt(svar)
    print(sstd)
    sstd = np.std(closing_prices, ddof=1)
    print(sstd)
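The effect of ddof is easier to see on a data set small enough to check by hand. The sketch below uses made-up samples whose squared deviations sum to 32, so the population standard deviation is exactly 2.

```python
import numpy as np

# Hypothetical samples: mean 5, squared deviations sum to 32
samples = np.array([2, 4, 4, 4, 5, 5, 7, 9])
devs = samples - samples.mean()

pstd = np.sqrt((devs ** 2).sum() / samples.size)        # divide by n
sstd = np.sqrt((devs ** 2).sum() / (samples.size - 1))  # divide by n - 1

print(pstd, np.std(samples))          # both are 2.0
print(sstd, np.std(samples, ddof=1))  # both are sqrt(32 / 7), about 2.138
```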
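Working with dates
-
Weekday data
-
Notes:
array[relational expression]: the relational expression evaluates to a boolean array whose True elements mark the array elements that satisfy the expression,
so this indexing operation picks out of the array exactly the elements that correspond to True in the boolean array.
np.where(relational expression) -> the indices of the elements that satisfy the expression (returned as a tuple with one index array per dimension).
np.take(array, index array) -> the elements of the array identified by the index array.
-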
Code:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import datetime as dt
import numpy as np


def dmy2wday(dmy):
    # loadtxt may pass bytes or str depending on the NumPy version
    if isinstance(dmy, bytes):
        dmy = dmy.decode('utf-8')
    date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
    wday = date.weekday()  # 0-6 stand for Monday to Sunday
    return wday


wdays, closing_prices = np.loadtxt(
    './aapl.csv', delimiter=',', usecols=(1, 6),
    unpack=True, converters={1: dmy2wday})
ave_closing_prices = np.zeros(5)
for wday in range(ave_closing_prices.size):
    # Equivalent alternatives:
    # ave_closing_prices[wday] = \
    #     closing_prices[wdays == wday].mean()
    # ave_closing_prices[wday] = \
    #     closing_prices[np.where(wdays == wday)].mean()
    ave_closing_prices[wday] = \
        np.take(closing_prices,
                np.where(wdays == wday)).mean()
for wday, ave_closing_price in zip(
        ['MON', 'TUE', 'WED', 'THU', 'FRI'],
        ave_closing_prices):
    print(wday, np.round(ave_closing_price, 2))
-
-
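Weekly summary
- Notes
np.apply_along_axis(function, axis, high-dimensional array)
Extracts the one-dimensional sub-arrays of the array along the given axis, passes each one to the function, and assembles the return values along the same axis into a new array that is returned to the caller.
Axes:
2-D: 0 - rows, 1 - columns (with axis=1 the function receives one row at a time)
3-D: 0 - pages, 1 - rows, 2 - columns
- Code: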
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np


def pingfang(x):  # pingfang: "square"
    print('pingfang:', x)
    return x * x


X = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]])
Y = np.apply_along_axis(pingfang, 1, X)
print(Y)

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import datetime as dt
import numpy as np


def dmy2wday(dmy):
    # loadtxt may pass bytes or str depending on the NumPy version
    if isinstance(dmy, bytes):
        dmy = dmy.decode('utf-8')
    date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
    wday = date.weekday()
    return wday


wdays, opening_prices, highest_prices, \
    lowest_prices, closing_prices = np.loadtxt(
        './aapl.csv', delimiter=',',
        usecols=(1, 3, 4, 5, 6), unpack=True,
        converters={1: dmy2wday})
# Keep the first 16 trading days only
wdays = wdays[:16]
opening_prices = opening_prices[:16]
highest_prices = highest_prices[:16]
lowest_prices = lowest_prices[:16]
closing_prices = closing_prices[:16]
first_monday = np.where(wdays == 0)[0][0]
last_friday = np.where(wdays == 4)[0][-1]
indices = np.arange(first_monday, last_friday + 1)
# Three weeks of five trading days each
indices = np.split(indices, 3)


def week_summary(indices):
    opening_price = opening_prices[indices[0]]
    highest_price = np.max(np.take(
        highest_prices, indices))
    lowest_price = np.min(np.take(
        lowest_prices, indices))
    closing_price = closing_prices[indices[-1]]
    return opening_price, highest_price, \
        lowest_price, closing_price


summaries = np.apply_along_axis(
    week_summary, 1, indices)
print(summaries)
np.savetxt('./summary.csv', summaries,
           delimiter=',', fmt='%g')
-
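One-dimensional convolution
-
Notes:
a: [1 2 3 4 5] - the array being convolved
b: [6 7 8] - the convolution kernel
c = a (x) b = [6 19 40 61 82 67 40] - full
                [19 40 61 82 67]    - same
                   [40 61 82]       - valid
The kernel is reversed and slid across the zero-padded array; each output value is the sum of the element-wise products at that position:
             6 19 40 61 82 67 40
 0  0  1  2  3  4  5  0  0
 8  7  6
    8  7  6
       8  7  6
          8  7  6
             8  7  6
                8  7  6
                   8  7  6
numpy.convolve(a, b, 'full'/'same'/'valid')
-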
Code:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np

a = np.arange(1, 6)
print('a:', a)
b = np.arange(6, 9)
print('b:', b)
c = np.convolve(a, b, 'full')
print('c ( full):', c)
c = np.convolve(a, b, 'same')
print('c ( same):', c)
c = np.convolve(a, b, 'valid')
print('c (valid):', c)
-
-
Moving averages
- Code:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md


def dmy2ymd(dmy):
    # loadtxt may pass bytes or str depending on the NumPy version
    if isinstance(dmy, bytes):
        dmy = dmy.decode('utf-8')
    date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
    ymd = date.strftime('%Y-%m-%d')
    return ymd


dates, closing_prices = np.loadtxt(
    './aapl.csv', delimiter=',', usecols=(1, 6),
    unpack=True, dtype=np.dtype('M8[D], f8'),
    converters={1: dmy2ymd})
# 5-day simple moving average with an explicit loop
ma51 = np.zeros(closing_prices.size - 4)
for i in range(ma51.size):
    ma51[i] = closing_prices[i:i + 5].mean()
# The same average as a convolution with a uniform kernel
ma52 = np.convolve(closing_prices,
                   np.ones(5) / 5, 'valid')
# Exponentially weighted 5-day average; convolve() reverses the
# kernel, so the weights are passed reversed to give the most
# recent day the largest weight
weights = np.exp(np.linspace(-1, 0, 5))
weights /= weights.sum()
ma53 = np.convolve(closing_prices,
                   weights[::-1], 'valid')
ma10 = np.convolve(closing_prices,
                   np.ones(10) / 10, 'valid')
mp.figure('Moving Average', facecolor='lightgray')
mp.title('Moving Average', fontsize=20)
mp.xlabel('Date', fontsize=14)
mp.ylabel('Price', fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(
    md.WeekdayLocator(byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(
    md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
dates = dates.astype(md.datetime.datetime)
mp.plot(dates, closing_prices, c='lightgray',
        label='Closing Price')
mp.plot(dates[4:], ma51, c='orangered',
        linewidth=1, label='MA-51')
mp.plot(dates[4:], ma52, c='orangered', alpha=0.25,
        linewidth=5, label='MA-52')
mp.plot(dates[4:], ma53, c='limegreen',
        label='MA-53')
mp.plot(dates[9:], ma10, c='dodgerblue',
        label='MA-10')
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
- Notes:
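[a b c d e]: samples; [A B C D E]: weights
Weighted mean = (aA+bB+cC+dD+eE)/(A+B+C+D+E)
= (aA+bB+cC+dD+eE)/S, where S = A+B+C+D+E
= aA/S+bB/S+cC/S+dD/S+eE/S
so the normalized weights are [A/S B/S C/S D/S E/S]
- Bollinger Bands
- Middle band: the moving average
- Upper band: middle band + 2 x standard deviation
- Lower band: middle band - 2 x standard deviation
- Code: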
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md


def dmy2ymd(dmy):
    # loadtxt may pass bytes or str depending on the NumPy version
    if isinstance(dmy, bytes):
        dmy = dmy.decode('utf-8')
    date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
    ymd = date.strftime('%Y-%m-%d')
    return ymd


dates, closing_prices = np.loadtxt(
    './aapl.csv', delimiter=',', usecols=(1, 6),
    unpack=True, dtype=np.dtype('M8[D], f8'),
    converters={1: dmy2ymd})
N = 5
# Middle band: N-day simple moving average
medios = np.convolve(closing_prices,
                     np.ones(N) / N, 'valid')
# Standard deviation of each N-day window
stds = np.zeros(medios.size)
for i in range(stds.size):
    stds[i] = np.std(closing_prices[i:i + N])
lowers = medios - 2 * stds
uppers = medios + 2 * stds
mp.figure('Bollinger Bands', facecolor='lightgray')
mp.title('Bollinger Bands', fontsize=20)
mp.xlabel('Date', fontsize=14)
mp.ylabel('Price', fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(
    md.WeekdayLocator(byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(
    md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
dates = dates.astype(md.datetime.datetime)
mp.plot(dates, closing_prices, c='lightgray',
        label='Closing Price')
mp.plot(dates[N - 1:], medios, c='dodgerblue',
        label='Medio')
mp.plot(dates[N - 1:], lowers, c='limegreen',
        label='Lower')
mp.plot(dates[N - 1:], uppers, c='orangered',
        label='Upper')
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
-
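Linear model
-
Notes:
x:  1  2  3  4
y: 60 70 80 90
y = kx + b
1) Linear prediction
Known samples: a b c d e f; the next value ? is to be predicted
d = aA + bB + cC \
e = bA + cB + dC  > solve for A B C
f = cA + dB + eC /
? = dA + eB + fC
/ a b c \   / A \   / d \
| b c d | x | B | = | e |
\ c d e /   \ C /   \ f /
    a         x       b
x = numpy.linalg.lstsq(a, b)[0]
b . x => ?    (dot product of [d e f] with the solved coefficients)
-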
Code:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as mp
import matplotlib.dates as md


def dmy2ymd(dmy):
    # loadtxt may pass bytes or str depending on the NumPy version
    if isinstance(dmy, bytes):
        dmy = dmy.decode('utf-8')
    date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
    ymd = date.strftime('%Y-%m-%d')
    return ymd


dates, closing_prices = np.loadtxt(
    './aapl.csv', delimiter=',', usecols=(1, 6),
    unpack=True, dtype=np.dtype('M8[D], f8'),
    converters={1: dmy2ymd})
N = 5
pred_prices = np.zeros(
    closing_prices.size - 2 * N + 1)
for i in range(pred_prices.size):
    # Each row of a is an N-day window, shifted by one day per row
    a = np.zeros((N, N))
    for j in range(N):
        a[j, :] = closing_prices[i + j: i + j + N]
    # b holds the value that follows each window
    b = closing_prices[i + N: i + N * 2]
    x = np.linalg.lstsq(a, b, rcond=None)[0]
    # Apply the coefficients to the latest window
    pred_prices[i] = b.dot(x)
print(pred_prices)
mp.figure('Stock Price Prediction',
          facecolor='lightgray')
mp.title('Stock Price Prediction', fontsize=20)
mp.xlabel('Date', fontsize=14)
mp.ylabel('Price', fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(
    md.WeekdayLocator(byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(
    md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
dates = dates.astype(md.datetime.datetime)
mp.plot(dates, closing_prices, 'o-', c='lightgray',
        label='Closing Price')
# Append the next business day for the final prediction
dates = np.append(
    dates, dates[-1] + pd.tseries.offsets.BDay())
mp.plot(dates[N * 2:], pred_prices, 'o-',
        c='orangered', label='Predicted Price')
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
-
-
Linear fitting
-
Notes
kx + b = y
kx1 + b = y1
kx2 + b = y2
...
kxn + b = yn
/ x1  1 \             / y1  \
| x2  1 |   / k \     | y2  |
| ...   | x \ b /  =  | ... |
\ xn  1 /             \ yn  /
    a         x          b
x = np.linalg.lstsq(a, b)[0]
-
Code:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md


def dmy2ymd(dmy):
    # loadtxt may pass bytes or str depending on the NumPy version
    if isinstance(dmy, bytes):
        dmy = dmy.decode('utf-8')
    date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
    ymd = date.strftime('%Y-%m-%d')
    return ymd


dates, opening_prices, highest_prices, \
    lowest_prices, closing_prices = np.loadtxt(
        './aapl.csv', delimiter=',',
        usecols=(1, 3, 4, 5, 6), unpack=True,
        dtype='M8[D], f8, f8, f8, f8',
        converters={1: dmy2ymd})
trend_points = (highest_prices + lowest_prices +
                closing_prices) / 3
spreads = highest_prices - lowest_prices
resistance_points = trend_points + spreads
support_points = trend_points - spreads
days = dates.astype(int)
# Each row of a is [x, 1], so the solution is [k, b]
a = np.column_stack((days, np.ones_like(days)))
x = np.linalg.lstsq(a, trend_points, rcond=None)[0]
trend_line = days * x[0] + x[1]
x = np.linalg.lstsq(a, resistance_points, rcond=None)[0]
resistance_line = days * x[0] + x[1]
x = np.linalg.lstsq(a, support_points, rcond=None)[0]
support_line = days * x[0] + x[1]
mp.figure('Trend', facecolor='lightgray')
mp.title('Trend', fontsize=20)
mp.xlabel('Date', fontsize=14)
mp.ylabel('Price', fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(
    md.WeekdayLocator(byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(
    md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
dates = dates.astype(md.datetime.datetime)
rise = closing_prices - opening_prices >= 0.01
fall = opening_prices - closing_prices >= 0.01
fc = np.zeros(dates.size, dtype='3f4')
ec = np.zeros(dates.size, dtype='3f4')
fc[rise], fc[fall] = (1, 1, 1), (0.85, 0.85, 0.85)
ec[rise], ec[fall] = (0.85, 0.85, 0.85), (0.85, 0.85, 0.85)
# Candlesticks: thin bar for the high-low range, wide bar for open-close
mp.bar(dates, highest_prices - lowest_prices, 0,
       lowest_prices, color=fc, edgecolor=ec)
mp.bar(dates, closing_prices - opening_prices, 0.8,
       opening_prices, color=fc, edgecolor=ec)
mp.scatter(dates, trend_points, c='dodgerblue',
           alpha=0.5, s=60, zorder=2)
mp.scatter(dates, resistance_points, c='orangered',
           alpha=0.5, s=60, zorder=2)
mp.scatter(dates, support_points, c='limegreen',
           alpha=0.5, s=60, zorder=2)
mp.plot(dates, trend_line, c='dodgerblue',
        linewidth=3, label='Trend')
mp.plot(dates, resistance_line, c='orangered',
        linewidth=3, label='Resistance')
mp.plot(dates, support_line, c='limegreen',
        linewidth=3, label='Support')
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
-
-
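Clipping, compressing and cumulative products
-
ndarray.clip(min=minimum, max=maximum)
Returns a copy of the calling array in which elements smaller than min are set to min and elements larger than max are set to max.
-
ndarray.compress(condition)
Returns the elements of the calling array that satisfy the given condition.
-
ndarray.prod()
Returns the product of all elements of the calling array.
ndarray.cumprod()
Returns the array of running products of the elements of the calling array.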
Code:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
print(a)
b = a.clip(min=3, max=7)
print(b)
c = a.compress(3 < a.ravel()).reshape(-1, 3)
print(c)
d = a.compress(a.ravel() < 7).reshape(-1, 3)
print(d)
e = a.compress((3 < a.ravel()) & (a.ravel() < 7))
print(e)
f = a.prod()
print(f)
# The same product with an explicit loop
g = 1
for elem in a.flat:
    g *= elem
print(g)
h = a.cumprod()
print(h)
# The same running products with an explicit loop
i = [1]
for elem in a.flat:
    i.append(i[-1] * elem)
i = np.array(i[1:])
print(i)


def jiecheng(n):  # jiecheng: "factorial"
    if n == 1:
        return 1
    return n * jiecheng(n - 1)


print(jiecheng(9))
print(np.arange(1, 10).prod())
-
-
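Correlation
-
Samples:
a = [a1, a2, …, an]
b = [b1, b2, …, bn]
-
Means:
ave(a) = (a1+a2+…+an)/n
ave(b) = (b1+b2+…+bn)/n
-
Deviations:
dev(a) = [a1, a2, …, an] - ave(a)
dev(b) = [b1, b2, …, bn] - ave(b)
-
Variances:
var(a) = ave(dev(a) * dev(a))
var(b) = ave(dev(b) * dev(b))
-
Standard deviations:
std(a) = sqrt(var(a))
std(b) = sqrt(var(b))
-
Covariances:
cov(a,b) = ave(dev(a) * dev(b))
cov(b,a) = ave(dev(b) * dev(a))
-
Correlation coefficient:
cov(a,b) / (std(a) std(b))
cov(b,a) / (std(b) std(a))
The value lies in [-1, 1]: the sign gives the direction of the correlation (positive or negative) and the absolute value gives its strength; the larger the stronger, the smaller the weaker, and 0 means no linear correlation.
Correlation matrix:
/ var(a)/(std(a)std(a)) = 1    cov(a,b)/(std(a)std(b))   \
\ cov(b,a)/(std(b)std(a))      var(b)/(std(b)std(b)) = 1 /
numpy.corrcoef(a, b) -> correlation matrix
-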
Code:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md


def dmy2ymd(dmy):
    # loadtxt may pass bytes or str depending on the NumPy version
    if isinstance(dmy, bytes):
        dmy = dmy.decode('utf-8')
    date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
    ymd = date.strftime('%Y-%m-%d')
    return ymd


dates, bhp_closing_prices = np.loadtxt(
    './bhp.csv', delimiter=',', usecols=(1, 6),
    unpack=True, dtype=np.dtype('M8[D], f8'),
    converters={1: dmy2ymd})
_, vale_closing_prices = np.loadtxt(
    './vale.csv', delimiter=',', usecols=(1, 6),
    unpack=True, dtype=np.dtype('M8[D], f8'),
    converters={1: dmy2ymd})
# Daily returns: relative change from one close to the next
bhp_returns = np.diff(
    bhp_closing_prices) / bhp_closing_prices[:-1]
vale_returns = np.diff(
    vale_closing_prices) / vale_closing_prices[:-1]
ave_a = np.mean(bhp_returns)
dev_a = bhp_returns - ave_a
var_a = np.mean(dev_a * dev_a)
std_a = np.sqrt(var_a)
ave_b = np.mean(vale_returns)
dev_b = vale_returns - ave_b
var_b = np.mean(dev_b * dev_b)
std_b = np.sqrt(var_b)
cov_ab = np.mean(dev_a * dev_b)
cov_ba = np.mean(dev_b * dev_a)
covs = np.array([
    [var_a, cov_ab],
    [cov_ba, var_b]])
stds = np.array([
    [std_a * std_a, std_a * std_b],
    [std_b * std_a, std_b * std_b]])
corr = covs / stds
print(corr)
corr = np.corrcoef(bhp_returns, vale_returns)
print(corr)
mp.figure('Correlation Of Returns',
          facecolor='lightgray')
mp.title('Correlation Of Returns', fontsize=20)
mp.xlabel('Date', fontsize=14)
mp.ylabel('Returns', fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(
    md.WeekdayLocator(byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(
    md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
dates = dates.astype(md.datetime.datetime)
mp.plot(dates[:-1], bhp_returns, c='orangered',
        label='BHP')
mp.plot(dates[:-1], vale_returns, c='dodgerblue',
        label='VALE')
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
-
-
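Polynomial fitting
-
Notes
A sufficiently smooth function can be written as an infinite (Taylor) series around a point. In practice such a function can therefore be approximated near that point by a polynomial of degree N, and the terms of order higher than N are neglected as infinitesimals.
f(x) = p0x^n + p1x^(n-1) + p2x^(n-2) + … + pn
y0 = f(x0)
y1 = f(x1)
y2 = f(x2)
…
yn = f(xn)
numpy.polyfit(array of x values, array of function values, highest power (n))
->[p0, p1, …, pn]
numpy.polyval([p0, p1, …, pn], array of x values)->array of function values
numpy.roots([p0, p1, …, pn])->roots of the polynomial equation
y = 3x^2+4x+1
y' = 6x+4
y'' = 6
numpy.polyder([p0, p1, …, pn])->coefficient array of the derivative
-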
Code:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md


def dmy2ymd(dmy):
    # loadtxt may pass bytes or str depending on the NumPy version
    if isinstance(dmy, bytes):
        dmy = dmy.decode('utf-8')
    date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
    ymd = date.strftime('%Y-%m-%d')
    return ymd


dates, bhp_closing_prices = np.loadtxt(
    '../../data/bhp.csv', delimiter=',',
    usecols=(1, 6), unpack=True,
    dtype=np.dtype('M8[D], f8'),
    converters={1: dmy2ymd})
_, vale_closing_prices = np.loadtxt(
    '../../data/vale.csv', delimiter=',',
    usecols=(1, 6), unpack=True,
    dtype=np.dtype('M8[D], f8'),
    converters={1: dmy2ymd})
diff_closing_price = bhp_closing_prices - \
    vale_closing_prices
days = dates.astype(int)
# Fit a degree-4 polynomial to the price difference
p = np.polyfit(days, diff_closing_price, 4)
poly_closing_price = np.polyval(p, days)
# Extrema are the real roots of the derivative
q = np.polyder(p)
roots = np.roots(q)
reals = roots[np.isreal(roots)].real
peaks = [[days[0], np.polyval(p, days[0])]]
for real in reals:
    if days[0] < real and real < days[-1]:
        peaks.append([real, np.polyval(p, real)])
peaks.append([days[-1], np.polyval(p, days[-1])])
peaks.sort()
peaks = np.array(peaks)
mp.figure('Polynomial Fitting', facecolor='lightgray')
mp.title('Polynomial Fitting', fontsize=20)
mp.xlabel('Date', fontsize=14)
mp.ylabel('Difference Price', fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(
    md.WeekdayLocator(byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(
    md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
dates = dates.astype(md.datetime.datetime)
mp.plot(dates, poly_closing_price, c='dodgerblue',
        linewidth=3, label='Polynomial Fitting')
mp.scatter(dates, diff_closing_price, c='limegreen',
           alpha=0.5, s=60, label='Difference Price')
dates, prices = np.hsplit(peaks, 2)
dates = dates.astype(int).astype(
    'M8[D]').astype(md.datetime.datetime)
for i in range(1, dates.size):
    mp.annotate(
        '', xytext=(dates[i - 1], prices[i - 1]),
        xy=(dates[i], prices[i]), size=40,
        arrowprops=dict(arrowstyle='fancy',
                        color='orangered', alpha=0.25))
mp.scatter(dates, prices, marker='^', c='orangered',
           s=80, label='Peak', zorder=4)
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
-
-
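Sign arrays
- Notes
a: [10 -20 30 0 40 -50 -60 0 70]
numpy.sign(a)->[1 -1 1 0 1 -1 -1 0 1]
On-balance volume (OBV)
numpy.piecewise(array to test, [condition 1, condition 2, …],
[flag 1, flag 2, …])->array holding, for each element, the flag of the condition it satisfies
- Code: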
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md


def dmy2ymd(dmy):
    # loadtxt may pass bytes or str depending on the NumPy version
    if isinstance(dmy, bytes):
        dmy = dmy.decode('utf-8')
    date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
    ymd = date.strftime('%Y-%m-%d')
    return ymd


dates, closing_prices, volumes = np.loadtxt(
    './bhp.csv', delimiter=',', usecols=(1, 6, 7),
    unpack=True, dtype=np.dtype('M8[D], f8, f8'),
    converters={1: dmy2ymd})
diff_closing_price = np.diff(closing_prices)
# Equivalent alternative:
# sign_closing_price = np.sign(diff_closing_price)
sign_closing_price = np.piecewise(
    diff_closing_price,
    [diff_closing_price < 0,
     diff_closing_price == 0,
     diff_closing_price > 0],
    [-1, 0, 1])
# Signed volume: positive on rising days, negative on falling days
obvs = volumes[1:] * sign_closing_price
mp.figure('On-Balance Volume', facecolor='lightgray')
mp.title('On-Balance Volume', fontsize=20)
mp.xlabel('Date', fontsize=14)
mp.ylabel('OBV', fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(
    md.WeekdayLocator(byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(
    md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(axis='y', linestyle=':')
dates = dates[1:].astype(md.datetime.datetime)
rise = obvs > 0
fall = obvs < 0
fc = np.zeros(dates.size, dtype='3f4')
ec = np.zeros(dates.size, dtype='3f4')
fc[rise], fc[fall] = (1, 0, 0), (0, 0.5, 0)
ec[rise], ec[fall] = (1, 1, 1), (1, 1, 1)
mp.bar(dates, obvs, 1.0, 0, color=fc,
       edgecolor=ec, label='OBV')
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
-
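Vectorization
-
Notes:
def scalar_function(scalar argument 1, scalar argument 2, …):
…
return scalar result 1, scalar result 2, …
np.vectorize(scalar function)->vector function
vector function(vector argument 1, vector argument 2, …)
->vector result 1, vector result 2, …
-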
Code:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np


def fun(a, b):
    return a + b, a - b, a * b


A = np.array([10, 20, 30])
B = np.array([100, 200, 300])
C = np.vectorize(fun)(A, B)
print(C)

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md


def dmy2ymd(dmy):
    # loadtxt may pass bytes or str depending on the NumPy version
    if isinstance(dmy, bytes):
        dmy = dmy.decode('utf-8')
    date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
    ymd = date.strftime('%Y-%m-%d')
    return ymd


dates, opening_prices, highest_prices, \
    lowest_prices, closing_prices = np.loadtxt(
        '../../data/bhp.csv', delimiter=',',
        usecols=(1, 3, 4, 5, 6), unpack=True,
        dtype=np.dtype('M8[D], f8, f8, f8, f8'),
        converters={1: dmy2ymd})


def profit(opening_price, highest_price,
           lowest_price, closing_price):
    # Bid 1% below the open; the order only fills if the bid
    # lies within the day's trading range
    buying_price = opening_price * 0.99
    if lowest_price <= buying_price <= highest_price:
        return (closing_price - buying_price) * \
            100 / buying_price
    return np.nan


profits = np.vectorize(profit)(
    opening_prices, highest_prices,
    lowest_prices, closing_prices)
# Drop the days on which no trade happened
nan = np.isnan(profits)
dates, profits = dates[~nan], profits[~nan]
gain_dates, gain_profits = \
    dates[profits > 0], profits[profits > 0]
loss_dates, loss_profits = \
    dates[profits < 0], profits[profits < 0]
mp.figure('Trading Simulation', facecolor='lightgray')
mp.title('Trading Simulation', fontsize=20)
mp.xlabel('Date', fontsize=14)
mp.ylabel('Profit', fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(
    md.WeekdayLocator(byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(
    md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
if dates.size > 0:
    dates = dates.astype(md.datetime.datetime)
    mp.plot(dates, profits, c='gray', label='Profit')
    mp.axhline(y=profits.mean(), linestyle='--',
               color='gray')
if gain_dates.size > 0:
    gain_dates = gain_dates.astype(
        md.datetime.datetime)
    mp.plot(gain_dates, gain_profits, 'o',
            c='orangered', label='Gain Profit')
    mp.axhline(y=gain_profits.mean(), linestyle='--',
               color='orangered')
if loss_dates.size > 0:
    loss_dates = loss_dates.astype(
        md.datetime.datetime)
    mp.plot(loss_dates, loss_profits, 'o',
            c='limegreen', label='Loss Profit')
    mp.axhline(y=loss_profits.mean(), linestyle='--',
               color='limegreen')
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
-
-
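Data smoothing and feature values
- Notes
Convolution denoising -> curve fitting -> feature values
Convolution denoising: removes the interference of random noise
Curve fitting: yields a mathematical model
Feature values: reflect the characteristics of the business
- Code: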
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md


def dmy2ymd(dmy):
    # loadtxt may pass bytes or str depending on the NumPy version
    if isinstance(dmy, bytes):
        dmy = dmy.decode('utf-8')
    date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
    ymd = date.strftime('%Y-%m-%d')
    return ymd


dates, bhp_closing_prices = np.loadtxt(
    './bhp.csv', delimiter=',', usecols=(1, 6),
    unpack=True, dtype=np.dtype('M8[D], f8'),
    converters={1: dmy2ymd})
_, vale_closing_prices = np.loadtxt(
    './vale.csv', delimiter=',', usecols=(1, 6),
    unpack=True, dtype=np.dtype('M8[D], f8'),
    converters={1: dmy2ymd})
bhp_returns = np.diff(
    bhp_closing_prices) / bhp_closing_prices[:-1]
vale_returns = np.diff(
    vale_closing_prices) / vale_closing_prices[:-1]
# Step 1: denoise by convolving with a normalized Hanning window
N = 8
weights = np.hanning(N)
weights /= weights.sum()
bhp_smooth_returns = np.convolve(
    bhp_returns, weights, 'valid')
vale_smooth_returns = np.convolve(
    vale_returns, weights, 'valid')
# Step 2: fit a cubic polynomial to each smoothed series
days = dates[N - 1:-1].astype(int)
degree = 3
bhp_p = np.polyfit(days, bhp_smooth_returns, degree)
bhp_fitted_returns = np.polyval(bhp_p, days)
vale_p = np.polyfit(days, vale_smooth_returns, degree)
vale_fitted_returns = np.polyval(vale_p, days)
# Step 3: the curves intersect at the real roots of their difference
sub_p = np.polysub(bhp_p, vale_p)
roots = np.roots(sub_p)
reals = roots[np.isreal(roots)].real
inters = []
for real in reals:
    if days[0] <= real <= days[-1]:
        inters.append(
            [real, np.polyval(bhp_p, real)])
inters.sort()
inters = np.array(inters)
mp.figure('Smoothing Returns', facecolor='lightgray')
mp.title('Smoothing Returns', fontsize=20)
mp.xlabel('Date', fontsize=14)
mp.ylabel('Returns', fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(
    md.WeekdayLocator(byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(
    md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
dates = dates.astype(md.datetime.datetime)
mp.plot(dates[:-1], bhp_returns, c='orangered',
        alpha=0.25, label='BHP')
mp.plot(dates[:-1], vale_returns, c='dodgerblue',
        alpha=0.25, label='VALE')
mp.plot(dates[N - 1:-1], bhp_smooth_returns,
        c='orangered', alpha=0.75, label='Smooth BHP')
mp.plot(dates[N - 1:-1], vale_smooth_returns,
        c='dodgerblue', alpha=0.75, label='Smooth VALE')
mp.plot(dates[N - 1:-1], bhp_fitted_returns,
        c='orangered', linewidth=3, label='Fitted BHP')
mp.plot(dates[N - 1:-1], vale_fitted_returns,
        c='dodgerblue', linewidth=3, label='Fitted VALE')
dates, returns = np.hsplit(inters, 2)
dates = dates.astype(int).astype(
    'M8[D]').astype(md.datetime.datetime)
mp.scatter(dates, returns, marker='x', c='firebrick',
           s=100, lw=3, zorder=3)
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
- Example:
Two curves y = f(x) and y = g(x) intersect at (x1, y1):
y = f(x) -> y1 = f(x1)
y = g(x) -> y1 = g(x1)
f(x1) = g(x1)
f(x1)-g(x1)=0
so x1 is a root of f(x)-g(x)=0
np.polysub(p1, p2)->p3
np.roots(p3)->x1
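Matrices and ufunc
-
Matrices
- numpy.matrix(2-D container that can be interpreted as a matrix,
copy=[True]/False)->matrix object
The container can be a nested sequence such as
1 2 3
4 5 6
or a string such as '1 2 3; 4 5 6'
- numpy.mat(2-D container that can be interpreted as a matrix)
Shares data with the source, equivalent to matrix() with copy=False
- numpy.bmat('A B; C D')->matrix assembled from the blocks A, B, C and D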
-
Code: mat.py
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np

a = np.array([
    [1, 2],
    [3, 4]])
print(a, type(a))
b = np.matrix(a, copy=False)
print(b, type(b))
c = np.mat(a)
print(c, type(c))
# b and c share memory with a, so they change as well
a *= 10
print(a, b, c, sep='\n')
d = np.mat('1 2; 3 4')
print(d)
e = np.mat('5 6; 7 8')
f = np.bmat('d e')
print(f)
g = np.bmat('d; e')
print(g)
h = d.I
print(h)
print(h * d)
i = f.I
print(i)  # generalized (pseudo) inverse
j = np.array([
    [5, 6],
    [7, 8]])
# For arrays, * is the element-wise product
k = a * j
print(a, j, k, sep='\n')
# For matrices, * is the matrix product
a = np.mat(a)
j = np.mat(j)
k = a * j
print(a, j, k, sep='\n')
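2. ufunc: universal functions
-
numpy.frompyfunc(scalar function, number of arguments, number of return values)
->function object of type numpy.ufunc
ufunc object(vector argument, …)->vector return value, …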
Code:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np


def fun(a, b):
    return a + b, a - b, a * b


A = np.array([10, 20, 30])
B = np.array([100, 200, 300])
C = np.vectorize(fun)(A, B)
print(C)
C = np.frompyfunc(fun, 2, 3)(A, B)
print(C)


def foo(a):
    # Bind a through a closure and expose a one-argument ufunc
    def bar(b):
        return a + b, a - b, a * b
    return np.frompyfunc(bar, 1, 3)


C = foo(100)(A)
print(C)
C = foo(B)(A)
print(C)
-
numpy.add
reduce - sum of all elements
accumulate - running sums
reduceat - sums over the segments that start at the given indices
outer - outer sum
Code: add.py
-
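The file add.py is not reproduced here, so the following is only a minimal sketch of the four methods listed above, with made-up input values; it is not the author's script.

```python
import numpy as np

a = np.arange(1, 7)                       # [1 2 3 4 5 6]

total = np.add.reduce(a)                  # sum of all elements
running = np.add.accumulate(a)            # running sums
# Sums over the segments a[0:2], a[2:4] and a[4:]
segments = np.add.reduceat(a, [0, 2, 4])
# outer[i, j] = x[i] + a[j] for x = [10, 20, 30]
outer = np.add.outer([10, 20, 30], a)

print(total, running, segments, outer, sep='\n')
```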
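Division
A. True division
[5 5 -5 -5]<true divide>[2 -2 2 -2]=[2.5 -2.5 -2.5 2.5]
numpy.true_divide()
numpy.divide()
/
B. Floor division (rounds towards negative infinity)
[5 5 -5 -5]<floor divide>[2 -2 2 -2]=[2 -3 -3 2]
numpy.floor_divide()
//
C. Ceiling division (rounds towards positive infinity)
[5 5 -5 -5]<ceiling divide>[2 -2 2 -2]=[3 -2 -2 3]
D. Truncated division (rounds towards zero)
[5 5 -5 -5]<truncated divide>[2 -2 2 -2]=[2 -2 -2 2]
Code: div.py
-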
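As div.py is not shown, here is a small sketch that reproduces the four results above. NumPy has no dedicated ceiling-division or truncated-division ufunc, so the sketch assumes they are built from a true division followed by np.ceil and np.trunc.

```python
import numpy as np

a = np.array([5, 5, -5, -5])
b = np.array([2, -2, 2, -2])

true_div = np.true_divide(a, b)           # same as a / b
floor_div = np.floor_divide(a, b)         # same as a // b
ceil_div = np.ceil(a / b).astype(int)     # round towards +infinity
trunc_div = np.trunc(a / b).astype(int)   # round towards zero

print(true_div, floor_div, ceil_div, trunc_div, sep='\n')
```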
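Remainders
dividend <divided by> divisor = quotient … remainder
divisor x quotient + remainder = dividend
Floor remainder: the remainder left by floor division (it takes the sign of the divisor)
[5 5 -5 -5]<floor divide>[2 -2 2 -2]=[2 -3 -3 2]…[1 -1 1 -1]
numpy.remainder()
numpy.mod()
%
Truncated remainder: the remainder left by truncated division (it takes the sign of the dividend)
[5 5 -5 -5]<truncated divide>[2 -2 2 -2]=[2 -2 -2 2]…[1 1 -1 -1]
numpy.fmod()
Code: mod.py
-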
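The following sketch (not the author's mod.py) checks both kinds of remainder against the identity divisor x quotient + remainder = dividend stated above.

```python
import numpy as np

a = np.array([5, 5, -5, -5])
b = np.array([2, -2, 2, -2])

floor_rem = np.remainder(a, b)            # same as a % b and np.mod(a, b)
trunc_rem = np.fmod(a, b)                 # remainder of truncated division

floor_quot = a // b
trunc_quot = np.trunc(a / b).astype(int)

# divisor x quotient + remainder must give back the dividend
floor_ok = np.array_equal(b * floor_quot + floor_rem, a)
trunc_ok = np.array_equal(b * trunc_quot + trunc_rem, a)
print(floor_rem, trunc_rem, floor_ok, trunc_ok)
```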
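Almost all of Python's arithmetic and relational operators are implemented by numpy, through ufuncs, as vectorized operators that work on arrays.
Code: fibo.py
F = | 1 1 |
    | 1 0 |
F^1 = | 1 1 |   F^2 = | 2 1 |   F^3 = | 3 2 |   F^4 = | 5 3 |   …
      | 1 0 |         | 1 1 |         | 2 1 |         | 3 2 |
The top-left elements are f2, f3, f4, f5, …: with f1 = f2 = 1, the Fibonacci number fn is the top-left element of F^(n-1).
-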
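A minimal sketch of the matrix-power idea, assuming np.linalg.matrix_power is an acceptable stand-in for whatever fibo.py actually does; the helper name fib is hypothetical. The default integer dtype overflows for large n, so the sketch stays with small values.

```python
import numpy as np

F = np.array([[1, 1],
              [1, 0]])


def fib(n):
    """Return the n-th Fibonacci number (f1 = f2 = 1).

    fn is the top-left element of F^(n-1); F^0 is the identity
    matrix, which also gives f1 = 1.
    """
    return int(np.linalg.matrix_power(F, n - 1)[0, 0])


fibs = [fib(n) for n in range(1, 11)]
print(fibs)
```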
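The trigonometric functions in numpy are all ufunc objects: they apply the function to every element of the argument array and return the results as an array.
Lissajous curve:
x = A sin(at + pi/2)
y = B sin(bt)
Code: lissa.py
Square wave synthesis, summing the terms
(4/pi) x sin((2k-1)t) / (2k-1),  k = 1, 2, 3, …
Code: squr.py
-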
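A sketch of the square-wave series above, without plotting; the function name square_wave and the sample grid are assumptions made for illustration and are not taken from squr.py.

```python
import numpy as np


def square_wave(t, n_terms):
    """Partial sum of (4/pi) * sin((2k-1)t) / (2k-1) for k = 1..n_terms."""
    y = np.zeros_like(t, dtype=float)
    for k in range(1, n_terms + 1):
        y += 4 / np.pi * np.sin((2 * k - 1) * t) / (2 * k - 1)
    return y


t = np.linspace(-2 * np.pi, 2 * np.pi, 1001)
y3 = square_wave(t, 3)          # coarse approximation with visible ripples
y1000 = square_wave(t, 1000)    # close to a square wave of amplitude 1

# At t = pi/2 the series converges to 1, at t = -pi/2 to -1
top = square_wave(np.array([np.pi / 2]), 1000)[0]
bottom = square_wave(np.array([-np.pi / 2]), 1000)[0]
print(top, bottom)
```

Adding more terms flattens the plateaus; the overshoot next to each jump remains.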
ufuncs that implement bitwise operations
A. XOR: ^ / __xor__ / bitwise_xor
1 ^ 0 = 1
1 ^ 1 = 0
0 ^ 0 = 0
0 ^ 1 = 1
if (a ^ b) < 0 then a and b have opposite signs
B. AND: & / __and__ / bitwise_and
1 & 0 = 0
1 & 1 = 1
0 & 0 = 0
0 & 1 = 0
 1  2^0  00000001  -1 ->  00000000
 2  2^1  00000010  -1 ->  00000001
 4  2^2  00000100  -1 ->  00000011
 8  2^3  00001000  -1 ->  00000111
16  2^4  00010000  -1 ->  00001111
ANDing each power of two with the value one less than it gives 0:
if (a & (a-1)) == 0 then a is a power of 2 (for a > 0)
Code: bit.py
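The two tests above can be tried on arrays as follows; this is an illustrative sketch with made-up values rather than the contents of bit.py.

```python
import numpy as np

a = np.array([3, -3, 3, -3])
b = np.array([5, 5, -5, -5])
# The sign bit of a ^ b is set exactly when the signs differ
opposite_signs = (a ^ b) < 0

n = np.arange(1, 17)
# A positive power of two has a single 1 bit, which n - 1 clears
powers_of_two = n[(n & (n - 1)) == 0]

print(opposite_signs, powers_of_two)
```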
C. Shifts: << / __lshift__ / left_shift (multiplies by 2)
>> / __rshift__ / right_shift (divides by 2)
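Numpy submodules
-
Linear algebra module (linalg)
-
Matrix inverse: inv()
In linear algebra, the product of a matrix A and its inverse A^-1 is the identity matrix I.
numpy.linalg.inv() computes the inverse of a matrix; the matrix must be square, that is, have as many rows as columns, and it must be non-singular.
Code: inv.py
-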
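A short sketch of inv(), using an arbitrary invertible 3x3 matrix chosen for illustration (inv.py itself is not shown).

```python
import numpy as np

A = np.array([[1., -2., 1.],
              [0., 2., -8.],
              [-4., 5., 9.]])

A_inv = np.linalg.inv(A)
# A times its inverse is the identity matrix, up to rounding error
is_identity = np.allclose(A @ A_inv, np.eye(3))
print(A_inv)
print(is_identity)
```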
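Solving systems of linear (first-degree) equations: solve()
/   x - 2y +  z     = 0    (1)
|       2y - 8z - 8 = 0    (2)
\ -4x + 5y + 9z + 9 = 0    (3)
Elimination by hand:
(1) + (2):        x - 7z - 8 = 0
(1) x 5:          5x - 10y + 5z = 0
(3) x 2:          -8x + 10y + 18z + 18 = 0
sum of the two:   -3x + 23z + 18 = 0
(x - 7z - 8) x 3: 3x - 21z - 24 = 0
sum of the two:   2z - 6 = 0 -> z = 3
x = 21 + 8 = 29
29 - 2y + 3 = 0 -> y = 16
In matrix form:
/  1x + -2y +  1z =  0
|  0x +  2y + -8z =  8
\ -4x +  5y +  9z = -9
/  1 -2  1 \   / x \   /  0 \
|  0  2 -8 | x | y | = |  8 |
\ -4  5  9 /   \ z /   \ -9 /
      a          x        b
x = numpy.linalg.lstsq(a, b)[0]
  = numpy.linalg.solve(a, b)
Code: solve.py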
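The system above can be checked numerically; this sketch is written for illustration and is not the author's solve.py.

```python
import numpy as np

a = np.array([[1, -2, 1],
              [0, 2, -8],
              [-4, 5, 9]])
b = np.array([0, 8, -9])

x_solve = np.linalg.solve(a, b)
# lstsq gives the same answer for a square, non-singular system
x_lstsq = np.linalg.lstsq(a, b, rcond=None)[0]
print(x_solve, x_lstsq)
```

Both calls return x = 29, y = 16, z = 3 up to rounding error, matching the elimination by hand.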
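- Eigenvalues and eigenvectors: eig()
For a square matrix A of order n, if there exist a number a and a non-zero n-dimensional vector x such that Ax=ax, then a is called an eigenvalue of A and x an eigenvector of A belonging to the eigenvalue a.
numpy.linalg.eig(A) -> a, x
Each eigenvalue in a corresponds to one column of x:
a: 1 2
   | |
   v v
x: 1 2
   3 4
   5 6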
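A minimal sketch that verifies Ax = ax column by column, using a small symmetric matrix picked for illustration.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

vals, vecs = np.linalg.eig(A)
# Column i of vecs is the eigenvector belonging to vals[i];
# vecs * vals scales each column by its own eigenvalue
holds = np.allclose(A @ vecs, vecs * vals)
print(vals)
print(vecs)
print(holds)
```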
Code: eig.py
- Singular value decomposition: svd()
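A matrix M can be factored into the product of three matrices, M=USV, where U and V are both orthogonal matrices, that is, UU^T=I and VV^T=I,
and all elements of S off the main diagonal are 0; the elements on the main diagonal are called the singular values of M.
numpy.linalg.svd(M)-> U, the main-diagonal elements of S, V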
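The sketch below decomposes a 2x3 example matrix (chosen for illustration) and rebuilds it from the three factors. It assumes full_matrices=False so that the shapes line up without padding; the third return value is the matrix called V above.

```python
import numpy as np

M = np.array([[4., 11., 14.],
              [8., 7., -2.]])

U, s, V = np.linalg.svd(M, full_matrices=False)
S = np.diag(s)                     # singular values on the main diagonal
rebuilt = U @ S @ V                # M = U S V

print(s)
print(np.allclose(rebuilt, M))
```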
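Code: svd.py
- Generalized (pseudo) inverse: pinv()
Code: pinv.py
- Determinant: det()
| a b |
| c d | = ad-bc
| a b c |
| d e f | = a | e f | - b | d f | + c | d e |
| g h i |     | h i |     | g i |     | g h |
= a(ei-fh)-b(di-fg)+c(dh-eg)
numpy.linalg.det(square matrix)->value of the determinant
Code: det.py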
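As a check of the expansion above, this sketch compares a hand-written cofactor expansion with numpy.linalg.det; the helper name det3 is hypothetical and the matrix is an arbitrary example.

```python
import numpy as np


def det3(m):
    """Cofactor expansion of a 3x3 determinant along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = np.array([[1, -2, 1],
              [0, 2, -8],
              [-4, 5, 9]])

by_hand = det3(A)
by_numpy = np.linalg.det(A)
print(by_hand, by_numpy)
```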
-
-
Fast Fourier transform module (fft)
s=F(t) -> (A/P, phi) = G(f)    (time domain -> amplitude/power and phase as functions of frequency)
y = A sin(wx+phi)
w1 -> A1, phi1
w2 -> A2, phi2
…
(A, phi) = f(w)
Code: fft.py, filter.py
-
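A small sketch of the time-domain to frequency-domain round trip, using a synthetic two-tone signal invented for illustration; it does not reproduce fft.py or filter.py.

```python
import numpy as np

n = 1000
t = np.linspace(0, 1, n, endpoint=False)      # 1 second sampled at 1000 Hz
signal = 3 * np.sin(2 * np.pi * 5 * t) + \
    1.5 * np.sin(2 * np.pi * 20 * t + np.pi / 4)

spectrum = np.fft.fft(signal)                 # complex spectrum
freqs = np.fft.fftfreq(n, d=t[1] - t[0])      # frequency of each bin in Hz
amps = np.abs(spectrum) / (n / 2)             # amplitude per frequency
phases = np.angle(spectrum)                   # phase per frequency

# The two strongest positive frequencies are the two tones
pos = freqs > 0
peaks = freqs[pos][np.argsort(amps[pos])[-2:]]

restored = np.fft.ifft(spectrum).real         # back to the time domain
print(sorted(np.round(peaks)))
```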
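Random number module (random)
- Binomial distribution
numpy.random.binomial(n, p, size)
->array of size random numbers, each being the number of successes in n trials where every trial succeeds with probability p.
Coin-guessing game: start with 1000 chips; each round consists of 9 guesses; 5 or more correct guesses win the round and add one chip, otherwise the round is lost and one chip is taken away. Simulate 10000 rounds and record how the number of chips changes. binomial(9, 0.5, 10000)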
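The game described above can be simulated in a few lines. This is a sketch rather than the author's bi.py; the fixed seed is an assumption added only to make the run repeatable.

```python
import numpy as np

np.random.seed(0)                  # assumption: fixed seed for repeatability
rounds = 10000
# Number of correct guesses out of 9 in each round
outcomes = np.random.binomial(9, 0.5, rounds)
# +1 chip for 5 or more correct guesses, otherwise -1
steps = np.where(outcomes >= 5, 1, -1)
chips = np.empty(rounds + 1, dtype=int)
chips[0] = 1000
chips[1:] = 1000 + np.cumsum(steps)
print(chips.min(), chips.max(), chips[-1])
```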
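Code: bi.py
- Hypergeometric distribution
numpy.random.hypergeometric(ngood, nbad,
nsample, size)->array of size random numbers, each being the number of good samples among nsample samples drawn at random without replacement from a population of ngood good and nbad bad samples.
Ball-drawing game: 25 good balls and 1 bad ball are mixed together; each round 3 balls are drawn; if all three are good, add 1 point; if the bad ball is among them, subtract 6 points. Simulate 100 rounds and record how the score changes. hypergeometric(25, 1, 3, 100)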
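Code: hyper.py
- Normal distribution
numpy.random.normal(size=size)->array of size random numbers drawn from the standard normal distribution, that is, the normal distribution with mean 0 and standard deviation 1 (the defaults loc=0.0 and scale=1.0).
[1 1 2 1 1 2 2 2 5 1 2 3 … 10 10]
[1, 3] 20
[4, 6] 60
[7, 10] 40
Code: norm.py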
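Special-purpose functions of Numpy
- Indirect joint sorting
Indirect: obtain the indices of the sorted samples.
Index:             0 1 2 3 4 5 6 7 8
Original sequence: 8 2 3 1 7 4 6 5 9
Direct sort:       1 2 3 4 5 6 7 8 9
Indirect sort:     3 1 2 5 7 6 4 0 8
Names: Zhang San, Li Si, Wang Wu, Zhao Liu, Chen Qi
Scores: 90 70 50 80 60
Index: 0 1 2 3 4
Sorted by score: 2 4 1 3 0
Ages: 20 30 30 20 40
Sorted by age, ties broken by score: 3 0 2 1 4
numpy.lexsort((reference sequence, sequence to sort))->index sequence; the last key is the primary one and the earlier keys break ties
numpy.sort_complex(complex array)->sorted in ascending order of the real part; equal real parts are ordered by ascending imaginary part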
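The index sequences in the table above can be reproduced as follows; this is an illustrative sketch, not the contents of sort.py, and the complex values are made up.

```python
import numpy as np

names = np.array(['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu', 'Chen Qi'])
scores = np.array([90, 70, 50, 80, 60])
ages = np.array([20, 30, 30, 20, 40])

by_score = np.argsort(scores)                    # indirect sort by score
# Primary key last: sort by age, break ties by score
by_age_then_score = np.lexsort((scores, ages))
print(names[by_age_then_score])

z = np.array([3 + 1j, 1 + 2j, 1 + 1j])
z_sorted = np.sort_complex(z)                    # real part, then imaginary part
print(z_sorted)
```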
Code: sort.py
- Maximum and minimum
numpy.xxx
max/min
argmax/argmin
nanmax/nanmin
nanargmax/nanargmin
max - maximum
min - minimum
arg - indirect, returns the index
nan - ignores invalid values (NaN)
Code: nan.py
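A brief sketch of how the nan variants differ from the plain ones, with a made-up array (nan.py is not shown).

```python
import numpy as np

a = np.array([3., np.nan, 7., 1.])

plain_max = np.max(a)              # NaN propagates into the result
largest = np.nanmax(a)             # NaN is ignored
smallest = np.nanmin(a)
where_largest = np.nanargmax(a)    # index of the largest valid value
where_smallest = np.nanargmin(a)   # index of the smallest valid value
print(plain_max, largest, smallest, where_largest, where_smallest)
```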
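3. Ordered insertion
Sorted sequence: [1, 2, 4, 5, 6, 8, 9]
Sequence to insert: [7, 3]
At which positions must the new values be inserted so that the result is still sorted?
numpy.searchsorted(sorted sequence, sequence to insert)->insertion positions
numpy.insert(sorted sequence, insertion positions, sequence to insert)->result of the insertion
Code: insert.py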
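The example sequences above can be run directly; the sketch below is written for illustration and is not the author's insert.py.

```python
import numpy as np

a = np.array([1, 2, 4, 5, 6, 8, 9])    # already sorted
b = np.array([7, 3])                   # values to insert

# Positions in a before which each value of b must go
positions = np.searchsorted(a, b)
result = np.insert(a, positions, b)
print(positions, result)
```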
4. Definite integrals
y = f(x)
/ b
| f(x)dx
/ a
import scipy.integrate as si
def f(x):
    y = … x …
    return y
si.quad(f, a, b)[0] -> value of the definite integral
Code: integ.py
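5. Interpolation
import scipy.interpolate as si
si.interp1d(x coordinates of the discrete samples, y coordinates of the discrete samples,
kind=type of interpolator)->one-dimensional interpolator object
interpolator object(x coordinates of the interpolated samples)->y coordinates of the interpolated samples
Code: inter.py
6. Financial calculations
This article introduces the basic concepts of the Numpy library and its use in data analysis and visualization. It covers array operations, built-in types and mathematical functions, and shows how to visualize data with Numpy and Matplotlib.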