Commit fca7dda

MISC: plot_stock_market cluster on learned covariance
Shows the relations between the methods
1 parent 75faece commit fca7dda

1 file changed

+22
-24
lines changed

examples/applications/plot_stock_market.py

Lines changed: 22 additions & 24 deletions
@@ -10,16 +10,6 @@
 that are linked tend to cofluctuate during a day.
 
 
-Clustering
-----------
-
-We use clustering to group together quotes that behave similarly. Here,
-amongst the :ref:`various clustering techniques <clustering>` available
-in the scikit-learn, we use :ref:`affinity_propagation` as it does
-not enforce equal-size clusters, and it can choose automatically the
-number of clusters from the data.
-
-
 Learning a graph structure
 --------------------------
 
@@ -29,11 +19,20 @@
 symbol, the symbols that it is connected too are those useful to expain
 its fluctuations.
 
-Note that this gives us a different indication than the clustering. One
-could apply graph clustering techniques (such as
-:ref:`spectral_clustering`) on the corresponding graph, to retrieve a
-clustering consistent with the partial-independence structure.
+Clustering
+----------
+
+We use clustering to group together quotes that behave similarly. Here,
+amongst the :ref:`various clustering techniques <clustering>` available
+in the scikit-learn, we use :ref:`affinity_propagation` as it does
+not enforce equal-size clusters, and it can choose automatically the
+number of clusters from the data.
 
+Note that this gives us a different indication than the graph, as the
+graph reflects conditional relations between variables, while the
+clustering reflects marginal properties: variables clustered together can
+be considered as having a similar impact at the level of the full stock
+market.
 
 
 Embedding in 2D space
 ---------------------
@@ -156,16 +155,6 @@
 # The daily variations of the quotes are what carry most information
 variation = close - open
 
-###############################################################################
-# Cluster using affinity propagation
-
-correlations = np.corrcoef(variation)
-_, labels = cluster.affinity_propagation(correlations)
-n_labels = labels.max()
-
-for i in range(n_labels + 1):
-    print 'Cluster %i: %s' % ((i + 1), ', '.join(names[labels == i]))
-
 ###############################################################################
 # Learn a graphical structure from the correlations
 edge_model = covariance.GraphLassoCV()
@@ -176,6 +165,15 @@
 X /= X.std(axis=0)
 edge_model.fit(X)
 
+###############################################################################
+# Cluster using affinity propagation
+
+_, labels = cluster.affinity_propagation(edge_model.covariance_)
+n_labels = labels.max()
+
+for i in range(n_labels + 1):
+    print 'Cluster %i: %s' % ((i + 1), ', '.join(names[labels == i]))
+
 ###############################################################################
 # Find a low-dimension embedding for visualization: find the best position of
 # the nodes (the stocks) on a 2D plane
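
The note added to the docstring contrasts the graph (conditional relations) with the clustering (marginal properties). A minimal NumPy sketch of that distinction follows. It assumes a hypothetical three-variable chain A - B - C whose precision matrix is made up for illustration; none of it is taken from the stock data or from the example script.

```python
import numpy as np

# Hypothetical precision (inverse covariance) matrix for a chain A - B - C.
# The zero at position (0, 2) means A and C are conditionally independent
# given B, so a graph read off the precision matrix has no A-C edge.
precision = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])

# The covariance matrix is the inverse of the precision matrix.
covariance = np.linalg.inv(precision)

# Marginally, A and C still covary (through B), so this entry is nonzero.
print("precision[A, C]  =", precision[0, 2])
print("covariance[A, C] =", round(covariance[0, 2], 6))
```

In the commit, affinity propagation receives `edge_model.covariance_`, so the clusters follow marginal similarities of this kind, whereas the graph edges come from the nonzero entries of the sparse precision matrix that the same model estimates.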
