|
10 | 10 | that are linked tend to cofluctuate during a day. |
11 | 11 |
|
12 | 12 |
|
13 | | -Clustering |
14 | | ----------- |
15 | | -
|
16 | | -We use clustering to group together quotes that behave similarly. Here, |
17 | | -amongst the :ref:`various clustering techniques <clustering>` available |
18 | | -in the scikit-learn, we use :ref:`affinity_propagation` as it does |
19 | | -not enforce equal-size clusters, and it can choose automatically the |
20 | | -number of clusters from the data. |
21 | | -
|
22 | | -
|
23 | 13 | Learning a graph structure |
24 | 14 | -------------------------- |
25 | 15 |
|
|
29 | 19 | symbol, the symbols that it is connected too are those useful to expain |
30 | 20 | its fluctuations. |
31 | 21 |
|
32 | | -Note that this gives us a different indication than the clustering. One |
33 | | -could apply graph clustering techniques (such as |
34 | | -:ref:`spectral_clustering`) on the corresponding graph, to retrieve a |
35 | | -clustering consistent with the partial-independence structure. |
| 22 | +Clustering |
| 23 | +---------- |
| 24 | +
|
| 25 | +We use clustering to group together quotes that behave similarly. Here, |
| 26 | +amongst the :ref:`various clustering techniques <clustering>` available |
| 27 | +in the scikit-learn, we use :ref:`affinity_propagation` as it does |
| 28 | +not enforce equal-size clusters, and it can choose automatically the |
| 29 | +number of clusters from the data. |
36 | 30 |
|
| 31 | +Note that this gives us a different indication than the graph, as the |
| 32 | +graph reflects conditional relations between variables, while the |
| 33 | +clustering reflects marginal properties: variables clustered together can |
| 34 | +be considered as having a similar impact at the level of the full stock |
| 35 | +market. |
37 | 36 |
|
38 | 37 | Embedding in 2D space |
39 | 38 | --------------------- |
|
156 | 155 | # The daily variations of the quotes are what carry most information |
157 | 156 | variation = close - open |
158 | 157 |
|
159 | | -############################################################################### |
160 | | -# Cluster using affinity propagation |
161 | | - |
162 | | -correlations = np.corrcoef(variation) |
163 | | -_, labels = cluster.affinity_propagation(correlations) |
164 | | -n_labels = labels.max() |
165 | | - |
166 | | -for i in range(n_labels + 1): |
167 | | - print 'Cluster %i: %s' % ((i + 1), ', '.join(names[labels == i])) |
168 | | - |
169 | 158 | ############################################################################### |
170 | 159 | # Learn a graphical structure from the correlations |
171 | 160 | edge_model = covariance.GraphLassoCV() |
|
176 | 165 | X /= X.std(axis=0) |
177 | 166 | edge_model.fit(X) |
178 | 167 |
|
| 168 | +############################################################################### |
| 169 | +# Cluster using affinity propagation |
| 170 | + |
| 171 | +_, labels = cluster.affinity_propagation(edge_model.covariance_) |
| 172 | +n_labels = labels.max() |
| 173 | + |
| 174 | +for i in range(n_labels + 1): |
| 175 | + print 'Cluster %i: %s' % ((i + 1), ', '.join(names[labels == i])) |
| 176 | + |
179 | 177 | ############################################################################### |
180 | 178 | # Find a low-dimension embedding for visualization: find the best position of |
181 | 179 | # the nodes (the stocks) on a 2D plane |
|
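The documentation text added in this diff says that the graph reflects conditional relations between variables, while the clustering reflects marginal properties. A toy case can make that distinction concrete. The sketch below is not part of the commit: it assumes a hypothetical three-variable chain (`a`, `b`, `c`) whose covariance is written out by hand, and `invert` is an illustrative helper, not a scikit-learn function.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a small square matrix exactly with Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    rows = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [v / scale for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]


# Hypothetical chain a -> b -> c with unit-variance noise at each step:
# a = e1, b = a + e2, c = b + e3, which gives this exact covariance.
covariance = [[1, 1, 1],
              [1, 2, 2],
              [1, 2, 3]]
precision = invert(covariance)

marginal_ac = covariance[0][2]      # non-zero: a and c co-vary overall
conditional_ac = precision[0][2]    # zero: no direct a-c link once b is known
```

In this toy case `a` and `c` have non-zero covariance, so a method working on marginal similarity could group them, yet the corresponding entry of the inverse covariance is zero, so a sparse graph model would draw no direct edge between them.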