Commit cf2f00b

author
Tom May
committed
Add comment explaining how Cassandra is used and why we can't do something
simpler with a single column or supercolumn family.
1 parent dfb3070 commit cf2f00b

1 file changed

+41
-0
lines changed

plugins/cassandra/src/main/java/org/elasticsearch/cassandra/blobstore/CassandraBlobStore.java

Lines changed: 41 additions & 0 deletions
@@ -17,6 +17,47 @@
  * under the License.
  */

+/**
+ * The Cassandra column families are defined in storage-conf.xml like
+ * this:
+ *     <!-- Key is BlobPath/BlobName.
+ *          Column name is "data".
+ *          Value is blob data. -->
+ *     <ColumnFamily Name="Blobs"/>
+ *
+ *     <!-- Key is BlobPath.
+ *          Column names are BlobNames.
+ *          Value is size as a string. -->
+ *     <ColumnFamily Name="BlobNames"/>
+ *
+ * ElasticSearch needs us to support the following:
+ *   1. Read, write, and delete a blob named by BlobPath + BlobName.
+ *   2. Determine whether a blob exists. (Actually this method, blobExists,
+ *      is not referenced in the code.)
+ *   3. Get a list of BlobNames in a BlobPath, with their sizes.
+ *   4. Delete all blobs in a BlobPath.
+ *
+ * Here are a couple of ways to store the data, and why they don't work:
+ *   A. A single column family with BlobPath as the key and BlobName as the
+ *      column name. 1 and 4 are easy, but getting the sizes for 3 isn't
+ *      possible without fetching the entire blob. Same with 2.
+ *   B. A single supercolumn family with BlobPath as the key and BlobName as
+ *      the supercolumn name, with subcolumns data and size. 1, 2, and 4
+ *      are easy, but fetching the size for 3 requires fetching the entire
+ *      supercolumn; we can't just pick out the size column in a get_slice
+ *      request.
+ *
+ * The storage layout used allows us to do everything we need, even though
+ * it's a bit more complicated than A and B because it has two column
+ * families.
+ *   X. Storing the blob names and sizes in BlobNames makes 3 possible,
+ *      but complicates 1 since we need to track things in BlobNames.
+ *   Y. Using BlobPath/BlobName as the key and storing the data in a column
+ *      makes 2 possible using get_count.
+ *   Z. But it complicates 4, which must be done by fetching the BlobPath's
+ *      BlobNames then deleting them.
+ */
+
 package org.elasticsearch.cassandra.blobstore;

import org.elasticsearch.common.blobstore.BlobContainer;
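
The layout described in the added comment can be modelled in memory to see how requirements 1 through 4 map onto the two column families. The following is only a sketch under stated assumptions: the class `TwoFamilyLayoutSketch` and all of its method names are hypothetical, plain Java maps stand in for Cassandra rows and columns, and a map size check stands in for `get_count`. It is not the plugin's actual code and makes no Cassandra calls.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of the "Blobs" and "BlobNames" column families. */
class TwoFamilyLayoutSketch {
    // "Blobs": row key is BlobPath/BlobName; the single column "data" holds the bytes.
    private final Map<String, Map<String, byte[]>> blobs = new HashMap<>();
    // "BlobNames": row key is BlobPath; column name is BlobName; value is the size as a string.
    private final Map<String, Map<String, String>> blobNames = new HashMap<>();

    private static String rowKey(String path, String name) {
        return path + "/" + name;
    }

    // Requirement 1 (write): both families must be updated (point X in the comment).
    void writeBlob(String path, String name, byte[] data) {
        Map<String, byte[]> row = new HashMap<>();
        row.put("data", data.clone());
        blobs.put(rowKey(path, name), row);
        blobNames.computeIfAbsent(path, p -> new TreeMap<>())
                 .put(name, Integer.toString(data.length));
    }

    // Requirement 1 (read): a single lookup by BlobPath/BlobName.
    byte[] readBlob(String path, String name) {
        Map<String, byte[]> row = blobs.get(rowKey(path, name));
        return row == null ? null : row.get("data").clone();
    }

    // Requirement 1 (delete): again both families are touched.
    void deleteBlob(String path, String name) {
        blobs.remove(rowKey(path, name));
        Map<String, String> names = blobNames.get(path);
        if (names != null) {
            names.remove(name);
            if (names.isEmpty()) {
                blobNames.remove(path);
            }
        }
    }

    // Requirement 2: count the columns in the row without reading the data
    // (stands in for get_count, point Y).
    boolean blobExists(String path, String name) {
        Map<String, byte[]> row = blobs.get(rowKey(path, name));
        return row != null && !row.isEmpty();
    }

    // Requirement 3: names and sizes come from "BlobNames" alone; no blob data is read.
    Map<String, Long> listBlobs(String path) {
        Map<String, Long> sizes = new TreeMap<>();
        Map<String, String> names =
                blobNames.getOrDefault(path, Collections.<String, String>emptyMap());
        for (Map.Entry<String, String> e : names.entrySet()) {
            sizes.put(e.getKey(), Long.parseLong(e.getValue()));
        }
        return sizes;
    }

    // Requirement 4: fetch the path's BlobNames, then delete each blob row (point Z).
    void deletePath(String path) {
        Map<String, String> names = blobNames.remove(path);
        if (names == null) {
            return;
        }
        for (String name : names.keySet()) {
            blobs.remove(rowKey(path, name));
        }
    }

    public static void main(String[] args) {
        TwoFamilyLayoutSketch store = new TwoFamilyLayoutSketch();
        store.writeBlob("indices/0", "segment_a", new byte[] {1, 2, 3});
        store.writeBlob("indices/0", "segment_b", new byte[5]);
        System.out.println(store.listBlobs("indices/0"));
        store.deletePath("indices/0");
        System.out.println(store.blobExists("indices/0", "segment_a"));
    }
}
```

In this model the trade-off from the comment is visible directly: `listBlobs` and `blobExists` never touch blob bytes, while `writeBlob`, `deleteBlob`, and `deletePath` each have to keep two structures in step.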
