  * under the License.
  */
 
+/**
+ * The Cassandra column families are defined in storage-conf.xml like
+ * this:
+ *   <!-- Key is BlobPath/BlobName.
+ *        Column name is "data".
+ *        Value is blob data. -->
+ *   <ColumnFamily Name="Blobs"/>
+ *
+ *   <!-- Key is BlobPath.
+ *        Column names are BlobNames.
+ *        Value is size as a string. -->
+ *   <ColumnFamily Name="BlobNames"/>
+ *
+ * ElasticSearch needs us to support the following:
+ * 1. Read, write, and delete a blob named by BlobPath + BlobName.
+ * 2. Determine whether a blob exists. (This method, blobExists, is
+ *    not actually referenced in the code.)
+ * 3. Get a list of the BlobNames in a BlobPath, with their sizes.
+ * 4. Delete all blobs in a BlobPath.
+ *
+ * Here are two simpler ways to store the data, and why they don't work:
+ * A. A single column family with BlobPath as the key and BlobName as the
+ *    column name. 1 and 4 are easy, but getting the sizes for 3 isn't
+ *    possible without fetching every blob in full. The same goes for 2.
+ * B. A single supercolumn family with BlobPath as the key and BlobName as
+ *    the supercolumn name, with subcolumns data and size. 1, 2, and 4
+ *    are easy, but fetching the sizes for 3 requires fetching entire
+ *    supercolumns; a get_slice request can't pick out just the size
+ *    subcolumn.
+ *
+ * The layout actually used supports everything we need, although it is
+ * a bit more complicated than A or B because it has two column families:
+ * X. Storing the blob names and sizes in BlobNames makes 3 possible,
+ *    but complicates 1, since every write and delete must also update
+ *    BlobNames.
+ * Y. Using BlobPath/BlobName as the key and storing the data in a column
+ *    makes 2 possible using get_count.
+ * Z. However, Y complicates 4, which must be done by first fetching the
+ *    BlobPath's BlobNames and then deleting each of those blobs.
+ */
+
 package org.elasticsearch.cassandra.blobstore;
 
 import org.elasticsearch.common.blobstore.BlobContainer;
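The two-column-family layout described in the comment can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not the actual implementation: the class `BlobLayoutSketch` and all of its methods are hypothetical, each column family is modeled as a map from row key to a sorted map of column name to value, and the `/` separator used to build the `BlobPath/BlobName` row key is assumed. No Cassandra client API is used. The method comments tie each operation back to requirements 1 to 4 and to points X, Y, and Z.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the Blobs and BlobNames column families.
// Each column family maps a row key to a sorted map of column name -> value.
// This is an illustration only; it does not use the Cassandra client API.
class BlobLayoutSketch {
    // Blobs: key = BlobPath/BlobName, one column "data" holding the blob bytes.
    private final Map<String, SortedMap<String, byte[]>> blobs = new HashMap<>();
    // BlobNames: key = BlobPath, column name = BlobName, value = size as a string.
    private final Map<String, SortedMap<String, byte[]>> blobNames = new HashMap<>();

    // Assumption: "/" joins BlobPath and BlobName into the Blobs row key.
    private static String blobKey(String blobPath, String blobName) {
        return blobPath + "/" + blobName;
    }

    // 1. Write: store the data in Blobs and record its size in BlobNames (X).
    void writeBlob(String blobPath, String blobName, byte[] data) {
        SortedMap<String, byte[]> row = new TreeMap<>();
        row.put("data", data.clone());
        blobs.put(blobKey(blobPath, blobName), row);
        byte[] size = Integer.toString(data.length).getBytes(StandardCharsets.UTF_8);
        blobNames.computeIfAbsent(blobPath, k -> new TreeMap<>()).put(blobName, size);
    }

    // 1. Read: a single lookup by the composite key; null if the blob is absent.
    byte[] readBlob(String blobPath, String blobName) {
        SortedMap<String, byte[]> row = blobs.get(blobKey(blobPath, blobName));
        return row == null ? null : row.get("data").clone();
    }

    // 1. Delete: drop the Blobs row and the matching BlobNames column (X).
    void deleteBlob(String blobPath, String blobName) {
        blobs.remove(blobKey(blobPath, blobName));
        SortedMap<String, byte[]> names = blobNames.get(blobPath);
        if (names != null) {
            names.remove(blobName);
            if (names.isEmpty()) {
                blobNames.remove(blobPath);
            }
        }
    }

    // 2. Exists: count the columns in the Blobs row, as get_count would (Y).
    boolean blobExists(String blobPath, String blobName) {
        SortedMap<String, byte[]> row = blobs.get(blobKey(blobPath, blobName));
        return row != null && !row.isEmpty();
    }

    // 3. List: read only the BlobNames row, so no blob data is fetched.
    SortedMap<String, Long> listBlobs(String blobPath) {
        SortedMap<String, Long> sizes = new TreeMap<>();
        SortedMap<String, byte[]> names = blobNames.get(blobPath);
        if (names != null) {
            for (Map.Entry<String, byte[]> e : names.entrySet()) {
                String size = new String(e.getValue(), StandardCharsets.UTF_8);
                sizes.put(e.getKey(), Long.parseLong(size));
            }
        }
        return sizes;
    }

    // 4. Delete all: fetch the BlobPath's BlobNames first, then delete each blob (Z).
    void deleteAll(String blobPath) {
        // listBlobs returns a fresh copy, so deleting while iterating over it is safe.
        for (String blobName : listBlobs(blobPath).keySet()) {
            deleteBlob(blobPath, blobName);
        }
    }

    public static void main(String[] args) {
        BlobLayoutSketch store = new BlobLayoutSketch();
        store.writeBlob("indices/0", "segments_1", new byte[] {1, 2, 3});
        store.writeBlob("indices/0", "_0.cfs", new byte[10]);
        System.out.println(store.listBlobs("indices/0")); // {_0.cfs=10, segments_1=3}
        store.deleteAll("indices/0");
        System.out.println(store.blobExists("indices/0", "segments_1")); // false
    }
}
```

The sketch makes the trade-off in X and Z visible: `listBlobs` never touches blob data, but every write and delete has to update two maps, and `deleteAll` needs a read of BlobNames before it can remove anything from Blobs.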