Commit: Merge remote-tracking branch 'origin/master' into develop

Showing 8 changed files with 499 additions and 4 deletions.
We have Python notebooks for the following scenarios:
1. [The Euclidean space with the optimized index](search_vector_dense_optim.ipynb);
2. [The Euclidean space with the non-optimized index](search_vector_dense_nonoptim.ipynb);
3. [The Euclidean space for 8-bit integer SIFT vectors (the index is not optimized)](search_sift_uint8.ipynb);
4. [KL-divergence (non-optimized index)](search_vector_dense_kldiv.ipynb);
5. [Sparse cosine similarity (non-optimized index)](search_sparse_cosine.ipynb);
6. [Sparse Jaccard similarity (non-optimized index)](search_sparse_cosine.ipynb).
Note that for the dense spaces we have examples of both the so-called optimized and non-optimized indices. Except for HNSW, all the methods save meta-indices rather than real ones. A meta-index contains only the index structure, not the data. Hence, before a meta-index can be loaded, the data needs to be re-loaded. One example is a memory-efficient space to search for SIFT vectors.
HNSW can save real indices, but only for the dense vector spaces: the Euclidean and the cosine one. When you use these optimized indices, the search does not require reloading all the data. However, reloading the data is **required** if you want to use the function **getDistance**. Furthermore, creation of the optimized index can always be disabled by specifying the index-time parameter **skip_optimized_index** (value 1).

This separation into optimized and non-optimized indices is not very convenient. We plan to fix this in the future.
The sparse Jaccard space example is interesting because data set objects are represented by strings. In the case of the Jaccard space, a string is simply a list of sorted, space-separated IDs. Other non-vector spaces can use different formats. The KL-divergence example is interesting because it is a non-metric data set with a non-symmetric distance function.
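To make these two points concrete, here is a small plain-Python sketch (independent of NMSLIB itself) of the Jaccard distance over space-separated ID strings, and of the asymmetry of the KL-divergence:

```python
import math

def jaccard_distance(a: str, b: str) -> float:
    """Jaccard distance between two objects given as space-separated ID lists."""
    sa, sb = set(a.split()), set(b.split())
    return 1.0 - len(sa & sb) / len(sa | sb)

def kl_divergence(p, q):
    """KL(p || q) for dense probability vectors; non-symmetric in general."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Intersection {1, 5}, union {1, 5, 7, 9, 12} -> distance 1 - 2/5 = 0.6.
d = jaccard_distance('1 5 7 12', '1 5 9')

# KL(p || q) and KL(q || p) generally differ.
p, q = [0.5, 0.4, 0.1], [0.1, 0.2, 0.7]
asymmetric = kl_divergence(p, q) != kl_divergence(q, p)
```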