Merge remote-tracking branch 'origin/master' into develop
searchivairus committed May 11, 2018
2 parents 59d0956 + dbe5495 commit 71f2111
Showing 8 changed files with 499 additions and 4 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -5,7 +5,7 @@

Non-Metric Space Library (NMSLIB)
=================
The latest **pre**-release is [1.7.2](https://github.com/nmslib/nmslib/releases/tag/v1.7.1). Note that the manual is not updated to reflect some of the changes. In particular, we changed the build procedure for Windows. Also note that the manual targets primiarily developers who will extend the library. For most other folks, [Python binding docs should be sufficient](python_bindings).
The latest **pre**-release is [1.7.2](https://github.com/nmslib/nmslib/releases/tag/v1.7.1). Note that the manual has not been updated to reflect some of the changes. In particular, we changed the build procedure for Windows. Also note that the manual primarily targets developers who will extend the library. For most other folks, [Python binding docs should be sufficient](python_bindings). The basic parameter tuning/selection guidelines are also available [online](/python_bindings/parameters.md).
-----------------
Non-Metric Space Library (NMSLIB) is an **efficient** cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core-library does **not** have any third-party dependencies.

@@ -17,7 +17,7 @@ NMSLIB is an **extendible library**, which means that is possible to add new sea

Other contributors: Lawrence Cayton, Wei Dong, Avrelin Nikita, Dmitry Yashunin, Bob Poekert, @orgoro, Maxim Andreev, Daniel Lemire, Nathan Kurz, Alexander Ponomarenko.

To acknowledge the use of the library, you could provide a link to this repository and/or cite our SISAP paper [**[BibTex]**](http://dblp.uni-trier.de/rec/bibtex/conf/sisap/BoytsovN13). Some other related papers are listed in the end.
**Citing:** If you find this library useful, feel free to cite our SISAP paper [**[BibTex]**](http://dblp.uni-trier.de/rec/bibtex/conf/sisap/BoytsovN13) as well as the other papers listed at the end. One crucial contribution to cite is the fast Hierarchical Navigable Small World graph (HNSW) method [**[BibTex]**](https://dblp.uni-trier.de/rec/bibtex/journals/corr/MalkovY16).

Leo(nid) Boytsov is a maintainer. Leo is supported by the [Open Advancement of Question Answering Systems (OAQA) group](https://github.com/oaqa) and the following NSF grant #1618159: "[Matching and Ranking via Proximity Graphs: Applications to Question Answering and Beyond](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1618159&HistoricalAwards=false)". Bileg was supported by the [iAd Center](https://web.archive.org/web/20160306011711/http://www.iad-center.com/).

4 changes: 4 additions & 0 deletions python_bindings/README.md
@@ -45,6 +45,10 @@ ids, distances = index.knnQuery(data[0], k=10)
neighbours = index.knnQueryBatch(data, k=10, num_threads=4)
```

#### Basic tuning guidelines

The basic parameter tuning/selection guidelines are available [here](/python_bindings/parameters.md).

#### Logging

NMSLIB produces quite a few informational messages. By default, they are not shown in Python. To enable debugging, one should use the following commands **before** importing the library:
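
The diff context ends before the commands themselves, so the following is only a minimal sketch. It assumes that NMSLIB reports its messages through Python's standard `logging` module, which this excerpt does not confirm, and the final import is left commented out so that the snippet runs without the library installed.

```python
import logging

# Configure the root logger first. force=True (Python 3.8+) replaces any
# handlers that were installed earlier, so the DEBUG level really applies.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    force=True,
)

# Assumption: the library picks up the logging configuration at import time,
# which is why the import has to come after the configuration above.
# import nmslib
```

Raising `level` to `logging.WARNING` should hide the informational messages again.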
13 changes: 11 additions & 2 deletions python_bindings/notebooks/README.md
@@ -1,7 +1,16 @@
We have five Notebooks: three are for regular dense vector spaces, one for 8-bit integer SIFT vectors, one is for the sparse vector space with cosine similarity, and one is for the sparse Jaccard space. For the dense space, we have examples of the so-called optimized and non-optimized indices. Except HNSW, all the methods save meta-indices rather than real ones. Meta indices contain only index structure, but not the data. Hence, before a meta-index can be loaded, we need to re-load data. One example is a memory efficient space to search for SIFT vectors.
We have Python notebooks for the following scenarios:

1. [The Euclidean space with the optimized index](search_vector_dense_optim.ipynb);
2. [The Euclidean space with the non-optimized index](search_vector_dense_nonoptim.ipynb);
3. [The Euclidean space for 8-bit integer SIFT vectors (the index is not optimized)](search_sift_uint8.ipynb);
4. [KL-divergence (non-optimized index)](search_vector_dense_kldiv.ipynb);
5. [Sparse cosine similarity (non-optimized index)](search_sparse_cosine.ipynb);
6. [Sparse Jaccard similarity (non-optimized index)](search_sparse_cosine.ipynb).

Note that for the dense space, we have examples of the so-called optimized and non-optimized indices. Except for HNSW, all the methods save meta-indices rather than real ones. Meta-indices contain only the index structure, but not the data. Hence, before a meta-index can be loaded, we need to re-load the data. One example is a memory-efficient space to search for SIFT vectors.

HNSW can save real indices, but only for the dense vector spaces: Euclidean and the cosine. When you use these optimized indices, the search does not require reloading all the data. However, reloading the data is **required** if you want to use the function **getDistance**. Furthermore, creation of the optimized index can always be disabled by specifying the index-time parameter **skip_optimized_index** (value 1).
This separation into optimized and non-optimized indices is not very convenient. In the future, we will fix this issue.

The sparse Jaccard space example is particularly interesting, because data set objects are represented by strings. In the case of the Jaccard space, a string is simply a list of sorted space-separated IDs. Other non-vector spaces can use different formats.
The sparse Jaccard space example is interesting because data set objects are represented by strings. In the case of the Jaccard space, a string is simply a list of sorted space-separated IDs. Other non-vector spaces can use different formats. The KL-divergence example is interesting because it uses a non-metric space with a non-symmetric distance function.
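
To make the two points above concrete, here is a small, library-free sketch. It assumes the IDs are integers sorted numerically, and the helper names (`to_jaccard_string`, `jaccard_distance`, `kl_divergence`) are hypothetical illustrations rather than part of the NMSLIB API. The distances are the textbook definitions and may differ in detail from what the library computes internally.

```python
import math

def to_jaccard_string(ids):
    """Render a collection of integer IDs as sorted, space-separated text."""
    return " ".join(str(i) for i in sorted(set(ids)))

def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two ID strings in the format above."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(sa & sb) / len(sa | sb)

def kl_divergence(p, q):
    """Return KL(p || q) for discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Data set objects for the Jaccard space are plain strings of sorted IDs.
doc_a = to_jaccard_string([42, 7, 19])
doc_b = to_jaccard_string([7, 19, 100, 3])
print(doc_a)
print(doc_b)
print(jaccard_distance(doc_a, doc_b))

# KL-divergence is not symmetric: swapping the arguments changes the value.
p, q = [0.9, 0.1], [0.5, 0.5]
print(kl_divergence(p, q), kl_divergence(q, p))
```

Because KL(p || q) and KL(q || p) generally differ, it matters which argument is the query and which is the data point, a question that does not arise for symmetric distances such as the Euclidean one.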
