Generic doc improvements (interm. commit)
searchivarius committed Jun 3, 2019
1 parent 7bb63ef commit 2613472
Showing 6 changed files with 60 additions and 27 deletions.
18 changes: 9 additions & 9 deletions README.md
@@ -3,22 +3,22 @@
[![Windows Build Status](https://ci.appveyor.com/api/projects/status/wd63b9doe7xco81t/branch/master?svg=true)](https://ci.appveyor.com/project/searchivarius/nmslib)
[![Join the chat at https://gitter.im/nmslib/Lobby](https://badges.gitter.im/nmslib/Lobby.svg)](https://gitter.im/nmslib/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

#Non-Metric Space Library (NMSLIB)
# Non-Metric Space Library (NMSLIB)

##Important Notes
## Important Notes

* NMSLIB is generic yet fast; see the results of the [ANN benchmarks](https://github.com/erikbern/ann-benchmarks).
* A stand-alone implementation of our fastest method HNSW [also exists as a header-only library](https://github.com/nmslib/hnswlib).
* All the documentation (including using Python bindings and the query server, description of methods and spaces, building the library) can be found [on this page](/manual/README.md).
* For **generic questions/inquiries**, please use [**the Gitter chat**](https://gitter.im/nmslib/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge); the GitHub issues page is for bugs and feature requests.

##Some Limitations
## Some Limitations

* Only static data sets are supported (with the exception of SW-graph)
* HNSW currently duplicates memory to create optimized indices
* Range/threshold search is not supported by many methods including SW-graph/HNSW

##Objectives
## Objectives

Non-Metric Space Library (NMSLIB) is an **efficient** cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core library does **not** have any third-party dependencies.

@@ -32,7 +32,7 @@ NMSLIB is an **extendible library**, which means that it is possible to add new sea

**Authors**: Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov. **With contributions from** David Novak, Lawrence Cayton, Wei Dong, Avrelin Nikita, Ben Frederickson, Dmitry Yashunin, Bob Poekert, @orgoro, Maxim Andreev, Daniel Lemire, Nathan Kurz, Alexander Ponomarenko.

##Brief History
## Brief History

NMSLIB started as a personal project of Bilegsaikhan Naidan, who created the initial code base, the Python bindings,
and participated in earlier evaluations.
@@ -44,11 +44,11 @@ a Neighborhood APProximation index (NAPP) proposed by Tellez et al. (2013) and i
as well as a vanilla uncompressed inverted file.


##Credits and Citing
## Credits and Citing

If you find this library useful, feel free to cite our SISAP paper [**[BibTex]**](http://dblp.uni-trier.de/rec/bibtex/conf/sisap/BoytsovN13) as well as other papers listed at the end. One **crucial contribution** to cite is the fast Hierarchical Navigable Small World graph (HNSW) method [**[BibTex]**](https://dblp.uni-trier.de/rec/bibtex/journals/corr/MalkovY16). Please [also check out the stand-alone HNSW implementation by Yury Malkov](https://github.com/nmslib/hnswlib), which is released as the header-only HNSWLib library.

##License
## License

Most of this code is released under the
Apache License Version 2.0 http://www.apache.org/licenses/.
@@ -57,11 +57,11 @@ Apache License Version 2.0 http://www.apache.org/licenses/.
* The k-NN graph construction algorithm *NN-Descent* due to Dong et al. 2011 (see the links below), which is also embedded in our library, seems to be covered by a free-to-use license, similar to Apache 2.
* The FALCONN library's license is MIT.

##Funding
## Funding

Leonid Boytsov was supported by the [Open Advancement of Question Answering Systems (OAQA) group](https://github.com/oaqa) and the following NSF grant #1618159: "[Matching and Ranking via Proximity Graphs: Applications to Question Answering and Beyond](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1618159&HistoricalAwards=false)". Bileg was supported by the [iAd Center](https://web.archive.org/web/20160306011711/http://www.iad-center.com/).

##Related Publications
## Related Publications

The most important related papers are listed below in chronological order:
* L. Boytsov, D. Novak, Y. Malkov, E. Nyberg (2016). [Off the Beaten Path: Let’s Replace Term-Based Retrieval
3 changes: 2 additions & 1 deletion manual/README.md
@@ -46,10 +46,11 @@ an average fraction of true neighbors returned by the method (with ties broken a
# Documentation Links

* [Python bindings overview](/python_bindings) and [Python bindings API](https://nmslib.github.io/nmslib/index.html)
* [A Brief List of Methods and Parameters](/manual/parameters.md)
* [A Brief List of Methods and Parameters](/manual/methods.md)
* [A brief list of supported spaces/distance](/manual/spaces.md)
* [Building the main library](/manual/build.md)
* [Building and using the query server](/manual/query_server.md)
* [Benchmarking using NMSLIB utility ``experiment``](/manual/benchmarking.md)
* [Extending the library](/manual/extensions.md)
* [A more detailed and formal description of methods and spaces (PDF)](/manual/latex/manual.pdf)

3 changes: 3 additions & 0 deletions manual/benchmarking.md
@@ -0,0 +1,3 @@
# Benchmarking

##
41 changes: 35 additions & 6 deletions manual/build.md
@@ -1,6 +1,6 @@
#Building the main library on Linux/Mac
# Building the Main Library on Linux/Mac

##Prerequisites
## Prerequisites

1. A modern compiler that supports C++11: G++ 4.7, Intel compiler 14, Clang 3.4, or Visual Studio 14 (version 12 can probably be used as well, but the project files need to be downgraded).
2. **64-bit** Linux is recommended, but most of our code builds on **64-bit** Windows and macOS as well.
@@ -14,7 +14,7 @@ To install additional prerequisite packages on Ubuntu, type the following
sudo apt-get install libboost-all-dev libgsl0-dev libeigen3-dev
```

##Quick Start on Linux/Mac
## Quick Start on Linux/Mac

To compile, go to the directory **similarity_search** and type:
```
@@ -27,11 +27,11 @@ cmake . -DWITH_EXTRAS=1
make
```

##Quick Start on Windows
## Quick Start on Windows

Building on Windows requires [Visual Studio 2015 Express for Desktop](https://www.visualstudio.com/en-us/downloads/download-visual-studio-vs.aspx) and [CMake for Windows](https://cmake.org/download/). First, generate a Visual Studio solution file for the 64-bit architecture using the CMake **GUI**. You have to specify both the platform and the version of Visual Studio. Then, build the generated solution using Visual Studio. **Attention**: this way of building on Windows is not well tested yet. We suspect that there might be some issues related to building truly 64-bit binaries.

##Additional Building Details
## Additional Building Details

Here we cover a few details on choosing the compiler,
the type of release, and manually pointing to the location
@@ -90,4 +90,33 @@ manually as follows:

```
export BOOST_ROOT=$HOME/boost_download_dir
```
```

## Testing the Correctness of Implementations

We have two main testing utilities: ``bunit`` and ``test_integr`` (``experiment.exe`` and
``test_integr.exe`` on Windows).
Both utilities accept a single optional argument: the name of the log file.
If the log file is not specified, a lot of informational messages are printed to the screen.

The utility ``bunit`` verifies some basic functionality, akin to unit testing.
In particular, it checks that an optimized version of a distance (e.g., the Euclidean distance)
returns results that are very similar to the results returned by a simpler, unoptimized version.
The utility ``bunit`` is expected to always run without errors.
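
The kind of consistency check described above can be illustrated with a minimal Python sketch. It is not the actual ``bunit`` code: the function names, the 4-way unrolled loop standing in for an "optimized" implementation, and the relative tolerance are all assumptions made for illustration.

```python
import math
import random


def l2_simple(x, y):
    """Straightforward reference implementation of the Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def l2_unrolled(x, y):
    """Hypothetical 'optimized' variant: 4-way unrolled loop with separate accumulators."""
    s0 = s1 = s2 = s3 = 0.0
    n = len(x)
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        d0 = x[i] - y[i]
        d1 = x[i + 1] - y[i + 1]
        d2 = x[i + 2] - y[i + 2]
        d3 = x[i + 3] - y[i + 3]
        s0 += d0 * d0
        s1 += d1 * d1
        s2 += d2 * d2
        s3 += d3 * d3
    tail = 0.0
    for i in range(main, n):  # leftover elements
        d = x[i] - y[i]
        tail += d * d
    return math.sqrt(s0 + s1 + s2 + s3 + tail)


def check_agreement(dim=37, trials=100, rel_tol=1e-9, seed=0):
    """Return True if both variants agree (up to rel_tol) on random vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.random() for _ in range(dim)]
        y = [rng.random() for _ in range(dim)]
        if not math.isclose(l2_simple(x, y), l2_unrolled(x, y), rel_tol=rel_tol):
            return False
    return True
```

The two variants sum the squared differences in a different order, so their results may differ in the last few bits; this is why the comparison uses a tolerance rather than exact equality.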

The utility ``test_integr`` runs complete implementations of many methods
and checks if several effectiveness and efficiency characteristics
meet the expectations.
The expectations are encoded as an array of instances of the class ``MethodTestCase``
(see [the code here](similarity_search/test/test_integr.cc#L65)).
For example, we expect that the recall falls in a certain pre-recorded range.

Because almost all our methods are randomized, there is a great deal of variance
in the observed performance characteristics. Thus, some tests
may occasionally fail, e.g., if the actual recall value is slightly lower or higher
than the expected minimum or maximum.
The error message should make it clear whether the discrepancy is substantial
(something went wrong) or minor (an unlikely outcome due to randomization).
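
The range check on recall can be sketched in Python as follows. This is only an illustration, not the logic of ``test_integr``: the table ``EXPECTED_RECALL``, the method names in it, the function names, and the ``slack`` threshold separating a minor miss from a substantial one are all hypothetical.

```python
# Hypothetical pre-recorded expectations: method name -> (min recall, max recall).
EXPECTED_RECALL = {
    "method_a": (0.90, 0.99),
    "method_b": (0.75, 0.85),
}


def recall(true_neighbors, returned):
    """Fraction of the true neighbors that the method actually returned."""
    true_set = set(true_neighbors)
    return len(true_set & set(returned)) / len(true_set)


def classify_outcome(method, observed, slack=0.05):
    """Return 'ok', 'minor' (just outside the range), or 'substantial'."""
    lo, hi = EXPECTED_RECALL[method]
    if lo <= observed <= hi:
        return "ok"
    gap = max(lo - observed, observed - hi)  # how far outside the range we are
    return "minor" if gap <= slack else "substantial"
```

For instance, an observed recall of 0.88 for ``method_a`` is classified as ``minor`` (plausibly a randomization effect), whereas 0.50 is classified as ``substantial``.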



2 changes: 1 addition & 1 deletion manual/methods.md
@@ -1,4 +1,4 @@
#A Brief List of Methods and Parameters
# A Brief List of Methods and Parameters

## Overview

20 changes: 10 additions & 10 deletions manual/spaces.md
@@ -1,12 +1,12 @@
#Spaces and Distances
# Spaces and Distances

Below is a list of nearly all spaces (a space is a combination of data and a distance). The mnemonic name of a space is passed to the Python bindings functions as well as to the benchmarking utility ``experiment``.
When initializing the space in the Python bindings, please use the type
`FLOAT` for all spaces, except `leven`: [see the description here.](https://nmslib.github.io/nmslib/api.html#nmslib-init)
A more detailed description is given
in the [manual](manual/latex/manual.pdf).

##Specifying parameters of the space
## Specifying parameters of the space

In some rare cases, spaces have parameters, which are specified after the
colon.
@@ -17,7 +17,7 @@ For example, ``lp:p=3`` denotes the L<sub>3</sub> space and
``lp:p=2`` is a synonym for the Euclidean, i.e., L<sub>2</sub> space.


##Fast, Slow, and Approximate variants
## Fast, Slow, and Approximate variants

There can be more than one version of a distance function,
each with a different space-performance trade-off.
@@ -42,7 +42,7 @@ and another is for right queries (the data object is the second argument and the
In the latter case the name of the space ends in ``rq``.
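
Argument order matters because such distances are not symmetric. The following Python sketch uses the KL-divergence to show that swapping the data object and the query changes the value. The vectors and function name are made up for illustration, and the snippet does not reproduce how the library dispatches ``rq`` spaces.

```python
import math


def kl_divergence(p, q):
    """KL-divergence KL(p || q) for discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


data_obj = [0.7, 0.2, 0.1]
query = [0.1, 0.3, 0.6]

# Data object as the first argument.
data_first = kl_divergence(data_obj, query)
# Data object as the second argument (the "right query" order described in the text).
data_second = kl_divergence(query, data_obj)
```

The two values differ, so an index built for one argument order cannot simply be reused for the other.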


##Input Format
## Input Format

For Python bindings, all dense-vector spaces require float32 numpy-array input (two-dimensional). See an example [here](python_bindings/notebooks/search_vector_dense_optim.ipynb).
One exception is the squared Euclidean space for SIFT vectors, which requires input as uint8 integer numpy arrays. An example can be found [here](python_bindings/notebooks/search_sift_uint8.ipynb).
@@ -62,7 +62,7 @@ currently a limitation).
You can pass a UTF8-encoded string, but the distance will sometimes be
larger than the actual distance.
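
One plausible reason for the inflated distance is that a multi-byte UTF8 character counts as several symbols when strings are compared byte by byte. The Python sketch below illustrates this effect under that assumption; it is a textbook Levenshtein implementation, not the library's code.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance over any two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


s, t = "cafe", "café"
char_dist = levenshtein(s, t)                                  # compares characters
byte_dist = levenshtein(s.encode("utf-8"), t.encode("utf-8"))  # compares raw bytes
```

Here the character-level distance is 1 (one substitution), while the byte-level distance is 2, because "é" occupies two bytes in UTF8.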

##Storage Format
## Storage Format

For dense vector spaces, the data can be either single-precision or double-precision floating-point numbers.
However, double-precision has not been useful so far and we do not recommend using it.
@@ -134,14 +134,14 @@ and for the [Itakura-Saito distance](https://en.wikipedia.org/wiki/Itakura%E2%80
We also explicitly implement the square root of the JS-divergence (the JS-metric),
which is a true metric distance.

For the meaning of infixes `fast`, `slow`, `approx`, and `rq` see the information above.
For the meaning of infixes ``fast``, ``slow``, ``approx``, and ``rq`` see the information above.

| Space code(s) | Description and Notes |
|--------------------------------------------|-------------------------------------------------|
| ``kldivfast``, ``kldivfastrq`` | Regular KL-divergence |
| ``kldivgenslow``, ``kldivgenfast``, ``kldivgenfastrq`` | Generalized KL-divergence |
| ``itakurasaitoslow``, ``itakurasaitofast``, ``itakurasaitofastrq`` | Itakura-Saito distance |
| ``jsdivslow``, ``jsdivfast``, `jsdivfastapprox` | JS-divergence |
| `kldivfast`, `kldivfastrq` | Regular KL-divergence |
| `kldivgenslow`, `kldivgenfast`, `kldivgenfastrq` | Generalized KL-divergence |
| `itakurasaitoslow`, `itakurasaitofast`, `itakurasaitofastrq` | Itakura-Saito distance |
| `jsdivslow`, `jsdivfast`, `jsdivfastapprox` | JS-divergence |
| `jsmetrslow`, `jsmetrfast`, `jsmetrfastapprox` | JS-metric |
| `renyidiv_slow`, `renyidiv_fast` | Renyi divergence: parameter name `alpha` |
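
The difference between the JS-divergence and the JS-metric rows can be illustrated with a short Python sketch. It relies on the known fact that the square root of the JS-divergence satisfies the triangle inequality while the divergence itself need not. The function names and sample distributions are illustrative only and are not taken from the library.

```python
import math


def kl(p, q):
    """KL-divergence KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the midpoint distribution."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_metric(p, q):
    """Square root of the JS-divergence."""
    return math.sqrt(js_divergence(p, q))


p = [0.9, 0.1]
q = [0.5, 0.5]
r = [0.1, 0.9]

# The divergence violates the triangle inequality on this triple ...
divergence_violates = js_divergence(p, r) > js_divergence(p, q) + js_divergence(q, r)
# ... whereas its square root satisfies it.
metric_holds = js_metric(p, r) <= js_metric(p, q) + js_metric(q, r)
```

This matters in practice because some indexing methods rely on the triangle inequality and can therefore be applied to the metric variant but not directly to the divergence.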
