Merge remote-tracking branch 'origin/master' into develop
searchivarius committed Sep 1, 2017
2 parents 74c3d2d + 0fc0112 commit 6350d29
Showing 66 changed files with 16,574 additions and 12 deletions.
22 changes: 11 additions & 11 deletions README.md
@@ -5,15 +5,17 @@

Non-Metric Space Library (NMSLIB)
=================
The latest **pre**-release is [1.6](https://github.com/searchivarius/nmslib/releases/tag/v1.6). Note that the manual is not fully updated to reflect 1.6 changes. In particular, we changed the build procedure for Windows.
The latest **pre**-release is [1.6](https://github.com/searchivarius/nmslib/releases/tag/v1.6). Note that the manual is not updated to reflect 1.6 changes. In particular, we changed the build procedure for Windows. Also note that the manual primarily targets developers who will extend the library. For most other folks, [Python binding docs should be sufficient](https://searchivarius.github.io/nmslib/).
-----------------
Non-Metric Space Library (NMSLIB) is an **efficient** cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core-library does **not** have any third-party dependencies.

The goal of the project is to create an effective and **comprehensive** toolkit for searching in **generic non-metric** spaces. Being comprehensive is important, because no single method is likely to be sufficient in all cases. Also note that exact solutions are hardly efficient in high dimensions and/or non-metric spaces. Hence, the main focus is on **approximate** methods.
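
To make the exact-versus-approximate distinction concrete, here is a minimal sketch in plain Python. It is not part of NMSLIB and does not use its API; the function names `cosine_distance`, `exact_knn`, and `recall` and the toy data are assumptions made for illustration. It shows an exact brute-force search, which scores every data point, and a recall measure that could be used to judge an approximate answer against it.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_knn(data, query, k):
    """Brute-force search: score every point and keep the ids of the k closest."""
    scored = ((cosine_distance(point, query), i) for i, point in enumerate(data))
    return [i for _, i in heapq.nsmallest(k, scored)]


def recall(approx_ids, exact_ids):
    """Fraction of the true neighbours that an approximate answer recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical toy data: four 2-d points and one query.
data = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.05]
truth = exact_knn(data, query, k=2)
```

The brute-force scan costs one distance computation per data point for every query, which is why it does not scale; an approximate method trades some recall for speed.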

NMSLIB is an **extendible library**, which means that it is possible to add new search methods and distance functions. NMSLIB can be used directly in C++ and Python (via Python bindings). In addition, it is also possible to build a query server, which can be used from Java (or other languages supported by Apache Thrift). Java has a native client, i.e., it works on many platforms without requiring a C++ library to be installed.

**Main developers** : Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov. With contributions from David Novak, Lawrence Cayton, Wei Dong, Avrelin Nikita, Ben Frederickson, Dmitry Yashunin, Bob Poekert, @orgoro, Maxim Andreev, Daniel Lemire, Nathan Kurz, Alexander Ponomarenko.
**Main developers** : Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov, David Novak, Ben Frederickson.

Other contributors: Lawrence Cayton, Wei Dong, Avrelin Nikita, Ben Frederickson, Dmitry Yashunin, Bob Poekert, @orgoro, Maxim Andreev, Daniel Lemire, Nathan Kurz, Alexander Ponomarenko.

To acknowledge the use of the library, you could provide a link to this repository and/or cite our SISAP paper [**[BibTex]**](http://dblp.uni-trier.de/rec/bibtex/conf/sisap/BoytsovN13). Some other related papers are listed at the end.

@@ -40,11 +42,11 @@ As of **May 2016** results are:
<tr width="100%" border="0" style="border:none">
<td border="0" align="center" style="border:none">
1.19M 100d GloVe, cosine similarity.
<img src="https://raw.githubusercontent.com/searchivarius/nmslib/master/docs/figures/glove.png" width="400">
<img src="https://raw.githubusercontent.com/searchivarius/nmslib/master/manual/figures/glove.png" width="400">
</td>
<td border="0" align="center" style="border:none">
1M 128d SIFT features, Euclidean distance:
<img src="https://raw.githubusercontent.com/searchivarius/nmslib/master/docs/figures/sift.png" width="400">
<img src="https://raw.githubusercontent.com/searchivarius/nmslib/master/manual/figures/sift.png" width="400">
</td>
</tr></table>

@@ -61,7 +63,7 @@ What's new in version 1.6 ([see this page for more details](https://github.com/s
General information
-----------------------

A detailed description is given [in the manual](docs/manual.pdf). The manual also contains instructions for building under Linux and Windows, extending the library, as well as for debugging the code using Eclipse. Note that the manual is not fully updated to reflect 1.6 changes.
A detailed description is given [in the manual](manual/manual.pdf). The manual also contains instructions for building under Linux and Windows, extending the library, as well as for debugging the code using Eclipse. Note that the manual is not fully updated to reflect 1.6 changes. Also note that the manual primarily targets developers who will extend the library. **For most other folks**, [Python binding docs should be sufficient](https://searchivarius.github.io/nmslib/).

Most of this code is released under the
Apache License Version 2.0 http://www.apache.org/licenses/.
@@ -90,7 +92,7 @@ Limitations
1. Currently only static data sets are supported
2. HNSW does not work with Clang
3. HNSW currently duplicates memory to create optimized indices
4. Non-optimized HNSW indices cannot be saved
4. Non-optimized HNSW indices cannot be saved (for spaces other than cosine and Euclidean)
5. Range/threshold search is not supported by many methods including SW-graph/HNSW

We plan to resolve these issues in the future.
@@ -109,16 +111,14 @@ cmake . -DWITH_EXTRAS=1
make
```

Note that the directory **similarity_search** contains an Eclipse project that can be imported into [The Eclipse IDE for C/C++ Developers](http://www.eclipse.org/ide/). A more detailed description is given in [in the manual](docs/manual.pdf), which also contains examples of using the software.

You can also download almost every data set used in our previous evaluations (see the section **Data sets** below). The downloaded data needs to be decompressed (you may need 7z, gzip, and bzip2). Old experimental scripts can be found in the directory [previous_releases_scripts](previous_releases_scripts). However, they will work only with previous releases.

Note that the benchmarking utility **supports caching of ground truth data**, so that ground truth data is not recomputed every time this utility is re-run on the same data set.
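
The idea behind ground-truth caching can be sketched in a few lines of Python. This is only an illustration of the general technique, not the benchmarking utility's actual implementation or file format; the function name `load_or_compute_ground_truth`, the JSON storage, and the `compute_fn` callback are all assumptions.

```python
import json
import os


def load_or_compute_ground_truth(cache_path, compute_fn):
    """Return cached ground truth if the cache file exists;
    otherwise compute it once, store it, and return it."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    truth = compute_fn()  # the expensive exact (brute-force) search
    with open(cache_path, "w") as f:
        json.dump(truth, f)
    return truth
```

On the first run the expensive exact search is executed and its result is written to disk; later runs on the same data set read the file instead. A real cache would also have to be invalidated when the data set or the queries change.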

Query server (Linux-only)
-----------------------
The query server requires Apache Thrift. We used Apache Thrift 0.9.2, but, perhaps, newer versions will work as well.
To install Apache Thrift, you need to [build it from source](https://thrift.apache.org/docs/BuildingFromSource).
To install Apache Thrift, you need to [build it from source](https://thrift.apache.org/manual/BuildingFromSource).
This may require additional libraries. On Ubuntu they can be installed as follows:
```
sudo apt-get install libboost-dev libboost-test-dev libboost-program-options-dev libboost-system-dev libboost-filesystem-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev libboost-thread-dev make
@@ -156,7 +156,7 @@ We provide Python bindings for Python 2.7+ and Python 3.5+, which have been test
pip install nmslib
```

For examples of using the Python API, please, see the README in the [python_bindings](python_bindings) folder.
For examples of using the Python API, please, see the README in the [python_bindings](python_bindings) folder. [More detailed documentation is also available](https://searchivarius.github.io/nmslib/) (thanks to Ben Frederickson).

Quick start on Windows
-----------------------
@@ -168,7 +168,7 @@ Data sets
We use several data sets, which were created either by other folks,
or using third-party software. If you use these data sets, please consider
giving proper credit. The download scripts print the respective BibTex entries.
More information can be found [in the manual](docs/manual.pdf).
More information can be found [in the manual](manual/manual.pdf).

Here is the list of scripts to download major data sets:
* Data sets for our NIPS'13 and SISAP'13 papers [data/get_data_nips2013.sh](data/get_data_nips2013.sh).
Empty file added docs/.nojekyll
Empty file.
File renamed without changes.
34 changes: 34 additions & 0 deletions docs/_sources/api.rst.txt
@@ -0,0 +1,34 @@
API Reference
=============

nmslib.init
-----------

This function acts as the main entry point into NMSLIB. It
should be called before any other method.

.. autofunction:: nmslib.init

nmslib.FloatIndex
-----------------

nmslib.dist.FloatIndex

.. autoclass:: nmslib.dist.FloatIndex
    :members:

nmslib.DoubleIndex
------------------

nmslib.dist.DoubleIndex

.. autoclass:: nmslib.dist.DoubleIndex
    :members:

nmslib.IntIndex
---------------

nmslib.dist.IntIndex

.. autoclass:: nmslib.dist.IntIndex
    :members:
24 changes: 24 additions & 0 deletions docs/_sources/index.rst.txt
@@ -0,0 +1,24 @@
.. nmslib documentation master file, created by
   sphinx-quickstart on Mon Aug 7 17:30:15 2017.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Non-Metric Space Library (NMSLIB)
==================================

Contents:

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   Quickstart <quickstart>
   API Reference <api>


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
39 changes: 39 additions & 0 deletions docs/_sources/quickstart.rst.txt
@@ -0,0 +1,39 @@
Python bindings for NMSLIB
====================================

Installation
------------

This project works with Python 2.7+ and 3.5+ on Linux, OS X, and Windows. To install:

``pip install nmslib``

You may need to install Python dev-files. On Ubuntu, you can do it as follows:

``sudo apt-get install python3-dev``

Building on Windows requires Visual Studio 2015, see this project for more information.

Example Usage
-------------

.. code-block:: python

    import nmslib
    import numpy

    # create a random matrix to index
    data = numpy.random.randn(10000, 100).astype(numpy.float32)

    # initialize a new index, using a HNSW index on Cosine Similarity
    index = nmslib.init(method='hnsw', space='cosinesimil')
    index.addDataPointBatch(data)
    index.createIndex({'post': 2}, print_progress=True)

    # query for the nearest neighbours of the first datapoint
    ids, distances = index.knnQuery(data[0], k=10)

    # get the nearest neighbours for all the datapoints,
    # using a pool of 4 threads to compute
    neighbours = index.knnQueryBatch(data, k=10, num_threads=4)
Binary file added docs/_static/ajax-loader.gif