Improving docs (interm. commit) #390
searchivarius committed Jun 3, 2019
1 parent 55f9fc4 commit 7bb63ef
Showing 21 changed files with 369 additions and 4,372 deletions.
186 changes: 36 additions & 150 deletions README.md

Large diffs are not rendered by default.

55 changes: 55 additions & 0 deletions manual/README.md
@@ -0,0 +1,55 @@
# NMSLIB documentation

Documentation is split into several parts.
Links to these parts are given below.
They are preceded by a short problem definition.

# Terminology and Problem Formulation

NMSLIB provides a fast similarity search.
The search is carried out in a finite database of objects _{o<sub>i</sub>}_
using a search query _q_ and a dissimilarity measure.
An object is a synonym for a **data point** or simply a **point**.
The dissimilarity measure is typically represented by a **distance** function _d(o<sub>i</sub>, q)_.
The ultimate goal is to answer a query by retrieving a subset of database objects sufficiently similar to the query _q_.
A combination of data points and the distance function is called a **search space**,
or simply a **space**.


Note that we use the terms distance and distance function in a broader sense than
is common:
We do not assume that the distance is a true metric distance.
The distance function can violate the triangle inequality and can even be non-symmetric.
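
As a small illustration of a non-metric distance, the sketch below checks the triangle inequality for the squared Euclidean distance on three points of a line. The function `sq_euclidean` is a helper written for this example and is not part of the NMSLIB API.

```python
def sq_euclidean(x, y):
    """Squared Euclidean distance: symmetric, but not a metric."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Three points on a line (illustrative data).
a, b, c = (0.0,), (1.0,), (2.0,)

direct = sq_euclidean(a, c)                        # distance a -> c
detour = sq_euclidean(a, b) + sq_euclidean(b, c)   # distance a -> b -> c

# A metric would guarantee direct <= detour; here the check fails.
triangle_holds = direct <= detour
print(direct, detour, triangle_holds)
```

Going through the middle point is "shorter" than the direct distance, so search methods that rely on the triangle inequality for pruning cannot be applied to such a distance without further care.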

Two retrieval tasks are typically considered: a **nearest-neighbor** search and a range search.
Currently, we mostly support the nearest-neighbor search,
which aims to find the object at the smallest distance from the query.
Its direct generalization is the _k_ nearest-neighbor search (the _k_-NN search),
which looks for the _k_ closest objects, i.e., the objects that
have the _k_ smallest distance values to the query _q_.
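
The _k_-NN definition above can be sketched as an exhaustive (brute-force) scan. This is only a reference illustration in plain Python, not how NMSLIB answers queries; the names `knn_search` and `l1` and the sample data are assumptions made for this example.

```python
import heapq

def l1(x, y):
    """L1 (Manhattan) distance, used here only as a sample distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

def knn_search(objects, query, k, distance):
    """Return the k (distance, index) pairs with the smallest distance to the query.

    Every object is compared with the query, so the result is exact.
    """
    scored = ((distance(obj, query), idx) for idx, obj in enumerate(objects))
    return heapq.nsmallest(k, scored)

database = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
neighbors = knn_search(database, (0.0, 0.1), k=2, distance=l1)
print([idx for _, idx in neighbors])
```

Such a scan costs one distance computation per database object, which is the baseline that indexing methods try to beat.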

In generic spaces, the distance is not necessarily symmetric.
Thus, two types of queries can be considered.
In a **left** query, the object is the left argument of the distance function,
while the query is the right argument.
In a **right** query, the query _q_ is the left argument and the object is the right argument.
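
To see why the distinction matters, the sketch below runs a left and a right nearest-neighbor query with the KL-divergence, a well-known non-symmetric distance. The helper names and the two-object database are assumptions made for this example, not NMSLIB code.

```python
import math

def kl_divergence(p, q):
    """KL-divergence of two discrete distributions with positive entries."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def nearest_left(objects, query, distance):
    """Left query: each object is the left argument of the distance."""
    return min(range(len(objects)), key=lambda i: distance(objects[i], query))

def nearest_right(objects, query, distance):
    """Right query: the query is the left argument, each object the right one."""
    return min(range(len(objects)), key=lambda i: distance(query, objects[i]))

objects = [(0.5, 0.5), (0.99, 0.01)]
query = (0.8, 0.2)

left_nn = nearest_left(objects, query, kl_divergence)
right_nn = nearest_right(objects, query, kl_divergence)
print(left_nn, right_nn)
```

With this data the two query types return different objects, so the query type has to be fixed before results can be compared.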

The queries can be answered either exactly,
i.e., by returning a complete result set that does not contain erroneous elements,
or approximately, e.g., by finding only some of the neighbors.
Thus, the methods are evaluated in terms of efficiency-effectiveness trade-offs
rather than merely in terms of their efficiency.
One common effectiveness metric is recall,
which is computed as
an average fraction of true neighbors returned by the method (with ties broken arbitrarily).
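
A minimal sketch of the recall computation follows. It assumes that the exact (true) neighbors of every query are already known, e.g., from an exhaustive scan; the function names and the neighbor IDs are made up for this example.

```python
def recall(true_neighbors, returned_neighbors):
    """Fraction of the true neighbors that the method returned for one query."""
    true_set = set(true_neighbors)
    return len(true_set & set(returned_neighbors)) / len(true_set)

def average_recall(all_true, all_returned):
    """Recall averaged over all queries."""
    per_query = [recall(t, r) for t, r in zip(all_true, all_returned)]
    return sum(per_query) / len(per_query)

# Two queries with k = 3: the first approximate answer misses one true neighbor.
exact = [[0, 3, 7], [2, 4, 5]]
approx = [[0, 3, 9], [2, 4, 5]]

avg = average_recall(exact, approx)
print(avg)
```

Here the per-query recalls are 2/3 and 1, so the average is 5/6.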

# Documentation Links

* [Python bindings overview](/python_bindings) and [Python bindings API](https://nmslib.github.io/nmslib/index.html)
* [A Brief List of Methods and Parameters](/manual/parameters.md)
* [A brief list of supported spaces/distances](/manual/spaces.md)
* [Building the main library](/manual/build.md)
* [Building and using the query server](/manual/query_server.md)
* [Extending the library](/manual/extensions.md)
* [A more detailed and formal description of methods and spaces (PDF)](/manual/latex/manual.pdf)

93 changes: 93 additions & 0 deletions manual/build.md
@@ -0,0 +1,93 @@
# Building the main library on Linux/Mac

## Prerequisites

1. A modern compiler that supports C++11: G++ 4.7, Intel compiler 14, Clang 3.4, or Visual Studio 14 (version 12 can probably be used as well, but the project files need to be downgraded).
2. **64-bit** Linux is recommended, but most of our code builds on **64-bit** Windows and macOS as well.
3. On Linux/macOS only: CMake (GNU make is also required).
4. An Intel or AMD processor that supports SSE 4.2 is recommended.
5. The extended version of the library requires development versions of the following libraries: Boost, the GNU Scientific Library, and Eigen3.

To install the additional prerequisite packages on Ubuntu, type the following:

```
sudo apt-get install libboost-all-dev libgsl0-dev libeigen3-dev
```

## Quick Start on Linux/Mac

To compile, go to the directory **similarity_search** and type:
```
cmake .
make
```
To build the extended version (which requires the extra libraries listed in the prerequisites):
```
cmake . -DWITH_EXTRAS=1
make
```

## Quick Start on Windows

Building on Windows requires [Visual Studio 2015 Express for Desktop](https://www.visualstudio.com/en-us/downloads/download-visual-studio-vs.aspx) and [CMake for Windows](https://cmake.org/download/). First, generate a Visual Studio solution file for the 64-bit architecture using the CMake **GUI**; you have to specify both the platform and the version of Visual Studio. Then, build the generated solution using Visual Studio. **Attention**: this way of building on Windows is not well tested yet. We suspect that there might be some issues related to building truly 64-bit binaries.

## Additional Building Details

Here we cover a few details on choosing the compiler,
selecting the build type (release or debug), and manually pointing to the location
of the Boost libraries (to build with extras).

The compiler is chosen by setting two environment variables: ``CXX`` and ``CC``. In the case of GNU
C++ (version 8), you may need to type:
```
export CXX=g++-8 CC=gcc-8
```

To create makefiles for a release version of the code, type:
```
cmake -DCMAKE_BUILD_TYPE=Release .
```

If you have not created any makefiles before, you can use the shortcut:
```
cmake .
```

To create makefiles for a debug version of the code, type:
```
cmake -DCMAKE_BUILD_TYPE=Debug .
```

Once the makefiles are created, just type:

```
make
```

**Important note**: the shortcut command
``cmake .``
(re)creates makefiles for the previously configured build type. When you type ``cmake .``
for the first time, it creates release makefiles. However, if you create debug
makefiles and then type ``cmake .``, this will not create release makefiles!
To switch build types reliably, delete the cmake cache and makefiles before
running cmake. For example, you can do the following (assuming the
current directory is **similarity_search**):

```
rm -rf `find . -name CMakeFiles -o -name CMakeCache.txt`
```

Also note that cmake might sometimes ignore the environment
variables ``CXX`` and ``CC``. In this unlikely case, you can specify the compilers directly
through cmake arguments. For example, in the case of GNU C++ and the
release build, this can be done as follows:

```
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=g++-8 \
-DCMAKE_C_COMPILER=gcc-8 .
```

Finally, if cmake cannot find the Boost libraries, their location can be specified
manually as follows:

```
export BOOST_ROOT=$HOME/boost_download_dir
```
Binary file removed manual/figures/Eclipse1.pdf
Binary file not shown.
Binary file removed manual/figures/Eclipse2.pdf
Binary file not shown.
Binary file removed manual/figures/Eclipse3.pdf
Binary file not shown.
Binary file removed manual/figures/EclipseDebug.pdf
Binary file not shown.
Binary file removed manual/figures/EclipseDebugConf.pdf
Binary file not shown.
Binary file removed manual/figures/SettingAVXinVS2012.pdf
Binary file not shown.
Binary file removed manual/figures/SettingBoostLocation.pdf
Binary file not shown.
Binary file removed manual/figures/glove.png
Binary file not shown.
Binary file removed manual/figures/sift.png
Binary file not shown.
Binary file removed manual/figures/test_run.pdf
Binary file not shown.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.