Redis, What a Vector Blast!

We have Vector Sets

In the world of Artificial Intelligence and Machine Learning, the ability to understand and search data based on its semantic meaning, rather than exact keyword matches, has become fundamental. This is the realm of Vector Databases and Approximate Nearest Neighbor (ANN) algorithms.

Recently, at Devmy, we had the exceptional opportunity to preview Redis Vector Sets, a new Redis module currently being optimized by its creator, Salvatore Sanfilippo aka @antirez.

Thanks to his willingness to provide us with a preliminary binary for macOS, we were able to integrate this new module into our RAG (Retrieval-Augmented Generation) system, replacing our previous solution, ChromaDB, for a direct and practical comparison. In this article, we will share our findings, starting from the general concepts of vector databases and the HNSW algorithm, then diving into the specifics of Redis Vector Sets and recounting our experience.

 

What are vector databases and why are they important?

Traditionally, databases excel at retrieving data based on exact queries or logical conditions on structured fields. However, much modern information (text, images, audio) is inherently unstructured. To handle it, machine learning models (often Deep Learning) are used to transform this data into high-dimensional numerical vectors, known as embeddings. These vectors capture the semantic essence of the original data: semantically similar data will have vectors close to each other in the vector space.

A Vector Database is a system specifically designed to efficiently store, index, and query these embeddings. The primary goal is to find the vectors (and thus the associated original data) closest to a given query vector, according to a distance metric (like cosine similarity or Euclidean distance). This process is known as Nearest Neighbor (NN) search.
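To make the idea concrete, here is a minimal brute-force nearest-neighbor sketch in Python. The toy 3-dimensional vectors and labels are invented for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means identical direction, -1.0 opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models produce far more dimensions).
corpus = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.7, 0.4, 0.2],
    "car": [0.0, 0.2, 0.9],
}

query = [0.85, 0.2, 0.05]  # something "animal-like"

# Exact nearest-neighbor search: compare against every stored vector (O(N)).
ranked = sorted(corpus, key=lambda k: cosine_similarity(query, corpus[k]), reverse=True)
print(ranked)  # ['cat', 'dog', 'car']
```

This linear scan is exactly what becomes impractical at scale, which is what motivates the approximate algorithms discussed next.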

The applications are vast:

  • Recommendation systems: finding products similar to those a user has liked.
  • Image search: finding images similar to a given image.
  • Natural language processing: finding documents or phrases with similar meaning.
  • RAG systems: retrieving relevant context from a knowledge base to improve the responses of generative language models.

 

Vector search algorithms

Finding the exact nearest neighbor in a high-dimensional space is computationally very expensive, often requiring a comparison with every single vector in the database. For large datasets, this is impractical.

This is where approximate nearest neighbor (ANN) algorithms come in. These algorithms sacrifice a small percentage of precision in exchange for a huge gain in search speed. There are several families of ANN algorithms, including:

  • Tree-based: like Annoy, they build trees to partition the space.
  • Hashing-based: like Locality-Sensitive Hashing (LSH), they use hash functions to group nearby vectors.
  • Graph-based: like HNSW (Hierarchical Navigable Small World), they build multi-level proximity graphs.

 

How HNSW works

HNSW is one of the most popular and performant graph-based ANN algorithms, known for its excellent balance between search speed, result quality (recall), and index construction speed. It is the algorithm underlying Redis Vector Sets. The idea is to create connections between vectors in a way that allows for both rapid global traversal of the graph and accurate searching for similar vectors. It does this by creating multiple layers of vectors, where each layer includes a subset of the vectors in the layer below. This means that as you move up the layers, finding the correct general area of the graph becomes faster.
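As a rough illustration (not Redis's actual implementation), the layered greedy descent can be sketched in Python, with each layer as an adjacency list over toy one-dimensional points:

```python
def greedy_search(layer, start, query):
    # Greedy walk: from `start`, keep moving to whichever neighbor is
    # closer to the query, stopping when no neighbor improves.
    current = start
    improved = True
    while improved:
        improved = False
        for neighbor in layer[current]:
            if abs(neighbor - query) < abs(current - query):
                current, improved = neighbor, True
    return current

# Toy 1-D "vectors" 0..9 and a hand-built two-layer graph.
upper = {0: [5], 5: [0, 9], 9: [5]}  # sparse top layer: few nodes, long links
lower = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}  # dense bottom layer

query = 7.2
entry = greedy_search(upper, 0, query)       # global phase: find the right area
result = greedy_search(lower, entry, query)  # local phase: refine among all nodes
print(entry, result)  # 9 7
```

The upper layer jumps quickly into the right region; the bottom layer, which contains every node, then refines the answer locally.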

 

[Image: the layered structure of an HNSW graph]

 

And once the general area is found, the search becomes more local, moving down the layers, where more vectors are included. Eventually, the search reaches the bottom layer, which includes all available vectors. Here, a more thorough search among neighbors is performed to refine the results. This method allows a search to quickly find the right general area of the graph before performing a full search of the dense bottom layer. Intuitively, an HNSW graph can be thought of as a kind of multidimensional skip list, where the upper layers are used for global search and the lower layers for local search.

HNSW has some key parameters that influence its performance and memory usage:

  • M: Defines the maximum number of connections (neighbors) that each node can have per layer (except layer 0, which allows M*2). A higher M creates a denser graph, potentially improving navigation quality and recall (the ability to find the true nearest neighbors), but it significantly increases memory usage (each connection is a pointer) and index construction time. The default in Redis Vector Sets is 16, a good compromise.
  • EF-Construction (exploration factor during construction): Controls how much "effort" is spent during the insertion of a new node to find the best neighbors to connect it to. A higher value yields a better-quality graph and potentially better recall, but significantly slows down insertion. The default in Redis Vector Sets is 200.
  • EF-Search (exploration factor during search): Controls the size of the candidate queue (the dynamic list of nodes the algorithm considers) explored during a search. A higher value increases the probability of finding the true nearest neighbors, improving recall, but increases query latency. It is an important parameter to tune to balance speed and accuracy for your application.
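To make these parameters concrete, here is a small Python sketch that assembles VADD and VSIM argument lists with explicit M and EF values, mirroring the syntax shown later in this article. The key and element names are invented, and a generic client call such as redis-py's execute_command(*args) is assumed as the transport; no connection is made here.

```python
# Assemble VADD / VSIM argument lists with explicit HNSW parameters.
# A client such as redis-py would send them verbatim via
# Redis.execute_command(*args); here we only build them.

def vadd_args(key, element, values, m=16, ef_construction=200):
    # VALUES form: the dimension first, then each component as a string.
    return ["VADD", key, "VALUES", str(len(values)), *map(str, values),
            element, "M", str(m), "EF", str(ef_construction)]

def vsim_args(key, values, count=10, ef_search=500):
    return ["VSIM", key, "VALUES", str(len(values)), *map(str, values),
            "COUNT", str(count), "WITHSCORES", "EF", str(ef_search)]

args = vadd_args("movie_embeddings", "movie:1", [0.1, 0.2, 0.3])
print(args[:4])  # ['VADD', 'movie_embeddings', 'VALUES', '3']
```

Raising M or EF here trades insertion time and memory for graph quality, exactly as described in the bullet points above.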

 

Redis Vector Sets

Redis Vector Sets introduces a new data type in Redis, aptly called Vector Set, where string elements are associated with a vector. The goal is to allow the insertion of vector elements and the efficient retrieval of the subsets most similar to a query vector or to an element already present in the set. Built on an optimized HNSW implementation, it delivers the performance Redis is known for.

 

Key Improvements to the HNSW Implementation

The HNSW implementation in Redis Vector Sets introduces some significant improvements over the standard algorithm, focusing particularly on robustness and dynamic data management. A fundamental difference is that all connections in the graph are bidirectional: if node A is connected to node B, then B is also connected to A.

This characteristic enables one of the most distinctive features of Redis Vector Sets: actual node deletion (VREM). Unlike many implementations that use tombstoning (marking a node as deleted and removing it later in a separate garbage-collection phase), here the removal is immediate, and memory is actually freed right away. Furthermore, when a node is deleted, a specific reconciliation algorithm comes into play (another extension not found in the original papers) which reconnects the neighboring nodes left "orphaned", preserving the HNSW graph's quality criteria.

The practical implication for the user is significant; the HNSW index can be treated as a truly dynamic data structure. It is possible to add and remove elements freely, even in large quantities, without worrying about performance degradation or having to manually manage memory cleanup. The index continues to function correctly, and memory is recovered transparently. This ability to manage highly mutable vector indices is a specific strength of this implementation.
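As a conceptual illustration only (a much simplified stand-in for the real reconciliation algorithm), deletion with orphan re-linking in a graph of bidirectional edges can be sketched like this:

```python
# Toy sketch: delete a node from a graph with bidirectional edges,
# immediately freeing it and re-linking its orphaned neighbors to each
# other so the graph stays navigable. The actual Redis reconciliation
# step is far more sophisticated (it preserves HNSW quality criteria).
def delete_node(graph, node):
    orphans = graph.pop(node)          # remove the node right away
    for a in orphans:
        graph[a].discard(node)         # drop the back-links to it
    for a in orphans:                  # reconnect the orphans pairwise
        for b in orphans:
            if a != b:
                graph[a].add(b)

graph = {"A": {"X"}, "B": {"X"}, "X": {"A", "B"}}
delete_node(graph, "X")
print(graph)  # X is gone; A and B are now linked directly
```

Because every edge is bidirectional, the delete operation knows exactly which neighbors point back at the removed node, which is what makes immediate removal (rather than tombstoning) possible.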

 

Main Features

  • Intuitive Commands: Provides a set of V* commands (VADD, VSIM, VREM, VEMB, VINFO, VSETATTR, etc.) that integrate naturally with the Redis ecosystem.
  • Quantization: Supports quantization to drastically reduce memory usage and speed up calculations, with minimal (Q8) or moderate (BIN) impact on recall.
    • NOQUANT: No quantization, 32-bit floating-point vectors. Maximum precision, maximum memory/CPU usage.
    • Q8 (Default): 8-bit integer quantization. Reduces memory by at least 4 times and cuts both insertion and search times (2x or more), with minimal impact on accuracy (recall). Usually the best choice.
    • BIN: Binary quantization (1 bit per component). Drastically improves speed and reduces memory 32-fold compared to NOQUANT (full-precision fp32 vectors), but reduces accuracy. Worth considering when speed is a priority and the loss of accuracy is acceptable.
  • Filtered Search: Allows associating JSON attributes with each element and filtering vector search results using simple logical-mathematical expressions on these attributes (e.g., .year > 2020 and .genre == "science_fiction"). This combines semantic search with traditional scalar filters in the same operation. The FILTER-EF option controls how much additional effort to dedicate to the search when many elements are discarded by the filter.
  • Dimensionality Reduction (REDUCE): Allows reducing the dimensionality of vectors via random projection during insertion. Useful for very large vectors if a loss of precision is acceptable, reducing memory and CPU. Needs careful testing to evaluate the potential impact on recall.
  • Threading Model: Searches are executed by default in separate threads (up to a configurable maximum), allowing Redis to remain responsive. Even insertion with VADD ... CAS can execute its computationally expensive part (the search for candidate neighbors) in the background while the command runs in the main thread.
  • Horizontal Scalability (Sharding): It is possible to partition a large dataset across multiple Redis instances (or keys on the same instance). Queries are executed on all partitions, and the results are merged client-side. This scales writes linearly and allows managing datasets larger than the memory of a single instance, although reads require multiple queries.
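For the sharding point, here is a minimal sketch of the client-side merge; the (element, score) pairs are invented stand-ins for the replies of per-shard VSIM ... WITHSCORES calls:

```python
# Client-side merge for a sharded vector set: run VSIM on each shard key,
# collect the (element, score) pairs, and keep the best overall.
def merge_shard_results(per_shard, count):
    merged = [pair for shard in per_shard for pair in shard]
    merged.sort(key=lambda pair: pair[1], reverse=True)  # higher score = more similar
    return merged[:count]

# Simulated replies from two shards (real code would issue one VSIM per key).
shard_a = [("doc:1", 0.98), ("doc:7", 0.91)]
shard_b = [("doc:4", 0.95), ("doc:2", 0.88)]
print(merge_shard_results([shard_a, shard_b], 3))
```

Each shard answers independently, so writes scale linearly, while reads pay the cost of querying every partition before merging.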

 

Let's delve into FILTER

When you use FILTER, Redis doesn't first perform the vector search and then filter the results, but integrates the filter into the HNSW search process. As the HNSW algorithm explores the graph and identifies candidate nodes, i.e., vectors potentially close to your query, it immediately checks if that candidate also passes your FILTER condition. Only candidates that are both vectorially close and satisfy the filter are considered valid for the final result.

But beware! There is, however, a potential problem: imagine your filter is very selective, meaning only a small percentage of your data satisfies it. The HNSW algorithm might quickly find many vectors close to your query, but most of them are discarded by the filter because they don't satisfy it. To find the number of results you requested (e.g., COUNT 10), Redis must continue exploring the graph much longer and consider many more candidates than in an unfiltered search. This takes more time and CPU resources.

In the worst case, if the elements satisfying the filter are extremely rare, Redis might end up having to examine almost the entire graph, making the search very slow. To prevent queries with very selective filters from becoming excessively slow, Redis Vector Sets has a default effort limit.

By default, when using FILTER, Redis will not examine more than COUNT * 100 total candidates to try to satisfy your request. So if you ask for COUNT 10, it will examine at most 10 × 100 = 1,000 candidates. This has the advantage of keeping performance predictable, but also a disadvantage: if it doesn't find 10 elements that pass the filter within those 1,000 candidates, it will simply return fewer than 10 results, the ones it found. Through the FILTER-EF parameter, you can override that automatic COUNT * 100 limit.

For example, if you set FILTER-EF 5000, you tell Redis: "If necessary, you are authorized to examine up to 5000 candidates to find my COUNT filtered results." This increases the probability of getting the full number of results even with selective filters, but it can increase search time if matches are indeed rare. Setting FILTER-EF to 0 removes the limit, and Redis will search relentlessly (potentially scanning everything) until it finds COUNT results or exhausts the graph. It's powerful but potentially very slow. It is worth noting that even if you set a very high FILTER-EF (e.g., 5000), Redis won't necessarily do all that work: if the elements satisfying the filter are common, the HNSW algorithm will stop much earlier thanks to its early-stopping mechanisms, as soon as it has found COUNT results of good quality.
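The effort-budget behavior described above can be sketched in Python. This is a simplified model of the logic, not the module's actual code:

```python
# Sketch of the filtered-search effort budget: walk candidates in
# similarity order, keep those passing the filter, and stop once either
# COUNT results are found or the candidate budget (FILTER-EF, which
# defaults to COUNT * 100) is exhausted.
def filtered_search(candidates, predicate, count, filter_ef=None):
    budget = filter_ef if filter_ef else count * 100
    results, examined = [], 0
    for item in candidates:
        if examined >= budget or len(results) >= count:
            break  # early stop: budget spent or enough results found
        examined += 1
        if predicate(item):
            results.append(item)
    return results

# 1 in 10 items passes the filter; COUNT 3 under the default budget (300).
candidates = list(range(1000))
print(filtered_search(candidates, lambda x: x % 10 == 0, count=3))  # [0, 10, 20]
```

With a very selective predicate the budget runs out first and fewer than COUNT results come back, which is precisely the trade-off FILTER-EF lets you tune.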

 

Fundamental Commands (Summary)

  • VADD: Adds an element with its vector.
    • Key options:
      • REDUCE: Reduce Dimensions: Applies random projection to reduce vector dimensionality.
      • FP32/VALUES: Input Format. Specifies if the vector is sent as binary data (FP32) or numeric strings (VALUES).
      • Q8/BIN/NOQUANT: Quantization: Chooses the compression type for the vector (int8, binary, or none).
      • EF (build): Construction Effort: Controls how accurately to search for neighbors during insertion (impacts index quality and VADD time).
      • M: Graph Connections. Maximum number of links per node in the HNSW graph.
      • SETATTR: JSON Attributes: Associates JSON data with the element, usable later for filtering.
      • CAS: Partial Threading: Executes part of the insertion (candidate search) in a separate thread to reduce command latency.
  • VSIM: Performs similarity search.
    • Key options:
      • ELE/FP32/VALUES: Query Type: Indicates whether the query is the name of an existing element (ELE) or a vector (FP32/VALUES).
      • WITHSCORES: Show Scores: Also returns the similarity score for each result.
      • COUNT: Number of Results. Specifies the maximum number of similar elements to return.
      • EF (search): Search Effort: Controls the accuracy (recall) vs speed of the search in the HNSW graph.
      • FILTER expression: Filter Results: Applies a filter based on the JSON attributes of the elements.
      • FILTER-EF: Filtered Search Effort: Limit on the specific effort for searches using FILTER.
      • TRUTH: Exact Search. Performs a linear scan (slow) to find the perfect results (useful for testing recall).
      • NOTHREAD: No Thread. Forces search execution in the main Redis thread (can cause blocking).
  • VREM: Physically removes the element and its vector, freeing memory.
  • VEMB: Retrieves the (approximated / dequantized) vector of an element. RAW returns the internal representation.

 

VEMB movie_embeddings "Die Hard"
   1) "0.5142386555671692"
   2) "0.09641975164413452"
   3) "-0.2892592251300812"
   4) "1.060617208480835"
   5) "-0.16069957613945007"
   ...

 

VEMB movie_embeddings "Die Hard" RAW
1) int8
2) "\x10\x03\xf7!\xfb\xe7\xe3\xf8\xfd\xfa\x0f\n\x04\x01\x13\..."
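A rough sketch of why the vector returned by VEMB under Q8 is an approximation: int8 quantization maps each component to [-127, 127] with a per-vector scale and dequantizes on read. The module's actual scheme may differ in detail; this only illustrates the principle.

```python
# Toy int8 (Q8-style) quantization: scale components into [-127, 127],
# store them as small integers, and reconstruct approximate floats on read.
def quantize_q8(vec):
    scale = max(abs(x) for x in vec) / 127.0
    return [round(x / scale) for x in vec], scale

def dequantize_q8(qvec, scale):
    return [q * scale for q in qvec]

vec = [0.514, 0.096, -0.289, 1.061]  # first components of a sample embedding
q, scale = quantize_q8(vec)
approx = dequantize_q8(q, scale)
max_err = max(abs(a - v) for a, v in zip(approx, vec))
print(max_err < 0.01)  # the quantization error stays small
```

Each component shrinks from 4 bytes to 1, which is where the roughly 4x memory saving of Q8 comes from, at the cost of the small reconstruction error shown here.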

 

  • VSETATTR: Sets/updates/removes JSON attributes for filtering.
  • VGETATTR: Retrieves JSON attributes.
  • VDIM, VCARD: Get the vector dimension and the number of elements.

 

VDIM movie_embeddings
(integer) 1024

VCARD movie_embeddings
(integer) 998

 

  • VINFO, VLINKS: Introspection commands for debugging and analysis.

 

VINFO word_embeddings_bin
 1) quant-type
 2) int8
 3) hnsw-m
 4) (integer) 16
 5) vector-dim
 6) (integer) 300
 7) projection-input-dim
 8) (integer) 0
 9) size
10) (integer) 3000000
11) max-level
12) (integer) 11
13) attributes-count
14) (integer) 3000000
15) vset-uid
16) (integer) 1
17) hnsw-max-node-uid
18) (integer) 3000000

 

  • VRANDMEMBER: Returns random elements from a set.

 

Our experience with Redis Vector Sets

Armed with the preliminary build provided by antirez, we undertook the integration of Redis Vector Sets into our existing RAG system. Our goal was twofold: evaluate the ease of integration and compare performance with ChromaDB, our previous solution.

 

First Impressions

Redis's performance made itself felt immediately, both during the insertion phase and the query phase. VSIM queries, both on a contained dataset like our use case and on sizeable test datasets, were extremely fast. We also tried datasets with binary quantization to experiment with boosted performance.

We tried the various options for both the insertion command (VADD) and the query command (VSIM) to experiment with different behaviors, and the final choice was as follows:

  • Insertion: VADD FP32 [BLOB…] ID M 16 EF 200
  • Query: VSIM FP32 [BLOB…] COUNT 5 WITHSCORES EF 500

During testing, much of which was done together with antirez, we encountered a crucial aspect: the method of sending vectors from the client to Redis. Initially, we used the VALUES option, which involves sending the vector as a list of strings representing float numbers (e.g., VALUES 3 0.145 0.2345 0.3234). We noticed that:

  • Serializing large vectors into this string format on the client side can be surprisingly slow and consume CPU on the client itself.
  • The resulting payload is very verbose, increasing bandwidth usage and network latency compared to typical Redis commands.

We then tried the FP32 option, which allows sending the vector as a compact binary blob of 32-bit floats. The improvement was drastic: we observed speedups on the order of 10-20x in the overall time to send the VADD or VSIM command (when the query is a vector), simply by changing the transmission format. This is an important point for anyone implementing an application that interacts with Redis Vector Sets: prefer FP32 over VALUES for sending vectors. You will find this guidance in the official documentation!
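A sketch of the client-side packing, assuming the common little-endian 32-bit float layout and Python's standard struct module:

```python
import struct

# Pack a float vector into a compact FP32 binary blob (little-endian
# 32-bit floats) instead of formatting each number as a decimal string
# for the VALUES form.
def to_fp32_blob(vec):
    return struct.pack(f"<{len(vec)}f", *vec)

vec = [0.145, 0.2345, 0.3234]
blob = to_fp32_blob(vec)
print(len(blob))  # 3 components * 4 bytes each = 12 bytes

# For comparison, the VALUES form sends each component as text:
values_payload = " ".join(str(x) for x in vec)
# The blob stays at exactly 4 bytes per component regardless of how many
# decimal digits each number has, and needs no string formatting/parsing.
```

The blob would then be passed as the binary argument of VADD/VSIM with the FP32 option; the savings grow with the vector dimension, since a 1024-dimensional embedding is always 4096 bytes as a blob but far more as decimal strings.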

We are eager to see the stable version of Redis Vector Sets and continue exploring its potential.

 

Conclusions

Our experience with Redis Vector Sets, although based on a preliminary version, was extremely positive. The combination of Redis performance, an optimized HNSW implementation, advanced features like filtered search and quantization, and the possibility of consolidating data infrastructure, make it a very promising solution.

The main strengths we found are:

  • Speed: Searches are extremely fast, especially leveraging quantization.
  • Integrated Filtered Search: A powerful feature very useful in many contexts.
  • Memory Efficiency: Quantization (especially Q8) offers an excellent trade-off between memory usage and accuracy.

The importance of using the FP32 binary format for vector transmission cannot be overstated if you want maximum client-side performance. We thank Salvatore Sanfilippo for the opportunity to preview this exciting development in the world of Redis and vector search, and for having, albeit unintentionally, let us make a small contribution.

Author

Andrea Ortis

Co-Founder and CTO of Devmy, passionate about technology and innovation, and a footballer who never was.

Besides leading the company's technological development, he is always on the lookout for complex challenges, new architectures, and cutting-edge solutions. When he is not focused on the company's projects, you will find him experimenting with emerging technologies, exploring new trends, or brainstorming ways to make Devmy even more innovative.

Devmy on LinkedIn
