In the world of Artificial Intelligence and Machine Learning, the ability to understand and search data based on its semantic meaning, rather than exact keyword matches, has become fundamental. This is the realm of Vector Databases and Approximate Nearest Neighbor (ANN) algorithms.
Recently, at Devmy, we had the exceptional opportunity to preview Redis Vector Sets, a new Redis module currently being optimized by its creator, Salvatore Sanfilippo, aka @antirez.
Thanks to his willingness to provide us with a preliminary binary for macOS, we were able to integrate this new module into our RAG (Retrieval-Augmented Generation) system, replacing our previous solution, ChromaDB, for a direct and practical comparison. In this article, we will share our findings, starting from the general concepts of vector databases and the HNSW algorithm, then diving into the specifics of Redis Vector Sets and recounting our experience.
Traditionally, databases excel at retrieving data based on exact queries or logical conditions on structured fields. However, much modern information (text, images, audio) is inherently unstructured. To handle it, machine learning models (often Deep Learning) are used to transform this data into high-dimensional numerical vectors, known as embeddings. These vectors capture the semantic essence of the original data: semantically similar data will have vectors close to each other in the vector space.
A Vector Database is a system specifically designed to efficiently store, index, and query these embeddings. The primary goal is to find the vectors (and thus the associated original data) closest to a given query vector, according to a distance metric (like cosine similarity or Euclidean distance). This process is known as Nearest Neighbor (NN) search.
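To make the notion of "closeness" concrete, here is a minimal pure-Python sketch of cosine similarity, one of the distance metrics mentioned above. The toy 3-dimensional vectors are ours for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means the same
    direction, 0.0 orthogonal, -1.0 opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the first two point in similar directions,
# the third is (nearly) orthogonal to the first.
v_cat = [1.0, 0.9, 0.1]
v_kitten = [0.9, 1.0, 0.2]
v_car = [-0.9, 1.0, 0.0]

print(cosine_similarity(v_cat, v_kitten))  # close to 1.0
print(cosine_similarity(v_cat, v_car))     # much lower
```

A vector database performs exactly this kind of comparison, but against millions of stored vectors at once, which is why the indexing strategy discussed next matters so much.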
The applications are vast:
RAG systems: retrieving relevant context from a knowledge base to improve the responses of generative language models.
Finding the exact nearest neighbor (Exact Nearest Neighbor) in a high-dimensional space is computationally very expensive, often requiring comparison with every single vector in the database. For large datasets, this is impractical.
This is where approximate nearest neighbor (ANN) algorithms come in. These algorithms sacrifice a small percentage of precision in exchange for a huge gain in search speed. There are several families of ANN algorithms, including:
Graph-based: like HNSW (Hierarchical Navigable Small World), they build multi-level proximity graphs.
HNSW is one of the most popular and performant graph-based ANN algorithms, known for its excellent balance between search speed, result quality (recall), and index construction speed. It is the algorithm underlying Redis Vector Sets. The idea is to create connections between vectors in a way that allows for both rapid global traversal of the graph and accurate searching for similar vectors. It does this by creating multiple layers of vectors, where each layer includes a subset of the vectors in the layer below. This means that as you move up the layers, finding the correct general area of the graph becomes faster.
Once the general area is found, the search becomes more local, moving down the layers, where more vectors are included. Eventually, the search reaches the bottom layer, which includes all available vectors. Here, a more thorough search among neighbors is performed to refine the results. This method allows a search to quickly find the right general area of the graph before performing a full search of the dense bottom layer. Intuitively, an HNSW graph can be thought of as a kind of multidimensional skip list, where the upper layers are used for global search and the lower layers for local search. HNSW has some key parameters that influence its performance and memory usage, chiefly M (the maximum number of connections per node) and EF (the exploration effort during construction and search).
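The skip-list intuition can be sketched in a few lines: each node draws its top layer from a geometric-like distribution, so each layer holds roughly a fixed fraction of the one below. This is an illustrative sketch based on the original HNSW paper's level-assignment rule, not Redis's actual code:

```python
import math
import random

def random_level(m: int = 16, rng: random.Random = random) -> int:
    """Draw a node's top layer: layer 0 holds every node, and each
    higher layer keeps roughly a 1/m fraction of the layer below."""
    ml = 1.0 / math.log(m)  # normalisation factor from the HNSW paper
    return int(-math.log(rng.random()) * ml)

rng = random.Random(42)
levels = [random_level(16, rng) for _ in range(100_000)]

# Layer populations shrink roughly geometrically (~1/16 per layer):
for layer in range(4):
    print(layer, sum(1 for l in levels if l >= layer))
```

With M = 16, about 1/16 of the nodes reach layer 1, 1/256 reach layer 2, and so on, which is what makes the coarse-to-fine descent fast.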
Redis Vector Sets introduces a new data type in Redis, aptly called the Vector Set, in which string elements are associated with a vector. The goal is to allow the insertion of vector elements and the efficient retrieval of the subsets most similar to a query vector or to an element already present in the set. Based on an optimized HNSW implementation, it offers the performance Redis is known for.
The HNSW implementation in Redis vector sets introduces some significant improvements compared to the standard algorithm, focusing particularly on robustness and dynamic data management. A fundamental difference is that all connections between all nodes in the graph are bidirectional: node A is connected to node B, and B is also connected to A.
This characteristic enables one of the most distinctive features of Redis Vector Sets: actual node deletion (VREM). Unlike many implementations that use tombstoning (marking a node as deleted and removing it later in a separate garbage-collection phase), here the removal is immediate, and the memory is actually freed right away. Furthermore, when a node is deleted, a specific reconciliation algorithm comes into play (another extension not found in the original papers) which reconnects the neighboring nodes left "orphaned", preserving the HNSW graph's quality criteria.
The practical implication for the user is significant; the HNSW index can be treated as a truly dynamic data structure. It is possible to add and remove elements freely, even in large quantities, without worrying about performance degradation or having to manually manage memory cleanup. The index continues to function correctly, and memory is recovered transparently. This ability to manage highly mutable vector indices is a specific strength of this implementation.
The module exposes a family of dedicated commands (VADD, VSIM, VREM, VEMB, VINFO, VSETATTR, etc.) that integrate naturally with the Redis ecosystem. Similarity searches can be combined with filter expressions on JSON attributes (e.g. .year > 2020 and .genre == "science_fiction"). This combines semantic search with traditional scalar filters in the same operation. The FILTER-EF option controls how much additional effort to dedicate to the search when many elements are discarded by the filter. VADD ... CAS can execute the computationally expensive part of the insertion, the search for candidate neighbors, in the background, while the command itself is executed in the main thread.
When you use FILTER, Redis doesn't first perform the vector search and then filter the results; it integrates the filter into the HNSW search process. As the HNSW algorithm explores the graph and identifies candidate nodes, i.e. vectors potentially close to your query, it immediately checks whether that candidate also passes your FILTER condition. Only candidates that are both vectorially close and satisfy the filter are considered valid for the final result.
But beware! There is, however, a potential problem: imagine your filter is very selective, meaning only a small percentage of your data satisfies it. The HNSW algorithm might quickly find many vectors close to your query, but most of them are discarded by the filter because they don't satisfy it. To find the number of results you requested (e.g., COUNT 10), Redis must continue exploring the graph much longer and consider many more candidates than in an unfiltered search. This takes more time and CPU resources.
In the worst case, if the elements satisfying the filter are extremely rare, Redis might end up having to examine almost the entire graph, making the search very slow. To prevent queries with very selective filters from becoming excessively slow, Redis Vector Sets has a default effort limit.
By default, when using FILTER, Redis will not examine more than COUNT * 100 total candidates to try to satisfy your request. So if you ask for COUNT 10, it will examine at most 10 * 100 = 1000 candidates. This has the advantage of keeping performance predictable, but also a disadvantage: if it doesn't find 10 elements that pass the filter within those 1000 candidates, it will simply return fewer than 10 results, i.e. only the ones it found.
Through the FILTER-EF parameter, you can override that automatic limit of COUNT * 100. For example, if you set FILTER-EF 5000, you tell Redis: "If necessary, you are authorized to examine up to 5000 candidates to find my COUNT filtered results." This increases the probability of getting the full number of results even with selective filters, but it can increase search time if matches are indeed rare. Setting FILTER-EF to 0 removes the limit entirely: Redis will search relentlessly (potentially scanning the whole graph) until it finds COUNT results or exhausts the graph. It's powerful but potentially very slow.
It is important to note that even if you set a very high FILTER-EF (e.g., 5000), Redis won't necessarily do all that work. If the elements satisfying the filter are common, the HNSW algorithm will stop much earlier thanks to its early-stopping mechanisms, as soon as it has found COUNT results of good quality.
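The interplay between the candidate budget, FILTER-EF, and early stopping described above can be modelled with a toy simulation. This is our illustrative model, not Redis's actual graph traversal: candidates arrive in similarity order, each either passes the filter or not, and the scan stops at COUNT results or when the effort budget runs out.

```python
def filtered_search(candidates, passes_filter, count, filter_ef=None):
    """Toy model of VSIM ... FILTER: scan candidates in similarity
    order, keep those that pass the filter, and stop either at `count`
    results or at the effort budget (COUNT * 100 by default, as in
    Redis Vector Sets; a budget of 0 means no limit, like FILTER-EF 0)."""
    budget = count * 100 if filter_ef is None else filter_ef
    results, examined = [], 0
    for c in candidates:
        if budget and examined >= budget:
            break  # effort limit hit: may return fewer than `count`
        examined += 1
        if passes_filter(c):
            results.append(c)
            if len(results) == count:
                break  # early stop: enough good results found
    return results, examined

# 100,000 candidates already ordered by similarity; a very selective
# filter that only ~0.1% of elements satisfy.
universe = range(100_000)
selective = lambda c: c % 1000 == 0

print(filtered_search(universe, selective, count=10))  # default budget: falls short
print(filtered_search(universe, selective, count=10, filter_ef=20_000))  # finds all 10
```

With the default budget of 10 * 100 = 1000 the scan finds only one match before hitting the limit; raising FILTER-EF lets it keep exploring until all ten are found.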
VADD: adds an element with its vector. Its options:
- REDUCE: applies a random projection to reduce vector dimensionality.
- FP32 / VALUES: input format; specifies whether the vector is sent as binary data (FP32) or as numeric strings (VALUES).
- Q8 / BIN / NOQUANT: quantization; chooses the compression type for the vector (int8, binary, or none).
- EF (build): construction effort; controls how accurately neighbors are searched during insertion (impacts index quality and VADD time).
- M: maximum number of links per node in the HNSW graph.
- SETATTR: associates JSON data with the element, usable later for filtering.
- CAS: partial threading; executes part of the insertion (the candidate search) in a separate thread to reduce command latency.

VSIM: performs a similarity search. Its options:
- ELE / FP32 / VALUES: query type; indicates whether the query is the name of an existing element (ELE) or a vector (FP32/VALUES).
- WITHSCORES: also returns the similarity score for each result.
- COUNT: maximum number of similar elements to return.
- EF (search): controls the accuracy (recall) vs. speed trade-off of the search in the HNSW graph.
- FILTER expression: applies a filter based on the JSON attributes of the elements.
- FILTER-EF: limit on the specific effort for searches using FILTER.
- TRUTH: exact search; performs a linear scan (slow) to find the perfect results (useful for testing recall).
- NOTHREAD: forces search execution in the main Redis thread (can cause blocking).

VREM: physically removes the element and its vector, freeing memory.

VEMB: retrieves the (approximated/dequantized) vector of an element; RAW returns the internal representation.
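As a sketch of how a client might assemble these commands, here is some Python that builds VADD and VSIM argument lists using the FP32 binary format. The helper names are ours (not part of the module), the option order mirrors the examples later in this article, and the resulting lists can be sent with a generic client call such as redis-py's execute_command(*args):

```python
import struct

def vadd_args(key, vector, element, *, m=16, ef=200, attrs=None):
    """Build the argument list for VADD with an FP32 binary vector."""
    blob = struct.pack(f"<{len(vector)}f", *vector)  # little-endian 32-bit floats
    args = ["VADD", key, "FP32", blob, element, "M", str(m), "EF", str(ef)]
    if attrs is not None:
        args += ["SETATTR", attrs]  # JSON string, usable later with FILTER
    return args

def vsim_args(key, vector, *, count=5, with_scores=True, flt=None):
    """Build the argument list for VSIM with an FP32 query vector."""
    blob = struct.pack(f"<{len(vector)}f", *vector)
    args = ["VSIM", key, "FP32", blob, "COUNT", str(count)]
    if with_scores:
        args.append("WITHSCORES")
    if flt is not None:
        args += ["FILTER", flt]
    return args

args = vsim_args("movie_embeddings", [0.1, 0.2, 0.3],
                 flt='.year > 2020 and .genre == "science_fiction"')
print(args[:3], len(args[3]))  # the blob is 3 * 4 = 12 bytes
```

In a real application you would then run, for example, `r.execute_command(*args)` on a connected redis-py client against a Redis instance with the Vector Sets module loaded.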
VEMB movie_embeddings "Die Hard"
1) "0.5142386555671692"
2) "0.09641975164413452"
3) "-0.2892592251300812"
4) "1.060617208480835"
5) "-0.16069957613945007"
...
VEMB movie_embeddings "Die Hard" RAW
1) int8
2) "\x10\x03\xf7!\xfb\xe7\xe3\xf8\xfd\xfa\x0f\n\x04\x01\x13\...
VSETATTR: sets/updates/removes JSON attributes for filtering.

VGETATTR: retrieves JSON attributes.

VDIM, VCARD: get the vector dimension and the number of elements.
VDIM movie_embeddings
(integer) 1024

VCARD movie_embeddings
(integer) 998
VINFO, VLINKS
: Introspection commands for debugging and analysis.
VINFO word_embeddings_bin
 1) quant-type
 2) int8
 3) hnsw-m
 4) (integer) 16
 5) vector-dim
 6) (integer) 300
 7) projection-input-dim
 8) (integer) 0
 9) size
10) (integer) 3000000
11) max-level
12) (integer) 11
13) attributes-count
14) (integer) 3000000
15) vset-uid
16) (integer) 1
17) hnsw-max-node-uid
18) (integer) 3000000
VRANDMEMBER
: Returns random elements from a set.
Armed with the preliminary build provided by antirez, we undertook the integration of Redis Vector Sets into our existing RAG system. Our goal was twofold: evaluate the ease of integration and compare performance with ChromaDB, our previous solution.
Redis's performance made itself felt immediately, both during the insertion phase and the query phase. VSIM queries were extremely fast, both on a contained dataset like our use case and on larger test datasets. We also tried datasets with binary quantization to experiment with boosted performance.
We tried the various options for both the insertion command (VADD) and the query command (VSIM) to experiment with different behaviors, and the final choice was as follows:
VADD FP32 [BLOB…] ID M 16 EF 200
VSIM FP32 [BLOB…] COUNT 5 WITHSCORES EF 500
During testing, much of which was done together with antirez, we encountered a crucial aspect: the method of sending vectors from the client to Redis. Initially, we used the VALUES option, which involves sending the vector as a list of strings representing float numbers (e.g., VALUES 3 0.145 0.2345 0.3234). We noticed that this approach added significant overhead, since every component has to be serialized to a decimal string by the client and parsed back by the server.
We then tried using the FP32 option. This option allows sending the vector as a compact binary blob of 32-bit floats. The improvement was drastic: we observed speedups in the order of 10-20x in the overall time to send the VADD or VSIM command (when the query is a vector), simply by changing the transmission format. This is an important point to consider for anyone implementing an application that interacts with Redis Vector Sets: prefer FP32 over VALUES for sending vectors. You will find these instructions in the official documentation!
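The payload difference alone is easy to quantify: VALUES sends each component as a decimal ASCII string, while FP32 packs the whole vector into len(vector) * 4 bytes. A quick sketch of both encodings for a 1024-dimensional vector (the dimension of our movie_embeddings set):

```python
import random
import struct

random.seed(1)
vector = [random.uniform(-1, 1) for _ in range(1024)]

# VALUES: each component travels as a decimal string, e.g. "0.1343642..."
values_payload = [str(x) for x in vector]
values_bytes = sum(len(s) for s in values_payload)

# FP32: one compact little-endian binary blob, 4 bytes per component
fp32_payload = struct.pack(f"<{len(vector)}f", *vector)

print(values_bytes)       # ~20 KB of ASCII digits
print(len(fp32_payload))  # exactly 4096 bytes
```

The binary blob is roughly five times smaller and, just as importantly, requires no string formatting or parsing on either side, which is consistent with the speedups we observed.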
We are eager to see the stable version of Redis Vector Sets and continue exploring its potential.
Our experience with Redis Vector Sets, although based on a preliminary version, was extremely positive. The combination of Redis performance, an optimized HNSW implementation, advanced features like filtered search and quantization, and the possibility of consolidating data infrastructure, make it a very promising solution.
The main strengths we found are:
The importance of using the FP32 binary format for vector transmission cannot be overstated if you want maximum client-side performance.
We thank Salvatore Sanfilippo for the opportunity to preview this exciting development in the world of Redis and vector search, and for having, albeit unintentionally, allowed us to make a small contribution.
Co-Founder and CTO of Devmy, passionate about technology and innovation, and a would-be footballer.
Besides leading the company's technological development, he is always on the lookout for complex challenges, new architectures, and cutting-edge solutions. When he is not focused on the company's projects, you will find him experimenting with emerging technologies, exploring new trends, or brainstorming ways to make Devmy even more innovative.