Building a Hybrid Search App with Qdrant: A Technical Walkthrough
Recently I applied for a Search Solution Architect position at Qdrant. In case you don't know Qdrant: it is a vector similarity search engine and vector database. It provides a production-ready service with a convenient API to store, search, and manage points (vectors with an additional payload). Qdrant is tailored for extended filtering support, which makes it useful for all sorts of neural-network or semantic-based matching, faceted search, and other applications.
On this blog, we have already covered vector databases with Milvus in the following post:
Anyway, as part of the application process, I was tasked with creating a Hybrid Search App using Qdrant. Even though I didn't land the job, the journey resulted in a great technical experiment that I'd like to share here for the benefit of both beginners and experienced developers.
The complete project / code can be found here:
Project Objective
The assignment was to create a Hybrid Search App using Qdrant with the following features:
- Dense and Sparse Vector Search combined (hybrid search)
- Binary Quantization applied to dense vectors
- Late Interaction Model for re-ranking results
- Support for User-specific Filtering
- Qdrant Cluster with Two Nodes and Three Shards
All without a UI, just a backend pipeline and search engine.
Architecture Overview
In the repo that I shared before, you will find the following structure:
Part | Purpose |
---|---|
Docker Compose | Deploys a two-node Qdrant cluster locally with replication and sharding |
init_qdrant.py | Initializes the Qdrant collection, applies binary quantization, sets up indexes, downloads Wikipedia articles, assigns random user IDs, and ingests documents |
search.py | Performs hybrid search (dense + sparse) and re-ranks results using a cross-encoder |
requirements.txt | Lists all Python dependencies for easy environment setup |
Part 1: Docker Compose (Cluster Setup)
The first part is easy: a docker-compose file that spins up two Qdrant nodes and configures replication and three shards. I loved the way I was able to configure this, because I realized during this project how much I prefer passing configuration through environment variables instead of config files.
So, basically, I built a file to do the following (a minimal sketch follows the list):
- Spin up 2 Qdrant nodes
- Enable cluster mode
- Set up replication and 3 shards
- Expose necessary ports for HTTP and gRPC communication
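For reference, here is a minimal sketch of what such a docker-compose file can look like. It is not the exact file from the repo: the service names and the cluster environment variable are my assumptions based on Qdrant's documented cluster settings, but the start commands and port mappings mirror the docker ps output shown further below.

```yaml
# Hypothetical sketch of a two-node Qdrant cluster; not the repo's exact file.
services:
  qdrant_node1:
    image: qdrant/qdrant:latest
    command: ./qdrant --uri http://qdrant_node1:6335   # first node announces itself
    environment:
      QDRANT__CLUSTER__ENABLED: "true"
    ports:
      - "6333:6333"   # HTTP API
      - "6334:6334"   # gRPC API
      - "6335:6335"   # internal cluster (P2P) communication

  qdrant_node2:
    image: qdrant/qdrant:latest
    command: ./qdrant --bootstrap http://qdrant_node1:6335   # joins via node 1
    environment:
      QDRANT__CLUSTER__ENABLED: "true"
    ports:
      - "6336:6333"
      - "6337:6334"
      - "6338:6335"
    depends_on:
      - qdrant_node1
```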
To run this, you need Docker installed (or something like Rancher Desktop), then run the following:
docker compose up -d
If everything goes well, you will see something like this:
13c622fd5508 qdrant/qdrant:latest "./qdrant --bootstra…" 2 weeks ago Up 37 minutes 0.0.0.0:6336->6333/tcp, [::]:6336->6333/tcp, 0.0.0.0:6337->6334/tcp, [::]:6337->6334/tcp, 0.0.0.0:6338->6335/tcp, [::]:6338->6335/tcp qdrant_node2
80352a5ea705 qdrant/qdrant:latest "./qdrant --uri http…" 2 weeks ago Up 37 minutes 0.0.0.0:6333-6335->6333-6335/tcp, :::6333-6335->6333-6335/tcp qdrant_node1
To validate that the cluster is working, navigate to http://localhost:6333/cluster and you should get a response like the following:
{"result":{"status":"enabled","peer_id":1058100155072525,"peers":{"1058100155072525":{"uri":"http://qdrant_node1:6335/"},"8488705440897638":{"uri":"http://qdrant_node2:6335/"}},"raft_info":{"term":4,"commit":106,"pending_operations":0,"leader":8488705440897638,"role":"Follower","is_voter":true},"consensus_thread_status":{"consensus_thread_status":"working","last_update":"2025-04-25T20:05:01.089361773Z"},"message_send_failures":{}},"status":"ok","time":0.000017792}
Part 2: init_qdrant.py (Collection Initialization) and Ingesting data (Where the fun starts)
The next part was to initialize the collection in Qdrant with dense and sparse vectors, using the all-MiniLM-L6-v2 and Splade_PP_en_v1 models, and to create an index on user_id. So, basically, we are preparing the database for optimized hybrid search and user-specific queries.
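Conceptually, the collection setup looks roughly like the sketch below. It is a simplification rather than the exact code in the repo: the collection name hybrid_search and the shard/replication numbers come from this post, while the named vectors "dense"/"sparse" and the quantization parameters are my assumptions.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Start fresh: drop the collection if it already exists.
if client.collection_exists("hybrid_search"):
    client.delete_collection("hybrid_search")

client.create_collection(
    collection_name="hybrid_search",
    shard_number=3,          # three shards, as required by the assignment
    replication_factor=2,    # replicated across the two nodes
    vectors_config={
        "dense": models.VectorParams(
            size=384,                        # output dimension of all-MiniLM-L6-v2
            distance=models.Distance.COSINE,
        )
    },
    sparse_vectors_config={"sparse": models.SparseVectorParams()},
    quantization_config=models.BinaryQuantization(
        binary=models.BinaryQuantizationConfig(always_ram=True)
    ),
)

# Payload index on user_id so user-scoped filtering stays fast.
client.create_payload_index(
    collection_name="hybrid_search",
    field_name="user_id",
    field_schema=models.PayloadSchemaType.INTEGER,
)
```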
The full code of this file can be found in the repo:
Inside this script you will find the function ingest_data, which downloads the Wikipedia dataset from Hugging Face, assigns a random user_id to each article, and ingests the documents in batches with their associated metadata. So, basically, we are populating the collection with real-world, semi-structured data.
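A simplified sketch of what such an ingestion function can look like is below. The exact Wikipedia dataset configuration, batch size, and user-ID range in the repo may differ; the named vectors "dense"/"sparse" match the collection sketch above.

```python
import random
from datasets import load_dataset
from fastembed import SparseTextEmbedding
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

dense_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
sparse_model = SparseTextEmbedding("prithivida/Splade_PP_en_v1")
client = QdrantClient(url="http://localhost:6333")

def ingest_data(batch_size: int = 64, limit: int = 1000):
    # Stream Wikipedia articles from Hugging Face (dataset/config are illustrative).
    dataset = load_dataset("wikimedia/wikipedia", "20231101.en",
                           split="train", streaming=True)
    batch, next_id = [], 0
    for article in dataset:
        text = article["text"][:1000]  # truncate long articles before embedding
        dense_vec = dense_model.encode(text).tolist()
        sparse_vec = next(iter(sparse_model.embed([text])))
        batch.append(
            models.PointStruct(
                id=next_id,
                vector={
                    "dense": dense_vec,
                    "sparse": models.SparseVector(
                        indices=sparse_vec.indices.tolist(),
                        values=sparse_vec.values.tolist(),
                    ),
                },
                payload={"user_id": random.randint(1, 10), "text": text},
            )
        )
        next_id += 1
        if len(batch) == batch_size:
            client.upsert(collection_name="hybrid_search", points=batch)
            batch = []
        if next_id >= limit:
            break
    if batch:
        client.upsert(collection_name="hybrid_search", points=batch)
```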
The challenge for me at this point was the speed of ingestion. The task required a million datapoints, and ingestion was painfully slow; I wasn't sure whether the bottleneck was Python or Qdrant.
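One mitigation worth trying (I'm not claiming the repo does this) is to connect over gRPC and let the client upload batches in parallel instead of issuing sequential upserts, roughly like this:

```python
from qdrant_client import QdrantClient

# Connect over gRPC, which has less overhead than HTTP for bulk uploads.
client = QdrantClient(url="http://localhost:6333", prefer_grpc=True)

def upload_fast(points):
    # `points` is any iterable of models.PointStruct, e.g. produced by ingest_data above.
    client.upload_points(
        collection_name="hybrid_search",
        points=points,
        batch_size=256,   # larger batches amortize round-trip overhead
        parallel=4,       # several upload workers in parallel
        wait=False,       # don't block until every batch is fully indexed
    )
```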
Part 3: search.py (Hybrid Search and Re-ranking)
Once the data was flowing into the database, the next step was building the real magic: the search experience.
Inside search.py, we take a user's query and do several things under the hood (sketched in code after the list):
- First, we encode the query into a dense vector (semantic meaning) and a sparse vector (keyword/token-based meaning).
- Then, we run a hybrid search in Qdrant, combining the strengths of both approaches.
- After getting the initial candidates, we re-rank the results using a late interaction model based on a cross-encoder (cross-encoder/ms-marco-MiniLM-L-6-v2).
- The search also supports filtering by user_id if you want to restrict results to specific users, like a scoped search.
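To make the flow concrete, here is a simplified sketch of what such a pipeline can look like. It is not the exact code from the repo: the collection name and model names come from this post, while the named vectors "dense"/"sparse", the RRF fusion, and the candidate limits are my assumptions.

```python
from typing import Optional

from fastembed import SparseTextEmbedding
from qdrant_client import QdrantClient, models
from sentence_transformers import CrossEncoder, SentenceTransformer

dense_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
sparse_model = SparseTextEmbedding("prithivida/Splade_PP_en_v1")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
client = QdrantClient(url="http://localhost:6333")

def hybrid_search(query: str, user_id: Optional[int] = None, limit: int = 5):
    dense_vec = dense_model.encode(query).tolist()
    sparse_vec = next(iter(sparse_model.embed([query])))

    # Optional user-scoped filter.
    query_filter = None
    if user_id is not None:
        query_filter = models.Filter(
            must=[models.FieldCondition(key="user_id",
                                        match=models.MatchValue(value=user_id))]
        )

    # Hybrid search: prefetch dense and sparse candidates, fuse with RRF.
    hits = client.query_points(
        collection_name="hybrid_search",
        prefetch=[
            models.Prefetch(query=dense_vec, using="dense",
                            limit=20, filter=query_filter),
            models.Prefetch(
                query=models.SparseVector(
                    indices=sparse_vec.indices.tolist(),
                    values=sparse_vec.values.tolist(),
                ),
                using="sparse", limit=20, filter=query_filter,
            ),
        ],
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        limit=limit,
        with_payload=True,
    ).points

    # Re-rank the fused candidates with the cross-encoder.
    scores = reranker.predict([(query, h.payload["text"]) for h in hits])
    return sorted(zip(hits, scores), key=lambda pair: pair[1], reverse=True)
```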
The final outcome? You get smarter, more contextually relevant search results.
At the beginning of the ingestion (when I only had ~49 documents), the results weren't amazing (see the bonus track below), but once you start feeding in real data, the difference becomes noticeable.
Running a search is as simple as:
python3 search.py "Who is Lionel Messi?"
And if you want to filter by a specific user:
python3 search.py "Who is Lionel Messi?" --user-id 7
Result #1
ID: 560371
Original Score: 1.0000
Re-rank Score: 9.2263
User ID: 8
Text: Lionel Andrés Messi (; born 24 June 1987), also known as Leo Messi, is an Argentine professional footballer who plays as a forward for club Paris Saint-Germain and captains the Argentina national team. Often considered the best player in the world and widely regarded as one of the greatest players
Environment Setup (Getting Everything Up and Running)
Now that you know what each part does, here’s how to get your environment ready to actually spin up the cluster, load the data, and start searching.
1. Create and Activate a Virtual Environment
We always want a clean Python environment for projects like this. Run:
python3 -m venv env
source env/bin/activate
This will create a virtual environment named env and activate it.
2. Install the Project Dependencies
Install everything needed with just one command:
pip install -r requirements.txt
This pulls all the libraries we use: Qdrant client, Sentence Transformers, Hugging Face datasets, etc.
3. Start the Qdrant Cluster
Now we need to bring our two-node Qdrant cluster to life:
docker compose up -d
This launches two Qdrant nodes locally, with replication and sharding enabled, so it feels like a mini production cluster.
4. Initialize the Collection
Once the cluster is running, let’s initialize the collection and download the data.
python3 init_qdrant.py
This will:
- Delete any existing collection called hybrid_search
- Create a new collection with dense/sparse vectors and binary quantization
- Set up the user_id index for filtering later
- Download the Wikipedia dataset from Hugging Face
5. Search!
Finally, you can run a hybrid search with re-ranking:
python3 search.py "your query here"
For example, like I showed before:
python3 search.py "Who is Lionel Messi?"
Final Thoughts (Wrapping It Up)
Even though at the beginning only a handful of documents (around 49!) were ingested, the whole architecture is ready for scale.
This project demonstrates a complete flow: setting up a clustered environment, ingesting real-world data, doing hybrid vector search, applying binary quantization, and re-ranking results for better accuracy, all production-grade concepts.
It was a fantastic experiment born from a real-world application challenge at Qdrant, and even though the journey didn't end with the job, it absolutely sharpened my skills and made me want to go deeper into vector databases, hybrid search, and search architecture.
If you're starting your journey with vector databases, or planning to build smarter search systems, this setup is a perfect playground to learn, break things, and improve.
Happy experimenting! 🚀