Building a Hybrid Search App with Qdrant: A Technical Walkthrough

Recently I applied for a Search Solution Architect position at Qdrant. In case you don't know it, Qdrant is a vector similarity search engine and vector database. It provides a production-ready service with a convenient API to store, search, and manage points: vectors with an additional payload. Qdrant is tailored for extended filtering support, which makes it useful for all sorts of neural-network or semantic-based matching, faceted search, and other applications.

On this blog, we have already taken a first look at vector databases with Milvus in the following post:

Milvus Unleashed: A First Dive into Vector Databases
Milvus is a powerful open-source vector database that excels at managing unstructured data such as images, video, and audio. This article explores Milvus’s capabilities, including an image similarity search use case using Python.

Anyway, as part of the application process, I was tasked with creating a Hybrid Search App using Qdrant. Even though I didn't land the job, the journey turned into a great technical experiment that I'd like to share here for the benefit of both beginners and experienced developers.

The complete project / code can be found here:

GitHub - xe-nvdk/qdrant-hybrid-search-demo

Project Objective

The assignment was to create a Hybrid Search App using Qdrant with the following features:

  • Dense and Sparse Vector Search combined (hybrid search)
  • Binary Quantization applied to dense vectors
  • Late Interaction Model for re-ranking results
  • Support for User-specific Filtering
  • Qdrant Cluster with Two Nodes and Three Shards

All without a UI, just a backend pipeline and search engine.

Architecture Overview

In the repo that I shared before, you will find the following structure:

  • Docker Compose: Deploys a two-node Qdrant cluster locally with replication and sharding
  • init_qdrant.py: Initializes the Qdrant collection, applies binary quantization, sets up indexes, downloads Wikipedia articles, assigns random user IDs, and ingests documents
  • search.py: Performs hybrid search (dense + sparse) and re-ranks results using a cross-encoder
  • requirements.txt: Lists all Python dependencies for easy environment setup

Part 1: Docker Compose (Cluster Setup)

The first part is easy: a docker-compose file that spins up two Qdrant nodes with replication and three shards. I love the way I was able to configure this; during this project I realized how much I prefer passing configuration through environment variables instead of config files.

So, basically, I built a file that does the following (a trimmed sketch follows the list):

  • Spin up 2 Qdrant nodes
  • Enable cluster mode
  • Set up replication and 3 shards
  • Expose necessary ports for HTTP and gRPC communication
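Here is a minimal sketch of that compose file. The image, commands, and port mappings match the container listing shown further down; treat the remaining details as assumptions and defer to the repo for the full version.

services:
  qdrant_node1:
    image: qdrant/qdrant:latest
    command: ./qdrant --uri http://qdrant_node1:6335
    environment:
      - QDRANT__CLUSTER__ENABLED=true
    ports:
      - "6333:6333"   # HTTP API
      - "6334:6334"   # gRPC
      - "6335:6335"   # internal cluster communication

  qdrant_node2:
    image: qdrant/qdrant:latest
    command: ./qdrant --bootstrap http://qdrant_node1:6335
    environment:
      - QDRANT__CLUSTER__ENABLED=true
    ports:
      - "6336:6333"
      - "6337:6334"
      - "6338:6335"
    depends_on:
      - qdrant_node1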

To run this, you need Docker installed (or something like Rancher Desktop), and then run the following:

docker compose up -d

If everything goes well, you will see something like this:

13c622fd5508   qdrant/qdrant:latest         "./qdrant --bootstra…"   2 weeks ago      Up 37 minutes   0.0.0.0:6336->6333/tcp, [::]:6336->6333/tcp, 0.0.0.0:6337->6334/tcp, [::]:6337->6334/tcp, 0.0.0.0:6338->6335/tcp, [::]:6338->6335/tcp   qdrant_node2
80352a5ea705   qdrant/qdrant:latest         "./qdrant --uri http…"   2 weeks ago      Up 37 minutes   0.0.0.0:6333-6335->6333-6335/tcp, :::6333-6335->6333-6335/tcp                                                                           qdrant_node1

To validate that the cluster is working, navigate to http://localhost:6333/cluster; you should get a response like the following:

{"result":{"status":"enabled","peer_id":1058100155072525,"peers":{"1058100155072525":{"uri":"http://qdrant_node1:6335/"},"8488705440897638":{"uri":"http://qdrant_node2:6335/"}},"raft_info":{"term":4,"commit":106,"pending_operations":0,"leader":8488705440897638,"role":"Follower","is_voter":true},"consensus_thread_status":{"consensus_thread_status":"working","last_update":"2025-04-25T20:05:01.089361773Z"},"message_send_failures":{}},"status":"ok","time":0.000017792}

Part 2: init_qdrant.py (Collection Initialization and Ingestion, Where the Fun Starts)

The next part was initializing the collection in Qdrant with dense and sparse vectors, using the all-MiniLM-L6-v2 and Splade_PP_en_v1 models, and creating an index on user_id. So, basically, we are preparing the database for optimized hybrid search and user-specific queries.

The code for this file can be found in the repo:

qdrant-hybrid-search-demo/init_qdrant.py at main · xe-nvdk/qdrant-hybrid-search-demo
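To make that concrete, here is a minimal sketch of the initialization using the qdrant-client library. The 384 dimension is the standard output size of all-MiniLM-L6-v2; the distance metric and quantization details are my assumptions, so check the repo for the authoritative values. Note that the shard and replication counts for the cluster are applied here, at collection creation time.

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Start from a clean slate, then create the collection
if client.collection_exists("hybrid_search"):
    client.delete_collection("hybrid_search")

client.create_collection(
    collection_name="hybrid_search",
    shard_number=3,        # three shards spread across the two nodes
    replication_factor=2,  # each shard replicated on both nodes
    vectors_config={
        "dense": models.VectorParams(
            size=384,  # all-MiniLM-L6-v2 output dimension
            distance=models.Distance.COSINE,  # assumed metric
        ),
    },
    sparse_vectors_config={
        "sparse": models.SparseVectorParams(),
    },
    quantization_config=models.BinaryQuantization(
        binary=models.BinaryQuantizationConfig(always_ram=True),
    ),
)

# Index the user_id payload field so filtered searches stay fast
client.create_payload_index(
    collection_name="hybrid_search",
    field_name="user_id",
    field_schema=models.PayloadSchemaType.INTEGER,
)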

Inside this script, you will find the function ingest_data, which downloads the Wikipedia dataset from Hugging Face, assigns a random user_id to each article, and ingests the documents in batches with the associated metadata. So, basically, we are populating the collection with real-world, semi-structured data.
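To give you an idea, a trimmed-down version of that ingestion loop could look like the following. The batch size, the text truncation, the user_id range, and the fastembed wrapper for the SPLADE sparse vectors are illustrative assumptions; the repo has the exact implementation.

import random

from datasets import load_dataset
from fastembed import SparseTextEmbedding
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

client = QdrantClient(url="http://localhost:6333")
dense_model = SentenceTransformer("all-MiniLM-L6-v2")
sparse_model = SparseTextEmbedding("prithivida/Splade_PP_en_v1")

# Stream the dataset so we never hold the whole dump in memory
dataset = load_dataset("wikipedia", "20220301.en", split="train", streaming=True)

batch = []
for idx, article in enumerate(dataset):
    text = article["text"][:1000]  # truncate long articles before embedding
    sparse = next(iter(sparse_model.embed(text)))
    batch.append(
        models.PointStruct(
            id=idx,
            vector={
                "dense": dense_model.encode(text).tolist(),
                "sparse": models.SparseVector(
                    indices=sparse.indices.tolist(),
                    values=sparse.values.tolist(),
                ),
            },
            payload={"text": text, "user_id": random.randint(1, 10)},
        )
    )
    if len(batch) >= 64:  # upsert in batches rather than point-by-point
        client.upsert(collection_name="hybrid_search", points=batch)
        batch = []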

The challenge for me at this point was ingestion speed. The task required a million data points, and ingestion was super slow; I wasn't sure whether the bottleneck was Python or Qdrant.

Part 3: search.py (Hybrid Search and Re-ranking)

Once the data was flowing into the database, the next step was building the real magic: the search experience.

Inside search.py, we take a user's query and do several things under the hood (sketched in code after this list):

  • First, we encode the query into a dense vector (semantic meaning) and a sparse vector (keyword/token-based meaning).
  • Then, we run a hybrid search in Qdrant, combining the strengths of both approaches.
  • After getting the initial candidates, we re-rank the results with a cross-encoder (cross-encoder/ms-marco-MiniLM-L-6-v2), which plays the late interaction re-ranking role in this pipeline.
  • The search also supports filtering by user_id if you want to restrict results to specific users, like a scoped search.
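Here is a minimal sketch of that pipeline built on the qdrant-client query API. Reciprocal Rank Fusion as the hybrid strategy, the candidate limits, and the fastembed SPLADE wrapper are my assumptions; see search.py in the repo for the real implementation.

from fastembed import SparseTextEmbedding
from qdrant_client import QdrantClient, models
from sentence_transformers import CrossEncoder, SentenceTransformer

client = QdrantClient(url="http://localhost:6333")
dense_model = SentenceTransformer("all-MiniLM-L6-v2")
sparse_model = SparseTextEmbedding("prithivida/Splade_PP_en_v1")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def hybrid_search(query: str, user_id: int | None = None, limit: int = 5):
    # Optional user scoping, applied to both vector searches
    flt = None
    if user_id is not None:
        flt = models.Filter(
            must=[
                models.FieldCondition(key="user_id", match=models.MatchValue(value=user_id)),
            ],
        )

    sparse = next(iter(sparse_model.embed(query)))

    # Fetch candidates from both indexes, then fuse the two ranked lists
    fused = client.query_points(
        collection_name="hybrid_search",
        prefetch=[
            models.Prefetch(
                query=dense_model.encode(query).tolist(),
                using="dense",
                filter=flt,
                limit=20,
            ),
            models.Prefetch(
                query=models.SparseVector(
                    indices=sparse.indices.tolist(),
                    values=sparse.values.tolist(),
                ),
                using="sparse",
                filter=flt,
                limit=20,
            ),
        ],
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        limit=limit,
        with_payload=True,
    ).points

    # Cross-encoder re-ranking: score each (query, passage) pair jointly
    scores = reranker.predict([(query, hit.payload["text"]) for hit in fused])
    return sorted(zip(fused, scores), key=lambda pair: pair[1], reverse=True)

Calling hybrid_search("Who is Lionel Messi?", user_id=7) would then mirror the CLI examples shown below.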

The final outcome? You get smarter, more contextually relevant search results.

At the beginning of the ingestion (when I only had ~49 documents), results weren't amazing (see the bonus track below), but once you start feeding real data, the difference becomes noticeable.

Running a search is as simple as:

python3 search.py "Who is Lionel Messi?"

And if you want to filter by a specific user:

python3 search.py "Who is Lionel Messi?" --user-id 7
Result #1
ID: 560371
Original Score: 1.0000
Re-rank Score: 9.2263
User ID: 8
Text: Lionel Andrés Messi (; born 24 June 1987), also known as Leo Messi, is an Argentine professional footballer who plays as a forward for  club Paris Saint-Germain and captains the Argentina national team. Often considered the best player in the world and widely regarded as one of the greatest players 

Environment Setup (Getting Everything Up and Running)

Now that you know what each part does, here’s how to get your environment ready to actually spin up the cluster, load the data, and start searching.

1. Create and Activate a Virtual Environment

We always want a clean Python environment for projects like this. Run:

python3 -m venv env
source env/bin/activate

This will create a virtual environment named env and activate it.

2. Install the Project Dependencies

Install everything needed with just one command:

pip install -r requirements.txt

This pulls all the libraries we use: Qdrant client, Sentence Transformers, Hugging Face datasets, etc.
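For reference, the list looks roughly like this. It is unpinned here for brevity, and fastembed is my assumption for producing the SPLADE sparse vectors, so defer to the actual file in the repo:

qdrant-client
sentence-transformers
datasets
fastembed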

3. Start the Qdrant Cluster

Now we need to bring our two-node Qdrant cluster to life:

docker compose up -d

This launches two Qdrant nodes locally, with replication and sharding enabled, so it feels like a mini production cluster.

4. Initialize the Collection

Once the cluster is running, let’s initialize the collection and download the data.

python3 init_qdrant.py

This will:

  • Delete any existing collection called hybrid_search
  • Create a new collection with dense/sparse vectors and binary quantization
  • Set up the user_id index for filtering later
  • Download the Wikipedia dataset from Hugging Face
  • Ingest the articles in batches, each tagged with a random user_id

Finally, you can run a hybrid search with re-ranking:

python3 search.py "your query here"

For example, as I showed you before:

python3 search.py "Who is Lionel Messi?"

Final Thoughts (Wrapping It Up)

Even though only a handful of documents (around 49!) had been ingested at the start, the whole architecture is ready for scale.

This project demonstrates a complete flow: setting up a clustered environment, ingesting real-world data, running hybrid vector search, applying binary quantization, and re-ranking results for better accuracy. All of these are production-grade concepts.

It was a fantastic experiment born from a real-world application challenge at Qdrant, and even though the journey didn't end with the job, it absolutely sharpened my skills and made me want to go deeper into vector databases, hybrid search, and search architecture.

If you're starting your journey with vector databases, or planning to build smarter search systems, this setup is a perfect playground to learn, break things, and improve.

Happy experimenting! 🚀

Bonus Track

From a post I shared on LinkedIn while prototyping this project:
I'm working on prototyping a Hybrid Search App using Qdrant. The dataset? Wikipedia. Last night, I started ingesting documents into the database. Just a few minutes in, after only a handful of entries, I ran a simple test query: "Who is Lionel Messi?" And here’s what came back as result #1: "Jeffrey Lionel Dahmer (; May 21, 1960 – November 28, 1994), also known as the Milwaukee Cannibal or the Milwaukee Monster, was an American serial killer..." Not exactly the GOAT I was expecting. The takeaway? If your ingestion process isn’t clean, it doesn’t matter how advanced your methods are (sparse, dense, reranked, trained, or unicorn-optimized) the output will still be trash. Bad data in = bad results out. And no reranker in the world is going to turn Dahmer into Messi. #ArchitectThis #AI #MachineLearning #NLP #VectorSearch #HybridSearch #LLM #OpenSource #Qdrant #HappyFriday