In [1]:
from google.colab import drive
drive.mount('/content/drive')
base_dir = "/content/drive/MyDrive/huggingface-rag"
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [2]:
!pip install qdrant-client sentence-transformers fastembed
Requirement already satisfied: qdrant-client in /usr/local/lib/python3.12/dist-packages (1.16.1)
Requirement already satisfied: sentence-transformers in /usr/local/lib/python3.12/dist-packages (5.1.2)
Requirement already satisfied: fastembed in /usr/local/lib/python3.12/dist-packages (0.7.3)
In [3]:
import json
import os
import uuid
from tqdm import tqdm
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
from fastembed import SparseTextEmbedding
In [4]:
# Fine-tuned dense encoder saved on Drive. trust_remote_code=True allows the
# checkpoint's custom modeling code to run, so enable it only for checkpoints you trust.
output_path = f"{base_dir}/ft-jina-transformers-v1"
dense_model = SentenceTransformer(output_path, trust_remote_code=True)
dense_dim = dense_model.get_sentence_embedding_dimension()
print(f"Dense model embedding size: {dense_dim}")

# SPLADE sparse encoder, loaded through fastembed
sparse_model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
Dense model embedding size: 768
In [5]:
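qdrant_path = f"{base_dir}/qdrant_hybrid_db"

# Local-mode Qdrant keeps a .lock file in its storage directory. A Colab session
# that ended without closing the client can leave it behind and block reopening.
# Remove it only when no other process is using this path.
lock_file = os.path.join(qdrant_path, ".lock")
if os.path.exists(lock_file):
  try:
    os.remove(lock_file)
    print(f"Removed stale lock file: {lock_file}")
  except OSError as e:
    print(f"Warning: Could not remove lock file: {e}")

client = QdrantClient(path=qdrant_path)
collection_name = 'huggingface_transformers_docs'

# Start from an empty collection so re-running this cell does not mix old and new points.
if client.collection_exists(collection_name):
  client.delete_collection(collection_name)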

client.create_collection(
  collection_name=collection_name,
  vectors_config={
    "text-dense": models.VectorParams(
      size=dense_dim,
      distance=models.Distance.COSINE,
    )
  },
  sparse_vectors_config={
    "text-sparse": models.SparseVectorParams(
      index=models.SparseIndexParams(
        on_disk=True,
      )
    )
  }
)
print(f"Collection '{collection_name}' created.")
Removed stale lock file: /content/drive/MyDrive/huggingface-rag/qdrant_hybrid_db/.lock
Collection 'huggingface_transformers_docs' created.
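The collection above stores two named vectors per point: a dense one compared by cosine distance, and a sparse one given as parallel index/value lists, which Qdrant scores by dot product. As a rough, library-free sketch of what those two comparisons compute (the helper names `cosine_similarity` and `sparse_dot` and the toy vectors are hypothetical, not part of the Qdrant API):

```python
import math

def cosine_similarity(a, b):
  # Dot product divided by the product of the two vector lengths
  dot = sum(x * y for x, y in zip(a, b))
  norm_a = math.sqrt(sum(x * x for x in a))
  norm_b = math.sqrt(sum(y * y for y in b))
  return dot / (norm_a * norm_b)

def sparse_dot(indices_a, values_a, indices_b, values_b):
  # Only indices present in both vectors contribute to the score
  weights_b = dict(zip(indices_b, values_b))
  return sum(v * weights_b[i] for i, v in zip(indices_a, values_a) if i in weights_b)

# Toy inputs, chosen only for illustration
dense_score = cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0])
sparse_score = sparse_dot([3, 17, 42], [0.5, 2.0, 1.0], [17, 42, 99], [1.5, 0.5, 3.0])
print(dense_score, sparse_score)
```

This is why dense scores later in the notebook stay at or below 1 while sparse scores are unbounded: the sparse score is a plain sum of weight products over shared token indices.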
In [6]:
import hashlib

batch_size = 64
chunked_path = f"{base_dir}/chunks.jsonl"

def index_batch(batch_docs):
  """Embed one batch with both models and upsert it into the collection."""
  batch_texts = [doc["text"] for doc in batch_docs]

  dense_vectors = dense_model.encode(batch_texts, convert_to_tensor=False).tolist()
  sparse_vectors = list(sparse_model.embed(batch_texts))

  points = []
  for doc, d_vec, s_vec in zip(batch_docs, dense_vectors, sparse_vectors):
    metadata = doc.get("metadata", {})
    # MD5 of the chunk text gives a deterministic ID, so re-indexing the same
    # chunk overwrites it instead of adding a duplicate.
    doc_id_hash = hashlib.md5(doc["text"].encode("utf-8")).hexdigest()

    points.append(models.PointStruct(
      id=doc_id_hash,
      payload={
        "text": doc["text"],
        "source": metadata.get("source", "unknown"),
        "headers": metadata.get("headers", []),
        "full_metadata": metadata
      },
      vector={
        "text-dense": d_vec,
        "text-sparse": models.SparseVector(
          indices=s_vec.indices.tolist(),
          values=s_vec.values.tolist()
        )
      }
    ))

  client.upsert(
    collection_name=collection_name,
    points=points
  )

# First pass: count non-empty lines so tqdm can show a total.
with open(chunked_path, 'r', encoding='utf-8') as f_in:
  total_docs = sum(1 for line in f_in if line.strip())

# Second pass: embed and upsert in batches of batch_size.
with open(chunked_path, 'r', encoding='utf-8') as f_in:
  batch_docs = []

  for line in tqdm(f_in, desc="Building document index", total=total_docs):
    line = line.strip()
    if not line:
      continue

    batch_docs.append(json.loads(line))

    if len(batch_docs) >= batch_size:
      index_batch(batch_docs)
      batch_docs = []

  # Flush the last, partially filled batch.
  if batch_docs:
    print(f"Indexing final batch of {len(batch_docs)} documents...")
    index_batch(batch_docs)

print("Index building complete")
Building document index: 100%|██████████| 6473/6473 [14:47<00:00,  7.29it/s]
Indexing final batch of 9 documents...
Index building complete
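The indexing loop above uses the MD5 hex digest of each chunk's text as the point ID. A small sketch of why that works as an ID and what it implies: the digest is 32 hex characters (128 bits), which parses as a UUID, and identical text always maps to the same ID. The helper name `chunk_id` and the sample strings are hypothetical.

```python
import hashlib
import uuid

def chunk_id(text):
  # Hypothetical helper mirroring the ID scheme used in the indexing loop
  return hashlib.md5(text.encode("utf-8")).hexdigest()

id_a = chunk_id("AutoModel loads a pretrained model.")
id_b = chunk_id("AutoModel loads a pretrained model.")
id_c = chunk_id("AutoTokenizer loads a tokenizer.")

# A 32-character hex string is a valid UUID in its un-hyphenated form
as_uuid = uuid.UUID(hex=id_a)
print(id_a == id_b, id_a == id_c, as_uuid)
```

One consequence worth keeping in mind: two chunks with exactly the same text but different metadata collapse into a single point, and the later upsert overwrites the earlier payload.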
In [7]:
def print_results(results, method_name):
  """Print rank, score, source, and a 100-character text preview for each hit."""
  print(f"\n--- {method_name} Results ---")
  if not results:
    print("No results found.")
    return

  for i, point in enumerate(results):
    # Flatten newlines so each hit stays on one line
    text_preview = point.payload['text'][:100].replace('\n', ' ')
    source = point.payload.get('source', 'Unknown Source')
    score = point.score
    print(f"{i+1}. [{score:.4f}] {source} | {text_preview}...")

lock_file = os.path.join(qdrant_path, ".lock")
if os.path.exists(lock_file):
  try:
    os.remove(lock_file)
    print(f"Removed stale lock file: {lock_file}")
  except Exception as e:
    print(f"Warning: Could not remove lock file: {e}")

query_text = "How to use AutoModel?"

print(f"2. Connecting to Qdrant at {qdrant_path}...")
client = QdrantClient(path=qdrant_path)

print(f"3. Processing Query: '{query_text}'")

query_dense_vec = dense_model.encode(query_text).tolist()

query_sparse_gen = list(sparse_model.embed([query_text]))[0]
query_sparse_vec = models.SparseVector(
  indices=query_sparse_gen.indices.tolist(),
  values=query_sparse_gen.values.tolist()
)

# Dense-only search: `using` selects which named vector to query
results_dense = client.query_points(
  collection_name=collection_name,
  query=query_dense_vec,
  using="text-dense",
  limit=5,
  with_payload=True
).points
print_results(results_dense, "ONLY DENSE (Semantic)")

# Sparse-only search against the SPLADE vector
results_sparse = client.query_points(
  collection_name=collection_name,
  query=query_sparse_vec,
  using="text-sparse",
  limit=5,
  with_payload=True
).points
print_results(results_sparse, "ONLY SPARSE (Keyword/SPLADE)")

prefetch_dense = models.Prefetch(
  query=query_dense_vec,
  using="text-dense",
  limit=20,  # fetch more candidates than the final limit so RRF has a wider pool
)

prefetch_sparse = models.Prefetch(
  query=query_sparse_vec,
  using="text-sparse",
  limit=20,
)

# Hybrid search: fuse both prefetch result lists with Reciprocal Rank Fusion (RRF)
results_hybrid = client.query_points(
  collection_name=collection_name,
  prefetch=[prefetch_dense, prefetch_sparse],
  query=models.FusionQuery(fusion=models.Fusion.RRF),
  limit=5,
  with_payload=True
).points

print_results(results_hybrid, "HYBRID (RRF Fusion)")
Removed stale lock file: /content/drive/MyDrive/huggingface-rag/qdrant_hybrid_db/.lock
2. Connecting to Qdrant at /content/drive/MyDrive/huggingface-rag/qdrant_hybrid_db...
3. Processing Query: 'How to use AutoModel?'

--- ONLY DENSE (Semantic) Results ---
1. [0.7454] model_doc/albert.md | Context: ALBERT > Pipeline > AutoModel  #### AutoModel  ```py import torch from transformers import ...
2. [0.7410] model_doc/hubert.md | Context: HuBERT > Pipeline > AutoModel  #### AutoModel  ```python import torch from transformers imp...
3. [0.7388] model_doc/vits.md | Context: VITS > Pipeline > AutoModel  #### AutoModel  ```python import torch import scipy from IPyth...
4. [0.7324] model_doc/bart.md | Context: BART > Pipeline > AutoModel  #### AutoModel  ```py import torch from transformers import Au...
5. [0.7308] model_doc/vit_mae.md | Context: ViTMAE > AutoModel  #### AutoModel  ```python import torch import requests from PIL import ...

--- ONLY SPARSE (Keyword/SPLADE) Results ---
1. [19.7641] model_doc/auto.md | Context: Auto Classes  # Auto Classes  In many cases, the architecture you want to use can be guesse...
2. [18.8307] models.md | Context: Loading models > Model classes > AutoModel  #### AutoModel  The AutoModel class is a conven...
3. [18.6722] model_doc/cohere.md | Context: Cohere > Notes  ## Notes  - Don't use the dtype parameter in `~AutoModel.from_pretrained` i...
4. [16.4616] tasks/image_feature_extraction.md | Context: Image Feature Extraction > Getting Features and Similarities using `AutoModel`  ## Getting ...
5. [16.2580] troubleshooting.md | Context: Troubleshoot > ValueError: Unrecognized configuration class XYZ for this kind of AutoModel ...

--- HYBRID (RRF Fusion) Results ---
1. [0.5000] model_doc/albert.md | Context: ALBERT > Pipeline > AutoModel  #### AutoModel  ```py import torch from transformers import ...
2. [0.5000] model_doc/auto.md | Context: Auto Classes  # Auto Classes  In many cases, the architecture you want to use can be guesse...
3. [0.3333] model_doc/hubert.md | Context: HuBERT > Pipeline > AutoModel  #### AutoModel  ```python import torch from transformers imp...
4. [0.3333] models.md | Context: Loading models > Model classes > AutoModel  #### AutoModel  The AutoModel class is a conven...
5. [0.2500] model_doc/vits.md | Context: VITS > Pipeline > AutoModel  #### AutoModel  ```python import torch import scipy from IPyth...
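The fused scores above (0.5000, 0.3333, 0.2500) are consistent with reciprocal rank fusion computed as 1 / (rank + 2) with zero-based ranks, summed over every list a point appears in. The constant 2 is inferred from this output rather than taken from documentation, and the function name `rrf_fuse` is hypothetical. A minimal sketch, using the top three sources from the dense and sparse runs as stand-ins for point IDs:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=2):
  # Each list contributes 1 / (rank + k) for every item it contains
  scores = defaultdict(float)
  for ranking in rankings:
    for rank, doc_id in enumerate(ranking):
      scores[doc_id] += 1.0 / (rank + k)
  # sorted() is stable, so ties keep their first-seen order
  return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense_top = ["model_doc/albert.md", "model_doc/hubert.md", "model_doc/vits.md"]
sparse_top = ["model_doc/auto.md", "models.md", "model_doc/cohere.md"]

fused = rrf_fuse([dense_top, sparse_top])
for doc_id, score in fused[:5]:
  print(f"[{score:.4f}] {doc_id}")
```

Under that assumption, each of the five fused scores in the notebook equals a single-list contribution, which suggests that none of those chunks appeared in both prefetch lists. A chunk retrieved by both the dense and sparse prefetch would score higher than 0.5 at rank one.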
In [7]: