cognity-ai — Modular RAG Library

What's inside

Everything you need for production RAG

cognity-ai handles the plumbing so your agent handles the intelligence.

📄

Universal File Ingestion

PDF, DOCX, XLSX, PPTX, CSV, HTML, JSON, YAML, TXT, MD, and images. Embedded images in documents are auto-extracted and OCR'd inline.

👁️

Multimodal OCR

Gemini Vision, GPT-4o, Claude, Azure, Bedrock, or local Tesseract. Complex tables, handwriting, and mixed layouts handled natively.

🔀

4-Channel Hybrid Retrieval

Graph BFS + Vector semantic + Community global + Graph→Vector bridge, fused with Reciprocal Rank Fusion so that results ranked highly by several channels rise to the top.

🧠

NLP-First Extraction

spaCy handles ~70% of extraction free and locally. LLMs only augment what NLP misses — causal links, implicit associations.

⚡

Zero-Config Smart Defaults

Auto-selects the best methodology for your stack: hybrid_graph when Neo4j is available, with a graceful fallback to naive when it is not. No YAML required.

🔌

Plugin Architecture

Register custom loaders, chunkers, embedders, generators, and retrievers. Every component is an ABC — swap anything without touching the pipeline.

♻️

Incremental Ingestion

SHA-256 hash-based change detection. Unchanged documents are skipped entirely — zero API calls on re-ingest of an unchanged corpus.

🌍

Knowledge Lifecycle

Confirm, deprecate, detect conflicts, prune low-confidence triples. Full provenance tracking. Sources influence retrieval scores.
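
The hash-based change detection behind Incremental Ingestion can be pictured with a minimal sketch. It is not cognity-ai's implementation: `content_hash`, `needs_ingest`, and the in-memory `seen_hashes` dict are hypothetical names chosen for illustration, and a real pipeline would presumably persist the hashes between runs.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the hex SHA-256 digest of a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def needs_ingest(doc_id: str, data: bytes, seen: dict[str, str]) -> bool:
    """Return True when the document is new or changed, and record its hash.

    `seen` maps a document id to the last digest observed (hypothetical
    bookkeeping, kept in memory here for simplicity).
    """
    digest = content_hash(data)
    if seen.get(doc_id) == digest:
        return False  # unchanged: skip loading, extraction, and embedding
    seen[doc_id] = digest
    return True


seen_hashes: dict[str, str] = {}
first = needs_ingest("report.pdf", b"quarterly results", seen_hashes)
second = needs_ingest("report.pdf", b"quarterly results", seen_hashes)
third = needs_ingest("report.pdf", b"quarterly results, revised", seen_hashes)
print(first, second, third)  # prints: True False True
```

Because the check happens before any loader, extractor, or embedder runs, an unchanged document costs one hash computation and nothing else.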

Quick Start

From zero to RAG in minutes

Install, configure, ingest, and query — no infrastructure required for the default setup.

python

from cognity_ai import RAGLibrary

# Smart defaults: Gemini LLM + embedder, Neo4j graph, ChromaDB vector
rag = RAGLibrary(
    gemini_api_key="your-api-key",
    neo4j_password="your-password",
)

# Ingest — auto-detects format from extension
rag.ingest("report.pdf")
rag.ingest("data.xlsx")
rag.ingest("slides.pptx")
rag.ingest_dir("./docs/")        # recursive batch

# Build community graph (optional, enriches thematic queries)
rag.build_communities()

# Query
answer = rag.query("What are the key findings?")

# Query with full source attribution
result = rag.query_with_sources("Who founded Anthropic?")
print(result["answer"])
print(result["sources"])       # graph, vector, community
print(result["seed_entities"])  # ["Anthropic"]

python

from cognity_ai import RAGLibrary

rag = RAGLibrary(
    llm="openai",
    embedder="openai",
    vector_store="qdrant",
    graph_store="neo4j",
    openai_api_key="sk-...",
    neo4j_password="your-password",
)

rag.ingest("contract.docx")
answer = rag.query("What are the payment terms?")

python

# 100% offline — no API keys, no cloud services
# pip install "cognity-ai[sentence-transformers,faiss,networkx,nlp,ocr-local]"
from cognity_ai import RAGLibrary

rag = RAGLibrary(
    llm="ollama",             # local Ollama server
    embedder="sentence_transformers",
    vector_store="faiss",
    graph_store="networkx",
    ocr="tesseract",
)

rag.ingest("local_docs/report.pdf")
answer = rag.query("Summarize the main points")

python

# Ingest any supported file type; the format is auto-detected from the extension
rag.ingest("report.pdf")          # PDF: page-aware, embedded images
rag.ingest("contract.docx")       # Word: headings, tables, images
rag.ingest("financials.xlsx")     # Excel: per-sheet text
rag.ingest("deck.pptx")           # PowerPoint: slides + notes
rag.ingest("data.csv")            # CSV: header-aware
rag.ingest("page.html")           # HTML: tag stripping
rag.ingest("photo.jpg")           # Image: Gemini Vision OCR
rag.ingest("scan.png")            # Image: complex layout OCR
rag.ingest("notes.md")            # Markdown: heading detection
rag.ingest("config.yaml")         # YAML/JSON: key-value text

# Or ingest an entire directory at once
results = rag.ingest_dir("./knowledge_base/", recursive=True)
print(len(results), "files ingested")

RAG Methodologies

8 retrieval strategies, one API

Set the method at init time or override it per query. When none is specified, the library selects one automatically based on the available infrastructure.

hybrid_graph DEFAULT

4-channel fusion: Graph BFS + Vector cosine + Community global + Graph→Vector bridge. Weighted RRF. Best for multi-hop reasoning and structured knowledge bases.

naive

Pure vector cosine similarity. Default when no graph store is available. Fastest setup — no graph DB required.

vector_only

Vector semantic search + community summaries. Good middle ground — richer than naive but no graph traversal overhead.

graph_only

Graph traversal only. Best for fully structured knowledge bases where all facts are in the graph and semantic search adds noise.

parent_child

Child chunks for precise retrieval, parent chunks for rich generation context. Best for long documents where precision matters.

multi_query

LLM generates N query reformulations → retrieves all → merges via RRF. Best for complex, ambiguous, or multi-faceted questions.

microsoft_graphrag

Official MS GraphRAG local + global search. Wraps the microsoft/graphrag library. Best for MS-ecosystem workflows.

adaptive

Classifies each query and routes to the optimal sub-retriever. Factual → graph; broad → community; semantic → vector. Zero configuration needed.
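
Several of the strategies above merge ranked lists with Reciprocal Rank Fusion (RRF). The sketch below shows the weighted form of the standard formula, where each channel contributes `weight / (k + rank)` per result. The channel names, the weights, and the constant `k=60` (a commonly used value) are assumptions for illustration, not cognity-ai's actual settings.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights=None, k=60):
    """Fuse ranked lists of ids into one list of (id, score), best first.

    rankings: dict mapping a channel name to ids ordered best-first.
    weights:  optional dict of per-channel weights (default 1.0 each).
    """
    weights = weights or {}
    scores = defaultdict(float)
    for channel, ranked_ids in rankings.items():
        weight = weights.get(channel, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy rankings from three hypothetical channels.
rankings = {
    "graph": ["c1", "c3"],
    "vector": ["c2", "c1", "c4"],
    "community": ["c5", "c1"],
}
fused = weighted_rrf(rankings, weights={"graph": 2.0})
print([doc_id for doc_id, _ in fused[:2]])  # prints: ['c1', 'c3']
```

RRF only looks at ranks, so channels with incomparable raw scores (graph hop counts, cosine similarities) can be combined without normalizing them first.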

python — per-query override

# Default method set at init (other constructor arguments omitted here)
rag = RAGLibrary(rag_method="hybrid_graph")

# Override per-query without reinitializing
answer    = rag.query("Who founded X?")
thematic  = rag.query("Summarize all themes", method="microsoft_graphrag")
precise   = rag.query("What's on page 12?", method="parent_child")
multi_hop = rag.query("Compare AI safety approaches", method="multi_query")
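
To make the parent_child idea concrete, here is a small self-contained sketch: small child chunks are scored against the query, and the larger parent text is what gets returned as generation context. Word overlap stands in for vector similarity, and every function name here is hypothetical rather than part of cognity-ai.

```python
def split_children(parent_text: str) -> list[str]:
    """Split a parent chunk into sentence-sized child chunks."""
    return [s.strip() for s in parent_text.split(".") if s.strip()]


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def parent_child_retrieve(query: str, parents: dict[str, str], top_k: int = 1) -> list[str]:
    """Rank child chunks, then return the de-duplicated parent texts."""
    children = [
        (parent_id, child)
        for parent_id, text in parents.items()
        for child in split_children(text)
    ]
    ranked = sorted(children, key=lambda pc: overlap_score(query, pc[1]), reverse=True)
    parent_ids: list[str] = []
    for parent_id, _child in ranked[:top_k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[pid] for pid in parent_ids]


parents = {
    "sec1": "Payment is due in 30 days. Late fees apply after that.",
    "sec2": "The contract runs for two years. Renewal is automatic.",
}
context = parent_child_retrieve("when is payment due", parents, top_k=1)
print(context)  # prints: ['Payment is due in 30 days. Late fees apply after that.']
```

The match is made on the short sentence about payment, but the generator also sees the neighbouring sentence about late fees, which is the point of retrieving the parent.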

API Reference

Everything through one class

The full RAGLibrary public API.

Method	Returns	Description
`ingest(source, doc_id, status)`	`dict`	Ingest any file. Auto-detects format from extension.
`ingest_dir(directory, recursive)`	`list[dict]`	Batch ingest all supported files in a directory.
`ingest_text(text, doc_id, source_name)`	`dict`	Ingest raw text directly. Backward-compatible.
`ingest_batch(documents)`	`list[dict]`	Batch ingest from list of dicts.
`build_communities()`	`list[CommunityInfo]`	Run Leiden/Louvain community detection + summarization.
`remove_document(doc_id)`	`None`	Remove doc + all graph/vector data.
`sync(current_doc_ids)`	`list[str]`	Garbage collect docs not in provided set.
`query(question, top_k, method)`	`str`	Retrieve + generate. Returns answer string.
`query_with_sources(question, method)`	`dict`	Full answer + graph/vector/community sources + scores.
`retrieve(query, top_k, method)`	`list[RetrievalResult]`	Retrieval only, no generation.
`confirm(doc_id)`	`None`	Boost all triples from doc to confidence 1.0.
`deprecate(doc_id)`	`None`	Halve triple confidences. Penalizes retrieval scores.
`detect_conflicts(entity_name)`	`list[dict]`	Find contradictions for entity across sources.
`prune(threshold)`	`int`	Remove relations below confidence threshold.
`health_report()`	`dict`	Entity / relation / doc / community counts.
`register_loader(ext, cls)`	`None`	Register custom file loader for extension.
`register_embedder(name, cls)`	`None`	Register custom embedder by string key.
`register_retriever(name, cls)`	`None`	Register custom retriever by string key.
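
The lifecycle rows in the table (`confirm`, `deprecate`, `prune`) describe simple confidence arithmetic. The sketch below mirrors those descriptions on a plain list of triples. The `Triple` dataclass and the free functions are hypothetical stand-ins, not the library's data model.

```python
from dataclasses import dataclass


@dataclass
class Triple:
    subject: str
    relation: str
    obj: str
    doc_id: str
    confidence: float


def confirm(triples: list[Triple], doc_id: str) -> None:
    """Raise every triple from the given document to confidence 1.0."""
    for triple in triples:
        if triple.doc_id == doc_id:
            triple.confidence = 1.0


def deprecate(triples: list[Triple], doc_id: str) -> None:
    """Halve the confidence of every triple from the given document."""
    for triple in triples:
        if triple.doc_id == doc_id:
            triple.confidence /= 2


def prune(triples: list[Triple], threshold: float) -> int:
    """Drop triples below the threshold in place; return how many were removed."""
    kept = [t for t in triples if t.confidence >= threshold]
    removed = len(triples) - len(kept)
    triples[:] = kept
    return removed


triples = [
    Triple("Acme", "acquired", "Beta", doc_id="a", confidence=0.8),
    Triple("Acme", "based_in", "Paris", doc_id="b", confidence=0.6),
]
confirm(triples, "a")      # 0.8 becomes 1.0
deprecate(triples, "b")    # 0.6 becomes 0.3
removed = prune(triples, threshold=0.5)
print(removed, [t.doc_id for t in triples])  # prints: 1 ['a']
```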

python — custom plugin example

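from pathlib import Path

from cognity_ai import RAGLibrary
from cognity_ai.loaders.base import BaseLoader
from cognity_ai.models.document import Document

class MyLoader(BaseLoader):
    """Loads a proprietary text format as a single Document."""

    @property
    def supported_extensions(self) -> list[str]:
        return [".myext"]

    def load(self, path: str) -> list[Document]:
        # Read with an explicit encoding and close the file promptly.
        with open(path, encoding="utf-8") as fh:
            text = fh.read()
        # Derive doc_id from the file name so documents do not share one id.
        return [Document(doc_id=Path(path).stem, text=text)]

rag = RAGLibrary(...)
rag.register_loader(".myext", MyLoader)
rag.ingest("proprietary_file.myext")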
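
One way to picture what registering a loader by extension does is a small registry that maps a lowercase file suffix to a loader class. This is a sketch of the general pattern only: `LoaderRegistry` and `ToyLoader` are hypothetical, and cognity-ai's internal dispatch may work differently.

```python
from pathlib import Path


class ToyLoader:
    """Hypothetical stand-in for a BaseLoader subclass; reads nothing from disk."""

    def load(self, path: str) -> list[str]:
        return [f"loaded:{Path(path).name}"]


class LoaderRegistry:
    """Maps file extensions to loader classes and instantiates them on demand."""

    def __init__(self) -> None:
        self._loaders: dict[str, type] = {}

    def register(self, ext: str, loader_cls: type) -> None:
        # Normalize case so ".MYEXT" and ".myext" resolve to the same loader.
        self._loaders[ext.lower()] = loader_cls

    def loader_for(self, path: str):
        ext = Path(path).suffix.lower()
        loader_cls = self._loaders.get(ext)
        if loader_cls is None:
            raise ValueError(f"no loader registered for {ext!r}")
        return loader_cls()


registry = LoaderRegistry()
registry.register(".myext", ToyLoader)
docs = registry.loader_for("Reports/Q3.MYEXT").load("Reports/Q3.MYEXT")
print(docs)  # prints: ['loaded:Q3.MYEXT']
```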

Installation

Install only what you need

Optional dependency groups keep your environment lean. Mix and match any combination.

Default (recommended)

pip install "cognity-ai[default]"

# Includes:
# Gemini LLM + embedder
# Neo4j graph store
# ChromaDB vector store
# spaCy NLP extraction
# PDF, DOCX, XLSX, PPTX loaders
# Pillow (for OCR)

OpenAI + Qdrant stack

pip install "cognity-ai[openai,qdrant,pdf,nlp]"

Anthropic + Pinecone

pip install "cognity-ai[anthropic,pinecone,office]"

AWS Bedrock + FAISS

pip install "cognity-ai[bedrock,faiss,pdf]"

Fully offline (zero API cost)

pip install "cognity-ai[sentence-transformers,faiss,networkx,nlp,ocr-local]"

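# Then start Ollama and fetch a model. `ollama pull` needs a running server,
# so run it from a second terminal if `ollama serve` stays in the foreground:
ollama serve
ollama pull llama3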

All providers + stores + loaders

pip install "cognity-ai[all]"

# After install, download spaCy model:
python -m spacy download en_core_web_trf
# or lightweight:
python -m spacy download en_core_web_sm

all available extras

# LLM providers
[gemini] [openai] [azure] [anthropic] [bedrock] [vertex-ai] [cohere] [ollama]

# Embedders
[sentence-transformers]

# Vector stores
[chroma] [qdrant] [pinecone] [faiss] [weaviate] [milvus] [pgvector] [azure-search]

# Graph stores
[neo4j] [networkx] [memgraph] [arangodb] [microsoft-graphrag]

# NLP extraction
[nlp]    # spaCy

# File loaders
[pdf] [office] [csv] [html] [yaml]

# OCR
[ocr]       # Pillow (for LLM vision)
[ocr-local] # pytesseract + Pillow

# Convenience bundles
[full-loaders] [default] [all]

Multimodal RAG

🎬 Multimodal RAG (Experimental)

⚠️ Experimental API — may change between versions

Extend cognity-ai beyond text: ingest images, videos, and audio files using multimodal embedding models and transcription providers. This module is in active development and ships as an optional add-on that must be installed separately. Feedback and bug reports are welcome.

🖼️

Image RAG

Embed images with CLIP, SigLIP, or BLIP-2 alongside your text corpus. Query with natural language to retrieve semantically relevant images and mixed image-text results.

CLIP, SigLIP, BLIP-2

🎥

Video RAG

Extract keyframes for visual embeddings and transcribe speech with Whisper. Retrieve specific moments with timestamp-level precision. Supports MP4, MOV, AVI, and more.

Whisper, CLIP Frames, Timestamps

🎧

Audio RAG

Transcribe audio files locally with Whisper or via cloud APIs. Chunked transcripts are embedded and retrievable just like text documents. Supports MP3, WAV, FLAC, and OGG.

Whisper Local, AssemblyAI, Deepgram

Code Examples

python — Image RAG

from cognity_ai.embedders.clip import CLIPEmbedder
from cognity_ai.stores.vector.chroma_multimodal import ChromaMultimodalStore
from cognity_ai.pipeline.image import ImageIngestionPipeline
from cognity_ai.retrievers.multimodal import ImageRetriever

# Set up multimodal components
embedder = CLIPEmbedder(model="openai/clip-vit-base-patch32")
store    = ChromaMultimodalStore(collection="images")

# Ingest images
pipeline = ImageIngestionPipeline(embedder=embedder, store=store)
pipeline.ingest("product_photo.jpg")
pipeline.ingest_dir("./images/")

# Retrieve with natural language
retriever = ImageRetriever(embedder=embedder, store=store)
results   = retriever.retrieve("a red sports car parked outdoors", top_k=5)
for r in results:
    print(r.source_path, r.score)
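
Text-to-image retrieval with a CLIP-style model comes down to ranking image vectors by cosine similarity to the query vector, since both live in one embedding space. The sketch below uses hand-written 3-dimensional vectors in place of real embeddings. The file names and values are invented for illustration, and the functions are not cognity-ai APIs.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], items: dict[str, list[float]], k: int = 2):
    """Return the k (path, score) pairs most similar to the query, best first."""
    scored = [(path, cosine(query_vec, vec)) for path, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "image embeddings"; a real embedder would produce 512+ dimensions.
image_vectors = {
    "car.jpg": [0.9, 0.1, 0.0],
    "tree.jpg": [0.0, 1.0, 0.2],
    "truck.jpg": [0.7, 0.3, 0.1],
}
hits = top_k([1.0, 0.0, 0.0], image_vectors, k=2)
print([path for path, _ in hits])  # prints: ['car.jpg', 'truck.jpg']
```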

python — Video RAG

from cognity_ai.embedders.clip import CLIPEmbedder
from cognity_ai.ocr.whisper_local import WhisperLocalTranscriber
from cognity_ai.pipeline.video import VideoIngestionPipeline
from cognity_ai.retrievers.video import VideoRetriever

# Extract keyframes (visual) + transcribe speech
embedder     = CLIPEmbedder(model="openai/clip-vit-large-patch14")
transcriber  = WhisperLocalTranscriber(model_size="medium")

pipeline = VideoIngestionPipeline(
    embedder=embedder,
    transcriber=transcriber,
    frame_interval=2,   # extract 1 keyframe every 2 seconds
)
pipeline.ingest("product_demo.mp4")

# Retrieve with timestamp-level precision
retriever = VideoRetriever(embedder=embedder, transcriber=transcriber)
results   = retriever.retrieve("product unboxing scene", top_k=3)
for r in results:
    print(f"[{r.timestamp_start:.1f}s – {r.timestamp_end:.1f}s] {r.transcript_snippet}")
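
As a rough picture of how a fixed `frame_interval` could translate into the timestamp ranges shown in the results, the sketch below splits a video's duration into one window per sampled keyframe. `keyframe_windows` is a hypothetical helper. How cognity-ai actually derives `timestamp_start` and `timestamp_end` is not documented here.

```python
def keyframe_windows(duration_s: float, interval_s: float) -> list[tuple[float, float]]:
    """Return (start, end) windows, one per keyframe sampled every interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    windows = []
    index = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while index * interval_s < duration_s:
        start = index * interval_s
        end = min(start + interval_s, duration_s)  # clamp the last window
        windows.append((start, end))
        index += 1
    return windows


windows = keyframe_windows(duration_s=7, interval_s=2)
print(windows)  # prints: [(0, 2), (2, 4), (4, 6), (6, 7)]
```

A smaller interval gives finer timestamp precision at the cost of more frames to embed and store.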

python — Audio RAG

from cognity_ai.ocr.whisper_local import WhisperLocalTranscriber
from cognity_ai.pipeline.audio import AudioIngestionPipeline
from cognity_ai.retrievers.audio import AudioRetriever

# Transcribe audio locally — no API keys required
transcriber = WhisperLocalTranscriber(model_size="large-v3", language="en")

pipeline = AudioIngestionPipeline(transcriber=transcriber)
pipeline.ingest("earnings_call.mp3")
pipeline.ingest("interview.wav")
pipeline.ingest_dir("./podcasts/")     # batch

# Retrieve transcript segments by meaning
retriever = AudioRetriever(transcriber=transcriber)
results   = retriever.retrieve("revenue guidance for next quarter", top_k=5)
for r in results:
    print(f"[{r.source}] score={r.score:.3f}")
    print(r.text)
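
Before transcripts can be embedded, the timestamped segments a transcriber emits are typically grouped into larger chunks. The sketch below shows one simple greedy grouping that keeps the start and end times of each chunk. The `Segment` type, the `max_chars` limit, and the grouping rule are assumptions for illustration, not the library's chunker.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float
    end: float
    text: str


def merge(segments: list[Segment]) -> Segment:
    """Join consecutive segments into one, spanning first start to last end."""
    return Segment(segments[0].start, segments[-1].end, " ".join(s.text for s in segments))


def chunk_segments(segments: list[Segment], max_chars: int = 80) -> list[Segment]:
    """Greedily pack segments into chunks whose text stays within max_chars."""
    chunks: list[Segment] = []
    current: list[Segment] = []
    for seg in segments:
        candidate = " ".join(s.text for s in current + [seg])
        if current and len(candidate) > max_chars:
            chunks.append(merge(current))
            current = []
        current.append(seg)
    if current:
        chunks.append(merge(current))
    return chunks


segments = [
    Segment(0.0, 2.5, "Revenue grew ten percent."),
    Segment(2.5, 4.0, "Guidance is unchanged."),
    Segment(4.0, 6.0, "Questions came next."),
]
chunks = chunk_segments(segments, max_chars=50)
for chunk in chunks:
    print(f"[{chunk.start:.1f}s to {chunk.end:.1f}s] {chunk.text}")
```

Keeping the time span on each chunk is what lets a retrieved passage point back to the moment in the recording it came from.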

Multimodal Embedders

Embedder	Dimensions	Modalities	Best For	Install
`CLIPEmbedder`	512 / 768	Image + Text	General image-text retrieval, zero-shot classification	`cognity-ai[clip]`
`SigLIPEmbedder`	1152	Image + Text	Higher accuracy image understanding, multilingual	`cognity-ai[clip]`
`ImageBindEmbedder`	1024	Image + Text + Audio + Video + IMU + Depth	Cross-modal retrieval across 6 modalities in one space	`cognity-ai[multimodal]`
`BLIP2Embedder`	256	Image + Text	Document images, charts, screenshots with rich captions	`cognity-ai[multimodal]`

Transcription Providers

Provider	Class	Method	Install
Whisper (local)	`WhisperLocalTranscriber`	Offline — runs on CPU or CUDA GPU	`cognity-ai[whisper]`
OpenAI Whisper API	`WhisperAPITranscriber`	Cloud — `whisper-1` model	`cognity-ai[openai]`
AssemblyAI	`AssemblyAITranscriber`	Cloud — speaker diarisation, topic detection	`cognity-ai[multimodal]`
Deepgram	`DeepgramTranscriber`	Cloud — real-time streaming + batch	`cognity-ai[multimodal]`
Google Speech-to-Text	`GoogleSTTTranscriber`	Cloud — 125+ languages, medical models	`cognity-ai[vertex-ai]`

Install Commands

shell — multimodal extras

# CLIP image embedder (CLIP + SigLIP)
pip install "cognity-ai[clip]"

# Video frame extraction (ffmpeg-python + keyframe sampler)
pip install "cognity-ai[video]"

# Local Whisper transcription (runs fully offline)
pip install "cognity-ai[whisper]"

# Full multimodal bundle: all of the above + ImageBind + BLIP-2 + cloud transcribers
pip install "cognity-ai[multimodal]"

The RAG library that works with everything
