Any LLM · Any vector store · Any graph DB · Any file format.
One API. Smart defaults. Drop it into any AI agent.
cognity-ai handles the plumbing so your agent handles the intelligence.
PDF, DOCX, XLSX, PPTX, CSV, HTML, JSON, YAML, TXT, MD, and images. Embedded images in documents are auto-extracted and OCR'd inline.
Gemini Vision, GPT-4o, Claude, Azure, Bedrock, or local Tesseract. Complex tables, handwriting, and mixed layouts handled natively.
Graph BFS + Vector semantic + Community global + Graph→Vector bridge. Results from all four channels are fused with Reciprocal Rank Fusion (RRF) into a single ranked list.
spaCy handles ~70% of extraction free and locally. LLMs only augment what NLP misses — causal links, implicit associations.
Auto-selects the best methodology for your stack. hybrid_graph when Neo4j is available, falls back to naive gracefully. No YAML required.
Register custom loaders, chunkers, embedders, generators, and retrievers. Every component is an ABC — swap anything without touching the pipeline.
SHA-256 hash-based change detection. Unchanged documents are skipped entirely — zero API calls on re-ingest of an unchanged corpus.
Confirm, deprecate, detect conflicts, prune low-confidence triples. Full provenance tracking. Sources influence retrieval scores.
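The hash-based change detection described above can be pictured with a few lines of standard-library Python. This is only a sketch of the general technique, not cognity-ai's internals: the `needs_ingest` helper and the in-memory `seen` map are hypothetical stand-ins for however the library actually records previous ingests.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def needs_ingest(doc_id: str, data: bytes, seen: dict[str, str]) -> bool:
    """Return True (and record the new hash) if the document is new or changed.

    `seen` is a hypothetical doc_id -> hash map, assumed for illustration.
    """
    digest = content_hash(data)
    if seen.get(doc_id) == digest:
        return False  # unchanged: skip parsing, embedding, and extraction
    seen[doc_id] = digest
    return True


seen_hashes: dict[str, str] = {}
first = needs_ingest("report", b"quarterly numbers", seen_hashes)
second = needs_ingest("report", b"quarterly numbers", seen_hashes)
third = needs_ingest("report", b"quarterly numbers v2", seen_hashes)
print(first, second, third)  # True False True
```

Because the check runs before any parsing or embedding, an unchanged document costs one hash computation and nothing else.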
Install, configure, ingest, and query — no infrastructure required for the default setup.
from cognity_ai import RAGLibrary
# Smart defaults: Gemini LLM + embedder, Neo4j graph, ChromaDB vector
rag = RAGLibrary(
    gemini_api_key="your-api-key",
    neo4j_password="your-password",
)
# Ingest — auto-detects format from extension
rag.ingest("report.pdf")
rag.ingest("data.xlsx")
rag.ingest("slides.pptx")
rag.ingest_dir("./docs/") # recursive batch
# Build community graph (optional, enriches thematic queries)
rag.build_communities()
# Query
answer = rag.query("What are the key findings?")
# Query with full source attribution
result = rag.query_with_sources("Who founded Anthropic?")
print(result["answer"])
print(result["sources"]) # graph, vector, community
print(result["seed_entities"]) # ["Anthropic"]
from cognity_ai import RAGLibrary
rag = RAGLibrary(
    llm="openai",
    embedder="openai",
    vector_store="qdrant",
    graph_store="neo4j",
    openai_api_key="sk-...",
    neo4j_password="your-password",
)
rag.ingest("contract.docx")
answer = rag.query("What are the payment terms?")
# 100% offline — no API keys, no cloud services
# pip install "cognity-ai[sentence-transformers,faiss,networkx,nlp,ocr-local]"
from cognity_ai import RAGLibrary
rag = RAGLibrary(
    llm="ollama",  # local Ollama server
    embedder="sentence_transformers",
    vector_store="faiss",
    graph_store="networkx",
    ocr="tesseract",
)
rag.ingest("local_docs/report.pdf")
answer = rag.query("Summarize the main points")
# Ingest ANY file type — format auto-detected
rag.ingest("report.pdf") # PDF: page-aware, embedded images
rag.ingest("contract.docx") # Word: headings, tables, images
rag.ingest("financials.xlsx") # Excel: per-sheet text
rag.ingest("deck.pptx") # PowerPoint: slides + notes
rag.ingest("data.csv") # CSV: header-aware
rag.ingest("page.html") # HTML: tag stripping
rag.ingest("photo.jpg") # Image: Gemini Vision OCR
rag.ingest("scan.png") # Image: complex layout OCR
rag.ingest("notes.md") # Markdown: heading detection
rag.ingest("config.yaml") # YAML/JSON: key-value text
# Or ingest an entire directory at once
results = rag.ingest_dir("./knowledge_base/", recursive=True)
print(len(results), "files ingested")
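Format auto-detection by extension, as used in the ingest calls above, usually comes down to a lookup table keyed on the file suffix. The sketch below shows that idea in isolation. The `LOADERS` registry, `register`, and `pick_loader` are hypothetical names chosen for illustration and are not part of the cognity-ai API.

```python
from pathlib import Path
from typing import Callable

# Hypothetical registry: file extension -> function that turns a path into text.
LOADERS: dict[str, Callable[[Path], str]] = {}


def register(ext: str, loader: Callable[[Path], str]) -> None:
    """Register a loader under a normalized, lower-case extension."""
    LOADERS[ext.lower()] = loader


def pick_loader(path: str) -> Callable[[Path], str]:
    """Look up the loader for a path by its final suffix (case-insensitive)."""
    ext = Path(path).suffix.lower()
    try:
        return LOADERS[ext]
    except KeyError:
        raise ValueError(f"no loader registered for {ext!r}") from None


def load_plain_text(path: Path) -> str:
    """Read a UTF-8 text file in one go."""
    return path.read_text(encoding="utf-8")


register(".txt", load_plain_text)
register(".md", load_plain_text)

print(pick_loader("notes.MD").__name__)  # load_plain_text
```

Normalizing the suffix to lower case means `notes.MD` and `notes.md` resolve to the same loader, and an unknown extension fails loudly instead of being silently skipped.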
Change providers by changing a single string. No other code changes are required.
Set at init time or override per-query. The best method auto-selects based on available infrastructure.
4-channel fusion: Graph BFS + Vector cosine + Community global + Graph→Vector bridge. Weighted RRF. Best for multi-hop reasoning and structured knowledge bases.
Pure vector cosine similarity. Default when no graph store is available. Fastest setup — no graph DB required.
Vector semantic search + community summaries. Good middle ground — richer than naive but no graph traversal overhead.
Graph traversal only. Best for fully structured knowledge bases where all facts are in the graph and semantic search adds noise.
Child chunks for precise retrieval, parent chunks for rich generation context. Best for long documents where precision matters.
LLM generates N query reformulations → retrieves all → merges via RRF. Best for complex, ambiguous, or multi-faceted questions.
Official MS GraphRAG local + global search. Wraps the microsoft/graphrag library. Best for MS-ecosystem workflows.
Classifies each query and routes to the optimal sub-retriever. Factual → graph; broad → community; semantic → vector. Zero configuration needed.
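Several of the methods above merge ranked lists with Reciprocal Rank Fusion. A minimal sketch of weighted RRF follows, assuming the conventional smoothing constant k = 60 and 1-based ranks; the channel names and weights are made up for the example, and the library's actual weighting may differ.

```python
from collections import defaultdict
from typing import Optional


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: Optional[dict[str, float]] = None,
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked lists: score(d) = sum over channels of weight / (k + rank)."""
    weights = weights or {}
    scores: dict[str, float] = defaultdict(float)
    for channel, ranked_ids in rankings.items():
        w = weights.get(channel, 1.0)  # channels without a weight count as 1.0
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = weighted_rrf(
    {
        "graph": ["c1", "c2", "c3"],
        "vector": ["c2", "c1", "c4"],
        "community": ["c2", "c5"],
    },
    weights={"graph": 1.0, "vector": 1.0, "community": 0.5},
)
print([doc_id for doc_id, _ in fused])  # ['c2', 'c1', 'c3', 'c4', 'c5']
```

RRF only looks at ranks, never at raw scores, which is why it can combine channels whose scores are not comparable (graph hop counts, cosine similarities, community relevance).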
# Default method set at init
rag = RAGLibrary(rag_method="hybrid_graph", ...)
# Override per-query without reinitializing
answer = rag.query("Who founded X?")
thematic = rag.query("Summarize all themes", method="microsoft_graphrag")
precise = rag.query("What's on page 12?", method="parent_child")
multi_hop = rag.query("Compare AI safety approaches", method="multi_query")
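The parent_child method used in the example above matches on small chunks but hands the larger enclosing chunk to the generator. The sketch below illustrates that idea with plain Python. It is a toy under stated assumptions: chunks are split by word count, and word overlap stands in for the vector similarity a real retriever would use. None of these helper names come from cognity-ai.

```python
def split_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(text: str, parent_size: int = 12, child_size: int = 4):
    """Return (parents, children); each child remembers its parent's index."""
    parents = split_words(text, parent_size)
    children = [
        (child, parent_idx)
        for parent_idx, parent in enumerate(parents)
        for child in split_words(parent, child_size)
    ]
    return parents, children


def overlap(query: str, chunk: str) -> int:
    """Toy relevance score: number of shared lower-cased words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def retrieve_parent(query: str, parents, children) -> str:
    """Match against small child chunks, then return the enclosing parent."""
    _, parent_idx = max(children, key=lambda c: overlap(query, c[0]))
    return parents[parent_idx]


text = (
    "The contract starts in January and runs for two years with review. "
    "Payment terms are net thirty days from the invoice date for services."
)
parents, children = build_index(text)
print(retrieve_parent("payment terms", parents, children))
```

The query hits the four-word child "Payment terms are net", but the caller receives the whole twelve-word parent, which is the context a generator needs to answer fully.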
Every component is swappable. Every stage is observable.
The full RAGLibrary public API.
| Method | Returns | Description |
|---|---|---|
| ingest(source, doc_id, status) | dict | Ingest any file. Auto-detects format from extension. |
| ingest_dir(directory, recursive) | list[dict] | Batch ingest all supported files in a directory. |
| ingest_text(text, doc_id, source_name) | dict | Ingest raw text directly. Backward-compatible. |
| ingest_batch(documents) | list[dict] | Batch ingest from list of dicts. |
| build_communities() | list[CommunityInfo] | Run Leiden/Louvain community detection + summarization. |
| remove_document(doc_id) | None | Remove doc + all graph/vector data. |
| sync(current_doc_ids) | list[str] | Garbage collect docs not in provided set. |
| query(question, top_k, method) | str | Retrieve + generate. Returns answer string. |
| query_with_sources(question, method) | dict | Full answer + graph/vector/community sources + scores. |
| retrieve(query, top_k, method) | list[RetrievalResult] | Retrieval only, no generation. |
| confirm(doc_id) | None | Boost all triples from doc to confidence 1.0. |
| deprecate(doc_id) | None | Halve triple confidences. Penalizes retrieval scores. |
| detect_conflicts(entity_name) | list[dict] | Find contradictions for entity across sources. |
| prune(threshold) | int | Remove relations below confidence threshold. |
| health_report() | dict | Entity / relation / doc / community counts. |
| register_loader(ext, cls) | None | Register custom file loader for extension. |
| register_embedder(name, cls) | None | Register custom embedder by string key. |
| register_retriever(name, cls) | None | Register custom retriever by string key. |
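The confirm, deprecate, and prune rows in the table describe a simple confidence lifecycle for extracted triples. The sketch below models that lifecycle on an in-memory list. The `Triple` dataclass and the free functions are assumptions made for illustration; only the behaviours (set to 1.0, halve, drop below threshold and return a count) are taken from the table.

```python
from dataclasses import dataclass


@dataclass
class Triple:
    subject: str
    relation: str
    obj: str
    doc_id: str
    confidence: float


def confirm(triples: list[Triple], doc_id: str) -> None:
    """Raise every triple from `doc_id` to full confidence."""
    for t in triples:
        if t.doc_id == doc_id:
            t.confidence = 1.0


def deprecate(triples: list[Triple], doc_id: str) -> None:
    """Halve the confidence of every triple from `doc_id`."""
    for t in triples:
        if t.doc_id == doc_id:
            t.confidence /= 2


def prune(triples: list[Triple], threshold: float) -> int:
    """Drop triples below `threshold` in place; return how many were removed."""
    kept = [t for t in triples if t.confidence >= threshold]
    removed = len(triples) - len(kept)
    triples[:] = kept
    return removed


store = [
    Triple("Acme", "acquired", "Initech", doc_id="d1", confidence=0.8),
    Triple("Acme", "based_in", "Berlin", doc_id="d2", confidence=0.5),
]
confirm(store, "d1")    # d1 triple: 0.8 -> 1.0
deprecate(store, "d2")  # d2 triple: 0.5 -> 0.25
removed = prune(store, threshold=0.3)
print(removed, [(t.obj, t.confidence) for t in store])  # 1 [('Initech', 1.0)]
```

Deprecating before pruning gives a two-step removal: a source first loses influence on retrieval scores, and its triples only disappear once they fall under the threshold.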
from cognity_ai import RAGLibrary
from cognity_ai.loaders.base import BaseLoader
from cognity_ai.models.document import Document
class MyLoader(BaseLoader):
    def load(self, path: str) -> list[Document]:
        # Read the whole file as a single document
        with open(path, encoding="utf-8") as f:
            return [Document(doc_id="x", text=f.read())]

    @property
    def supported_extensions(self):
        return [".myext"]
rag = RAGLibrary(...)
rag.register_loader(".myext", MyLoader)
rag.ingest("proprietary_file.myext")
Optional dependency groups keep your environment lean. Mix and match any combination.
pip install "cognity-ai[default]"
# Includes:
# Gemini LLM + embedder
# Neo4j graph store
# ChromaDB vector store
# spaCy NLP extraction
# PDF, DOCX, XLSX, PPTX loaders
# Pillow (for OCR)
pip install "cognity-ai[openai,qdrant,pdf,nlp]"
pip install "cognity-ai[anthropic,pinecone,office]"
pip install "cognity-ai[bedrock,faiss,pdf]"
pip install "cognity-ai[sentence-transformers,faiss,networkx,nlp,ocr-local]"
# Then start Ollama:
ollama pull llama3
ollama serve
pip install "cognity-ai[all]"
# After install, download spaCy model:
python -m spacy download en_core_web_trf
# or lightweight:
python -m spacy download en_core_web_sm
# LLM providers
[gemini] [openai] [azure] [anthropic] [bedrock] [vertex-ai] [cohere] [ollama]
# Embedders
[sentence-transformers]
# Vector stores
[chroma] [qdrant] [pinecone] [faiss] [weaviate] [milvus] [pgvector] [azure-search]
# Graph stores
[neo4j] [networkx] [memgraph] [arangodb] [microsoft-graphrag]
# NLP extraction
[nlp] # spaCy
# File loaders
[pdf] [office] [csv] [html] [yaml]
# OCR
[ocr] # Pillow (for LLM vision)
[ocr-local] # pytesseract + Pillow
# Convenience bundles
[full-loaders] [default] [all]
The old build_pipeline() API still works — it'll emit a deprecation warning and delegate to RAGLibrary. Migrate at your own pace.
Before (deprecated)
from hybrid_rag.main import build_pipeline
c = build_pipeline()
c["pipeline"].ingest(
    doc_id="d1",
    text="...",
    source_name="report",
)
answer = c["retriever"].query(
    "What is X?"
)
After (new)
from cognity_ai import RAGLibrary
rag = RAGLibrary(
    gemini_api_key="...",
    neo4j_password="...",
)
rag.ingest_text(
    "...",
    doc_id="d1",
    source_name="report",
)
answer = rag.query("What is X?")
Extend cognity-ai beyond text: ingest images, videos, and audio files using multimodal embedding models and transcription providers. This module is in active development and ships as an optional add-on that must be installed separately. Feedback and bug reports are welcome.
Embed images with CLIP, SigLIP, or BLIP-2 alongside your text corpus. Query with natural language to retrieve semantically relevant images and mixed image-text results.
Extract keyframes for visual embeddings and transcribe speech with Whisper. Retrieve specific moments with timestamp-level precision. Supports MP4, MOV, AVI, and more.
Transcribe audio files locally with Whisper or via cloud APIs. Chunked transcripts are embedded and retrievable just like text documents. Supports MP3, WAV, FLAC, and OGG.
from cognity_ai.embedders.clip import CLIPEmbedder
from cognity_ai.stores.vector.chroma_multimodal import ChromaMultimodalStore
from cognity_ai.pipeline.image import ImageIngestionPipeline
from cognity_ai.retrievers.multimodal import ImageRetriever
# Set up multimodal components
embedder = CLIPEmbedder(model="openai/clip-vit-base-patch32")
store = ChromaMultimodalStore(collection="images")
# Ingest images
pipeline = ImageIngestionPipeline(embedder=embedder, store=store)
pipeline.ingest("product_photo.jpg")
pipeline.ingest_dir("./images/")
# Retrieve with natural language
retriever = ImageRetriever(embedder=embedder, store=store)
results = retriever.retrieve("a red sports car parked outdoors", top_k=5)
for r in results:
    print(r.source_path, r.score)
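Text-to-image retrieval of this kind generally works because the text query and the images are embedded into the same vector space, so ranking reduces to nearest neighbours by cosine similarity. The sketch below shows that ranking step with made-up 3-dimensional vectors; it does not call CLIP, and `top_k` here is a hypothetical helper rather than the ImageRetriever implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 5):
    """Rank stored image vectors by similarity to the query text vector."""
    scored = [(path, cosine(query_vec, vec)) for path, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Made-up vectors for illustration; real embeddings have hundreds of dimensions.
index = {
    "car.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.0, 0.2, 0.9],
    "truck.jpg": [0.7, 0.6, 0.1],
}
ranked = top_k([1.0, 0.0, 0.0], index, k=2)
print([path for path, _ in ranked])  # ['car.jpg', 'truck.jpg']
```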
from cognity_ai.embedders.clip import CLIPEmbedder
from cognity_ai.ocr.whisper_local import WhisperLocalTranscriber
from cognity_ai.pipeline.video import VideoIngestionPipeline
from cognity_ai.retrievers.video import VideoRetriever
# Extract keyframes (visual) + transcribe speech
embedder = CLIPEmbedder(model="openai/clip-vit-large-patch14")
transcriber = WhisperLocalTranscriber(model_size="medium")
pipeline = VideoIngestionPipeline(
    embedder=embedder,
    transcriber=transcriber,
    frame_interval=2,  # extract 1 keyframe every 2 seconds
)
pipeline.ingest("product_demo.mp4")
# Retrieve with timestamp-level precision
retriever = VideoRetriever(embedder=embedder, transcriber=transcriber)
results = retriever.retrieve("product unboxing scene", top_k=3)
for r in results:
    print(f"[{r.timestamp_start:.1f}s – {r.timestamp_end:.1f}s] {r.transcript_snippet}")
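To make the frame_interval setting in the example above concrete, the sketch below computes which timestamps a fixed-interval sampler would visit for a clip of a given length. It assumes sampling starts at 0 seconds and stops before the end of the clip; how the library actually selects keyframes is not specified here, and `keyframe_timestamps` is a hypothetical helper.

```python
def keyframe_timestamps(duration_s: float, frame_interval: float) -> list[float]:
    """Timestamps in seconds, one every `frame_interval` seconds from 0."""
    if frame_interval <= 0:
        raise ValueError("frame_interval must be positive")
    count = int(duration_s // frame_interval) + 1
    # Keep only timestamps that fall strictly inside the clip.
    return [i * frame_interval for i in range(count) if i * frame_interval < duration_s]


stamps = keyframe_timestamps(duration_s=7, frame_interval=2)
print(stamps)  # [0, 2, 4, 6]
```

A smaller interval gives finer timestamp precision at retrieval time but proportionally more frames to embed and store.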
from cognity_ai.ocr.whisper_local import WhisperLocalTranscriber
from cognity_ai.pipeline.audio import AudioIngestionPipeline
from cognity_ai.retrievers.audio import AudioRetriever
# Transcribe audio locally — no API keys required
transcriber = WhisperLocalTranscriber(model_size="large-v3", language="en")
pipeline = AudioIngestionPipeline(transcriber=transcriber)
pipeline.ingest("earnings_call.mp3")
pipeline.ingest("interview.wav")
pipeline.ingest_dir("./podcasts/") # batch
# Retrieve transcript segments by meaning
retriever = AudioRetriever(transcriber=transcriber)
results = retriever.retrieve("revenue guidance for next quarter", top_k=5)
for r in results:
    print(f"[{r.source}] score={r.score:.3f}")
    print(r.text)
| Embedder | Dimensions | Modalities | Best For | Install |
|---|---|---|---|---|
| CLIPEmbedder | 512 / 768 | Image + Text | General image-text retrieval, zero-shot classification | cognity-ai[clip] |
| SigLIPEmbedder | 1152 | Image + Text | Higher accuracy image understanding, multilingual | cognity-ai[clip] |
| ImageBindEmbedder | 1024 | Image + Text + Audio + Video + IMU + Depth | Cross-modal retrieval across 6 modalities in one space | cognity-ai[multimodal] |
| BLIP2Embedder | 256 | Image + Text | Document images, charts, screenshots with rich captions | cognity-ai[multimodal] |
| Provider | Class | Mode | Install |
|---|---|---|---|
| Whisper (local) | WhisperLocalTranscriber | Offline: runs on CPU or CUDA GPU | cognity-ai[whisper] |
| OpenAI Whisper API | WhisperAPITranscriber | Cloud: whisper-1 model | cognity-ai[openai] |
| AssemblyAI | AssemblyAITranscriber | Cloud: speaker diarisation, topic detection | cognity-ai[multimodal] |
| Deepgram | DeepgramTranscriber | Cloud: real-time streaming + batch | cognity-ai[multimodal] |
| Google Speech-to-Text | GoogleSTTTranscriber | Cloud: 125+ languages, medical models | cognity-ai[vertex-ai] |
# CLIP image embedder (CLIP + SigLIP)
pip install "cognity-ai[clip]"
# Video frame extraction (ffmpeg-python + keyframe sampler)
pip install "cognity-ai[video]"
# Local Whisper transcription (runs fully offline)
pip install "cognity-ai[whisper]"
# Full multimodal bundle: all of the above + ImageBind + BLIP-2 + cloud transcribers
pip install "cognity-ai[multimodal]"