Getting Started
Install cognity-ai, configure your providers, and run your first RAG query.
Installation
cognity-ai requires Python 3.11+. Choose an install profile that matches your stack, or install everything at once.
# Gemini + Neo4j + ChromaDB + spaCy + all file loaders
pip install -e ".[default]"
# OpenAI GPT + Qdrant + Neo4j + NLP + PDF + Office
pip install "cognity-ai[openai,qdrant,neo4j,nlp,pdf,office]"
# SentenceTransformers + FAISS + NetworkX — no cloud required
pip install "cognity-ai[sentence-transformers,faiss,networkx,nlp,pdf]"
# All providers, all stores, all formats
pip install -e ".[all]"
If your install profile includes the nlp extra, download the spaCy English model:
python -m spacy download en_core_web_sm
Configuration
All configuration is passed directly to the RAGLibrary(...) constructor as keyword arguments. No YAML files, no environment boilerplate — just Python.
| Parameter | Type | Default | Description |
|---|---|---|---|
| llm | str | "gemini" | LLM generator provider key |
| embedder | str | "gemini" | Embedding provider key |
| vector_store | str | "chroma" | Vector store backend key |
| graph_store | str | "neo4j" | Graph database backend key |
| ocr | str | "gemini_vision" | OCR provider for image text extraction |
| rag_method | str | "hybrid_graph" | Default retrieval strategy |
| extraction | str | "hybrid" | Entity/relation extraction mode (nlp, llm, hybrid) |
| chunker | str | "sentence" | Text chunking strategy |
| page_index | str | "hybrid" | Page number detection strategy |
Configuration examples
from cognity_ai import RAGLibrary
# Default stack: Gemini + Neo4j + ChromaDB
rag = RAGLibrary()
# OpenAI stack
rag = RAGLibrary(
    llm="openai",
    embedder="openai",
    vector_store="qdrant",
    graph_store="neo4j",
)
# Fully local / offline stack
rag = RAGLibrary(
    llm="ollama",
    embedder="sentence_transformers",
    vector_store="faiss",
    graph_store="networkx",
    ocr="tesseract",
    rag_method="vector_only",
)
# Anthropic + Bedrock embeddings
rag = RAGLibrary(
    llm="anthropic",
    embedder="bedrock",
    vector_store="pinecone",
    graph_store="neo4j",
)
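Because every option is a plain keyword argument, a stack can be kept as an ordinary dict and unpacked into the constructor. The sketch below shows one way to do that. STACKS and stack_kwargs are hypothetical helper names, not part of cognity-ai; the parameter names and provider keys are copied from the table and examples above.

```python
# Hypothetical helper: named stacks kept as plain dicts of RAGLibrary keyword arguments.
# Only parameter names and provider keys shown in this guide are used.
STACKS = {
    "openai": {
        "llm": "openai",
        "embedder": "openai",
        "vector_store": "qdrant",
        "graph_store": "neo4j",
    },
    "local": {
        "llm": "ollama",
        "embedder": "sentence_transformers",
        "vector_store": "faiss",
        "graph_store": "networkx",
        "ocr": "tesseract",
        "rag_method": "vector_only",
    },
}


def stack_kwargs(name: str, **overrides: str) -> dict[str, str]:
    """Return a copy of the named stack with per-call overrides applied."""
    if name not in STACKS:
        raise KeyError(f"unknown stack {name!r}; choose from {sorted(STACKS)}")
    return {**STACKS[name], **overrides}


kwargs = stack_kwargs("local", chunker="sentence")
print(kwargs)
# With cognity-ai installed, the dict would be unpacked as: RAGLibrary(**kwargs)
```

Keeping stacks as data makes it easy to switch between a cloud and an offline setup without editing the constructor call itself.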
Your First RAG Pipeline
Five steps from zero to a working, production-ready RAG pipeline.
1. Import & Initialize
Create a RAGLibrary instance with your preferred providers.
from cognity_ai import RAGLibrary
rag = RAGLibrary()  # uses smart defaults
2. Ingest Documents
Load individual files or recursively ingest an entire directory.
# Individual files: any supported format
rag.ingest("report.pdf")
rag.ingest("data.xlsx")
rag.ingest("notes.docx")
rag.ingest("diagram.png")  # OCR'd automatically
# Batch ingest an entire directory (recursive)
rag.ingest_dir("./knowledge-base")
3. Build Communities (optional but recommended)
Community detection groups related entities into clusters, which enables global summarization queries and improves recall on broad questions.
rag.build_communities()  # Leiden algorithm via GDS
4. Query
Ask natural language questions. The default hybrid_graph method fuses four retrieval channels to maximize accuracy.
answer = rag.query("What are the main revenue drivers in Q3?")
print(answer)
# Per-query method override
answer = rag.query("Summarize all themes", method="multi_query")
5. Retrieve with Sources
Get structured results with source metadata, page numbers, confidence scores, and relevance ranks.
result = rag.retrieve("key financial metrics", top_k=5)
for chunk in result.chunks:
    print(chunk.text)
    print(f"Source: {chunk.source} | Page: {chunk.page} | Score: {chunk.score:.3f}")
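This guide does not say how hybrid_graph combines its retrieval channels. As an illustration of what fusing several ranked result lists can look like, the sketch below uses reciprocal rank fusion, a common general-purpose technique. It is an assumption, not a description of cognity-ai internals, and the channel names and chunk ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk ids into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); high ranks in many lists win.
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from four channels for the same query.
vector_hits = ["c1", "c2", "c3"]
keyword_hits = ["c2", "c4"]
graph_hits = ["c2", "c1"]
community_hits = ["c3"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits, graph_hits, community_hits])
print([chunk_id for chunk_id, _ in fused])  # ['c2', 'c1', 'c3', 'c4']
```

A chunk that appears near the top of several lists (c2 here) outranks one that leads only a single list, which is the intuition behind combining channels rather than picking one.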
File Format Support
cognity-ai ships with loaders for 14 file formats out of the box.
| Format | Extensions | Loader | Key Features |
|---|---|---|---|
| Plain Text | .txt, .md | TextLoader | UTF-8, encoding detection, Markdown stripping |
| PDF | .pdf | PDFLoader | Text, tables, embedded images, page boundaries, OCR fallback |
| Word | .docx, .doc | DocxLoader | Paragraphs, tables, headers, embedded images auto-OCR'd |
| Excel | .xlsx, .xls | ExcelLoader | Multi-sheet, cell types, named ranges, formula values |
| CSV / TSV | .csv, .tsv | CSVLoader | Dialect detection, header inference, chunking by row count |
| PowerPoint | .pptx, .ppt | PPTXLoader | Slide text, speaker notes, embedded images |
| HTML | .html, .htm | HTMLLoader | Tag stripping, link extraction, heading hierarchy |
| JSON | .json | JSONLoader | Nested object flattening, JSON Lines, key-path labels |
| YAML | .yaml, .yml | JSONLoader | Parsed to dict, same flattening as JSON |
| Images | .jpg, .png, .webp, .tiff, .bmp | ImageLoader | Full OCR, multi-page TIFF, optional base64 embedding |
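The table above can be read as an extension-to-loader lookup. The sketch below encodes it that way, for example to check which files in a folder would be picked up before calling ingest_dir. LOADER_BY_EXTENSION and loader_name_for are hypothetical names; how cognity-ai dispatches files internally is not shown in this guide.

```python
from pathlib import Path
from typing import Optional

# Loader names and extensions copied from the table above.
_FORMAT_TABLE = [
    ("TextLoader", [".txt", ".md"]),
    ("PDFLoader", [".pdf"]),
    ("DocxLoader", [".docx", ".doc"]),
    ("ExcelLoader", [".xlsx", ".xls"]),
    ("CSVLoader", [".csv", ".tsv"]),
    ("PPTXLoader", [".pptx", ".ppt"]),
    ("HTMLLoader", [".html", ".htm"]),
    ("JSONLoader", [".json", ".yaml", ".yml"]),
    ("ImageLoader", [".jpg", ".png", ".webp", ".tiff", ".bmp"]),
]

LOADER_BY_EXTENSION = {ext: loader for loader, exts in _FORMAT_TABLE for ext in exts}


def loader_name_for(path: str) -> Optional[str]:
    """Return the loader name for a path, or None if the extension is not listed."""
    return LOADER_BY_EXTENSION.get(Path(path).suffix.lower())


for name in ["Report.PDF", "config.yml", "archive.zip"]:
    print(name, "->", loader_name_for(name))
```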
OCR Configuration
cognity-ai uses a fallback chain for OCR. If the primary provider fails (missing key, rate limit, unsupported format), it automatically tries the next in chain.
| Provider | Key | Method | Best For | Install |
|---|---|---|---|---|
| Gemini Vision | "gemini_vision" | Multimodal LLM | Complex tables, mixed layouts | google-generativeai |
| OpenAI Vision | "openai_vision" | GPT-4o Vision | General purpose, high accuracy | openai |
| Claude Vision | "anthropic_vision" | Claude multimodal | Dense documents, reasoning | anthropic |
| Azure Vision | "azure_vision" | Azure AI Vision | Enterprise, compliance | azure-ai-vision |
| Bedrock Vision | "bedrock_vision" | Claude via Bedrock | AWS-native environments | boto3 |
| Tesseract | "tesseract" | Local OCR engine | Offline, no API cost | pytesseract |
# Set OCR provider at construction time
rag = RAGLibrary(ocr="openai_vision")
# Or override per-ingest for a specific file
rag.ingest("scanned_report.pdf", ocr="tesseract")
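The fallback behaviour described at the top of this section can be pictured as a loop over providers that stops at the first success. The following is a minimal sketch under that assumption. run_ocr_chain, OcrUnavailable, and the two fake providers are hypothetical stand-ins, and the order of the real chain is not documented here.

```python
from typing import Callable

OcrFn = Callable[[bytes], str]


class OcrUnavailable(RuntimeError):
    """Raised by a provider that cannot handle the request."""


def run_ocr_chain(image: bytes, providers: list[tuple[str, OcrFn]]) -> tuple[str, str]:
    """Try each (key, function) pair in order; return (key, text) from the first success."""
    errors: list[str] = []
    for key, ocr in providers:
        try:
            return key, ocr(image)
        except OcrUnavailable as exc:
            errors.append(f"{key}: {exc}")  # remember why this provider was skipped
    raise OcrUnavailable("all OCR providers failed: " + "; ".join(errors))


def fake_gemini(image: bytes) -> str:
    # Simulates a cloud provider with no credentials configured.
    raise OcrUnavailable("missing API key")


def fake_tesseract(image: bytes) -> str:
    # Simulates a local engine that always succeeds.
    return "text recognised locally"


used, text = run_ocr_chain(
    b"...", [("gemini_vision", fake_gemini), ("tesseract", fake_tesseract)]
)
print(used, "->", text)
```

Collecting the per-provider errors keeps the final failure message useful when every provider in the chain is misconfigured.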
Knowledge Graph
During ingestion, cognity-ai extracts entities and relationships from every document and builds a knowledge graph. The graph enables structured traversal and global summarization that pure vector search cannot achieve.
Entity & Relation Extraction
The HybridExtractor runs spaCy NLP first (fast, local, free) and uses LLM augmentation only for semantic gaps that NLP misses — causal links, temporal dependencies, implicit associations.
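The cost-saving idea is that the LLM is consulted only where the local pass comes up short. The sketch below illustrates that control flow with stand-ins: a single regular expression plays the role of the spaCy pass, fake_llm plays the role of the LLM, and "the local pass found nothing" is an assumed definition of a gap. None of these names come from cognity-ai.

```python
import re
from typing import Callable

Triple = tuple[str, str, str]  # (subject, relation, object)

# Stand-in for the local NLP pass: one explicit "<Subject> <verb> <Object>" pattern.
_PATTERN = re.compile(r"^(\w+) (acquired|owns|founded) (\w+)\.?$")


def nlp_extract(sentence: str) -> list[Triple]:
    match = _PATTERN.match(sentence)
    return [(match.group(1), match.group(2), match.group(3))] if match else []


def hybrid_extract(
    sentences: list[str], llm_extract: Callable[[str], list[Triple]]
) -> list[Triple]:
    """Run the cheap local pass on every sentence; call the LLM only where it found nothing."""
    triples: list[Triple] = []
    for sentence in sentences:
        found = nlp_extract(sentence)
        if not found:
            found = llm_extract(sentence)
        triples.extend(found)
    return triples


llm_calls: list[str] = []


def fake_llm(sentence: str) -> list[Triple]:
    # Records the call so we can see how often the expensive path was taken.
    llm_calls.append(sentence)
    return [("Sales", "dropped_because_of", "Recall")]


sentences = ["Acme acquired Globex.", "Sales dropped after the recall."]
triples = hybrid_extract(sentences, fake_llm)
print(triples)
print("LLM calls:", len(llm_calls))
```

Only the second sentence, whose causal link the pattern cannot see, reaches the LLM stand-in.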
Community Detection
Calling build_communities() runs the Leiden algorithm via Neo4j GDS to cluster related entities. Community summaries are stored and used as a high-level retrieval channel — crucial for broad "summarize everything about X" queries.
# Build entity communities after ingestion
rag.build_communities()
# Check graph health — entity counts, orphans, density
report = rag.health_report()
print(report)
# Detect conflicting facts in the knowledge graph
conflicts = rag.detect_conflicts()
for c in conflicts:
    print(f"Conflict: {c.description} (confidence: {c.confidence:.2f})")
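To make "clustering related entities" concrete without a Neo4j instance, the sketch below groups nodes of a tiny entity graph with label propagation. This is a deliberately simpler algorithm than Leiden and is not what build_communities() runs; it is swapped in only to show the input (entity edges) and the output (a community label per entity). The entity names are invented.

```python
from collections import Counter


def label_propagation(edges: list[tuple[str, str]], max_rounds: int = 20) -> dict[str, str]:
    """Give each node the most common label among its neighbours until labels stop changing."""
    neighbours: dict[str, set[str]] = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    labels = {node: node for node in neighbours}  # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):
            counts = Counter(labels[n] for n in neighbours[node])
            best = max(counts.values())
            # Deterministic tie-break: smallest label among the most frequent ones.
            new_label = min(label for label, count in counts.items() if count == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


edges = [
    ("Acme", "Globex"), ("Globex", "Initech"), ("Acme", "Initech"),
    ("Q3", "Revenue"), ("Revenue", "Margin"), ("Q3", "Margin"),
]
labels = label_propagation(edges)
print(labels)
```

The two disconnected triangles end up with one label each; a production algorithm such as Leiden additionally optimises a quality measure so that weakly linked groups are not merged.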
Incremental Updates
cognity-ai computes a SHA-256 hash of every ingested file. On re-ingest, unchanged files are skipped with zero API calls. Only modified or new files are processed.
# Re-ingest a directory — only changed files are processed
rag.ingest_dir("./knowledge-base")
# Confirm a fact — promotes it to higher confidence
rag.confirm(entity_id="entity_123")
# Deprecate outdated knowledge — marks it as superseded
rag.deprecate(source="old_report_2023.pdf")
# Prune all deprecated or low-confidence knowledge
rag.prune(min_confidence=0.5)
rag.prune() permanently removes nodes and vectors from the stores. Run rag.health_report() first to review what will be removed.
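The hash-based skip described at the start of this section can be sketched with the standard library alone. The code below assumes a simple path-to-hash dict as the record of earlier ingests; file_sha256 and files_to_process are hypothetical names, and where cognity-ai actually stores its hashes is not covered in this guide.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 64 KiB blocks so large documents are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def files_to_process(paths: list[Path], seen: dict[str, str]) -> list[Path]:
    """Return new or modified files and record their hashes in `seen` (path -> hash)."""
    pending = []
    for path in paths:
        current = file_sha256(path)
        if seen.get(str(path)) != current:
            pending.append(path)
            seen[str(path)] = current
    return pending


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "notes.txt"
    doc.write_text("first draft")
    seen: dict[str, str] = {}
    first = files_to_process([doc], seen)   # new file: processed
    second = files_to_process([doc], seen)  # unchanged: skipped
    doc.write_text("second draft")
    third = files_to_process([doc], seen)   # modified: processed again
    print(len(first), len(second), len(third))  # 1 0 1
```

Comparing content hashes rather than modification times means a file that was touched but not changed is still skipped.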
Plugin System
Every component in cognity-ai is an ABC registered in the PluginRegistry. You can register custom loaders, embedders, chunkers, generators, and retrievers by string key — your code drops into the pipeline without touching any core files.
from cognity_ai.loaders.base import BaseLoader
from cognity_ai.registry import PluginRegistry
from cognity_ai.models.document import Document
class MyNotionLoader(BaseLoader):
    """Load pages from Notion via API."""

    def load(self, source: str) -> list[Document]:
        # Fetch from Notion API ...
        page_text = ""  # placeholder: replace with the text fetched for `source`
        return [Document(text=page_text, source=source)]
# Register with a string key and supported extension
PluginRegistry.register_loader("notion", ".notion", MyNotionLoader)
# Now use it like any built-in loader
rag.ingest("page_id.notion")
Use PluginRegistry.register_embedder(), register_retriever(), register_chunker(), and register_generator() for the other component types. All follow the same pattern.
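For readers new to the pattern, the sketch below shows how a string-keyed registry of abstract base classes can work in general. MiniRegistry, BaseChunker, and ParagraphChunker are toy stand-ins written for this example; the real PluginRegistry signatures may differ, so treat the argument lists here as assumptions.

```python
from abc import ABC, abstractmethod


class BaseChunker(ABC):
    """Stand-in for an abstract component interface."""

    @abstractmethod
    def chunk(self, text: str) -> list[str]:
        ...


class MiniRegistry:
    """Toy registry: maps a string key to a component class."""

    _chunkers: dict[str, type[BaseChunker]] = {}

    @classmethod
    def register_chunker(cls, key: str, chunker: type[BaseChunker]) -> None:
        # Reject classes that do not implement the expected interface.
        if not issubclass(chunker, BaseChunker):
            raise TypeError(f"{chunker.__name__} must subclass BaseChunker")
        cls._chunkers[key] = chunker

    @classmethod
    def create_chunker(cls, key: str) -> BaseChunker:
        try:
            return cls._chunkers[key]()
        except KeyError:
            raise KeyError(f"no chunker registered under {key!r}") from None


class ParagraphChunker(BaseChunker):
    """Split text on blank lines."""

    def chunk(self, text: str) -> list[str]:
        return [part.strip() for part in text.split("\n\n") if part.strip()]


MiniRegistry.register_chunker("paragraph", ParagraphChunker)
chunks = MiniRegistry.create_chunker("paragraph").chunk("One.\n\nTwo.")
print(chunks)  # ['One.', 'Two.']
```

Resolving components by key at construction time is what lets a configuration value such as chunker="paragraph" select user code without changes to the core package.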
Next Steps
You have the basics. Here's where to go next.