Configuration
LibraryConfig and all provider configuration dataclasses for cognity-ai.
LibraryConfig
The top-level configuration dataclass. Pass an instance to RAGLibrary(config=...) to control every aspect of the pipeline. All fields have sensible defaults so you only need to override what you care about.
from cognity_ai.config import LibraryConfig

@dataclass
class LibraryConfig:
    # Pipeline component selectors
    rag_method: str = "hybrid_graph"
    chunker: str = "sentence"
    embedder: str = "gemini"
    vector_store: str = "chroma"
    graph_store: str = "neo4j"
    llm: str = "gemini"
    extraction: str = "hybrid"
    ocr: str = "gemini_vision"
    page_index: str = "hybrid"

    # Provider sub-configs
    gemini: GeminiConfig = field(default_factory=GeminiConfig)
    openai: OpenAIConfig = field(default_factory=OpenAIConfig)
    anthropic: AnthropicConfig = field(default_factory=AnthropicConfig)
    azure_openai: AzureOpenAIConfig = field(default_factory=AzureOpenAIConfig)
    bedrock: BedrockConfig = field(default_factory=BedrockConfig)
    cohere: CohereConfig = field(default_factory=CohereConfig)
    ollama: OllamaConfig = field(default_factory=OllamaConfig)
    vertex_ai: VertexAIConfig = field(default_factory=VertexAIConfig)
    neo4j: Neo4jConfig = field(default_factory=Neo4jConfig)
    chroma: ChromaConfig = field(default_factory=ChromaConfig)
    qdrant: QdrantConfig = field(default_factory=QdrantConfig)
    pinecone: PineconeConfig = field(default_factory=PineconeConfig)
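As a rough illustration of overriding only what you care about, the sketch below uses a trimmed stand-in dataclass (the hypothetical `MiniConfig`, not the real `LibraryConfig`) that carries three of the selectors above. It relies only on standard `dataclasses` behaviour, which the real class should share if it is a plain dataclass as listed.

```python
from dataclasses import asdict, dataclass, replace


@dataclass
class MiniConfig:
    """Hypothetical stand-in with three of the LibraryConfig selectors."""
    rag_method: str = "hybrid_graph"
    embedder: str = "gemini"
    vector_store: str = "chroma"


# Override a single field; everything else keeps its default.
cfg = MiniConfig(vector_store="qdrant")

# Derive a variant without mutating the original instance.
cfg_openai = replace(cfg, embedder="openai")

# Collect only the fields that differ from the defaults.
defaults = asdict(MiniConfig())
overrides = {k: v for k, v in asdict(cfg_openai).items() if defaults[k] != v}
print(overrides)
```

Using `replace` keeps a base configuration reusable across experiments, since each variant is a new instance.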
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| rag_method | str | "hybrid_graph" | RAG pipeline strategy. Options: "naive", "vector", "graph", "hybrid_graph". |
| chunker | str | "sentence" | Text chunking strategy. See Chunkers for all options. |
| embedder | str | "gemini" | Embedding provider key. Options: "gemini", "openai", "cohere", "ollama", "bedrock", "vertex_ai". |
| vector_store | str | "chroma" | Vector store backend. Options: "chroma", "qdrant", "pinecone", "milvus", "pgvector", "weaviate". |
| graph_store | str | "neo4j" | Graph database backend. Options: "neo4j", "in_memory". |
| llm | str | "gemini" | Language model provider for generation. Options: "gemini", "openai", "anthropic", "azure_openai", "bedrock", "cohere", "ollama", "vertex_ai". |
| extraction | str | "hybrid" | Entity/relationship extraction mode. Options: "llm", "spacy", "hybrid". |
| ocr | str | "gemini_vision" | OCR backend for image-heavy documents. Options: "gemini_vision", "aws_textract", "google_vision", "tesseract". |
| page_index | str | "hybrid" | Page-level indexing strategy. Options: "dense", "sparse", "hybrid". |
| gemini | GeminiConfig | GeminiConfig() | Gemini provider settings (API key, models, temperature). |
| openai | OpenAIConfig | OpenAIConfig() | OpenAI provider settings. |
| anthropic | AnthropicConfig | AnthropicConfig() | Anthropic Claude provider settings. |
| azure_openai | AzureOpenAIConfig | AzureOpenAIConfig() | Azure OpenAI deployment settings. |
| bedrock | BedrockConfig | BedrockConfig() | AWS Bedrock region and model settings. |
| cohere | CohereConfig | CohereConfig() | Cohere provider settings. |
| ollama | OllamaConfig | OllamaConfig() | Ollama local server settings. |
| vertex_ai | VertexAIConfig | VertexAIConfig() | Google Vertex AI project and location settings. |
| neo4j | Neo4jConfig | Neo4jConfig() | Neo4j connection settings (URI, credentials, database). |
| chroma | ChromaConfig | ChromaConfig() | ChromaDB persistence directory and collection name. |
| qdrant | QdrantConfig | QdrantConfig() | Qdrant server URL, API key, and collection settings. |
| pinecone | PineconeConfig | PineconeConfig() | Pinecone index, environment, and namespace settings. |
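Because the selectors are plain strings, a typo such as "hybrid" for rag_method would not be caught by the type alone. The sketch below shows one way to check values against the options documented in the table before building a config. The `ALLOWED` mapping and `check_selectors` helper are hypothetical and not part of cognity-ai; whether the library performs its own validation is not stated here.

```python
# Allowed values copied from the Fields table; the helper itself is hypothetical.
ALLOWED = {
    "rag_method": {"naive", "vector", "graph", "hybrid_graph"},
    "graph_store": {"neo4j", "in_memory"},
    "extraction": {"llm", "spacy", "hybrid"},
    "ocr": {"gemini_vision", "aws_textract", "google_vision", "tesseract"},
    "page_index": {"dense", "sparse", "hybrid"},
}


def check_selectors(selectors: dict[str, str]) -> list[str]:
    """Return one message per selector whose value is not in the table."""
    problems = []
    for name, value in selectors.items():
        allowed = ALLOWED.get(name)
        # Selectors without a documented option list are skipped.
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} (expected one of {sorted(allowed)})")
    return problems


print(check_selectors({"rag_method": "hybrid", "graph_store": "in_memory"}))
```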
Provider Config Dataclasses
Each provider sub-config is a plain Python @dataclass with typed fields. Fields commented as Required default to an empty string and must be supplied when that provider is active.
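In the listings on this page, required fields carry a Required comment and an empty-string default, so a quick pre-flight check can look for fields still left empty. The sketch below uses a stand-in dataclass mirroring the Neo4jConfig fields, and `empty_string_fields` is a hypothetical helper, not a cognity-ai function. Note the heuristic is approximate: some empty-string fields are optional (for example the Pinecone namespace), and API keys may be supplied through environment variables instead.

```python
from dataclasses import dataclass, fields


@dataclass
class Neo4jLike:
    """Stand-in mirroring the Neo4jConfig fields listed on this page."""
    uri: str = "bolt://localhost:7687"
    user: str = "neo4j"
    password: str = ""  # Required
    database: str = "neo4j"


def empty_string_fields(cfg) -> list[str]:
    """Names of dataclass fields still left at the empty string."""
    return [f.name for f in fields(cfg) if getattr(cfg, f.name) == ""]


print(empty_string_fields(Neo4jLike()))                   # ['password']
print(empty_string_fields(Neo4jLike(password="s3cret")))  # []
```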
GeminiConfig
Used when embedder="gemini" or llm="gemini".
@dataclass
class GeminiConfig:
    api_key: str = ""  # Required (GEMINI_API_KEY)
    embedding_model: str = "text-embedding-004"
    generation_model: str = "gemini-2.0-flash"
    temperature: float = 0.1
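The selector values for providers with a sub-config match the sub-config field names on LibraryConfig (llm="gemini" pairs with the gemini field, llm="openai" with openai, and so on). The sketch below shows how that naming convention lets the active sub-config be looked up by name. The stand-in classes and `active_llm_config` are hypothetical; how cognity-ai resolves providers internally is not documented here, and some selector values (such as "tesseract" or "milvus") have no sub-config on this page.

```python
from dataclasses import dataclass, field


@dataclass
class GeminiLike:
    generation_model: str = "gemini-2.0-flash"


@dataclass
class OpenAILike:
    generation_model: str = "gpt-4o"


@dataclass
class ConfigLike:
    """Trimmed stand-in for LibraryConfig: one selector, two sub-configs."""
    llm: str = "gemini"
    gemini: GeminiLike = field(default_factory=GeminiLike)
    openai: OpenAILike = field(default_factory=OpenAILike)


def active_llm_config(cfg: ConfigLike):
    """Look up the sub-config whose field name equals the llm selector."""
    return getattr(cfg, cfg.llm)


print(active_llm_config(ConfigLike()).generation_model)              # gemini-2.0-flash
print(active_llm_config(ConfigLike(llm="openai")).generation_model)  # gpt-4o
```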
OpenAIConfig
Used when embedder="openai" or llm="openai".
@dataclass
class OpenAIConfig:
    api_key: str = ""  # Required (OPENAI_API_KEY)
    embedding_model: str = "text-embedding-3-small"
    generation_model: str = "gpt-4o"
    temperature: float = 0.1
AnthropicConfig
Used when llm="anthropic". Anthropic models are generation-only; use a different provider for embeddings.
@dataclass
class AnthropicConfig:
    api_key: str = ""  # Required (ANTHROPIC_API_KEY)
    model: str = "claude-3-5-sonnet-20241022"
    max_tokens: int = 4096
    temperature: float = 0.1
AzureOpenAIConfig
Used when llm="azure_openai" or embedder="azure_openai". All fields except api_version are required.
@dataclass
class AzureOpenAIConfig:
    endpoint: str = ""  # Required, e.g. https://my.openai.azure.com/
    api_key: str = ""  # Required (AZURE_OPENAI_API_KEY)
    api_version: str = "2024-02-15-preview"
    deployment_name: str = ""  # Required: chat deployment
    embedding_deployment: str = ""  # Required: embedding deployment
BedrockConfig
Used when llm="bedrock" or embedder="bedrock". Uses boto3 credentials from the environment.
@dataclass
class BedrockConfig:
    region: str = "us-east-1"
    embedding_model: str = "amazon.titan-embed-text-v2:0"
    generation_model: str = "anthropic.claude-3-5-sonnet-20241022-v2:0"
CohereConfig
Used when llm="cohere" or embedder="cohere".
@dataclass
class CohereConfig:
    api_key: str = ""  # Required (CO_API_KEY)
    embedding_model: str = "embed-english-v3.0"
    generation_model: str = "command-r-plus"
    input_type: str = "search_document"  # or "search_query" at query time
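The input_type comment implies separate settings for indexing and querying. One way to keep the two in sync is to derive a query-time copy from the indexing config, as sketched below with a stand-in dataclass (the hypothetical `CohereLike`). Whether cognity-ai switches input_type automatically at query time is not stated on this page, so treat this as an illustration of the pattern only.

```python
from dataclasses import dataclass, replace


@dataclass
class CohereLike:
    """Stand-in mirroring the CohereConfig fields listed above."""
    api_key: str = ""
    embedding_model: str = "embed-english-v3.0"
    generation_model: str = "command-r-plus"
    input_type: str = "search_document"


# Config used while ingesting documents (default input_type).
index_cfg = CohereLike(api_key="placeholder-key")

# Query-time copy: identical except for input_type.
query_cfg = replace(index_cfg, input_type="search_query")

print(index_cfg.input_type, query_cfg.input_type)
```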
OllamaConfig
Used when llm="ollama" or embedder="ollama". Runs fully locally — no API key required.
@dataclass
class OllamaConfig:
    base_url: str = "http://localhost:11434"
    embedding_model: str = "nomic-embed-text"
    generation_model: str = "llama3"
VertexAIConfig
Used when llm="vertex_ai" or embedder="vertex_ai". Requires ADC or service account credentials.
@dataclass
class VertexAIConfig:
    project: str = ""  # Required: GCP project ID
    location: str = "us-central1"
    embedding_model: str = "text-embedding-005"
    generation_model: str = "gemini-1.5-pro"
Neo4jConfig
Used when graph_store="neo4j".
@dataclass
class Neo4jConfig:
    uri: str = "bolt://localhost:7687"
    user: str = "neo4j"
    password: str = ""  # Required (NEO4J_PASSWORD)
    database: str = "neo4j"
ChromaConfig
Used when vector_store="chroma". Data is persisted locally — no server required.
@dataclass
class ChromaConfig:
    persist_directory: str = ".chroma"
    collection_name: str = "cognity-ai"
QdrantConfig
Used when vector_store="qdrant". Supports both self-hosted and Qdrant Cloud.
@dataclass
class QdrantConfig:
    url: str = "http://localhost:6333"
    api_key: str | None = None  # Required for Qdrant Cloud
    collection_name: str = "cognity-ai"
    on_disk: bool = True  # Persist vectors to disk
PineconeConfig
Used when vector_store="pinecone".
@dataclass
class PineconeConfig:
    api_key: str = ""  # Required (PINECONE_API_KEY)
    index_name: str = "cognity-ai"
    environment: str = "us-east-1-aws"
    namespace: str = ""  # Optional namespace within index
Usage Example
Mix and match providers freely. The example below uses OpenAI for embeddings and generation with Qdrant as the vector store:
from cognity_ai import RAGLibrary
from cognity_ai.config import LibraryConfig, OpenAIConfig, QdrantConfig

config = LibraryConfig(
    embedder="openai",
    vector_store="qdrant",
    llm="openai",
    openai=OpenAIConfig(
        api_key="sk-...",
        generation_model="gpt-4o",
    ),
    qdrant=QdrantConfig(
        url="http://localhost:6333",
        collection_name="my_docs",
    ),
)

rag = RAGLibrary(config=config)

# Ingest documents
rag.ingest("./documents/")

# Query
result = rag.query("What are the key findings?")
print(result.answer)
You can also set API keys as environment variables (e.g. OPENAI_API_KEY, GEMINI_API_KEY) and omit api_key from the config entirely. cognity-ai reads these automatically when the field is left as its empty-string default.
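The environment fallback described above might behave roughly like the sketch below. `resolve_api_key` is a hypothetical helper written for illustration, not a cognity-ai function, and the choice to prefer an explicitly configured key over the environment is an assumption.

```python
import os


def resolve_api_key(configured: str, env_var: str) -> str:
    """Prefer the configured key; fall back to the environment variable."""
    key = configured or os.environ.get(env_var, "")
    if not key:
        raise ValueError(f"No API key: set api_key or export {env_var}")
    return key


os.environ["OPENAI_API_KEY"] = "sk-from-env"  # demo value only

print(resolve_api_key("", "OPENAI_API_KEY"))             # sk-from-env
print(resolve_api_key("sk-explicit", "OPENAI_API_KEY"))  # sk-explicit
```

Keeping keys in the environment avoids committing secrets alongside configuration code.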
For a fully offline setup using Ollama and Chroma:
from cognity_ai import RAGLibrary
from cognity_ai.config import LibraryConfig, OllamaConfig, ChromaConfig

config = LibraryConfig(
    embedder="ollama",
    vector_store="chroma",
    llm="ollama",
    graph_store="in_memory",
    ollama=OllamaConfig(
        base_url="http://localhost:11434",
        generation_model="llama3",
    ),
    chroma=ChromaConfig(persist_directory="./.cognity_ai_data"),
)

rag = RAGLibrary(config=config)