Configuration — cognity-ai API

LibraryConfig

The top-level configuration dataclass. Pass an instance to RAGLibrary(config=...) to control every aspect of the pipeline. Every field has a default, so you only need to override what you care about. The defaults select Gemini and Neo4j, so a Gemini API key and a Neo4j password must still be supplied.

Python

from cognity_ai.config import LibraryConfig

@dataclass
class LibraryConfig:
    # Pipeline component selectors
    rag_method:   str = "hybrid_graph"
    chunker:      str = "sentence"
    embedder:     str = "gemini"
    vector_store: str = "chroma"
    graph_store:  str = "neo4j"
    llm:          str = "gemini"
    extraction:   str = "hybrid"
    ocr:          str = "gemini_vision"
    page_index:   str = "hybrid"

    # Provider sub-configs
    gemini:       GeminiConfig      = field(default_factory=GeminiConfig)
    openai:       OpenAIConfig      = field(default_factory=OpenAIConfig)
    anthropic:    AnthropicConfig   = field(default_factory=AnthropicConfig)
    azure_openai: AzureOpenAIConfig = field(default_factory=AzureOpenAIConfig)
    bedrock:      BedrockConfig     = field(default_factory=BedrockConfig)
    cohere:       CohereConfig      = field(default_factory=CohereConfig)
    ollama:       OllamaConfig      = field(default_factory=OllamaConfig)
    vertex_ai:    VertexAIConfig    = field(default_factory=VertexAIConfig)
    neo4j:        Neo4jConfig       = field(default_factory=Neo4jConfig)
    chroma:       ChromaConfig      = field(default_factory=ChromaConfig)
    qdrant:       QdrantConfig      = field(default_factory=QdrantConfig)
    pinecone:     PineconeConfig    = field(default_factory=PineconeConfig)
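
The sub-config fields use field(default_factory=...), which is what lets you override a single nested value and keep the rest. The sketch below illustrates that behavior with trimmed stand-in dataclasses; they mirror the listings on this page but are not imported from cognity_ai, and only one selector and one sub-config are reproduced.

```python
from dataclasses import dataclass, field


# Stand-in mirroring the ChromaConfig listing (assumption: not the real class).
@dataclass
class ChromaConfig:
    persist_directory: str = ".chroma"
    collection_name: str = "cognity-ai"


# Trimmed stand-in for LibraryConfig with one selector and one sub-config.
@dataclass
class LibraryConfig:
    vector_store: str = "chroma"
    chroma: ChromaConfig = field(default_factory=ChromaConfig)


# Override one nested value; every other field keeps its default.
custom_cfg = LibraryConfig(chroma=ChromaConfig(collection_name="my_docs"))

# default_factory builds a fresh ChromaConfig per instance, so mutating one
# config does not leak into configs created later.
first_cfg = LibraryConfig()
first_cfg.chroma.persist_directory = "/tmp/other"
second_cfg = LibraryConfig()
```

Here custom_cfg keeps the default persist_directory, and second_cfg is unaffected by the change made to first_cfg.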

Fields

Field	Type	Default	Description
`rag_method`	`str`	`"hybrid_graph"`	RAG pipeline strategy. Options: `"naive"`, `"vector"`, `"graph"`, `"hybrid_graph"`.
`chunker`	`str`	`"sentence"`	Text chunking strategy. See Chunkers for all options.
`embedder`	`str`	`"gemini"`	Embedding provider key. Options: `"gemini"`, `"openai"`, `"azure_openai"`, `"cohere"`, `"ollama"`, `"bedrock"`, `"vertex_ai"`.
`vector_store`	`str`	`"chroma"`	Vector store backend. Options: `"chroma"`, `"qdrant"`, `"pinecone"`, `"milvus"`, `"pgvector"`, `"weaviate"`.
`graph_store`	`str`	`"neo4j"`	Graph database backend. Options: `"neo4j"`, `"in_memory"`.
`llm`	`str`	`"gemini"`	Language model provider for generation. Options: `"gemini"`, `"openai"`, `"anthropic"`, `"azure_openai"`, `"bedrock"`, `"cohere"`, `"ollama"`, `"vertex_ai"`.
`extraction`	`str`	`"hybrid"`	Entity/relationship extraction mode. Options: `"llm"`, `"spacy"`, `"hybrid"`.
`ocr`	`str`	`"gemini_vision"`	OCR backend for image-heavy documents. Options: `"gemini_vision"`, `"aws_textract"`, `"google_vision"`, `"tesseract"`.
`page_index`	`str`	`"hybrid"`	Page-level indexing strategy. Options: `"dense"`, `"sparse"`, `"hybrid"`.
`gemini`	`GeminiConfig`	`GeminiConfig()`	Gemini provider settings (API key, models, temperature).
`openai`	`OpenAIConfig`	`OpenAIConfig()`	OpenAI provider settings.
`anthropic`	`AnthropicConfig`	`AnthropicConfig()`	Anthropic Claude provider settings.
`azure_openai`	`AzureOpenAIConfig`	`AzureOpenAIConfig()`	Azure OpenAI deployment settings.
`bedrock`	`BedrockConfig`	`BedrockConfig()`	AWS Bedrock region and model settings.
`cohere`	`CohereConfig`	`CohereConfig()`	Cohere provider settings.
`ollama`	`OllamaConfig`	`OllamaConfig()`	Ollama local server settings.
`vertex_ai`	`VertexAIConfig`	`VertexAIConfig()`	Google Vertex AI project and location settings.
`neo4j`	`Neo4jConfig`	`Neo4jConfig()`	Neo4j connection settings (URI, credentials, database).
`chroma`	`ChromaConfig`	`ChromaConfig()`	ChromaDB persistence directory and collection name.
`qdrant`	`QdrantConfig`	`QdrantConfig()`	Qdrant server URL, API key, and collection settings.
`pinecone`	`PineconeConfig`	`PineconeConfig()`	Pinecone index, environment, and namespace settings.
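
Because the selectors are plain strings, a typo such as "qdrnt" is easy to make. The sketch below shows one way to check selector values against the options in the table before building a config. The invalid_selectors helper is hypothetical and not part of cognity_ai, and whether the library performs its own validation is not stated here.

```python
# Allowed values copied from the Fields table. `chunker` is omitted because
# its options are documented on the Chunkers page.
ALLOWED = {
    "rag_method": {"naive", "vector", "graph", "hybrid_graph"},
    # "azure_openai" is included because the AzureOpenAIConfig section
    # describes it as a valid embedder.
    "embedder": {"gemini", "openai", "azure_openai", "cohere", "ollama",
                 "bedrock", "vertex_ai"},
    "vector_store": {"chroma", "qdrant", "pinecone", "milvus", "pgvector",
                     "weaviate"},
    "graph_store": {"neo4j", "in_memory"},
    "llm": {"gemini", "openai", "anthropic", "azure_openai", "bedrock",
            "cohere", "ollama", "vertex_ai"},
    "extraction": {"llm", "spacy", "hybrid"},
    "ocr": {"gemini_vision", "aws_textract", "google_vision", "tesseract"},
    "page_index": {"dense", "sparse", "hybrid"},
}


def invalid_selectors(selectors: dict) -> dict:
    """Return {field: value} for each selector whose value is not listed.

    Fields that are not in ALLOWED (for example `chunker`) are skipped.
    """
    return {
        name: value
        for name, value in selectors.items()
        if name in ALLOWED and value not in ALLOWED[name]
    }


problems = invalid_selectors({"vector_store": "qdrnt", "llm": "openai"})
```

An empty result means every checked selector matches the table; otherwise the returned mapping names the offending fields.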

Provider Config Dataclasses

Each provider sub-config is a plain Python @dataclass with typed fields. Fields marked Required in the listings below default to an empty string and must be set when that provider is active.

GeminiConfig

Used when embedder="gemini" or llm="gemini".

Python

@dataclass
class GeminiConfig:
    api_key:          str   = ""                        # Required — GEMINI_API_KEY
    embedding_model:  str   = "text-embedding-004"
    generation_model: str   = "gemini-2.0-flash"
    temperature:      float = 0.1

OpenAIConfig

Used when embedder="openai" or llm="openai".

Python

@dataclass
class OpenAIConfig:
    api_key:          str   = ""                          # Required — OPENAI_API_KEY
    embedding_model:  str   = "text-embedding-3-small"
    generation_model: str   = "gpt-4o"
    temperature:      float = 0.1

AnthropicConfig

Used when llm="anthropic". Anthropic models are generation-only; use a different provider for embeddings.

Python

@dataclass
class AnthropicConfig:
    api_key:     str   = ""                               # Required — ANTHROPIC_API_KEY
    model:       str   = "claude-3-5-sonnet-20241022"
    max_tokens:  int   = 4096
    temperature: float = 0.1

AzureOpenAIConfig

Used when llm="azure_openai" or embedder="azure_openai". All fields except api_version are required.

Python

@dataclass
class AzureOpenAIConfig:
    endpoint:             str = ""                        # Required — e.g. https://my.openai.azure.com/
    api_key:              str = ""                        # Required — AZURE_OPENAI_API_KEY
    api_version:          str = "2024-02-15-preview"
    deployment_name:      str = ""                        # Required — chat deployment
    embedding_deployment: str = ""                        # Required — embedding deployment
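
Since the required fields default to an empty string, a missing value is not caught when the dataclass is constructed. A minimal sketch of a pre-flight check follows, assuming a stand-in copy of the dataclass and a hypothetical missing_required helper. Neither is imported from cognity_ai, and the check ignores any environment-variable fallback the library may apply.

```python
from dataclasses import dataclass


# Stand-in copied from the AzureOpenAIConfig listing (not the real class).
@dataclass
class AzureOpenAIConfig:
    endpoint: str = ""
    api_key: str = ""
    api_version: str = "2024-02-15-preview"
    deployment_name: str = ""
    embedding_deployment: str = ""


# Field names flagged "Required" in the listing's comments.
AZURE_REQUIRED = ("endpoint", "api_key", "deployment_name", "embedding_deployment")


def missing_required(cfg, required) -> list:
    """Return the names of required fields that are still empty."""
    return [name for name in required if not getattr(cfg, name)]


partial = AzureOpenAIConfig(
    endpoint="https://my.openai.azure.com/",
    api_key="placeholder-key",  # placeholder, not a real credential
)
missing = missing_required(partial, AZURE_REQUIRED)
```

Running the check at startup turns a late provider error into an immediate, readable list of unset fields.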

BedrockConfig

Used when llm="bedrock" or embedder="bedrock". Uses boto3 credentials from the environment.

Python

@dataclass
class BedrockConfig:
    region:           str = "us-east-1"
    embedding_model:  str = "amazon.titan-embed-text-v2:0"
    generation_model: str = "anthropic.claude-3-5-sonnet-20241022-v2:0"

CohereConfig

Used when llm="cohere" or embedder="cohere".

Python

@dataclass
class CohereConfig:
    api_key:          str = ""                            # Required — CO_API_KEY
    embedding_model:  str = "embed-english-v3.0"
    generation_model: str = "command-r-plus"
    input_type:       str = "search_document"             # or "search_query" at query time
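
The input_type comment implies two configurations: one for indexing documents and one for embedding queries. Whether cognity-ai switches the value automatically at query time is not stated here, so the sketch below shows a manual approach using dataclasses.replace on a stand-in copy of the dataclass (not imported from cognity_ai).

```python
from dataclasses import dataclass, replace


# Stand-in copied from the CohereConfig listing (not the real class).
@dataclass
class CohereConfig:
    api_key: str = ""
    embedding_model: str = "embed-english-v3.0"
    generation_model: str = "command-r-plus"
    input_type: str = "search_document"


# Configuration used while ingesting documents.
index_cfg = CohereConfig(api_key="placeholder-key")

# Derive the query-time configuration; replace() returns a new instance and
# leaves index_cfg untouched.
query_cfg = replace(index_cfg, input_type="search_query")
```

Deriving the second config from the first keeps the API key and model names in one place.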

OllamaConfig

Used when llm="ollama" or embedder="ollama". Runs fully locally — no API key required.

Python

@dataclass
class OllamaConfig:
    base_url:         str = "http://localhost:11434"
    embedding_model:  str = "nomic-embed-text"
    generation_model: str = "llama3"

VertexAIConfig

Used when llm="vertex_ai" or embedder="vertex_ai". Requires Application Default Credentials (ADC) or service account credentials.

Python

@dataclass
class VertexAIConfig:
    project:          str = ""                            # Required — GCP project ID
    location:         str = "us-central1"
    embedding_model:  str = "text-embedding-005"
    generation_model: str = "gemini-1.5-pro"

Neo4jConfig

Used when graph_store="neo4j".

Python

@dataclass
class Neo4jConfig:
    uri:      str = "bolt://localhost:7687"
    user:     str = "neo4j"
    password: str = ""                                   # Required — NEO4J_PASSWORD
    database: str = "neo4j"

ChromaConfig

Used when vector_store="chroma". Data is persisted locally — no server required.

Python

@dataclass
class ChromaConfig:
    persist_directory: str = ".chroma"
    collection_name:   str = "cognity-ai"

QdrantConfig

Used when vector_store="qdrant". Supports both self-hosted and Qdrant Cloud.

Python

@dataclass
class QdrantConfig:
    url:             str        = "http://localhost:6333"
    api_key:         str | None = None                      # Required for Qdrant Cloud
    collection_name: str        = "cognity-ai"
    on_disk:         bool       = True                      # Persist vectors to disk
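
The two deployment modes differ only in url and api_key. The sketch below contrasts them using a stand-in copy of the dataclass (not imported from cognity_ai). The cloud URL and key are placeholders, not real endpoints or credentials.

```python
from dataclasses import dataclass
from typing import Optional


# Stand-in copied from the QdrantConfig listing (not the real class).
# Optional[str] is equivalent to the listing's `str | None`.
@dataclass
class QdrantConfig:
    url: str = "http://localhost:6333"
    api_key: Optional[str] = None
    collection_name: str = "cognity-ai"
    on_disk: bool = True


# Self-hosted: the defaults point at a local server and need no key.
local_cfg = QdrantConfig(collection_name="my_docs")

# Qdrant Cloud: supply the cluster URL and an API key (placeholders here).
cloud_cfg = QdrantConfig(
    url="https://YOUR-CLUSTER-URL:6333",
    api_key="placeholder-key",
    collection_name="my_docs",
)
```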

PineconeConfig

Used when vector_store="pinecone".

Python

@dataclass
class PineconeConfig:
    api_key:     str = ""                                # Required — PINECONE_API_KEY
    index_name:  str = "cognity-ai"
    environment: str = "us-east-1-aws"
    namespace:   str = ""                                # Optional namespace within index

Usage Example

Mix and match providers freely. The example below uses OpenAI for embeddings and generation with Qdrant as the vector store:

Python

from cognity_ai import RAGLibrary
from cognity_ai.config import LibraryConfig, OpenAIConfig, QdrantConfig

config = LibraryConfig(
    embedder="openai",
    vector_store="qdrant",
    llm="openai",
    openai=OpenAIConfig(
        api_key="sk-...",
        generation_model="gpt-4o",
    ),
    qdrant=QdrantConfig(
        url="http://localhost:6333",
        collection_name="my_docs",
    ),
)

rag = RAGLibrary(config=config)

# Ingest documents
rag.ingest("./documents/")

# Query
result = rag.query("What are the key findings?")
print(result.answer)

💡 Tip: You can also supply provider keys via environment variables (e.g. OPENAI_API_KEY, GEMINI_API_KEY) and omit api_key from the config entirely. cognity-ai reads these automatically when the field is left as its empty-string default.
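
The fallback described in the tip can be pictured roughly as follows. This is an illustrative sketch of the precedence rule only, not cognity-ai's actual implementation; the resolve_api_key helper is hypothetical, and raising an error when both sources are empty is an assumption.

```python
import os


def resolve_api_key(configured: str, env_var: str) -> str:
    """Prefer an explicitly configured key, else fall back to the environment.

    Raises ValueError when neither source provides a value (an assumption;
    the library's behavior in that case is not documented here).
    """
    if configured:
        return configured
    value = os.environ.get(env_var, "")
    if not value:
        raise ValueError(f"No API key configured and {env_var} is not set")
    return value


# An explicit value wins regardless of what the environment contains.
key = resolve_api_key("explicit-placeholder", "OPENAI_API_KEY")
```

Keeping keys in the environment means the same config code can run unchanged across machines without credentials appearing in source control.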

For a fully offline setup using Ollama and Chroma:

Python

from cognity_ai import RAGLibrary
from cognity_ai.config import LibraryConfig, OllamaConfig, ChromaConfig

config = LibraryConfig(
    embedder="ollama",
    vector_store="chroma",
    llm="ollama",
    graph_store="in_memory",
    ollama=OllamaConfig(
        base_url="http://localhost:11434",
        generation_model="llama3",
    ),
    chroma=ChromaConfig(persist_directory="./.cognity_ai_data"),
)

rag = RAGLibrary(config=config)