LibraryConfig

The top-level configuration dataclass. Pass an instance to RAGLibrary(config=...) to select the pipeline components and configure their providers. Every field has a default, so you only need to override the ones you care about.

Python
from dataclasses import dataclass, field

# Defined in cognity_ai.config; import it with:
#   from cognity_ai.config import LibraryConfig

@dataclass
class LibraryConfig:
    # Pipeline component selectors
    rag_method:   str = "hybrid_graph"
    chunker:      str = "sentence"
    embedder:     str = "gemini"
    vector_store: str = "chroma"
    graph_store:  str = "neo4j"
    llm:          str = "gemini"
    extraction:   str = "hybrid"
    ocr:          str = "gemini_vision"
    page_index:   str = "hybrid"

    # Provider sub-configs
    gemini:       GeminiConfig      = field(default_factory=GeminiConfig)
    openai:       OpenAIConfig      = field(default_factory=OpenAIConfig)
    anthropic:    AnthropicConfig   = field(default_factory=AnthropicConfig)
    azure_openai: AzureOpenAIConfig = field(default_factory=AzureOpenAIConfig)
    bedrock:      BedrockConfig     = field(default_factory=BedrockConfig)
    cohere:       CohereConfig      = field(default_factory=CohereConfig)
    ollama:       OllamaConfig      = field(default_factory=OllamaConfig)
    vertex_ai:    VertexAIConfig    = field(default_factory=VertexAIConfig)
    neo4j:        Neo4jConfig       = field(default_factory=Neo4jConfig)
    chroma:       ChromaConfig      = field(default_factory=ChromaConfig)
    qdrant:       QdrantConfig      = field(default_factory=QdrantConfig)
    pinecone:     PineconeConfig    = field(default_factory=PineconeConfig)
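
Because every field has a default, a config can be built by naming only the fields that differ. The sketch below illustrates this with trimmed, hypothetical stand-ins for LibraryConfig and ChromaConfig (not the real classes, which carry many more fields). It also shows why the sub-configs use field(default_factory=...): each LibraryConfig gets its own sub-config instance.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins that copy a few of the documented fields.
@dataclass
class ChromaConfig:
    persist_directory: str = ".chroma"
    collection_name: str = "cognity-ai"


@dataclass
class LibraryConfig:
    rag_method: str = "hybrid_graph"
    vector_store: str = "chroma"
    chroma: ChromaConfig = field(default_factory=ChromaConfig)


default_cfg = LibraryConfig()
custom_cfg = LibraryConfig(rag_method="vector")  # override one field only

# default_factory builds a fresh ChromaConfig per LibraryConfig, so changing
# one config's sub-config does not affect another config.
custom_cfg.chroma.collection_name = "my_docs"

print(default_cfg.chroma.collection_name)  # cognity-ai
print(custom_cfg.chroma.collection_name)   # my_docs
print(custom_cfg.vector_store)             # chroma
```
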

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| rag_method | str | "hybrid_graph" | RAG pipeline strategy. Options: "naive", "vector", "graph", "hybrid_graph". |
| chunker | str | "sentence" | Text chunking strategy. See Chunkers for all options. |
| embedder | str | "gemini" | Embedding provider key. Options: "gemini", "openai", "azure_openai", "cohere", "ollama", "bedrock", "vertex_ai". |
| vector_store | str | "chroma" | Vector store backend. Options: "chroma", "qdrant", "pinecone", "milvus", "pgvector", "weaviate". |
| graph_store | str | "neo4j" | Graph database backend. Options: "neo4j", "in_memory". |
| llm | str | "gemini" | Language model provider for generation. Options: "gemini", "openai", "anthropic", "azure_openai", "bedrock", "cohere", "ollama", "vertex_ai". |
| extraction | str | "hybrid" | Entity/relationship extraction mode. Options: "llm", "spacy", "hybrid". |
| ocr | str | "gemini_vision" | OCR backend for image-heavy documents. Options: "gemini_vision", "aws_textract", "google_vision", "tesseract". |
| page_index | str | "hybrid" | Page-level indexing strategy. Options: "dense", "sparse", "hybrid". |
| gemini | GeminiConfig | GeminiConfig() | Gemini provider settings (API key, models, temperature). |
| openai | OpenAIConfig | OpenAIConfig() | OpenAI provider settings. |
| anthropic | AnthropicConfig | AnthropicConfig() | Anthropic Claude provider settings. |
| azure_openai | AzureOpenAIConfig | AzureOpenAIConfig() | Azure OpenAI deployment settings. |
| bedrock | BedrockConfig | BedrockConfig() | AWS Bedrock region and model settings. |
| cohere | CohereConfig | CohereConfig() | Cohere provider settings. |
| ollama | OllamaConfig | OllamaConfig() | Ollama local server settings. |
| vertex_ai | VertexAIConfig | VertexAIConfig() | Google Vertex AI project and location settings. |
| neo4j | Neo4jConfig | Neo4jConfig() | Neo4j connection settings (URI, credentials, database). |
| chroma | ChromaConfig | ChromaConfig() | ChromaDB persistence directory and collection name. |
| qdrant | QdrantConfig | QdrantConfig() | Qdrant server URL, API key, and collection settings. |
| pinecone | PineconeConfig | PineconeConfig() | Pinecone index, environment, and namespace settings. |
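
Since the selectors are plain strings, a typo such as "neo4J" is easy to miss. As a minimal sketch, the helper below checks a few selectors against the option lists given above. The names ALLOWED, Selectors, and invalid_selectors are hypothetical and are not part of cognity-ai; whether the library performs a similar check itself is not stated here.

```python
from dataclasses import dataclass

# Option lists copied from the field descriptions above (subset only).
ALLOWED = {
    "rag_method": {"naive", "vector", "graph", "hybrid_graph"},
    "graph_store": {"neo4j", "in_memory"},
    "extraction": {"llm", "spacy", "hybrid"},
    "page_index": {"dense", "sparse", "hybrid"},
}


# Hypothetical stand-in holding only the selectors checked here.
@dataclass
class Selectors:
    rag_method: str = "hybrid_graph"
    graph_store: str = "neo4j"
    extraction: str = "hybrid"
    page_index: str = "hybrid"


def invalid_selectors(cfg) -> dict[str, str]:
    """Return {field name: offending value} for selectors outside the documented options."""
    return {
        name: getattr(cfg, name)
        for name, allowed in ALLOWED.items()
        if getattr(cfg, name) not in allowed
    }


print(invalid_selectors(Selectors()))                     # {}
print(invalid_selectors(Selectors(graph_store="neo4J")))  # {'graph_store': 'neo4J'}
```
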

Provider Config Dataclasses

Each provider sub-config is a plain Python @dataclass with typed fields. Fields marked Required default to an empty string and must be supplied when that provider is active.
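
To see which sub-configs matter for a given setup, one option is to collect the selector values, since each provider key doubles as the name of its sub-config field. This is a rough sketch with a hypothetical Selectors stand-in and a hypothetical active_providers helper. It assumes that only embedder, llm, vector_store, and graph_store decide which providers are active, which may not match what the library actually does.

```python
from dataclasses import dataclass


# Hypothetical stand-in with the four selectors considered in this sketch.
@dataclass
class Selectors:
    embedder: str = "gemini"
    vector_store: str = "chroma"
    graph_store: str = "neo4j"
    llm: str = "gemini"


def active_providers(cfg) -> set[str]:
    """Collect the provider keys named by the selectors."""
    return {cfg.embedder, cfg.llm, cfg.vector_store, cfg.graph_store}


print(sorted(active_providers(Selectors())))
# ['chroma', 'gemini', 'neo4j']

offline = Selectors(embedder="ollama", llm="ollama", graph_store="in_memory")
print(sorted(active_providers(offline)))
# ['chroma', 'in_memory', 'ollama']
```

Note that "in_memory" has no sub-config of its own, so a real check would skip it.
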

GeminiConfig

Used when embedder="gemini" or llm="gemini".

Python
@dataclass
class GeminiConfig:
    api_key:          str   = ""                        # Required — GEMINI_API_KEY
    embedding_model:  str   = "text-embedding-004"
    generation_model: str   = "gemini-2.0-flash"
    temperature:      float = 0.1

OpenAIConfig

Used when embedder="openai" or llm="openai".

Python
@dataclass
class OpenAIConfig:
    api_key:          str   = ""                          # Required — OPENAI_API_KEY
    embedding_model:  str   = "text-embedding-3-small"
    generation_model: str   = "gpt-4o"
    temperature:      float = 0.1

AnthropicConfig

Used when llm="anthropic". Anthropic models are generation-only; use a different provider for embeddings.

Python
@dataclass
class AnthropicConfig:
    api_key:     str   = ""                               # Required — ANTHROPIC_API_KEY
    model:       str   = "claude-3-5-sonnet-20241022"
    max_tokens:  int   = 4096
    temperature: float = 0.1

AzureOpenAIConfig

Used when llm="azure_openai" or embedder="azure_openai". All fields except api_version are required.

Python
@dataclass
class AzureOpenAIConfig:
    endpoint:             str = ""                        # Required — e.g. https://my.openai.azure.com/
    api_key:              str = ""                        # Required — AZURE_OPENAI_API_KEY
    api_version:          str = "2024-02-15-preview"
    deployment_name:      str = ""                        # Required — chat deployment
    embedding_deployment: str = ""                        # Required — embedding deployment
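
Because the required fields default to an empty string, a missing value is not caught when the dataclass is constructed. A minimal sketch of a pre-flight check follows, using a stand-in copy of AzureOpenAIConfig and a hypothetical missing_fields helper that is not part of cognity-ai. It does not account for values supplied through environment variables.

```python
from dataclasses import dataclass, fields


# Stand-in copied from the definition above.
@dataclass
class AzureOpenAIConfig:
    endpoint: str = ""
    api_key: str = ""
    api_version: str = "2024-02-15-preview"
    deployment_name: str = ""
    embedding_deployment: str = ""


def missing_fields(cfg) -> list[str]:
    """Return the names of fields still at their empty-string default."""
    return [f.name for f in fields(cfg) if getattr(cfg, f.name) == ""]


partial = AzureOpenAIConfig(
    endpoint="https://my.openai.azure.com/",
    api_key="placeholder-key",
)
print(missing_fields(partial))
# ['deployment_name', 'embedding_deployment']
```
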

BedrockConfig

Used when llm="bedrock" or embedder="bedrock". Uses boto3 credentials from the environment.

Python
@dataclass
class BedrockConfig:
    region:           str = "us-east-1"
    embedding_model:  str = "amazon.titan-embed-text-v2:0"
    generation_model: str = "anthropic.claude-3-5-sonnet-20241022-v2:0"

CohereConfig

Used when llm="cohere" or embedder="cohere".

Python
@dataclass
class CohereConfig:
    api_key:          str = ""                            # Required — CO_API_KEY
    embedding_model:  str = "embed-english-v3.0"
    generation_model: str = "command-r-plus"
    input_type:       str = "search_document"             # or "search_query" at query time
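
The input_type comment suggests a different value at query time than at indexing time. If you manage that switch yourself, dataclasses.replace gives a copy with one field changed and leaves the original untouched. The sketch below uses a stand-in copy of CohereConfig. Whether cognity-ai switches input_type automatically is not covered here, so treat the manual switch as an assumption.

```python
from dataclasses import dataclass, replace


# Stand-in copied from the definition above.
@dataclass
class CohereConfig:
    api_key: str = ""
    embedding_model: str = "embed-english-v3.0"
    generation_model: str = "command-r-plus"
    input_type: str = "search_document"


index_cfg = CohereConfig(api_key="placeholder-key")

# Copy the indexing config, changing only input_type for query-time embedding.
query_cfg = replace(index_cfg, input_type="search_query")

print(index_cfg.input_type)  # search_document
print(query_cfg.input_type)  # search_query
```
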

OllamaConfig

Used when llm="ollama" or embedder="ollama". Runs fully locally — no API key required.

Python
@dataclass
class OllamaConfig:
    base_url:         str = "http://localhost:11434"
    embedding_model:  str = "nomic-embed-text"
    generation_model: str = "llama3"

VertexAIConfig

Used when llm="vertex_ai" or embedder="vertex_ai". Requires Application Default Credentials (ADC) or service account credentials.

Python
@dataclass
class VertexAIConfig:
    project:          str = ""                            # Required — GCP project ID
    location:         str = "us-central1"
    embedding_model:  str = "text-embedding-005"
    generation_model: str = "gemini-1.5-pro"

Neo4jConfig

Used when graph_store="neo4j".

Python
@dataclass
class Neo4jConfig:
    uri:      str = "bolt://localhost:7687"
    user:     str = "neo4j"
    password: str = ""                                   # Required — NEO4J_PASSWORD
    database: str = "neo4j"

ChromaConfig

Used when vector_store="chroma". Data is persisted locally — no server required.

Python
@dataclass
class ChromaConfig:
    persist_directory: str = ".chroma"
    collection_name:   str = "cognity-ai"

QdrantConfig

Used when vector_store="qdrant". Supports both self-hosted and Qdrant Cloud.

Python
@dataclass
class QdrantConfig:
    url:             str        = "http://localhost:6333"
    api_key:         str | None = None                     # Required for Qdrant Cloud
    collection_name: str        = "cognity-ai"
    on_disk:         bool       = True                     # Persist vectors to disk

PineconeConfig

Used when vector_store="pinecone".

Python
@dataclass
class PineconeConfig:
    api_key:     str = ""                                # Required — PINECONE_API_KEY
    index_name:  str = "cognity-ai"
    environment: str = "us-east-1-aws"
    namespace:   str = ""                                # Optional namespace within index

Usage Example

Mix and match providers freely. The example below uses OpenAI for embeddings and generation with Qdrant as the vector store:

Python
from cognity_ai import RAGLibrary
from cognity_ai.config import LibraryConfig, OpenAIConfig, QdrantConfig

config = LibraryConfig(
    embedder="openai",
    vector_store="qdrant",
    llm="openai",
    openai=OpenAIConfig(
        api_key="sk-...",
        generation_model="gpt-4o",
    ),
    qdrant=QdrantConfig(
        url="http://localhost:6333",
        collection_name="my_docs",
    ),
)

rag = RAGLibrary(config=config)

# Ingest documents
rag.ingest("./documents/")

# Query
result = rag.query("What are the key findings?")
print(result.answer)

Tip: You can also pass provider keys via environment variables (e.g. OPENAI_API_KEY, GEMINI_API_KEY) and omit api_key from the config entirely. cognity-ai reads these automatically when the field is left as its empty-string default.
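
The fallback described in the tip can be pictured as "explicit value first, environment second". The sketch below shows that precedence with a hypothetical resolve_api_key helper. It illustrates the behavior and is not the library's actual implementation. The environ parameter exists only so the example can run against a fake environment.

```python
import os


def resolve_api_key(configured: str, env_var: str, environ=os.environ) -> str:
    """Prefer the configured key; fall back to the environment when it is empty."""
    return configured or environ.get(env_var, "")


# Fake environment so the example does not depend on your shell.
fake_env = {"OPENAI_API_KEY": "key-from-env"}

print(resolve_api_key("", "OPENAI_API_KEY", fake_env))              # key-from-env
print(resolve_api_key("explicit-key", "OPENAI_API_KEY", fake_env))  # explicit-key
print(repr(resolve_api_key("", "GEMINI_API_KEY", fake_env)))        # ''
```
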

For a fully offline setup using Ollama and Chroma:

Python
from cognity_ai import RAGLibrary
from cognity_ai.config import LibraryConfig, OllamaConfig, ChromaConfig

config = LibraryConfig(
    embedder="ollama",
    vector_store="chroma",
    llm="ollama",
    graph_store="in_memory",
    ollama=OllamaConfig(
        base_url="http://localhost:11434",
        generation_model="llama3",
    ),
    chroma=ChromaConfig(persist_directory="./.cognity_ai_data"),
)

rag = RAGLibrary(config=config)