Software Components
Reusable low-level components in the implementation layer
Software components are the core building blocks of the GAIK implementation layer. Each component encapsulates one well-defined capability — speech-to-text, document parsing, structured extraction, classification, or retrieval. They are designed to be used standalone or composed into custom pipelines, giving developers precise control over every step of a knowledge processing workflow.
See also: No-Code Assets for prompt templates and packaged skills.
Use software components when you need fine-grained control over each processing step, when a predefined module doesn't fit your requirements, or when integrating GenAI capabilities into an existing system.
Transcriber
The Transcriber converts spoken audio or video recordings into written text. It uses OpenAI's Whisper model for accurate speech-to-text transcription and optionally applies a GPT-based post-processing step to clean up the raw output — fixing punctuation, removing filler words, and improving overall readability. It handles long recordings automatically through chunked processing, making it suitable for real-world audio captured in noisy or uncontrolled environments.
Key features:
- Whisper-based transcription with high accuracy across languages
- Optional GPT enhancement for clean, readable output
- Automatic chunking for recordings of any length
- Supports MP3, WAV, M4A, OGG, and video formats
Potential applications:
- Workplace incident and safety observation reporting
- Construction site and field service diaries
- Meeting and interview transcription
- Medical dictation and clinical notes
- Customer call recording analysis
- Lecture and training material capture
- Implementation code: software_components/transcriber
- Usage examples: examples/transcriber
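Long recordings are split into overlapping windows before transcription so that a word cut at one boundary appears intact in the neighboring chunk. A minimal sketch of that boundary arithmetic — the chunk length, overlap, and function name are illustrative, not the Transcriber's actual API:

```python
def chunk_spans(duration_s: float, chunk_s: float = 600.0, overlap_s: float = 5.0):
    """Split a recording into overlapping (start, end) windows, in seconds."""
    spans, start = [], 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s  # back up so boundary words survive in one chunk
    return spans

# A 25-minute recording becomes three overlapping 10-minute chunks:
print(chunk_spans(1500.0))
```

Each span is then transcribed independently and the overlapping text is merged, which is what makes arbitrarily long recordings practical.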
Document Parser
The Document Parser converts PDF, DOCX, and other file types into clean, structured markdown text. It offers two parsing strategies: a vision-based parser that uses GPT to interpret page layouts, tables, and visual structure; and local parsers (PyMuPDF, Docling) that process documents without API calls. The parsed markdown output preserves tables, headings, and multi-page structure, making it a reliable input for downstream extraction or indexing.
Key features:
- Vision-based and local parsing options to suit different accuracy and cost requirements
- Preserves tables, headings, and document structure in markdown output
- Multi-page processing with consistent formatting
- Suitable for both simple forms and complex, visually rich documents
Potential applications:
- Invoice, receipt, and purchase order digitization
- Contract and legal document preprocessing
- Technical manual and product specification extraction
- Research paper and report indexing
- Compliance document analysis
- HR form and CV processing
- Implementation code: software_components/parsers
- Usage examples: examples/parsers
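To illustrate the kind of structure-preserving markdown the parser emits, here is a sketch that renders a parsed table (first row = header) as a GitHub-style markdown table — the function and row format are illustrative, not the parser's internals:

```python
def table_to_markdown(rows: list[list[str]]) -> str:
    """Render a parsed table (first row = header) as a markdown table."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)

print(table_to_markdown([["Item", "Qty"], ["Bolts M8", "40"], ["Washers", "80"]]))
```

Keeping tables in this form is what lets downstream extraction read cell values reliably instead of guessing at whitespace-aligned text.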
Extractor
The Extractor turns any unstructured text — transcripts, parsed documents, or free-form notes — into validated, structured records. What makes it distinctive is how extraction is configured: instead of writing code or defining a database schema, you describe what you need in plain language. A Requirement Parser interprets your field definitions, a Schema Generator builds a type-safe Pydantic model, and a Data Extractor uses an LLM to populate each field. The generated schema is saved and reused for future runs, eliminating the cost and latency of regenerating it each time.
Key features:
- Plain-language requirements replace manual schema definition
- Type-safe extraction with full Pydantic validation
- Supports constrained fields, allowed values, and conditional rules
- Schema persistence for efficient batch processing
Potential applications:
- Incident and safety report field extraction from voice or text
- Invoice and purchase order data capture
- Contract clause and obligation extraction
- Survey and form response structuring
- Quality inspection data recording
- HR and recruitment data extraction from CVs
- Implementation code: software_components/extractor
- Usage examples: examples/extractor
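The validate-and-coerce step can be sketched offline. Here a hand-written dict schema stands in for the generated Pydantic model, and the field names and rules are made up for illustration:

```python
# Illustrative stand-in for a generated schema: an incident-report extraction
# with one constrained field (allowed values) and one typed field.
SCHEMA = {
    "incident_type": {"type": str, "allowed": {"injury", "near_miss", "hazard"}},
    "severity": {"type": int},
    "description": {"type": str},
}

def validate_record(raw: dict) -> dict:
    """Coerce each extracted field to its declared type and check constraints."""
    record = {}
    for field, spec in SCHEMA.items():
        if field not in raw:
            raise ValueError(f"missing field: {field}")
        value = spec["type"](raw[field])  # coerce, e.g. "3" -> 3
        if "allowed" in spec and value not in spec["allowed"]:
            raise ValueError(f"{field}: {value!r} is not an allowed value")
        record[field] = value
    return record
```

In the real component, Pydantic performs this validation and the LLM's output is rejected or retried when a field fails, which is what makes the extracted records type-safe.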
Document Classifier
The Document Classifier assigns a predefined label to a document based on its content. It uses an LLM to evaluate the document against a user-defined set of classes and returns both the predicted class and a confidence score. It is most useful as a preprocessing step in multi-document pipelines — routing each document to the appropriate extraction schema, storage location, or downstream process before any further processing begins.
Key features:
- Define any set of classes in plain language — no training required
- Returns confidence scores for review and threshold-based routing
- Works as a standalone step or as a gate before extraction
Potential applications:
- Incoming document triage and routing in finance or legal workflows
- Email and attachment categorization
- Insurance claim and case type detection
- Archive tagging and document library organization
- Regulatory filing type identification
- Preprocessing step for multi-schema extraction pipelines
- Implementation code: software_components/doc_classifier
- Usage examples: examples/classifier
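Threshold-based routing on the returned confidence score can be sketched in a few lines — the class names and threshold here are illustrative, not part of the component:

```python
def classify_and_route(scores: dict[str, float], threshold: float = 0.8) -> str:
    """Pick the top-scoring class; fall back to human review below the threshold."""
    label = max(scores, key=scores.get)
    return label if scores[label] >= threshold else "needs_review"
```

A pipeline can then dispatch each document to the extraction schema matching the returned label, and queue "needs_review" documents for a human instead.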
RAG Components
The RAG (Retrieval-Augmented Generation) components provide a modular pipeline for building document-grounded question answering systems. Rather than a single monolithic tool, the RAG pipeline is split into five composable blocks — each with a clear responsibility. You can use individual blocks where needed or assemble the full pipeline for a complete Q&A system grounded in your own documents.
Pipeline overview:
1 · RAG Parser
Extracts text from source documents and splits it into manageable chunks with preserved metadata (page number, source file, section heading). The chunks serve as the unit of indexing and retrieval throughout the rest of the pipeline.
Example: A company safety manual (150 pages) is parsed into ~600 chunks. Each chunk carries its page number and section title so answers can be traced back to the source.
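The chunking step above can be sketched as a sliding window that carries metadata along with each piece — chunk size, overlap, and field names are illustrative:

```python
def chunk_page(text: str, source: str, page: int, size: int = 400, overlap: int = 50):
    """Split one page of text into overlapping chunks, each tagged with metadata."""
    chunks, step = [], size - overlap
    for start in range(0, len(text), step):
        chunks.append({
            "text": text[start:start + size],
            "source": source,
            "page": page,
            "offset": start,
        })
        if start + size >= len(text):
            break
    return chunks
```

Because every chunk keeps its source file and page number, an answer produced at the end of the pipeline can point back to the exact page it came from.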
2 · Embedder
Converts each text chunk into a dense numerical vector using an embedding model. Semantically similar chunks produce similar vectors, enabling meaning-based search rather than keyword matching.
Example: The chunk "All incidents must be reported within 24 hours of occurrence" is embedded as a vector. A question like "What is the deadline for incident reporting?" produces a similar vector — enabling a match even though no keywords are shared.
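Meaning-based matching reduces to vector similarity; a minimal cosine-similarity sketch (the embedding model itself is assumed, only the comparison is shown):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Two chunks that share no keywords can still score close to 1.0 when the embedding model maps them to nearby directions.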
3 · Vector Store
Stores and indexes all embeddings for fast similarity search at query time. The vector store persists the knowledge base between sessions, so documents only need to be processed once and can be queried repeatedly.
Example: A product documentation knowledge base is indexed once and reused across hundreds of daily queries without reprocessing the source documents.
Two backends are available: in-memory / ChromaDB for quick prototyping, and PostgreSQL + pgvector (PgVectorStore) for production use with semantic, keyword, and hybrid search via Reciprocal Rank Fusion (RRF).
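A toy in-memory store shows the add/search contract — dot-product scoring here assumes unit-normalized vectors, and the names are illustrative, not ChromaDB's or PgVectorStore's API:

```python
class ToyVectorStore:
    """Minimal in-memory store: keeps (vector, chunk) pairs, ranks by dot product."""

    def __init__(self):
        self._items = []

    def add(self, vector, chunk):
        self._items.append((vector, chunk))

    def search(self, query, k=3):
        score = lambda item: sum(x * y for x, y in zip(item[0], query))
        return [chunk for _, chunk in sorted(self._items, key=score, reverse=True)[:k]]
```

The production backends add persistence and indexing on top of this same contract, so the rest of the pipeline is unaffected by which store is plugged in.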
4 · Retriever
Accepts a user question, embeds it, and searches the vector store for the most semantically relevant chunks. Optionally reranks results to improve precision before passing them to the answer generator.
Example: The question "What PPE is required in the assembly area?" retrieves the three most relevant policy chunks from a 200-page safety manual, even if the manual never uses the exact word "PPE".
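Reranking typically over-fetches candidates by cheap vector similarity and then re-scores them with a finer (and slower) model; a sketch with stand-in callables, since the actual retriever and reranker interfaces are not shown here:

```python
def retrieve(search, rerank_score, query, k=3, fetch_k=12):
    """Fetch fetch_k candidates cheaply, then keep the k best under rerank_score."""
    candidates = search(query, fetch_k)
    return sorted(candidates, key=rerank_score, reverse=True)[:k]
```

The over-fetch factor trades latency for precision: a larger fetch_k gives the reranker more candidates to choose from.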
5 · Answer Generator
Takes the retrieved chunks and the original question and uses an LLM to compose a coherent, factual answer. The response is grounded strictly in the retrieved content, and source citations are included so the answer can be verified.
Example: Given the retrieved policy chunks, the answer generator responds: "According to the Safety Manual (Section 4.2), employees in the assembly area must wear safety glasses, steel-toed boots, and high-visibility vests at all times."
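Much of the grounding comes from how the prompt is assembled: numbered, source-tagged chunks plus an instruction to answer only from them. A sketch — the wording and chunk fields are illustrative:

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded Q&A prompt with numbered, citable sources."""
    context = "\n\n".join(
        f"[{i}] ({c['source']}, p.{c['page']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the sources below and cite them by number, e.g. [1].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Because each chunk keeps its source and page metadata from the parsing step, the generated citations can be checked against the original document.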
Key features:
- Fully modular — use individual blocks or assemble the full pipeline
- Metadata-preserving chunking for accurate source attribution
- Optional reranking to improve retrieval precision
- Citation-aware answer generation for trustworthy, verifiable outputs
Potential applications:
- Internal knowledge base and company policy Q&A
- Technical documentation and product manual assistants
- Regulatory and compliance document lookup
- Customer support knowledge retrieval
- Contract and legal clause search
- Training material and onboarding knowledge assistants
- Implementation code: software_components/RAG
- Usage examples: examples/RAG