Software Components
Reusable low-level components in the implementation layer
Software components are the core building blocks of the GAIK implementation layer. Each component encapsulates one well-defined capability — speech-to-text, document parsing, structured extraction, classification, or retrieval. They are designed to be used standalone or composed into custom pipelines, giving developers precise control over every step of a knowledge processing workflow.
See also: No-Code Assets for prompt templates and packaged skills.
Use software components when you need fine-grained control over each processing step, when a predefined module doesn't fit your requirements, or when integrating GenAI capabilities into an existing system.
Transcriber
The Transcriber converts spoken audio or video recordings into written text. It uses OpenAI's Whisper model for accurate speech-to-text transcription and optionally applies a GPT-based post-processing step to clean up the raw output — fixing punctuation, removing filler words, and improving overall readability. It handles long recordings automatically through chunked processing, making it suitable for real-world audio captured in noisy or uncontrolled environments.
Key features:
- Whisper-based transcription with high accuracy across languages
- Optional GPT enhancement for clean, readable output
- Automatic chunking for recordings of any length
- Supports MP3, WAV, M4A, OGG, and video formats
Potential applications:
- Workplace incident and safety observation reporting
- Construction site and field service diaries
- Meeting and interview transcription
- Medical dictation and clinical notes
- Customer call recording analysis
- Lecture and training material capture
- Implementation code: software_components/transcriber
- Usage examples: examples/transcriber
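Long recordings are split into overlapping windows before transcription so that a word cut at one boundary appears intact in the neighboring chunk. A minimal sketch of that boundary arithmetic — the chunk length, overlap, and function name are illustrative, not the Transcriber's actual API:

```python
def chunk_spans(duration_s: float, chunk_s: float = 600.0, overlap_s: float = 5.0):
    """Split a recording into overlapping (start, end) windows, in seconds."""
    spans, start = [], 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s  # back up so boundary words survive in one chunk
    return spans

# A 25-minute recording becomes three overlapping 10-minute chunks:
print(chunk_spans(1500.0))
```

Each span is then transcribed independently and the overlapping text is merged, which is what makes arbitrarily long recordings practical.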
Document Parser
The Document Parser converts PDF, DOCX, and other file types into clean, structured markdown text. It offers two parsing strategies: a vision-based parser that uses GPT to interpret page layouts, tables, and visual structure; and local parsers (PyMuPDF, Docling) that process documents without API calls. The parsed markdown output preserves tables, headings, and multi-page structure, making it a reliable input for downstream extraction or indexing.
Key features:
- Vision-based and local parsing options to suit different accuracy and cost requirements
- Preserves tables, headings, and document structure in markdown output
- Multi-page processing with consistent formatting
- Suitable for both simple forms and complex, visually rich documents
Potential applications:
- Invoice, receipt, and purchase order digitization
- Contract and legal document preprocessing
- Technical manual and product specification extraction
- Research paper and report indexing
- Compliance document analysis
- HR form and CV processing
- Implementation code: software_components/parsers
- Usage examples: examples/parsers
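To illustrate the kind of structure-preserving markdown the parser emits, here is a sketch that renders a parsed table (first row = header) as a GitHub-style markdown table — the function and row format are illustrative, not the parser's internals:

```python
def table_to_markdown(rows: list[list[str]]) -> str:
    """Render a parsed table (first row = header) as a markdown table."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)

print(table_to_markdown([["Item", "Qty"], ["Bolts M8", "40"], ["Washers", "80"]]))
```

Keeping tables in this form is what lets downstream extraction read cell values reliably instead of guessing at whitespace-aligned text.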
Extractor
The Extractor turns any unstructured text — transcripts, parsed documents, or free-form notes — into validated, structured records. What makes it distinctive is how extraction is configured: instead of writing code or defining a database schema, you describe what you need in plain language. A Requirement Parser interprets your field definitions, a Schema Generator builds a type-safe Pydantic model, and a Data Extractor uses an LLM to populate each field. The generated schema is saved and reused for future runs, eliminating the cost and latency of regenerating it each time.
Key features:
- Plain-language requirements replace manual schema definition
- Type-safe extraction with full Pydantic validation
- Supports constrained fields, allowed values, and conditional rules
- Schema persistence for efficient batch processing
Potential applications:
- Incident and safety report field extraction from voice or text
- Invoice and purchase order data capture
- Contract clause and obligation extraction
- Survey and form response structuring
- Quality inspection data recording
- HR and recruitment data extraction from CVs
- Implementation code: software_components/extractor
- Usage examples: examples/extractor
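The validate-and-coerce step can be sketched offline. Here a hand-written dict schema stands in for the generated Pydantic model, and the field names and rules are made up for illustration:

```python
# Illustrative stand-in for a generated schema: an incident-report extraction
# with one constrained field (allowed values) and one typed field.
SCHEMA = {
    "incident_type": {"type": str, "allowed": {"injury", "near_miss", "hazard"}},
    "severity": {"type": int},
    "description": {"type": str},
}

def validate_record(raw: dict) -> dict:
    """Coerce each extracted field to its declared type and check constraints."""
    record = {}
    for field, spec in SCHEMA.items():
        if field not in raw:
            raise ValueError(f"missing field: {field}")
        value = spec["type"](raw[field])  # coerce, e.g. "3" -> 3
        if "allowed" in spec and value not in spec["allowed"]:
            raise ValueError(f"{field}: {value!r} is not an allowed value")
        record[field] = value
    return record
```

In the real component, Pydantic performs this validation and the LLM's output is rejected or retried when a field fails, which is what makes the extracted records type-safe.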
Document Classifier
The Document Classifier assigns a predefined label to a document based on its content. It uses an LLM to evaluate the document against a user-defined set of classes and returns both the predicted class and a confidence score. It is most useful as a preprocessing step in multi-document pipelines — routing each document to the appropriate extraction schema, storage location, or downstream process before any further processing begins.
Key features:
- Define any set of classes in plain language — no training required
- Returns confidence scores for review and threshold-based routing
- Works as a standalone step or as a gate before extraction
Potential applications:
- Incoming document triage and routing in finance or legal workflows
- Email and attachment categorization
- Insurance claim and case type detection
- Archive tagging and document library organization
- Regulatory filing type identification
- Preprocessing step for multi-schema extraction pipelines
- Implementation code: software_components/doc_classifier
- Usage examples: examples/classifier
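Threshold-based routing on the returned confidence score can be sketched in a few lines — the class names and threshold here are illustrative, not part of the component:

```python
def classify_and_route(scores: dict[str, float], threshold: float = 0.8) -> str:
    """Pick the top-scoring class; fall back to human review below the threshold."""
    label = max(scores, key=scores.get)
    return label if scores[label] >= threshold else "needs_review"
```

A pipeline can then dispatch each document to the extraction schema matching the returned label, and queue "needs_review" documents for a human instead.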
RAG Components
The RAG (Retrieval-Augmented Generation) components provide a modular pipeline for building document-grounded question answering systems. Rather than a single monolithic tool, the RAG pipeline is split into five composable blocks — each with a clear responsibility. You can use individual blocks where needed or assemble the full pipeline for a complete Q&A system grounded in your own documents.
Pipeline overview:
1 · RAG Parser
Extracts text from source documents and splits it into manageable chunks with preserved metadata (page number, source file, section heading). The chunks serve as the unit of indexing and retrieval throughout the rest of the pipeline.
Example: A company safety manual (150 pages) is parsed into ~600 chunks. Each chunk carries its page number and section title so answers can be traced back to the source.
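The chunking step above can be sketched as a sliding window that carries metadata along with each piece — chunk size, overlap, and field names are illustrative:

```python
def chunk_page(text: str, source: str, page: int, size: int = 400, overlap: int = 50):
    """Split one page of text into overlapping chunks, each tagged with metadata."""
    chunks, step = [], size - overlap
    for start in range(0, len(text), step):
        chunks.append({
            "text": text[start:start + size],
            "source": source,
            "page": page,
            "offset": start,
        })
        if start + size >= len(text):
            break
    return chunks
```

Because every chunk keeps its source file and page number, an answer produced at the end of the pipeline can point back to the exact page it came from.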
2 · Embedder
Converts each text chunk into a dense numerical vector using an embedding model. Semantically similar chunks produce similar vectors, enabling meaning-based search rather than keyword matching.
Example: The chunk "All incidents must be reported within 24 hours of occurrence" is embedded as a vector. A question like "What is the deadline for incident reporting?" produces a similar vector — enabling a match even though no keywords are shared.
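Meaning-based matching reduces to vector similarity; a minimal cosine-similarity sketch (the embedding model itself is assumed, only the comparison is shown):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Two chunks that share no keywords can still score close to 1.0 when the embedding model maps them to nearby directions.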
3 · Vector Store
Stores and indexes all embeddings for fast similarity search at query time. The vector store persists the knowledge base between sessions, so documents only need to be processed once and can be queried repeatedly.
Example: A product documentation knowledge base is indexed once and reused across hundreds of daily queries without reprocessing the source documents.
Two backends are available: in-memory / ChromaDB for quick prototyping, and PostgreSQL + pgvector (PgVectorStore) for production use with semantic, keyword, and hybrid search via Reciprocal Rank Fusion (RRF).
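A toy in-memory store shows the add/search contract — dot-product scoring here assumes unit-normalized vectors, and the names are illustrative, not ChromaDB's or PgVectorStore's API:

```python
class ToyVectorStore:
    """Minimal in-memory store: keeps (vector, chunk) pairs, ranks by dot product."""

    def __init__(self):
        self._items = []

    def add(self, vector, chunk):
        self._items.append((vector, chunk))

    def search(self, query, k=3):
        score = lambda item: sum(x * y for x, y in zip(item[0], query))
        return [chunk for _, chunk in sorted(self._items, key=score, reverse=True)[:k]]
```

The production backends add persistence and indexing on top of this same contract, so the rest of the pipeline is unaffected by which store is plugged in.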
4 · Retriever
Accepts a user question, embeds it, and searches the vector store for the most semantically relevant chunks. Optionally reranks results to improve precision before passing them to the answer generator.
Example: The question "What PPE is required in the assembly area?" retrieves the three most relevant policy chunks from a 200-page safety manual, even if the manual never uses the exact word "PPE".
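Reranking typically over-fetches candidates by cheap vector similarity and then re-scores them with a finer (and slower) model; a sketch with stand-in callables, since the actual retriever and reranker interfaces are not shown here:

```python
def retrieve(search, rerank_score, query, k=3, fetch_k=12):
    """Fetch fetch_k candidates cheaply, then keep the k best under rerank_score."""
    candidates = search(query, fetch_k)
    return sorted(candidates, key=rerank_score, reverse=True)[:k]
```

The over-fetch factor trades latency for precision: a larger fetch_k gives the reranker more candidates to choose from.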
5 · Answer Generator
Takes the retrieved chunks and the original question and uses an LLM to compose a coherent, factual answer. The response is grounded strictly in the retrieved content, and source citations are included so the answer can be verified.
Example: Given the retrieved policy chunks, the answer generator responds: "According to the Safety Manual (Section 4.2), employees in the assembly area must wear safety glasses, steel-toed boots, and high-visibility vests at all times."
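Much of the grounding comes from how the prompt is assembled: numbered, source-tagged chunks plus an instruction to answer only from them. A sketch — the wording and chunk fields are illustrative:

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded Q&A prompt with numbered, citable sources."""
    context = "\n\n".join(
        f"[{i}] ({c['source']}, p.{c['page']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the sources below and cite them by number, e.g. [1].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Because each chunk keeps its source and page metadata from the parsing step, the generated citations can be checked against the original document.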
Key features:
- Fully modular — use individual blocks or assemble the full pipeline
- Metadata-preserving chunking for accurate source attribution
- Optional reranking to improve retrieval precision
- Citation-aware answer generation for trustworthy, verifiable outputs
Potential applications:
- Internal knowledge base and company policy Q&A
- Technical documentation and product manual assistants
- Regulatory and compliance document lookup
- Customer support knowledge retrieval
- Contract and legal clause search
- Training material and onboarding knowledge assistants
- Implementation code: software_components/RAG
- Usage examples: examples/RAG