Catch up
It has been a while since my last blog post. I recently built an iOS app for tracking my weightlifting and running, and after working on it in my free time for two months, I needed a break. So my attention turned back to using my local machine's GPU to explore large language models (LLMs) and how they can be personalized with our own data, privately.
Building a Local Knowledge Worker with LLMs
Before my dive into the iOS app, I was working on fine-tuning a local large language model (LLM) directly on my Linux desktop. My initial focus was on autonomous vehicles—experimenting with how self-driving car models are trained using real-world data.
But eventually, I shifted my attention toward large language models and how we can personalize them with our own data.
Round One: Manual Data Injection (and Failure)
My first attempt at personalizing an LLM involved manually injecting data into the model. I quickly realized this was more complex than expected and, honestly, didn’t get great results. But I wasn’t discouraged—I saw it as a learning opportunity and began planning a more structured second attempt.
Round Two: The Knowledge Worker
This time, I set out to build something I call a Knowledge Worker—a local, command-line-based assistant powered by a language model running entirely on my machine. The idea was simple but powerful: install a local LLM, ingest domain-specific documents (like PDFs), and fine-tune or customize the model to answer questions with high context-awareness.
The Stack
Here’s what I used to build it:
- Model: LLaMA 2 7B Chat (Q2_K GGUF format), downloaded from Hugging Face
- Model Loader: llama-cpp-python to run the model locally via CPU/GPU
- Vector DB: ChromaDB for storing and retrieving embeddings
- Embedding Models: Sentence Transformers
- Tooling: LangChain for orchestration
- Document Parsing: unstructured for extracting text from PDFs
- Agents & Search: CrewAI and duckduckgo-search to enable real-time web research
The Workflow
Load a Model
I used llama-cpp-python to load my quantized LLaMA 2 model entirely offline.
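Here's a rough sketch of that loading step. The model path, context size, and GPU layer count are placeholders I've picked for illustration, not values from the original project.

```python
# Minimal sketch: load a quantized GGUF model with llama-cpp-python.
# The path and settings below are assumptions, not prescribed values.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q2_K.gguf",  # hypothetical local path
    n_ctx=2048,        # context window
    n_gpu_layers=20,   # offload some layers to the GPU if one is available
    verbose=False,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize progressive overload in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```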
Ingest Documents
I fed the app a few PDF documents—mostly training guides and exercise science research. Using LangChain and ChromaDB, I parsed and embedded the contents into a vector database.
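A sketch of what that ingestion looks like with a recent LangChain release (where the community integrations live in langchain_community). The file names, chunk sizes, and embedding model are stand-ins rather than the exact settings I used.

```python
# Sketch: parse PDFs with unstructured, embed with Sentence Transformers,
# and persist the chunks to a local Chroma store. Paths are placeholders.
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

docs = []
for path in ["training_guide.pdf", "exercise_science_review.pdf"]:  # example files
    docs.extend(UnstructuredPDFLoader(path).load())

# Split long pages into overlapping chunks so retrieval stays focused.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(docs)

# Embed the chunks and write them to a persistent local vector store.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectordb = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")
```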
Question Answering
Once indexed, I could query the system about specific training methodologies. For example, I asked it to summarize a trainer’s workout philosophy, and it returned detailed, contextually accurate answers.
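Roughly, the question-answering side wires the same local model into LangChain and answers against the Chroma retriever. The paths and parameters below are illustrative assumptions, and the vectordb object is the one from the ingestion sketch above.

```python
# Sketch: retrieval-augmented Q&A over the Chroma store using the local model.
from langchain_community.llms import LlamaCpp
from langchain.chains import RetrievalQA

llm = LlamaCpp(
    model_path="./models/llama-2-7b-chat.Q2_K.gguf",  # same hypothetical path as above
    n_ctx=2048,
    temperature=0.2,
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever(search_kwargs={"k": 4}),  # top-4 chunks per query
)

print(qa.invoke("Summarize the trainer's overall workout philosophy.")["result"])
```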
Chaining with Research Agents
To go beyond static document analysis, I added CrewAI agents equipped with DuckDuckGo search. This allowed the assistant to combine internal knowledge (from PDFs) with fresh data from the web in a single response chain.
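A loose sketch of that agent setup. It assumes a CrewAI release that accepts LangChain-style tools directly (newer releases expect tools wrapped via crewai_tools), and the role, goal, and task wording are made up for illustration.

```python
# Sketch: a single CrewAI agent armed with DuckDuckGo search.
# Assumes a CrewAI version that accepts LangChain-style tools directly.
from crewai import Agent, Task, Crew
from langchain_community.tools import DuckDuckGoSearchRun

search_tool = DuckDuckGoSearchRun()  # thin wrapper around duckduckgo-search

researcher = Agent(
    role="Fitness researcher",
    goal="Cross-check training advice from my local PDFs against current web sources",
    backstory="An assistant that blends local document knowledge with live web research.",
    tools=[search_tool],
)

research_task = Task(
    description=(
        "The local documents recommend periodized strength blocks. "
        "Search the web for recent evidence that supports or contradicts this."
    ),
    expected_output="A short summary of what current web sources say.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[research_task])
print(crew.kickoff())
```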
Why This Matters
Many people and organizations want the power of AI without relying on external APIs or exposing sensitive data. A locally hosted model that can read internal docs, scan the web, and generate actionable insights could supercharge internal workflows, research, and decision-making—all while preserving privacy.