So we're back. In my previous article, I shared some improvements to my voice assistant, which is connected to a local LLM running on my GPU.

I implemented metrics to track how quickly the supporting processes were running, added a RAG layer so prompts to the LLM could include custom data, and created a Flask server connected to a basic web UI for audio recording.

At the end of that session, I got a bit ambitious with Codex to see how well it would handle a larger implementation like the server and UI. The quality of its output matched the level of depth and specificity I provided—no more, no less.

Tonight, I’m considering redirecting the project back to a command-line tool to keep things simple and focus on the AI functionality. Looking back, it’s clear I had fun building during the last session, but I veered off course in the process.

[Figure: voice assistant diagram]

The goal of this project is to learn more about working with large language models. I had a lot of fun moving fast with Codex, but rapid development without a clear-cut plan often leads to impulsive decisions.

After a few adjustments, we’re back to a command-line tool. Additionally, I’m putting the voice processing on hold to focus on what I can do with the large language model.

I want this assistant to become a domain expert of sorts. That means I’ll need to gather information and train it. Looking more closely at the AI-generated code, I’m not even fully certain the RAG prompt and retrieval are working. So I’m digging in and coding it myself.

  • Checking to make sure the different components work
  • The RAG prompt returned from the builder is just the template text, not the retrieved data
  • Vector DB file exists from previous training
  • Ah, I need a conditional custom-mode flag to inject the retrieved RAG context into the prompt (sketched after this list)
  • Adjusted my command to include the flag
  • RAG is now working and retrieving documents
  • Removed voice functionality
  • Tested the model-plus-RAG combo after adding an informational document about my town to the vector DB (see the ingestion sketch below)
  • Great—the model is now working with the injected prompt and providing relevant responses
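To make that fix concrete, here is a minimal sketch of what the conditional custom-mode flag and prompt injection can look like. It assumes Chroma as the vector store; the flag name, collection name, database path, and prompt template are illustrative placeholders rather than the project's actual code.

```python
import argparse
import chromadb  # assuming Chroma as the vector store; swap in whatever store the project uses


def build_prompt(question: str, context_docs: list[str] | None = None) -> str:
    """Build the final prompt, injecting retrieved documents only when provided."""
    if not context_docs:
        return question
    context = "\n\n".join(context_docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def retrieve(question: str, db_path: str = "vector_db", n_results: int = 3) -> list[str]:
    """Pull the most relevant documents for the question from the vector DB."""
    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection("assistant_docs")
    results = collection.query(query_texts=[question], n_results=n_results)
    return results["documents"][0]


def main() -> None:
    parser = argparse.ArgumentParser(description="Command-line assistant")
    parser.add_argument("question", help="Question to send to the model")
    # Hypothetical flag name: only inject RAG context when explicitly requested
    parser.add_argument("--custom-mode", action="store_true",
                        help="Inject retrieved documents into the prompt")
    args = parser.parse_args()

    docs = retrieve(args.question) if args.custom_mode else None
    prompt = build_prompt(args.question, docs)
    print(prompt)  # in the real tool this prompt goes to the local LLM


if __name__ == "__main__":
    main()
```

With that in place, the adjusted command from the list is just the usual invocation plus the flag, something like `python assistant.py "When was the town founded?" --custom-mode`.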
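And the "training" step is really just ingestion: chunking the informational document and storing it in the vector DB so retrieval has something to find. Again, this is only a sketch assuming Chroma, with the file name and ID scheme made up for illustration.

```python
import chromadb


def ingest_document(path: str, db_path: str = "vector_db") -> None:
    """Chunk a text file and store the chunks in the vector DB."""
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Naive chunking by paragraph; good enough for a small informational document
    chunks = [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]

    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection("assistant_docs")
    collection.add(
        documents=chunks,
        ids=[f"{path}-{i}" for i in range(len(chunks))],
    )


if __name__ == "__main__":
    ingest_document("my_town.txt")  # placeholder file name for the town write-up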

Now that I’ve had a minute to think about the direction of the project, I want to figure out the best way to put this functionality online. I’m excited to take that next step. Looks like the voice assistant is no longer a voice assistant.

I still think there’s value in making an LLM adapt to become a domain expert on specific data.

Local Voice Assistant V1

Local Voice Assistant V2