Codex Experimentation Journal

Tonight I am continuing to experiment with Codex agents. I am using a workflow for building my iOS app where I take a few tasks I need to complete and hand them over. I have a bug: when a user populates an exercise from memory, the text input shows autocomplete suggestions, which is not desired in this case. At the same time, I asked it to write unit tests for my login and register views.

Let's discuss the bug first. 10:57pm

Codex added a state flag to suppress filtering and made a few other changes that look suspect at first glance.
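To make the state-flag idea concrete, here is a minimal sketch of how such a fix could look. It is not the code Codex wrote: `ExerciseInput`, `suppressFiltering`, `userTyped`, and `populateFromMemory` are hypothetical names, and it assumes suggestions come from a simple prefix match.

```swift
// Hypothetical model of the exercise text input; all names are illustrative.
struct ExerciseInput {
    private(set) var text = ""
    private(set) var suggestions: [String] = []
    // When true, the suggestion list stays empty even though the text changed.
    private var suppressFiltering = false
    let knownExercises: [String]

    init(knownExercises: [String]) {
        self.knownExercises = knownExercises
    }

    // The user typed in the field: filtering is wanted.
    mutating func userTyped(_ newText: String) {
        suppressFiltering = false
        text = newText
        refreshSuggestions()
    }

    // The field was filled from a remembered exercise: filtering is not wanted.
    mutating func populateFromMemory(_ saved: String) {
        suppressFiltering = true
        text = saved
        refreshSuggestions()
    }

    private mutating func refreshSuggestions() {
        guard !suppressFiltering, !text.isEmpty else {
            suggestions = []
            return
        }
        let query = text.lowercased()
        suggestions = knownExercises.filter { $0.lowercased().hasPrefix(query) }
    }
}

var input = ExerciseInput(knownExercises: ["Squat", "Split Squat", "Sumo Deadlift"])
input.populateFromMemory("Squat")
print(input.suggestions.isEmpty) // no autocomplete after a populate
```

Keeping the flag next to the text it guards makes the behavior easy to unit test without rendering a view.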

I am running a newer version of Xcode that is not yet supported by GitHub workflows, so I cannot run tests in my GitHub PR. This forces me to pull the changes and run them locally.

Codex arrived at a solution for my bug about 5 minutes later. I opened a PR and pulled the changes to my local machine.

First round: 5 unit test errors. Feed them back to Codex. 11:04pm

Check on login and register views

Still working... At 11:08pm it finishes. Tests are broken.

After feeding errors back to Codex twice for each task, I am still looking at errors. Love. This is a feeling I’ve had before.

Let's fix the issues.

Looking back to the bug around autocomplete. 11:18pm - I fixed all the tests. I am new to writing Swift tests so my ability to spot import and configuration issues is still on the upswing.

The biggest issue with Codex for me at the moment is that if I push changes back to its feature branch, I cannot get Codex to fetch the updates and continue on in that task…

So it doesn’t like me fixing its code... 🙂

It breaks the workflow because now I have to open a new conversation and lose context from the work we just did.

Switching back over to Register and Login views. I fixed the tests and it is now 11:27pm

Learning Swift unit test patterns is delightful. 11:48pm

GitHub cannot be reached…

Taking a step back: tonight’s pace of work is a stark contrast to last night’s. The iOS app is much more complex than the local voice assistant in terms of code.

I set out to complete two tickets: fix the bug and add unit tests to two existing views.

The bug was allegedly fixed, with a passing test assertion. Let’s confirm with manual QA. It works! Love that. I was not confident.

The Login and Register tests are still failing on the mocking and assertion of their respective HTTP requests. I would rather tell Codex to do this and get a snack. Due to the collaboration issues on the git branch, I’ll have to merge a few broken tests and have it fix them in the next task.
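For context on what mocking and asserting an HTTP request can involve, here is one common pattern, sketched under assumptions. It is not the app's code: the `HTTPClient` protocol, `MockHTTPClient`, `LoginService`, and the `login` path are all hypothetical. The idea is to hide the network behind a protocol so that a test can inject a double that records the requests it receives. It assumes an Apple platform, where `import Foundation` provides `URLRequest`.

```swift
import Foundation

// Hypothetical seam over the network layer; the real app may use something else.
protocol HTTPClient {
    func send(_ request: URLRequest) async throws -> (Data, HTTPURLResponse)
}

// Test double: records every request and answers with a canned response.
final class MockHTTPClient: HTTPClient {
    private(set) var requests: [URLRequest] = []
    var statusCode = 200
    var body = Data()

    func send(_ request: URLRequest) async throws -> (Data, HTTPURLResponse) {
        requests.append(request)
        guard let url = request.url,
              let response = HTTPURLResponse(url: url,
                                             statusCode: statusCode,
                                             httpVersion: nil,
                                             headerFields: nil) else {
            throw URLError(.badURL)
        }
        return (body, response)
    }
}

// Hypothetical service the login view would call.
struct LoginService {
    let client: any HTTPClient
    let baseURL: URL

    // Returns true when the server answers 200.
    func login(email: String, password: String) async throws -> Bool {
        var request = URLRequest(url: baseURL.appendingPathComponent("login"))
        request.httpMethod = "POST"
        request.httpBody = try JSONEncoder().encode(["email": email, "password": password])
        let (_, response) = try await client.send(request)
        return response.statusCode == 200
    }
}
```

In an XCTest case, the test would build `LoginService` with a `MockHTTPClient`, await `login(email:password:)`, and then assert on `mock.requests` (method, path, body) as well as the returned value.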

GitHub is back - 11:59pm

I’ve set the command… time for a little break. Back with Zevia in hand - Codex has yet again failed to fix the tests.

Why am I doing the legwork here, Codex?

Feeling a bit tired, I feed the errors back into the prompt, wait for a response, update the PR, pull the code locally, and run the tests. Still failing.

This was a fun experiment. Let's comment out these test cases and get everything passing :)

They didn’t exist before, so no harm, no foul. Going on an hour and fifteen minutes into this session, we accomplished 1 of our 2 goals. I will need to revisit the view unit tests to make sure they are asserting the correct information and figure out the commented-out test cases.

In this scenario I am new to Swift and move a bit slower than I do in other languages. Codex likely helped me move a little faster in this case. I will say it abstracted away most of the learning I would have done by hand. It is interesting to think how that effect will compound over time.

If I compared this session to myself coding at normal speed in another language, I would be disappointed.

A few critiques

Codex takes a few minutes to do a large analysis for each request. If there is a small one-line change, I’ll wait minutes for Codex, and if I quickly make the change myself, Codex cannot update the branch with my commits. 👎

I probably spent more energy than the value I created. A brain seems more efficient.

There are multiple instances where Codex adds a few extra changes that could introduce other new issues.

All this supports my idea that the feeling of joy as AI generates apps from scratch is almost entirely replaced by circular error prompting as projects move away from MVP and towards real-world production system sizes.

That is all the energy I have tonight.

Writing these experiences is enjoyable. If you are here - thanks for reading.

Jones Codes