Agentic Engineering with Codex: Hard-Learned Lessons

Basil Chatha

Everyone's vibe coding. Almost no one's shipping. There's a reason the engineers seeing the biggest AI productivity gains were already the best engineers: LLM access doesn't replace engineering judgment. This guide breaks down Vibe Engineering, a disciplined methodology for building real software with OpenAI Codex. You'll get the exact CLI flags to run, how to set up MCP servers so your LLM always has live API docs, the VIBE method for decomposing any project without losing control, and the 9-line AGENTS.md guardrail file that prevents 90% of mistakes.

💡 This tutorial will walk you through best practices for “Vibe Engineering” with OpenAI’s Codex product. I’ll walk you through setting it up with the best settings, a methodology for how to think about Vibe Engineering for any project, and link resources for digging in further yourself.

Note: these instructions are assuming you use Mac (sorry Windows users, you’re on your own).

Who am I?


I’m Basil Chatha - I was a PM at Intuit Credit Karma, worked at a venture studio for a bit, tried to acquire a (non-tech) business, then started an AI consulting company building image GenAI for film/gaming studios, and enterprise retailers using ComfyUI and (later) voice agents to automate call centers for the finance industry. Now, I’m an FDE and have been building enterprise GenAI applications for PE portfolio companies in insurance, healthcare, fintech, and more.

Connect with me on LinkedIn and Twitter.

Setup the Codex CLI

→ Official documentation: https://developers.openai.com/codex/cli/

TL;DR

  1. Install Homebrew (if you haven’t already): https://brew.sh/
  2. Install Node: brew install node
  3. Install: npm install -g @openai/codex | Upgrade (if you already have it installed): npm install -g @openai/codex@latest
  4. Best codex setup (writes an alias into your shell config; reload afterward with source ~/.zshrc): echo 'alias codex="codex --search --model=gpt-5.4 -c model_reasoning_effort=xhigh --sandbox workspace-write -c sandbox_workspace_write.network_access=true"' >> ~/.zshrc

Vibe Coding vs Agentic Engineering


I like to call out the difference between “Vibe Coding” and “Agentic Engineering” because I’ve noticed a lot of non-technical people view vibe coding as some sort of panacea that lets them do the work of senior software engineers without any technical background. That is just not true. There’s a reason you’ve been seeing articles reporting that the engineers seeing the biggest lifts in productivity were already the best ones (i.e., the ones who had a solid foundation to build on).

Vibe coding implies you can just 1-shot a beautiful prompt into a complex application. You may be able to do it once, but that’s just luck, and engineering is not supposed to rely on luck. The idea behind Vibe Engineering is that you break a project down into its subcomponents and methodically build each of those pieces before putting them together into the final product.

💡 This distinction between vibe engineering and vibe coding is what separates men from boys.

Example

For example, if you want to build a simple web application that takes a PDF and turns it into a PNG, you have at least two components to build: a frontend UI, and a backend with one API endpoint that converts the PDF into a PNG and returns it to the user. If I were vibe engineering this project, I would first build the endpoint (giving the LLM context on the full project so it keeps the other pieces in mind), and THEN build out the frontend. Building each feature separately lets you isolate bugs much more quickly than if you try to 1-shot the entire project at once, which gets much harder to do well the more complex your project gets.
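To make the decomposition concrete, here is a minimal sketch of the two components, with the PDF-to-PNG conversion itself stubbed out (a real build might use a rendering library, which is my assumption, not something the example specifies). The point is the seam: the backend function can be built and tested before any frontend exists.

```python
from dataclasses import dataclass

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # every PNG file starts with these 8 bytes

def convert_pdf_to_png(pdf_bytes: bytes) -> bytes:
    """Backend component: turn a PDF into a PNG.

    Stubbed so the piece can be built and tested in isolation;
    swap in a real renderer later.
    """
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("not a PDF")
    return PNG_MAGIC + b"...rendered page placeholder..."

@dataclass
class ConvertResponse:
    status: int
    body: bytes

def handle_upload(pdf_bytes: bytes) -> ConvertResponse:
    """API-endpoint logic: validate input, call the backend, return the PNG."""
    try:
        png = convert_pdf_to_png(pdf_bytes)
    except ValueError:
        return ConvertResponse(status=400, body=b"invalid PDF")
    return ConvertResponse(status=200, body=png)
```

Once this piece is solid, the frontend only needs to POST a file and display the returned bytes; a bug in either half is immediately attributable.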


Setup MCPs

First, what are MCPs?

The Model Context Protocol (MCP) is an open standard that allows large language models (LLMs) to connect with external tools and data sources. This gives AI systems the ability to access real-time information, perform actions, and automate complex tasks in a standardized way. Think of MCP as a universal adapter, a "USB-C for AI."

Before MCP: The "N x M" problem

Before the creation of MCP in late 2024, developers faced significant challenges when integrating AI applications with external tools. This was often called the "N x M" problem, where 'N' is the number of AI models and 'M' is the number of tools, and a custom, one-off connection was needed for nearly every combination.

Here is what developers had to do before MCP:

  • Create custom API integrations: For every external service (e.g., a database, an email service, or a CRM tool), developers had to write custom, model-specific code to handle API requests and responses.
  • Manage fragmented integrations: The lack of a uniform standard meant that every new data source required a custom implementation. This was time-consuming, expensive, and difficult to scale.
  • Deal with limited ecosystems: Some platforms created their own proprietary "plugin" systems (like OpenAI's earlier framework). While effective within their ecosystem, these integrations were not interoperable with other AI models or third-party tools.
  • Rely on static, pre-trained data: Without easy, standardized access to real-time, external data, LLMs were limited to the information they were trained on. This made them prone to providing outdated information or "hallucinating" plausible-but-incorrect facts.

How MCP makes things better for developers:

MCP simplifies the integration process and unlocks a wide range of capabilities for LLMs, making them more powerful and useful in real-world scenarios.

  • Easier, reusable integrations: MCP replaces fragmented integrations with a single, open protocol. Once a developer builds an MCP "server" for a tool, any AI application with an MCP "client" can connect to it and use its capabilities.
  • Lower development costs: By providing a common standard, MCP reduces the time, cost, and effort of integrating AI systems with external tools and data sources.
  • Enhanced security: The protocol defines how developers can implement authentication and permission controls, ensuring secure, two-way connections between AI agents and data sources.
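The "N x M" savings are easy to see with toy numbers (the model and tool names below are illustrative only):

```python
# Toy arithmetic behind the "N x M" problem.
models = ["app_a", "app_b", "app_c"]          # N = 3 AI applications
tools  = ["db", "email", "crm", "calendar"]   # M = 4 external tools

# Before MCP: one bespoke adapter per (model, tool) pair.
bespoke_adapters = len(models) * len(tools)   # 3 * 4 = 12 integrations

# With MCP: each app ships one client, each tool ships one server.
mcp_components = len(models) + len(tools)     # 3 + 4 = 7 components

print(bespoke_adapters, mcp_components)       # prints: 12 7
```

The gap widens as either side grows: ten apps and ten tools means 100 bespoke integrations versus 20 MCP components.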

Best MCPs to use?

💡 In general, lots of the latest tooling companies have already built their own MCP that you can set up pretty easily. Just search for your favorite tool in Google followed by “MCP” and that will take you far.

My recommendation is to start with just one MCP to get used to it before adding a bunch more to your setup. And if there’s only one MCP you should start with, it should be one that automatically pulls in the latest API documentation into context for your LLM to read before implementing it. Before I found out about the MCPs I’m going to lay out to you below, I used to manually find the relevant documentation for a feature I was vibe coding, and tell the LLM to “read/understand it thoroughly before implementing.” It is much better to have the MCP automatically pull in the freshest, most relevant documentation for your LLM before it implements an API.

💡 NOTE: Just use EITHER Context7 OR exa-code. I used to use Context7 but have recently switched to exa-code.

Context7

Official installation guide here: https://github.com/upstash/context7

  1. Get an API key from https://context7.com/
  2. Open the config file: cursor ~/.codex/config.toml (or use any text editor)
  3. Add:
[mcp_servers.context7]
command = "npx"
args = ["-y", "@upstash/context7-mcp", "--api-key", "YOUR-API-KEY"]

Exa-Code

Official installation guide here: https://docs.exa.ai/reference/exa-mcp

  1. Get an API key by signing up for Exa: https://exa.ai/
  2. Open the config file: cursor ~/.codex/config.toml (or use any text editor)
  3. Add:

[mcp_servers.exa]
command = "npx"
args = ["-y", "exa-mcp-server", "--tools=get_code_context_exa"]

[mcp_servers.exa.env]
EXA_API_KEY = "YOUR_API_KEY"

💡 Context: I used to use Context7, but several really good engineering friends recently told me they’ve been using exa-code and love it, so I switched and have liked it so far. They both do similar things: Exa will search thousands of repos for the latest best practices before implementing stuff (or at least that’s the promise), whereas Context7 just pulls in the latest documentation, so I like exa-code’s promise more. Either will get the job done.

Setup Custom Prompts

  1. git clone https://github.com/wshobson/agents.git
  2. cp -R ./agents/commands/tools ~/.codex/prompts
  3. (if you haven’t installed ripgrep and/or sd) brew install ripgrep sd
  4. cd ~/.codex/prompts, then strip the Claude model frontmatter: rg -lU --null -e '---\nmodel: (claude-sonnet-4-0|claude-opus-4-1)\n---\n' | xargs -0 sd -- '---\nmodel: (claude-sonnet-4-0|claude-opus-4-1)\n---\n' ''
  5. Rename the remaining references: rg -l --null -e 'claude' | xargs -0 sd -- '\bclaude\b' 'codex'
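If you'd rather not install ripgrep and sd, the cleanup above can be done with a few lines of pure Python (my own port of the pipeline; the regexes mirror the shell commands):

```python
import re

# Matches the Claude-specific frontmatter block the prompts ship with.
FRONTMATTER = re.compile(r"---\nmodel: (?:claude-sonnet-4-0|claude-opus-4-1)\n---\n")

def clean_prompt(text: str) -> str:
    """Strip the model frontmatter, then rename whole-word 'claude' to 'codex'."""
    text = FRONTMATTER.sub("", text)
    return re.sub(r"\bclaude\b", "codex", text)

# To apply it to the copied prompts:
#   for p in pathlib.Path.home().joinpath(".codex/prompts").rglob("*.md"):
#       p.write_text(clean_prompt(p.read_text()))

print(clean_prompt("---\nmodel: claude-opus-4-1\n---\nAsk claude to review."))
# prints: Ask codex to review.
```

The \b word boundaries matter: they keep the rename from mangling words that merely contain "claude".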

When you run codex next, you’ll see the custom prompts populate when you type '/prompts':

Example

“Explain this codebase to me” - try this prompt on its own in codex, and then try the same prompt but use the /prompts:code-explain custom prompt and compare the outputs.

The VIBE Method

  1. Verbalize:
    1. First, you describe your app idea using LLMs and the RICE-Q method to make your instructions clear:
      1. Role: Tell the AI its role (e.g., “You are a senior app developer”).
      2. Instruction: Say what you want.
      3. Context: Give background details.
      4. Examples: Show what you mean.
      5. Questions: Ask the AI to ask clarifying questions before it starts (the "Q" in RICE-Q).
  2. Instruct:
    1. Next, the AI makes a detailed plan and task list.
  3. Build:
    1. The LLM builds according to the plan generated in the “Instruct” step.
  4. Evaluate:
    1. Test, Deploy, and Reflect with LLMs (w/ access to MCP + tools)
      1. Test the app, deploy it, give feedback to the LLM, and reflect on what you learned.
      2. Ask the LLM to write tests for the code and refine it based on the results.
      3. (optional) Ask the LLM to critique the code it wrote and to make your code more performant
💡 The key is that you need to run the VIBE method on each piece of your project separately, and then bring the pieces together. As mentioned before, this lets you (and your LLM) fix bugs more easily and gives you better control over the overall architecture than if you try to 1-shot the entire thing at once.
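The Verbalize step is easy to template. Below is one way to assemble a RICE-Q prompt in Python; the field labels and the closing request for clarifying questions are my own phrasing of the method, not a canonical format.

```python
def rice_q_prompt(role: str, instruction: str, context: str, examples: str) -> str:
    """Assemble a Verbalize-step prompt from the RICE-Q fields.

    The final line (asking for clarifying questions) is my assumption
    about the "Q"; adjust to taste.
    """
    return "\n".join([
        f"Role: {role}",
        f"Instruction: {instruction}",
        f"Context: {context}",
        f"Examples: {examples}",
        "Before planning, ask me any clarifying questions you have.",
    ])

prompt = rice_q_prompt(
    role="You are a senior app developer",
    instruction="Build a PDF-to-PNG web app",
    context="One backend endpoint plus a simple upload frontend",
    examples="Input: report.pdf -> Output: report.png",
)
print(prompt)
```

Keeping the prompt as code means each VIBE iteration on a subcomponent starts from the same structured brief instead of an ad-hoc chat message.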

Example


AGENTS.md Guardrails

“9 lines that prevent 90% of mistakes” (h/t Jason Liu)
# Use uv run, never python
# Prefer async over sync patterns
# Write at 9th grade level in documentation
# Avoid heavily mocking tests without user permission
# Update docs when code changes
# Never git add ., specify files
# Run linters/formatters before committing
# Type check before merging
# Run affected tests for changed files

Agents

What are they?

From OpenAI:

An agent is a large language model (LLM), configured with instructions and tools.

Multi-Agent System

In a multi-agent system, you have an “Orchestrator” LLM that acts as a puppet master and decides which agents to run to achieve its overall objective. Each subagent has its own specific objective. Multi-agent systems have a couple of benefits: each subagent focuses on a specific role, and each gets its own context window for that job. If you tried to have one agent do the whole thing, it would perform worse, partly because it has more to do (so is more likely to hallucinate), and partly because the larger the context you give an LLM, the worse its performance.

Example

You are building a login feature in your app. You tell your orchestrator agent to build the login feature, and the orchestrator kicks off two agents to build it: one is the backend engineer that builds the functionality, and the other is a security engineer that checks that code for vulnerabilities. Once both agents have completed their tasks, the orchestrator returns the output.
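The login example can be sketched in plain Python with no agent framework; the agent names and routing logic below are illustrative only. Each subagent receives a fresh context list, mirroring the separate context windows described above.

```python
# Toy orchestrator/subagent sketch. Real systems would call an LLM inside
# each agent; here the "agents" are stub functions so the control flow is clear.

def backend_engineer(task: str, context: list[str]) -> str:
    context.append(task)                 # this agent's private context window
    return f"code for: {task}"

def security_engineer(task: str, context: list[str]) -> str:
    context.append(task)
    return f"security review of: {task}"

def orchestrator(objective: str) -> dict[str, str]:
    """Decide which subagents to run, give each a fresh context, merge results."""
    subagents = [("backend", backend_engineer), ("security", security_engineer)]
    results = {}
    for name, agent in subagents:
        results[name] = agent(objective, context=[])  # fresh context per agent
    return results

output = orchestrator("build the login feature")
print(output)
```

Note that the orchestrator only sees the merged results, not each subagent's working context; that isolation is what keeps each agent's job small.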

Lab: Receipt Invoicing


Plan

  1. Create conda environment
  2. Set up AGENTS.md
  3. “I have three different categories of receipts in the receipts folder, saved as pdfs. Read the total amount, the date, and the receipt_id of each receipt and put it into a csv file and send it to my boss. Think hard before implementing anything.”
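A hedged sketch of what the agent's final script might look like for this lab: the PDF field extraction is stubbed (a real run would use whatever PDF library the agent chooses), and the "send to my boss" step is omitted; only the CSV-writing shape is shown. The field names come from the prompt above.

```python
import csv
import io

FIELDS = ["receipt_id", "date", "total"]  # the three fields the prompt asks for

def extract_fields(pdf_path: str) -> dict:
    """Placeholder for real PDF parsing of one receipt."""
    return {"receipt_id": pdf_path, "date": "2025-01-01", "total": "0.00"}

def receipts_to_csv(pdf_paths: list[str]) -> str:
    """Extract the three fields from each receipt and render one CSV."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for path in pdf_paths:
        writer.writerow(extract_fields(path))
    return buf.getvalue()

csv_text = receipts_to_csv(["receipts/meals/r1.pdf"])
```

Comparing the agent's output against a skeleton like this is a quick way to evaluate whether it honored the plan (and the AGENTS.md guardrails) rather than improvising.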

About Basil Chatha

Basil was previously a Product Manager at Intuit Credit Karma before starting an AI consulting firm focused on building image / voice agents for big-box retailers, insurance companies, and private equity firms.