Agentic Engineering with Codex: Hard-Learned Lessons

Basil Chatha

Everyone's vibe coding. Almost no one's shipping. There's a reason the engineers seeing the biggest AI productivity gains were already the best engineers: LLM access doesn't replace engineering judgment. This guide breaks down Vibe Engineering, a disciplined methodology for building real software with OpenAI Codex. You'll get the exact CLI flags to run, how to set up MCP servers so your LLM always has live API docs, the VIBE method for decomposing any project without losing control, and the 9-line AGENTS.md guardrail file that prevents 90% of mistakes.

💡 This tutorial will walk you through best practices for “Vibe Engineering” with OpenAI’s Codex product. I’ll walk you through setting it up with the best settings, a methodology for how to think about Vibe Engineering for any project, and link resources for digging in further yourself.

Note: these instructions are assuming you use Mac (sorry Windows users, you’re on your own).

Who am I?


I’m Basil Chatha - I was a PM at Intuit Credit Karma, worked at a venture studio for a bit, tried to acquire a (non-tech) business, then started an AI consulting company building image GenAI for film/gaming studios, and enterprise retailers using ComfyUI and (later) voice agents to automate call centers for the finance industry. Now, I’m an FDE and have been building enterprise GenAI applications for PE portfolio companies in insurance, healthcare, fintech, and more.

Connect with me on LinkedIn and Twitter.

Setup the Codex CLI

→ Official documentation: https://developers.openai.com/codex/cli/

TL;DR

  1. Install Homebrew (if you haven’t already): https://brew.sh/
  2. Install Node: brew install node
  3. Install: npm install -g @openai/codex | Upgrade (if you already have it installed): npm install -g @openai/codex@latest
  4. Best codex setup (writes an alias into your shell config; reload afterward with source ~/.zshrc): echo 'alias codex="codex --search --model=gpt-5.4 -c model_reasoning_effort=xhigh --sandbox workspace-write -c sandbox_workspace_write.network_access=true"' >> ~/.zshrc

Vibe Coding vs Agentic Engineering


I like to call out the difference between “Vibe Coding” and “Agentic Engineering” because I’ve noticed a lot of non-technical people view vibe coding as some sort of panacea that lets them do the work of senior software engineers without any technical background. That is just not true. There’s a reason you’ve been seeing articles reporting that the engineers seeing the biggest lifts in productivity were already the best ones (i.e., the ones who had a solid foundation to build on).

Vibe coding implies you can just 1-shot a beautiful prompt into a complex application. You may be able to do it once, but that’s just luck, and engineering is not supposed to rely on luck. The idea behind Vibe Engineering is that you break a project down into its subcomponents and methodically build each of those pieces before putting them together into the final product.

💡 This distinction between vibe engineering and vibe coding is what separates men from boys.

Example

For example, if you want to build a simple web application that takes a PDF and turns it into a PNG, you have at least two components to build: a frontend UI, and a backend with one API endpoint that converts the PDF into a PNG and returns it to the user. If I were vibe engineering this project, I would first build the endpoint (giving the LLM context on the full project so it keeps the other pieces in mind), and THEN build out the frontend. Building each feature separately lets you isolate bugs much more quickly than if you try to 1-shot the entire project at once, which gets much harder to do well the more complex your project gets.
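To make the decomposition concrete, here is a minimal sketch of the two components, with the PDF-to-PNG conversion itself stubbed out (a real build might use a rendering library, which is my assumption, not something the example specifies). The point is the seam: the backend function can be built and tested before any frontend exists.

```python
from dataclasses import dataclass

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # every PNG file starts with these 8 bytes

def convert_pdf_to_png(pdf_bytes: bytes) -> bytes:
    """Backend component: turn a PDF into a PNG.

    Stubbed so the piece can be built and tested in isolation;
    swap in a real renderer later.
    """
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("not a PDF")
    return PNG_MAGIC + b"...rendered page placeholder..."

@dataclass
class ConvertResponse:
    status: int
    body: bytes

def handle_upload(pdf_bytes: bytes) -> ConvertResponse:
    """API-endpoint logic: validate input, call the backend, return the PNG."""
    try:
        png = convert_pdf_to_png(pdf_bytes)
    except ValueError:
        return ConvertResponse(status=400, body=b"invalid PDF")
    return ConvertResponse(status=200, body=png)
```

Once this piece is solid, the frontend only needs to POST a file and display the returned bytes; a bug in either half is immediately attributable.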


Setup MCPs

First, what are MCPs?

The Model Context Protocol (MCP) is an open standard that allows large language models (LLMs) to connect with external tools and data sources. This gives AI systems the ability to access real-time information, perform actions, and automate complex tasks in a standardized way. Think of MCP as a universal adapter, a "USB-C for AI."

Before MCP: The "N x M" problem

Before the creation of MCP in late 2024, developers faced significant challenges when integrating AI applications with external tools. This was often called the "N x M" problem, where 'N' is the number of AI models and 'M' is the number of tools, and a custom, one-off connection was needed for nearly every combination.

Here is what developers had to do before MCP:

  • Create custom API integrations: For every external service (e.g., a database, an email service, or a CRM tool), developers had to write custom, model-specific code to handle API requests and responses.
  • Manage fragmented integrations: The lack of a uniform standard meant that every new data source required a custom implementation. This was time-consuming, expensive, and difficult to scale.
  • Deal with limited ecosystems: Some platforms created their own proprietary "plugin" systems (like OpenAI's earlier framework). While effective within their ecosystem, these integrations were not interoperable with other AI models or third-party tools.
  • Rely on static, pre-trained data: Without easy, standardized access to real-time, external data, LLMs were limited to the information they were trained on. This made them prone to providing outdated information or "hallucinating" plausible-but-incorrect facts.

How MCP makes things better for developers:

MCP simplifies the integration process and unlocks a wide range of capabilities for LLMs, making them more powerful and useful in real-world scenarios.

  • Easier, reusable integrations: MCP replaces fragmented integrations with a single, open protocol. Once a developer builds an MCP "server" for a tool, any AI application with an MCP "client" can connect to it and use its capabilities.
  • Lower development costs: By providing a common standard, MCP reduces the time, cost, and effort of integrating AI systems with external tools and data sources.
  • Enhanced security: The protocol defines how developers can implement authentication and permission controls, ensuring secure, two-way connections between AI agents and data sources.
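The "N x M" savings are easy to see with toy numbers (the model and tool names below are illustrative only):

```python
# Toy arithmetic behind the "N x M" problem.
models = ["app_a", "app_b", "app_c"]          # N = 3 AI applications
tools  = ["db", "email", "crm", "calendar"]   # M = 4 external tools

# Before MCP: one bespoke adapter per (model, tool) pair.
bespoke_adapters = len(models) * len(tools)   # 3 * 4 = 12 integrations

# With MCP: each app ships one client, each tool ships one server.
mcp_components = len(models) + len(tools)     # 3 + 4 = 7 components

print(bespoke_adapters, mcp_components)       # prints: 12 7
```

The gap widens as either side grows: ten apps and ten tools means 100 bespoke integrations versus 20 MCP components.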

Best MCPs to use?

💡 In general, lots of the latest tooling companies have already built their own MCP that you can set up pretty easily. Just search for your favorite tool in Google followed by “MCP” and that will take you far.

My recommendation is to start with just one MCP to get used to it before adding a bunch more to your setup. And if there’s only one MCP you should start with, it should be one that automatically pulls in the latest API documentation into context for your LLM to read before implementing it. Before I found out about the MCPs I’m going to lay out to you below, I used to manually find the relevant documentation for a feature I was vibe coding, and tell the LLM to “read/understand it thoroughly before implementing.” It is much better to have the MCP automatically pull in the freshest, most relevant documentation for your LLM before it implements an API.

💡 NOTE: Just use EITHER Context7 OR exa-code. I used to use Context7 but have recently switched to exa-code.

Context7

Official installation guide here: https://github.com/upstash/context7

  1. Get an API key from https://context7.com/
  2. Open the config file: cursor ~/.codex/config.toml (or use any text editor)
  3. Add:
[mcp_servers.context7]
command = "npx"
args = ["-y", "@upstash/context7-mcp", "--api-key", "YOUR-API-KEY"]

Exa-Code

Official installation guide here: https://docs.exa.ai/reference/exa-mcp

  1. Get an API key by signing up for Exa: https://exa.ai/
  2. Open the config file: cursor ~/.codex/config.toml (or use any text editor)
  3. Add:

[mcp_servers.exa]
command = "npx"
args = ["-y", "exa-mcp-server", "--tools=get_code_context_exa"]

[mcp_servers.exa.env]
EXA_API_KEY = "YOUR_API_KEY"

💡 Context: I used to use Context7, but several really good engineering friends recently told me they’ve been using exa-code and love it, so I switched and have liked it so far. They both do similar things: Exa will search thousands of repos for the latest best practices before implementing stuff (or at least that’s the promise), whereas Context7 just pulls in the latest documentation, so I like exa-code’s promise more. Either will get the job done.

Setup Custom Prompts

  1. git clone https://github.com/wshobson/agents.git
  2. cp -R ./agents/commands/tools ~/.codex/prompts
  3. (if you haven’t installed ripgrep and/or sd) brew install ripgrep sd
  4. cd ~/.codex/prompts, then strip the Claude model frontmatter: rg -lU --null -e '---\nmodel: (claude-sonnet-4-0|claude-opus-4-1)\n---\n' | xargs -0 sd -- '---\nmodel: (claude-sonnet-4-0|claude-opus-4-1)\n---\n' ''
  5. Rename the remaining references: rg -l --null -e 'claude' | xargs -0 sd -- '\bclaude\b' 'codex'
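If you'd rather not install ripgrep and sd, the cleanup above can be done with a few lines of pure Python (my own port of the pipeline; the regexes mirror the shell commands):

```python
import re

# Matches the Claude-specific frontmatter block the prompts ship with.
FRONTMATTER = re.compile(r"---\nmodel: (?:claude-sonnet-4-0|claude-opus-4-1)\n---\n")

def clean_prompt(text: str) -> str:
    """Strip the model frontmatter, then rename whole-word 'claude' to 'codex'."""
    text = FRONTMATTER.sub("", text)
    return re.sub(r"\bclaude\b", "codex", text)

# To apply it to the copied prompts:
#   for p in pathlib.Path.home().joinpath(".codex/prompts").rglob("*.md"):
#       p.write_text(clean_prompt(p.read_text()))

print(clean_prompt("---\nmodel: claude-opus-4-1\n---\nAsk claude to review."))
# prints: Ask codex to review.
```

The \b word boundaries matter: they keep the rename from mangling words that merely contain "claude".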

When you run codex next, you’ll see the custom prompts populate when you type '/prompts':

Example

“Explain this codebase to me” - try this prompt on its own in codex, and then try the same prompt but use the /prompts:code-explain custom prompt and compare the outputs.

The VIBE Method

  1. Verbalize:
    1. First, you describe your app idea using LLMs and the RICE-Q method to make your instructions clear:
      1. Role: Tell the AI its role (e.g., “You are a senior app developer”).
      2. Instruction: Say what you want.
      3. Context: Give background details.
      4. Examples: Show what you mean.
      5. Questions: Ask the AI to ask clarifying questions before it starts (the "Q" in RICE-Q).
  2. Instruct:
    1. Next, the AI makes a detailed plan and task list.
  3. Build:
    1. The LLM builds according to the plan generated in the “Instruct” step.
  4. Evaluate:
    1. Test, Deploy, and Reflect with LLMs (w/ access to MCP + tools)
      1. Test the app, deploy it, give feedback to the LLM, and reflect on what you learned.
      2. Ask the LLM to write tests for the code and refine it based on the results.
      3. (optional) Ask the LLM to critique the code it wrote and to make your code more performant
💡 The key is that you need to run the VIBE method on each piece of your project separately, and then bring the pieces together. As mentioned before, this lets you (and your LLM) fix bugs more easily and gives you better control over the overall architecture than if you try to 1-shot the entire thing at once.
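The Verbalize step is easy to template. Below is one way to assemble a RICE-Q prompt in Python; the field labels and the closing request for clarifying questions are my own phrasing of the method, not a canonical format.

```python
def rice_q_prompt(role: str, instruction: str, context: str, examples: str) -> str:
    """Assemble a Verbalize-step prompt from the RICE-Q fields.

    The final line (asking for clarifying questions) is my assumption
    about the "Q"; adjust to taste.
    """
    return "\n".join([
        f"Role: {role}",
        f"Instruction: {instruction}",
        f"Context: {context}",
        f"Examples: {examples}",
        "Before planning, ask me any clarifying questions you have.",
    ])

prompt = rice_q_prompt(
    role="You are a senior app developer",
    instruction="Build a PDF-to-PNG web app",
    context="One backend endpoint plus a simple upload frontend",
    examples="Input: report.pdf -> Output: report.png",
)
print(prompt)
```

Keeping the prompt as code means each VIBE iteration on a subcomponent starts from the same structured brief instead of an ad-hoc chat message.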

Example


AGENTS.md Guardrails

“9 lines that prevent 90% of mistakes” (h/t Jason Liu)
# Use uv run, never python
# Prefer async over sync patterns
# Write at 9th grade level in documentation
# Avoid heavily mocking tests without user permission
# Update docs when code changes
# Never git add ., specify files
# Run linters/formatters before committing
# Type check before merging
# Run affected tests for changed files

Agents

What are they?

From OpenAI:

An agent is a large language model (LLM), configured with instructions and tools.

Multi-Agent System

In a multi-agent system, you have an “Orchestrator” LLM that acts as a puppet master and decides which agents to run to achieve its overall objective. Each subagent has its own specific objective. Multi-agent systems have a couple of benefits: each subagent focuses on a specific role, and each gets its own context window for that job. If you tried to have one agent do the whole thing, it would perform worse, partly because it has more to do (so is more likely to hallucinate), and partly because the larger the context you give an LLM, the worse its performance.

Example

You are building a login feature in your app. You tell your orchestrator agent to build the login feature, and the orchestrator kicks off two agents to build it: one is the backend engineer that builds the functionality, and the other is a security engineer that checks that code for vulnerabilities. Once both agents have completed their tasks, the orchestrator returns the output.
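The login example can be sketched in plain Python with no agent framework; the agent names and routing logic below are illustrative only. Each subagent receives a fresh context list, mirroring the separate context windows described above.

```python
# Toy orchestrator/subagent sketch. Real systems would call an LLM inside
# each agent; here the "agents" are stub functions so the control flow is clear.

def backend_engineer(task: str, context: list[str]) -> str:
    context.append(task)                 # this agent's private context window
    return f"code for: {task}"

def security_engineer(task: str, context: list[str]) -> str:
    context.append(task)
    return f"security review of: {task}"

def orchestrator(objective: str) -> dict[str, str]:
    """Decide which subagents to run, give each a fresh context, merge results."""
    subagents = [("backend", backend_engineer), ("security", security_engineer)]
    results = {}
    for name, agent in subagents:
        results[name] = agent(objective, context=[])  # fresh context per agent
    return results

output = orchestrator("build the login feature")
print(output)
```

Note that the orchestrator only sees the merged results, not each subagent's working context; that isolation is what keeps each agent's job small.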

Lab: Receipt Invoicing


Plan

  1. Create conda environment
  2. Set up AGENTS.md
  3. “I have three different categories of receipts in the receipts folder, saved as pdfs. Read the total amount, the date, and the receipt_id of each receipt and put it into a csv file and send it to my boss. Think hard before implementing anything.”
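A hedged sketch of what the agent's final script might look like for this lab: the PDF field extraction is stubbed (a real run would use whatever PDF library the agent chooses), and the "send to my boss" step is omitted; only the CSV-writing shape is shown. The field names come from the prompt above.

```python
import csv
import io

FIELDS = ["receipt_id", "date", "total"]  # the three fields the prompt asks for

def extract_fields(pdf_path: str) -> dict:
    """Placeholder for real PDF parsing of one receipt."""
    return {"receipt_id": pdf_path, "date": "2025-01-01", "total": "0.00"}

def receipts_to_csv(pdf_paths: list[str]) -> str:
    """Extract the three fields from each receipt and render one CSV."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for path in pdf_paths:
        writer.writerow(extract_fields(path))
    return buf.getvalue()

csv_text = receipts_to_csv(["receipts/meals/r1.pdf"])
```

Comparing the agent's output against a skeleton like this is a quick way to evaluate whether it honored the plan (and the AGENTS.md guardrails) rather than improvising.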

About Basil Chatha

Basil was previously a Product Manager at Intuit Credit Karma before starting an AI consulting firm focused on building image / voice agents for big-box retailers, insurance companies, and private equity firms.