Pre-Ship AI-Assisted Review Planning
The Approach: Think Before You Prompt
A project I created is code complete—but as we know, code complete and “ship it” are two different things.
I’m using Codex Web for the actual reviews since it’s connected to my GitHub repo. But a sophisticated product needs more than “do a code review.”
My workflow involves two layers of AI collaboration:
| Stage | Tool | Purpose |
|---|---|---|
| 1. Design the reviews | ChatGPT | Structure what needs to be examined |
| 2. Write the prompts | ChatGPT | Convert structure into Codex-ready instructions |
| 3. Execute | Codex Web | Run against the actual codebase |
This look-before-you-leap approach improves outcome quality by involving AI in planning, not just execution.
A) Engineering Quality Reviews
1. Architecture & Boundaries Review
- Are modules separated cleanly? Any leaky abstractions between agents ↔ LLM ↔ Kanban?
- Is the “single source of truth” clear? (config, roles, prompts, traces)
- Any cyclic dependencies or import-order landmines?
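For the cyclic-dependency question, it helps to hand Codex (or run locally) a concrete probe rather than ask it to eyeball imports. A minimal sketch, assuming a flat `src/` package layout (the real repo layout may differ):

```python
# Quick import-cycle probe: build a module -> imported-local-module graph from
# the source files and report the first cycle found via DFS.
# Assumption: project modules are flat .py files under src/ (adjust ROOT).
import ast
from pathlib import Path

ROOT = Path("src")

def local_imports(path, local_modules):
    tree = ast.parse(path.read_text(), filename=str(path))
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found |= {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found & local_modules  # keep only edges inside the project

def find_cycle(graph):
    state, stack = {}, []  # state: 0 unvisited, 1 in progress, 2 done

    def dfs(node):
        state[node] = 1
        stack.append(node)
        for nxt in graph.get(node, ()):
            if state.get(nxt, 0) == 1:
                return stack[stack.index(nxt):] + [nxt]
            if state.get(nxt, 0) == 0:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        stack.pop()
        state[node] = 2
        return None

    for node in graph:
        if state.get(node, 0) == 0:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None

modules = {p.stem: p for p in ROOT.glob("*.py")}
graph = {name: local_imports(p, set(modules)) for name, p in modules.items()}
cycle = find_cycle(graph)
print("cycle:", " -> ".join(cycle) if cycle else "none found")
```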
2. Governance / Ratchet Enforcement Review
- Can approved artifacts be overwritten via edge cases? (delete/recreate, rename, symlink, unstaged edits; see the sketch after this list)
- Does Ralph have any path to mutate tests indirectly?
- Are governance rules enforced consistently across webhook + orchestrator polling paths?
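For the delete/recreate, rename, and symlink edge cases, this is the shape of guard the review should try to defeat. A minimal sketch only; the manifest path, format, and "approved means frozen" policy are my assumptions, not the project's actual ratchet mechanism:

```python
# Sketch of an approved-artifact write guard. Assumptions: a JSON manifest at
# .governance/approved.json keyed by relative path, and a policy that approved
# (ratcheted) files are frozen. The review's job is to try delete/recreate,
# rename, and symlink tricks against whatever the real guard is.
import json
from pathlib import Path

MANIFEST = Path(".governance/approved.json")  # assumed location and format

def is_write_allowed(target, workspace):
    ws = Path(workspace).resolve()
    real = (ws / target).resolve()        # collapses symlinks and ".." segments
    if ws not in real.parents:
        return False                      # path escapes the workspace
    approved = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
    rel = real.relative_to(ws).as_posix()
    # Delete-then-recreate or rename-over doesn't help: the manifest is keyed
    # by path, so an approved path stays frozen whether or not the file exists.
    return rel not in approved
```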
3. Correctness & Idempotency Review
- Spawner idempotency + flood control: prove no duplicate children under retries/races
- Webhook handler idempotency: duplicate events won’t double-run agents
- “Exactly once” vs “at least once” behavior documented and safe
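A concrete way to frame the idempotency questions: is there a stable key per event, and is claiming it atomic? A minimal sketch, assuming a file-backed marker store (the real spawner/webhook code may use a database or Kanboard metadata instead):

```python
# Sketch of webhook/spawner idempotency: derive a stable key from the event,
# claim it atomically, and skip work if it was already claimed. The marker
# directory and key fields are assumptions for illustration.
import hashlib
import json
from pathlib import Path

SEEN_DIR = Path(".state/processed")  # assumed location for processed-event markers

def idempotency_key(event):
    # Stable key: event id if present, otherwise a hash of the full payload.
    raw = str(event.get("id") or json.dumps(event, sort_keys=True))
    return hashlib.sha256(raw.encode()).hexdigest()

def handle_once(event, handler):
    SEEN_DIR.mkdir(parents=True, exist_ok=True)
    marker = SEEN_DIR / idempotency_key(event)
    try:
        # "x" mode is create-exclusive: exactly one caller wins a race.
        with open(marker, "x") as f:
            f.write(json.dumps({"status": "started"}))
    except FileExistsError:
        return False          # duplicate delivery or concurrent retry: skip
    handler(event)            # work happens only after we won the claim
    return True
```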
4. Error Handling & Recovery Review
- Failure modes: provider outage, JSON repair failure, Kanboard API fail, git fail?
- Are retries bounded and safe?
- Dead-letter / quarantine behaviors clear?
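For bounded retries plus dead-letter behavior, this is the pattern the review should confirm exists. A sketch under assumed names (`DEAD_LETTER_DIR`, the transient exception types); the project's real failure taxonomy will differ:

```python
# Sketch of bounded retries with a dead-letter drop: retry only transient
# failures, back off exponentially, and quarantine the payload when attempts
# run out instead of silently dropping it.
import json
import time
from pathlib import Path

DEAD_LETTER_DIR = Path(".state/dead_letter")  # assumed quarantine location

def call_with_retries(fn, payload, attempts=3, base_delay=1.0):
    last_error = None
    for attempt in range(attempts):
        try:
            return fn(payload)
        except (TimeoutError, ConnectionError) as exc:   # transient only; don't retry logic bugs
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))      # bounded exponential backoff
    DEAD_LETTER_DIR.mkdir(parents=True, exist_ok=True)
    (DEAD_LETTER_DIR / f"{int(time.time())}.json").write_text(
        json.dumps({"payload": payload, "error": repr(last_error)})
    )
    raise RuntimeError("giving up after bounded retries") from last_error
```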
5. Performance & Scaling Review
- Profiling tool sanity: does it measure the right things?
- Hot paths: trace parsing, compression, JSON repair, LLM call loops
- Latency budgets by agent phase, and obvious bottlenecks
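Latency budgets are easiest to review when they are explicit in code. A minimal sketch with made-up phase names and numbers; the review should ask what the real budgets are:

```python
# Sketch of a per-phase latency budget check: wrap hot paths with a timer and
# flag anything over budget. Phase names and thresholds are illustrative.
import time
from contextlib import contextmanager

BUDGETS_SECONDS = {"trace_parse": 0.5, "json_repair": 1.0, "llm_call": 30.0}  # assumed

@contextmanager
def phase_timer(phase):
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        budget = BUDGETS_SECONDS.get(phase)
        if budget is not None and elapsed > budget:
            print(f"[latency] {phase} took {elapsed:.2f}s, over budget of {budget:.2f}s")

# Usage (repair_json is a hypothetical call):
# with phase_timer("json_repair"):
#     repaired = repair_json(raw_output)
```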
6. Test Suite Quality Review
- Are tests meaningful or just mocking the happy path?
- Coverage of “bad paths” (timeouts, malformed JSON, missing env, Kanban rejects)
- Is there any flakiness risk (time, filesystem, subprocess)?
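An example of the "bad path" coverage to look for: malformed LLM output should degrade gracefully rather than crash. `parse_llm_json` here is a hypothetical stand-in for whatever the project's parser/repair function is actually called:

```python
# Sketch of a bad-path test: feed deliberately broken model output and assert
# the parser degrades gracefully instead of raising.
import json
import pytest

def parse_llm_json(raw):
    # Stand-in parser; the real project presumably has its own repair logic.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "unparseable", "raw": raw}

@pytest.mark.parametrize("raw", [
    "",                         # empty response
    "not json at all",          # plain prose
    '{"truncated": tru',        # cut-off stream
    '{"a": 1} trailing junk',   # valid prefix, garbage suffix
])
def test_malformed_output_does_not_crash(raw):
    result = parse_llm_json(raw)
    assert isinstance(result, dict)
    assert "error" in result or "a" in result
```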
B) Operational Readiness Reviews
7. Observability Review
- Logs: are important fields always present, and are correlation IDs consistent? (see the sketch after this list)
- Traces: do they contain enough to reproduce issues? (raw output, prompt pack hash, provider metadata)
- Monitoring dashboards: are the recommended dashboards and alerts sane and not noisy?
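What "correlation IDs consistent" can look like in practice: one ID minted at the entry point and threaded through every structured log line. The field names are assumptions, not the project's logging schema:

```python
# Sketch of correlation-ID logging: mint the ID once per incoming event and
# emit one JSON object per line so logs stay machine-parseable.
import json
import logging
import uuid

logger = logging.getLogger("agent")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_event(event, correlation_id, **fields):
    logger.info(json.dumps({"event": event, "correlation_id": correlation_id, **fields}))

# Usage: the webhook handler mints the ID, then passes it everywhere downstream.
correlation_id = str(uuid.uuid4())
log_event("card_received", correlation_id, card_id=42, column="Backlog")
log_event("agent_spawned", correlation_id, agent="designer")
```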
8. Configuration & Deployability Review
- `config/llm.yaml` schema clarity + validation
- `.env.example` completeness + secure defaults
- "doctor/health" commands give actionable output
- Upgrade path: backward compatibility, migration notes
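For the schema-validation and doctor/health bullets, a sketch of the kind of actionable check I'd want; the required keys and env var names are assumptions about `config/llm.yaml`, not its real schema:

```python
# Sketch of a "doctor"-style check: load config/llm.yaml, verify the keys a run
# actually needs, and print actionable messages rather than a stack trace.
import os
import sys
import yaml  # pyyaml, already listed as a project dependency in the review notes

REQUIRED_KEYS = ["provider", "model", "timeout_seconds"]   # assumed schema
REQUIRED_ENV = ["LLM_API_KEY"]                             # assumed secret name

def doctor(path="config/llm.yaml"):
    problems = []
    try:
        with open(path) as f:
            config = yaml.safe_load(f) or {}
    except FileNotFoundError:
        print(f"FAIL: {path} not found. Copy the example config and edit it.")
        return 1
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing key '{key}' in {path}")
    for var in REQUIRED_ENV:
        if not os.environ.get(var):
            problems.append(f"environment variable {var} is not set")
    for p in problems:
        print(f"FAIL: {p}")
    print("OK" if not problems else f"{len(problems)} problem(s) found")
    return 0 if not problems else 1

if __name__ == "__main__":
    sys.exit(doctor())
```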
9. Release Engineering Review
- Branch / PR hygiene: changelog, versioning, tags/releases
- CI suggestions: minimal workflows to run tests/lint on PR
- Reproducible dev setup instructions
C) Security Reviews
10. Secrets & Data Leakage Review
- Do traces/logs ever leak secrets (keys, headers, env vars)?
- Does config hashing truly exclude secrets everywhere?
- Are prompts/responses potentially sensitive? How is redaction handled (or intentionally not handled)?
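A sketch of the redaction the secrets review should look for (or confirm is intentionally absent): scrub sensitive keys and known secret values before traces hit disk. The key patterns here are my assumptions:

```python
# Sketch of trace redaction: mask values under suspicious keys and strip any
# known secret values that leak into free-text fields.
import os
import re

SENSITIVE_KEY_PATTERN = re.compile(r"(api[_-]?key|token|secret|password|authorization)", re.I)

def redact(obj, secret_values=None):
    secrets = secret_values or {v for k, v in os.environ.items()
                                if SENSITIVE_KEY_PATTERN.search(k) and v}
    if isinstance(obj, dict):
        return {k: "[REDACTED]" if SENSITIVE_KEY_PATTERN.search(str(k))
                else redact(v, secrets) for k, v in obj.items()}
    if isinstance(obj, list):
        return [redact(v, secrets) for v in obj]
    if isinstance(obj, str):
        for s in secrets:
            obj = obj.replace(s, "[REDACTED]")
        return obj
    return obj
```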
11. Supply Chain & Dependency Review
- New deps (`pyyaml`, `requests`, anything else): pinned? minimal?
- Any risky subprocess calls?
- Any shell injection surfaces?
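The subprocess pattern the review should expect: list-form arguments, no `shell=True`, untrusted strings passed only as data. The git example is illustrative; the project's actual calls may differ:

```python
# Sketch of a shell-injection-resistant subprocess call.
import subprocess

def git_commit(workspace, message):
    # List-form arguments mean `message` is passed as data, never parsed by a
    # shell, so card titles containing `;` or `$(...)` cannot inject commands.
    subprocess.run(
        ["git", "-C", workspace, "commit", "-m", message],
        check=True,
        capture_output=True,
        text=True,
        timeout=60,
    )

# Anti-pattern to flag in review (the shell parses the untrusted message):
# subprocess.run(f"git -C {workspace} commit -m '{message}'", shell=True)
```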
12. Threat Model Review (Practical)
| Attacker Vector | Controls to Validate |
|---|---|
| Malicious Kanban card content | syntax_guard, JSON repair, file path guards |
| Compromised provider output | git guards, command validation |
| Local user on box | workspace boundaries, allow-list enforcement |
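For the first row, a sketch of what a syntax-guard-style check might enforce on untrusted card content; the fields, size limit, and JSON assumption are illustrative, not the project's actual guard:

```python
# Sketch of validating untrusted Kanban card content before it reaches an
# agent: parse, enforce a minimal shape, and cap sizes.
import json

MAX_CARD_BYTES = 32_000                          # assumed limit
REQUIRED_FIELDS = {"title": str, "description": str}

def validate_card(raw):
    if len(raw.encode()) > MAX_CARD_BYTES:
        raise ValueError("card content too large")
    data = json.loads(raw)                       # raises on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("card content must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or invalid field: {field}")
    return data
```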
13. LLM Safety-Integration Review
- Prompt injection resistance across phases (design → plan → tests → code)
- Are untrusted inputs ever used to generate commands/paths?
- Are there deny-list or allow-list boundaries (e.g., only write within the workspace)?
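For the allow-list question, a sketch of the pattern: a fixed command set plus workspace-contained paths. The command list is a made-up example of the policy shape, not the project's real one:

```python
# Sketch of an allow-list boundary for model-proposed actions: only a fixed
# set of commands may run, and path-like arguments must stay in the workspace.
import shlex
from pathlib import Path

ALLOWED_COMMANDS = {"pytest", "python", "git"}   # assumed allow-list

def is_action_allowed(command_line, workspace):
    parts = shlex.split(command_line)
    if not parts or parts[0] not in ALLOWED_COMMANDS:
        return False
    ws = Path(workspace).resolve()
    for arg in parts[1:]:
        candidate = (ws / arg).resolve()
        # Reject arguments that look like paths escaping the workspace.
        if arg.startswith(("/", "..", "~")) and ws not in candidate.parents:
            return False
    return True
```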
D) Documentation Reviews
14. User Documentation Review
- Can a new user install + run end-to-end from README alone?
- Missing prerequisites? (Kanboard plugin assumptions like MetaMagik)
- Clear “happy path” + troubleshooting sections?
15. Developer Documentation Review
- Architecture diagram current?
- Where to add a new provider? a new agent? a new phase?
- Prompt-pack strategy (even if not implemented yet) clearly anticipated?
16. Runbook / Ops Docs Review
- What to do when: provider down, Kanboard rejects, spawner dupe risk, health fails
- How to read traces & monitor outputs
- How to safely retry / resume
E) Product / UX Reviews (Lightweight)
17. Workflow UX Review
- Kanban column naming: stable? documented?
- Does the system behave predictably when humans move cards “wrong”?
- Is feedback surfaced back into cards in a usable way?
18. Demo Story Review
- Is there a canonical “showcase epic” that exercises every stage?
- Are there screenshots / commands / expected outputs?
Codex Execution Bundles
For running these as separate review tasks in Codex:
| Bundle | Reviews | Focus |
|---|---|---|
| Bundle 1 | Architecture + Governance + Idempotency | Structural integrity |
| Bundle 2 | Observability + Config/Deployability + Docs | Operational clarity |
| Bundle 3 | Secrets + Threat Model + Injection Surfaces | Security posture |
| Bundle 4 | Tests + Performance/Scaling | Quality & resilience |