Scroll finance TikTok or jump into a Facebook trading group and you’ll see it everywhere: EAs, bots, “AI” promising set-and-forget profits. It’s exciting… and a little sketchy. This series cuts through the noise and shows you a simple, no-code way to build your own Expert Advisors (EAs) using three public large language models (LLMs)—ChatGPT, Grok, and DeepSeek. You’ll describe your idea in plain English, plug it into a repeatable template, and have a model draft code you can actually test. You don’t need to be a coder. You do need a clear idea and a calm process. We’ll bring both.
Why this works now
The tooling finally caught up. Modern LLMs plus code helpers can take a one-page plan and spit out a reasonable first pass in Pine Script, MQL, or Python. They handle longer, structured instructions and return readable, commented code. No heavy installs, no servers—open your browser, paste your spec, iterate. The “secret” isn’t arcane syntax; it’s clarity. The clearer your description, the better the draft.
The template (aka how to talk to models)
Don’t think like a programmer; think like a coach writing a play. You define the behavior. The model translates it into code. Use this simple, repeatable template:
- Mission (one sentence): what the strategy tries to do.
- Inputs: the knobs (lookbacks, thresholds, size, max daily loss).
- Entry rules: short, numbered lines—no essays.
- Exit rules: profit target, stop logic, trailing behavior.
- Risk rules: daily shutdowns, position limits, per-trade risk.
- Test settings: commissions, slippage, session times, date range, symbols.
LLMs are pattern matchers. Give them a clean pattern, get consistent code. Garbage in, garbage out still applies.
Mini example you can steal:
Mission: Trade morning momentum on 5-minute bars with strict risk limits.
Inputs: lookback_high=12, lookback_low=12, risk_per_trade=0.5%, max_daily_drawdown=2%.
Entry rules:
- Long on a break above the first hour’s high if spread ≤ threshold.
- Short on a break below the first hour’s low if spread ≤ threshold.
Exit rules:
- Initial stop = 1× ATR(14); trail at 1× ATR after price moves 1× ATR in favor.
- Profit target = 2× ATR; whichever hits first exits.
Risk rules:
- Daily shutdown at −2% or +3% realized P&L.
- Only one position; no pyramiding.
Test settings:
- Commission $X/side; slippage Y ticks; session 09:30–16:00;
- Range 2022-01-01 → 2024-12-31; symbols ES, NQ.
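To make the handoff concrete, here's a trimmed sketch of the kind of first draft a model might hand back for this spec. Treat it as illustrative, not as any model's actual output: the pandas bar layout (5-minute OHLC plus a spread column) and the helper names are our assumptions.

```python
# Illustrative sketch only: assumes a pandas DataFrame of 5-minute OHLC bars
# (columns: open, high, low, close, spread) indexed by timestamp, one session
# per call. Helper names are ours, not from any model.
import pandas as pd

def atr(bars: pd.DataFrame, period: int = 14) -> pd.Series:
    """Average True Range: the stop/target unit from the spec."""
    prev_close = bars["close"].shift(1)
    true_range = pd.concat(
        [
            bars["high"] - bars["low"],
            (bars["high"] - prev_close).abs(),
            (bars["low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)
    return true_range.rolling(period).mean()

def first_hour_range(day_bars: pd.DataFrame) -> tuple[float, float]:
    """High/low of the first hour: twelve 5-minute bars, per the inputs."""
    first_hour = day_bars.iloc[:12]
    return first_hour["high"].max(), first_hour["low"].min()

def signals_for_day(day_bars: pd.DataFrame, max_spread: float) -> list[dict]:
    """Long above the first-hour high, short below the low, spread permitting."""
    range_high, range_low = first_hour_range(day_bars)
    signals = []
    for ts, bar in day_bars.iloc[12:].iterrows():
        if bar["spread"] > max_spread:
            continue  # spread filter blocks the trigger on this bar
        if bar["high"] > range_high:
            signals.append({"time": ts, "side": "long", "trigger": range_high})
            break  # one position, no pyramiding
        if bar["low"] < range_low:
            signals.append({"time": ts, "side": "short", "trigger": range_low})
            break
    return signals
```

Notice what's still undefined here: does "break above" mean the high ticking through, or the close settling above? That kind of ambiguity is exactly what the next step is for.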
The two-model workflow (your sanity saver)
We start with Grok as our default code generator. Two things we always ask it to do before the code dump:
- Restate the rules in its own words. This catches misunderstandings early.
- Print an “Assumptions” header at the top of the script: costs, slippage, session times, order types, data window.
That little restatement step prevents big headaches. If Grok reads “break of the first hour’s high” as “close above the high,” we spot it instantly, tweak the spec, and regenerate. Clear spec → clean code.
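The header itself can be a few labeled lines printed before anything else runs. A minimal sketch, keeping the spec's placeholders visible on purpose:

```python
# A minimal Assumptions header, assuming the script is Python. The exact
# fields matter more than the formatting; $X and Y stay placeholders until
# you fill in your real costs.
print("=== Assumptions ===")
print("costs: $X/side commission, Y ticks slippage")
print("session: 09:30-16:00 exchange time, edges inclusive")
print("trigger: intrabar tick-through of the first-hour high/low")
print("orders: market on trigger; stops evaluated intrabar")
print("data: 5-minute bars, 2022-01-01 to 2024-12-31, ES and NQ")
```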
Once we’ve got a draft, we don’t start poking random lines. We build a manual model first.
The manual model (ground truth, not vibes)
This is a tiny, human ledger of what should happen over a short, fixed date range. It includes:
- Exact bars for entries/exits,
- Direction and size,
- How the stop/target evolves,
- A rough P&L curve using the same costs and slippage you’ll use in backtests.
Spreadsheet, notes on a chart—whatever. The point is to have something objective to compare against. Then run Grok’s script on the same symbols and dates and line up the key bits: entry time/price/size, exit condition/price, running drawdown. If it matches your manual model, great—time to tune and paper trade. If it doesn’t, we get a second opinion.
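Before calling in that second opinion, it helps to make "line up the key bits" mechanical. If both ledgers live in CSVs, the comparison is a few lines of pandas. A minimal sketch, assuming hypothetical file names and a shared entry_time/entry_price column layout:

```python
# Illustrative comparison of the manual ledger vs. the generated script's
# trade log. File names and column names are assumptions for this sketch.
import pandas as pd

manual = pd.read_csv("manual_model.csv", parse_dates=["entry_time"])
script = pd.read_csv("grok_trades.csv", parse_dates=["entry_time"])

merged = manual.merge(
    script, on="entry_time", how="outer",
    suffixes=("_manual", "_script"), indicator=True,
)

# Trades that only one side produced are the first thing to investigate.
print(merged.loc[merged["_merge"] != "both", ["entry_time", "_merge"]])

# For shared trades, flag entry prices that disagree beyond rounding.
both = merged[merged["_merge"] == "both"]
gap = (both["entry_price_manual"] - both["entry_price_script"]).abs()
print(both.loc[gap > 0.01, ["entry_time"]])
```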
Bring in ChatGPT (the friendly skeptic)
If results don’t line up, it’s usually a communication issue, not a “the code is cursed” situation. We paste the exact same spec into ChatGPT and ask for the same ritual:
- Restated rules,
- Fresh implementation,
- A short trade-by-trade log,
- A simple “decision trace” printed bar by bar (e.g., “range armed,” “spread too high—skip,” “ATR trail updated”).
Nine times out of ten, the mismatch is a tiny assumption: bar-close vs intrabar triggers, inclusive vs exclusive session edges, intrabar stops vs on-close stops, or a spread filter quietly nuking signals. With both versions in front of you, pick the one that matches the manual model best. That’s your base.
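The decision trace is worth keeping even after the mismatch is solved, because it turns two implementations into two comparable text files: the first line where they disagree is your bug. A minimal sketch of what each bar's line might look like, reusing the earlier bar layout (the `state` dict is a hypothetical container for the armed range):

```python
# Illustrative bar-by-bar trace. The wording of each line is arbitrary;
# its consistency across runs is what makes diffs useful.
def trace_bar(ts, bar, state: dict, max_spread: float) -> str:
    """One plain-English line explaining what the strategy did on this bar."""
    if not state["range_armed"]:
        return f"{ts} first hour still forming, range not armed"
    if bar["spread"] > max_spread:
        return f"{ts} spread {bar['spread']:.2f} > {max_spread}, skip"
    if bar["high"] > state["range_high"]:
        return f"{ts} break above {state['range_high']}, long entry"
    if bar["low"] < state["range_low"]:
        return f"{ts} break below {state['range_low']}, short entry"
    return f"{ts} inside range, no action"
```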
Where does DeepSeek fit? Use it as a third angle when you want long-context analysis, extra-verbose diagnostics, or a different coding style. It's also great for rewriting comments to be crystal clear, or for reviewing the other models' code to verify that Grok's or ChatGPT's understanding aligns with your vision.
Clean it up and lock it down
Once you’ve chosen the version that matches reality, have the model tidy the code:
- Sharper variable names, consistent comments,
- Remove dead branches and unused inputs,
- Print a Settings block at runtime so every backtest is labeled.
Reproducibility is everything. If you can’t recreate yesterday’s curve today, you can’t trust “improvements” tomorrow. A locked setup also makes collaboration easier—everyone sees the same assumptions in one place.
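One way to make that concrete: keep every knob in a single dict, print it at startup, and derive a short run tag from its contents so each equity curve is labeled with the exact settings that produced it. A minimal sketch, with placeholder values you'd replace with your own:

```python
# Illustrative Settings block. The values are placeholders; the hash tag is
# just a convenient label tying a backtest report to its exact settings.
import hashlib
import json

SETTINGS = {
    "symbols": ["ES", "NQ"],
    "session": "09:30-16:00",
    "commission_per_side": None,  # fill in your real costs
    "slippage_ticks": None,       # fill in your real slippage
    "risk_per_trade": 0.005,
    "max_daily_drawdown": 0.02,
}

def print_settings(settings: dict) -> str:
    """Print every knob and return a short tag naming this run."""
    blob = json.dumps(settings, sort_keys=True)
    tag = hashlib.sha1(blob.encode()).hexdigest()[:8]
    print(f"=== Settings (run {tag}) ===")
    for key, value in sorted(settings.items()):
        print(f"{key} = {value}")
    return tag
```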
From backtest to live (gently)
When the code mirrors the manual model, move to forward testing on demo for a few days. You’re looking for drift:
- Fills: are live fills consistently worse than your assumptions?
- Timing: are orders sent when you think they are?
- Filters: are spread/volatility filters silently blocking trades?
Use the same two-model trick: have Grok and ChatGPT each print a one-line reason for every live trade and a note for every skipped signal. When the “why” is visible (“spread > threshold,” “session closed,” “no range break”), the fix is usually straightforward.
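Fill drift is the easiest of the three to quantify. A minimal sketch, assuming a hypothetical demo log that records the price you expected and the price you got:

```python
# Illustrative fill-drift check. The CSV layout (expected_price, fill_price,
# side) is an assumption for this sketch.
import pandas as pd

fills = pd.read_csv("demo_fills.csv")
signed = fills["fill_price"] - fills["expected_price"]
# For longs, filling higher than expected is adverse; for shorts, lower is.
slippage = signed.where(fills["side"] == "long", -signed)

print(f"mean slippage: {slippage.mean():.4f}  worst: {slippage.max():.4f}")
print(f"fills worse than assumed: {(slippage > 0).mean():.0%}")
```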
Speed bumps to avoid (you’ll thank yourself later)
- Vague triggers. “Breaks above” = tick-through or close-above? Say it.
- Session edges. Timezone, open/close, inclusive/exclusive. Spell it out.
- Intrabar stops. Do they trigger mid-bar or only at close? Decide.
- Costs/slippage. Use realistic numbers from day one; print them in the header.
- Data leakage. Don't peek at future bars (classic off-by-one traps; see the sketch after this list).
- Parameter bloat. Start lean. Add knobs only when they answer a real question.
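That off-by-one trap deserves a picture. A minimal sketch reusing the earlier bar layout: the signal uses bar t's close, so the earliest honest fill is bar t+1's open.

```python
# Illustrative off-by-one leak and its fix, assuming the earlier DataFrame of
# OHLC bars. Both versions backtest without errors; only one is honest.
import pandas as pd

def entry_prices(bars: pd.DataFrame, lookback: int = 12) -> pd.Series:
    prior_high = bars["high"].rolling(lookback).max().shift(1)
    signal = bars["close"] > prior_high  # known only once bar t closes

    # LEAK: fills at bar t's own open, before the signal could exist.
    leaky = bars["open"].where(signal)
    # FIX: fill at bar t+1's open, the first price available after the close.
    honest = bars["open"].shift(-1).where(signal)
    return honest
```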
Pro tip: ask the model to explain a losing day in plain English before it writes code. If the explanation sounds hand-wavy, your rules are still fuzzy.
What we’ll build next
Next up: a small EA that looks for hammer and shooting-star candlestick patterns and trades them. We’ll write a tight plain-English spec with the template above, generate code with Grok, validate against a short manual model, and run a basic backtest with explicit costs/slippage and a fixed date range. We’ll also add a simple decision trace so you can watch—bar by bar—why a setup was taken or skipped. By the end, you’ll have a tiny, reproducible project you can extend with volatility filters, session rules, or risk “circuit breakers” that pause trading after a rough patch.
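As a teaser, the core of that pattern check fits in a few lines. A minimal sketch with illustrative thresholds; the plain-English spec next time will pin these down:

```python
# Hypothetical hammer test: small body near the top of the bar with a long
# lower wick. The 2x and 0.5x ratios are placeholder choices, not the spec.
def is_hammer(op: float, hi: float, lo: float, cl: float) -> bool:
    body = abs(cl - op)
    lower_wick = min(op, cl) - lo
    upper_wick = hi - max(op, cl)
    if body == 0:
        return False  # doji: let the full spec decide
    return lower_wick >= 2 * body and upper_wick <= 0.5 * body
```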
The short version
You set the rules. The models do the typing. The manual model keeps everyone honest. With a clear template, a two-model cross-check, and a bias for reproducibility, you can go from idea to a trustworthy first draft—no coding background required. Then a few days of forward testing and readable logs tighten the gap between backtest and reality. That’s the rhythm we’ll use throughout the series: clarify → generate → verify → tidy → test. Keep assumptions visible. Let data—not vibes—decide the next edit.