AI & Agents · Functions

Capabilities

Prompt engineering with evals.

Ground-truth eval sets authored before the agent ships. Iterate prompts and retrieval grounding against them, not vibes.

Retrieval grounding (RAG).

Schema retrieval, doc retrieval, and verified-dataset grounding to keep generations accurate and auditable inside an enterprise BI surface.

Multi-step orchestration.

Reasoning-plan generation, multi-step query execution, self-recovery from errors, structured output back into Lark Docs and webpages.

Self-improving flywheels.

Nightly batch testing, issue tagging, automated prompt refinement. Manual eval shrinks from days to hours; quality drifts up, not down.

Agent UX inside existing surfaces.

Agents live inside the BI tool sales reps already use — adoption follows the existing workflow rather than asking users to migrate.

Solo-built consumer agents.

Same toolkit applied to public products — Zentrading's multi-step reasoning over financial data, paid per analysis.

Evidence

Text-to-SQL Production Agent.

Authored a 100+ question ground-truth eval set, iterated prompts and grounding to lift accuracy from ~30% to ~90%. Self-improving flywheel — nightly batch testing, issue tagging, prompt refinement — cut manual eval time by ~85% and drove a further ~70% post-launch error reduction.

In-Depth Analytics Agent.

Native to ByteDance's internal AI-powered analytics platform. Generates a reasoning plan, runs multi-step queries across selected datasets, self-recovers from query errors, outputs polished Lark Docs and webpages.

Reporting Agents.

Daily and weekly reporting agents delivering segmented progress-to-plan digests to sales leadership across TikTok's ad-revenue business lines each morning.

Zentrading (solo).

Retail equity research agent — multi-step reasoning over financial data, paid per analysis. zentrading.space.

Masterzen + lineup (solo).

Seven AI products shipped in ~60 days using Claude Code as a build partner — Next.js + Supabase + Vercel stacks, Python agent layers where useful.

At scale

~30% → ~90% production accuracy via self-improving eval flywheel
~85% reduction in manual eval time post-flywheel
100+ question ground-truth eval set as the contract for quality
Multi-step analytics + reporting agents in production for a global ad-revenue org

Stack

Claude API · RAG retrieval · LLM orchestration · Python · SQL grounding · ground-truth evals · Next.js · Supabase · Vercel · Claude Code

Also shows up in

← All functions The products →