A recurring AI automation does not become trustworthy because it worked once. The first clean output proves that the workflow can run under one condition. It does not prove that the next file, client, reporting period, or source change will be safe to send without review.
A QA sampling plan gives the operator a simple rule for how much to inspect. It keeps review effort focused where the risk is highest: new workflows, changed inputs, public claims, money-related decisions, client deliverables, and outputs that the operator has started editing by hand.
Choose The Review Mode
Pick one of three review modes for each workflow. Do not start with random spot checks on a new workflow.
| Review mode | When to use it | What to inspect |
|---|---|---|
| Full review | First live runs, changed inputs, new client, public output, high-risk decision. | Every output, source file, claim, and final action. |
| Targeted review | Stable workflow with one changed field, prompt, template, or source. | The changed area plus a few known failure cases. |
| Spot check | Mature workflow with stable inputs and no recent incidents. | A small set of outputs plus stop-condition checks. |
The mode should move back to full review whenever the input contract changes, an incident occurs, or the output starts needing repeated manual correction.
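The table and the fallback rule above can be sketched as a small decision function. This is a minimal sketch, not a definitive implementation: `WorkflowState`, its field names, and the returned mode strings are hypothetical, and it assumes an isolated change means one changed field, prompt, template, or source in an otherwise stable workflow.

```python
from dataclasses import dataclass


@dataclass
class WorkflowState:
    """Hypothetical snapshot of one workflow; all field names are illustrative."""
    first_live_runs: bool = False
    input_contract_changed: bool = False
    new_client: bool = False
    public_or_high_risk: bool = False
    recent_incident: bool = False
    repeated_manual_correction: bool = False
    isolated_change: bool = False  # one changed field, prompt, template, or source


def choose_review_mode(state: WorkflowState) -> str:
    """Return "full", "targeted", or "spot_check" for the given state."""
    full_review_reasons = (
        state.first_live_runs,
        state.input_contract_changed,
        state.new_client,
        state.public_or_high_risk,
        state.recent_incident,
        state.repeated_manual_correction,
    )
    # Any full-review reason wins, even if the workflow is otherwise stable.
    if any(full_review_reasons):
        return "full"
    if state.isolated_change:
        return "targeted"
    return "spot_check"


# An incident overrides an otherwise small change.
print(choose_review_mode(WorkflowState(isolated_change=True, recent_incident=True)))
```

Checking the full-review reasons first means the stricter mode always wins when two conditions apply at once.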
Start With A Baseline Batch
Before reducing review, run a baseline batch. For a small solo-operator workflow, the batch does not need formal statistics. It needs enough variety to show the workflow can handle normal, messy, and stop-condition inputs.
Include:
- One clean input that should pass.
- One input with a missing optional field.
- One input with a missing required field.
- One input with an unusual but valid value.
- One input that should trigger human review.
- One input that should stop the workflow.
Keep the source input, generated output, review notes, and final decision together. This becomes the reference set for future prompt, script, or handoff changes.
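One way to keep the reference set together is a small record per baseline example, plus a check that all six input types are covered. This is a sketch under assumptions: the `BaselineCase` fields mirror the four items named above, while the kind labels and the file names in the usage lines are hypothetical.

```python
from dataclasses import dataclass

# One label per bullet in the baseline list; the label strings are assumptions.
REQUIRED_KINDS = (
    "clean",
    "missing_optional_field",
    "missing_required_field",
    "unusual_valid_value",
    "needs_human_review",
    "should_stop",
)


@dataclass
class BaselineCase:
    kind: str              # one of REQUIRED_KINDS
    source_input: str      # path or identifier of the stored input
    generated_output: str  # path or identifier of the stored output
    review_notes: str
    final_decision: str


def missing_kinds(cases: list[BaselineCase]) -> list[str]:
    """List the baseline input types that have no stored example yet."""
    present = {case.kind for case in cases}
    return [kind for kind in REQUIRED_KINDS if kind not in present]


# Hypothetical batch with only the clean example stored so far.
batch = [
    BaselineCase("clean", "march_export.csv", "march_report.md",
                 "No edits needed.", "approved"),
]
print(missing_kinds(batch))
```

An empty result from `missing_kinds` is a reasonable signal that the batch has enough variety to serve as the reference set.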
Define The Sample Rule
Write the sample rule in plain language before the workflow runs again.
Workflow:
Owner:
Review mode:
Baseline examples stored:
First-run sample rule:
Stable-run sample rule:
Change-trigger sample rule:
Stop conditions:
Escalation owner:
Last review date:
Next review date:
Example rule:
Review every output for the first two production runs.
After two clean runs, review every fifth output and every output with missing fields.
Return to full review after any source schema change, unsupported claim, private-data exposure, or repeated manual correction.
This kind of rule is easier to follow than a vague instruction like “check quality occasionally.”
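The example rule above is specific enough to express as a function. The sketch below is one possible reading of it: the function name, parameter names, and trigger labels are hypothetical, and it assumes outputs are numbered from 1 within a run so that "every fifth output" means outputs 5, 10, 15, and so on.

```python
from typing import Iterable

# Triggers from the example rule that force a return to full review.
FULL_REVIEW_TRIGGERS = frozenset({
    "source_schema_change",
    "unsupported_claim",
    "private_data_exposure",
    "repeated_manual_correction",
})


def should_review(
    clean_runs_completed: int,
    output_index: int,
    has_missing_fields: bool,
    active_triggers: Iterable[str] = (),
) -> bool:
    """Decide whether one output needs human review under the example rule."""
    # Any active trigger returns the workflow to full review.
    if FULL_REVIEW_TRIGGERS & set(active_triggers):
        return True
    # Review every output until two clean production runs are on record.
    if clean_runs_completed < 2:
        return True
    # After that, always review outputs with missing fields.
    if has_missing_fields:
        return True
    # Otherwise review every fifth output (1-based index).
    return output_index % 5 == 0


print(should_review(clean_runs_completed=2, output_index=5, has_missing_fields=False))
```

If the rule cannot be written this plainly, it is probably still too vague to follow by hand.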
Track What Failed
Sampling only helps if failed checks become better operating rules. Use a short failure log:
| Failure | What it means | Next action |
|---|---|---|
| Source mismatch | The output used the wrong file, period, URL, or field. | Update the input contract and rerun full review. |
| Unsupported claim | The output made a statement that the source did not prove. | Add source requirements or remove the claim. |
| Private-data exposure | The input or output included data that should not be used. | Pause and redesign the access boundary. |
| Repeated manual edit | The operator keeps fixing the same section. | Update the prompt, script, template, or handoff. |
| Action risk | The workflow is about to send, publish, overwrite, or spend. | Require human review before action. |
If the same failure appears twice, do not treat it as a sampling miss. Treat it as a workflow design issue.
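The "appears twice" rule is easy to automate once failures are logged with consistent labels. The following is a minimal sketch, assuming the log is a plain list of failure labels; the label strings and the `design_issues` helper are hypothetical, and the next actions are copied from the table above.

```python
from collections import Counter

# Next actions taken from the failure log table; the keys are assumed labels.
NEXT_ACTION = {
    "source_mismatch": "Update the input contract and rerun full review.",
    "unsupported_claim": "Add source requirements or remove the claim.",
    "private_data_exposure": "Pause and redesign the access boundary.",
    "repeated_manual_edit": "Update the prompt, script, template, or handoff.",
    "action_risk": "Require human review before action.",
}


def design_issues(failure_log: list[str]) -> list[str]:
    """Return failure labels that appear at least twice, sorted by name."""
    counts = Counter(failure_log)
    return sorted(kind for kind, count in counts.items() if count >= 2)


# Hypothetical log from a few reviewed runs.
log = ["source_mismatch", "unsupported_claim", "source_mismatch"]
for kind in design_issues(log):
    action = NEXT_ACTION.get(kind, "No action recorded for this label.")
    print(f"{kind}: workflow design issue. {action}")
```

Counting by label only works if the operator reuses the same label for the same kind of failure, so a short fixed vocabulary is worth keeping.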
Lower Review Only When Evidence Supports It
Move from full review to targeted review when:
- The baseline batch is stored.
- The first live runs match the acceptance criteria.
- Known stop conditions halt the workflow instead of producing output.
- Manual edits are rare and explained.
- Source evidence is traceable.
- A rollback or manual fallback exists.
Move from targeted review to spot checks only after the workflow stays stable through normal variation. A private formatting helper can reach spot checks sooner. A public article draft, client report, affiliate recommendation, billing workflow, or customer message should stay on stricter review for longer.
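The six conditions for leaving full review can be treated as a gate that must pass in full. This is a sketch only: the criterion names are hypothetical labels for the bullets above, and the evidence dictionary is assumed to be filled in by the operator rather than computed automatically.

```python
# One label per condition for moving from full review to targeted review.
CRITERIA = (
    "baseline_batch_stored",
    "first_live_runs_match_acceptance_criteria",
    "stop_conditions_halt_without_output",
    "manual_edits_rare_and_explained",
    "source_evidence_traceable",
    "rollback_or_manual_fallback_exists",
)


def unmet_criteria(evidence: dict[str, bool]) -> list[str]:
    """Return the conditions that are missing or not yet true."""
    return [name for name in CRITERIA if not evidence.get(name, False)]


def can_move_to_targeted(evidence: dict[str, bool]) -> bool:
    """Allow reduced review only when every condition is met."""
    return not unmet_criteria(evidence)


# Hypothetical evidence: everything is in place except a fallback.
evidence = {name: True for name in CRITERIA}
evidence["rollback_or_manual_fallback_exists"] = False
print(can_move_to_targeted(evidence))
print(unmet_criteria(evidence))
```

Treating a missing entry as unmet keeps the default on the strict side: review is lowered only when the evidence is recorded.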
Copy This QA Sampling Checklist
Use this before marking a workflow ready for reduced review:
- Acceptance criteria are written.
- Baseline examples include clean, messy, and stop-condition inputs.
- Source evidence is stored with reviewed outputs.
- Sample rule is written in plain language.
- Review mode is named.
- Stop conditions are written.
- Repeated manual edits have an owner and fix path.
- Public, client-facing, money-related, or irreversible actions require review.
- The fallback process is documented.
- Next review date is set.
The sampling plan should make the workflow easier to operate, not easier to ignore. If the operator cannot explain why a reduced sample is acceptable, keep full review in place.
Related Operator Stack Pages
- Define pass/fail rules first with the AI automation acceptance criteria checklist.
- Keep approval proof in the AI automation evidence packet template.
- Rerun baseline examples with the AI automation regression test checklist after prompt or source changes.
- Watch for drift with the AI automation monitoring checklist.
- Pause risky runs with the AI automation human review threshold checklist.
- Keep source evidence reviewable with the AI workflow source log template.
- Apply the same discipline to reports with the AI spreadsheet report QA checklist.
- Record repeated failures in the AI automation exception log template.