

AI vs. Manual Code Review: Implementing the Two-Pass Engineering Workflow


This article is an AI-generated summary. Please check the original source for full details.

AI Code Review vs Manual Review - When to Use Each (2026)

Modern engineering teams face a critical bottleneck: human review attention is finite, and it does not scale with the volume of code changes teams produce. Research from Google and Microsoft shows that developers spend up to 12 hours a week on reviews and that pull requests often wait up to 48 hours for initial human feedback. In 2025, 73% of developers cited these wait times as the single biggest friction point in the development lifecycle.

Why This Matters

Technical debt and merge conflicts compound when manual reviews stall, yet AI alone lacks organizational context and architectural foresight. The ideal model is a two-pass system: AI handles mechanical validation (such as null safety and security patterns), freeing senior engineers to focus on high-level design and domain requirements. Getting this balance wrong leads either to crippling delivery delays or to brittle, poorly architected systems.
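The two-pass idea can be sketched as a simple routing step: automated checks run first, and only a PR with no blocking mechanical findings moves on to a human. This is a minimal Python sketch under stated assumptions: `Finding`, `PullRequest`, `first_pass`, `route` and the `unsafe_yaml_check` rule are hypothetical names, and a real pipeline would call an AI reviewer or a tool such as Semgrep instead of a substring check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    rule: str
    message: str
    blocking: bool  # True if the author must fix this before a human reviews


@dataclass
class PullRequest:
    title: str
    diff: str


# An automated check (AI or rule-based) maps a PR to a list of findings.
Check = Callable[[PullRequest], list[Finding]]


def unsafe_yaml_check(pr: PullRequest) -> list[Finding]:
    """Hypothetical mechanical check: flag yaml.load, suggest yaml.safe_load."""
    if "yaml.load(" in pr.diff:
        return [Finding("unsafe-yaml", "Use yaml.safe_load instead of yaml.load", True)]
    return []


def first_pass(pr: PullRequest, checks: list[Check]) -> list[Finding]:
    """Pass 1: run every automated check and collect all findings."""
    findings: list[Finding] = []
    for check in checks:
        findings.extend(check(pr))
    return findings


def route(pr: PullRequest, checks: list[Check]) -> str:
    """Send the PR back to its author or on to a human reviewer."""
    if any(f.blocking for f in first_pass(pr, checks)):
        return "changes-requested"  # mechanical issues are fixed before pass 2
    return "human-review"  # pass 2: design and domain review by an engineer


print(route(PullRequest("Load config", "cfg = yaml.load(fh)"), [unsafe_yaml_check]))
# changes-requested
print(route(PullRequest("Load config", "cfg = yaml.safe_load(fh)"), [unsafe_yaml_check]))
# human-review
```

Keeping the checks as a plain list of callables means an AI reviewer and a rule-based scanner can sit side by side in pass 1 without changing the routing logic.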

Key Insights

  • Google internal research (2025) reports that developers spend 6 to 12 hours per week reviewing pull requests, roughly 15% to 30% of a 40-hour work week.
  • Microsoft studies reveal pull requests wait an average of 24 to 48 hours for a first human response, significantly stalling deployment velocity.
  • LLM-based tools like CodeRabbit and GitHub Copilot perform semantic reasoning to identify logic errors such as overwritten discount conditions or missing null checks.
  • Rule-based static analysis via tools like Semgrep provides deterministic detection of security vulnerabilities like open redirects or unsafe YAML loading.
  • SmartBear research indicates human review effectiveness drops significantly after 400 lines of code, making AI essential for maintaining consistency in large diffs.
  • A two-pass workflow (AI for mechanical issues, humans for architecture) typically reduces total review cycle time by 30% to 50%.
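The 400-line figure suggests a simple pre-review gate. The Python sketch below counts changed lines in a unified diff and flags pull requests above that size; the function names and the choice to treat 400 as a hard cutoff are assumptions for illustration, not part of the cited research.

```python
REVIEW_LINE_LIMIT = 400  # threshold taken from the SmartBear figure above


def changed_lines(unified_diff: str) -> int:
    """Count added and removed lines in a unified diff.

    Simplification: any line starting with '---' or '+++' is treated as a
    file header, so a removed line whose text itself begins with '--'
    (or an added line beginning with '++') is not counted.
    """
    count = 0
    for line in unified_diff.splitlines():
        if line.startswith(("---", "+++")):
            continue  # file headers such as '--- a/app.py'
        if line.startswith(("+", "-")):
            count += 1
    return count


def needs_split(unified_diff: str, limit: int = REVIEW_LINE_LIMIT) -> bool:
    """True if the diff exceeds the size a human can review effectively."""
    return changed_lines(unified_diff) > limit


diff = """--- a/app.py
+++ b/app.py
@@ -1,3 +1,3 @@
 unchanged
-old_value = 1
+new_value = 2
"""
print(changed_lines(diff))  # 2
print(needs_split(diff))    # False
```

A CI job could run a check like this before the AI pass and ask the author to split oversized changes, since that is exactly where human attention degrades.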

Working Examples

Example of code with two semantic errors: database results are used without null checks, and the referral discount silently overwrites the premium discount.

async function processPayment(orderId: string, amount: number) {
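  const order = await db.orders.findById(orderId);
  // Bug: `order` may be null, but it is dereferenced on the next line
  const user = await db.users.findById(order.userId);
  // Bug: `user` is never null-checked before `user.tier` is read
  let finalAmount = amount;
  if (user.tier === 'premium') {
    finalAmount = amount * 0.85;
  }
  if (user.referralCount > 5) {
    // Bug: recomputes from `amount`, overwriting the premium discount
    finalAmount = amount * 0.90;
  }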
  const charge = await paymentGateway.charge(user.paymentMethod, finalAmount);
  await db.orders.update(orderId, { status: 'paid', chargeId: charge.id });
  return { success: true, chargeId: charge.id };
}
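One way to fix both bugs is sketched below in Python rather than TypeScript so that it runs standalone. The in-memory `orders`/`users` dictionaries and the `charge` callback are hypothetical stand-ins for `db` and `paymentGateway`, and the sketch assumes the two discounts are meant to stack; if only the larger discount should apply, taking the smaller of the two discounted amounts would be the fix instead.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal
from typing import Callable

PREMIUM_RATE = Decimal("0.85")   # 15% premium discount, from the original example
REFERRAL_RATE = Decimal("0.90")  # 10% referral discount, from the original example


@dataclass
class User:
    tier: str
    referral_count: int
    payment_method: str


def final_amount(amount: Decimal, user: User) -> Decimal:
    """Apply discounts cumulatively (assumption: they are meant to stack)."""
    result = amount
    if user.tier == "premium":
        result *= PREMIUM_RATE
    if user.referral_count > 5:
        result *= REFERRAL_RATE  # builds on `result` instead of overwriting it
    return result.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def process_payment(
    order_id: str,
    amount: Decimal,
    orders: dict[str, dict],
    users: dict[str, User],
    charge: Callable[[str, Decimal], str],
) -> dict:
    order = orders.get(order_id)
    if order is None:  # null check the original omitted
        return {"success": False, "error": "order not found"}
    user = users.get(order["user_id"])
    if user is None:  # second missing null check
        return {"success": False, "error": "user not found"}
    charge_id = charge(user.payment_method, final_amount(amount, user))
    order.update(status="paid", charge_id=charge_id)
    return {"success": True, "charge_id": charge_id}


users = {"u1": User(tier="premium", referral_count=6, payment_method="card_123")}
orders = {"o1": {"user_id": "u1"}}
print(final_amount(Decimal("100"), users["u1"]))  # 76.50
print(process_payment("o2", Decimal("100"), orders, users, lambda pm, amt: "ch_1"))
# {'success': False, 'error': 'order not found'}
```

Using `Decimal` instead of floats avoids binary rounding errors in currency arithmetic, and returning an explicit failure result keeps the missing-record paths visible to reviewers.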

A Semgrep rule used for deterministic detection of missing null checks in database results.

rules:
- id: unchecked-db-result
  patterns:
  - pattern: |
      $RESULT = await $DB.$METHOD(...);
      ...
      $RESULT.$FIELD
  - pattern-not: |
      $RESULT = await $DB.$METHOD(...);
      ...
      if ($RESULT) { ... }
  message: "Database result used without null check"
  severity: WARNING
  languages: [typescript, javascript]
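Semgrep matches rules against a parsed syntax tree, which is what makes its findings deterministic and repeatable. As a rough illustration of the same kind of check only, the Python sketch below uses regular expressions to flag `const x = await db....` results that are dereferenced before any `if (x)`, `if (!x)` or `x === null` style guard. It is a hypothetical, line-based toy: it ignores scopes, comments and multi-line statements, and it assumes the `db.` naming from the TypeScript example.

```python
import re

# Matches e.g. `const order = await db.orders.findById(orderId);`
ASSIGN = re.compile(r"\bconst\s+(\w+)\s*=\s*await\s+db\.[\w.]+\(")


def unchecked_db_results(source: str) -> list[str]:
    """Return names of awaited db results dereferenced before a null check."""
    lines = source.splitlines()
    flagged: list[str] = []
    for i, line in enumerate(lines):
        match = ASSIGN.search(line)
        if not match:
            continue
        name = re.escape(match.group(1))
        # A guard is `if (x)`, `if (!x)`, or a comparison with null/undefined.
        guard = re.compile(
            rf"\bif\s*\(\s*!?\s*{name}\s*(\)|[!=]==?\s*(null|undefined)\b)"
        )
        use = re.compile(rf"\b{name}\.")  # property access such as `x.field`
        for later in lines[i + 1:]:
            if guard.search(later):
                break  # checked before any use
            if use.search(later):
                flagged.append(match.group(1))
                break
    return flagged


snippet = """const order = await db.orders.findById(orderId);
const user = await db.users.findById(order.userId);
if (user.tier === 'premium') {"""
print(unchecked_db_results(snippet))  # ['order', 'user']
```

A production rule belongs in a tool like Semgrep that understands the code's structure; this toy only shows the deterministic, same-input-same-output behavior such a rule encodes.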

A performance anti-pattern where an N+1 query is hidden within clean, readable Python code.

def get_team_activity(team_id: str) -> list[ActivityItem]:
    team = db.get_team(team_id)
    members = db.get_team_members(team_id)
    activity = []
    for member in members:
        user_activity = db.get_user_activity(member.user_id)  # N+1 query
        for item in user_activity:
            if item.created_at > team.last_review_date:
                activity.append(item)
    return sorted(activity, key=lambda x: x.created_at, reverse=True)
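A common fix is to replace the per-member query with a single batched query. The sketch below assumes a hypothetical `get_activity_for_users` method (for example, backed by a SQL `WHERE user_id IN (...)` clause), uses `since` in place of `team.last_review_date`, and runs against an in-memory `FakeDB` that counts round trips, so the difference between the two versions is observable without a real database.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ActivityItem:
    user_id: str
    created_at: datetime


class FakeDB:
    """Hypothetical in-memory store that counts simulated database round trips."""

    def __init__(self, activity: list[ActivityItem]):
        self.activity = activity
        self.queries = 0

    def get_user_activity(self, user_id: str) -> list[ActivityItem]:
        self.queries += 1
        return [a for a in self.activity if a.user_id == user_id]

    def get_activity_for_users(
        self, user_ids: list[str], since: datetime
    ) -> list[ActivityItem]:
        # One round trip, like `WHERE user_id IN (...) AND created_at > since`.
        self.queries += 1
        wanted = set(user_ids)
        return [a for a in self.activity if a.user_id in wanted and a.created_at > since]


def activity_n_plus_one(db: FakeDB, member_ids: list[str], since: datetime) -> list[ActivityItem]:
    items = []
    for user_id in member_ids:
        for item in db.get_user_activity(user_id):  # one query per member
            if item.created_at > since:
                items.append(item)
    return sorted(items, key=lambda x: x.created_at, reverse=True)


def activity_batched(db: FakeDB, member_ids: list[str], since: datetime) -> list[ActivityItem]:
    # The date filter also moves into the query, so less data is transferred.
    items = db.get_activity_for_users(member_ids, since)
    return sorted(items, key=lambda x: x.created_at, reverse=True)


log = [
    ActivityItem("ana", datetime(2025, 1, 2)),
    ActivityItem("ben", datetime(2024, 12, 30)),
    ActivityItem("cy", datetime(2025, 1, 3)),
]
cutoff = datetime(2025, 1, 1)
slow, fast = FakeDB(log), FakeDB(log)
activity_n_plus_one(slow, ["ana", "ben", "cy"], cutoff)
activity_batched(fast, ["ana", "ben", "cy"], cutoff)
print(slow.queries, fast.queries)  # 3 1
```

Both functions return the same items; only the number of round trips changes, which is why this class of bug is easy to miss when reading the code for correctness alone.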

Practical Applications

  • Automated First Pass: Use CodeRabbit or Copilot to catch mechanical bugs like N+1 queries or missing null checks before human review. Pitfall: Treating AI as a total replacement leads to architectural debt and missed business requirements.
  • Security Scanning: Deploy Semgrep or DeepSource to detect known patterns like SQLi or unsafe deserialization (e.g., yaml.load). Pitfall: Running too many overlapping tools generates redundant noise and developer fatigue.
  • Low-Risk Automation: Implement AI-only approvals for dependency updates (Dependabot), formatting (Prettier), and documentation. Pitfall: Failing to configure tool-specific instruction files like .coderabbit.yaml leads to irrelevant style comments.
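Deciding which pull requests qualify for AI-only approval can be written down as an explicit policy. The following Python sketch is hypothetical: the glob patterns, the `formatter_only` flag (assumed to be computed elsewhere, for example by confirming that running the formatter on the base branch reproduces the change) and the `dependabot[bot]` author check are illustrative choices, not settings of any particular tool.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: paths whose changes never need a human reviewer.
DOC_PATTERNS = ("*.md", "docs/*")
DEPENDENCY_BOT = "dependabot[bot]"


def review_track(changed_paths: list[str], author: str, formatter_only: bool = False) -> str:
    """Return "ai-only" for low-risk changes and "human" for everything else."""
    if not changed_paths:
        return "human"  # nothing to classify; let a person look
    if author == DEPENDENCY_BOT or formatter_only:
        return "ai-only"
    # fnmatchcase's `*` also matches '/', so "docs/*" covers nested files.
    if all(any(fnmatchcase(p, pat) for pat in DOC_PATTERNS) for p in changed_paths):
        return "ai-only"
    return "human"


print(review_track(["README.md", "docs/guide/setup.md"], "alice"))  # ai-only
print(review_track(["src/payments.ts", "README.md"], "alice"))      # human
```

Defaulting to `"human"` means any change the policy does not explicitly recognize still gets a second pass, which guards against the over-automation pitfall noted above.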
