Spring Data Internals: Repository Proxy Generation, Query Derivation, and What Happens at Startup

Chapter 52 of 78

You declare an interface. You extend JpaRepository. You inject it into a service. You call findById() and get an entity back. At no point did you write an implementation class.

Spring Data JPA did. At startup. Using the same proxy machinery covered in CH8.

This chapter traces exactly what happens between @EnableJpaRepositories and the first repository method invocation. The cost is not zero. Understanding it explains why your application takes 8 seconds to start and why a misspelled property name crashes the context before a single request arrives.

[Figure: Spring Data repository proxy generation, showing the path from interface to JDK proxy with query method routing strategies]

The Repository Has No Implementation. The Proxy Is the Implementation.

In the SaaS backend, OrderRepository looks like this:

public interface OrderRepository extends JpaRepository<Order, UUID> {
    List<Order> findByTenantIdAndStatus(UUID tenantId, OrderStatus status);
}

No class implements this interface. Yet Spring injects a fully functional object when you write:

@Service
public class OrderService {
    private final OrderRepository orderRepository;

    public OrderService(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }
}

Call orderRepository.getClass().getName() and you see something like this (the package and number vary by JDK version and run):

jdk.proxy3.$Proxy142

That is a JDK dynamic proxy. The same mechanism from CH8-S1. The proxy implements OrderRepository and routes every method call through an InvocationHandler that knows how to dispatch to the correct implementation.
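
The mechanism is easy to reproduce without Spring. The following is a minimal sketch using only the JDK; NameRepository and its hard-coded lookup are hypothetical stand-ins, not anything Spring Data ships. It shows an interface with no implementing class that still yields a working object.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Optional;

class ProxyDemo {

    // Hypothetical repository-style interface. No class implements it.
    interface NameRepository {
        Optional<String> findById(int id);
    }

    // Builds a JDK dynamic proxy; the handler is the only "implementation".
    static NameRepository createProxy() {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("findById")) {
                int id = (Integer) args[0];
                return id == 1 ? Optional.of("alice") : Optional.empty();
            }
            throw new UnsupportedOperationException(method.getName());
        };
        return (NameRepository) Proxy.newProxyInstance(
                NameRepository.class.getClassLoader(),
                new Class<?>[] { NameRepository.class },
                handler);
    }

    public static void main(String[] args) {
        NameRepository repo = createProxy();
        System.out.println(repo.findById(1));                    // Optional[alice]
        System.out.println(Proxy.isProxyClass(repo.getClass())); // true
    }
}
```

Every call on the interface lands in the handler. A real repository proxy does the same thing with a far more capable handler behind it.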

How the Proxy Gets Created

The sequence starts with @EnableJpaRepositories:

@SpringBootApplication
@EnableJpaRepositories(basePackages = "com.saas.order.repository")
public class OrderServiceApplication { }

When you omit the annotation, Spring Boot auto-configuration applies the equivalent configuration implicitly. The annotation imports JpaRepositoriesRegistrar, which extends RepositoryBeanDefinitionRegistrarSupport. This registrar runs while bean definitions are being registered, before any beans are instantiated.

The registrar scans the specified packages for interfaces extending Repository (or any of its sub-interfaces: CrudRepository, JpaRepository, etc.). For each discovered interface, it registers a BeanDefinition with the bean class set to JpaRepositoryFactoryBean.

This is the FactoryBean pattern. The container instantiates JpaRepositoryFactoryBean but does not inject the factory itself. Instead, it calls getObject() on the factory bean, and the return value becomes the bean that other components receive. The return value is the proxy.
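
The pattern can be modeled in a few lines. This is a toy sketch, not Spring's container: SimpleFactoryBean, ProductFactoryBean, and ToyContainer are hypothetical names invented for illustration. The one detail borrowed from Spring is the "&" prefix, which asks for the factory itself rather than its product.

```java
import java.util.HashMap;
import java.util.Map;

class FactoryBeanDemo {

    // Simplified stand-in for Spring's FactoryBean contract (only getObject()).
    interface SimpleFactoryBean<T> {
        T getObject();
    }

    // Hypothetical factory whose product, not the factory, is what callers receive.
    static class ProductFactoryBean implements SimpleFactoryBean<String> {
        @Override
        public String getObject() {
            return "product created by factory";
        }
    }

    // Toy container: a lookup by name unwraps factories and returns their product.
    static class ToyContainer {
        private final Map<String, Object> registered = new HashMap<>();

        void register(String name, Object bean) {
            registered.put(name, bean);
        }

        Object getBean(String name) {
            // Spring uses the "&" prefix to request the factory itself.
            if (name.startsWith("&")) {
                return registered.get(name.substring(1));
            }
            Object candidate = registered.get(name);
            if (candidate instanceof SimpleFactoryBean) {
                return ((SimpleFactoryBean<?>) candidate).getObject();
            }
            return candidate;
        }
    }

    public static void main(String[] args) {
        ToyContainer container = new ToyContainer();
        container.register("orderRepository", new ProductFactoryBean());
        System.out.println(container.getBean("orderRepository"));
        System.out.println(container.getBean("&orderRepository") instanceof ProductFactoryBean);
    }
}
```

The first lookup prints the product string and the second prints true. Unlike this toy, Spring caches the product of a singleton factory bean instead of calling getObject() on every lookup.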

To build the object that getObject() returns, JpaRepositoryFactoryBean follows these steps:

  1. Create a JpaRepositoryFactory with the EntityManager
  2. The factory determines the repository base class: SimpleJpaRepository
  3. The factory creates a ProxyFactory (from Spring AOP, CH8)
  4. The factory adds method interceptors for query execution
  5. The factory produces a JDK dynamic proxy implementing the repository interface

The proxy has two layers of dispatch:

  • Standard CRUD methods (save, findById, deleteById, findAll): delegated to an instance of SimpleJpaRepository, which calls EntityManager directly.
  • Custom query methods (findByTenantIdAndStatus): intercepted by QueryExecutorMethodInterceptor, which routes to the appropriate query strategy.
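
The two layers can be sketched with a plain JDK proxy. Everything here is a hypothetical stand-in: InMemoryBase plays the role of SimpleJpaRepository, and the findByPrefix branch plays the role of the query interceptor. Real Spring Data builds this with ProxyFactory and interceptors rather than a single handler.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class DispatchDemo {

    // Hypothetical base interface standing in for CrudRepository.
    interface BaseRepository {
        String save(String entity);
        long count();
    }

    // Hypothetical user-declared repository with one custom query method.
    interface ItemRepository extends BaseRepository {
        List<String> findByPrefix(String prefix);
    }

    // Stand-in for SimpleJpaRepository: a concrete class backing the base methods.
    static class InMemoryBase implements BaseRepository {
        final List<String> rows = new ArrayList<>();

        @Override
        public String save(String entity) {
            rows.add(entity);
            return entity;
        }

        @Override
        public long count() {
            return rows.size();
        }
    }

    static ItemRepository createRepository() {
        InMemoryBase target = new InMemoryBase();
        InvocationHandler handler = (proxy, method, args) -> {
            // Layer 1: methods declared on the base interface go to the concrete target.
            if (method.getDeclaringClass() == BaseRepository.class) {
                return method.invoke(target, args);
            }
            // Layer 2: custom query methods are handled by query logic.
            if (method.getName().equals("findByPrefix")) {
                String prefix = (String) args[0];
                List<String> result = new ArrayList<>();
                for (String row : target.rows) {
                    if (row.startsWith(prefix)) {
                        result.add(row);
                    }
                }
                return result;
            }
            throw new UnsupportedOperationException(method.toString());
        };
        return (ItemRepository) Proxy.newProxyInstance(
                ItemRepository.class.getClassLoader(),
                new Class<?>[] { ItemRepository.class },
                handler);
    }

    public static void main(String[] args) {
        ItemRepository repo = createRepository();
        repo.save("apple");
        repo.save("avocado");
        repo.save("banana");
        System.out.println(repo.count());           // 3
        System.out.println(repo.findByPrefix("a")); // [apple, avocado]
    }
}
```

The caller sees one object. Which layer answers a call depends only on where the method was declared.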

SimpleJpaRepository: The Default Implementation You Never See

Every JpaRepository proxy delegates standard operations to SimpleJpaRepository<T, ID>. This is a concrete class with @Transactional annotations on its methods. When you call orderRepository.save(order), the proxy forwards the call to SimpleJpaRepository.save(), which calls entityManager.merge() or entityManager.persist() depending on whether the entity is new.

// Inside SimpleJpaRepository (simplified)
@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        entityManager.persist(entity);
        return entity;
    } else {
        return entityManager.merge(entity);
    }
}

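By default, the isNew() check uses the entity’s @Id field. If the ID is null (for object types) or 0 (for primitives), the entity is considered new. Two exceptions apply: an entity with a @Version attribute of a wrapper type is judged by whether that attribute is null, and an entity implementing Persistable supplies its own isNew(). The ID rule is why using primitive long as an ID type causes subtle bugs: 0 is a valid value that Spring Data treats as “new.”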
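
The ID-based rule is small enough to model directly. This is a simplified sketch of the behavior described above, not the library's source: it assumes no @Version attribute and no Persistable implementation, and the class name IsNewDemo is hypothetical.

```java
import java.util.UUID;

class IsNewDemo {

    // Models the ID-based rule: object-typed IDs are new when null,
    // primitive numeric IDs are new when zero.
    static boolean isNew(Class<?> idType, Object id) {
        if (!idType.isPrimitive()) {
            return id == null;
        }
        if (id instanceof Number) {
            return ((Number) id).longValue() == 0L;
        }
        throw new IllegalArgumentException("Unsupported primitive id type " + idType);
    }

    public static void main(String[] args) {
        System.out.println(isNew(UUID.class, null));              // true
        System.out.println(isNew(UUID.class, UUID.randomUUID())); // false
        System.out.println(isNew(long.class, 0L));                // true
        System.out.println(isNew(long.class, 42L));               // false
    }
}
```

The second line of output matters for the SaaS backend. An Order whose UUID is assigned in application code before save() does not look new under this rule, so save() takes the merge() branch.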

Query Derivation: Method Names Become JPQL

The findByTenantIdAndStatus method has no @Query annotation. Spring Data parses the method name into a query at startup.

The parser is PartTree. It splits the method name into:

  • Subject: find (also supports count, delete, exists)
  • Predicate: ByTenantIdAndStatus

The predicate is parsed into Part objects:

Part 1: tenantId (type: SIMPLE_PROPERTY)
Part 2: status (type: SIMPLE_PROPERTY)
Connector: AND

Each Part maps to a predicate clause. The resulting query is equivalent to:

SELECT o FROM Order o WHERE o.tenantId = ?1 AND o.status = ?2
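
To make the parsing concrete, here is a deliberately tiny sketch of the idea. It is not PartTree. It assumes only find and count subjects and equality predicates joined by And, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DerivationDemo {

    // Lower-cases the first character: "TenantId" becomes "tenantId".
    static String uncapitalize(String s) {
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    // Derives a JPQL-like string from a method name such as findByTenantIdAndStatus.
    static String derive(String methodName, String entityName) {
        int by = methodName.indexOf("By");
        if (by < 0) {
            throw new IllegalArgumentException("No 'By' in " + methodName);
        }
        String subject = methodName.substring(0, by);
        String predicate = methodName.substring(by + 2);

        String select;
        switch (subject) {
            case "find":
                select = "SELECT o";
                break;
            case "count":
                select = "SELECT COUNT(o)";
                break;
            default:
                throw new IllegalArgumentException("Unsupported subject: " + subject);
        }

        // Split on "And" only when an upper-case letter follows,
        // so a property such as "AndroidVersion" stays intact.
        List<String> clauses = new ArrayList<>();
        int position = 1;
        for (String part : predicate.split("And(?=[A-Z])")) {
            clauses.add("o." + uncapitalize(part) + " = ?" + position++);
        }
        return select + " FROM " + entityName + " o WHERE " + String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(derive("findByTenantIdAndStatus", "Order"));
        // SELECT o FROM Order o WHERE o.tenantId = ?1 AND o.status = ?2
    }
}
```

The real parser also resolves each property against the entity, handles nested paths, and supports the full keyword grammar. The split-then-map shape is the same.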

More complex method names produce more complex queries:

// findByTenantIdAndStatusOrderByCreatedAtDesc
SELECT o FROM Order o
WHERE o.tenantId = ?1 AND o.status = ?2
ORDER BY o.createdAt DESC

// countByTenantIdAndStatusNot
SELECT COUNT(o) FROM Order o
WHERE o.tenantId = ?1 AND o.status <> ?2

// existsByTenantIdAndEmailContaining (the argument is bound wrapped in % wildcards)
SELECT CASE WHEN COUNT(o) > 0 THEN true ELSE false END
FROM Order o
WHERE o.tenantId = ?1 AND o.email LIKE ?2

The grammar supports: And, Or, Between, LessThan, GreaterThan, Like, In, Not, OrderBy, True, False, IsNull, IsNotNull, StartingWith, EndingWith, Containing, and more.

@Query: When Method Names Are Not Enough

Derived queries have limits. You cannot express subqueries, joins beyond simple traversal of nested properties, or aggregations other than count in a method name. For the SaaS backend’s reporting endpoint:

public interface OrderRepository extends JpaRepository<Order, UUID> {

    @Query("SELECT new com.saas.order.dto.OrderSummary(o.id, o.status, o.total, c.name) " +
           "FROM Order o JOIN o.customer c " +
           "WHERE o.tenantId = :tenantId AND o.createdAt >= :since")
    List<OrderSummary> findOrderSummaries(
        @Param("tenantId") UUID tenantId,
        @Param("since") Instant since
    );
}

At startup, Spring Data detects the @Query annotation and skips method name parsing. It validates the JPQL against the JPA metamodel. If the query references a non-existent entity or property, startup fails.

For native SQL:

@Query(value = "SELECT o.id, o.status, o.total FROM orders o " +
               "WHERE o.tenant_id = :tenantId AND o.created_at >= :since",
       nativeQuery = true)
List<Object[]> findOrderSummariesNative(
    @Param("tenantId") UUID tenantId,
    @Param("since") Instant since
);

Native queries skip JPQL validation. Errors surface at runtime.

The Startup Cost

Spring Data validates all repository queries at context initialization. This is deliberate. Every derived query method is parsed, converted to JPQL, and validated against the JPA metamodel. Every @Query annotation is parsed and validated (for JPQL, not native).

For a SaaS backend with 40 repository interfaces and 200 custom query methods, this adds measurable time to startup. The cost includes:

  1. Repository scanning: finding all interfaces extending Repository
  2. Proxy creation: creating ProxyFactory and registering interceptors for each repository
  3. Query parsing and validation: parsing every method name, constructing JPQL, validating against the metamodel
  4. JPA metamodel initialization: Hibernate processes all @Entity classes, builds the metamodel

This is a tradeoff. You accept slower startup in exchange for catching broken queries before the application serves traffic.

The Failure Mode: Misspelled Property Name

// BROKEN: 'tenantID' does not match the entity field 'tenantId'
public interface OrderRepository extends JpaRepository<Order, UUID> {
    List<Order> findByTenantIDAndStatus(UUID tenantId, OrderStatus status);
}

The Order entity has a field named tenantId (lowercase ‘d’). The method name uses TenantID (uppercase ‘D’). PartTree tries to resolve property tenantID on the Order entity and fails.

Startup crashes:

org.springframework.data.mapping.PropertyReferenceException:
No property 'tenantID' found for type 'Order'.
Did you mean 'tenantId'?

This is one of the best error messages in the Spring ecosystem. It takes a mistake that would otherwise surface as a failed query on the first call and turns it into a startup failure with a suggestion.
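
The fail-fast idea behind that message can be sketched with reflection. This is an illustration only: the nested Order class is a hypothetical stand-in for the real entity, and the case-insensitive comparison is a stand-in for Spring Data's real similarity matching.

```java
import java.lang.reflect.Field;
import java.util.UUID;

class PropertyCheckDemo {

    // Hypothetical entity mirroring the field names used in this chapter.
    static class Order {
        UUID tenantId;
        String status;
    }

    // Resolves a property name against the entity's declared fields.
    // Fails with a suggestion when only the letter case differs.
    static void requireProperty(Class<?> entity, String property) {
        String hint = "";
        for (Field field : entity.getDeclaredFields()) {
            if (field.getName().equals(property)) {
                return; // exact match: the property exists
            }
            if (field.getName().equalsIgnoreCase(property)) {
                hint = " Did you mean '" + field.getName() + "'?";
            }
        }
        throw new IllegalStateException("No property '" + property
                + "' found for type '" + entity.getSimpleName() + "'." + hint);
    }

    public static void main(String[] args) {
        requireProperty(Order.class, "tenantId"); // passes silently
        try {
            requireProperty(Order.class, "tenantID");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Run at initialization for every derived method, a check like this is what moves the failure from the first request to startup.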

But it only works for derived queries. Misspell a property in a @Query JPQL string and you get a different error:

org.hibernate.query.SemanticException:
Could not resolve attribute 'tenantID' of 'com.saas.order.Order'

Misspell a column in a query marked nativeQuery = true and you get nothing at startup. The error surfaces at runtime when the query executes.

The Correct Pattern

// CORRECT: property names match entity fields exactly
public interface OrderRepository extends JpaRepository<Order, UUID> {

    // Derived query: validated at startup
    List<Order> findByTenantIdAndStatus(UUID tenantId, OrderStatus status);

    // @Query for complex queries: JPQL validated at startup
    @Query("SELECT o FROM Order o WHERE o.tenantId = :tenantId " +
           "AND o.status IN :statuses ORDER BY o.createdAt DESC")
    List<Order> findByTenantIdAndStatuses(
        @Param("tenantId") UUID tenantId,
        @Param("statuses") Collection<OrderStatus> statuses
    );

    // Native query for performance-critical paths: NOT validated at startup
    // Must be covered by integration tests; LIMIT is database-specific syntax
    @Query(value = "SELECT o.* FROM orders o " +
                   "WHERE o.tenant_id = ?1 AND o.status = ?2 " +
                   "LIMIT ?3",
           nativeQuery = true)
    List<Order> findTopNByTenantAndStatus(
        UUID tenantId, String status, int limit
    );
}

Use derived queries for simple lookups. Use @Query with JPQL for anything involving joins, subqueries, or aggregations. Use native queries only when you need database-specific features or performance characteristics that JPQL cannot express. Always test native queries in integration tests.

The startup cost is the price of correctness. A startup that is 2 seconds slower but catches 50 broken queries before production is a trade worth making every time.