Version 1 of failover plugin causing significant service slowdown #1733

@weronkagolonka

Describe the bug

We’ve upgraded the wrapper to version 3.1.0, which fixed the issue of accumulating database connections, thanks for that! However, we ran into a different problem: version 1 of the failover plugin added significant overhead to each database query.

Expected Behavior

Responding to a request should take nanoseconds to microseconds, not up to half a minute.

What plugins are used? What other connection properties were set?

failover (version 1), efm2

Current Behavior

About 30 minutes after deploying the applications that use the wrapper (Kotlin REST API services), we noticed a significant latency increase, with spikes of 20-30 seconds even for a simple GET request. After adding more tracing, we saw that the overhead came from the wrapper itself:

[Images: two trace screenshots]

Because we explicitly declare wrapperPlugins in the Hikari DataSource, we were not using the default version 2 of the failover plugin, but version 1. After seeing the traces, we switched to version 2. Once we did that, the problem disappeared completely, and the services went back to serving requests at normal latency.
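A minimal sketch of that change, using only `java.util.Properties`. It assumes that `failover2` is the plugin code that selects version 2 of the failover plugin, while `failover` selects version 1; please check the wrapper documentation for the exact codes. The helper name `wrapperProperties` is hypothetical.

```kotlin
import java.util.Properties

// Builds the wrapper-specific data source properties.
// Assumption: "failover2" is the plugin code for version 2 of the failover
// plugin; "failover" (as in the original configuration) selects version 1.
fun wrapperProperties(failoverPluginCode: String = "failover2"): Properties =
    Properties().apply {
        // Plugin codes are passed as a comma-separated list.
        setProperty("wrapperPlugins", "$failoverPluginCode,efm2")
    }

fun main() {
    // Prints the plugin list that would be handed to the wrapper.
    println(wrapperProperties().getProperty("wrapperPlugins"))
}
```

The resulting `Properties` object can be assigned to `hikariConfig.dataSourceProperties` in place of an inline `wrapperPlugins` setting.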

Reproduction Steps

  • A Java/Kotlin REST API application that uses a Hikari DataSource to connect to the database. The following is based on the example provided in the repo, with the properties we used in our application:
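import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource
import java.util.Properties

fun main() {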
    val hikariConfig = HikariConfig()
    hikariConfig.jdbcUrl = "jdbc:aws-wrapper:postgresql://<svc-name>.<k8s-namespace>.svc.cluster.local:5432/<db-name>"
    hikariConfig.driverClassName = "software.amazon.jdbc.Driver"
    hikariConfig.username = "<db-username>"
    hikariConfig.password = "<db-password>"
    hikariConfig.maximumPoolSize = 3
    hikariConfig.exceptionOverrideClassName = "software.amazon.jdbc.util.HikariCPSQLException"
    hikariConfig.dataSourceProperties = Properties().apply {
        setProperty("ssl", "true")
        setProperty("sslmode", "require")
        setProperty("wrapperPlugins", "failover,efm2") // "failover" = version 1 of the failover plugin
        setProperty("enableTelemetry", "true")
        setProperty("telemetryTracesBackend", "OTLP")
        setProperty("telemetryMetricsBackend", "NONE")
    }

    val hikariDataSource = HikariDataSource(hikariConfig)

    hikariDataSource.connection.use { conn ->
        conn.createStatement().use { statement ->
            statement.executeQuery("SELECT * from pg_catalog.aurora_db_instance_identifier()").use { rs ->
                while (rs.next()) {
                    println(rs.getString(1))
                }
            }
        }
    }
}
  • Database - Aurora PostgreSQL with custom endpoints
  • About 30 minutes after application startup, responding to a service request takes several seconds. In our case, the endpoints trigger multiple database queries, and each query incurred an overhead of ~5 seconds from the failover plugin. This added up to 20-30 seconds of total response latency.
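To observe the slowdown without full tracing, a small latency probe can be run periodically against the pool. The sketch below is only an illustration: the helper names `timedMillis` and `probeQueryLatencyMillis` and the default `SELECT 1` query are hypothetical and not part of the wrapper or of HikariCP.

```kotlin
import javax.sql.DataSource
import kotlin.system.measureNanoTime

// Runs `action` once and returns its wall-clock duration in milliseconds.
fun timedMillis(action: () -> Unit): Double = measureNanoTime(action) / 1_000_000.0

// Hypothetical probe: borrows a connection, runs a trivial query, drains the
// result set, and returns the total time in milliseconds. The measured time
// includes connection checkout from the pool as well as the query itself.
fun probeQueryLatencyMillis(dataSource: DataSource, sql: String = "SELECT 1"): Double =
    timedMillis {
        dataSource.connection.use { conn ->
            conn.createStatement().use { statement ->
                statement.executeQuery(sql).use { rs ->
                    while (rs.next()) {
                        // Drain rows; the values are not needed for timing.
                    }
                }
            }
        }
    }
```

Calling `probeQueryLatencyMillis(hikariDataSource)` once a minute and logging the result should show whether latency climbs roughly 30 minutes after startup, as described above.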

Possible Solution

We don’t have hard evidence beyond the traces that pointed us to the wrapper and the performance improvement we observed after changing the plugin version.

Additional Information/Context

We are opening this issue to check whether you are aware of this problem. Perhaps version 1 of the plugin is rarely used and no one has noticed the performance issue before. Since the plugin is still available, it should not degrade performance like this. The issue is resolved for us, so this is just a heads-up.

The AWS Advanced JDBC Wrapper version used

3.1.0

JDK version used

Kotlin 2.3.0

Operating System and version

Amazon Linux 2023

Labels: bug
