I have an App Engine Flex Java service (REST backend, 1 instance, always up) that connects to a Cloud SQL instance using IAM authentication (service account). Everything works fine most of the time, but occasionally REST calls time out and my service and its clients experience a DoS.
GCP logs show 7365 occurrences of this error over the last 7 days, which is insane:
org.springframework.dao.DataAccessResourceFailureException: Unable to acquire JDBC Connection; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
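To put that number in perspective, here is a back-of-the-envelope calculation as a small Java sketch. The class and method names are made up for illustration; the inputs are the 7365 errors over 7 days from the log summary. It works out to roughly 1,052 failures per day, about 44 per hour, or one every ~82 seconds if they were spread evenly, which they may well not be, given that they show up as occasional outages.

```java
// Back-of-the-envelope rates for the reported error count.
// Hypothetical helper; inputs come from the GCP log summary (7365 errors in 7 days).
final class ErrorRate {

    // Average number of errors per day.
    static double perDay(long occurrences, int days) {
        return (double) occurrences / days;
    }

    // Average number of errors per hour.
    static double perHour(long occurrences, int days) {
        return perDay(occurrences, days) / 24.0;
    }

    // Mean interval between errors in seconds, assuming a uniform spread.
    static double meanSecondsBetween(long occurrences, int days) {
        return (days * 24.0 * 3600.0) / occurrences;
    }

    public static void main(String[] args) {
        long occurrences = 7365;
        int days = 7;
        System.out.printf("%.1f per day, %.1f per hour, one every %.1f s%n",
                perDay(occurrences, days),
                perHour(occurrences, days),
                meanSecondsBetween(occurrences, days));
    }
}
```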
At first I suspected limited availability of the Cloud SQL instance itself, because mine is not highly available. However, the Postgres log (my Cloud SQL DB) contains the following errors, which show that the instance is available but the credentials are rejected:
2022-06-16 08:33:33.316 UTC [610407]: [1-1] db=db-dev,user=connect-xxx@appspot FATAL: Cloud SQL IAM service account authentication failed for user "connect-xxx@appspot"
2022-06-16 08:33:33.316 UTC [610407]: [1-1] db=db-dev,user=connect-xxx@appspot DETAIL: Request is missing required authentication credential. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.
Connection matched pg_hba.conf line 21: "local all +cloudsqliamserviceaccount cloudsql-iam-svc-acct"
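To compare how often the IAM login failure appears in the Postgres log with the Spring-side exception count, a minimal sketch like the following could count the FATAL lines per user. It assumes the log has been exported as plain text lines in the format shown above; the class name and the regex are my own, not part of any Google tooling.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts failed Cloud SQL IAM logins per database user
// in exported Postgres log lines.
final class IamAuthFailureCounter {

    // Matches the FATAL line format from the Postgres log excerpt;
    // group 1 captures the user name inside the quotes.
    private static final Pattern FAILURE = Pattern.compile(
            "FATAL:\\s+Cloud SQL IAM service account authentication failed for user \"([^\"]+)\"");

    static Map<String, Integer> countByUser(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = FAILURE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "2022-06-16 08:33:33.316 UTC [610407]: [1-1] db=db-dev,user=connect-xxx@appspot "
                        + "FATAL: Cloud SQL IAM service account authentication failed for user \"connect-xxx@appspot\"",
                "2022-06-16 08:33:33.316 UTC [610407]: [1-1] db=db-dev,user=connect-xxx@appspot "
                        + "DETAIL: Request is missing required authentication credential.");
        // Only the FATAL line is counted; DETAIL lines are ignored.
        System.out.println(countByUser(sample));
    }
}
```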
I also suspected an issue with the Cloud SQL Auth Proxy, but its logs show nothing helpful.
My connection settings:
<bean id="hikariConfig" class="com.zaxxer.hikari.HikariConfig">
    <property name="poolName" value="springHikariCP" />
    <property name="connectionTestQuery" value="SELECT 1" />
    <property name="driverClassName" value="org.postgresql.Driver" />
    <!-- 600000 ms = 10 minutes -->
    <property name="connectionTimeout" value="600000"/>
    <property name="jdbcUrl" value="jdbc:postgresql://127.0.0.1:3306/db-dev?cloudSqlInstance=connect-xxx:europe-west9:dev&amp;socketFactory=com.google.cloud.sql.postgres.SocketFactory" />
    <property name="username" value="connect-xxx@appspot"/>
    <!-- IAM auth -->
    <property name="password" value="dummy"/>
    <property name="dataSourceProperties">
        <props>
            <prop key="sslmode">disable</prop>
            <prop key="enableIamAuth">true</prop>
        </props>
    </property>
    <!---->
</bean>
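To make explicit what the pool is actually configured with, here is a small stdlib-only Java sketch (hypothetical class and method names) that splits the jdbcUrl query string and converts connectionTimeout to minutes. Two things stand out: the URL sets socketFactory=com.google.cloud.sql.postgres.SocketFactory, so connections go through the Cloud SQL Java connector, whose documentation (as I understand it) says the host portion of the URL is unused in that case; and connectionTimeout=600000 ms means a request thread may wait up to 10 minutes for a pooled connection before HikariCP throws an SQLException, which from the client side would look like a hung REST call.

```java
import java.net.URI;
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for inspecting the Hikari settings from the Spring XML.
final class JdbcUrlInspector {

    // Splits the query string of a JDBC URL into key/value pairs (in order).
    static Map<String, String> queryParams(String jdbcUrl) {
        // Strip the "jdbc:" prefix so java.net.URI can parse the remainder.
        String raw = jdbcUrl.startsWith("jdbc:") ? jdbcUrl.substring("jdbc:".length()) : jdbcUrl;
        String query = URI.create(raw).getQuery();
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Converts Hikari's connectionTimeout (milliseconds) to whole minutes.
    static long timeoutMinutes(long connectionTimeoutMs) {
        return Duration.ofMillis(connectionTimeoutMs).toMinutes();
    }

    public static void main(String[] args) {
        String url = "jdbc:postgresql://127.0.0.1:3306/db-dev"
                + "?cloudSqlInstance=connect-xxx:europe-west9:dev"
                + "&socketFactory=com.google.cloud.sql.postgres.SocketFactory";
        queryParams(url).forEach((k, v) -> System.out.println(k + " = " + v));
        System.out.println("connectionTimeout = " + timeoutMinutes(600_000L) + " min");
    }
}
```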
My App Engine service has 1 vCPU and 4 GB of RAM. I see some CPU spikes (up to 80%), but they do not coincide with the DoS episodes. Memory usage never exceeds 1.5 GB. My Postgres instance has 1 vCPU and 3.75 GB of RAM; CPU usage stays around 5%, memory usage around 2 GB, and the load never exceeds 6 transactions per second.
Any ideas?
UPDATE regarding the SQL proxy:
I use the following in my app.yaml:
beta_settings:
  cloud_sql_instances: connect-xxx:europe-west9:dev=tcp:3306
I also explicitly use port 3306 in the connection string. The Cloud SQL Proxy is running and listening on 3306:
docker logs yyy
2022/06/21 15:44:23 current FDs rlimit set to 1048576, wanted limit is 8500. Nothing to do here.
2022/06/21 15:44:24 Listening on 0.0.0.0:3306 for connect-xxx:europe-west9:dev
2022/06/21 15:44:24 Ready for new connections
2022/06/21 15:44:24 Generated RSA key in 731.70246ms
systemd-r 246 systemd-resolve 14u IPv4 13413 0t0 TCP 127.0.0.53:53 (LISTEN)
container 345 root 11u IPv4 14603 0t0 TCP 127.0.0.1:45795 (LISTEN)
sshd 422 root 3u IPv4 14545 0t0 TCP *:22 (LISTEN)
sshd 422 root 4u IPv6 14546 0t0 TCP *:22 (LISTEN)
dockerd 447 root 27u IPv4 17276 0t0 TCP 172.17.0.1:3306 (LISTEN)
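The lsof output shows dockerd listening on 172.17.0.1:3306 (the Docker bridge address), while the JDBC URL points at 127.0.0.1:3306. As far as I know, the App Engine flexible documentation tells apps using the =tcp:PORT form to connect to 172.17.0.1:PORT. A quick way to see which address actually accepts connections from inside the app container is the sketch below (hypothetical class name, java.net only): it reads the port from the cloud_sql_instances entry and attempts a plain TCP handshake on both addresses. A successful handshake only shows that something is listening; it says nothing about whether IAM authentication will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic: read the TCP port from a cloud_sql_instances entry
// and check which local address accepts connections on it.
final class ProxyPortProbe {

    // Extracts PORT from "PROJECT:REGION:INSTANCE=tcp:PORT"; returns -1 when there
    // is no "=tcp:PORT" suffix (in that case a Unix socket is used instead of TCP).
    static int tcpPort(String entry) {
        int eq = entry.indexOf('=');
        if (eq < 0) {
            return -1;
        }
        String suffix = entry.substring(eq + 1);
        if (!suffix.startsWith("tcp:")) {
            throw new IllegalArgumentException("expected '=tcp:PORT', got: " + entry);
        }
        return Integer.parseInt(suffix.substring("tcp:".length()));
    }

    // Extracts the instance connection name (the part before '=').
    static String connectionName(String entry) {
        int eq = entry.indexOf('=');
        return eq < 0 ? entry : entry.substring(0, eq);
    }

    // True if a TCP handshake with host:port completes within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String entry = "connect-xxx:europe-west9:dev=tcp:3306";
        int port = tcpPort(entry);
        System.out.println("instance " + connectionName(entry) + ", port " + port);
        // 127.0.0.1 comes from the JDBC URL, 172.17.0.1 from the lsof output.
        for (String host : new String[] {"127.0.0.1", "172.17.0.1"}) {
            System.out.println(host + ":" + port + " reachable = " + isReachable(host, port, 2000));
        }
    }
}
```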
Postgres socket factory dependency:
<dependency>
    <groupId>com.google.cloud.sql</groupId>
    <artifactId>postgres-socket-factory</artifactId>
    <version>1.6.0</version>
</dependency>
UPDATE 2: Google confirmed this is an issue on their side and they will roll out the fix in a few weeks.