I have an App Engine Flex Java service (REST backend, 1 instance, always up) that connects to a Cloud SQL instance using IAM authentication (service account). Everything works fine most of the time, but sometimes REST calls time out, and the service and its clients experience a denial of service.

GCP logs show 7365 occurrences of this error during the last 7 days, which is insane:

org.springframework.dao.DataAccessResourceFailureException: Unable to acquire JDBC Connection; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection

I suspected limited availability of the Cloud SQL instance itself, because mine is not highly available, but I see these errors in the Postgres log (Postgres is my Cloud SQL DB), which shows that the instance itself is available but the credentials are rejected:

2022-06-16 08:33:33.316 UTC [610407]: [1-1] db=db-dev,user=connect-xxx@appspot FATAL:  Cloud SQL IAM service account authentication failed for user "connect-xxx@appspot"
2022-06-16 08:33:33.316 UTC [610407]: [1-1] db=db-dev,user=connect-xxx@appspot DETAIL:  Request is missing required authentication credential. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.
Connection matched pg_hba.conf line 21: "local   all           +cloudsqliamserviceaccount         cloudsql-iam-svc-acct"

I suspected an issue in the Cloud SQL Auth Proxy, but its logs don't show anything that could help.

My connection settings:

<bean id="hikariConfig" class="com.zaxxer.hikari.HikariConfig">
        <property name="poolName" value="springHikariCP" />
        <property name="connectionTestQuery" value="SELECT 1" />
        <property name="driverClassName" value="org.postgresql.Driver" />
        <property name="connectionTimeout" value="600000"/>
        <property name="jdbcUrl" value="jdbc:postgresql://127.0.0.1:3306/db-dev?cloudSqlInstance=connect-xxx:europe-west9:dev&amp;socketFactory=com.google.cloud.sql.postgres.SocketFactory" />
        <property name="username" value="connect-xxx@appspot"/>
        <!-- IAM auth -->
        <property name="password" value="dummy"/>
        <property name="dataSourceProperties">
            <props>
                <prop key="sslmode">disable</prop>
                <prop key="enableIamAuth">true</prop>
            </props>
        </property>
        <!---->
    </bean>

My App Engine service has 1 vCPU and 4 GB of RAM. I see some spikes in CPU usage (up to 80%), but they do not coincide with the DoS. Memory usage never exceeds 1.5 GB. My Postgres instance has 1 vCPU and 3.75 GB of RAM. CPU usage is always about 5%, memory usage is about 2 GB, and there are never more than 6 transactions per second.

Any ideas?

UPD regarding SQL proxy:

I use

beta_settings:
  cloud_sql_instances: connect-xxx:europe-west9:dev=tcp:3306

in my app.yaml. I also explicitly use port 3306 in the connection string. The Cloud SQL Proxy is there and is also listening on 3306:

docker logs yyy
2022/06/21 15:44:23 current FDs rlimit set to 1048576, wanted limit is 8500. Nothing to do here.
2022/06/21 15:44:24 Listening on 0.0.0.0:3306 for connect-xxx:europe-west9:dev
2022/06/21 15:44:24 Ready for new connections
2022/06/21 15:44:24 Generated RSA key in 731.70246ms
systemd-r    246 systemd-resolve   14u  IPv4  13413      0t0  TCP 127.0.0.53:53 (LISTEN)
container    345            root   11u  IPv4  14603      0t0  TCP 127.0.0.1:45795 (LISTEN)
sshd         422            root    3u  IPv4  14545      0t0  TCP *:22 (LISTEN)
sshd         422            root    4u  IPv6  14546      0t0  TCP *:22 (LISTEN)
dockerd      447            root   27u  IPv4  17276      0t0  TCP 172.17.0.1:3306 (LISTEN)

Postgres socket factory dependency:

<dependency>
            <groupId>com.google.cloud.sql</groupId>
            <artifactId>postgres-socket-factory</artifactId>
            <version>1.6.0</version>
        </dependency>

UPDATE 2: Google confirmed this is an issue on their side and they will roll out the fix in a few weeks.

nnl
  • Deleted my initial response since I misread your setup. You're not using the Cloud SQL Auth Proxy. You're using the Cloud SQL Java Connector (aka Socket Factory). What version are you using? – enocom Jun 21 '22 at 20:42
  • Please see my update about the version and other details @enocom. Now I'm even more confused. I thought I connected to the local Proxy using sockets and that it did all the work with the remote Cloud SQL instance. – nnl Jun 22 '22 at 15:15
  • The postgres-socket-factory connects directly to your Cloud SQL instance. It doesn't use the Cloud SQL Proxy. I see you're using a recent version of the socket factory. It's hard to say what the issue might be without a more specific error message. Have you tried downgrading to v1.5.0 to see if that helps? – enocom Jun 22 '22 at 16:09
  • 1st: the downgrade didn't help. I see the same errors @enocom. 2nd: you're right, I don't use the Cloud SQL Proxy. The whole doc is very misleading to me: https://cloud.google.com/sql/docs/postgres/connect-app-engine-flexible#java – nnl Jun 28 '22 at 09:08
  • It's hard to debug on stackoverflow. Would you mind opening an issue on the repo? https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory – enocom Jun 28 '22 at 16:00
  • I've opened the ticket in google support @enocom – nnl Jul 04 '22 at 12:07
  • @enocom Has this issue been resolved yet? – babsher Oct 14 '22 at 18:42
  • I believe we've tracked this down to a race condition in the Google Auth Library. Track the status here: https://github.com/googleapis/google-auth-library-java/pull/1031. – enocom Oct 15 '22 at 02:16
  • @enocom yep, already looking at this as google support suggested. thanks – nnl Oct 18 '22 at 22:02
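
To illustrate the point made in the comments, that the postgres-socket-factory connects to the instance directly rather than through the Proxy, here is a minimal sketch of how the same connection settings could be expressed without a host and port. It uses only java.util.Properties, so it runs without the driver on the classpath. The class and method names (SocketFactoryUrlSketch, jdbcUrl, connectionProperties) are hypothetical, and leaving the host empty in the URL is an assumption based on the socket factory, not the URL, choosing the endpoint. This does not address the intermittent IAM failures themselves, which the comments attribute to a race condition in the Google Auth Library.

```java
import java.util.Properties;

// Hypothetical helper: builds the URL and properties only, opens no connection.
class SocketFactoryUrlSketch {

    // Host and port are left empty on the assumption that, with a
    // socketFactory configured, the URL's host is not what gets dialed.
    public static String jdbcUrl(String database) {
        return "jdbc:postgresql:///" + database;
    }

    // Same keys and values as in the question's Hikari config.
    public static Properties connectionProperties(String instanceConnectionName, String iamUser) {
        Properties props = new Properties();
        props.setProperty("socketFactory", "com.google.cloud.sql.postgres.SocketFactory");
        props.setProperty("cloudSqlInstance", instanceConnectionName);
        props.setProperty("enableIamAuth", "true");
        props.setProperty("sslmode", "disable");
        props.setProperty("user", iamUser);
        // Placeholder, as in the question: IAM auth does not use a real password.
        props.setProperty("password", "dummy");
        return props;
    }

    public static void main(String[] args) {
        Properties props = connectionProperties("connect-xxx:europe-west9:dev", "connect-xxx@appspot");
        System.out.println(jdbcUrl("db-dev"));
        // Print the properties in a stable order for inspection.
        props.stringPropertyNames().stream().sorted()
                .forEach(key -> System.out.println(key + "=" + props.getProperty(key)));
    }
}
```

If this reading is right, the cloud_sql_instances entry in app.yaml and port 3306 would play no part in the application's database connections, which would explain why the Proxy logs show nothing relevant.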
