
Problem statement -

We are planning to store hexadecimal strings of length 64 in a BigQuery column. Will queries that filter or join on this column perform worse than they would if the column held shorter strings?

Example -

Let's assume there is a BigQuery table - abc.HACKERNEWS.news

Columns -

id, time, time_ts, encrypted_data, news, status.

Known - the encrypted_data column holds strings of length 64.

Query - SELECT time FROM abc.HackerNews.news WHERE encrypted_data = 'abcdefghijklmnopqrstuvwxyz123deabcdefghijklmnopqrstuvwxyzabcde' LIMIT 1000

How will performance be affected by changing the length of encrypted_data? Will the query perform better if the strings stored in the encrypted_data column are shorter, say of length 5?

Arijeet Saha

1 Answer


Refer to the BigQuery documentation on data size calculation, which gives the size of a STRING value as:

STRING: 2 bytes + the UTF-8 encoded string size

So, to answer your question: yes, the longer the strings, the more bytes the query has to scan when it evaluates the filter, and the slower (and more expensive) it will be. Storing shorter strings in the column can therefore improve query performance.
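The sizing rule above can be sketched in Python. This is only a back-of-the-envelope estimate for the bytes BigQuery counts for the one column referenced in the filter (BigQuery is columnar, so only referenced columns are scanned); the row count and values are made up for illustration:

```python
def string_column_bytes(value: str) -> int:
    """Bytes BigQuery counts for one STRING value: 2 bytes + UTF-8 size."""
    return 2 + len(value.encode("utf-8"))

rows = 10_000_000  # assumed table size, for illustration only

long_value = "a" * 64   # 64-character value -> 66 bytes per row
short_value = "a" * 5   # 5-character value  ->  7 bytes per row

# Estimated bytes scanned for the encrypted_data column alone
long_scan = rows * string_column_bytes(long_value)    # 660,000,000
short_scan = rows * string_column_bytes(short_value)  #  70,000,000

print(long_scan, short_scan)
```

On this rough model, shortening the stored strings from 64 to 5 characters cuts the bytes scanned for that column by almost an order of magnitude.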

Aleix CC