4

How would I got about passing an ARRAY of STRUCTS into my user-defined function (using standard SQL)?

Firstly, a bit of context:

Table schema:

id STRING
customer STRING
request STRUCT<
  headers STRING
  body STRING
  url STRING
>
response STRUCT<
  size INT64
  body STRING
>
outgoing ARRAY<
  STRUCT<
    request STRUCT<
      url STRING,
      body STRING,
      headers STRING
    >,
    response STRUCT<
      size INT64,
      body STRING
    >
  >
>

User-defined function:

CREATE TEMPORARY FUNCTION extractDetailed(
  customer STRING,
  request STRUCT<
    headers STRING,
    body STRING
  >,
  outgoing ARRAY<
    STRUCT<
      request STRUCT<url STRING>,
      response STRUCT<body STRING>
    >
  >
)
RETURNS STRING
LANGUAGE js AS """

""";

SELECT extractDetailed(customer, STRUCT(request.headers, request.body), outgoing)
FROM request_logs

As for my problem: I can't seem to figure out how to select part of the outgoing ARRAY, and pass it to the user-defined function as an array.

Effectively, I'm trying to simulate the following user-defined function call:

extractDetailed(
  "customer id",
  { "headers": "", "body": "" },
  [
    {
      "request": { "url": "" },
      "response": { "body": "" }
    },
    {
      "request": { "url": "" },
      "response": { "body": "" }
    }
  ]
);

I have recently stumbled across some documentation that might help unlock it, I just can't seem to figure out how to make it fit. I'm really struggling with this, and would appreciate any help in resolving it.

garbetjie
  • 579
  • 3
  • 10

1 Answers1

8

Try below. It parses needed peaces from your array and puts them back into new array before passing to function so it matches sugnature

CREATE TEMPORARY FUNCTION extractDetailed(
customer STRING,
request STRUCT<headers STRING, body STRING>,
outgoing ARRAY<STRUCT<request STRUCT<url STRING>, response STRUCT<body STRING>>>
)
RETURNS STRING
LANGUAGE js AS """

""";

SELECT 
  extractDetailed(
    customer, 
    STRUCT(request.headers, request.body), 
    ARRAY(
      SELECT STRUCT<request STRUCT<url STRING>,response STRUCT<body STRING>>
          (STRUCT(request.url), STRUCT(response.body)) 
      FROM UNNEST(outgoing)
    )
  ) AS details
FROM request_logs  

To further "optimize" above query and make it more portable, you can wrap extracting parts from original array to new array into separate SQL UDF

CREATE TEMPORARY FUNCTION extractParts (
  outgoing ARRAY<STRUCT<request STRUCT<url STRING, body STRING, headers STRING>,
                        response STRUCT<size INT64, body STRING>>>
)
RETURNS ARRAY<STRUCT<request STRUCT<url STRING>, response STRUCT<body STRING>>>
AS ((
  SELECT ARRAY(
      SELECT STRUCT<request STRUCT<url STRING>,response STRUCT<body STRING>>
          (struct(request.url), struct(response.body)) 
      FROM UNNEST(outgoing)
    )
));

CREATE TEMPORARY FUNCTION extractDetailed(
  customer STRING,
  request STRUCT<headers STRING, body STRING>,
  outgoing ARRAY<STRUCT<request STRUCT<url STRING>, response STRUCT<body STRING>>>
)
RETURNS STRING
LANGUAGE js AS """
  return outgoing.length;
""";

SELECT 
  extractDetailed(
    customer, 
    STRUCT(request.headers, request.body),
    extractParts(outgoing)
  ) as details
FROM request_logs
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • So, from a performance point of view, is there any difference between the two? My first thought around having two UDF's is that it would cause a slight slowdown. But I'm guessing it might be a tradeoff between portability and performance. – garbetjie Jan 18 '17 at 04:45
  • from my experience SQL UDF is not adding to slowness. JS UDF could. So, i feel two above should perform the same – Mikhail Berlyant Jan 18 '17 at 05:19