0

The What

Does anyone know a better way of identifying/differentiating between a view or query written in Legacy or Standard SQL, using the BigQuery API?

The only method that came to mind was to run the SQL with the dryRun property set to true (this would incur the minimum processing overhead) and if it fails with a message that included the phrase "Try using standard SQL" I could assume it was legacy SQL, otherwise it works and is Standard. E.g.

{
...
"code": 400,
"message": 
    "Invalid table name: `my-project.my_dataset.hello_world`
    [Try using standard SQL (https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql)]."
}

The Why

I'm maintaining an integration layer (written in Cloud Functions - Node.js 8) that migrates data using the BigQuery API.

Unfortunately the user community has been permitted to write views and queries in either Standard or Legacy SQL. Due to the numerous limitations when dealing with Legacy SQL I'd like to detect which queries and views are written using it, allowing me to adjust the processing method accordingly.

FreeZey
  • 2,382
  • 3
  • 11
  • 23

1 Answers1

2

Does anyone know a better way of identifying/differentiating between a view or query written in Legacy or Standard SQL, using the BigQuery API?

You can try and use javascript regex to identify the type of the SQL.

You can use the below code snippet as a baseline

isStandardSql(idString) {
  let isStandard, fullId, partialId, projectId = '';
  // This 'if' checks if the provided idString is of type standard and makes sure there is only one ':' in the expression (as in legacy syntax)
  const splitted = idString.split(/[:.]/g);
  if (splitted.length > 3) {
    const __ret = this.try2findProjectId(idString, projectId);
    idString = __ret.idString;
    projectId = __ret.projectId;
  }
  if ((idString.match(/:/g))) {
    // Regex that checks if the format of the id match legacy
    let matched = idString.match(/([\[]([^[]|[\[][\]])*[\]])/g);
    if (matched && matched[0]) {
    fullId = projectId + matched[0].substring(1, idString.length - 1);
    isStandard = false;
    } else {
    this.errorMessage("First Regex", idString);
    }
    // Same as the first only that here instead of ':' we are looking for '.' and we want to make sure there is more than 1 (as in standard syntax)
  } else if ((idString.match(/\./g) && idString.match(/\./g).length === 2)) {
    // Regex that checks if the format of the id match standard
    let matched = idString.match(/(`([^`]|``)*`)/g);// ? idString.match(/(`([^`]|``)*`)/g) : [idString];
    if (matched && matched[0]) {
    fullId = projectId + matched[0].substring(1, idString.length - 1);
    isStandard = true
    } else if(!matched && idString){
    fullId = projectId + idString;
    isStandard = true;
    }
    else {
    this.errorMessage("Second Regex", idString);
    }
  }
  else {//projectID.dataset
    // In case of id without projectId of proxy "use" project.dataset
    if(splitted.length === 2) {
      fullId = '';
      if (idString[0] === '[' && idString[idString.length - 1] === ']') {
        isStandard = false;
      }
      else if (idString[0] === '`' && idString[idString.length - 1] === '`') {
        isStandard = true;
      }
      partialId = idString.replace(/`|\[|\]/g, '')
    }
    else {
      this.errorMessage("Third Regex", idString);
    }
  }
  // Return values is flag the determine the type (standard or legacy) and id without staring/ ending chars (``, [])
  return {
    isStandard,
    fullId: fullId,
    partialId: partialId
  };
  }


try2findProjectId(idString, projectId)
    {
        let numOfInstances = 0
        for (let i = idString.length; i > 0; i--) {
            const char = idString[i - 1]
            if (char === ':' || char === '.') {
                numOfInstances++
                if (numOfInstances === 2) {
                    projectId = idString.substring(1, i - 1)
                    idString = idString.substring(i - 1, idString.length)
                    idString = idString[idString.length - 1] === '`' ? '`' + idString : idString
                    idString = idString[idString.length - 1] === ']' ? '[' + idString : idString
                }
            }
        }
        return {idString, projectId}
    }
Tamir Klein
  • 3,514
  • 1
  • 20
  • 38
  • Nice! Do you know of any caveats to the regex? I.e. sneaky or weird SQL examples that might not fit? Just to get an approximate percentage for the risk? Or would you consider this complete coverage? – FreeZey Mar 05 '19 at 13:30
  • Works great for me up till now for a few months with no issues. If this answer help please mark it as the solution for others to follow – Tamir Klein Mar 05 '19 at 13:42
  • I've marked it as the answer but I've had to reconfigure things a little and also run into an issue with the first if statement "if (splitted.length > 3)" because if references an absent function "try2findProjectId". – FreeZey Mar 06 '19 at 09:33
  • @FreeZey thanks, I added the missing function, If there is anything else to improve in the code do let me know – Tamir Klein Mar 06 '19 at 12:06