0

BigQuery documentation describes path expressions, which look like this:

foo.bar
foo.bar/25
foo/bar:25
foo/bar/25-31
/foo/bar
/25/foo/bar

But it doesn't say a lot about how and where these path expressions are used. It only briefly mentions:

A path expression describes how to navigate to an object in a graph of objects.

  • But what is this graph of objects?
  • How would you use this syntax with a graph of objects?
  • What's the meaning of a path expression like foo/bar/25-31?

My question is: what are these Path Expressions the official documentation describes?

I've searched through BigQuery docs but haven't managed to find any other mention of these path expressions. Is this syntax actually part of BigQuery SQL at all?

What I've found out so far

There is an existing question, which asks roughly the same thing, but for some reason it's downvoted and none of the answers are correct. Though the question it asks is more about a specific detail of the path expression syntax.

Anyway, the answers there propose a few hypotheses as to what path expressions are:

It's not a syntax for referencing tables

The BigQuery Legacy SQL uses syntax that's similar to path expressions for referencing tables:

SELECT state, year FROM [bigquery-public-data:samples.natality]

But that syntax is only valid in BigQuery Legacy SQL. In the new Google Standard SQL it produces a syntax error. There's a separate documentation for table path syntax, which is different from path expression syntax.

It's not JSONPath syntax

JSONPath syntax is documented elsewhere and looks like:

SELECT JSON_QUERY(json_text, '$.class.students[0]')

It's not a syntax for accessing JSON object graph

There's a separate JSON subscript operator syntax, which looks like so:

SELECT json_value.class.students[0]['name']

My current hypothesis

My best guess is that BigQuery doesn't actually support such syntax, and the description in the docs is a mistake.

But please, prove me wrong. I'd really like to know because I'm trying to write a parser for BigQuery SQL, and to do so, I need to understand the whole syntax that BigQuery allows.

Rene Saarsoo
  • 13,580
  • 8
  • 57
  • 85
  • Can you clarify more on the issue you're facing? To use select statement in Standard SQL you will have to use the syntax `Select Column from 'Project.Dataset.Table'`. JSON_VALUE can be used as for eg: `SELECT json_value.class.students[0]['name'] AS first_student FROM table` and JSON_QUERY can be used as `SELECT JSON_QUERY(json, '$."a.x"'), JSON_EXTRACT(json, "$['a.x']") FROM table;` in Standard SQL. Graph objects are built to represent complex relationships. – Prajna Rai T Nov 23 '22 at 13:45
  • My question is: what are these "Path Expressions" that the official BigQuery documentation describes? I don't have a problem with writing a SELECT statement or using JSON functions and values - I only mentioned these to clarify that these are NOT the answers to my question. – Rene Saarsoo Nov 23 '22 at 17:17
  • People who worked with SQL before are familar to `Select * from Dataset.TableName`. In BigQuery this is a bit more comples with `Project.Dataset.Tablename`. Also fields can be nested or contain subelements. For explaining these special notification, the "path expression" is introduced. But only part of it is possible to be used in BigQuery. ":" and "-" and "/" are not supported. – Samuel Nov 24 '22 at 13:42

2 Answers2

1

I believe that a "path expression" is the combination of identifiers that points to specific objects/tables/columns/etc. So `project.dataset.table.struct.column` is a path expression comprising of 5 identifiers. I also think that alias.column within the context of a query is a path expression with 2 identifiers (although the alias is probably expanded behind the scenes).

If you scroll up a bit in your link, there is a section with some examples of valid path expressions, which also happens to be right after the identifiers section.

With this in mind, I think a JSON path expression is a certain type of path expression, as parsing JSON requires a specific set of identifiers to get to a specific data element.

As for the "graph" terminology, perhaps BQ parses the query and accesses data using a graph methodology behind the scenes, I can't really say. I would guess "path expressions" probably makes more sense to the team working on BigQuery rather than users using BigQuery. I don't think there is any special syntax for you to "use" path expressions.

If you are writing a parser, maybe take some inspiration from this ZetaSQL parser, which has several references to path expressions.

rtenha
  • 3,349
  • 1
  • 6
  • 19
  • Thanks for the answer. I agree it would make sense for path expressions to mean the syntax of `ident1.ident2.ident3.etc`. That's also the syntax that a path_expression rule describes in ZetaSQL grammar. However, it provides no explanation for the syntax that BigQuery documentation describes, notably the use of `/`, `:`, `-` characters inside path expressions. – Rene Saarsoo Nov 28 '22 at 15:06
  • It's quite possible that's is how BQ operates internally, or perhaps it refers to planned features that never made it to production, or could be for backward compatibility, or for a variety of other reasons . `/`s could easily be something where you can query against a file/folder structure in GCP. `:`s were used in LegacySQL as noted. `-` (and `:` for that matter) could be used to get to an element in a YAML file perhaps. Whatever you are trying to do with this information, I wouldn't worry too much. Worry about the 99% use-case and come back to it if needed. – rtenha Nov 28 '22 at 20:17
  • 1
    I'll award you the bounty. Although it didn't directly answer my question, it provided pointers to resources I found helpful, and they guided me towards a possible answer and better understanding of the BigQuery ecosystem. – Rene Saarsoo Nov 30 '22 at 10:08
1

Looks this syntax comes from ZetaSQL parser, which includes the exact same documentation. BigQuery most likely uses ZetaSQL internally as its parser (ZetaSQL supports all of BigQuery syntax and they're both from Google).

According to ZetaSQL grammar a path expression beginning with / and containing : and - can be used for referencing tables in FROM clause. Looks like the / and : are simply part of identifier names, like the - is part of identifier names in BigQuery.

But the support for the : and / characters in ZetaSQL path expressions can be toggled on or off, and it seems that in BigQuery it's been toggled off. BigQuery doesn't allow : and / characters in table names - not even when they're quoted.

ZetaSQL also allows to toggle the support of - in identifier names, which BigQuery does allow.

My conclusion: it's a ZetaSQL parser feature, the documentation of which has been mistakenly copy-pasted to BigQuery documentation.

Thanks to rtenha for pointing out the ZetaSQL parser, of which I wasn't aware before.

Rene Saarsoo
  • 13,580
  • 8
  • 57
  • 85