
I tried to create an ExternalCatalog to use with the Apache Flink Table API. I created it and registered it with the Flink table environment, following the official documentation. For some reason, the only external table in the catalog is not found during the scan. What did I miss in the code below?

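  import org.apache.flink.table.api.{Table, Types}
  import org.apache.flink.table.api.scala.BatchTableEnvironment
  import org.apache.flink.table.catalog.{ExternalCatalog, ExternalCatalogTable, InMemoryExternalCatalog}
  import org.apache.flink.table.descriptors.{Csv, FileSystem, Metadata, Statistics}

  val catalogName = s"externalCatalog$fileNumber"
  val ec: ExternalCatalog = getExternalCatalog(catalogName, 1, tableEnv)
  tableEnv.registerExternalCatalog(catalogName, ec)
  // fails: the table "S_EXT" is not found
  val s1: Table = tableEnv.scan("S_EXT")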

  def getExternalCatalog(catalogName: String, fileNumber: Int, tableEnv: BatchTableEnvironment): ExternalCatalog = {
    val cat = new InMemoryExternalCatalog(catalogName)
    // external Catalog table
    val externalCatalogTableS = getExternalCatalogTable("S")
    // add external Catalog table
    cat.createTable("S_EXT", externalCatalogTableS, ignoreIfExists = false)
    cat
  }

  private def getExternalCatalogTable(fileName: String): ExternalCatalogTable = {
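    // connector descriptor
    // (`fileNumber` and `getFilePath` are not shown here; presumably members of the enclosing test class)
    val connectorDescriptor = new FileSystem()
    connectorDescriptor.path(getFilePath(fileNumber, fileName))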
    // format
    val fd = new Csv()
    fd.field("X", Types.STRING)
    fd.field("Y", Types.STRING)
    fd.fieldDelimiter(",")
    // statistic
    val statistics = new Statistics()
    statistics.rowCount(0)
    // metadata
    val md = new Metadata()
    ExternalCatalogTable.builder(connectorDescriptor)
      .withFormat(fd)
      .withStatistics(statistics)
      .withMetadata(md)
      .asTableSource()
  }

The example above is part of this test file in git.

  • The reason for creating an ExternalCatalog is to provide Statistics for each table present in the 'Catalog'. This reason was explained in https://stackoverflow.com/questions/54101174/why-does-flink-sql-use-a-cardinality-estimate-of-100-rows-for-all-tables. – Salvatore Rapisarda Jan 21 '19 at 08:55

1 Answer


This is probably a namespace issue. Tables in external catalogs are identified by a path of names: the catalog name, (potentially) one or more schema names, and finally the table name.

In your example (with `fileNumber` = 1, so the catalog is registered as `externalCatalog1`), the following should work:

  val s1: Table = tableEnv.scan("externalCatalog1", "S_EXT")
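To illustrate the namespace point, here is a small, self-contained Scala model of path-based lookup. It is not Flink's code: `Node`, `TableNode`, `CatalogNode` and `PathResolution.resolve` are hypothetical names. The root namespace holds the registered catalog, so a bare `"S_EXT"` is not found there, while the path `("externalCatalog1", "S_EXT")` walks into the catalog and reaches the table.

```scala
// Hypothetical model of name-path lookup; not Flink's implementation.
sealed trait Node
final case class TableNode(name: String) extends Node
final case class CatalogNode(children: Map[String, Node]) extends Node

object PathResolution {
  // Walk the path one name at a time, starting at the root namespace.
  // The lookup succeeds only if the last name lands on a table.
  def resolve(root: CatalogNode, path: String*): Option[TableNode] = {
    def go(node: Node, rest: List[String]): Option[TableNode] = (node, rest) match {
      case (t: TableNode, Nil) => Some(t)
      case (CatalogNode(children), head :: tail) =>
        children.get(head).flatMap(child => go(child, tail))
      case _ => None // path too short (ends at a catalog) or too long
    }
    go(root, path.toList)
  }

  // Root namespace with one registered catalog that contains S_EXT.
  val example: CatalogNode = CatalogNode(Map(
    "externalCatalog1" -> CatalogNode(Map("S_EXT" -> TableNode("S_EXT")))
  ))

  def main(args: Array[String]): Unit = {
    println(resolve(example, "S_EXT"))                     // None
    println(resolve(example, "externalCatalog1", "S_EXT")) // Some(TableNode(S_EXT))
  }
}
```

A catalog with schemas would simply add one more `CatalogNode` level, which is why the path can contain schema names between the catalog and the table.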

You can have a look at the ExternalCatalogTest to see how external catalogs can be used.

Fabian Hueske
  • Yes, it works, but only if I don't include the statistics in the table, and the statistics are necessary for https://stackoverflow.com/questions/54101174/why-does-flink-sql-use-a-cardinality-estimate-of-100-rows-for-all-tables. Is it possible to use a `Statistics` object in an external catalog table? This is part of the error returned: `Could not find a suitable table factory for 'org.apache.flink.table.factories.BatchTableSourceFactory' in the classpath. Reason: The matching factory 'org.apache.flink.table.sources.CsvBatchTableSourceFactory' doesn't support 'statistics.row-count'.` ... – Salvatore Rapisarda Jan 23 '19 at 21:35
    This seems to be an issue with the specific `CsvBatchTableSourceFactory`. You might want to fork and extend it to support statistics. – Fabian Hueske Jan 24 '19 at 09:19
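The error above comes from factory matching: the catalog table's descriptors are flattened into string properties such as `statistics.row-count`, and the CSV factory rejects a key it does not support. The Scala sketch below is a simplified, hypothetical model of that check, not Flink's `TableFactoryService`; the supported-key lists, the `*` prefix-wildcard rule and the property values (including the `/path/to/S.csv` placeholder) are illustrative assumptions. It shows why a forked factory that additionally accepts `statistics.*` would no longer reject the table.

```scala
// Hypothetical model of matching descriptor properties against a factory's
// supported keys; a simplification, not Flink's TableFactoryService.
object FactoryMatching {
  // Returns the (sorted) property keys the factory does not accept.
  // A supported entry ending in "*" accepts every key with that prefix.
  def unsupportedKeys(properties: Map[String, String], supported: Seq[String]): List[String] = {
    val (wildcards, exact) = supported.partition(_.endsWith("*"))
    val prefixes = wildcards.map(_.dropRight(1))
    properties.keys.toList.sorted.filterNot { key =>
      exact.contains(key) || prefixes.exists(p => key.startsWith(p))
    }
  }

  // Illustrative key lists: a stock CSV factory, and a fork that also accepts statistics.
  val csvSupported: Seq[String] =
    Seq("connector.path", "format.fields.*", "format.field-delimiter")
  val csvWithStats: Seq[String] = csvSupported :+ "statistics.*"

  // Illustrative flattened properties for the question's table S.
  val tableProps: Map[String, String] = Map(
    "connector.path" -> "/path/to/S.csv",
    "format.fields.0.name" -> "X",
    "format.field-delimiter" -> ",",
    "statistics.row-count" -> "0"
  )

  def main(args: Array[String]): Unit = {
    println(unsupportedKeys(tableProps, csvSupported)) // List(statistics.row-count)
    println(unsupportedKeys(tableProps, csvWithStats)) // List()
  }
}
```

Using a wildcard entry keeps the fork open to any `statistics.` key without listing each statistic individually.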