
Problem Definition

I am trying to integrate the data that exists in Confluent Schema Registry with Apache Atlas. I have found many links that say this integration is possible, but none of them give any technical detail on how it was actually done.

Question

Could anyone help me import the data (and metadata) from Schema Registry into Apache Atlas in real time? Is there a hook, event listener, or something similar I could use to implement this?

Example

Here is what I have from Schema Registry:

{
   "subject":"order-value",
   "version":1,
   "id":101,
   "schema":"{\"type\":\"record\",\"name\":\"cart_closed\",\"namespace\":\"com.akbar.avro\",\"fields\":[{\"name\":\"_g\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"_s\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"_u\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"application_version\",\"type\":[\"int\",\"null\"],\"default\":null},{\"name\":\"client_time\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"event_fingerprint\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"os\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"php_session_id\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"platform\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"server_time\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"site\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"user_agent\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"payment_method_id\",\"type\":[\"int\",\"null\"],\"default\":null},{\"name\":\"page_view\",\"type\":[\"boolean\",\"null\"],\"default\":null},{\"name\":\"items\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"record\",\"name\":\"item\",\"fields\":[{\"name\":\"brand_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"category_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"discount\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"order_item_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"price\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"product_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"quantity\",\"type\":[\"int\",\"null\"],\"default\":null},{\"name\":\"seller_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"variant_id\",\"type\":[\"long\",\"null\"],\"default\":null}]}}},{\"name\":\"cart_id\",\"type\":[\"long\",\"null\"],\"default\":null}]}"
}

How can I import this into Apache Atlas?
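
To make the goal concrete, this is the kind of glue code I have in mind (only a rough sketch: the avro_schema type name is my own assumption, a custom typedef I would have to create in Atlas first, since as far as I can tell Atlas does not ship a Schema Registry hook):

import requests

REGISTRY_URL = "http://schema-registry:8081"  # placeholder: my Schema Registry host
ATLAS_URL = "http://atlas-host:21000"         # placeholder: my Atlas host
ATLAS_AUTH = ("admin", "admin")               # default Atlas credentials

def fetch_latest(subject):
    # Read the latest registered version of a subject from the Schema Registry REST API
    resp = requests.get(f"{REGISTRY_URL}/subjects/{subject}/versions/latest")
    resp.raise_for_status()
    return resp.json()  # {"subject": ..., "version": ..., "id": ..., "schema": ...}

def push_to_atlas(entry):
    # Create an Atlas entity describing this schema version via the v2 REST API
    entity = {
        "entity": {
            "typeName": "avro_schema",  # hypothetical custom typedef, must be defined in Atlas first
            "attributes": {
                "qualifiedName": f"{entry['subject']}@v{entry['version']}",
                "name": entry["subject"],
                "versionId": entry["version"],
                "schemaRegistryId": entry["id"],
                "schemaText": entry["schema"],
            },
        }
    }
    resp = requests.post(f"{ATLAS_URL}/api/atlas/v2/entity", json=entity, auth=ATLAS_AUTH)
    resp.raise_for_status()

if __name__ == "__main__":
    push_to_atlas(fetch_latest("order-value"))

This only polls the registry rather than reacting to changes, which is exactly why I am asking whether a hook or event listener exists for the real-time part.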

What I have done

I checked the Schema Registry documentation, which shows the following architecture:

schema registry architecture

So I decided to point Atlas at the Kafka cluster, but I couldn't find where to set the Kafka configuration. I tried changing the atlas.kafka.bootstrap.servers property in atlas-application.properties. I have also tried calling import-kafka.sh from the hook-bin directory, but it wasn't successful.
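
Concretely, the Kafka-related part of my atlas-application.properties now looks roughly like this (the host name is a placeholder for the machine Kafka actually runs on):

# Kafka notification settings in atlas-application.properties
atlas.notification.embedded=false
atlas.kafka.zookeeper.connect=kafka-host:2181
atlas.kafka.bootstrap.servers=kafka-host:9092
atlas.kafka.hook.group.id=atlas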

Error log

2021-04-25 15:48:34,162 ERROR - [main:] ~ Thread Thread[main,5,main] died (NIOServerCnxnFactory$1:92)
org.apache.atlas.exception.AtlasBaseException: EmbeddedServer.Start: failed!
    at org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:115)
    at org.apache.atlas.Atlas.main(Atlas.java:133)
Caused by: java.lang.NullPointerException
    at org.apache.atlas.util.BeanUtil.getBean(BeanUtil.java:36)
    at org.apache.atlas.web.service.EmbeddedServer.auditServerStatus(EmbeddedServer.java:128)
    at org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:111)
    ... 1 more

Mostafa Ghadimi
  • Unclear what about the registry you are trying to get. What links are you referring to? The Atlas integration is for Kafka topic metadata, not the internal details of data within the topic. I think LinkedIn Datahub offers better support for the schemas themselves – OneCricketeer May 01 '21 at 13:17
  • @OneCricketeer The only thing I want is to integrate Kafka and Apache Atlas using the hook, as described in the [documentation](https://atlas.apache.org/#/HookKafka). The problem is that my Kafka runs on another server, and I couldn't find anything that describes how to do this! How can I pass the Kafka broker's IP to Atlas? – Mostafa Ghadimi May 02 '21 at 00:30
  • It's `atlas.kafka.bootstrap.servers` like you've stated in the question... Atlas has no integration with Schema Registry. It's unclear what address you've currently configured – OneCricketeer May 02 '21 at 00:35
  • @OneCricketeer Yeah, you are right, but whenever I try to run it, I hit the same problem shown in the error log! For more information: Schema Registry has Kafka at its core (as in the picture above), and I want to integrate with that. – Mostafa Ghadimi May 02 '21 at 00:37
  • No, Schema Registry is not a broker and doesn't embed one "in its core". More specifically, you need to use the same bootstrap address between both Atlas and the Registry – OneCricketeer May 02 '21 at 00:53
  • @OneCricketeer Would you please give me more details? I have Kafka on a different machine than Atlas. How can I integrate them using the hook? I simply passed the IP of the machine that Kafka is running on to Atlas, but nothing happened except an error. – Mostafa Ghadimi May 02 '21 at 00:56
  • I honestly don't know what your error means, but I suggest reading the Kafka documentation or some blogs mentioning the `listeners` and `advertised.listeners` properties to solve any network issues – OneCricketeer May 02 '21 at 01:00
  • @OneCricketeer Unfortunately its community is small. Thanks for your help! – Mostafa Ghadimi May 02 '21 at 01:01
  • You could also install Kafka clients directly on the Atlas machine and use CLI tools to see if you can consume or describe topics like Atlas would do itself – OneCricketeer May 02 '21 at 01:02
