We can create segments outside of the Apache Pinot cluster, which is good. But I don't understand how segments get from deep storage to the offline servers. For example:

I write a segment to HDFS, then I run the segment tar push job. If I understand correctly, the segment tar push job downloads the created segment from deep storage (S3), uploads it to the controller via the REST API, and the controller then sends the segment to the offline servers.

Won't this process create a bottleneck when sending segments to the controller? What happens if the offline servers download segments directly from deep storage instead?

lifeisshort
sparkless

2 Answers

There are two ways to push data to the Pinot controller:

  1. URI based: In this mode, the caller provides only the segment URI and the segment metadata. If the segment metadata is not provided, the controller fetches the segment and extracts the metadata itself. The controller needs the metadata for validation purposes. In this mode, servers pull the segments directly from the deep store.

  2. Payload based: In this mode, the caller sends the segment tarball as the payload, and the controller stores the tarball in its dataDir (this can be NFS or, again, the deep store). The controller updates the segment metadata in ZooKeeper to indicate the location of the segment. Servers use the location in the metadata to fetch the segments.
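To make the difference concrete, here is a toy Python model of the two modes. It is only a sketch, not Pinot code: the class names (`DeepStore`, `Controller`, `Server`), the method names, the `controller-data-dir/` prefix, and the example URI are all made up for illustration, and it assumes the controller's dataDir lives on the same store the servers read from. It counts how many tarball bytes pass through the controller in each mode.

```python
from dataclasses import dataclass, field


@dataclass
class DeepStore:
    """Hypothetical stand-in for HDFS/S3/NFS: maps a URI to tarball bytes."""
    blobs: dict = field(default_factory=dict)

    def put(self, uri, data):
        self.blobs[uri] = data

    def get(self, uri):
        return self.blobs[uri]


@dataclass
class Controller:
    """Toy controller; `metadata` plays the role of the ZooKeeper records."""
    deep_store: DeepStore
    data_dir_prefix: str = "controller-data-dir/"  # assumed location of dataDir
    metadata: dict = field(default_factory=dict)   # segment name -> download URI
    bytes_received: int = 0                        # tarball bytes seen by the controller

    def push_uri(self, name, uri, segment_metadata=None):
        # URI based: only the URI (and ideally the metadata) reaches the controller.
        if segment_metadata is None:
            # Without metadata, the controller must fetch the segment to extract it.
            self.bytes_received += len(self.deep_store.get(uri))
        self.metadata[name] = uri

    def push_payload(self, name, tarball):
        # Payload based: the whole tarball is sent to the controller,
        # which stores it in its dataDir and records that location.
        self.bytes_received += len(tarball)
        uri = self.data_dir_prefix + name
        self.deep_store.put(uri, tarball)
        self.metadata[name] = uri


@dataclass
class Server:
    """Toy offline server: looks up the location and downloads the segment itself."""
    deep_store: DeepStore
    segments: dict = field(default_factory=dict)

    def load(self, controller, name):
        uri = controller.metadata[name]
        self.segments[name] = self.deep_store.get(uri)


store = DeepStore()
tarball = b"x" * 1000
segment_uri = "hdfs://segments/seg_0.tar.gz"  # made-up URI
store.put(segment_uri, tarball)

uri_ctrl = Controller(store)
uri_ctrl.push_uri("seg_0", segment_uri, segment_metadata={"rows": 10})

payload_ctrl = Controller(store)
payload_ctrl.push_payload("seg_0", tarball)

server = Server(store)
server.load(uri_ctrl, "seg_0")

print(uri_ctrl.bytes_received, payload_ctrl.bytes_received)  # prints: 0 1000
```

In this model the server always fetches the tarball from the recorded location rather than receiving it from the controller; the modes differ only in whether the tarball has to travel through the controller on the way in.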

Kishore G
Here's a video explaining details of segment assignment in Apache Pinot offline tables: https://youtu.be/HycNRCzkrjg It demonstrates the steps that happen when a segment is uploaded to the deep store: controller notification -> segment assignment computation -> server downloading segments.

nehapawar
    A link to a solution is welcome, but please ensure your answer is useful without it: [add context around the link](//meta.stackexchange.com/a/8259) so your fellow users will have some idea what it is and why it’s there, then quote the most relevant part of the page you're linking to in case the target page is unavailable. [Answers that are little more than a link may be deleted.](//stackoverflow.com/help/deleted-answers) – Marco Bonelli Sep 17 '20 at 22:17