0

Receiving errors from the Blob extractor that files are too large for the current tier, which is basic. I will be upgrading to a higher tier, but I notice that the max size is currently 256MB.

When I have PPTX files that are mostly video and audio, but have text I'm interested in, is there a way to index those? What does the blob extractor max file size actually mean?

Can I tell the extractor to only take the first X MB or chars and just stop?

Paul Allen
  • 349
  • 1
  • 9

1 Answers1

2

There are two related limits in the blob indexer:

  1. Max file size limit that you are hitting. If file size exceeds that limit, indexer doesn't attempt to download it and produces an error to make sure you are aware of the issue. The reason we don't just take first N bytes is because for parsing many formats correctly, the entire file is needed. You can mark blobs as skipable or configure indexer to ignore a number of errors if you want it to make forward progress when encountering blobs that are too large.

  2. The max size of extracted text. In case file contains more text than that, indexer takes N characters up to the limit and includes a warning so you can be aware of the issue. Content that doesn't get extracted (such as video, at least today) doesn't contribute to this limit, of course.

How large are the PPTX you need indexed? I'll add my contact info in a comment.

Eugene Shvets
  • 4,561
  • 13
  • 19