I have a case where I need to peek ahead in the stream for the existence of a certain regular expression and then read data from the stream.

mark and reset allow me to do this but I am facing an issue where mark becomes invalid if the readAheadLimit goes beyond the size of the current buffer.

For example: I have a BufferedReader with buffer size of 1k.

Let's say I am at position 1000 (mark=1000) in the buffer and I need to check for the regex in the next 100 chars (readAheadLimit=100).

So while reading, the moment I cross the current buffer size (1024), a new buffer is allocated and the mark becomes invalid (not able to reset) and the data is streamed into the new buffer in a normal way.

I think this is the intended behavior but is there a way to get around this?

Appreciate your help.

regards

Anuj Kaushal

2 Answers

2

the moment I cross the current buffer size (1024), a new buffer is allocated

No it isn't. The existing buffer is cleared and readied for another use.

and the mark becomes invalid (not able to reset)

No it doesn't, unless you've gone beyond the read ahead limit.

You don't seem to have read the API. You call mark() with an argument that says how far ahead you want to read before calling reset(), in this case 100 characters, and the implementation is required to let you do exactly that. So when you have read up to 100 characters ahead, call reset(), and you are back where you were when you called mark(). How that happens internally isn't your problem, but it is certainly required to happen.

And how did you get a BufferedReader with a 1k buffer? The default is 8192 characters.
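A minimal sketch of the scenario from the question, for anyone who wants to check this themselves. It assumes a 1024-character buffer, a mark at position 1000 and a read-ahead limit of 100; the class name `MarkResetDemo`, the helper `peekMatches` and the sample data are made up for illustration. The peek reads past the end of the first buffer fill and then calls reset().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.regex.Pattern;

class MarkResetDemo {
    // Hypothetical helper: looks for the pattern in the next `limit` chars,
    // then rewinds so the caller sees the stream exactly as before.
    static boolean peekMatches(BufferedReader reader, Pattern pattern, int limit)
            throws IOException {
        reader.mark(limit);                 // allow up to `limit` chars of read-ahead
        char[] window = new char[limit];
        int total = 0;
        while (total < limit) {             // never read more than the limit
            int n = reader.read(window, total, limit - total);
            if (n < 0) break;               // end of stream
            total += n;
        }
        reader.reset();                     // back to the marked position
        return pattern.matcher(new String(window, 0, total)).find();
    }

    // Builds a string of n copies of c (sample data only).
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) throws IOException {
        // 1000 filler chars, then the text we want to peek at.
        String data = repeat('a', 1000) + repeat('b', 50) + "MATCH" + repeat('c', 200);
        BufferedReader reader = new BufferedReader(new StringReader(data), 1024);

        reader.skip(1000);                  // now 24 chars before the buffer boundary
        boolean found = peekMatches(reader, Pattern.compile("MATCH"), 100);
        char next = (char) reader.read();   // first char after the marked position

        System.out.println("found=" + found + " next=" + next);
    }
}
```

If reset() throws an IOException in your own code, the usual suspect is a read that went further than the limit passed to mark(), so keep the peek loop bounded by that limit as above.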

user207421
  • Thanks for your response. Well... I understand what the API states. But reset throws an IOException, and this was not due to the readAheadLimit. As to why the mark is invalidated, I am not sure if this is the intended behavior, but I think even the javadocs are not too sure of this: "Subsequent calls to reset() will **attempt** to reposition the stream to this point." By the way, BufferedReader(Reader in, int sz) allows you to set the buffer size, which defaults to 8k if I am not wrong. – Anuj Kaushal Feb 22 '13 at 07:37
  • @falloficarus Well, what I was getting at was why you reduced the buffer size so dramatically. There's no benefit. If you are getting an IOException you should have posted the stack trace in your question. First we've heard of it here, in a comment to an answer. – user207421 Feb 24 '13 at 21:00
1

There are at least two options:

  1. Set the buffer size to much more than 1k:

    new BufferedReader(originalReader, 1024 * 1024); // e.g. 1M chars

  2. Apply custom buffering that grows the buffer as soon as the limit is exceeded. If you are working with a huge amount of data, the custom buffering can store the data in a database or a file.
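A rough sketch of what option 2 could look like when kept in memory, assuming the peeks are small enough to fit on the heap. The class name `PeekableReader` and its methods are hypothetical, not part of the JDK; the look-ahead buffer is a StringBuilder that grows on demand, so there is no fixed limit to exceed. Spilling to a file or database is not shown.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical wrapper: keeps peeked-but-unconsumed chars in a growable buffer.
class PeekableReader {
    private final Reader in;
    private final StringBuilder lookahead = new StringBuilder();

    PeekableReader(Reader in) {
        this.in = in;
    }

    // Returns up to n upcoming chars without consuming them.
    String peek(int n) throws IOException {
        char[] chunk = new char[256];
        while (lookahead.length() < n) {
            int want = Math.min(chunk.length, n - lookahead.length());
            int got = in.read(chunk, 0, want);
            if (got < 0) break;             // end of stream: return what we have
            lookahead.append(chunk, 0, got);
        }
        return lookahead.substring(0, Math.min(n, lookahead.length()));
    }

    // Reads one char, draining the look-ahead buffer first; -1 at end of stream.
    int read() throws IOException {
        if (lookahead.length() > 0) {
            char c = lookahead.charAt(0);
            lookahead.deleteCharAt(0);      // O(n) shift; acceptable for a sketch
            return c;
        }
        return in.read();
    }

    public static void main(String[] args) throws IOException {
        PeekableReader reader = new PeekableReader(new StringReader("hello world"));
        System.out.println(reader.peek(5));        // look ahead without consuming
        System.out.println((char) reader.read());  // still starts at the first char
    }
}
```

The regex check then runs on the string returned by peek(), and normal reading continues through read(). For real workloads you would want a bulk read method and a cap on how far peek() may grow the buffer.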

Raman
  • 1. May not satisfy all our cases, since we stream large amounts of data and these peeks may be required frequently. 2. Sounds interesting. I'd appreciate it if you could elaborate. Are you suggesting that we read into a buffer and then resize it based on a load factor by copying the previous contents over into the new buffer? – Anuj Kaushal Feb 19 '13 at 06:56
  • Edited the 2nd option in the answer. Growing by a load factor can cause an OOM if your data is really big, so you will still need some limit if you do it in memory. – Raman Feb 19 '13 at 06:58
  • Is it possible to get the current position in the buffer - where the mark is set? I'd like to use this to determine if mark+readAheadLimit will spill over to the next buffer. – Anuj Kaushal Feb 19 '13 at 07:11
  • Hm, you can get this info via reflection. – Raman Feb 19 '13 at 07:12
  • @falloficarus You don't need it. The issue is already handled internally by BufferedInputStream/Reader. – user207421 Feb 21 '13 at 12:00