Is there a way, using the SML Basis library, to open a file at a specific position? That is, use an operating system call to change the position, rather than scan through the file and throw away the data.

sshine

2 Answers

This is tricky. Unfortunately, seeking isn't directly supported. Moreover, file positions are only transparent for binary files, i.e., those that you have opened with the BinIO structure [1]. For this structure, the corresponding type BinIO.StreamIO.pos is defined to be Position.int, which is some integer type.

However, in an SML system that supports the complete I/O stack from the standard you should be able to synthesise the following seek function using the lower I/O layers:

(* seekIn : BinIO.instream * Position.int -> unit *)

fun seekIn(instream, pos) =
    case BinIO.StreamIO.getReader(BinIO.getInstream instream) of
      (reader as BinPrimIO.RD{setPos = SOME f, ...}, _) =>
        ( f pos;
          BinIO.setInstream(instream,
            BinIO.StreamIO.mkInstream(reader, Word8Vector.fromList[]))
        )
    | (BinPrimIO.RD{name, ...}, _) =>
        raise IO.Io{
          name = name,
          function = "seekIn",
          cause = IO.RandomAccessNotSupported
        }

Use it like:

val file = BinIO.openIn "filename"
val _    = seekIn(file, 200)
val bin  = BinIO.inputN(file, 1000)

If you need to convert from Word8Vector to string:

val s = Byte.bytesToString bin

You can do the equivalent for out streams as well.
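
As a rough sketch of that output-side counterpart, assuming the implementation provides the full Basis I/O stack and that the underlying writer supports setPos: the name seekOut is hypothetical (it is not part of the Basis library), and the function simply mirrors seekIn using getWriter and mkOutstream.

```sml
(* seekOut : BinIO.outstream * Position.int -> unit *)
(* Hypothetical helper, not part of the Basis library. *)
fun seekOut (outstream, pos) =
    (* getWriter flushes the stream and hands back the writer
       together with the current buffering mode. *)
    case BinIO.StreamIO.getWriter (BinIO.getOutstream outstream) of
      (writer as BinPrimIO.WR{setPos = SOME f, ...}, mode) =>
        ( f pos;
          (* Rebuild a fresh stream on top of the repositioned writer. *)
          BinIO.setOutstream (outstream,
            BinIO.StreamIO.mkOutstream (writer, mode))
        )
    | (BinPrimIO.WR{name, ...}, _) =>
        raise IO.Io{
          name = name,
          function = "seekOut",
          cause = IO.RandomAccessNotSupported
        }
```

Because getWriter flushes pending output before the position changes, data written before the call should end up at its original offset.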

[1] http://standardml.org/Basis/bin-io.html#BIN_IO:SIG:SPEC

Andreas Rossberg
    Nice answer! But I just wondered... If I am going to read a file essentially in a random fashion (seeking relatively long distances and then reading small chunks), is traversing the whole IO stack (imperative/stream/primitive IO) back and forth really worth it, or perhaps it would be a better idea to simply use the `BinPrimIO.reader` directly? – isekaijin May 12 '13 at 01:55
    @EduardoLeón, I don't see any particular advantage in using the low-level interface directly. The high-level one is both more convenient and more efficient (buffering and all). – Andreas Rossberg May 12 '13 at 07:15

If you can get hold of the underlying reader or writer, it may provide getPos, setPos and endPos functions. These fields are optional, so whether they are available depends on the kind of reader or writer you are dealing with.
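
For illustration, here is a minimal sketch of using endPos to ask for the end position of a binary input stream, assuming the implementation provides the full Basis I/O stack. The name sizeIn is hypothetical, and the result is NONE whenever the reader does not offer endPos.

```sml
(* sizeIn : BinIO.instream -> Position.int option *)
(* Hypothetical helper, not part of the Basis library. *)
fun sizeIn instream =
    let
      (* getReader terminates the old stream and returns any data
         that was already buffered but not yet consumed. *)
      val (reader as BinPrimIO.RD{endPos, ...}, buffered) =
            BinIO.StreamIO.getReader (BinIO.getInstream instream)
      (* Reinstall a stream over the same reader, keeping the
         buffered data so that later reads continue where they were. *)
      val _ = BinIO.setInstream (instream,
                BinIO.StreamIO.mkInstream (reader, buffered))
    in
      Option.map (fn f => f ()) endPos
    end
```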

Jesper.Reenberg