4

How can I efficiently generate a list of a million random elements in Scheme? The following code hits the maximum recursion depth even at 100,000 elements.

(unfold (lambda(x)(= x 1000000)) (lambda(x)(random 1000)) (lambda(x)(+ x 1)) 0)
Fakrudeen

6 Answers

6

It really depends on the system you're using, but here's a common way to do that in plain Scheme:

(let loop ([n 1000000] [r '()])
  (if (zero? n)
    r
    (loop (- n 1) (cons (random 1000) r))))

One note about running this code as is: if you just type it into a REPL, it will lead to printing the resulting list, and that will usually involve using much more memory than the list holds. So it's better to do something like

(define l ...same...)

There are many other tools that can be used, with varying degrees of convenience. unfold is one of them; another is the for loops found in PLT Scheme:

(for/list ([i (in-range 1000000)]) (random 1000))
Eli Barzilay
4

I don't know much Scheme, but couldn't you just use tail recursion (which is really just looping) instead of unfold (or any other higher-order function)?

PeterM
2

Taking Chicken Scheme as the implementation, here is an attempt with some timings.

(use srfi-1)
(use extras)

(time (unfold (lambda(x)(= x 1000000))
              (lambda(x)(random 1000))
              (lambda(x)(+ x 1)) 0))

(time (let loop ([n 1000000] [r '()])
        (if (zero? n)
            r
            (loop (- n 1) (cons (random 1000) r)))))

(define (range min max body)
  (let loop ((current min) (ret '()))
    (if (= current max)
      ret
      (loop (+ current 1) (cons (body current ret) ret)))))

(time (range 0 1000000 (lambda params (random 1000))))

Here are the results, compiled with csc -O3 t.scm:

0.331s CPU time, 0.17s GC time (major), 12/660 GCs (major/minor)
0.107s CPU time, 0.02s GC time (major), 1/290 GCs (major/minor)
0.124s CPU time, 0.022s GC time (major), 1/320 GCs (major/minor)

As you can see, the asker's unfold version is much slower than plain tail-recursive calls. It's hard to say exactly why, but I'd guess it is because it spends a lot more time doing function calls.

The other two versions perform similarly. My version is almost the same as the named let, except that it is wrapped in a higher-order function that can be reused.

Unlike the plain loop, it can be reused to build a list from any function over a range. The current position and the list built so far are passed to the function in case they are needed.

The higher-order version is probably the best way to go, even if it takes a bit more time to execute, probably again because of the function calls. It could be optimized by removing the parameters, which should make it almost as fast as the named let.

The advantage of the higher-order version is that the user doesn't have to write the loop, and it can be used with any lambda.
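
The parameter removal mentioned above could look like the following sketch, where the body is a thunk that takes no arguments. The name repeat-collect is hypothetical, and random is assumed to be provided by the implementation and to return an integer in [0, n), as in the other examples here.

```scheme
;; Hypothetical helper: call thunk count times and collect the results.
;; The loop is tail recursive, so the stack does not grow with count.
;; Results end up in reverse call order, which does not matter for random data.
(define (repeat-collect count thunk)
  (let loop ((remaining count) (acc '()))
    (if (zero? remaining)
        acc
        (loop (- remaining 1) (cons (thunk) acc)))))

;; Assumption: (random n) returns an integer in [0, n).
(define numbers (repeat-collect 1000000 (lambda () (random 1000))))
```

Whether this actually closes the gap to the named let would have to be measured on the implementation in question.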

Edit

Looking at this specific case: if we are to create a million elements ranging between 0 and 999, we could create a fixed-length vector of a million elements holding the values 0 to 999, and then shuffle it. The whole random process would then depend on the shuffle function, which should not have to allocate new memory; swapping values might be faster than generating random numbers. That said, the shuffle itself still relies on random.
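
A minimal sketch of that idea, using an in-place Fisher-Yates shuffle. The names make-cyclic-vector and vector-shuffle! are hypothetical, and random is again assumed to return an integer in [0, n). Note that the result is a permutation in which every value appears equally often, which is not the same distribution as a million independent draws, and that the shuffle still calls random about once per element.

```scheme
;; Hypothetical helper: a vector of n elements holding 0..(k-1) cyclically.
(define (make-cyclic-vector n k)
  (let ((v (make-vector n 0)))
    (let loop ((i 0))
      (if (< i n)
          (begin
            (vector-set! v i (modulo i k))
            (loop (+ i 1)))
          v))))

;; In-place Fisher-Yates shuffle: walk from the last index down to 1 and
;; swap each element with one at a random index j, 0 <= j <= i.
;; Assumption: (random n) returns an integer in [0, n).
(define (vector-shuffle! v)
  (let loop ((i (- (vector-length v) 1)))
    (if (> i 0)
        (let* ((j (random (+ i 1)))
               (tmp (vector-ref v i)))
          (vector-set! v i (vector-ref v j))
          (vector-set! v j tmp)
          (loop (- i 1)))
        v)))

(define shuffled (vector-shuffle! (make-cyclic-vector 1000000 1000)))
```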

Edit 2

Unless you really need a list, you could get away with a vector instead.

Here is my second implementation with vector-map

(time (vector-map (lambda (x y) (random 1000)) (make-vector 1000000)))
# 0.07s CPU time, 0/262 GCs (major/minor)

As you can see, it is considerably faster than building a list.

Edit 3 (for fun)

(define-syntax bigint
  (er-macro-transformer
    (lambda (exp rename compare)
      (let ((lst (map (lambda (x) (random 1000)) (make-list (cadr exp)))))
        (cons 'list lst)))))

100000
0.004s CPU time, 0/8888 GCs (major/minor)

It's probably not a good idea to use this, but I felt it might be interesting. Since it's a macro, it gets executed at compile time. The compile time will be huge but, as you can see, the run-time speed improvement is also huge. Unfortunately, using Chicken, I couldn't get it to build a list of a million. My guess is that the type it uses to build the list is overflowing and accessing invalid memory.

To answer the question in the comments:

I'm not a Scheme professional. I'm pretty new to it too and, as I understand it, the named loop or the higher-order function should be the way to go. The higher-order function is good because it's reusable. You could define a

(define (make-random-list quantity maxran)
 ...)

That's the other interesting part, since Scheme is all about higher-order functions: you could then replace the implementation of make-random-list with anything you like. If you need compile-time execution, define a macro; otherwise use a function. All that really matters is being able to reuse it. It has to be fast and use little memory.

Common sense tells you that executing less will be faster, and tail-recursive calls aren't supposed to consume extra memory. And when you're not sure, you can hide the implementation in a function that can be optimized later.

Loïc Faure-Lacroix
  • +1 - Thanks - I want to know exactly what professional scheme developers use to generate a random number list. [I am a professional dev but not in scheme.] – Fakrudeen Nov 08 '13 at 07:42
2

Use the do-loop-construct as described here.
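
As a rough sketch of what that could look like (the name random-list/do is hypothetical, and random is assumed to return an integer in [0, n) as in the question):

```scheme
;; Build a list of count random integers in [0, limit) with a do loop.
;; Both loop variables are stepped together on each iteration; the loop
;; stops when i reaches count and returns the accumulated list.
;; Assumption: (random n) is provided by the implementation.
(define (random-list/do count limit)
  (do ((i 0 (+ i 1))
       (acc '() (cons (random limit) acc)))
      ((= i count) acc)))

(define numbers/do (random-list/do 1000000 1000))
```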

amit kumar
  • Thanks! Please see my comment to Pessimist. – Fakrudeen Mar 13 '10 at 10:51
  • It would work. Not sure about "most efficient" but will be reasonably efficient. I mean what is simpler than a loop? – amit kumar Mar 13 '10 at 13:33
  • Function calls: You don't need to add loops to your language if you have `apply`. Scheme's `do` is added as a concession to usability but I believe is still defined in terms of tail-recursive calls. So, loops: efficient but not idiomatic. – Nathan Shively-Sanders Mar 13 '10 at 14:20
2

Someone correct me if I'm wrong, but Fakrudeen's code should have its recursion optimized away, since it is tail recursive. Or at least it should with a proper implementation of unfold. It should never reach a maximum recursion depth.

Which version of Scheme are you using, Fakrudeen? DrScheme does not choke on a mere million random numbers.

Davorak
1

MIT Scheme limits a computation's stack. Given the size of your problem, you are likely running out of stack space. Fortunately, you can provide a command-line option to change the stack size. Try:

$ mit-scheme --stack <number-of-1024-word-blocks>

There are other command-line options; check out mit-scheme --help.

Note that MIT Scheme, in my experience, is one of the few Schemes that has a limited stack size. This explains why trying your code in other Schemes will often succeed.

As to your question of efficiency: the routine unfold is probably not implemented with a tail-recursive/iterative algorithm. Here is a tail-recursive version, together with a tail-recursive in-place list reverse:

(define (unfold stop value incr n0)
  (let collecting ((n n0) (l '()))
    (if (stop n)
        (reverse! l)
        (collecting (incr n) (cons (value n) l)))))

(define (reverse! list)
  (let reving ((list list) (rslt '()))
    (if (null? list)
        rslt
        (let ((rest (cdr list)))
          (set-cdr! list rslt)
          (reving rest list)))))
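
For contrast, a definition of the following shape is not tail recursive, because each cons has to wait for the recursive call to return, so the stack grows with the length of the list. This is only a sketch of the general pattern, not a copy of any particular library's code; the name unfold/recursive is hypothetical.

```scheme
;; Hypothetical non-tail-recursive unfold: the recursive call is an
;; argument to cons, so one stack frame is kept per generated element.
(define (unfold/recursive stop value incr seed)
  (if (stop seed)
      '()
      (cons (value seed)
            (unfold/recursive stop value incr (incr seed)))))
```

With a million elements, a definition like this needs a million pending frames, which is where a limited stack runs out.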

Note:

$ mit-scheme --version
MIT/GNU Scheme microcode 15.3
Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even
for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Image saved on Tuesday November 8, 2011 at 10:45:46 PM
  Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/x86-64 4.118 || Edwin 3.116

Moriturus te saluto.
GoZoner