Faster methods for random sampling

Communications of the ACM, Volume 27, Issue 7, Pages 703-718, 1984
Jeffrey Scott Vitter
Brown University, Providence, RI

Abstract

Several new methods are presented for selecting n records at random without replacement from a file containing N records. Each algorithm selects the records for the sample in a sequential manner, in the same order the records appear in the file. The algorithms are online in that the records for the sample are selected iteratively with no preprocessing. The algorithms require a constant amount of space and are short and easy to implement. The main result of this paper is the design and analysis of Algorithm D, which does the sampling in O(n) time on average; roughly n uniform random variates are generated, and approximately n exponentiation operations (of the form a^b, for real numbers a and b) are performed during the sampling. This solves an open problem in the literature. CPU timings on a large mainframe computer indicate that Algorithm D is significantly faster than the sampling algorithms in use today.
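
To make the problem setting concrete, the following is a minimal Python sketch of sequential sampling without replacement. It is not Algorithm D: it is the simpler selection-sampling baseline, which examines every record and therefore takes O(N) time rather than O(n). The function name `sequential_sample` and the `rng` parameter are assumptions made for illustration only.

```python
import random


def sequential_sample(N, n, rng=random):
    """Yield n distinct indices from range(N), in increasing order.

    Illustrative baseline (selection sampling), NOT Algorithm D:
    each record is examined once, so the running time is O(N).
    Only a constant amount of extra space is used.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    remaining = n  # records still to be selected
    for i in range(N):
        if remaining == 0:
            return
        # N - i records are left (including record i); select record i
        # with probability remaining / (N - i).
        if rng.random() * (N - i) < remaining:
            yield i
            remaining -= 1


# Example: pick 5 record positions out of a file of 1000 records.
positions = list(sequential_sample(1000, 5))
```

The positions come out already sorted, matching the abstract's requirement that records be selected in the order they appear in the file. The abstract's claim for Algorithm D is that comparable output can be produced while generating only about n random variates instead of one per record.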
