Commit 6e1e56c

Update of readme and tests.
1 parent ce063b4 commit 6e1e56c

3 files changed: 26 additions, 7 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -1,14 +1,14 @@
 # StreamingSampling.jl
 
-StreamingSampling is a Julia-based proof-of-concept implementation of a streamed variant of maximum-entropy sampling ([UPmaxentropy](https://www.rdocumentation.org/packages/sampling/versions/2.11/topics/UPmaxentropy)). It is designed to process large datasets stored on disk with minimal impact on RAM. The method begins by computing first-order inclusion probabilities using a [DPP](https://dahtah.github.io/Determinantal.jl/dev/)-based heuristic, and then feeds these probabilities into the classical UPmaxentropy algorithm to produce diverse samples.
+StreamingSampling is a Julia-based proof-of-concept implementation of streamed variants of maximum-entropy sampling ([UPmaxentropy](https://www.rdocumentation.org/packages/sampling/versions/2.11/topics/UPmaxentropy)) and weighted sampling. It is designed to process large datasets stored on disk with minimal impact on RAM. The method begins by computing first-order inclusion probabilities using a [DPP](https://dahtah.github.io/Determinantal.jl/dev/)-based heuristic, and then feeds these probabilities into classical sampling algorithms to produce diverse samples.
 
 <a href="https://julia.mit.edu/StreamingSampling.jl/dev/">
 <img alt="Development documentation" src="https://img.shields.io/badge/documentation-in%20development-orange?style=flat-square">
 </a>
 <a href="https://mit-license.org">
 <img alt="MIT license" src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square">
 </a>
-<a href="https://github.com/cesmix-mit/PotentialLearning.jl/issues/new">
+<a href="https://github.com/JuliaLabs/StreamingSampling.jl/issues/new">
 <img alt="Ask us anything" src="https://img.shields.io/badge/Ask%20us-anything-1abc9c.svg?style=flat-square">
 </a>
 </a>

docs/src/index.md

Lines changed: 13 additions & 2 deletions
@@ -4,7 +4,18 @@ CurrentModule = StreamingSampling
 
 # StreamingSampling.jl
 
-Documentation for [StreamingSampling](https://github.com/emmanuellujan/StreamingSampling.jl).
+StreamingSampling is a Julia-based proof-of-concept implementation of streamed variants of maximum-entropy sampling ([UPmaxentropy](https://www.rdocumentation.org/packages/sampling/versions/2.11/topics/UPmaxentropy)) and weighted sampling. It is designed to process large datasets stored on disk with minimal impact on RAM. The method begins by computing first-order inclusion probabilities using a [DPP](https://dahtah.github.io/Determinantal.jl/dev/)-based heuristic, and then feeds these probabilities into classical sampling algorithms to produce diverse samples.
 
-StreamingSampling is a Julia-based proof-of-concept implementation of a streamed variant of maximum-entropy sampling ([UPmaxentropy](https://www.rdocumentation.org/packages/sampling/versions/2.11/topics/UPmaxentropy)). It is designed to process large datasets stored on disk with minimal impact on RAM. The method begins by computing first-order inclusion probabilities using a [DPP](https://dahtah.github.io/Determinantal.jl/dev/)-based heuristic, and then feeds these probabilities into the classical UPmaxentropy algorithm to produce diverse samples.
+<a href="https://julia.mit.edu/StreamingSampling.jl/dev/">
+<img alt="Development documentation" src="https://img.shields.io/badge/documentation-in%20development-orange?style=flat-square">
+</a>
+<a href="https://mit-license.org">
+<img alt="MIT license" src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square">
+</a>
+<a href="https://github.com/JuliaLabs/StreamingSampling.jl/issues/new">
+<img alt="Ask us anything" src="https://img.shields.io/badge/Ask%20us-anything-1abc9c.svg?style=flat-square">
+</a>
+</a>
+<br />
+<br />
 

test/runtests.jl

Lines changed: 11 additions & 3 deletions
@@ -12,13 +12,18 @@ using Statistics
                   "data/data3.txt",
                   "data/data4.txt"]
     # Compute streaming weights
-    ws = compute_weights(file_paths; chunksize=200, subchunksize=50)
+    ws = compute_weights(file_paths;
+                         chunksize=200,
+                         subchunksize=50)
 
     # Define sample size
     n = 1781
 
     # Sample by weighted sampling
-    inds = StatsBase.sample(1:length(ws), Weights(ws), n; replace=false)
+    inds = StatsBase.sample(1:length(ws),
+                            Weights(ws),
+                            n;
+                            replace=false)
 
     # Checks
     println("Checking sample size.")
@@ -32,7 +37,10 @@ end
                   "data/data3.txt",
                   "data/data4.txt"]
     # Compute streaming weights
-    ws = compute_weights(file_paths; chunksize=500, subchunksize=100)
+    ws = compute_weights(file_paths;
+                         chunksize=500,
+                         subchunksize=100,
+                         normalize=false)
 
     # Define sample size
     n = 2351
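
For readers who want to try the weighted-sampling step from the tests in isolation, here is a minimal sketch. It assumes only that StatsBase is installed. The weights are random stand-ins, not output of `compute_weights`, and the sizes `N` and `n` are arbitrary illustrative values.

```julia
using Random
using StatsBase

# Stand-in weights (assumption): in the tests these come from compute_weights.
Random.seed!(1234)
N = 5_000
ws = rand(N)

# Sample size (arbitrary for this sketch).
n = 500

# Weighted sampling without replacement, same call shape as in the tests.
inds = StatsBase.sample(1:length(ws), Weights(ws), n; replace=false)

# Basic checks in the spirit of the test file: size, uniqueness, index range.
@assert length(inds) == n
@assert allunique(inds)
@assert all(i -> 1 <= i <= length(ws), inds)
```

`Weights` does not require the weights to sum to one, since selection probabilities depend on each weight relative to the total. This is why unnormalized weights can be passed to the same call.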
