# agg_to_text.py, from a fork of carlini/yet-another-applied-llm-benchmark
from evaluator import *
question = """
Create a Rust CLI program that takes an asciinema dump via stdin and writes a plain-text dump of the terminal state at each frame to stdout. You must use the SAME frame selection algorithm that agg uses (its source code is attached later).
The output frame format should be as follows:
```
FRAME0_LINE0 (COL characters wide)
FRAME0_LINE1
...
FRAME0_LINEN (ROW characters tall)
----
FRAME1_LINE0
...
---- (trailing delimiter)
```
For example, suppose an asciinema dump shows a 2x2 window counting down from three. Then we output:
```
3
----
2
----
1
----
0
----
```
Use avt as your terminal emulator.
You will compile with the following Cargo.toml attached below; do not use any
other dependencies. After the Cargo.toml is the complete source code of agg,
which you should use as reference.
"""
extra_prompt = """
Use the same frame selection algorithm as agg:
- Batching events based on timing and FPS cap
- Accelerating playback based on speed parameter
- Limiting idle time between frames
Use the default settings for these parameters:
- Idle time limit (5.0 seconds)
- Playback speed (1.0)
- FPS cap (30)
"""
cargo = r"""
[package]
name = "agg_to_text"
version = "0.1.0"
edition = "2024"
[dependencies]
avt = "0.15.1"
anyhow = "1.0"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
"""
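# agg's source code, appended to the prompt as reference material.
context = read_file(__file__, 'agg_to_text.context.md').decode()
# Recorded asciinema session fed to the compiled program as input.
test_term = read_file(__file__, 'agg_to_text.test.term')
# Expected output: one text dump per selected frame, each followed by "----".
expect = read_file(__file__, 'agg_to_text.expect.txt').decode()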
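# Send the task, Cargo.toml and agg's source to the model; extract the longest
# code block from its reply, build and run it with Cargo on the test recording
# (up to max_iters=5 attempts), and require the output to equal the expected dump.
TestAggToText = (
    StringNode(question + cargo + context) >>
    MultiShotLLMRun(
        ExtractLongestCode() >> CargoRun(cargo, input=test_term),
        max_iters=5,
    ) >>
    EqualEvaluator(expect)
)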
if __name__ == "__main__":
    print(run_test(TestAggToText))
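The frame selection steps listed in `extra_prompt` and the output format described in `question` can be sketched in Python to make the expected behaviour concrete. This is a simplified illustration under stated assumptions, not a transcription of agg's Rust source, which remains the reference the benchmark supplies: events are assumed to be `(seconds, data)` pairs, the steps are assumed to run in the order idle-time limiting, then speed-up, then batching, and a batch is assumed to absorb every event arriving within `1 / fps_cap` seconds of its first event. All function names are hypothetical. `format_frames` takes already-rendered rows, because the real program obtains them from avt.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical event representation: (timestamp in seconds, output data).
Event = tuple[float, str]


def limit_idle_time(events: Iterable[Event], limit: float) -> Iterator[Event]:
    """Shrink any gap between consecutive events longer than `limit` down to `limit`."""
    prev_time = 0.0
    removed = 0.0  # total idle time cut so far
    for time, data in events:
        gap = time - prev_time
        if gap > limit:
            removed += gap - limit
        prev_time = time
        yield time - removed, data


def accelerate(events: Iterable[Event], speed: float) -> Iterator[Event]:
    """Scale timestamps so playback runs `speed` times faster."""
    for time, data in events:
        yield time / speed, data


def batch(events: Iterable[Event], fps_cap: int) -> Iterator[Event]:
    """Merge events arriving within one frame interval of the current frame's start (assumed rule)."""
    frame_interval = 1.0 / fps_cap
    start: Optional[float] = None
    buffered = ""
    for time, data in events:
        if start is not None and time - start < frame_interval:
            buffered += data  # same frame: append this event's output
        else:
            if start is not None:
                yield start, buffered  # emit the finished frame
            start, buffered = time, data
    if start is not None:
        yield start, buffered  # flush the last frame


def select_frames(
    events: Iterable[Event],
    idle_time_limit: float = 5.0,
    speed: float = 1.0,
    fps_cap: int = 30,
) -> list[Event]:
    """Chain the three steps using the default settings named in extra_prompt."""
    limited = limit_idle_time(events, idle_time_limit)
    return list(batch(accelerate(limited, speed), fps_cap))


def format_frames(frames: Iterable[list[str]]) -> str:
    """Join each frame's rows with newlines and follow every frame with '----'."""
    return "".join("\n".join(rows) + "\n----\n" for rows in frames)
```

With the defaults, `select_frames([(0.0, "a"), (0.01, "b"), (1.0, "c"), (20.0, "d")])` merges the first two events into one frame and shortens the 19-second pause to 5 seconds, giving `[(0.0, "ab"), (1.0, "c"), (6.0, "d")]`; `format_frames([["3"], ["2"]])` yields `"3\n----\n2\n----\n"`, mirroring the countdown example including the trailing delimiter.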