mirror of
https://github.com/xCyanGrizzly/DragonsStash.git
synced 2026-05-10 22:01:16 +00:00
add TG skill
@@ -0,0 +1,46 @@
{
  "skill_name": "tdlib-telegram",
  "iteration": 1,
  "configs": [
    {
      "name": "with_skill",
      "pass_rate": {"mean": 1.0, "stddev": 0.0},
      "tokens": {"mean": 53200, "stddev": 14800},
      "time_seconds": {"mean": 123.5, "stddev": 16.7}
    },
    {
      "name": "without_skill",
      "pass_rate": {"mean": 0.857, "stddev": 0.134},
      "tokens": {"mean": 56467, "stddev": 12100},
      "time_seconds": {"mean": 156.4, "stddev": 39.7}
    }
  ],
  "delta": {
    "pass_rate": "+14.3%",
    "tokens": "-5.8%",
    "time": "-21.0%"
  },
  "evals": [
    {
      "name": "broadcast-to-all-users",
      "with_skill": {"pass_rate": 1.0, "passed": 5, "total": 5, "tokens": 35365, "time_seconds": 107.6},
      "without_skill": {"pass_rate": 0.6, "passed": 3, "total": 5, "tokens": 69214, "time_seconds": 200.2}
    },
    {
      "name": "flood-wait-during-scan",
      "with_skill": {"pass_rate": 1.0, "passed": 4, "total": 4, "tokens": 63079, "time_seconds": 140.9},
      "without_skill": {"pass_rate": 1.0, "passed": 4, "total": 4, "tokens": 45601, "time_seconds": 122.3}
    },
    {
      "name": "download-and-reupload-file",
      "with_skill": {"pass_rate": 1.0, "passed": 5, "total": 5, "tokens": 61157, "time_seconds": 122.1},
      "without_skill": {"pass_rate": 1.0, "passed": 5, "total": 5, "tokens": 54587, "time_seconds": 146.7}
    }
  ],
  "analyst_notes": [
    "The skill's biggest impact was on Eval 1 (broadcast): the baseline MISSED both withFloodWait retry wrapping and inter-message delay — the two most critical patterns for avoiding rate limits during bulk sends. This is exactly the kind of bug the skill is designed to prevent.",
    "Eval 2 (FLOOD_WAIT debugging) was a near-tie. Both versions correctly diagnosed the problem and proposed adaptive backoff. The skill version was slightly more thorough: it added pagination-level retry with sleep(waitSec) instead of just re-throwing, meaning it can survive even after withFloodWait's retries are exhausted.",
    "Eval 3 (download/reupload) was also close. Both correctly composed existing primitives. The skill version was more explicit about WHY certain patterns matter (referencing the skill's documentation), which helps future maintainers understand the code.",
    "The skill version was faster on average (-21% time) and used fewer tokens (-5.8%), likely because the skill front-loaded the knowledge instead of requiring the agent to discover it by reading source files."
  ]
}
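
For readers unfamiliar with the two patterns named in the first analyst note, the following is a minimal TypeScript sketch of retry wrapping plus an inter-message delay. It is not the repository's code: only the names `withFloodWait`, `sleep`, and `waitSec` appear in the notes. Everything else is an assumption made for illustration, including the signatures, the `FLOOD_WAIT_<seconds>` error-message format, the `broadcast` and `floodWaitSeconds` helpers, and the default retry count and delay.

```typescript
// Hypothetical sketch: none of these signatures are taken from the repository.
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: a rate-limit error carries the wait time in its message
// as "FLOOD_WAIT_<seconds>". Returns null for any other error.
function floodWaitSeconds(err: unknown): number | null {
  const message = err instanceof Error ? err.message : String(err);
  const match = /FLOOD_WAIT_(\d+)/.exec(message);
  return match ? Number(match[1]) : null;
}

// Retry wrapper: on a flood-wait error, sleep for the requested time and
// try again, up to maxRetries retries. Other errors are re-thrown at once.
async function withFloodWait<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: Sleep = realSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const waitSec = floodWaitSeconds(err);
      if (waitSec === null || attempt >= maxRetries) throw err;
      await sleep(waitSec * 1000);
    }
  }
}

// Bulk send: every call goes through withFloodWait, and a fixed pause
// follows each message so the loop does not burst requests.
async function broadcast(
  userIds: number[],
  send: (userId: number) => Promise<void>,
  delayMs = 1000,
  sleep: Sleep = realSleep,
): Promise<number> {
  let sent = 0;
  for (const userId of userIds) {
    await withFloodWait(() => send(userId), 3, sleep);
    sent++;
    await sleep(delayMs); // inter-message delay
  }
  return sent;
}
```

The `sleep` parameter is injectable only so the sketch can be exercised without real waiting. A production version would probably also cap the total wait time.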