Model
Performance of gemini-2.5-flash-thinking across tasks.
| Task | Attempt ID | Cost | Duration | Status / Error |
|---|---|---|---|---|
| coreutils-old-version-alpine | 38us456gg97t1 | $0.165 | 3m8s | Failure: task failed: sha1sum binary does not exist |
| coreutils-old-version-alpine | 3kvv58u1s2o7s | $0.077 | 2m26s | Failure: task failed: sha1sum binary does not exist |
| coreutils-old-version-alpine | gnjbi1l73fyfg | $1.484 | 22m41s | Failure: task failed: sha1sum binary does not exist |
| coreutils-old-version | 02ofxvn1wnw8g | $0.034 | 2m24s | Failure: task failed: df missing at /home/peter/result/df or not executable |
| coreutils-old-version | asuvr8gm3mxk1 | $0.132 | 19m5s | Failure: context timeout: context deadline exceeded |
| coreutils-old-version | w2c8hsvl88jiw | $0.089 | 19m15s | Failure: context timeout: context deadline exceeded |
| coreutils-static-alpine | 4lo9tks6n9tch | $0.009 | 1m15s | Success |
| coreutils-static-alpine | oiv05619mgyhw | $0.016 | 2m51s | Success |
| coreutils-static-alpine | qfi89k9nn1x5x | $0.120 | 14m55s | Failure: LLM call failed: context deadline exceeded |
| coreutils-static | 8894aqd63cabz | $0.014 | 1m55s | Success |
| coreutils-static | cy7ku1xwju0xr | $0.009 | 2m5s | Success |
| coreutils-static | lh8iwss0pb2sj | $0.011 | 2m0s | Success |
| coreutils | 0jjoczrhw7t0d | $0.011 | 1m9s | Success |
| coreutils | 3vzn142dc7wzu | $0.011 | 1m7s | Success |
| coreutils | 9ughwpxv3ykiu | $0.010 | 1m8s | Success |
| cowsay | anorokmywf1rs | $0.006 | 21s | Success |
| cowsay | mp2isanv6lc8r | $0.004 | 17s | Success |
| cowsay | xbbddb0e2x7xm | $0.005 | 13s | Success |
| curl-ssl-arm64-static | nkb5mdzsrc5vx | $0.014 | 51s | Failure: task failed: curl binary does not exist |
| curl-ssl-arm64-static | w2srvvch00wqt | $0.341 | 12m54s | Failure: task failed: curl binary does not exist |
| curl-ssl-arm64-static | wqy7tm1gg0fvf | $0.204 | 7m29s | Failure: task failed: curl-arm64 is not statically linked |
| curl-ssl-arm64-static2 | 8irl6e3lohykm | $1.167 | 28m52s | Failure: exceeded max tool calls (150) |
| curl-ssl-arm64-static2 | oazj2i1mai0j1 | $0.051 | 1m52s | Failure: task failed: curl binary does not exist |
| curl-ssl-arm64-static2 | v7aq6aahnzty3 | $1.412 | 39m18s | Failure: exceeded max tool calls (150) |
| curl-ssl | 3syscgm1iscvm | $0.016 | 52s | Success |
| curl-ssl | ewmd3jtebe524 | $0.016 | 58s | Success |
| curl-ssl | vbxyokowq5aoy | $0.012 | 1m58s | Success |
| curl | 4mbumgeh26inc | $0.024 | 1m16s | Success |
| curl | ecp5uv5wyqr2r | $0.006 | 35s | Failure: task failed: curl did not download the expected local file content, but instead: curl: (1) Protocol "file" not supported |
| curl | vw1lcqlbfzuhu | $0.005 | 31s | Success |
| jq-static-musl | jhh3934iqs1g8 | $0.126 | 3m46s | Failure: task failed: jq is not statically linked |
| jq-static-musl | vd21pck4x3p21 | $0.297 | 5m48s | Failure: task failed: jq binary does not exist |
| jq-static-musl | vwe53jv3bfcm4 | $0.165 | 4m33s | Success |
| jq-static | bdgx1d733wai9 | $0.019 | 1m4s | Failure: task failed: jq is not statically linked |
| jq-static | hvj8b33c3neoc | $0.128 | 4m58s | Failure: exceeded max tool calls (50) |
| jq-static | oh6v47v6mo2jl | $0.017 | 1m4s | Failure: task failed: jq binary does not exist |
| jq-windows | 88vh2uc0rk21g | $0.011 | 54s | Failure: task failed: jq.exe binary does not exist |
| jq-windows | 962o3wdtyixry | $0.015 | 1m9s | Failure: task failed: jq help does not contain expected string |
| jq-windows | ulm2adrozbvpo | $0.014 | 55s | Failure: task failed: jq help does not contain expected string |
| jq-windows2 | 61inpuunyal2v | $0.030 | 2m20s | Failure: task failed: jq.exe binary does not exist |
| jq-windows2 | 8qz0u1ruhvp3l | $0.067 | 3m35s | Failure: task failed: jq help does not contain expected string |
| jq-windows2 | dw05qdcz6mafj | $0.020 | 1m25s | Failure: task failed: jq.exe binary does not exist |
| jq | 338s4eah51lk2 | $0.007 | 44s | Success |
| jq | 74t14ujo90avy | $0.006 | 42s | Success |
| jq | e6g5iw6pk03kh | $0.010 | 48s | Success |
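
To read per-task totals off a table like this one, the rows can be aggregated with a short script. The following is only a sketch: the helper names `parse_duration` and `summarize` are hypothetical, the sample rows are the three `curl` attempts from the table, and a row is counted as a success only when its last cell starts with `Success`.

```python
import re
from collections import defaultdict

# Three rows in the table's format (the `curl` attempts).
SAMPLE = """
| Task | Attempt ID | Cost | Duration | Status / Error |
|---|---|---|---|---|
| curl | 4mbumgeh26inc | $0.024 | 1m16s | Success |
| curl | ecp5uv5wyqr2r | $0.006 | 35s | Failure task failed: curl did not download the expected local file content, but instead: curl: (1) Protocol "file" not supported |
| curl | vw1lcqlbfzuhu | $0.005 | 31s | Success |
"""

def parse_duration(text):
    """Convert a duration such as '3m8s' or '21s' to whole seconds."""
    text = text.strip()
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds or 0)

def summarize(lines):
    """Return per-task attempts, successes, total cost, and total seconds."""
    summary = defaultdict(
        lambda: {"attempts": 0, "successes": 0, "cost": 0.0, "seconds": 0}
    )
    for line in lines:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip blank lines, the header row, and the |---| separator row.
        if len(cells) < 5 or cells[0] == "Task" or set(cells[0]) <= {"-"}:
            continue
        task, _attempt_id, cost, duration, status = cells[:5]
        entry = summary[task]
        entry["attempts"] += 1
        entry["successes"] += status.startswith("Success")
        entry["cost"] += float(cost.lstrip("$"))
        entry["seconds"] += parse_duration(duration)
    return summary

summary = summarize(SAMPLE.splitlines())
for task, s in summary.items():
    print(f"{task}: {s['successes']}/{s['attempts']} succeeded, "
          f"${s['cost']:.3f} total, {s['seconds']}s total")
```

Feeding the full table through `summarize` instead of `SAMPLE` would give the same breakdown for every task; the script assumes no cell contains a literal `|`.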