Debug the Miniwob example #48
Open
NicolasAG wants to merge 73 commits into main from debug_miniwob
73 commits
bb00d91 add README (NicolasAG)
dc81770 increase env session inactivity timout (NicolasAG)
e60d4c1 update readme (NicolasAG)
f9e45c2 move miniwob to domains/ (NicolasAG)
8cdbd06 fix (NicolasAG)
5510982 fix path (NicolasAG)
07e858c return RuntimeError instead of HTTPException because not pickable (NicolasAG)
5e56896 add env_call_timeout (NicolasAG)
c06b768 update gpu fractions (NicolasAG)
b1ad285 set kl coef to 0 (NicolasAG)
6bbe977 Merge remote-tracking branch 'origin/main' into debug_miniwob (NicolasAG)
c8ac64d update max seq len (NicolasAG)
b87a6d1 revert to json instead of tool use agent (NicolasAG)
824d841 update README (NicolasAG)
8d170ec debug overflow counter (NicolasAG)
21a1b2a fix prompts (NicolasAG)
05b6794 update readme (NicolasAG)
ef6b2b0 flag tape as invalid instead of raising http errors (NicolasAG)
0abc2b0 use redis (NicolasAG)
d3f6889 track task names instead of data splits (NicolasAG)
9c319e3 fix (NicolasAG)
92c8a93 remove unused var in new tapeagent remote_env (NicolasAG)
edf4d00 use BaseMetrics (NicolasAG)
28749e0 fix (NicolasAG)
a4f9f79 keep track of time taken (NicolasAG)
8a6120f send per step times to wandb (ollmer)
d1d1836 Merge remote-tracking branch 'origin/main' into debug_miniwob (NicolasAG)
5eb3a4e use all miniwob tasks (NicolasAG)
75d3c9c default save checkpoints (NicolasAG)
6b97c7b update vllm max tokens (NicolasAG)
d3cf30b assert group size is as expected (NicolasAG)
4c50f1f assert finetuning length is as much as vllm max length (NicolasAG)
ff61d73 update finetuning & vllm max lengths (NicolasAG)
a00e6e6 debug agent (NicolasAG)
6f149c8 use ppo & upd config (NicolasAG)
2ae2dd8 update readme (NicolasAG)
913c8e2 stop training after 1k steps (NicolasAG)
402eeb2 scale up env servers by llm_servers (NicolasAG)
58f31cc reweight actor/trainer (NicolasAG)
4101d77 add massimo miniwob split (NicolasAG)
b00e476 cleanup (NicolasAG)
0b56125 update agent reflection node (NicolasAG)
9b0a74c towards massimo setup (NicolasAG)
e6e735d Merge remote-tracking branch 'origin/main' into debug_miniwob (NicolasAG)
ef46f39 upd configs (NicolasAG)
1274748 upd (NicolasAG)
b16d45c revert reward calculation (NicolasAG)
9e61c35 update massimo cfg to grpo (NicolasAG)
ef884f2 test with ppo (NicolasAG)
537ec7a update configs (NicolasAG)
7a4e73f add retry mechanism for agent loop (NicolasAG)
42e811e add 30min timeout to rollout function (NicolasAG)
a4e8f5f upd configs (NicolasAG)
95b735b upd (NicolasAG)
8616303 upd configs (NicolasAG)
923cf6a reduce n_env (NicolasAG)
44a033f boost preprocess power (NicolasAG)
2918d1f pop old data (NicolasAG)
dacaa1f do not save playwright traces & screenshots (NicolasAG)
fcee5ee return empty aggregate stats if empty stats (NicolasAG)
631389f increase preprocessor power (NicolasAG)
f791211 better error handling (NicolasAG)
c54d900 fix (NicolasAG)
ea4918a reduce timeouts (NicolasAG)
e5fca10 log number of groups done so far (NicolasAG)
df66a88 log everything if populate_rl_data fails (NicolasAG)
c8d0171 monitor env servers and reset if needed (NicolasAG)
981cd85 better health message (NicolasAG)
9c755ed small fix (NicolasAG)
0b8a24d better logs (NicolasAG)
cd27e30 always check the worker before launching the agent on it + more detai… (NicolasAG)
f9ce99e log stack trace (NicolasAG)
60fb042 small cleanup (NicolasAG)
@@ -0,0 +1,10 @@
defaults:
  - miniwob
  - override finetune: grpo
  - _self_

finetune:
  seq_length: 16384 # input + output tokens
  max_train_steps: 1000 # 1000 optim steps = 1000 * bs samples
  train_batch_size: 1
  gradient_accumulation_passes: 1024
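
The comment `1000 optim steps = 1000 * bs samples` can be made concrete with a little arithmetic. The sketch below is only illustrative: `FinetuneBudget` is a hypothetical helper, not part of PipelineRL, and it assumes that one optimizer step consumes `train_batch_size * gradient_accumulation_passes` samples in total (the "512 effective bs" comment in a sibling config in this PR suggests this reading, but the trainer code is not shown here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneBudget:
    """Hypothetical helper mirroring three keys of the `finetune:` block."""

    train_batch_size: int
    gradient_accumulation_passes: int
    max_train_steps: int

    @property
    def effective_batch_size(self) -> int:
        # Assumption: samples per optimizer step = micro-batch size * accumulation passes.
        return self.train_batch_size * self.gradient_accumulation_passes

    @property
    def total_samples(self) -> int:
        # Samples consumed by the time max_train_steps optimizer steps are done.
        return self.max_train_steps * self.effective_batch_size


# Values copied from the config above.
budget = FinetuneBudget(
    train_batch_size=1,
    gradient_accumulation_passes=1024,
    max_train_steps=1000,
)
print(budget.effective_batch_size)  # 1024
print(budget.total_samples)         # 1024000
```

Under that assumption, the run stops after roughly one million training samples.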
@@ -0,0 +1,15 @@
defaults:
  - miniwob_grpo
  - _self_

train_dataset_names:
  - massimo_train
test_dataset_names:
  - massimo_test

reward_computation: massimo

finetune:
  gradient_accumulation_passes: 512

eval_every_n_versions: 5120 # 512 effective bs * 10 "optim steps"
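
Because `_self_` comes last in the `defaults` list, the keys in this file take precedence over the inherited `miniwob_grpo` values. The sketch below imitates that precedence with a plain-Python deep merge; it is not Hydra's implementation. `deep_merge` is a hypothetical helper, and the parent values are an assumption: they are copied from the other GRPO config shown in this PR, whose file name is not visible on this page.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` where keys from `override` win, recursing into dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Assumed parent values (from the GRPO config shown elsewhere in this PR).
parent = {
    "finetune": {
        "seq_length": 16384,
        "max_train_steps": 1000,
        "train_batch_size": 1,
        "gradient_accumulation_passes": 1024,
    }
}

# Keys set by the config above.
child = {
    "train_dataset_names": ["massimo_train"],
    "test_dataset_names": ["massimo_test"],
    "reward_computation": "massimo",
    "finetune": {"gradient_accumulation_passes": 512},
    "eval_every_n_versions": 5120,
}

resolved = deep_merge(parent, child)
finetune = resolved["finetune"]
effective_bs = finetune["train_batch_size"] * finetune["gradient_accumulation_passes"]

print(finetune["gradient_accumulation_passes"])  # 512, overridden by the child
print(finetune["seq_length"])                    # 16384, inherited from the parent
print(resolved["eval_every_n_versions"] // effective_bs)  # 10 "optim steps" between evals
```

Only `gradient_accumulation_passes` changes inside `finetune`; the other inherited keys are left untouched, which matches the `5120 = 512 * 10` comment.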
@@ -0,0 +1,15 @@
defaults:
  - miniwob
  - _self_

train_dataset_names:
  - massimo_train
test_dataset_names:
  - massimo_test

reward_computation: massimo

finetune:
  gradient_accumulation_passes: 512

eval_every_n_versions: 5120 # 512 effective bs * 10 "optim steps"
@@ -0,0 +1,34 @@
# Miniwob example

## Prerequisites

### TapeAgents

Clone [TapeAgents](https://github.com/ServiceNow/TapeAgents/) into the parent folder and install it.
```bash
cd ..
git clone git@github.com:ServiceNow/TapeAgents.git
cd TapeAgents
pip install -e .
pip install 'tapeagents[finetune,converters]==0.1.12'
cd ../PipelineRL
```

Make sure to add the TapeAgents folder to your Python path.
```bash
export PYTHONPATH="/path/to/TapeAgents:$PYTHONPATH"
```
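
To confirm that the `PYTHONPATH` change took effect, a quick import check can help. This is a minimal sketch: `is_importable` is a hypothetical helper, and the module name `tapeagents` is assumed to match the package name used in the `pip install` line above.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package is imported as `tapeagents`.
if is_importable("tapeagents"):
    print("tapeagents found on the Python path")
else:
    print("tapeagents not found; check PYTHONPATH or rerun the install steps")
```

Run it from the PipelineRL folder in the same shell where `PYTHONPATH` was exported, since the variable only applies to that shell session.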

### Miniwob

See the setup instructions here: https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/miniwob/README.md

### Playwright

The environment server needs Playwright installed:

`playwright install`

## Launch Command

`python -m pipelinerl.launch --config-name miniwob environment.miniwob_url=file:///PATH/TO/miniwob-plusplus/miniwob/html/miniwob/`
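
The `file:///` URL is easy to mistype by hand. The sketch below builds it from a local checkout path with `pathlib` and assembles the same launch command as an argument list. `miniwob_file_url` and `launch_command` are hypothetical helpers, the example path is a placeholder, and the trailing slash is kept only because the command above uses one.

```python
from pathlib import Path
from typing import List


def miniwob_file_url(html_dir: str) -> str:
    """Turn a local MiniWoB HTML directory into a file:// URL with a trailing slash."""
    url = Path(html_dir).expanduser().resolve().as_uri()
    return url if url.endswith("/") else url + "/"


def launch_command(html_dir: str, config_name: str = "miniwob") -> List[str]:
    """Assemble the launch command shown above as an argument list."""
    return [
        "python",
        "-m",
        "pipelinerl.launch",
        "--config-name",
        config_name,
        f"environment.miniwob_url={miniwob_file_url(html_dir)}",
    ]


# Placeholder path; replace it with your own miniwob-plusplus checkout.
cmd = launch_command("/PATH/TO/miniwob-plusplus/miniwob/html/miniwob")
print(" ".join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd)`, which avoids shell quoting issues if the checkout path contains spaces.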