SFT reproduction issues

Hello,

Thank you for your great work on Android Lab! I’ve been working on replicating your fine-tuning results for GLM4-9B and Llama-3.1-8B, following the configurations outlined in [Issue #15](https://github.com/THUDM/Android-Lab/issues/15). While the fine-tuning process seemed to proceed smoothly, I encountered some challenges during evaluation.

Initially when I ran the evaluation, the model produced no output, or a very unstructured output with many actions chained together, which I suspect was due to the absence of an assistant tag appended to the model input. After manually adding the assistant tags, I was able to generate responses, but I still ran into some errors. Some cases I have encountered: the model outputs all actions needed to complete a task in one go, rather than providing a single action, the model immediately returns a “finish” action, pausing further interaction, and lastly the model navigates to the correct screen but cannot extract the correct info to answer the user task. I’m wondering if there might be a configuration detail or evaluation setting that could help me align with your results. I’d greatly appreciate any advice you might have.

Thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

SFT reproduction issues #18

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

SFT reproduction issues #18

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions