Skip to content

SFT reproduction issues #18

@anshulv12

Description

@anshulv12

Hello,

Thank you for your great work on Android Lab! I’ve been working on replicating your fine-tuning results for GLM4-9B and Llama-3.1-8B, following the configurations outlined in Issue #15. While the fine-tuning process seemed to proceed smoothly, I encountered some challenges during evaluation.

Initially when I ran the evaluation, the model produced no output, or a very unstructured output with many actions chained together, which I suspect was due to the absence of an assistant tag appended to the model input. After manually adding the assistant tags, I was able to generate responses, but I still ran into some errors. Some cases I have encountered: the model outputs all actions needed to complete a task in one go, rather than providing a single action, the model immediately returns a “finish” action, pausing further interaction, and lastly the model navigates to the correct screen but cannot extract the correct info to answer the user task. I’m wondering if there might be a configuration detail or evaluation setting that could help me align with your results. I’d greatly appreciate any advice you might have.

Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions