-
Notifications
You must be signed in to change notification settings - Fork 18
Description
Hello,
Thank you for your great work on Android Lab! I’ve been working on replicating your fine-tuning results for GLM4-9B and Llama-3.1-8B, following the configurations outlined in Issue #15. While the fine-tuning process seemed to proceed smoothly, I encountered some challenges during evaluation.
Initially when I ran the evaluation, the model produced no output, or a very unstructured output with many actions chained together, which I suspect was due to the absence of an assistant tag appended to the model input. After manually adding the assistant tags, I was able to generate responses, but I still ran into some errors. Some cases I have encountered: the model outputs all actions needed to complete a task in one go, rather than providing a single action, the model immediately returns a “finish” action, pausing further interaction, and lastly the model navigates to the correct screen but cannot extract the correct info to answer the user task. I’m wondering if there might be a configuration detail or evaluation setting that could help me align with your results. I’d greatly appreciate any advice you might have.
Thanks!