AI & ML interests
LLMs
Organizations
None yet
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-45step
8B
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-30step
8B
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-45step
8B
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-15step
8B
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-30step
8B
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-15step
8B
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-30step
8B
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-15step
8B
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-15step
8B
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Qwen2.5-7B-Instruct-c1-15step
8B
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.1-8B-Instruct-c1-15step
8B
ZHLiu627/verl_agent_webshop-new-GRPO-from-alfworld-50step-Llama-3.1-8B-Instruct-50step
8B
ZHLiu627/verl_agent_webshop-new-GRPO-from-alfworld-50step-Llama-3.1-8B-Instruct-100step
8B
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-50step-Llama-3.1-8B-Instruct-50step
8B
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-50step-Llama-3.1-8B-Instruct-100step
8B
ZHLiu627/verl_agent_webshop-new-GRPO-coef1.1-Llama-3.1-8B-Instruct-150step
ZHLiu627/verl_agent_webshop-new-GRPO-coef0.9-Llama-3.1-8B-Instruct-150step
ZHLiu627/verl_agent_webshop-new-GRPO-Llama-3.1-8B-Instruct-50step
ZHLiu627/verl_agent_webshop-new-GRPO-Llama-3.1-8B-Instruct-100step
ZHLiu627/verl_agent_webshop-new-GRPO-Llama-3.1-8B-Instruct-150step
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-int-reward_False-Llama-3.1-8B-Instruct-2-150step
ZHLiu627/verl_agent_alfworld-GRPO-coef0.9-Llama-3.1-8B-Instruct-150step
ZHLiu627/verl_agent_alfworld-GRPO-coef1.1-Llama-3.1-8B-Instruct-150step
ZHLiu627/verl_agent_alfworld-GRPO-coef0.9-Llama-3.1-8B-Instruct-150step-150step
ZHLiu627/verl_agent_alfworld-GRPO-coef1.1-Llama-3.1-8B-Instruct-150step-150step
ZHLiu627/verl_agent_alfworld-GRPO-wo6-coef1.1-Llama-3.1-8B-Instruct-150step
ZHLiu627/verl_agent_alfworld-GRPO-wo6-coef0.9-Llama-3.1-8B-Instruct-150step
ZHLiu627/verl_agent_alfworld-GRPO-wo6-Llama-3.1-8B-Instruct-50step
ZHLiu627/verl_agent_alfworld-GRPO-wo6-Llama-3.1-8B-Instruct-100step
ZHLiu627/verl_agent_alfworld-GRPO-wo6-Llama-3.1-8B-Instruct-150step