
Deep RL Bootcamp Lecture 8 Derivative Free Methods


Published: 2019-06-06


Derivative-free optimization (DFO) does not try to exploit any problem structure; it treats the policy as a black box.

low-dimensional policy

30 degrees of freedom

120 parameters to tune

Keep the samples with good results and update toward them smoothly.
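One possible reading of this note is a cross-entropy-style update: sample parameters, keep the best-scoring fraction, and move the sampling distribution toward them with a smoothing factor. The sketch below is only an illustration under that assumption. The function name `cross_entropy_method`, the smoothing factor `alpha`, and the quadratic `toy_return` are all hypothetical and are not taken from the lecture.

```python
import numpy as np

def cross_entropy_method(f, mean, std, n_iters=50, pop_size=50,
                         elite_frac=0.2, alpha=0.7, seed=0):
    """Maximize f by repeatedly refitting a diagonal Gaussian to elite samples."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float).copy()
    std = np.asarray(std, dtype=float).copy()
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iters):
        # Sample candidate parameter vectors around the current mean.
        samples = mean + std * rng.standard_normal((pop_size, mean.size))
        scores = np.array([f(x) for x in samples])
        # Keep only the best-scoring samples.
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Smoothed update: blend the elite statistics with the old ones.
        mean = alpha * elite.mean(axis=0) + (1 - alpha) * mean
        std = alpha * elite.std(axis=0) + (1 - alpha) * std
    return mean

def toy_return(theta):
    # Hypothetical stand-in for an episode return; peak at `target`.
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((theta - target) ** 2)

best = cross_entropy_method(toy_return, mean=np.zeros(3), std=2.0 * np.ones(3))
print(best)
```

Setting `alpha` below 1 keeps part of the previous mean and standard deviation, so one unlucky batch of samples does not collapse the search distribution at once.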

How do evolutionary methods work well in high-dimensional settings?

If you normalize the data well, evolutionary methods with random search can work well in MuJoCo.
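To make the role of normalization concrete, here is a minimal sketch of random search over a linear policy whose observations are first rescaled with running statistics. It does not use MuJoCo. The `RunningNorm` class, the `random_search` function, the synthetic observations, and the `toy_return` objective are all assumptions made for illustration, not the setup used in the lecture.

```python
import numpy as np

class RunningNorm:
    """Running mean/variance (Welford's algorithm) for observation scaling."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def __call__(self, x):
        std = np.sqrt(self.m2 / max(self.n - 1, 1)) + 1e-8
        return (x - self.mean) / std

def random_search(ret, theta, n_iters=200, n_dirs=16, nu=0.05, lr=0.1, seed=0):
    """Antithetic random search: probe +/- perturbations, step along the estimate."""
    rng = np.random.default_rng(seed)
    theta = theta.copy()
    for _ in range(n_iters):
        deltas = rng.standard_normal((n_dirs, theta.size))
        r_plus = np.array([ret(theta + nu * d) for d in deltas])
        r_minus = np.array([ret(theta - nu * d) for d in deltas])
        # Finite-difference estimate of the ascent direction.
        grad_est = ((r_plus - r_minus)[:, None] * deltas).mean(axis=0) / (2 * nu)
        theta += lr * grad_est
    return theta

# Synthetic observations whose raw scales differ by orders of magnitude.
data_rng = np.random.default_rng(1)
raw_obs = (data_rng.standard_normal((500, 3)) * np.array([1.0, 100.0, 0.01])
           + np.array([0.0, 50.0, 0.0]))
norm = RunningNorm(3)
for o in raw_obs:
    norm.update(o)
obs = norm(raw_obs)  # roughly zero mean and unit variance per dimension

# Hypothetical return: how closely a linear policy matches target actions.
w_true = np.array([1.0, -1.0, 0.5])
target_actions = obs @ w_true

def toy_return(theta):
    return -np.mean((obs @ theta - target_actions) ** 2)

theta0 = np.zeros(3)
theta = random_search(toy_return, theta0)
print(toy_return(theta0), toy_return(theta))
```

With every input dimension on a similar scale, a single perturbation size `nu` and a single step size `lr` are reasonable for all parameters, which is the practical point of normalizing before searching.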

They can always get stuck in local minima.

The humanoid policy has 200k parameters to tune, and it was learned with an evolutionary method.

The four videos are actually four different local minima; once the method gets stuck in one, it can never get out.
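The effect described here can be reproduced on a toy problem. The sketch below is a hypothetical one-dimensional example, not the humanoid experiment: a simple evolutionary hill climber with a small perturbation size is run on an objective with two separate peaks, and the starting point alone decides which peak it ends on. The names `hill_climb` and `two_peaks` are made up for this illustration.

```python
import numpy as np

def hill_climb(f, x0, sigma=0.05, n_iters=500, pop_size=10, seed=0):
    """Keep the best of a small perturbed population if it improves on x."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    for _ in range(n_iters):
        candidates = x + sigma * rng.standard_normal(pop_size)
        best = candidates[np.argmax([f(c) for c in candidates])]
        if f(best) > f(x):
            x = float(best)
    return x

def two_peaks(x):
    # Two equally good optima at x = -1 and x = +1, separated by a valley at 0.
    return -(x ** 2 - 1.0) ** 2

left = hill_climb(two_peaks, x0=-0.5)
right = hill_climb(two_peaks, x0=0.5)
print(left, right)
```

Because only improving candidates are accepted and the perturbations are small compared with the width of the valley, a run that starts on one side has essentially no chance of crossing to the other peak.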

Evolutionary methods are roughly 10 times worse than action-space policy gradient.

Evolutionary methods are considered hard to tune because people previously could not get them to work with deep nets.

Reposted from: https://www.cnblogs.com/ecoflex/p/8979721.html
