Deep RL Bootcamp Lecture 8 Derivative Free Methods
Published: 2019-06-06

In DFO (derivative-free optimization), you don't try to exploit any problem structure.
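
To make "no problem structure" concrete, here is a minimal sketch of the crudest DFO method, pure random search. It assumes the objective is only available as a function that returns a scalar score (higher is better). The names `pure_random_search` and `black_box` are made up for illustration, and `black_box` is a toy quadratic standing in for something like an episode return.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def pure_random_search(f: Callable[[Vector], float], dim: int, low: float,
                       high: float, n_samples: int = 20000,
                       seed: int = 0) -> Tuple[Vector, float]:
    """Evaluate f at uniformly random points in [low, high]^dim; return the best."""
    rng = random.Random(seed)
    best_x: Vector = []
    best_val = float("-inf")
    for _ in range(n_samples):
        x = [rng.uniform(low, high) for _ in range(dim)]
        val = f(x)  # the only thing the optimizer ever learns about f
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


def black_box(theta: Vector) -> float:
    """Hypothetical toy objective; the optimizer never inspects this code."""
    target = [1.0, -2.0, 0.5]
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


x, val = pure_random_search(black_box, dim=3, low=-5.0, high=5.0)
print(x, val)
```

Nothing in `pure_random_search` looks at gradients or at the form of `f`. That is what makes it general, and also what makes it sample-hungry.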

Example of a low-dimensional policy:

30 degrees of freedom, 120 parameters to tune.
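
For a policy this small, you can search directly in parameter space. Below is a sketch of the cross-entropy method (CEM) with a diagonal Gaussian, picked only as a representative population-based DFO method; the notes don't say which method this example used. `cem` and its default settings are hypothetical, and the toy score stands in for an episode return.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def cem(score: Callable[[Vector], float], dim: int, iters: int = 60,
        pop: int = 100, elite_frac: float = 0.2, init_std: float = 1.0,
        seed: int = 0) -> Vector:
    """Cross-entropy method with a diagonal Gaussian over parameters (maximizes score)."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        # 1. Sample a population of parameter vectors from the current Gaussian.
        samples = [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
                   for _ in range(pop)]
        # 2. Keep the n_elite best samples (hard cutoff on score).
        elite = sorted(samples, key=score, reverse=True)[:n_elite]
        # 3. Refit mean and per-dimension std to the elite set; the small
        #    1e-3 floor keeps the distribution from collapsing completely.
        mean = [sum(e[i] for e in elite) / n_elite for i in range(dim)]
        std = [math.sqrt(sum((e[i] - mean[i]) ** 2 for e in elite) / n_elite) + 1e-3
               for i in range(dim)]
    return mean


best = cem(lambda x: -sum((xi - 1.0) ** 2 for xi in x), dim=5)
print(best)
```

For a 120-parameter policy you would pass `dim=120` and a rollout-return function as `score`; at that size a larger population and more iterations are usually needed.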

Keep the positive (better-scoring) results, but weight them in a smooth way.
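
One possible reading of "keep the positive results in a smooth way": instead of CEM's hard cutoff (top k samples, equal weight), weight every sample by a smooth, increasing function of its score. The sketch assumes exponentiated (softmax) weights with a hypothetical `temperature` parameter; the lecture may have meant a different weighting scheme.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax_weights(scores: Sequence[float], temperature: float) -> List[float]:
    """Smooth weights: a higher score gets a larger weight; weights sum to 1."""
    top = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_mean(samples: Sequence[Sequence[float]], weights: Sequence[float]) -> Vector:
    dim = len(samples[0])
    return [sum(w * x[i] for w, x in zip(weights, samples)) for i in range(dim)]


def hard_elite_mean(samples: Sequence[Sequence[float]], scores: Sequence[float],
                    n_elite: int) -> Vector:
    """CEM-style update: equal weight on the n_elite best samples, zero on the rest."""
    best = sorted(range(len(samples)), key=lambda k: scores[k], reverse=True)[:n_elite]
    return weighted_mean([samples[k] for k in best], [1.0 / n_elite] * n_elite)


samples = [[0.0], [1.0], [2.0], [3.0]]
scores = [0.0, 1.0, 2.0, 3.0]
print(hard_elite_mean(samples, scores, 2))                       # [2.5]
print(weighted_mean(samples, softmax_weights(scores, 1.0)))      # smooth version
```

A small `temperature` approaches "keep only the best sample", and a large one approaches an unweighted average, so the temperature interpolates between hard selection and no selection at all.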

How can evolutionary methods work well in high-dimensional settings?

If you normalize the data well, evolutionary methods can work well on MuJoCo, even with random search.
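
The normalization point can be sketched as two pieces: a running per-dimension mean/variance of the observations (Welford's algorithm) used to whiten the policy's inputs, and a basic antithetic random-search update in parameter space. This is written in the spirit of random-search methods for MuJoCo-style control, not copied from any specific paper or codebase; `RunningNormalizer`, `random_search_step` and the default `sigma`/`step_size` values are illustrative assumptions, and no simulator is involved.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


class RunningNormalizer:
    """Per-dimension running mean/variance of observations (Welford's algorithm)."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean

    def update(self, obs: Vector) -> None:
        self.n += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs: Vector) -> Vector:
        # Fall back to unit variance until at least two observations are seen.
        out = []
        for i, x in enumerate(obs):
            var = self.m2[i] / self.n if self.n > 1 else 1.0
            out.append((x - self.mean[i]) / math.sqrt(var + 1e-8))
        return out


def random_search_step(theta: Vector, evaluate: Callable[[Vector], float],
                       rng: random.Random, n_dirs: int = 8,
                       sigma: float = 0.05, step_size: float = 0.5) -> Vector:
    """One antithetic random-search update in parameter space."""
    update = [0.0] * len(theta)
    for _ in range(n_dirs):
        delta = [rng.gauss(0.0, 1.0) for _ in theta]
        r_plus = evaluate([t + sigma * d for t, d in zip(theta, delta)])
        r_minus = evaluate([t - sigma * d for t, d in zip(theta, delta)])
        # Move along delta if +delta scored better, against it otherwise.
        for i, d in enumerate(delta):
            update[i] += (r_plus - r_minus) * d
    return [t + step_size / n_dirs * u for t, u in zip(theta, update)]
```

In a real setup `evaluate` would run a rollout whose observations pass through `normalize`. Here `step_size` is not rescaled by the spread of the returns, so it has to be tuned to the reward scale.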

They can still get stuck in local minima.

The humanoid policy has 200k parameters to tune, and it is learned with an evolutionary method.

The four videos are actually four different local minima: once the search gets stuck in one, it never gets out.
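
The local-minimum behaviour is easy to reproduce in one dimension. The sketch below uses a hypothetical `two_peaks` objective and simple accept-if-better hill climbing with small Gaussian steps. It converges to whichever peak is nearest its starting point and, in practice, never crosses the valley to the better one.

```python
import math
import random
from typing import Callable


def two_peaks(x: float) -> float:
    """Local optimum near x = -2 (height ~1), global optimum near x = 2 (height ~2)."""
    return math.exp(-(x + 2.0) ** 2) + 2.0 * math.exp(-(x - 2.0) ** 2)


def hill_climb(f: Callable[[float], float], x: float, sigma: float = 0.05,
               iters: int = 2000, seed: int = 0) -> float:
    """Accept a small Gaussian step only if it improves f."""
    rng = random.Random(seed)
    fx = f(x)
    for _ in range(iters):
        cand = x + sigma * rng.gauss(0.0, 1.0)
        fc = f(cand)
        if fc > fx:
            x, fx = cand, fc
    return x


left = hill_climb(two_peaks, -3.0)   # ends near the local peak at -2
right = hill_climb(two_peaks, 3.0)   # ends near the global peak at 2
print(left, right)
```

With steps this small, escaping the left peak would require a single jump of more than 3 units, which the accept-if-better rule essentially never proposes.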

Evolutionary methods are roughly 10 times less data-efficient than action-space policy gradient.

Evolutionary methods are hard to tune; previously, people didn't get them to work with deep nets.
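
One way to see why parameter-space methods need more samples: ES estimates the gradient from random perturbations of all parameters at once, so with a fixed budget of evaluations the estimate gets noisier as the parameter count grows. The sketch uses the antithetic estimator (1/(2nσ)) Σ [f(θ+σε) − f(θ−σε)] ε on a hypothetical linear objective, where the true gradient is known, and reports the cosine similarity between the two. It illustrates the dimension effect only; it does not reproduce the 10x figure.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def es_gradient(f: Callable[[Vector], float], theta: Vector, sigma: float,
                n_pairs: int, rng: random.Random) -> Vector:
    """Antithetic ES estimate of grad f(theta) from 2 * n_pairs evaluations."""
    grad = [0.0] * len(theta)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        f_plus = f([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = f([t - sigma * e for t, e in zip(theta, eps)])
        scale = (f_plus - f_minus) / (2.0 * sigma * n_pairs)
        for i, e in enumerate(eps):
            grad[i] += scale * e
    return grad


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def alignment_for_dim(dim: int, n_pairs: int = 200, seed: int = 0) -> float:
    """Cosine between the ES estimate and the true gradient of a linear objective."""
    rng = random.Random(seed)
    w = [1.0] * dim  # true gradient of f(theta) = w . theta

    def f(th: Vector) -> float:
        return sum(wi * ti for wi, ti in zip(w, th))

    g = es_gradient(f, [0.0] * dim, sigma=0.1, n_pairs=n_pairs, rng=rng)
    return cosine(g, w)


print(alignment_for_dim(4), alignment_for_dim(400))
```

With the same 200 antithetic pairs, the 4-dimensional estimate lines up almost perfectly with the true gradient, while the 400-dimensional one is much further off.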

Reposted from: https://www.cnblogs.com/ecoflex/p/8979721.html
