The "Father of AlphaGo" Answers Questions About What's New in the Latest Version

[October 20 news] DeepMind principal research scientist David Silver, known as the "father of AlphaGo," and Julian Schrittwieser, a co-author of the AlphaGo Zero paper, answered questions from users on Reddit.

According to public information, David Silver graduated from Cambridge University, where he received the Addison-Wesley Award. He then co-founded the video game company Elixir Studios, and in 2004 he became a lecturer at University College London. Silver was originally a consultant to DeepMind and formally joined the company in 2013.

The following is a selection of the Q&A; the full thread is available on Reddit.com.

Why is AlphaGo Zero's training so stable?

David Silver said that the algorithm used by AlphaGo Zero differs from traditional (model-free) algorithms such as policy gradients and Q-learning. By using AlphaGo's search, the policy and the outcomes of self-play are massively improved, and simple, gradient-based updates are then used to train the next policy-and-value network toward those improved targets. This is much more stable than incremental, gradient-based policy improvement.
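The update Silver describes can be sketched as a loss function: the search's visit counts yield an improved policy target pi, the game outcome yields a value target z, and the network is trained toward both. The following is a minimal toy illustration, not DeepMind's actual code; all names and the regularization constant are assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def alphazero_loss(logits, v, pi, z, theta, c=1e-4):
    """Toy sketch of the AlphaGo Zero training loss:
    squared value error (z - v)^2, plus cross-entropy between the
    search-improved policy pi and the network policy p, plus an L2
    penalty on the (hypothetical) parameter vector theta."""
    p = softmax(logits)
    value_loss = (z - v) ** 2
    policy_loss = -float(pi @ np.log(p))
    reg = c * float(theta @ theta)
    return value_loss + policy_loss + reg
```

A gradient step on this loss pulls the network's policy toward the search's improved move probabilities and its value estimate toward the actual game outcome, which is the "simple, gradient-based update" Silver refers to.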

Why did Zero train for only 40 days this time? What would happen with 3 months of training?

David Silver said this was a matter of staffing and resource priorities. If it had trained for 3 months, people would ask what would happen after 6 months of training.

Why did you originally choose to train AlphaGo with human game data instead of starting from zero through self-play? What was the previous AlphaGo's bottleneck?

David Silver said that creating a completely self-learning system has always been an open problem in reinforcement learning. Early attempts were very unstable. We later ran many experiments and found that AlphaGo Zero's algorithm is the most efficient.

DeepMind and Facebook began studying this topic at almost the same time. Why has AlphaGo been able to reach this level?

David Silver said that Facebook focuses more on supervised learning, while we focus on reinforcement learning, because we believe it will eventually surpass human knowledge. Research has shown that supervised learning alone can achieve surprisingly strong performance, but to go far beyond the human level, reinforcement learning is the key.

Is AlphaGo Zero the final version of AlphaGo?

David Silver: We are no longer actively researching how to make AlphaGo stronger, but we still use it as a testbed for new ideas.

Are there plans to open-source AlphaGo?

David Silver: We have open-sourced a lot of code in the past, but open-sourcing AlphaGo would be a very complicated process, and it is a very complex codebase.

Background reading:

Google subsidiary DeepMind recently released a new version of its AlphaGo program, which can master the game entirely through self-study. The system, called "AlphaGo Zero," uses a machine learning technique called reinforcement learning: it learns by playing games against itself.

In just three days, AlphaGo Zero mastered the game of Go and even discovered better moves of its own. During this period it received no human help beyond being told the basic rules of Go. As training continued, AlphaGo Zero began to learn advanced Go concepts and to pick out favorable positions and sequences.
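The self-play learning process described above can be sketched as a loop: play games against the current policy, record each position together with the move probabilities and the eventual outcome, and reuse those records as training targets. The game, policy, and outcome rule below are toy stand-ins, not DeepMind's implementation.

```python
# Toy sketch of a self-play data-collection loop, in the spirit of
# AlphaGo Zero: every recorded example pairs a state with the policy's
# move probabilities (pi) and the final game outcome (z).
# The "game" here is a trivial counter, purely for illustration.

def uniform_policy(state, n_moves=3):
    """Stand-in for a search-improved policy: uniform over moves."""
    return [1.0 / n_moves] * n_moves

def self_play_game(policy, num_moves=5):
    """Play one toy game; return training examples (state, pi, z)."""
    history = []
    state = 0
    for _ in range(num_moves):
        pi = policy(state)
        move = max(range(len(pi)), key=pi.__getitem__)  # greedy move
        history.append((state, pi))
        state += move
    z = 1.0 if state % 2 == 0 else -1.0  # toy outcome rule
    return [(s, pi, z) for s, pi in history]

examples = self_play_game(uniform_policy)
```

In the real system the policy would be the network guided by Monte Carlo tree search, and the collected examples would feed the gradient updates described in the Q&A above; the loop structure, however, is the same.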

After three days of training, the system was able to beat AlphaGo Lee, the version of DeepMind's software that defeated South Korea's Lee Sedol last year, by a score of 100 games to 0. After about 40 days of training (roughly 29 million self-play games), AlphaGo Zero defeated AlphaGo Master, the version that beat world champion Ke Jie earlier this year.
