Abstract
This paper discusses the history of AI systems for playing games. The section on learning and self-play in AI covers how self-play can be applied to a variety of games, both deterministic ones such as Chess, Go and Checkers and games involving hidden information or randomness such as Poker, Bridge, and Backgammon. This raises several questions, including: “how has deep learning been successfully applied to self-play?” and “to what extent have these learning and self-play issues changed through the history of AI?” To answer these questions, research and experiments on self-play are discussed. This is followed by an assessment of how important machine learning is, and which machine learning techniques matter, for developing high-quality AI programs that play games. Finally, how these advancements changed the history of AI is addressed as well.
1. Introduction
Satisfying game-playing artificial intelligence (AI) is now perceived as a necessity by players, as virtual environments have become increasingly realistic. Even today, the AI of virtually all games is predicated on a finite set of actions whose sequence can easily be anticipated by knowledgeable players (Aiolli & Palazzi, 2008). Instead, machine learning techniques can be applied to classify the behaviour of the players in a game. When a game is considered, its machine can learn to play in two ways. One of these is self-play, in which the system plays against itself repeatedly. The other is learning from opponent moves, where the player and the AI have only restricted data about the game state, and part of the game is to guess the knowledge hidden by the opponent.
The purpose of the present paper is to look at questions pertaining to three main areas. First, learning issues, such as the importance of machine learning in the development of AI programs for games. Second, self-play issues, such as the extent to which self-play can be applied to games, whether AI gains expertise in such games through self-play, and whether deep learning has been successfully applied to self-play. Finally, a historical perspective is taken, drawing together all the questions and sub-questions discussed throughout. A brief explanation of the history of AI and gaming is included in this section to introduce the topic.
1.1 History of game-playing in AI
There is a long history of games and AI. Much research on AI for games relates to the creation of game-playing agents, with or without a learning component. Historically, this was the first and, for a long time, the only way to use AI in games. Since artificial intelligence was recognised as a field, early pioneers of computer science have written game-playing programs because they wanted to test whether “computers could solve tasks where intelligence is needed” (Togelius, 2018). The first software that managed to master a game was developed by A.S. Douglas (Togelius, 2018).
The digital version of the Tic-Tac-Toe game was programmed by Douglas in 1952 as part of his doctoral dissertation at Cambridge. Because he used a unique computer (the EDSAC) for the game, no one outside the university could play it. This was the first graphical computer game played between a human and a computer. A mechanical telephone dialler was used to communicate the player’s placement (either a nought or a cross) (Douglas, 1952). A few years later, Arthur Samuel invented the form of machine learning now called reinforcement learning, using a program that learned to play checkers by playing against itself. Reinforcement learning is defined as “including algorithms from the temporal difference learning problem” (Togelius, 2018).
According to Togelius (2018), most of the early research on game-playing AI focused on classic board games such as chess, checkers, and Go, which are convenient to work with because they are simple to model in code and can be simulated very fast. He also notes that modern computers can evaluate millions of moves per second using AI techniques.
2. Learning and Self-play in AI
This section concentrates on machine learning and its techniques in AI, on self-play in AI with some examples of self-play and their outcomes, on the questions related to learning and self-play, and on how new techniques have improved self-play in AI.
2.1 Machine Learning
“Machine learning usually refers to the changes in systems that perform tasks associated with artificial intelligence (AI). Such tasks involve recognition, diagnosis, planning, robot control, prediction, etc” (Nilsson, 1998).
Machine learning techniques are classified into two main categories:
Supervised Learning
Unsupervised Learning
Supervised learning trains on labelled datasets, that is, on known inputs and outputs, whereas in unsupervised learning the outputs are unknown and the algorithm must find hidden patterns in the input data, so the output cannot be determined in advance (Nilsson, 1998).
Figure 1 taken from (in.mathworks.com, n.d.)
Apart from the above-mentioned techniques, there are newer machine learning approaches such as self-supervised learning, reinforcement learning, artificial neural networks, support vector machines, and decision tree learning. Self-supervised learning is still a form of supervised learning; the only difference is that the training data is labelled automatically, without any human interaction.
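To make the supervised/unsupervised distinction concrete, the following minimal sketch (a hypothetical illustration using scikit-learn, which is not part of the cited sources) trains a classifier on labelled data and then clusters the same data without labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy data: four points in 2-D forming two loose clusters (illustrative only).
X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])  # labels are available only in the supervised case

# Supervised learning: both inputs and known outputs are used for training.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.1, 0.1]]))   # predicts the label of a new point

# Unsupervised learning: only inputs are given; structure must be discovered.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                  # cluster assignments, with no "correct" labels given
```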
2.2 Self-play
Self-play refers to an artificial game system that acquires skill at playing a game by playing against clones of itself, instead of acquiring the skill through tuning by a human expert (assignment specification). According to many studies, self-play is still not completely understood, because the performance of the system is not guaranteed. To illustrate this claim, this section includes a few well-known examples of self-play and their results.
The AlphaZero algorithm learned to play Go, chess and shogi at a superhuman level. In other words, it played against itself through reinforcement learning. The performance of AlphaZero during self-play reinforcement learning is measured on the Elo scale as a function of training steps. In chess, AlphaZero defeated Stockfish after only 4 hours (300,000 steps); in shogi, AlphaZero first defeated Elmo after 2 hours (110,000 steps); and in Go, AlphaZero beat AlphaGo Lee for the first time after 30 hours (74,000 steps). The training algorithm achieved roughly equal performance across all independent runs, suggesting that the high performance of AlphaZero’s training algorithm is repeatable (Silver, et al., 2018).
Figure 2 Taken from Silver, et al., 2018.
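AlphaZero itself combines a deep neural network with Monte Carlo tree search, but the core self-play idea can be illustrated with a much smaller, hedged sketch: the toy program below learns a value table for a one-pile Nim game purely by playing against itself (the game, constants and function names are illustrative assumptions, not part of Silver et al.'s system).

```python
import random
from collections import defaultdict

# Minimal tabular self-play sketch (illustrative only; AlphaZero uses a deep
# network plus Monte Carlo tree search rather than a lookup table).
# Game: one-pile Nim. Players alternately remove 1-3 stones; whoever takes
# the last stone wins. Pile sizes that are multiples of 4 are losing positions.

V = defaultdict(float)            # estimated value of a pile size for the player to move
EPS, ALPHA, PILE = 0.1, 0.5, 10   # exploration rate, learning rate, starting pile

def choose(pile):
    moves = [m for m in (1, 2, 3) if m <= pile]
    if random.random() < EPS:                       # occasional exploration
        return random.choice(moves)
    return min(moves, key=lambda m: V[pile - m])    # leave the opponent the worst position

def self_play_episode():
    history, pile, player = [], PILE, 0
    while pile > 0:
        history.append((player, pile))
        pile -= choose(pile)
        player ^= 1
    winner = player ^ 1                             # the player who just moved took the last stone
    for p, state in history:                        # push value estimates toward the outcome
        target = 1.0 if p == winner else 0.0
        V[state] += ALPHA * (target - V[state])

for _ in range(5000):
    self_play_episode()
print({k: round(v, 2) for k, v in sorted(V.items())})   # V[4] and V[8] should end up low
```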
In 1959, a program to play checkers was written by Arthur L. Samuel (see Samuel, 1959). The system learned from samples gathered from both self-play and human play, such as “board configuration, game results” (ibid.). It used a combination of methods to evaluate the board and determine its next moves, such as a lookup table, a depth-limited search tree pruned with alpha-beta, and an evaluation function consisting of manually engineered features (ibid.). After identifying some limitations of the machine learning techniques used earlier, such as limited progress and limited optimisation of playing strategies, Samuel concluded that machine learning techniques, with some advancements, could be used far more efficiently than in the earlier stages and could be applied to many problems.
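Samuel's actual program is not reproduced here, but the general shape of a depth-limited, alpha-beta pruned search driven by an evaluation function can be sketched as follows (the callbacks `evaluate`, `legal_moves` and `apply_move` are hypothetical placeholders for game-specific code).

```python
def alpha_beta(state, depth, alpha, beta, maximizing, evaluate, legal_moves, apply_move):
    """Depth-limited minimax search with alpha-beta pruning (generic sketch)."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)            # hand-engineered evaluation of the board
    if maximizing:
        value = float("-inf")
        for m in moves:
            value = max(value, alpha_beta(apply_move(state, m), depth - 1,
                                          alpha, beta, False,
                                          evaluate, legal_moves, apply_move))
            alpha = max(alpha, value)
            if alpha >= beta:             # beta cut-off: the opponent will avoid this branch
                break
        return value
    value = float("inf")
    for m in moves:
        value = min(value, alpha_beta(apply_move(state, m), depth - 1,
                                      alpha, beta, True,
                                      evaluate, legal_moves, apply_move))
        beta = min(beta, value)
        if beta <= alpha:                 # alpha cut-off
            break
    return value
```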
Gerald Tesauro created a neural network (NN) named TD-Gammon that plays backgammon using temporal difference learning (Tesauro, 1992). The network was trained on games played from the starting position all the way to the end of the game. It was also tested in actual game play against a benchmark program from Sun Microsystems, and the results were plotted in the graph below (ibid.). The main difference between backgammon and games such as chess, checkers and Go is that it involves randomness, since it uses dice.
Figure 3 taken from (Tesauro, 1992)
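TD-Gammon's full network and training schedule are beyond the scope of this paper, but the heart of temporal difference learning is a small update rule. The sketch below shows a TD(0) update for a linear value function; `features` is a hypothetical game-specific encoder, and the real TD-Gammon used a multilayer network with TD(λ) rather than this simplified form.

```python
import numpy as np

def td_update(w, state, next_state, reward, features,
              alpha=0.1, gamma=1.0, terminal=False):
    """One TD(0) step for a linear value estimate V(s) = w . features(s)."""
    v = w @ features(state)
    v_next = 0.0 if terminal else w @ features(next_state)
    delta = reward + gamma * v_next - v          # the temporal-difference error
    return w + alpha * delta * features(state)   # nudge the estimate toward the target
```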
OpenAI was able to defeat humans in the game of Dota 2 (Rodriguez, 2018). The OpenAI team developed a system of five coordinated neural networks, each representing a different player. Called OpenAI Five, the model uses cutting-edge reinforcement learning techniques such as proximal policy optimization (PPO) to master the details of the game, with the ultimate goal of defeating the world’s top professional team. The state space of Dota 2 is large, continuous and only partially observable: the game state comprises about 20,000 dimensions, and the team used approximately 128,000 CPU cores on the Google Cloud Platform for training. One of the most notable outcomes of this training was the ability of the OpenAI Five agents to favour long-term rewards over short-term gains. Compare this with chess’s 8×8 board with 12 different piece types (about 768 bits of state) or Go’s 19×19 board with 2 piece types: in chess and Go the complete board is visible to each player, whereas in Dota 2 only a small part of the full game state is visible to each of the five players, which makes it a very complicated game. OpenAI is solving this game through self-play (for all facts and figures see Rodriguez, 2018).
“OpenAI Five is actually five different neural networks that coordinate through a hyperparameter called team spirit” (Jones, 2019). Team spirit determines whether the AI will focus on its own individual rewards (e.g. killing other heroes and collecting treasure) or on the rewards of the whole team (Jones, 2019).
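A hedged sketch of how such a coefficient could work is shown below; the exact formula OpenAI uses is not given in the sources cited here, so this is simply an illustrative blend of an agent's own reward with the mean reward of its team.

```python
def blended_reward(own_reward: float, team_rewards: list[float], team_spirit: float) -> float:
    """Blend an agent's own reward with the team average.

    team_spirit = 0.0 -> purely selfish reward signal
    team_spirit = 1.0 -> every agent optimises the team's average reward
    (illustrative assumption, not OpenAI's published formula)
    """
    team_mean = sum(team_rewards) / len(team_rewards)
    return (1.0 - team_spirit) * own_reward + team_spirit * team_mean

# Example: an agent that scored 5.0 on its own, in a team averaging 2.0.
print(blended_reward(5.0, [5.0, 1.0, 2.0, 1.0, 1.0], team_spirit=0.5))  # -> 3.5
```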
According to Sweet (2018), self-play is an exciting idea because it promises to relieve the engineer not only of specifying a solution to a problem (as it optimises the parameters) but also of specifying the objective. This is a higher level of autonomy than we usually consider when studying reinforcement learning. While self-play is not fully understood, it certainly works. He notes that it is not obvious that training a neural network against itself should work, and that it can create a roshambo (rock-paper-scissors) problem: when we train a neural network by playing against the best result of the previous generation, the network learns a strategy B to defeat strategy A and becomes the “champion”; a new network then learns a strategy that defeats B, and the cycle can continue. According to Sweet (2018), this should be an interesting subject to watch for many years.
Self-play allows a player to acquire great knowledge of a game despite having no prior information about it, but this depends on the machine learning technique used. According to Togelius (2018), games using reinforcement learning take a long time to train, that is, they must be played thousands of times before the system plays well. He therefore says that reinforcement learning is applicable when there is plenty of learning time. Togelius (2018) also notes that many game-playing systems have learned to play without any starting information about the game, such as TD-Gammon, which uses temporal difference learning, but their performance was limited by the lack of good function approximators.
Reinforcement learning becomes much more powerful when temporal difference learning is combined with function approximators. The major breakthrough came in 2015, when a paper released by Google DeepMind described deep neural networks trained to play many different games on the classic Atari 2600 game console (Togelius, 2018). Each network was trained to play a single game, taking the raw pixels of the game’s visuals and the score as input and producing controller instructions (direction and fire buttons) as output. The method used to train the deep networks was the deep Q-network (DQN), essentially standard Q-learning applied to neural networks with multiple layers (some of the layers in the architecture were convolutional). Crucially, the problems associated with combining temporal difference techniques and neural networks were overcome by experience replay: short sequences of gameplay are stored and re-presented to the network in shuffled order, to break up long chains of similar states and rewards (Togelius, 2018).
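The replay mechanism itself is small; a hedged sketch of such a buffer (illustrative, not DeepMind's implementation) looks like this:

```python
import random
from collections import deque

# Experience replay: store recent transitions and sample them in shuffled
# order, so that training batches are not long runs of highly similar states.
replay_buffer = deque(maxlen=100_000)

def store(state, action, reward, next_state, done):
    replay_buffer.append((state, action, reward, next_state, done))

def sample_batch(batch_size=32):
    # Sampling without replacement from the whole buffer breaks the temporal
    # correlation between consecutive frames of gameplay.
    transitions = list(replay_buffer)
    return random.sample(transitions, min(batch_size, len(transitions)))
```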
2.3 Deep Learning
Deep learning is a subset of machine learning techniques inspired by the concept of artificial neural networks, and it can be supervised, unsupervised, or reinforcement-based. The key difference between other machine learning techniques and deep learning is that classical machine learning algorithms use a more structured format of data, whereas deep learning uses neural networks organised into multiple layers of processing, known as artificial neural networks. In deep learning, the input is passed through the different layers of the network and no human feature engineering is needed. The invention of these neural networks has helped artificial intelligence in many areas such as game design, voice assistants, self-learning, etc. (Brownlee, 2019).
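As a minimal, hedged illustration of data flowing through stacked layers (a toy two-layer network written directly in NumPy, not a real deep-learning framework), consider:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))     # weights of the first (hidden) layer
W2 = rng.normal(size=(8, 2))     # weights of the output layer

def relu(x):
    return np.maximum(0.0, x)    # simple non-linearity between layers

def forward(x):
    h = relu(x @ W1)             # the input is transformed by the first layer
    return h @ W2                # ...and the result passes through the next layer

print(forward(np.ones(4)))       # a 4-dimensional input mapped to 2 outputs
```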
For many years, game developers have been wary of machine learning (ML), and it has seen limited usage in games. In fact, few major game releases incorporate machine learning. Some attribute this to the notion that ML techniques are not important for game development. However, new possibilities may lead many game development companies to use deep learning to create games that match the player’s potential rather than merely improving it (Shaleynikov, n.d.).
Figure 4 taken from Andrew Ng.
According to Yossi (2019), the reason deep learning is so successful is that it does not require hand-holding, or as he puts it, “Unlike machine learning, we’re not trying to understand what’s inside, (…) We’re feeding it only raw images; we’re not writing any code”. Deep learning algorithms can do their work with a few tens of lines of code. They learn by trial and error, moving at the speed of the machine to become more and more accurate. As one example of how far an AI can get in learning to act with superhuman accuracy, he points to AlphaGo (“The impact deep learning is having on artificial intelligence”, 2019).
AlphaGo initially had no idea how to play the ancient Chinese game of Go. With roughly 10 to the power of 170 possible configurations, Go has far more possible configurations than chess on its 64-square board, making it an ideal proving ground for AI. After exposing the program to 160,000 amateur games to give it the idea, AlphaGo’s developers then had the program play more and more games against itself. By the end of this process, in 2015, the machine had defeated the European Go champion in a 5–0 match, and it went on to defeat the world champion 4–1. Yossi said, “After training against myself for three days, it became the best in the world.” Now, he said, “There is no game that a machine cannot play better than a human being” (“The impact deep learning is having on artificial intelligence”, 2019).
According to Pollack and Blair (1996), the success of backgammon learning using simple hill-climbing indicates that the reinforcement and temporal difference method used by Tesauro in TD-Gammon was not essential to its success. The success came instead from the setup of co-evolutionary self-play, biased by the dynamics of backgammon. TD-Gammon may thus be a milestone for a kind of evolutionary machine learning in which the initial specification of the model is much simpler than expected, because the training environment is not fixed in advance but emerges from co-development between the learning system and its training environment.
The idea of machine learning based on evolution is usually associated with Holland, in the context of the pioneering field of genetic algorithms. However, that work focuses on the optimisation of a fixed target expressed as an absolute fitness function (for example, a fitness function used to optimise the parameters of a neural network with CMA-ES, where the value being optimised is one we specify). Using the idea of co-evolution in learning, one recognises the difference between optimisation based on absolute fitness and optimisation based on relative fitness, where candidates are evaluated only against each other (Pollack & Blair, 1996).
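The contrast can be sketched as follows (a hedged illustration; `mutate` and `play_match` are hypothetical game-specific callbacks, and this is not Pollack and Blair's exact procedure).

```python
def absolute_fitness(candidate, objective):
    """Absolute fitness: score a candidate against a fixed, externally
    specified objective function (the classic genetic-algorithm setting)."""
    return objective(candidate)

def hill_climb_step(champion, mutate, play_match, n_games=100):
    """Relative fitness via co-evolutionary hill-climbing: a mutated challenger
    replaces the champion only if it beats the current champion in a majority
    of games, so fitness is always measured against the evolving player itself."""
    challenger = mutate(champion)
    wins = sum(play_match(challenger, champion) for _ in range(n_games))
    return challenger if wins > n_games // 2 else champion
```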
From the above examples of self-play and their outcomes, it is clear that new advancements in machine learning such as reinforcement learning, temporal difference learning and deep learning have had a great influence on self-play in AI. Apart from these advancements, some contributions have also been made to improve existing techniques, such as self-supervised learning, but according to Abshire (2018) that technique is mainly used in self-driving to compare the driver’s interaction using video footage. So self-play has depended more on the new techniques of machine learning. The analysis and conclusions vary for different games and AI mechanisms, as they depend on the game flow, the behaviour of the player, the technique used and the adaptability of the game’s features. As the key features differ from game to game, the mechanism and analysis need not be the same: for example, backgammon cannot be implemented with alpha-beta pruning and search trees like checkers, and the techniques used for developing backgammon cannot be applied to develop AlphaGo and other games.
Even though machine learning techniques have many advantages in games, and there have been many breakthroughs built on machine learning as stated in the sections above, Stephenson (2018) says that machine learning applications still face major challenges in gaming. A major challenge is the lack of data to learn from: these algorithms must model complex systems and actions, and we do not have good historical data on these complex interactions. In addition, machine learning algorithms developed for the gaming industry must not break the experience of the game or the player. This means that the algorithms must not only be correct, but also fast and unobtrusive from the player’s point of view; anything that slows down or breaks the game pulls the player out of the immersion the game has created. That said, most major game development studios have teams researching, refining and applying AI to their games (Stephenson, 2018).
2.4 Historical Changes in AI
From the above-mentioned examples, breakthroughs and key points, it is clear that there has been a huge change in game-playing AI from the past to the present. Earlier, game-playing AI used strategies and techniques such as branching factors and tree search, whereas at present the techniques used in gaming and self-play are more advanced: systems can even learn from scratch, without any initial information about the game. There has also been an increase in performance with the invention of function approximators and fitness functions. With the discovery of advanced machine learning techniques, AI in gaming has reached a position where self-learning is more effective and creative than learning from humans, even defeating human players in competitions.
3. Conclusion
The advancements in machine learning techniques have had a great impact on game-playing AI. Gaming in AI started with basic machine learning techniques, and newer techniques are emerging day by day. Because of limitations in the earlier machine learning techniques (limited progress, poorly optimised playing strategies, etc.), self-play has come to depend more on advanced techniques such as neural networks, reinforcement learning, and more. There have been breakthroughs along the way, from the first digital version of Tic-Tac-Toe by Douglas in 1952 to approaches based on absolute and relative fitness functions.
AI is often depicted as altering or complementing human abilities, but rarely as a full team member performing tasks alongside humans. As these game experiments involve machine-human collaboration, they offer a glimpse of the future. In the Capture the Flag case study, human players considered the bots to be more collaborative than other humans, while Dota 2 players responded to their AI teammates in mixed ways. Some were quite excited, saying that they felt supported and learned from playing with them; a professional Dota 2 player who teamed up with the bots also described his experience (Lavanchy, 2019).
Lavanchy (2019) raises the question: should AI learn from us or continue to teach itself? Self-learning can give AI greater efficiency and creativity without mimicking humans, but it may also make algorithms more suitable for tasks that do not involve human collaboration, such as warehouse robots. On the other hand, one could argue that it would be more intuitive to have a machine trained by humans: humans using such AI could then understand why the machine acted as it did.
As AI gets smarter, we humans get more surprised.
4. References
Abshire, C. (2018, April 6). Self-Supervised Learning: A Key to Unlocking Self-Driving Cars? Retrieved from medium.com: https://medium.com/toyota-ai-ventures/selfsupervisedlearning-a-key-to-unlocking-self-driving-cars-408b7a6fd3bd
Brownlee, J. (2019, August 16). What is Deep Learning? Retrieved from machinelearningmastery.com: https://machinelearningmastery.com/what-isdeeplearning/
Silver, D., et al. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140–1144.
Douglas, A. S. (1952). OXO [EDSAC program]. Cambridge: University of Cambridge.
Aiolli, F., & Palazzi, C. E. (2008). Enhancing Artificial Intelligence in Games by Learning the Opponent’s Playing Style. ECS.
in.mathworks.com. (n.d.). What Is Machine Learning? 3 things you need to know.
Jones, M. T. (2019). Machine learning and gaming.
Lavanchy, M. (2019, June 11). An AI taught itself to play a video game – for the first time, it’s beating humans. Retrieved from weforum.org: https://www.weforum.org/agenda/2019/06/an-ai-taught-itself-to-play-a-video-gamefor-the-first-time-its-beating-humans/
Ng, A. (n.d.). What data scientists should know about deep learning [PowerPoint presentation]. Retrieved from slideshare.net: https://www.slideshare.net/ExtractConf
Nilsson, N. J. (1998). Introduction to Machine Learning.
Pollack, J. B., & Blair, A. D. (1996). Coevolution of a Backgammon Player.
Rodriguez, J. (2018, July 2). The Science Behind OpenAI Five that just Produced One of the Greatest Breakthrough in the History of AI. Retrieved from towardsdatascience.com: https://towardsdatascience.com/the-science-behind-openai-five-that-just-producedone-ofthe-greatest-breakthrough-in-the-history-b045bcdc2b69
Samuel, A. L. (1959). Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development, 3(3), 210–229.
Shaleynikov, A. (n.d.). Integrating Machine Learning into Game Development. Retrieved from hackernoon.com: https://hackernoon.com/integrating-machine-learning-intogamedevelopment-c5a7f31ed839
Stephenson, J. (2018, November 29). 6 Ways Machine Learning will be used in Game Development. Retrieved from logikk.com: https://www.logikk.com/articles/machinelearning-in-game-development/
Sweet, D. (2018, December 1). Self-Play. Retrieved from Hackernoon.com: https://hackernoon.com/self-play-1f69ceb06a4d
Tesauro, G. (1992). Practical Issues in Temporal Difference Learning. Machine Learning, 8, 257–277.
The impact deep learning is having on artificial intelligence. (2019, June 9). Retrieved from samsungnext.com: https://samsungnext.com/whats-next/the-impact-deep-learningishaving-on-artificial-intelligence/
Yannakakis, G. N., & Togelius, J. (2018). Artificial Intelligence and Games. Springer.