This repository contains the code of our CSC2515 research project "Learning to Play Go". 📜Report
In the project, we implement an agent that learns to play Go on
Our model vs baselines: Top row - our model plays black; Bottom row - our model plays white
You need to install following packages in order to run the code:
- PyTorch>=1.3.1
- tqdm
- dill
- fire
Unzip the sgf data.
unzip -q sgf.zip
python train_behaviour_cloning.py --steps=1280000
Several parameters can be specified:
Parameter | Default | |
---|---|---|
data_dir | "sgf" | Path to the data directory. |
steps | 400000 | Training steps. |
verbose_step | 1000 | Print verbose. |
batch_size | 2048 | Batch size. Recommend to be at least 128. |
learning_rate | 1e-3 | Learning rate. |
With the model that just trained, we can now deploy it with GTP to play with other Go programes.
We can start our pre-trained engine by:
chmod 777 dlgo.py
./dlgo.py --weights weights-name --playouts 1600 --resign-threshold 0.25
Several parameters can be specified:
Parameter | Default | |
---|---|---|
weights | Path to the pre-trained model. | |
playouts | 400 | MCTS playouts, |
resign-threshold | 0.1 | If the possibility of win is smaller than the specified value, then resign. |
You can interactively play it though GTP, e.g. genmove black
will generate a move for black.
Since the code supported GTP, it can be intergrated with other Go GUI like Sabaki, an elegant Go board and SGF editor. Go engines can be added to Sabaki to play offline. Sabaki then acts as a graphical UI for any Go software that supports GTP (Go Text Protocol).
Step 1: First we need to run selected some game position to run MCTS. It will take about 10 minutes, to dump 100K game positions.
python dump_position_from_datasets.py
Step 2: Run MCTS on these selected positions, this step is very slow. If you want to run it on distributed, you need to install horovod,
python -m horovod.runner.launch -np 16 python dump_self_play_results.py
It takes ~24 hours in a 2 1080ti machines to dump the results.
Step 3: Run refinement on the results. This step takes about 4 hours.
python train_self_play.py
This project was originally built based on CGLemon/pyDLGO and include codes from the following open-source projects:
- CGLemon/pyDLGO
mcts.py
: Monte Carlo Tree Search implementationgtp.py
: Go Text Protocol (GTP) support for playing against other Go programs
- (https://github.com/ymgaq/Pyaq)
board.py
: Go board representation and operations
- jtauber/sgf
sgf.py
: Smart Game Format (SGF) parser for reading game records