This repository contains code for the [_PassGAN: A Deep Learning Approach for Password Guessing_](https://arxiv.org/abs/1709.00440) paper.

The PassGAN model is taken from [_Improved Training of Wasserstein GANs_](https://arxiv.org/abs/1704.00028), and the PassGAN authors are assumed to have used the [improved_wgan_training](https://github.com/igul222/improved_wgan_training) TensorFlow implementation in their work. For this reason, I have modified that reference implementation in this repository to make it easy to train models (`train.py`) and sample from them (`sample.py`). This repo contributes:

- A command-line interface
- A pretrained PassGAN model trained on the RockYou dataset

## Getting Started

```bash
# requires CUDA to be pre-installed
pip install -r requirements.txt
```

### Generating password samples

Use the pretrained model to generate 1,000,000 passwords, saving them to `gen_passwords.txt`.

```bash
python sample.py \
	--input-dir pretrained \
	--checkpoint pretrained/checkpoints/195000.ckpt \
	--output gen_passwords.txt \
	--batch-size 1024 \
	--num-samples 1000000
```
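
Generated output may contain repeated passwords, so a quick sanity check is to count how many distinct samples were produced. The following is a minimal sketch, not part of this repository: `summarize_samples` is a hypothetical helper, and it assumes that `gen_passwords.txt` holds one password per line.

```python
from collections import Counter
from pathlib import Path


def summarize_samples(path, top_n=10):
    """Return (total, unique, most_common) for a one-password-per-line file.

    Hypothetical helper: the file layout is an assumption, not something
    documented for sample.py.
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        counts = Counter(line.rstrip("\n") for line in f)
    counts.pop("", None)  # ignore blank lines
    total = sum(counts.values())
    return total, len(counts), counts.most_common(top_n)


if __name__ == "__main__":
    sample_file = Path("gen_passwords.txt")
    if sample_file.exists():
        total, unique, top = summarize_samples(sample_file)
        print(f"{total} samples, {unique} unique")
        for password, count in top:
            print(f"{count:>8}  {password}")
```

A low ratio of unique to total samples suggests that the requested number of samples yields fewer distinct guesses than expected.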

### Training your own models

Training a model on a large dataset (100MB+) can take several hours on a GTX 1080.

```bash
# download the RockYou training data
# contains 80% of the full RockYou passwords (with repeats)
# that are 10 characters or fewer
curl -L -o data/train.txt https://github.com/brannondorsey/PassGAN/releases/download/data/rockyou-train.txt

# train for 200000 iterations, saving checkpoints every 5000
# uses the default hyperparameters from the paper
python train.py --output-dir output --training-data data/train.txt
```
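
The comments above describe the training file as 80% of the RockYou passwords that are 10 characters or fewer, with repeats kept. A comparable file could be prepared from another password list roughly as follows. This is only a sketch, not the script used to build `rockyou-train.txt`: the function names, the shuffle, the fixed seed, and the `latin-1` encoding are all assumptions.

```python
import random


def split_passwords(passwords, max_len=10, train_fraction=0.8, seed=0):
    """Filter by length, shuffle, and split into (train, test) lists."""
    # Duplicates are kept on purpose, matching the "with repeats" description.
    kept = [p for p in passwords if 0 < len(p) <= max_len]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(kept)
    cut = int(len(kept) * train_fraction)
    return kept[:cut], kept[cut:]


def read_passwords(path):
    """Read one password per line (assumed layout)."""
    # latin-1 maps every byte to a character, so odd bytes never raise.
    with open(path, encoding="latin-1") as f:
        return [line.rstrip("\r\n") for line in f]


def write_passwords(path, passwords):
    """Write one password per line."""
    with open(path, "w", encoding="latin-1") as f:
        f.writelines(p + "\n" for p in passwords)
```

For example, `train, test = split_passwords(read_passwords("my_leak.txt"))` followed by `write_passwords("data/train.txt", train)` would produce a file that can be passed to `--training-data`; `my_leak.txt` is a placeholder name.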

You are encouraged to train on your own password leaks and datasets. Some great places to find them include:

- [LinkedIn leak](https://hashes.org/download.php?hashlistId=68&type=hfound) (2.9GB, direct download)
- [Exploit.in torrent](https://thepiratebay.org/torrent/16016494/exploit.in) (10GB+, 800 million accounts. Infamous!)
- [Hashes.org](https://hashes.org/leaks.php): a shared password recovery site.