Commit 6161da9

Add README
1 parent 042eaf2 commit 6161da9

5 files changed: +48 -2 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 experiments/
+venv/
 data/*.txt
 *.pyc
 *.txt

README.md

Lines changed: 45 additions & 0 deletions
@@ -2,3 +2,48 @@
 
 This repository contains code for the [_PassGAN: A Deep Learning Approach for Password Guessing_](https://arxiv.org/abs/1709.00440) paper.
 
+The model from PassGAN is taken from [_Improved Training of Wasserstein GANs_](https://arxiv.org/abs/1704.00028) and it is assumed that the authors of PassGAN used the [improved_wgan_training](https://github.com/igul222/improved_wgan_training) TensorFlow implementation in their work. For this reason, I have modified that reference implementation in this repository to make it easy to train (`train.py`) and sample (`sample.py`) from. This repo contributes:
+
+- A command-line interface
+- A pretrained PassGAN model trained on the RockYou dataset
+
+## Getting Started
+
+```bash
+# requires CUDA to be pre-installed
+pip install -r requirements.txt
+```
+
+### Generating password samples
+
+Use the pretrained model to generate 1,000,000 passwords, saving them to `gen_passwords.txt`.
+
+```bash
+python sample.py \
+--input-dir pretrained \
+--checkpoint pretrained/checkpoints/195000.ckpt \
+--output gen_passwords.txt \
+--batch-size 1024 \
+--num-samples 1000000
+```
+
+### Training your own models
+
+Training a model on a large dataset (100MB+) can take several hours on a GTX 1080.
+
+```bash
+# download the rockyou training data
+# contains 80% of the full rockyou passwords (with repeats)
+# that are 10 characters or less
+curl -L -o data/train.txt https://github.com/brannondorsey/PassGAN/releases/download/data/rockyou-train.txt
+
+# train for 200000 iterations, saving checkpoints every 5000
+# uses the default hyperparameters from the paper
+python train.py --output-dir output --training-data data/train.txt
+```
+
+You are encouraged to train using your own password leaks and datasets. Some great places to find those include:
+
+- [LinkedIn leak](https://hashes.org/download.php?hashlistId=68&type=hfound) (2.9GB, direct download)
+- [Exploit.in torrent](https://thepiratebay.org/torrent/16016494/exploit.in) (10GB+, 800 million accounts. Infamous!)
+- [Hashes.org](https://hashes.org/leaks.php): a shared password recovery site.
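
The README comments in this hunk describe the released training file as 80% of the RockYou passwords (with repeats) that are 10 characters or less. This commit does not show how that file was produced, so the following is only a minimal sketch of one way to build a comparable split from your own leak. The helper name `split_passwords`, its parameters, and the seeded shuffle are assumptions made for illustration, not code from this repository.

```python
import random


def split_passwords(lines, max_len=10, train_frac=0.8, seed=0):
    """Hypothetical helper: filter passwords by length, then split train/test.

    Duplicates are kept on purpose, matching the "with repeats" note.
    """
    # Strip only the line terminator so passwords containing spaces survive.
    kept = [ln.rstrip("\r\n") for ln in lines]
    # Drop empty lines and anything longer than max_len characters.
    kept = [pw for pw in kept if 0 < len(pw) <= max_len]

    # Shuffle a copy with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    shuffled = kept[:]
    rng.shuffle(shuffled)

    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    demo = ["123456\n", "password\n", "123456\n",
            "averyveryverylongpassword\n", "qwerty\n", "\n"]
    train, test = split_passwords(demo)
    print(len(train), len(test))
```

Writing the first returned list to `data/train.txt`, one password per line, would give a file in the format that the `--training-data` help text describes.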

data/.gitkeep

Whitespace-only changes.

sample.py

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ def save(samples):
 save(samples)
 samples = [] # flush
 
-print('wrote {} samples to {} in {:.2f} seconds. {} total.'.format(1000 * args.batch_size, 'samples.txt', time.time() - then, i * args.batch_size))
+print('wrote {} samples to {} in {:.2f} seconds. {} total.'.format(1000 * args.batch_size, args.output, time.time() - then, i * args.batch_size))
 then = time.time()
 
 save(samples)
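
The change in this hunk makes the progress message report the path given by `--output` instead of a hard-coded `'samples.txt'`. The surrounding loop is not visible in the diff, so the sketch below is an assumption about the general pattern (buffer samples, append them periodically, log the real destination), not the code in `sample.py`. The names `write_in_chunks`, `_append`, and `flush_every` are hypothetical, and unlike the original message this version counts the samples it actually wrote rather than computing `1000 * args.batch_size`.

```python
import time


def _append(samples, output_path):
    """Append one sample per line and return how many were written."""
    with open(output_path, 'a') as f:
        for s in samples:
            f.write(s + '\n')
    return len(samples)


def write_in_chunks(batches, output_path, flush_every=1000):
    """Hypothetical sketch: flush buffered samples every `flush_every` batches."""
    buffered = []
    total = 0
    then = time.time()
    for i, batch in enumerate(batches, start=1):
        buffered.extend(batch)
        if i % flush_every == 0:
            written = _append(buffered, output_path)
            total += written
            # Report the real destination, not a hard-coded file name.
            print('wrote {} samples to {} in {:.2f} seconds. {} total.'.format(
                written, output_path, time.time() - then, total))
            buffered = []  # flush
            then = time.time()
    # Write whatever is left after the last full chunk.
    total += _append(buffered, output_path)
    return total
```

Passing the output path into the function keeps the log line and the file that is actually written from disagreeing, which is the mismatch this commit fixes.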

train.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ def parse_args():
 parser.add_argument('--training-data', '-i',
 default='data/train.txt',
 dest='training_data',
-help='Path to training data file (one password per line) (default: data/train.py)')
+help='Path to training data file (one password per line) (default: data/train.txt)')
 
 parser.add_argument('--output-dir', '-o',
 required=True,
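
The fix in this hunk corrects a help string whose hand-written default (`data/train.py`) did not match the real default (`data/train.txt`). One way to keep the two from drifting apart is argparse's `%(default)s` format specifier, which is filled in from the `default=` value when the help text is rendered. The sketch below is a suggestion, not part of this commit, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Hypothetical variant of the --training-data option from train.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--training-data', '-i',
                        default='data/train.txt',
                        dest='training_data',
                        # argparse expands %(default)s from the default= value,
                        # so the help text cannot disagree with it.
                        help='Path to training data file (one password per line) '
                             '(default: %(default)s)')
    return parser


if __name__ == '__main__':
    build_parser().print_help()
```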
