From the code, it seems to me that the dataset is not generated on the fly, whereas the original paper suggests it is. Is there a reason for that?