Do we need random seed setting for DataLoader? #12

Open
doh16101 opened this issue Dec 11, 2023 · 1 comment

Comments

@doh16101
Collaborator

Dear Luis @lrm22005,

I saw at the end of this official PyTorch tutorial:
https://pytorch.org/docs/stable/notes/randomness.html

They set the random seed for the DataLoader as well:

import random

import numpy
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Derive a per-worker seed from the base seed PyTorch assigns each worker,
    # then seed NumPy and Python's random module inside that worker.
    worker_seed = torch.initial_seed() % 2**32
    numpy.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(0)

DataLoader(
    train_dataset,  # train_dataset, batch_size, num_workers come from the surrounding training code
    batch_size=batch_size,
    num_workers=num_workers,
    worker_init_fn=seed_worker,
    generator=g,
)

According to the PyTorch DataLoader documentation, the default for shuffle is False.
https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

So, do we need to pass generator=g to the DataLoader to get reproducible results?
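
One way to check this (a minimal sketch using a hypothetical toy dataset, not code from this repo): with shuffle=True, two loaders driven by identically seeded generators should yield the same batch order.

import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.arange(10))

def epoch_order(seed):
    # Build a loader whose shuffling is driven by a seeded generator.
    g = torch.Generator()
    g.manual_seed(seed)
    loader = DataLoader(data, batch_size=2, shuffle=True, generator=g)
    return [batch[0].tolist() for batch in loader]

# Identically seeded generators reproduce the same shuffled order.
assert epoch_order(0) == epoch_order(0)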

@lrm22005
Owner

Yes, I'm using a random seed in most of the models. Some models don't require defining it because the library generates one by default; others do require it.
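
For reference, the usual companion to the per-worker seeding above is a single helper that seeds every RNG up front, following the same PyTorch reproducibility notes. A minimal sketch (the name set_global_seed is illustrative, not from this repo):

import random

import numpy
import torch

def set_global_seed(seed):
    # Seed Python, NumPy, and PyTorch (CPU and CUDA) RNGs in one place.
    random.seed(seed)
    numpy.random.seed(seed)
    torch.manual_seed(seed)

set_global_seed(0)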
