WOODS

WOODS is a project aimed at investigating the implications of Out-of-Distribution generalization problems in sequential data, along with their possible solutions. To that end, we offer a DomainBed-like suite to test domain generalization algorithms on our WILDS-like set of sequential data benchmarks, inspired by real-world problems across a wide array of common modalities in modern machine learning.

Quick Installation

WOODS is still under active development, so for now it is only available by cloning the repository on your local machine.

Installing requirements

With Conda

First, have conda installed on your machine (see their installation page if that is not the case). Then create a conda environment with the following command:

conda create --name woods python=3.7

Then activate the environment with the following command:

conda activate woods

With venv

You can use the Python virtual environment manager virtualenv to create a virtual environment for the project. IMPORTANT: Make sure you are using Python 3.7 or later.

virtualenv /path/to/woods/env

Then activate the virtual environment with the following command:

source /path/to/woods/env/bin/activate

Clone locally

Once you’ve created the virtual environment, clone the repository.

git clone https://github.com/jc-audet/WOODS.git
cd WOODS

Then install the requirements with the following command:

pip install -r requirements.txt

Run tests

Run the tests to make sure everything is in order. More tests are coming soon.

pytest

Downloading the data

Before running any training run, we need to make sure we have the data to train on.

Direct Preprocessed Download

The repository offers direct download of the preprocessed data, which is the quickest and most efficient way to get started. To download the preprocessed data, run the download module of the woods.scripts package and specify the dataset you want to download:

python3 -m woods.scripts.download DATASET\
        --data_path ./path/to/data/directory

Source Download and Preprocess

For the sake of transparency, WOODS also provides the preprocessing scripts we used for all datasets in the fetch_and_preprocess module of the woods.scripts package. You can use the same module to download the raw data from the original source and run the preprocessing yourself. DISCLAIMER: Some of the datasets take a long time to preprocess, especially the EEG datasets.

python3 -m woods.scripts.fetch_and_preprocess DATASET\
        --data_path ./path/to/data/directory

Datasets Info

The following table lists the available datasets and their corresponding raw and preprocessed sizes.

Dataset           Modality   Requires Download            Preprocessed Size  Raw Size
Basic_Fourier     1D Signal  No                           -                  -
Spurious_Fourier  1D Signal  No                           -                  -
TMNIST            Video      Yes, but done automatically  0.11 GB            -
TCMNIST_seq       Video      Yes, but done automatically  0.11 GB            -
TCMNIST_step      Video      Yes, but done automatically  0.11 GB            -
CAP               EEG        Yes                          9.1 GB             40.1 GB
SEDFx             EEG        Yes                          10.7 GB            8.1 GB
MI                EEG        Yes                          3.0 GB             13.5 GB
LSA64             Video      Yes                          0.26 GB            1.5 GB
HAR               Sensor     Yes                          0.16 GB            3.1 GB

Running a Sweep

In WOODS, we evaluate the performance of a domain generalization algorithm by running a sweep over the hyper parameter search space and then performing model selection on the training runs conducted during the sweep.

Running the sweep

Once we have the data, we can start running the sweep. The hparams_sweep module of the woods.scripts package provides the command line interface to create the list of jobs to run, which is then passed to the command launcher to launch all jobs. The list of jobs includes all of the necessary training runs to get the results from all trial seeds, and hyper parameter seeds for a given algorithm, dataset and test domain.

All datasets have a SWEEP_ENVS attribute that defines which test environments are included in the sweep. For example, the SWEEP_ENVS attribute of the Spurious Fourier dataset contains only 1 test domain, while for most real datasets SWEEP_ENVS consists of all domains.

In other words, for every combination of (algorithm, dataset, test environment) we train 20 different hyper parameter configurations on which we investigate 3 different trial seeds. This means that for every combination of (algorithm, dataset, test environment) we run 20 * 3 = 60 training runs.

python3 -m woods.scripts.hparams_sweep \
        --dataset Spurious_Fourier TCMNIST_seq \
        --objective ERM IRM \
        --save_path ./results \
        --launcher local

Here we are using the local launcher to run the jobs locally, which is the simplest launcher. We also offer other launchers in the command_launchers module, such as slurm_launcher, which is a parallel job launcher for the SLURM workload manager.
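The run count described above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not the hparams_sweep code itself; the function name count_training_runs and the dictionary standing in for the SWEEP_ENVS attribute are hypothetical.

```python
from itertools import product


def count_training_runs(objectives, sweep_envs, n_hparams=20, n_trials=3):
    """Count the training runs of a sweep.

    sweep_envs maps each dataset name to its list of test environments
    (a hypothetical stand-in for the SWEEP_ENVS attribute).
    """
    n_runs = 0
    for _objective, (_dataset, envs) in product(objectives, sweep_envs.items()):
        # Each (objective, dataset, test environment) combination gets
        # n_hparams configurations, each trained with n_trials seeds.
        n_runs += len(envs) * n_hparams * n_trials
    return n_runs


# Two objectives on one dataset with a single sweep environment
total = count_training_runs(["ERM", "IRM"], {"Spurious_Fourier": [0]})
print(total)  # 120
```

With the default 20 configurations and 3 trial seeds, every extra test environment or objective adds another 60 runs.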

Compiling the results

Once the sweep is finished, we can compile the results. The compile_results module of the woods.scripts package provides the command line interface to compile the results. The --latex option is used to generate the LaTeX table.

python3 -m woods.scripts.compile_results \
        --results_dir path/to/results \
        --latex

It is also possible to compile the results from multiple directories containing complementary sweep results. This will put all of those results in the same table.

python3 -m woods.scripts.compile_results \
        --results_dir path/to/results/1 path/to/results/2 path/to/results/3 \
        --latex

There are other modes of operation for the compile_results module, such as --mode IID, which takes results from a sweep with no test environment and reports the results for each test environment separately.

python3 -m woods.scripts.compile_results \
        --results_dir path/to/results/1 path/to/results/2 path/to/results/3 \
        --mode IID

There is also --mode summary which reports the average results for every dataset of all objectives in the sweep.

python3 -m woods.scripts.compile_results \
        --results_dir path/to/results/1 path/to/results/2 path/to/results/3 \
        --mode summary

You can also use --mode hparams, which reports the hparams of the model chosen by model selection.

python3 -m woods.scripts.compile_results \
        --results_dir path/to/results/1 path/to/results/2 path/to/results/3 \
        --mode hparams

Advanced usage

If 60 jobs is too many for your available compute, or too few for your experiments, you can change the number of seeds investigated with the --n_hparams and --n_trials arguments.

python3 -m woods.scripts.hparams_sweep \
        --dataset Spurious_Fourier TCMNIST_seq \
        --objective ERM IRM \
        --save_path ./results \
        --launcher local \
        --n_hparams 10 \
        --n_trials 1

If some of the test environments of a dataset are not of interest to you, you can specify which test environment you want to investigate using the --unique_test_env argument.

python3 -m woods.scripts.hparams_sweep \
        --dataset Spurious_Fourier TCMNIST_seq \
        --objective ERM IRM \
        --save_path ./results \
        --launcher local \
        --unique_test_env 0

You can run a sweep with no test environment by specifying the --unique_test_env argument as None.

python3 -m woods.scripts.hparams_sweep \
        --dataset Spurious_Fourier TCMNIST_seq \
        --objective ERM IRM \
        --save_path ./results \
        --launcher local \
        --unique_test_env None

Adding an Algorithm

In this section, we will walk through the process of adding an algorithm to the framework.

Defining the Algorithm

We first define the algorithm by creating a new class in the objectives module. In this example we will add scaled_ERM, which is simply ERM with a random scale factor between 0 and max_scale for each environment in a dataset, where max_scale is a hyperparameter of the objective.

Let’s first define the class and its __init__ method to initialize the algorithm.

class scaled_ERM(ERM):
    """
    Scaled Empirical Risk Minimization (scaled ERM)
    """

    def __init__(self, model, dataset, loss_fn, optimizer, hparams):
        super(scaled_ERM, self).__init__(model, dataset, loss_fn, optimizer, hparams)

        self.model = model
        self.loss_fn = loss_fn
        self.optimizer = optimizer

        self.max_scale = hparams['max_scale']
        self.scaling_factor = self.max_scale * torch.rand(len(dataset.train_names)) 

We then need to define the update function, which takes a minibatch of data, computes the loss and updates the model according to the algorithm definition. Note that we do not need to define the predict function, as it is already defined in the base class.

    def update(self, minibatches_device, dataset, device):

        ## Group all inputs and send to device
        all_x = torch.cat([x for x,y in minibatches_device]).to(device)
        all_y = torch.cat([y for x,y in minibatches_device]).to(device)
        
        ts = torch.tensor(dataset.PRED_TIME).to(device)
        out = self.predict(all_x, ts, device)

        ## Reshape the data so that the first dimension indexes the environments
        out_split, labels_split = dataset.split_data(out, all_y)

        env_losses = torch.zeros(out_split.shape[0]).to(device)
        for i in range(out_split.shape[0]):
            for t_idx in range(out_split.shape[2]):     # Number of time steps
                env_losses[i] += self.scaling_factor[i] * self.loss_fn(out_split[i, :, t_idx, :], labels_split[i,:,t_idx])

        objective = env_losses.mean()

        # Back propagate
        self.optimizer.zero_grad()
        objective.backward()
        self.optimizer.step()
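To make the objective computed by update easier to follow, here is a minimal plain-Python sketch of the same accumulation, with made-up loss values and no model or optimizer. The helper name scaled_objective is hypothetical and is not part of the framework.

```python
import random


def scaled_objective(env_step_losses, scaling_factor):
    """Average over environments of the scaled, time-summed losses.

    env_step_losses[i][t] is the loss of environment i at prediction step t.
    """
    env_losses = []
    for losses, scale in zip(env_step_losses, scaling_factor):
        # Same accumulation as the double loop in update()
        env_losses.append(sum(scale * loss for loss in losses))
    return sum(env_losses) / len(env_losses)


rng = random.Random(0)
max_scale = 2.0
# One random scale in [0, max_scale) per training environment
scaling_factor = [max_scale * rng.random() for _ in range(2)]
# Made-up per-step losses for two training environments
objective = scaled_objective([[0.5, 0.3], [0.7, 0.1]], scaling_factor)
```

Because the scales are drawn once at construction time, each environment keeps the same weight for the whole training run.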

Adding necessary pieces

Now that our algorithm is defined, we can add it to the list of algorithms at the top of the objectives module.

OBJECTIVES = [
    'ERM',
    'IRM',
    'VREx',
    'SD',
    'ANDMask',
    'IGA',
    'scaled_ERM',
]

Before being able to use the algorithm, we need to add the hyper parameters related to this algorithm in the hyperparams module. Note: the name of the function needs to be the same as the name of the algorithm followed by _hyper.

def scaled_ERM_hyper(sample):
    """ scaled ERM objective hparam definition 
    
    Args:
        sample (bool): If ``True``, hyper parameters are sampled randomly according to their given distributions. Defaults to ``False``, where the default value is chosen.
    """
    if sample:
        return {
            'max_scale': lambda r: r.uniform(1.,10.)
        }
    else:
        return {
            'max_scale': lambda r: 2.
        }
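Each hyper parameter is stored as a function of a random generator r, so that one seeded generator can resolve a whole configuration. The sketch below illustrates that idea under assumptions: resolve_hparams is a hypothetical helper, and Python's random.Random is used as the generator only because it provides a uniform method; the framework's own resolution code is not shown in this document.

```python
import random


def scaled_ERM_hyper(sample):
    """Same definition as above, repeated so the sketch is self-contained."""
    if sample:
        return {'max_scale': lambda r: r.uniform(1., 10.)}
    return {'max_scale': lambda r: 2.}


def resolve_hparams(hparam_fns, seed):
    """Call every hyper parameter function with the same seeded generator."""
    r = random.Random(seed)  # assumption: any object with a .uniform method
    return {name: fn(r) for name, fn in hparam_fns.items()}


default = resolve_hparams(scaled_ERM_hyper(sample=False), seed=0)
sampled = resolve_hparams(scaled_ERM_hyper(sample=True), seed=0)
```

Resolving the same definition twice with the same seed gives the same configuration, which is what makes a hyper parameter seed reproducible.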

Run some tests

We can now run a simple test to check that everything is working as expected.

pytest

Try the algorithm

Then we can launch a training run to see how the algorithm performs on any dataset.

python3 -m woods.scripts.main train \
        --dataset Spurious_Fourier \
        --objective scaled_ERM \
        --test_env 0 \
        --data_path ./data

Run a sweep

Finally, we can run a sweep to see how the algorithm performs on the chosen dataset.

python3 -m woods.scripts.hparams_sweep \
        --objective scaled_ERM \
        --dataset Spurious_Fourier \
        --data_path ./data \
        --launcher dummy

Adding a Dataset

In this section, we will walk through the process of adding a dataset to the framework.

Defining the Dataset

We first define the dataset by creating a new class in the datasets module. In this example we will add flat_MNIST, which is the MNIST dataset where each image is fed to a sequential model pixel by pixel and the environments are different orderings of the pixels.

First let’s define the dataset class and its __init__ method.

class flat_MNIST(Multi_Domain_Dataset):
    """ Class for flat MNIST dataset

    Each sample is a sequence of 784 pixels.
    The task is to predict the digit

    Args:
        flags (argparse.Namespace): argparse of training arguments

    Note:
        The MNIST dataset needs to be downloaded; this is done automatically if the dataset isn't in the given data_path
    """
    ## Dataset parameters
    SETUP = 'seq'
    TASK = 'classification'
    SEQ_LEN = 28*28
    PRED_TIME = [783]
    INPUT_SHAPE = [1]
    OUTPUT_SIZE = 10

    ## Environment parameters
    ENVS = ['forwards', 'backwards', 'scrambled']
    SWEEP_ENVS = list(range(len(ENVS)))

    def __init__(self, flags, training_hparams):
        super().__init__()

        if flags.test_env is not None:
            assert flags.test_env < len(self.ENVS), "Test environment chosen is not valid"
        else:
            warnings.warn("You don't have any test environment")

        # Save stuff
        self.test_env = flags.test_env
        self.class_balance = training_hparams['class_balance']
        self.batch_size = training_hparams['batch_size']

        ## Import original MNIST data
        MNIST_tfrm = transforms.Compose([ transforms.ToTensor() ])

        # Get MNIST data
        train_ds = datasets.MNIST(flags.data_path, train=True, download=True, transform=MNIST_tfrm) 
        test_ds = datasets.MNIST(flags.data_path, train=False, download=True, transform=MNIST_tfrm) 

        # Concatenate all data and labels
        MNIST_images = torch.cat((train_ds.data.float(), test_ds.data.float()))
        MNIST_labels = torch.cat((train_ds.targets, test_ds.targets))

        # Create sequences of 784 pixels
        self.TCMNIST_images = MNIST_images.reshape(-1, 28*28, 1)
        self.MNIST_labels = MNIST_labels.long().unsqueeze(1)

        # Make the datasets of each environment
        self.train_names, self.train_loaders = [], [] 
        self.val_names, self.val_loaders = [], [] 
        for i, e in enumerate(self.ENVS):

            # Choose data subset
            images = self.TCMNIST_images[i::len(self.ENVS),...]
            labels = self.MNIST_labels[i::len(self.ENVS),...]

            # Apply environment definition
            if e == 'forwards':
                images = images
            elif e == 'backwards':
                images = torch.flip(images, dims=[1])
            elif e == 'scrambled':
                images = images[:, torch.randperm(28*28), :]

            # Make Tensor dataset and the split
            dataset = torch.utils.data.TensorDataset(images, labels)
            in_dataset, out_dataset = make_split(dataset, flags.holdout_fraction)

            if i != self.test_env:
                in_loader = InfiniteLoader(in_dataset, batch_size=training_hparams['batch_size'])
                self.train_names.append(str(e) + '_in')
                self.train_loaders.append(in_loader)
            
            fast_in_loader = torch.utils.data.DataLoader(in_dataset, batch_size=64, shuffle=False, num_workers=self.N_WORKERS, pin_memory=True)
            self.val_names.append(str(e) + '_in')
            self.val_loaders.append(fast_in_loader)
            fast_out_loader = torch.utils.data.DataLoader(out_dataset, batch_size=64, shuffle=False, num_workers=self.N_WORKERS, pin_memory=True)
            self.val_names.append(str(e) + '_out')
            self.val_loaders.append(fast_out_loader)

        # Define loss function
        self.log_prob = nn.LogSoftmax(dim=1)
        self.loss = nn.NLLLoss(weight=self.get_class_weight().to(training_hparams['device']))

Note: you are required to define the following variables:

  • SETUP

  • SEQ_LEN

  • PRED_TIME

  • INPUT_SHAPE

  • OUTPUT_SIZE

  • ENVS

  • SWEEP_ENVS

You are also encouraged to redefine the following variables:

  • N_STEPS

  • N_WORKERS

  • CHECKPOINT_FREQ
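The environment construction in __init__ can be illustrated on toy data without PyTorch. The sketch below mirrors two steps of the class above: each environment takes an interleaved subset of the samples (the i::len(ENVS) slicing), then applies one pixel order to all of its sequences. The helper make_permutation and the toy samples are hypothetical.

```python
import random

ENVS = ['forwards', 'backwards', 'scrambled']


def make_permutation(seq_len, env, rng):
    """Return the pixel order used by an environment."""
    order = list(range(seq_len))
    if env == 'backwards':
        order.reverse()
    elif env == 'scrambled':
        # One fixed permutation shared by every sample of the environment
        rng.shuffle(order)
    elif env != 'forwards':
        raise ValueError(f"Unknown environment: {env}")
    return order


# Six toy "images" of four pixels each
samples = [[10 * n + k for k in range(4)] for n in range(6)]
rng = random.Random(0)
env_data = {}
for i, env in enumerate(ENVS):
    subset = samples[i::len(ENVS)]  # disjoint, interleaved subsets
    order = make_permutation(4, env, rng)
    env_data[env] = [[s[j] for j in order] for s in subset]
```

Every sample ends up in exactly one environment, and the 'scrambled' environment differs from the others only by the order in which pixels are presented.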

Adding necessary pieces

Now that our dataset is defined, we can add it to the list of datasets at the top of the datasets module.

DATASETS = [
    # 1D datasets
    'Basic_Fourier',
    'Spurious_Fourier',
    # Small images
    "TMNIST",
    # Small correlation shift dataset
    "TCMNIST_seq",
    "TCMNIST_step",
    ## EEG Dataset
    "CAP_DB",
    "SEDFx_DB",
    ## Financial Dataset
    "StockVolatility",
    ## Sign Recognition
    "LSA64",
    ## Activity Recognition
    "HAR",
    ## Example
    "flat_MNIST",
]

Before being able to use the dataset, we need to add the hyper parameters related to this dataset in the hyperparams module. Note: two functions are needed, and their names must be the name of the dataset followed by _train and _model, respectively.

def flat_MNIST_train(sample):
    """ flat_MNIST training hparam definition

    Args:
        sample (bool): If ``True``, hyper parameters are sampled randomly according to their given distributions. Defaults to ``False``, where the default value is chosen.
    """
    if sample:
        return {
            'class_balance': lambda r: True,
            'weight_decay': lambda r: 0.,
            'lr': lambda r: 10**r.uniform(-4.5, -2.5),
            'batch_size': lambda r: int(2**r.uniform(3, 9))
        }
    else:
        return {
            'class_balance': lambda r: True,
            'weight_decay': lambda r: 0,
            'lr': lambda r: 1e-3,
            'batch_size': lambda r: 64
        }

def flat_MNIST_model():
    """ flat_MNIST model hparam definition """
    return {
        'model': lambda r: 'LSTM',
        'hidden_depth': lambda r: 1, 
        'hidden_width': lambda r: 20,
        'recurrent_layers': lambda r: 2,
        'state_size': lambda r: 32
    }
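The sampled branch of flat_MNIST_train draws the learning rate and the batch size on a logarithmic scale, which spreads samples evenly across orders of magnitude. The following sketch checks the resulting ranges; it uses random.Random as the generator, which is an assumption made for illustration.

```python
import random

r = random.Random(0)

# Learning rate: uniform in the exponent, so between 10^-4.5 and 10^-2.5
lrs = [10 ** r.uniform(-4.5, -2.5) for _ in range(1000)]

# Batch size: uniform in the exponent of 2, truncated to an integer,
# so between 2^3 = 8 and 2^9 = 512
batch_sizes = [int(2 ** r.uniform(3, 9)) for _ in range(1000)]

lr_bounds = (10 ** -4.5, 10 ** -2.5)
```

Sampling the exponent rather than the value itself means that, for example, learning rates around 1e-4 are drawn about as often as learning rates around 1e-3.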

Run some tests

We can now run a simple test to check that everything is working as expected.

pytest

Try the dataset

Then we can launch a training run to see how algorithms perform on your dataset.

python3 -m woods.scripts.main train \
        --dataset flat_MNIST \
        --objective ERM \
        --test_env 0 \
        --data_path ./data

Run a sweep

Finally, we can run a sweep to see how the algorithms perform on your dataset.

python3 -m woods.scripts.hparams_sweep \
        --objective ERM \
        --dataset flat_MNIST \
        --data_path ./data \
        --launcher dummy

Contributing

WOODS is still under development and is open to contributions. Just fork the repository and start coding! When you think you have something to contribute, open an issue or a pull request.

If you have a published algorithm that you want added as a benchmark, please open a pull request and we will be happy to add it to the list of available algorithms.

If you have a sequential dataset that you think has a generalization problem, please open a pull request and we will be happy to add it to the list of available datasets.

API Documentation

woods

woods.command_launchers module

Set of functions used to launch lists of python scripts

Summary

Functions:

dummy_launcher

Doesn't launch any scripts in commands, it only prints the commands.

local_launcher

Launch all of the scripts in commands on the local machine serially.

slurm_launcher

Parallel job launcher for computational clusters using the SLURM workload manager.

Reference
woods.command_launchers.dummy_launcher(commands)

Doesn’t launch any scripts in commands, it only prints the commands. Useful for testing.

Taken from : https://github.com/facebookresearch/DomainBed/

Parameters

commands (List) – List of list of string that consists of a python script call
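As an illustration of the behaviour described above, a launcher that only prints its commands could look like the following sketch. It is not the implementation shipped in woods.command_launchers; the name dummy_launcher_sketch and the choice to accept either strings or lists of arguments are assumptions.

```python
def dummy_launcher_sketch(commands):
    """Print each command instead of running it and return the printed lines."""
    lines = []
    for command in commands:
        # Accept either a ready-made string or a list of arguments
        line = command if isinstance(command, str) else " ".join(command)
        print(f"Dummy launcher: {line}")
        lines.append(line)
    return lines


printed = dummy_launcher_sketch([
    ["python3", "-m", "woods.scripts.main", "train", "--dataset", "Spurious_Fourier"],
])
```

Printing the commands first is a cheap way to inspect the job list of a sweep before committing compute to it.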

woods.command_launchers.local_launcher(commands)

Launch all of the scripts in commands on the local machine serially. If a GPU is available, it will be used.

Taken from : https://github.com/facebookresearch/DomainBed/

Parameters

commands (List) – List of list of string that consists of a python script call

woods.command_launchers.slurm_launcher(commands)

Parallel job launcher for computational clusters using the SLURM workload manager.

Launches all the jobs in commands in parallel according to the number of tasks in the slurm allocation. An example of SBATCH options:

#!/bin/bash
#SBATCH --job-name=<job_name>
#SBATCH --output=<job_name>.out
#SBATCH --error=<job_name>_error.out
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:4
#SBATCH --time=1-00:00:00
#SBATCH --mem=81Gb

Note

--cpus-per-task should match the N_WORKERS defined in datasets.py (default 4)

Note

The number of tasks given to --ntasks should equal the number of GPUs requested with --gres

Parameters

commands (List) – List of list of string that consists of a python script call

woods.datasets module

Defining the benchmarks for OoD generalization in time-series

Summary

Classes:

Basic_Fourier

Fourier_basic dataset

CAP

CAP Sleep stage dataset

EEG_DB

Class for Sleep Staging datasets with their data stored in a HDF5 file

H5_dataset

HDF5 dataset for EEG data

HHAR

Heterogeneity Activity Recognition Dataset (HHAR)

InfiniteLoader

InfiniteLoader is a torch.utils.data.IterableDataset that can be used to infinitely iterate over a finite dataset.

InfiniteSampler

Infinite Sampler for PyTorch.

LSA64

LSA64: A Dataset for Argentinian Sign Language

Multi_Domain_Dataset

Abstract class of a multi domain dataset for OOD generalization.

PCL

PCL datasets

SEDFx

SEDFx Sleep stage dataset

Spurious_Fourier

Spurious_Fourier dataset

TCMNIST

Abstract class for Temporal Colored MNIST

TCMNIST_seq

Temporal Colored MNIST Sequence

TCMNIST_step

Temporal Colored MNIST Step

TMNIST

Temporal MNIST dataset

Video_dataset

Video dataset

Functions:

XOR

Returns a XOR b (the 'Exclusive or' gate)

bernoulli

Returns a tensor of 1. (True) or 0. (False) resulting from the outcome of a Bernoulli random variable of parameter p.

get_dataset_class

Return the dataset class with the given name.

get_environments

Returns the environments of a dataset

get_setup

Returns the setup of a dataset

get_split

Generates the keys that are used to split a Torch TensorDataset into (1-holdout_fraction) / holdout_fraction.

get_sweep_envs

Returns the list of test environments to investigate in the hyper parameter sweep

make_split

Split a Torch TensorDataset into (1-holdout_fraction) / holdout_fraction.

num_environments

Returns the number of environments of a dataset

Reference
woods.datasets.get_dataset_class(dataset_name)

Return the dataset class with the given name.

Taken from : https://github.com/facebookresearch/DomainBed/

Parameters

dataset_name (str) – Name of the dataset to get the function of. (Must be a part of the DATASETS list)

Returns

The __init__ function of the desired dataset that takes as input ( flags: parser arguments of the train.py script, training_hparams: set of training hparams from hparams.py )

Return type

function

Raises

NotImplementedError – Dataset name not found in the datasets.py globals
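The lookup-by-name behaviour documented above can be sketched as follows. This is an illustration under assumptions, not the library code: the placeholder class, the explicit registry dictionary (used here instead of the module globals) and the name get_dataset_class_sketch are all hypothetical.

```python
class Basic_Fourier:
    """Placeholder standing in for a real dataset class."""


# Hypothetical registry; the documented function looks names up in the
# globals of datasets.py instead.
registry = {'Basic_Fourier': Basic_Fourier}


def get_dataset_class_sketch(dataset_name, namespace):
    """Return the class registered under dataset_name."""
    if dataset_name not in namespace:
        raise NotImplementedError(f"Dataset not found: {dataset_name}")
    return namespace[dataset_name]


dataset_class = get_dataset_class_sketch('Basic_Fourier', registry)
```

The returned class can then be instantiated by the caller with the training flags and hyper parameters.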

woods.datasets.num_environments(dataset_name)

Returns the number of environments of a dataset

Parameters

dataset_name (str) – Name of the dataset to get the number of environments of. (Must be a part of the DATASETS list)

Returns

Number of environments of the dataset

Return type

int

woods.datasets.get_sweep_envs(dataset_name)

Returns the list of test environments to investigate in the hyper parameter sweep

Parameters

dataset_name (str) – Name of the dataset to get the sweep environments of. (Must be a part of the DATASETS list)

Returns

List of environments to sweep across

Return type

list

woods.datasets.get_environments(dataset_name)

Returns the environments of a dataset

Parameters

dataset_name (str) – Name of the dataset to get the environments of. (Must be a part of the DATASETS list)

Returns

list of environments of the dataset

Return type

list

woods.datasets.get_setup(dataset_name)

Returns the setup of a dataset

Parameters

dataset_name (str) – Name of the dataset to get the setup of. (Must be a part of the DATASETS list)

Returns

The setup of the dataset (‘seq’ or ‘step’)

Return type

str

woods.datasets.XOR(a, b)

Returns a XOR b (the ‘Exclusive or’ gate)

Parameters
  • a (bool) – First input

  • b (bool) – Second input

Returns

The output of the XOR gate

Return type

bool

woods.datasets.bernoulli(p, size)

Returns a tensor of 1. (True) or 0. (False) resulting from the outcome of a bernoulli random variable of parameter p.

Parameters
  • p (float) – Parameter p of the Bernoulli distribution

  • size (int...) – A sequence of integers defining the shape of the output tensor

Returns

Tensor of Bernoulli random variables of parameter p

Return type

Tensor
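One way the XOR and bernoulli helpers can work together is to flip binary labels with a given noise rate. The sketch below shows this in plain Python on lists of booleans; the function names, the explicit rng argument and the noise rate of 0.25 are illustrative assumptions, and whether the datasets combine the helpers exactly this way is not stated in this document.

```python
import random


def xor(a, b):
    """Exclusive or of two booleans."""
    return a != b


def bernoulli(p, size, rng):
    """List of booleans, each True with probability p."""
    return [rng.random() < p for _ in range(size)]


rng = random.Random(0)
labels = bernoulli(0.5, 1000, rng)   # balanced binary labels
flips = bernoulli(0.25, 1000, rng)   # which labels get corrupted
noisy_labels = [xor(y, f) for y, f in zip(labels, flips)]

# Fraction of labels that changed; close to the noise rate
flip_rate = sum(y != n for y, n in zip(labels, noisy_labels)) / len(labels)
```

XOR with a Bernoulli mask leaves a label untouched where the mask is False and inverts it where the mask is True.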

woods.datasets.make_split(dataset, holdout_fraction, seed=0, sort=False)

Split a Torch TensorDataset into (1-holdout_fraction) / holdout_fraction.

Parameters
  • dataset (TensorDataset) – Tensor dataset that has 2 tensors -> data, targets

  • holdout_fraction (float) – Fraction of the dataset that will be in the validation set

  • seed (int, optional) – seed used for the shuffling of the data before splitting. Defaults to 0.

  • sort (bool, optional) – If ``True``, the dataset is sorted after splitting. Defaults to False.

Returns

The (1-holdout_fraction) part of the split, followed by the holdout_fraction part of the split

Return type

TensorDataset

woods.datasets.get_split(dataset, holdout_fraction, seed=0, sort=False)

Generates the keys that are used to split a Torch TensorDataset into (1-holdout_fraction) / holdout_fraction.

Parameters
  • dataset (TensorDataset) – TensorDataset to be split

  • holdout_fraction (float) – Fraction of the dataset that will be in the out (validation) set

  • seed (int, optional) – seed used for the shuffling of the data before splitting. Defaults to 0.

  • sort (bool, optional) – If ``True``, the dataset is sorted after splitting. Defaults to False.

Returns

The in (1-holdout_fraction) keys of the split, followed by the out (holdout_fraction) keys of the split

Return type

list
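The splitting logic documented for get_split and make_split can be sketched on plain indices. The version below is an assumption-laden illustration (the name get_split_sketch, the use of random.Random for shuffling and the rounding of the holdout size are all choices made here), not the library implementation.

```python
import random


def get_split_sketch(n_samples, holdout_fraction, seed=0, sort=False):
    """Return (in_keys, out_keys) index lists for a seeded random split."""
    keys = list(range(n_samples))
    random.Random(seed).shuffle(keys)  # seeded shuffle before splitting
    n_out = int(holdout_fraction * n_samples)
    in_keys, out_keys = keys[:n_samples - n_out], keys[n_samples - n_out:]
    if sort:
        in_keys.sort()
        out_keys.sort()
    return in_keys, out_keys


in_keys, out_keys = get_split_sketch(10, 0.2, seed=0, sort=True)
```

Because the shuffle is seeded, calling the function again with the same arguments returns the same split, which keeps the in and out sets stable across runs.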

class woods.datasets.InfiniteSampler(sampler)

Bases: torch.utils.data.sampler.Sampler

Infinite Sampler for PyTorch.

Inspired from : https://github.com/facebookresearch/DomainBed

Parameters

sampler (torch.utils.data.Sampler) – Sampler to be used for the infinite sampling.

class woods.datasets.InfiniteLoader(dataset, batch_size, num_workers=0, pin_memory=False)

Bases: torch.utils.data.dataset.IterableDataset

InfiniteLoader is a torch.utils.data.IterableDataset that can be used to infinitely iterate over a finite dataset.

Inspired from : https://github.com/facebookresearch/DomainBed

Parameters
  • dataset (Dataset) – Dataset to be iterated over

  • batch_size (int) – Batch size of the dataset

  • num_workers (int, optional) – Number of workers to use for the data loading. Defaults to 0.
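The idea of iterating forever over a finite dataset can be shown with a small generator. This is a simplified sketch, not the InfiniteLoader class: it works on a plain list, does not shuffle, drops the last incomplete batch, and the name infinite_batches is hypothetical.

```python
from itertools import islice


def infinite_batches(dataset, batch_size):
    """Yield full batches forever, restarting once the data runs out."""
    if batch_size > len(dataset):
        raise ValueError("batch_size is larger than the dataset")
    while True:
        for start in range(0, len(dataset) - batch_size + 1, batch_size):
            yield dataset[start:start + batch_size]


# Take five batches from a dataset that only holds two full batches
batches = list(islice(infinite_batches(list(range(5)), batch_size=2), 5))
print(batches)  # [[0, 1], [2, 3], [0, 1], [2, 3], [0, 1]]
```

A loader of this kind lets a training loop count steps rather than epochs, since asking for the next batch never raises StopIteration.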

class woods.datasets.Multi_Domain_Dataset

Bases: object

Abstract class of a multi domain dataset for OOD generalization.

Every multi domain dataset must redefine the important attributes: SETUP, PRED_TIME, ENVS, INPUT_SHAPE, OUTPUT_SIZE, TASK. The data dimension needs to be (batch_size, SEQ_LEN, *INPUT_SHAPE)

N_STEPS = 5001

The number of training steps taken for this dataset

Type

int

CHECKPOINT_FREQ = 100

The frequency of results update

Type

int

N_WORKERS = 4

The number of workers used for fast dataloaders used for validation

Type

int

SETUP = None

The setup of the dataset (‘seq’ or ‘step’)

Type

string

TASK = None

The type of prediction task (‘classification’ or ‘regression’)

Type

string

SEQ_LEN = None

The sequence length of the dataset

Type

int

PRED_TIME = [None]

The time steps where predictions are made

Type

list

INPUT_SHAPE = None

The shape of the input (excluding batch size and time dimension)

Type

int

OUTPUT_SIZE = None

The size of the output

Type

int

DATA_PATH = None

Path to the data

Type

str

ENVS = [None]

The environments of the dataset

Type

list

SWEEP_ENVS = [None]

The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps

Type

list

loss_fn(output, target)

Computes the loss

Parameters
  • output (Tensor) – prediction tensor

  • target (Tensor) – Target tensor

get_class_weight()

Compute class weight for class balanced training

Returns

list of weights of length OUTPUT_SIZE

Return type

list

get_train_loaders()

Fetch all training dataloaders and their ID

Returns

List of string names of the data splits used for training, followed by the list of dataloaders of those splits

Return type

list

get_val_loaders()

Fetch all validation/test dataloaders and their ID

Returns

List of string names of the data splits used for validation and test, followed by the list of dataloaders of those splits

Return type

list

split_output(out)

Group predictions by environment

Parameters
  • out (Tensor) – output from a model of shape ((n_env-1)*batch_size, len(PRED_TIME), output_size)

Returns

The reshaped output (n_train_env, batch_size, len(PRED_TIME), output_size)

Return type

Tensor

split_labels(labels)

Group labels by environment

Parameters
  • labels (Tensor) – labels of shape ((n_env-1)*batch_size, len(PRED_TIME), output_size)

Returns

The labels (n_train_env, batch_size, len(PRED_TIME))

Return type

Tensor
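Grouping by environment amounts to cutting the batch dimension into equal consecutive chunks, one per training environment. The plain-Python sketch below assumes that the minibatches of the environments were concatenated in order, one after the other; split_by_environment is a hypothetical name and the rows stand in for tensor slices.

```python
def split_by_environment(rows, n_train_env):
    """Cut a list of n_train_env * batch_size rows into per-environment chunks."""
    if len(rows) % n_train_env != 0:
        raise ValueError("Number of rows is not a multiple of n_train_env")
    batch_size = len(rows) // n_train_env
    return [
        rows[i * batch_size:(i + 1) * batch_size]
        for i in range(n_train_env)
    ]


# Six rows coming from three environments with a batch size of two
grouped = split_by_environment(list(range(6)), n_train_env=3)
print(grouped)  # [[0, 1], [2, 3], [4, 5]]
```

Once the data is grouped this way, an objective can compute one loss per environment by indexing the first dimension.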

class woods.datasets.Basic_Fourier(flags, training_hparams)

Bases: woods.datasets.Multi_Domain_Dataset

Fourier_basic dataset

A dataset of 1D sinusoid signals to classify according to their Fourier spectrum.

Parameters
  • flags (argparse.Namespace) – argparse of training arguments

  • training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file

Note

No download is required as it is purely synthetic

SETUP = 'seq'

The setup of the dataset (‘seq’ or ‘step’)

Type

string

TASK = 'classification'

The type of prediction task (‘classification’ or ‘regression’)

Type

string

SEQ_LEN = 50

The sequence length of the dataset

Type

int

PRED_TIME = [49]

The time steps where predictions are made

Type

list

INPUT_SHAPE = [1]

The shape of the input (excluding batch size and time dimension)

Type

int

OUTPUT_SIZE = 2

The size of the output

Type

int

ENVS = ['no_spur']

The environments of the dataset

Type

list

SWEEP_ENVS = [None]

The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps

Type

list

class woods.datasets.Spurious_Fourier(flags, training_hparams)

Bases: woods.datasets.Multi_Domain_Dataset

Spurious_Fourier dataset

A dataset of 1D sinusoid signals to classify according to their Fourier spectrum. Peaks that are spuriously correlated with the label are added to the Fourier spectrum of the signal. Different environments have different correlation rates between the labels and the spurious peaks in the spectrum.

Parameters
  • flags (argparse.Namespace) – argparse of training arguments

  • training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file

Note

No download is required as it is purely synthetic

SETUP = 'seq'

The setup of the dataset (‘seq’ or ‘step’)

Type

string

TASK = 'classification'

The type of prediction task (‘classification’ or ‘regression’)

Type

string

SEQ_LEN = 50

The sequence length of the dataset

Type

int

PRED_TIME = [49]

The time steps where predictions are made

Type

list

INPUT_SHAPE = [1]

The shape of the input (excluding batch size and time dimension)

Type

int

OUTPUT_SIZE = 2

The size of the output

Type

int

LABEL_NOISE = 0.25

Level of noise added to the labels

Type

float

ENVS = [0.1, 0.8, 0.9]

The correlation rate between the label and the spurious peaks

Type

list

SWEEP_ENVS = [0]

The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps

Type

list

super_sample(signal_0, signal_1)

Sample signals frames with a bunch of offsets

class woods.datasets.TMNIST(flags, training_hparams)

Bases: woods.datasets.Multi_Domain_Dataset

Temporal MNIST dataset

Each sample is a sequence of 4 MNIST digits. The task is to predict at each step if the sum of the current digit and the previous one is odd or even.

Parameters
  • flags (argparse.Namespace) – argparse of training arguments

  • training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file

Note

The MNIST dataset needs to be downloaded. This is done automatically if the dataset isn’t in the given data_path.

N_STEPS = 5001

The number of training steps taken for this dataset

Type

int

SETUP = 'seq'

The setup of the dataset (‘seq’ or ‘step’)

Type

string

TASK = 'classification'

The type of prediction task (‘classification’ or ‘regression’)

Type

string

SEQ_LEN = 4

The sequence length of the dataset

Type

int

PRED_TIME = [1, 2, 3]

The time steps where predictions are made

Type

list

INPUT_SHAPE = [1, 28, 28]

The shape of the input (excluding batch size and time dimension)

Type

list

OUTPUT_SIZE = 2

The size of the output

Type

int

ENVS = ['grey']

The environments of the dataset

Type

list

SWEEP_ENVS = [None]

The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps

Type

list

plot_samples(TMNIST_labels)

class woods.datasets.TCMNIST(flags)

Bases: woods.datasets.Multi_Domain_Dataset

Abstract class for Temporal Colored MNIST

Each sample is a sequence of 4 MNIST digits. The task is to predict at each step if the sum of the current digit and the previous one is odd or even. A color that is correlated with the label of the current step is added to the digits. How that correlation is formulated, either sequence-wise or step-wise, is defined in the child classes.

Parameters

flags (argparse.Namespace) – argparse of training arguments

Note

The MNIST dataset needs to be downloaded. This is done automatically if the dataset isn’t in the given data_path.

N_STEPS = 5001

The number of training steps taken for this dataset

Type

int

TASK = 'classification'

The type of prediction task (‘classification’ or ‘regression’)

Type

string

SEQ_LEN = 4

The sequence length of the dataset

Type

int

PRED_TIME = [1, 2, 3]

The time steps where predictions are made

Type

list

INPUT_SHAPE = [2, 28, 28]

The shape of the input (excluding batch size and time dimension)

Type

list

OUTPUT_SIZE = 2

The size of the output

Type

int

plot_samples(images, labels, name)

class woods.datasets.TCMNIST_seq(flags, training_hparams)

Bases: woods.datasets.TCMNIST

Temporal Colored MNIST Sequence

Each sample is a sequence of 4 MNIST digits. The task is to predict at each step if the sum of the current digit and the previous one is odd or even. A color that is correlated with the label of the current step is added to the digits.

The correlation between the color and the label is constant across sequences, and whole sequences are sampled from an environment definition.

Parameters
  • flags (argparse.Namespace) – argparse of training arguments

  • training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file

Note

The MNIST dataset needs to be downloaded. This is done automatically if the dataset isn’t in the given data_path.

SETUP = 'seq'

The setup of the dataset (‘seq’ or ‘step’)

Type

string

LABEL_NOISE = 0.25

Level of noise added to the labels

Type

float

ENVS = [0.1, 0.8, 0.9]

list of different correlation values between the color and the label

Type

list

SWEEP_ENVS = [0]

The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps

Type

list

color_dataset(images, labels, p, d)

Color the dataset

Parameters
  • images (Tensor) – 3 channel images to color

  • labels (Tensor) – labels of the images

  • p (float) – correlation between the color and the label

  • d (float) – level of noise added to the labels

Returns

colored images

Return type

colored_images (Tensor)
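
The roles of p and d can be sketched on the labels alone. The snippet below is an assumption-laden illustration, not the package's implementation: it reads d as the probability of flipping a label and p as the probability that the color bit matches the (noisy) label. assign_colors is a hypothetical helper, and the step that writes the color into the image channels is omitted.

```python
import random

def assign_colors(labels, p, d, seed=0):
    """Hypothetical sketch: returns (noisy_labels, colors), both lists of 0/1."""
    rng = random.Random(seed)
    noisy_labels, colors = [], []
    for y in labels:
        # Flip the label with probability d (label noise).
        y_noisy = 1 - y if rng.random() < d else y
        # The color matches the noisy label with probability p.
        color = y_noisy if rng.random() < p else 1 - y_noisy
        noisy_labels.append(y_noisy)
        colors.append(color)
    return noisy_labels, colors

labels = [i % 2 for i in range(4000)]
noisy, colors = assign_colors(labels, p=0.8, d=0.25)
flip_rate = sum(a != b for a, b in zip(labels, noisy)) / len(labels)
match_rate = sum(a == b for a, b in zip(noisy, colors)) / len(labels)
```

With these settings, flip_rate lands near 0.25 and match_rate near 0.8, so in such an environment the color is a better predictor of the noisy label than the digits themselves.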

class woods.datasets.TCMNIST_step(flags, training_hparams)

Bases: woods.datasets.TCMNIST

Temporal Colored MNIST Step

Each sample is a sequence of 4 MNIST digits. The task is to predict at each step if the sum of the current digit and the previous one is odd or even. A color that is correlated with the label of the current step is added to the digits.

The correlation between the color and the label varies across sequences, and time steps are sampled from an environment definition. By definition, the test environment is always the last time step in the sequence.

Parameters
  • flags (argparse.Namespace) – argparse of training arguments

  • training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file

Note

The MNIST dataset needs to be downloaded. This is done automatically if the dataset isn’t in the given data_path.

SETUP = 'step'

The setup of the dataset (‘seq’ or ‘step’)

Type

string

LABEL_NOISE = 0.25

Level of noise added to the labels

Type

float

ENVS = [0.9, 0.8, 0.1]

list of different correlation values between the color and the label

Type

list

SWEEP_ENVS = [2]

The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps

Type

list

color_dataset(images, labels, env_id, p, d)

Color a single step ‘env_id’ of the dataset

Parameters
  • images (Tensor) – 3 channel images to color

  • labels (Tensor) – labels of the images

  • env_id (int) – environment id

  • p (float) – correlation between the color and the label

  • d (float) – level of noise added to the labels

Returns

all dataset with a new step colored

Return type

colored_images (Tensor)

split_output(out)

Group data and prediction by environment

Parameters

out (Tensor) – output of the model to be grouped by environment

Returns

The reshaped data (n_env-1, batch_size, 1, n_classes)

Return type

Tensor

split_labels(labels)

Group data and prediction by environment

Parameters

labels (Tensor) – labels of the data (batch_size, len(PRED_TIME))

Returns

The reshaped labels (n_env-1, batch_size, 1)

Return type

Tensor

class woods.datasets.H5_dataset(h5_path, env_id, split=None)

Bases: torch.utils.data.dataset.Dataset

HDF5 dataset for EEG data

The HDF5 file is expected to have the following nested dict structure:

{'env0': {'data': np.array(n_samples, time_steps, input_size),
          'labels': np.array(n_samples, len(PRED_TIME))},
 'env1': {'data': np.array(n_samples, time_steps, input_size),
          'labels': np.array(n_samples, len(PRED_TIME))},
 ...}

An advantage of this format is that data is loaded only when needed, which saves RAM.

Parameters
  • h5_path (str) – absolute path to the hdf5 file

  • env_id (int) – environment id key in the hdf5 file

  • split (list) – list of indices of the dataset that belong to the split. If ‘None’, all the data is used

close()

Close the hdf5 file link

class woods.datasets.EEG_DB(flags, training_hparams)

Bases: woods.datasets.Multi_Domain_Dataset

Class for Sleep Staging datasets with their data stored in a HDF5 file

Parameters
  • flags (argparse.Namespace) – argparse of training arguments

  • training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file

CHECKPOINT_FREQ = 500

The frequency of results update

Type

int

SETUP = 'seq'

The setup of the dataset (‘seq’ or ‘step’)

Type

string

DATA_PATH = None

relative path to the hdf5 file

Type

str

get_class_weight()

Compute class weight for class balanced training

Returns

list of weights of length OUTPUT_SIZE

Return type

list
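
One common way to compute such weights is inverse class frequency. The sketch below shows that heuristic. Whether get_class_weight uses this exact formula is an assumption, and class_weights is a hypothetical stand-alone helper working on a plain list of labels.

```python
from collections import Counter

def class_weights(labels, n_classes):
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples = len(labels)
    # Classes absent from `labels` get weight 0.0 to avoid dividing by zero.
    return [
        n_samples / (n_classes * counts[c]) if counts[c] else 0.0
        for c in range(n_classes)
    ]

weights = class_weights([0, 0, 0, 1], n_classes=2)
print(weights)  # [0.6666666666666666, 2.0]
```

The returned list has one entry per class, so its length equals OUTPUT_SIZE when n_classes is set to that value, and rarer classes receive larger weights.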

class woods.datasets.CAP(flags, training_hparams)

Bases: woods.datasets.EEG_DB

CAP Sleep stage dataset

The task is to classify the sleep stage from EEG and other signal modalities. This dataset only uses about half of the raw dataset because some measurements are incompatible. We use the 5 most commonly used machines in the database to create the 5 separate environments to train on. The machines were inferred by grouping together the recordings that had the same channels, and the final preprocessed data only includes the channels that are common to those 5 machines.

You can read more on the data itself and its provenance on Physionet.org:

Parameters
  • flags (argparse.Namespace) – argparse of training arguments

  • training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file

Note

This dataset needs to be downloaded and preprocessed. This can be done with the download.py script.

N_STEPS = 5001

The number of training steps taken for this dataset

Type

int

TASK = 'classification'

The type of prediction task (‘classification’ or ‘regression’)

Type

string

SEQ_LEN = 3000

The sequence length of the dataset

Type

int

PRED_TIME = [2999]

The time steps where predictions are made

Type

list

INPUT_SHAPE = [19]

The shape of the input (excluding batch size and time dimension)

Type

list

OUTPUT_SIZE = 6

The size of the output

Type

int

DATA_PATH = 'CAP/CAP.h5'

relative path to the hdf5 file

Type

str

ENVS = ['Machine0', 'Machine1', 'Machine2', 'Machine3', 'Machine4']

The environments of the dataset

Type

list

SWEEP_ENVS = [0, 1, 2, 3, 4]

The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps

Type

list

class woods.datasets.SEDFx(flags, training_hparams)

Bases: woods.datasets.EEG_DB

SEDFx Sleep stage dataset

The task is to classify the sleep stage from EEG and other signal modalities. This dataset only uses about half of the raw dataset because some measurements are incompatible. We split the dataset into 4 environments to train on, each containing the data taken from a given age group.

You can read more on the data itself and its provenance on Physionet.org:

Parameters
  • flags (argparse.Namespace) – argparse of training arguments

  • training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file

Note

This dataset needs to be downloaded and preprocessed. This can be done with the download.py script.

N_STEPS = 10001

The number of training steps taken for this dataset

Type

int

TASK = 'classification'

The type of prediction task (‘classification’ or ‘regression’)

Type

string

SEQ_LEN = 3000

The sequence length of the dataset

Type

int

PRED_TIME = [2999]

The time steps where predictions are made

Type

list

INPUT_SHAPE = [4]

The shape of the input (excluding batch size and time dimension)

Type

list

OUTPUT_SIZE = 6

The size of the output

Type

int

DATA_PATH = 'SEDFx/SEDFx.h5'

relative path to the hdf5 file

Type

str

ENVS = ['Age 20-40', 'Age 40-60', 'Age 60-80', 'Age 80-100']

The environments of the dataset

Type

list

SWEEP_ENVS = [0, 1, 2, 3]

The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps

Type

list

class woods.datasets.PCL(flags, training_hparams)

Bases: woods.datasets.EEG_DB

PCL datasets

The task is to classify motor imagery from EEG and other signal modalities. The raw data comes from the three PCL databases:

[ ‘PhysionetMI’, ‘Cho2017’, ‘Lee2019_MI’]

You can read more on the data itself and its provenance on:

This dataset needs to be downloaded and preprocessed. This can be done with the download.py script.

N_STEPS = 10001

The number of training steps taken for this dataset

Type

int

TASK = 'classification'

The type of prediction task (‘classification’ or ‘regression’)

Type

string

SEQ_LEN = 752

The sequence length of the dataset

Type

int

PRED_TIME = [751]

The time steps where predictions are made

Type

list

INPUT_SHAPE = [48]

The shape of the input (excluding batch size and time dimension)

Type

list

OUTPUT_SIZE = 2

The size of the output

Type

int

DATA_PATH = 'PCL/PCL.h5'

relative path to the hdf5 file

Type

str

ENVS = ['PhysionetMI', 'Cho2017', 'Lee2019_MI']

The environments of the dataset

Type

list

SWEEP_ENVS = [0, 1, 2]

The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps

Type

list

class woods.datasets.Video_dataset(data_paths, n_frames, transform=None, split=None)

Bases: torch.utils.data.dataset.Dataset

Video dataset

Folder structure:

data_path
    └── 001
        └─ 001
            ├── frame000001.jpg
            ├── ...
            └── frame0000{n_frames}.jpg
        └─ 002
        └─ (samples) ...
    └── 002
        └─ 001
        └─ 002
        └─ (samples) ...
    └── 003
    └── (labels) ...

Parameters
  • data_path (str) – path to the folder containing the data

  • n_frames (int) – number of frames in each video

  • transform (callable, optional) – Optional transform to be applied on a sample.
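
The folder layout above (one folder per label, one sub-folder per sample, one JPEG per frame) can be walked with pathlib. The sketch below is only an illustration of that layout: list_videos is a hypothetical helper, and the frame*.jpg pattern is taken from the tree shown above.

```python
from pathlib import Path

def list_videos(data_path):
    """Return (label, frame_paths) pairs, one per sample folder."""
    samples = []
    for label_dir in sorted(p for p in Path(data_path).iterdir() if p.is_dir()):
        for sample_dir in sorted(p for p in label_dir.iterdir() if p.is_dir()):
            # Sorting by name keeps the frames in temporal order
            # because the frame numbers are zero-padded.
            frames = sorted(sample_dir.glob("frame*.jpg"))
            samples.append((label_dir.name, frames))
    return samples
```

Each returned pair holds the label folder name and the ordered list of frame paths for one video, which is the information a dataset class needs to load a sample lazily.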

read_images(selected_folder, use_transform)

Read images from a folder (single video consisting of n_frames images)

Parameters
  • selected_folder (str) – path to the folder containing the images

  • use_transform (callable) – transform to apply on the images

Returns

images tensor (n_frames, 3, 224, 224)

Return type

Tensor

class woods.datasets.LSA64(flags, training_hparams)

Bases: woods.datasets.Multi_Domain_Dataset

LSA64: A Dataset for Argentinian Sign Language

This dataset is composed of videos of different signers.

You can read more on the data itself and its provenance from its source:

Parameters
  • flags (argparse.Namespace) – argparse of training arguments

  • training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file

Note

This dataset needs to be downloaded and preprocessed. This can be done with the download.py script.

Resources:

N_STEPS = 5001

The number of training steps taken for this dataset

Type

int

CHECKPOINT_FREQ = 500

The frequency of results update

Type

int

SETUP = 'seq'

The setup of the dataset (‘seq’ or ‘step’)

Type

string

TASK = 'classification'

The type of prediction task (‘classification’ or ‘regression’)

Type

string

SEQ_LEN = 20

number of frames in each video

Type

int

PRED_TIME = [19]

The time steps where predictions are made

Type

list

INPUT_SHAPE = [3, 224, 224]

The shape of the input (excluding batch size and time dimension)

Type

list

OUTPUT_SIZE = 64

The size of the output

Type

int

DATA_PATH = 'LSA64'

path to the folder containing the data

Type

str

ENVS = ['001-002', '003-004', '005-006', '007-008', '009-010']

The environments of the dataset

Type

list

SWEEP_ENVS = [0, 1, 2, 3, 4]

The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps

Type

list

get_class_weight()

Compute class weight for class balanced training

Returns

list of weights of length OUTPUT_SIZE

Return type

list

class woods.datasets.HHAR(flags, training_hparams)

Bases: woods.datasets.Multi_Domain_Dataset

Heterogeneity Activity Recognition Dataset (HHAR)

This dataset is composed of measurements from wearable devices recorded during different activities. The goal is to classify those activities (stand, sit, walk, bike, stairs up, stairs down).

You can read more on the data itself and its provenance from its source:

Parameters
  • flags (argparse.Namespace) – argparse of training arguments

  • training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file

Note

This dataset needs to be downloaded and preprocessed. This can be done with the download.py script.

Resources:

N_STEPS = 5001

The number of training steps taken for this dataset

Type

int

CHECKPOINT_FREQ = 100

The frequency of results update

Type

int

SETUP = 'seq'

The setup of the dataset (‘seq’ or ‘step’)

Type

string

TASK = 'classification'

The type of prediction task (‘classification’ or ‘regression’)

Type

string

SEQ_LEN = 500

The sequence length of the dataset

Type

int

PRED_TIME = [499]

The time steps where predictions are made

Type

list

INPUT_SHAPE = [6]

The shape of the input (excluding batch size and time dimension)

Type

list

OUTPUT_SIZE = 6

The size of the output

Type

int

DATA_PATH = 'HHAR/HHAR.h5'

Path to the file containing the data

Type

str

ENVS = ['nexus4', 's3', 's3mini', 'lgwatch', 'gear']

The environments of the dataset

Type

list

SWEEP_ENVS = [0, 1, 2, 3, 4]

The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps

Type

list

woods.hyperparams module

Defining hyper parameters and their distributions for HPO

Summary

Functions:

ANDMask_hyper

ANDMask objective hparam definition

Basic_Fourier_model

Basic Fourier model hparam definition

Basic_Fourier_train

Basic Fourier model hparam definition

CAP_model

CAP model hparam definition

CAP_train

CAP model hparam definition

ERM_hyper

ERM objective hparam definition

Fish_hyper

Fish objective hparam definition

IGA_hyper

IGA objective hparam definition

IRM_hyper

IRM objective hparam definition

LSA64_model

LSA64 model hparam definition

LSA64_train

LSA64 model hparam definition

SANDMask_hyper

SANDMask objective hparam definition

SD_hyper

SD objective hparam definition

SEDFx_model

SEDFx model hparam definition

SEDFx_train

SEDFx model hparam definition

Spurious_Fourier_model

Spurious Fourier model hparam definition

Spurious_Fourier_train

Spurious Fourier model hparam definition

TCMNIST_seq_model

TCMNIST_seq model hparam definition

TCMNIST_seq_train

TCMNIST_seq model hparam definition

TCMNIST_step_model

TCMNIST_step model hparam definition

TCMNIST_step_train

TCMNIST_step model hparam definition

TMNIST_model

TMNIST model hparam definition

TMNIST_train

TMNIST model hparam definition

VREx_hyper

VREx objective hparam definition

get_model_hparams

Get the model related hyper parameters

get_objective_hparams

Get the objective related hyper parameters

get_training_hparams

Get training related hyper parameters (class_balance, weight_decay, lr, batch_size)

Reference
woods.hyperparams.get_training_hparams(dataset_name, seed, sample=False)

Get training related hyper parameters (class_balance, weight_decay, lr, batch_size)

Parameters
  • dataset_name (str) – dataset that will be trained on for the run

  • seed (int) – seed used if hyper parameter is sampled

  • sample (bool, optional) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

Raises

NotImplementedError – Dataset name not found

Returns

Dictionary with hyper parameters values

Return type

dict
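
The default-versus-sampled behaviour described above can be sketched as follows. The four hyper parameter names come from the description of get_training_hparams. The default values, the sampling distributions and the helper names are assumptions made for illustration, not the values used by the package.

```python
import random

# Hypothetical search space: name -> (default, sampler taking a random.Random).
SEARCH_SPACE = {
    "lr": (1e-3, lambda rng: 10 ** rng.uniform(-4.5, -2.5)),
    "weight_decay": (0.0, lambda rng: 10 ** rng.uniform(-6, -2)),
    "batch_size": (64, lambda rng: int(2 ** rng.uniform(3, 8))),
    "class_balance": (True, lambda rng: True),
}

def sketch_training_hparams(seed, sample=False):
    """Return defaults, or values drawn reproducibly from `seed` if sample=True."""
    rng = random.Random(seed)
    return {
        name: sampler(rng) if sample else default
        for name, (default, sampler) in SEARCH_SPACE.items()
    }

defaults = sketch_training_hparams(seed=0)
draw_a = sketch_training_hparams(seed=1, sample=True)
draw_b = sketch_training_hparams(seed=1, sample=True)  # identical to draw_a
```

Seeding a local random.Random instance, rather than the global generator, keeps each draw reproducible for a given seed without affecting other random state in the program.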

woods.hyperparams.Basic_Fourier_train(sample)

Basic Fourier model hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.Spurious_Fourier_train(sample)

Spurious Fourier model hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.TMNIST_train(sample)

TMNIST model hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.TCMNIST_seq_train(sample)

TCMNIST_seq model hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.TCMNIST_step_train(sample)

TCMNIST_step model hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.CAP_train(sample)

CAP model hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.SEDFx_train(sample)

SEDFx model hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.LSA64_train(sample)

LSA64 model hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.get_model_hparams(dataset_name)

Get the model related hyper parameters

Each dataset has its own model hyper parameter definition

Parameters
  • dataset_name (str) – dataset that will be trained on for the run

  • seed (int) – seed used if hyper parameter is sampled

  • sample (bool, optional) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

Raises

NotImplementedError – Dataset name not found

Returns

Dictionary with hyper parameters values

Return type

dict

woods.hyperparams.Basic_Fourier_model()

Basic Fourier model hparam definition

woods.hyperparams.Spurious_Fourier_model()

Spurious Fourier model hparam definition

woods.hyperparams.TMNIST_model()

TMNIST model hparam definition

woods.hyperparams.TCMNIST_seq_model()

TCMNIST_seq model hparam definition

woods.hyperparams.TCMNIST_step_model()

TCMNIST_step model hparam definition

woods.hyperparams.CAP_model()

CAP model hparam definition

woods.hyperparams.SEDFx_model()

SEDFx model hparam definition

woods.hyperparams.LSA64_model()

LSA64 model hparam definition

woods.hyperparams.get_objective_hparams(objective_name, seed, sample=False)

Get the objective related hyper parameters

Each objective has its own hyper parameter definition

Parameters
  • objective_name (str) – objective that will be used for the run

  • seed (int) – seed used if hyper parameter is sampled

  • sample (bool, optional) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

Raises

NotImplementedError – Objective name not found

Returns

Dictionary with hyper parameters values

Return type

dict

woods.hyperparams.ERM_hyper(sample)

ERM objective hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.IRM_hyper(sample)

IRM objective hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.VREx_hyper(sample)

VREx objective hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.SD_hyper(sample)

SD objective hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.IGA_hyper(sample)

IGA objective hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.ANDMask_hyper(sample)

ANDMask objective hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.Fish_hyper(sample)

Fish objective hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.hyperparams.SANDMask_hyper(sample)

SANDMask objective hparam definition

Parameters

sample (bool) – If ‘True’, hyper parameters are sampled randomly according to their given distributions. Defaults to ‘False’, in which case the default value is used.

woods.model_selection module

Defining the model selection strategies

Summary

Functions:

IID_validation

Perform the IID validation model selection on a single training run with NO TEST ENVIRONMENT and return the results

ensure_dict_path

Ensure that a path of a nested dictionary exists.

get_best_hparams

Get the best set of hyperparameters for a given record from a sweep and a selection method

get_chosen_test_acc

Get the test accuracy that will be chosen through the selection method for a given record from a sweep

test_domain_validation

Perform the test domain validation model selection on a single training run and return the results

train_domain_validation

Perform the train domain validation model selection on a single training run and return the results

Reference
woods.model_selection.ensure_dict_path(dict, key)

Ensure that a path of a nested dictionary exists.

If it does, return the nested dictionary within. If it does not, create a nested dictionary and return it.

Parameters
  • dict (dict) – Nested dictionary in which to ensure a path

  • key (str) – Key that must map to a dictionary

Returns

nested dictionary

Return type

dict
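
The behaviour described above amounts to a get-or-create on one level of a nested dictionary. A minimal sketch, written from the description rather than from the package source (ensure_dict_path_sketch is a hypothetical name):

```python
def ensure_dict_path_sketch(d, key):
    """Return d[key], creating an empty nested dict there first if it is missing."""
    if key not in d:
        d[key] = {}
    return d[key]

records = {}
inner = ensure_dict_path_sketch(records, "ERM")
inner["acc"] = 0.9
same = ensure_dict_path_sketch(records, "ERM")  # existing dict is returned as-is
```

Because the returned object is the nested dictionary itself, writing to it updates the parent in place. The same effect can be had with the built-in d.setdefault(key, {}).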

woods.model_selection.get_best_hparams(records, selection_method)

Get the best set of hyperparameters for a given record from a sweep and a selection method

The way model selection is performed is by computing the validation accuracy of all training checkpoints. The definition of the validation accuracy is given by the selection method. Then using these validation accuracies, we choose the best checkpoint and report the corresponding hyperparameters.

Parameters
  • records (dict) – Dictionary of records from a sweep

  • selection_method (str) – Selection method to use

Returns

  • dict – flags of the chosen model training run for all trial seeds

  • dict – hyperparameters of the chosen model for all trial seeds

  • dict – validation accuracy of the chosen model run for all trial seeds

  • dict – test accuracy of the chosen model run for all trial seeds

Return type

dict

woods.model_selection.get_chosen_test_acc(records, selection_method)

Get the test accuracy that will be chosen through the selection method for a given record from a sweep

The way model selection is performed is by computing the validation accuracy of all training checkpoints. The definition of the validation accuracy is given by the selection method. Then using these validation accuracies, we choose the best checkpoint and report the test accuracy linked to that checkpoint.

Parameters
  • records (dict) – Dictionary of records from a sweep

  • selection_method (str) – Selection method to use

Returns

  • float – validation accuracy of the chosen models averaged over all trial seeds

  • float – variance of the validation accuracy of the chosen models across all trial seeds

  • float – test accuracy of the chosen models averaged over all trial seeds

  • float – variance of the test accuracy of the chosen models across all trial seeds

Return type

float
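
The four returned values are a mean and a variance for each of the validation and test accuracies, taken across trial seeds. The sketch below shows that aggregation on made-up numbers. The layout of the chosen dictionary and the use of the population variance are assumptions, not details taken from the package.

```python
from statistics import mean, pvariance

# Hypothetical per-seed accuracies of the chosen models.
chosen = {
    0: {"val_acc": 0.80, "test_acc": 0.60},
    1: {"val_acc": 0.90, "test_acc": 0.70},
}
val = [r["val_acc"] for r in chosen.values()]
test = [r["test_acc"] for r in chosen.values()]

# (mean val acc, variance of val acc, mean test acc, variance of test acc)
summary = (mean(val), pvariance(val), mean(test), pvariance(test))
```

Here the mean validation accuracy is about 0.85 and the mean test accuracy about 0.65, each with a variance of about 0.0025.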

woods.model_selection.IID_validation(records)

Perform the IID validation model selection on a single training run with NO TEST ENVIRONMENT and return the results

The model selection is performed by computing the average accuracy over all domains for every training checkpoint and choosing the highest one.

best_step = argmax_{step in checkpoints}( mean(train_envs_acc) )

Parameters

records (dict) – Dictionary of records from a single training run

Returns

  • float – validation accuracy of the best checkpoint of the training run

  • float – validation accuracy of the best checkpoint of the training run

Return type

float

Note

This is ONLY for sweeps with no test environments.

woods.model_selection.train_domain_validation(records)

Perform the train domain validation model selection on a single training run and return the results

The model selection is performed by computing the average accuracy over the training domains for every training checkpoint and choosing the highest one.

best_step = argmax_{step in checkpoints}( mean(train_envs_acc) )

Parameters

records (dict) – Dictionary of records from a single training run

Returns

  • float – validation accuracy of the best checkpoint of the training run

  • float – test accuracy of the best checkpoint (highest validation accuracy) of the training run

Return type

float
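
The selection rule best_step = argmax over checkpoints of mean(train_envs_acc) can be sketched as follows. The layout of records used here (one entry per checkpoint step, holding a list of training-environment accuracies and a test accuracy) is an assumption made for illustration and is not the structure used by the package.

```python
def train_domain_validation_sketch(records):
    """records: {step: {"train_envs_acc": [...], "test_acc": float}} (assumed layout)."""
    def val_acc(step):
        accs = records[step]["train_envs_acc"]
        return sum(accs) / len(accs)

    # Pick the checkpoint with the highest mean training-domain accuracy.
    best_step = max(records, key=val_acc)
    return val_acc(best_step), records[best_step]["test_acc"]

records = {
    0:   {"train_envs_acc": [0.50, 0.60], "test_acc": 0.40},
    100: {"train_envs_acc": [0.90, 0.80], "test_acc": 0.55},
    200: {"train_envs_acc": [0.85, 0.80], "test_acc": 0.65},
}
val, test = train_domain_validation_sketch(records)
```

In this toy run, step 100 is selected (mean training-domain accuracy of about 0.85), so the reported test accuracy is 0.55 even though step 200 scores higher on the test environment. The test accuracy plays no part in the choice.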

woods.model_selection.test_domain_validation(records)

Perform the test domain validation model selection on a single training run and return the results

The model selection is performed with the test accuracy of ONLY THE LAST CHECKPOINT OF A TRAINING RUN, so this function simply returns the test accuracy of the last checkpoint.

best_step = test_acc[-1]

Parameters

records (dict) – Dictionary of records from a single training run

Returns

  • float – validation accuracy of the training run, which is also the test accuracy of the last checkpoint

  • float – test accuracy of the last checkpoint

Return type

float
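
Since only the last checkpoint is considered, this selection method reduces to a lookup. A minimal sketch, assuming a records layout with one entry per checkpoint step (last_checkpoint_validation is a hypothetical name and the layout is not the one used by the package):

```python
def last_checkpoint_validation(records):
    """records: {step: {"test_acc": float}} (assumed layout)."""
    last_step = max(records)  # the last checkpoint has the largest step index
    test_acc = records[last_step]["test_acc"]
    # The validation accuracy and the test accuracy are the same number here.
    return test_acc, test_acc

records = {
    0: {"test_acc": 0.40},
    100: {"test_acc": 0.70},
    200: {"test_acc": 0.65},
}
val, test = last_checkpoint_validation(records)
print(val, test)  # 0.65 0.65
```

Ignoring the better intermediate checkpoint at step 100 is deliberate in this scheme: no early stopping is performed on the test environment.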

woods.models module

Defining the architectures used for benchmarking algorithms

Summary

Classes:

ATTN_LSTM

A simple LSTM model with self attention

CRNN

Convolutional Recurrent Neural Network

EEGNet

The EEGNet model

LSTM

A simple LSTM model

MNIST_CNN

Hand-tuned architecture for extracting representation from MNIST images

MNIST_LSTM

A simple LSTM model taking inputs from a CNN.

deep4

The DEEP4 model

Functions:

get_model

Return the dataset class with the given name

Reference
woods.models.get_model(dataset, model_hparams)

Return the dataset class with the given name

Parameters
  • dataset (str) – name of the dataset

  • model_hparams (dict) – model hyperparameters

class woods.models.deep4(dataset, model_hparams)

Bases: torch.nn.modules.module.Module

The DEEP4 model

This is from the Braindecode package:

https://github.com/braindecode/braindecode

Parameters
  • dataset (Multi_Domain_Dataset) – dataset that we will be training on

  • model_hparams (dict) – The hyperparameters for the model.

input_size

The size of the inputs to the model (for a single time step).

Type

int

output_size

The size of the outputs of the model (number of classes).

Type

int

seq_len

The length of the sequences.

Type

int

forward(input, time_pred)

training: bool

class woods.models.EEGNet(dataset, model_hparams)

Bases: torch.nn.modules.module.Module

The EEGNet model

This is a very small model (~3k parameters).

This is from the Braindecode package:

https://github.com/braindecode/braindecode

Parameters
  • dataset (Multi_Domain_Dataset) – dataset that we will be training on

  • model_hparams (dict) – The hyperparameters for the model.

input_size

The size of the inputs to the model (for a single time step).

Type

int

output_size

The size of the outputs of the model (number of classes).

Type

int

seq_len

The length of the sequences.

Type

int

forward(input, time_pred)

training: bool

class woods.models.MNIST_CNN(input_shape)

Bases: torch.nn.modules.module.Module

Hand-tuned architecture for extracting representation from MNIST images

This was adapted from:

https://github.com/facebookresearch/DomainBed

In our context, it is used to extract the representation from the images which are fed to a recurrent model such as an LSTM

Parameters
  • dataset (Multi_Domain_Dataset) – dataset that we will be training on

  • model_hparams (dict) – The hyperparameters for the model.

  • input_size (int, optional) – The size of the input to the model. Defaults to None. If None, the input size is calculated from the dataset.

EMBED_DIM = 32

Size of the output representation

Type

int

CNN_OUT_DIM = 288

Size of the representation after convolution, but before the fully connected (FC) layers

Type

int

forward(x)

Forward pass through the model

Parameters

x (torch.Tensor) – The input to the model.

Returns

The output representation of the model.

Return type

torch.Tensor

training: bool
class woods.models.LSTM(dataset, model_hparams, input_size=None)

Bases: torch.nn.modules.module.Module

A simple LSTM model

Parameters
  • dataset (Multi_Domain_Dataset) – dataset that we will be training on

  • model_hparams (dict) – The hyperparameters for the model.

  • input_size (int, optional) – The size of the input to the model. Defaults to None. If None, the input size is calculated from the dataset.

state_size

The size of the hidden state of the LSTM.

Type

int

recurrent_layers

The number of recurrent layers stacked on each other.

Type

int

hidden_depth

The number of hidden layers of the classifier MLP (after LSTM).

Type

int

hidden_width

The width of the hidden layers of the classifier MLP (after LSTM).

Type

int

Notes

All attributes need to be in the model_hparams dictionary.

forward(input, time_pred)

Forward pass of the model

Parameters
  • input (torch.Tensor) – The input to the model.

  • time_pred (torch.Tensor) – The time prediction of the input.

Returns

The output of the model.

Return type

torch.Tensor

initHidden(batch_size, device)

Initialize the hidden state of the LSTM with a normal distribution

Parameters
  • batch_size (int) – The batch size of the model.

  • device (torch.device) – The device to use.

training: bool
class woods.models.MNIST_LSTM(dataset, model_hparams, input_size=None)

Bases: torch.nn.modules.module.Module

A simple LSTM model taking inputs from a CNN. (see: MNIST_CNN)

Parameters
  • dataset (Multi_Domain_Dataset) – dataset that we will be training on

  • model_hparams (dict) – The hyperparameters for the model.

  • input_size (int, optional) – The size of the input to the model. Defaults to None. If None, the input size is calculated from the dataset.

state_size

The size of the hidden state of the LSTM.

Type

int

recurrent_layers

The number of recurrent layers stacked on each other.

Type

int

hidden_depth

The number of hidden layers of the classifier MLP (after LSTM).

Type

int

hidden_width

The width of the hidden layers of the classifier MLP (after LSTM).

Type

int

Notes

All attributes need to be in the model_hparams dictionary.

forward(input, time_pred)

Forward pass of the model

Parameters
  • input (torch.Tensor) – The input to the model.

  • time_pred (torch.Tensor) – The time prediction of the input.

Returns

The output of the model.

Return type

torch.Tensor

initHidden(batch_size, device)

Initialize the hidden state of the LSTM with a normal distribution

Parameters
  • batch_size (int) – The batch size of the model.

  • device (torch.device) – The device to use.

training: bool
class woods.models.ATTN_LSTM(dataset, model_hparams, input_size=None)

Bases: torch.nn.modules.module.Module

A simple LSTM model with self attention

Parameters
  • dataset (Multi_Domain_Dataset) – dataset that we will be training on

  • model_hparams (dict) – The hyperparameters for the model.

  • input_size (int, optional) – The size of the input to the model. Defaults to None. If None, the input size is calculated from the dataset.

state_size

The size of the hidden state of the LSTM.

Type

int

recurrent_layers

The number of recurrent layers stacked on each other.

Type

int

hidden_depth

The number of hidden layers of the classifier MLP (after LSTM).

Type

int

hidden_width

The width of the hidden layers of the classifier MLP (after LSTM).

Type

int

Notes

All attributes need to be in the model_hparams dictionary.

forward(input, time_pred)

Forward pass of the model

Parameters
  • input (torch.Tensor) – The input to the model.

  • time_pred (torch.Tensor) – The time prediction of the input.

Returns

The output of the model.

Return type

torch.Tensor

initHidden(batch_size, device)

Initialize the hidden state of the LSTM with a normal distribution

Parameters
  • batch_size (int) – The batch size of the model.

  • device (torch.device) – The device to use.

training: bool
class woods.models.CRNN(dataset, model_hparams, input_size=None)

Bases: torch.nn.modules.module.Module

Convolutional Recurrent Neural Network

This is inspired by the repository:

https://github.com/HHTseng/video-classification/

But here we use the ResNet50 architecture pretrained on ImageNet, and we use the ATTN_LSTM model on top of the outputs of the ResNet50 to make predictions.

Parameters
  • dataset (Multi_Domain_Dataset) – dataset that we will be training on

  • model_hparams (dict) – The hyperparameters for the model.

fc_hidden1

The size of the first hidden layer of the CNN embedding.

Type

int

fc_hidden2

The size of the second hidden layer of the CNN embedding.

Type

int

CNN_embed_dim

The size of the CNN embedding.

Type

int

forward(input, time_pred)

Forward pass through CRNN

Parameters
  • input (torch.Tensor) – Tensor of shape [batch_size, seq_len, input_size]

  • time_pred (torch.Tensor) – time prediction indexes

training: bool

woods.objectives module

Defining domain generalization algorithms

Summary

Classes:

ANDMask

Learning Explanations that are Hard to Vary [https://arxiv.org/abs/2009.00329] AND-Mask implementation from [https://github.com/gibipara92/learning-explanations-hard-to-vary]

ERM

Empirical Risk Minimization (ERM)

Fish

Implementation of Fish, as seen in Gradient Matching for Domain Generalization, Shi et al. 2021.

IGA

Inter-environmental Gradient Alignment From https://arxiv.org/abs/2008.01883v2

IRM

Invariant Risk Minimization (IRM)

Objective

A subclass of Objective implements a domain generalization algorithm.

SANDMask

Learning Explanations that are Hard to Vary [https://arxiv.org/abs/2009.00329] AND-Mask implementation from [https://github.com/gibipara92/learning-explanations-hard-to-vary]

SD

Gradient Starvation: A Learning Proclivity in Neural Networks Equation 25 from [https://arxiv.org/pdf/2011.09468.pdf]

VREx

V-REx Objective from http://arxiv.org/abs/2003.00688

Functions:

get_objective_class

Return the objective class with the given name.

Reference
woods.objectives.get_objective_class(objective_name)

Return the objective class with the given name.
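
A lookup by name of this kind is often built on a small registry of classes. The sketch below only illustrates that pattern and is not taken from the WOODS source: the ERM and IRM classes are empty stand-ins, and OBJECTIVES and get_objective_class_sketch are hypothetical names.

```python
class ERM:
    """Empty stand-in for an objective class (illustrative only)."""


class IRM(ERM):
    """Empty stand-in for a second objective class (illustrative only)."""


# Hypothetical registry mapping objective names to their classes.
OBJECTIVES = {cls.__name__: cls for cls in (ERM, IRM)}


def get_objective_class_sketch(objective_name):
    """Return the class registered under objective_name."""
    if objective_name not in OBJECTIVES:
        raise NotImplementedError(f"Objective not found: {objective_name}")
    return OBJECTIVES[objective_name]


objective_class = get_objective_class_sketch("IRM")
print(objective_class.__name__)
```

Raising on an unknown name, rather than returning None, surfaces a misspelled objective before any training starts.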

class woods.objectives.Objective(hparams)

Bases: torch.nn.modules.module.Module

A subclass of Objective implements a domain generalization algorithm. Subclasses should implement the following:
  • update

  • predict

backward(losses)

Computes the gradients for the model update

Admits a list of unlabeled losses from the test domains: losses

training: bool
class woods.objectives.ERM(model, dataset, loss_fn, optimizer, hparams)

Bases: woods.objectives.Objective

Empirical Risk Minimization (ERM)

predict(all_x, ts, device)
update(minibatches_device, dataset, device)
training: bool
class woods.objectives.IRM(model, dataset, loss_fn, optimizer, hparams)

Bases: woods.objectives.ERM

Invariant Risk Minimization (IRM)

update(minibatches_device, dataset, device)
training: bool
class woods.objectives.VREx(model, dataset, loss_fn, optimizer, hparams)

Bases: woods.objectives.ERM

V-REx Objective from http://arxiv.org/abs/2003.00688

update(minibatches_device, dataset, device)
training: bool
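
As a rough illustration of the idea behind V-REx, the penalized objective can be written as the mean of the per-environment losses plus a weighted variance of those losses. The sketch below works on plain floats and assumes the population variance; vrex_objective and penalty_weight are hypothetical names, and this is not the WOODS update code.

```python
def vrex_objective(env_losses, penalty_weight):
    """Mean loss across environments plus a weighted variance penalty.

    env_losses: list of floats, one loss per training environment.
    penalty_weight: float, strength of the variance penalty (assumed name).
    """
    n_envs = len(env_losses)
    mean_loss = sum(env_losses) / n_envs
    # Population variance of the losses across environments.
    variance = sum((loss - mean_loss) ** 2 for loss in env_losses) / n_envs
    return mean_loss + penalty_weight * variance


# Equal losses incur no penalty; unequal losses are penalized.
print(vrex_objective([2.0, 2.0], penalty_weight=10.0))
print(vrex_objective([1.0, 3.0], penalty_weight=10.0))
```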
class woods.objectives.SD(model, dataset, loss_fn, optimizer, hparams)

Bases: woods.objectives.ERM

Gradient Starvation: A Learning Proclivity in Neural Networks Equation 25 from [https://arxiv.org/pdf/2011.09468.pdf]

update(minibatches_device, dataset, device)
training: bool
class woods.objectives.ANDMask(model, dataset, loss_fn, optimizer, hparams)

Bases: woods.objectives.ERM

Learning Explanations that are Hard to Vary [https://arxiv.org/abs/2009.00329] AND-Mask implementation from [https://github.com/gibipara92/learning-explanations-hard-to-vary]

mask_grads(tau, gradients, params)
update(minibatches_device, dataset, device)
training: bool
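
The core of the AND-Mask idea is to keep a gradient component only when its sign agrees across environments. The sketch below shows that masking step on plain Python lists, under the assumption that agreement is measured by the absolute mean of the signs compared to a threshold tau. The name and_mask is hypothetical, and any rescaling of the surviving components is omitted.

```python
def and_mask(env_grads, tau):
    """Average per-environment gradients, zeroing components whose signs disagree.

    env_grads: list with one entry per environment, each a list of floats
        holding the gradient components for the same parameters.
    tau: agreement threshold in [0, 1]; 1.0 requires unanimous signs.
    """
    n_envs = len(env_grads)
    masked = []
    for components in zip(*env_grads):
        # Mean of the signs (+1, 0, -1) of this component across environments.
        sign_mean = sum((g > 0) - (g < 0) for g in components) / n_envs
        average = sum(components) / n_envs
        masked.append(average if abs(sign_mean) >= tau else 0.0)
    return masked


# The first component agrees in sign across environments, the second does not.
print(and_mask([[1.0, 1.0], [3.0, -1.0]], tau=1.0))
```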
class woods.objectives.IGA(model, dataset, loss_fn, optimizer, hparams)

Bases: woods.objectives.ERM

Inter-environmental Gradient Alignment From https://arxiv.org/abs/2008.01883v2

update(minibatches_device, dataset, device)
training: bool
class woods.objectives.Fish(model, dataset, loss_fn, optimizer, hparams)

Bases: woods.objectives.ERM

Implementation of Fish, as seen in Gradient Matching for Domain Generalization, Shi et al. 2021.

create_copy(device)
update(minibatches_device, dataset, device)
training: bool
class woods.objectives.SANDMask(model, dataset, loss_fn, optimizer, hparams)

Bases: woods.objectives.ERM

Learning Explanations that are Hard to Vary [https://arxiv.org/abs/2009.00329] AND-Mask implementation from [https://github.com/gibipara92/learning-explanations-hard-to-vary]

mask_grads(tau, k, gradients, params, device)

Masks take values in [0, 1] and form a set of updates for each parameter based on the agreement of gradients coming from different environments.

update(minibatches_device, dataset, device)
training: bool

woods.train module

Defining the training functions that are used to train and evaluate models

Summary

Functions:

get_accuracies

Get accuracies for all splits using fast loaders

get_split_accuracy_seq

Get accuracy and loss for a dataset that is of the seq setup

get_split_accuracy_step

Get accuracy and loss for a dataset that is of the step setup

train

Train a model on a given dataset with a given objective

train_step

Train a single training step for a model

Reference
woods.train.train_step(model, objective, dataset, in_loaders_iter, device)

Train a single training step for a model

Parameters
  • model – nn model defined in a models.py

  • objective – objective we are using for training

  • dataset – dataset object we are training on

  • in_loaders_iter – iterable of iterable of data loaders

  • device – device on which we are training

woods.train.train(flags, training_hparams, model, objective, dataset, device)

Train a model on a given dataset with a given objective

Parameters
  • flags – flags from argparse

  • training_hparams – training hyperparameters

  • model – nn model defined in a models.py

  • objective – objective we are using for training

  • dataset – dataset object we are training on

  • device – device on which we are training

woods.train.get_accuracies(objective, dataset, device)

Get accuracies for all splits using fast loaders

Parameters
  • objective – objective we are using for training

  • dataset – dataset object we are training on

  • device – device on which we are training

woods.train.get_split_accuracy_seq(objective, dataset, loader, device)

Get accuracy and loss for a dataset that is of the seq setup

Parameters
  • objective – objective we are using for training

  • dataset – dataset object we are training on

  • loader – data loader of which we want the accuracy

  • device – device on which we are training

woods.train.get_split_accuracy_step(objective, dataset, loader, device)

Get accuracy and loss for a dataset that is of the step setup

Parameters
  • objective – objective we are using for training

  • dataset – dataset object we are training on

  • loader – data loader of which we want the accuracy

  • device – device on which we are training

woods.utils module

Set of utility functions used throughout the package

Summary

Functions:

check_file_integrity

Check for integrity of files from a hyper parameter sweep

get_cmap

Returns a function that maps each index in 0, 1, ..., n-1 to a distinct RGB color; the keyword argument name must be a standard mpl colormap name.

get_job_name

Generates the name of the output file for a training run as a function of the config

get_latex_table

Construct and export a LaTeX table from a PrettyTable.

plot_results

Plot results - accuracy and loss - w.r.t. training step

print_results

Print results from a results json file

setup_pretty_table

Set up the printed table that shows the results at each checkpoint

Reference
woods.utils.get_cmap(n, name='hsv')

Returns a function that maps each index in 0, 1, …, n-1 to a distinct RGB color; the keyword argument name must be a standard mpl colormap name.
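
To make the index-to-color mapping concrete, here is a standard-library stand-in that spreads n indices evenly over the hue circle. It is only a sketch of the idea: it uses colorsys instead of a matplotlib colormap, and hsv_color is a hypothetical name.

```python
import colorsys


def hsv_color(index, n):
    """Map index in 0, 1, ..., n-1 to an RGB tuple with evenly spaced hues."""
    return colorsys.hsv_to_rgb(index / n, 1.0, 1.0)


# Five indices give five distinct colors.
for i in range(5):
    print(i, hsv_color(i, 5))
```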

woods.utils.plot_results(results_path)

Plot results - accuracy and loss - w.r.t. training step

Parameters

results_path (str) – path to a results json file coming from a training run

woods.utils.print_results(results_path)

Print results from a results json file

Parameters

results_path (str) – path to a results json file coming from a training run

woods.utils.get_job_name(flags)

Generates the name of the output file for a training run as a function of the config

Seq setup: <objective>_<dataset>_<test_env>_H<hparams_seed>_T<trial_seed>.json

Step setup: <objective>_<dataset>_<test_env>_H<hparams_seed>_T<trial_seed>_S<test_step>.json

Parameters

flags (dict) – dictionary of the config for a training run

Returns

name of the output json file of the training run

Return type

str
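
The two naming patterns above can be sketched as a small formatting function. This is an illustration only: it assumes the keys of flags match the placeholder names in the patterns, and job_name_sketch is a hypothetical name.

```python
def job_name_sketch(flags):
    """Build the output file name from a config dictionary.

    Assumes flags holds the keys objective, dataset, test_env, hparams_seed
    and trial_seed, plus test_step for the step setup.
    """
    name = "{objective}_{dataset}_{test_env}_H{hparams_seed}_T{trial_seed}".format(**flags)
    # The step setup appends the test step to the name.
    if flags.get("test_step") is not None:
        name += "_S{}".format(flags["test_step"])
    return name + ".json"


seq_flags = {"objective": "ERM", "dataset": "HHAR", "test_env": 0,
             "hparams_seed": 1, "trial_seed": 2}
print(job_name_sketch(seq_flags))
print(job_name_sketch({**seq_flags, "test_step": 3}))
```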

woods.utils.check_file_integrity(results_dir)

Check for integrity of files from a hyper parameter sweep

Parameters

results_dir (str) – directory where sweep results are stored

Raises

AssertionError – If there is a sweep file missing
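
A check of this kind can be sketched as a comparison between the file names a sweep is expected to produce and the files actually present. The sketch assumes the expected names are already known; check_files_present and expected_names are hypothetical and do not reflect how WOODS derives the expected files.

```python
import os


def check_files_present(results_dir, expected_names):
    """Raise AssertionError if any expected sweep file is missing."""
    present = set(os.listdir(results_dir))
    missing = sorted(set(expected_names) - present)
    assert not missing, f"Missing sweep files: {missing}"
```

Listing every missing file in the error message makes it easier to relaunch only the runs that failed.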

woods.utils.setup_pretty_table(flags)

Set up the printed table that shows the results at each checkpoint

Parameters
  • flags (Namespace) – Namespace of the argparser containing the config of the training run

  • dataset (Multi_Domain_Dataset) – Dataset Object

Returns

an instance of prettytable.PrettyTable

Return type

PrettyTable

woods.utils.get_latex_table(table, caption=None, label=None)

Construct and export a LaTeX table from a PrettyTable.

Inspired by: https://github.com/adasilva/prettytable

Parameters
  • table (PrettyTable) – the PrettyTable to export

  • caption (str, optional) – a caption for the table. Defaults to None.

  • label (str, optional) – a latex reference tag. Defaults to None.

Returns

printable latex string

Return type

str
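
The conversion can be pictured with a dependency-free sketch that takes a header and rows directly instead of a PrettyTable. The function name latex_table_sketch and the exact LaTeX layout are assumptions made for illustration and may differ from what get_latex_table emits.

```python
def latex_table_sketch(header, rows, caption=None, label=None):
    """Return a printable LaTeX table built from a header and a list of rows."""
    lines = ["\\begin{table}", "\\centering"]
    # One centered column per header field.
    lines.append("\\begin{tabular}{" + "c" * len(header) + "}")
    lines.append(" & ".join(header) + " \\\\")
    lines.append("\\hline")
    for row in rows:
        lines.append(" & ".join(str(cell) for cell in row) + " \\\\")
    lines.append("\\end{tabular}")
    if caption is not None:
        lines.append("\\caption{" + caption + "}")
    if label is not None:
        lines.append("\\label{" + label + "}")
    lines.append("\\end{table}")
    return "\n".join(lines)


print(latex_table_sketch(["Env", "Acc"], [["0", 0.9], ["1", 0.8]], caption="Results"))
```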

woods.command_launchers

Set of functions used to launch lists of python scripts

woods.datasets

Defining the benchmarks for OoD generalization in time-series

woods.hyperparams

Defining hyper parameters and their distributions for HPO

woods.model_selection

Defining the model selection strategies

woods.models

Defining the architectures used for benchmarking algorithms

woods.objectives

Defining domain generalization algorithms

woods.train

Defining the training functions that are used to train and evaluate models

woods.utils

Set of utility functions used throughout the package

woods.scripts

woods.scripts.compile_results module

Compile results from a hyperparameter sweep and perform model selection strategies

See https://woods.readthedocs.io/en/latest/running_a_sweep.html to learn more about usage.

woods.scripts.download module

Directly download the preprocessed data

Summary

Functions:

CAP

Download the CAP dataset

HHAR

Download the HHAR dataset

LSA64

Download the LSA64 dataset

PCL

Download the PCL dataset

SEDFx

Download the SEDFx dataset

Reference
woods.scripts.download.CAP(data_path, mode)

Download the CAP dataset

woods.scripts.download.SEDFx(data_path, mode)

Download the SEDFx dataset

woods.scripts.download.PCL(data_path, mode)

Download the PCL dataset

woods.scripts.download.HHAR(data_path, mode)

Download the HHAR dataset

woods.scripts.download.LSA64(data_path, mode)

Download the LSA64 dataset

woods.scripts.fetch_and_preprocess module

This module lets you run the raw download and preprocessing of the data yourself

You can directly download the preprocessed data with the download.py module. This module exists only for transparency about how the datasets are preprocessed. It also gives the most courageous users the opportunity to change the preprocessing of the data out of curiosity.

Note

The intention of releasing the benchmarks of woods is to investigate the performance of domain generalization techniques. Although some preprocessing tricks could lead to better OoD performance, this approach is not encouraged when using the WOODS benchmarks.

Summary

Classes:

CAP

Fetch the data from the PhysioNet website and preprocess it

PCL

Fetch the data using moabb and preprocess it

SEDFx

Fetch the PhysioNet Sleep-EDF Database Expanded Dataset and preprocess it

Functions:

HHAR

Fetch and preprocess the HHAR dataset

LSA64

Fetch the LSA64 dataset and preprocess it

Reference
class woods.scripts.fetch_and_preprocess.CAP(flags)

Bases: object

Fetch the data from the PhysioNet website and preprocess it

The download is automatic but if you want to manually download:

wget -r -N -c -np https://physionet.org/files/capslpdb/1.0.0/
Parameters

flags (argparse.Namespace) – The flags of the script

files = [['physionet.org/files/capslpdb/1.0.0/nfle29', 'physionet.org/files/capslpdb/1.0.0/nfle7', 'physionet.org/files/capslpdb/1.0.0/nfle1', 'physionet.org/files/capslpdb/1.0.0/nfle5', 'physionet.org/files/capslpdb/1.0.0/n11', 'physionet.org/files/capslpdb/1.0.0/rbd18', 'physionet.org/files/capslpdb/1.0.0/plm9', 'physionet.org/files/capslpdb/1.0.0/nfle35', 'physionet.org/files/capslpdb/1.0.0/nfle36', 'physionet.org/files/capslpdb/1.0.0/nfle2', 'physionet.org/files/capslpdb/1.0.0/nfle38', 'physionet.org/files/capslpdb/1.0.0/nfle39', 'physionet.org/files/capslpdb/1.0.0/nfle21'], ['physionet.org/files/capslpdb/1.0.0/nfle10', 'physionet.org/files/capslpdb/1.0.0/nfle11', 'physionet.org/files/capslpdb/1.0.0/nfle19', 'physionet.org/files/capslpdb/1.0.0/nfle26', 'physionet.org/files/capslpdb/1.0.0/nfle23'], ['physionet.org/files/capslpdb/1.0.0/rbd8', 'physionet.org/files/capslpdb/1.0.0/rbd5', 'physionet.org/files/capslpdb/1.0.0/rbd11', 'physionet.org/files/capslpdb/1.0.0/ins8', 'physionet.org/files/capslpdb/1.0.0/rbd10'], ['physionet.org/files/capslpdb/1.0.0/n3', 'physionet.org/files/capslpdb/1.0.0/nfle30', 'physionet.org/files/capslpdb/1.0.0/nfle13', 'physionet.org/files/capslpdb/1.0.0/nfle18', 'physionet.org/files/capslpdb/1.0.0/nfle24', 'physionet.org/files/capslpdb/1.0.0/nfle4', 'physionet.org/files/capslpdb/1.0.0/nfle14', 'physionet.org/files/capslpdb/1.0.0/nfle22', 'physionet.org/files/capslpdb/1.0.0/n5', 'physionet.org/files/capslpdb/1.0.0/nfle37'], ['physionet.org/files/capslpdb/1.0.0/nfle3', 'physionet.org/files/capslpdb/1.0.0/nfle40', 'physionet.org/files/capslpdb/1.0.0/nfle15', 'physionet.org/files/capslpdb/1.0.0/nfle12', 'physionet.org/files/capslpdb/1.0.0/nfle28', 'physionet.org/files/capslpdb/1.0.0/nfle34', 'physionet.org/files/capslpdb/1.0.0/nfle16', 'physionet.org/files/capslpdb/1.0.0/nfle17']]
remove_useless(flags)

Remove useless files

string_2_label(string)

Convert string to label

read_annotation(txt_path)

Read annotation file for the CAP dataset

gather_EEG(flags)

Gets the intersection of common channels across all machines

Returns

list of channels (strings)

Return type

list
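
Taking the intersection of channels across recordings amounts to a set intersection over the channel names of each recording. The sketch below shows that step on plain lists of strings. The name common_channels and the example channel names are illustrative, and reading the channel names from the recording files is left out.

```python
def common_channels(recordings):
    """Return the sorted channel names shared by every recording.

    recordings: list of lists of channel names (strings), one list per recording.
    """
    channel_sets = [set(channels) for channels in recordings]
    if not channel_sets:
        return []
    return sorted(set.intersection(*channel_sets))


print(common_channels([["C3", "C4", "Fp1"], ["C4", "C3"], ["C3", "O1", "C4"]]))
```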

class woods.scripts.fetch_and_preprocess.SEDFx(flags)

Bases: object

Fetch the PhysioNet Sleep-EDF Database Expanded Dataset and preprocess it

The download is automatic but if you want to manually download:

wget -r -N -c -np https://physionet.org/files/sleep-edfx/1.0.0/
Parameters

flags (argparse.Namespace) – The flags of the script

remove_useless(flags)

Remove useless files

string_2_label(string)

Convert string to label

read_annotation(txt_path)

Read annotation file

gather_EEG(flags)

Gets the intersection of common channels across all machines

Returns

list of channels (strings)

Return type

list

woods.scripts.fetch_and_preprocess.HHAR(flags)

Fetch and preprocess the HHAR dataset

Note

You need to manually download the HHAR dataset from the source and place it in the data folder in order to preprocess it yourself:

Parameters

flags (argparse.Namespace) – The flags of the script

woods.scripts.fetch_and_preprocess.LSA64(flags)

Fetch the LSA64 dataset and preprocess it

Note

You need to manually download the LSA64 dataset from the source and place it in the data folder in order to preprocess it yourself:

Parameters

flags (argparse.Namespace) – The flags of the script

class woods.scripts.fetch_and_preprocess.PCL(flags)

Bases: object

Fetch the data using moabb and preprocess it

Source of MOABB:

http://moabb.neurotechx.com/docs/index.html

Parameters

flags (argparse.Namespace) – The flags of the script

Note

This is hell to run. It takes a while to download and requires a lot of RAM.

relabel(l)

Converts labels from str to int

woods.scripts.hparams_sweep module

Perform a hyper parameter sweep

See https://woods.readthedocs.io/en/latest/running_a_sweep.html for usage.

Summary

Functions:

make_args_list

Creates a list of commands to launch all of the training runs in the hyper parameter sweep

Reference
woods.scripts.hparams_sweep.make_args_list(flags)

Creates a list of commands to launch all of the training runs in the hyper parameter sweep

Heavily inspired from https://github.com/facebookresearch/DomainBed/blob/9e864cc4057d1678765ab3ecb10ae37a4c75a840/domainbed/scripts/sweep.py#L98

Parameters

flags (dict) – arguments of the hyper parameter sweep

Returns

A list of strings, the terminal commands that call the training runs of the sweep, and a list of dict where the dicts are the arguments for the training runs of the sweep

Return type

list
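
A sweep of this kind is typically the Cartesian product of the swept dimensions. The sketch below builds one command string and one argument dictionary per combination. Everything in it is an assumption made for illustration: make_args_list_sketch, base_command and the argument names are hypothetical, and the real flags of the training script may differ.

```python
import itertools


def make_args_list_sketch(base_command, objectives, test_envs, n_hparams, n_trials):
    """Return (commands, args_list) covering every combination in the sweep."""
    commands, args_list = [], []
    combos = itertools.product(objectives, test_envs, range(n_hparams), range(n_trials))
    for objective, test_env, hparams_seed, trial_seed in combos:
        # Hypothetical argument names, chosen only for this example.
        args = {"objective": objective, "test_env": test_env,
                "hparams_seed": hparams_seed, "trial_seed": trial_seed}
        flags = " ".join(f"--{key} {value}" for key, value in args.items())
        commands.append(f"{base_command} {flags}")
        args_list.append(args)
    return commands, args_list


commands, args_list = make_args_list_sketch("python3 train.py", ["ERM", "IRM"], [0, 1], 2, 3)
print(len(commands))
print(commands[0])
```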

woods.scripts.main module

Script used for the main functionalities of the woods package

There are two modes of operation:
  • training mode: trains a model on a given dataset with a given test environment using a given algorithm

  • test mode: tests an existing model on a given dataset with a given test environment using a given algorithm

Raises

NotImplementedError – Some part of the code is not implemented yet

woods.scripts.visualize_results module

Visualize logs from a training run

woods.scripts.compile_results

Compile results from a hyperparameter sweep and perform model selection strategies

woods.scripts.download

Directly download the preprocessed data

woods.scripts.fetch_and_preprocess

Run the raw download and preprocessing of the data yourself

woods.scripts.hparams_sweep

Perform a hyper parameter sweep

woods.scripts.main

Script used for the main functionalities of the woods package

woods.scripts.visualize_results

Visualize logs from a training run

Indices and tables