WOODS
WOODS is a project aimed at investigating the implications of Out-of-Distribution generalization problems in sequential data, along with their possible solutions. To that end, we offer a DomainBed-like suite to test domain generalization algorithms on our WILDS-like set of sequential data benchmarks, which are inspired by real-world problems across a wide array of modalities common in modern machine learning.
Quick Installation
WOODS is still under active development, so it is currently only available by cloning the repository to your local machine.
Installing requirements
With Conda
First, have conda installed on your machine (see their installation page if that is not the case). Then create a conda environment with the following command:
conda create --name woods python=3.7
Then activate the environment with the following command:
conda activate woods
With venv
You can use the Python virtual environment manager virtualenv to create a virtual environment for the project. IMPORTANT: Make sure you are using Python >= 3.7.
virtualenv /path/to/woods/env
Then activate the virtual environment with the following command:
source /path/to/woods/env/bin/activate
Clone locally
Once you’ve created the virtual environment, clone the repository.
git clone https://github.com/jc-audet/WOODS.git
cd WOODS
Then install the requirements with the following command:
pip install -r requirements.txt
Run tests
Run the tests to make sure everything is in order. More tests are coming soon.
pytest
Downloading the data
Before running any training run, we need to make sure we have the data to train on.
Direct Preprocessed Download
The repository offers direct download to the preprocessed data which is the quickest and most efficient way to get started. To download the preprocessed data, run the download module of the woods.scripts package and specify the dataset you want to download:
python3 -m woods.scripts.download DATASET \
--data_path ./path/to/data/directory
Source Download and Preprocess
For the sake of transparency, WOODS also provides the preprocessing scripts we used for all datasets in the fetch_and_preprocess module of the woods.scripts package. You can use the same module to download the raw data from the original source and run the preprocessing yourself. DISCLAIMER: Some of the datasets take a long time to preprocess, especially the EEG datasets.
python3 -m woods.scripts.fetch_and_preprocess DATASET \
--data_path ./path/to/data/directory
Datasets Info
The following table lists the available datasets and their corresponding raw and preprocessed sizes.
Datasets | Modality | Requires Download | Preprocessed Size | Raw Size |
---|---|---|---|---|
Basic_Fourier | 1D Signal | No | - | - |
Spurious_Fourier | 1D Signal | No | - | - |
TMNIST | Video | Yes, but done automatically | 0.11 GB | - |
TCMNIST_seq | Video | Yes, but done automatically | 0.11 GB | - |
TCMNIST_step | Video | Yes, but done automatically | 0.11 GB | - |
CAP | EEG | Yes | 9.1 GB | 40.1 GB |
SEDFx | EEG | Yes | 10.7 GB | 8.1 GB |
MI | EEG | Yes | 3.0 GB | 13.5 GB |
LSA64 | Video | Yes | 0.26 GB | 1.5 GB |
HAR | Sensor | Yes | 0.16 GB | 3.1 GB |
Running a Sweep
In WOODS, we evaluate the performance of a domain generalization algorithm by running a sweep over the hyper parameters definition space and then performing model selection on the training runs conducted during the sweep.
Running the sweep
Once we have the data, we can start running the sweep. The hparams_sweep module of the woods.scripts package provides the command line interface to create the list of jobs to run, which is then passed to the command launcher to launch all jobs. The list of jobs includes all of the necessary training runs to get the results from all trial seeds, and hyper parameter seeds for a given algorithm, dataset and test domain.
All datasets have a SWEEP_ENVS attribute that defines which test environments are included in the sweep. For example, the SWEEP_ENVS attribute of the Spurious Fourier dataset contains only 1 test domain, while for most real datasets SWEEP_ENVS consists of all domains.
In other words, for every combination of (algorithm, dataset, test environment) we train 20 different hyper parameter configurations on which we investigate 3 different trial seeds. This means that for every combination of (algorithm, dataset, test environment) we run 20 * 3 = 60 training runs.
python3 -m woods.scripts.hparams_sweep \
--dataset Spurious_Fourier TCMNIST_seq \
--objective ERM IRM \
--save_path ./results \
--launcher local
Here we are using the local launcher to run the jobs locally, which is the simplest launcher. We also offer other launchers in the command_launchers module, such as slurm_launcher, which is a parallel job launcher for the SLURM workload manager.
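The job count described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual hparams_sweep implementation: the function name `enumerate_jobs` and the `sweep_envs` mapping are hypothetical, and the defaults of 20 hyper parameter seeds and 3 trial seeds are taken from the text.

```python
from itertools import product

def enumerate_jobs(objectives, datasets, sweep_envs, n_hparams=20, n_trials=3):
    """List one (objective, dataset, test_env, hparams_seed, trial_seed) tuple per training run.

    `sweep_envs` maps each dataset name to its list of test environments.
    """
    jobs = []
    for objective, dataset in product(objectives, datasets):
        for test_env in sweep_envs[dataset]:
            for hparams_seed, trial_seed in product(range(n_hparams), range(n_trials)):
                jobs.append((objective, dataset, test_env, hparams_seed, trial_seed))
    return jobs

jobs = enumerate_jobs(
    objectives=["ERM", "IRM"],
    datasets=["Spurious_Fourier", "TCMNIST_seq"],
    # Both datasets are documented with SWEEP_ENVS = [0]
    sweep_envs={"Spurious_Fourier": [0], "TCMNIST_seq": [0]},
)
print(len(jobs))  # 2 objectives * 2 datasets * 1 test env * 20 * 3 = 240
```

Each (algorithm, dataset, test environment) combination contributes 60 runs, so the example sweep above amounts to 4 combinations of 60 runs each.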
Compiling the results
Once the sweep is finished, we can compile the results. The compile_results module of the woods.scripts package provides the command line interface to compile the results. The --latex option is used to generate a LaTeX table.
python3 -m woods.scripts.compile_results \
--results_dir path/to/results \
--latex
It is also possible to compile the results from multiple directories containing complementary sweeps results. This will put all of those results in the same table.
python3 -m woods.scripts.compile_results \
--results_dir path/to/results/1 path/to/results/2 path/to/results/3 \
--latex
There are other modes of operation for the compile_results module, such as --mode IID, which takes results from a sweep with no test environment and reports the results for each test environment separately.
python3 -m woods.scripts.compile_results \
--results_dir path/to/results/1 path/to/results/2 path/to/results/3 \
--mode IID
There is also --mode summary, which reports the average results for every dataset of all objectives in the sweep.
python3 -m woods.scripts.compile_results \
--results_dir path/to/results/1 path/to/results/2 path/to/results/3 \
--mode summary
You can also use --mode hparams, which reports the hparams of the model chosen by model selection.
python3 -m woods.scripts.compile_results \
--results_dir path/to/results/1 path/to/results/2 path/to/results/3 \
--mode hparams
Advanced usage
If 60 jobs is too many for your available compute, or too few for your experiments, you can change the number of seeds investigated with the --n_hparams and --n_trials arguments.
python3 -m woods.scripts.hparams_sweep \
--dataset Spurious_Fourier TCMNIST_seq \
--objective ERM IRM \
--save_path ./results \
--launcher local \
--n_hparams 10 \
--n_trials 1
If some of the test environments of a dataset are not of interest to you, you can specify which test environment you want to investigate using the --unique_test_env argument.
python3 -m woods.scripts.hparams_sweep \
--dataset Spurious_Fourier TCMNIST_seq \
--objective ERM IRM \
--save_path ./results \
--launcher local \
--unique_test_env 0
You can run a sweep with no test environment by setting the --unique_test_env argument to None.
python3 -m woods.scripts.hparams_sweep \
--dataset Spurious_Fourier TCMNIST_seq \
--objective ERM IRM \
--save_path ./results \
--launcher local \
--unique_test_env None
Adding an Algorithm
In this section, we will walk through the process of adding an algorithm to the framework.
Defining the Algorithm
We first define the algorithm by creating a new class in the objectives module. In this example we will add scaled_ERM, which is simply ERM with a random scale factor between 0 and max_scale for each environment in a dataset, where max_scale is a hyperparameter of the objective.
Let’s first define the class and its __init__ method to initialize the algorithm.
class scaled_ERM(ERM):
    """
    Scaled Empirical Risk Minimization (scaled ERM)
    """
    def __init__(self, model, dataset, loss_fn, optimizer, hparams):
        super(scaled_ERM, self).__init__(model, dataset, loss_fn, optimizer, hparams)
        self.model = model
        self.loss_fn = loss_fn
        self.optimizer = optimizer
        self.max_scale = hparams['max_scale']
        # One random scale factor in [0, max_scale) per training environment
        self.scaling_factor = self.max_scale * torch.rand(len(dataset.train_names))
We then need to define the update function, which takes a minibatch of data, computes the loss and updates the model according to the algorithm definition. Note that we do not need to define the predict function, as it is already defined in the base class.
    def update(self, minibatches_device, dataset, device):
        ## Group all inputs and send to device
        all_x = torch.cat([x for x,y in minibatches_device]).to(device)
        all_y = torch.cat([y for x,y in minibatches_device]).to(device)
        ts = torch.tensor(dataset.PRED_TIME).to(device)
        out = self.predict(all_x, ts, device)
        ## Reshape the data so the first dimension is the environments
        out_split, labels_split = dataset.split_data(out, all_y)
        ## Accumulate the scaled loss of every environment over all prediction time steps
        env_losses = torch.zeros(out_split.shape[0]).to(device)
        for i in range(out_split.shape[0]):
            for t_idx in range(out_split.shape[2]):     # Number of time steps
                env_losses[i] += self.scaling_factor[i] * self.loss_fn(out_split[i, :, t_idx, :], labels_split[i,:,t_idx])
        objective = env_losses.mean()
        # Back propagate
        self.optimizer.zero_grad()
        objective.backward()
        self.optimizer.step()
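To make the objective computed in update concrete, here is a minimal pure-Python sketch of the same aggregation, assuming the per-step losses have already been computed. The function name `scaled_objective` and the toy numbers are hypothetical; the real method works on torch tensors and back-propagates through the result.

```python
def scaled_objective(per_step_losses, scaling_factor):
    """Average over environments of the scaled, time-summed losses.

    per_step_losses[i][t] is the loss of environment i at prediction step t.
    scaling_factor[i] is the scale applied to environment i.
    """
    env_losses = [
        sum(scale * loss for loss in steps)
        for steps, scale in zip(per_step_losses, scaling_factor)
    ]
    return sum(env_losses) / len(env_losses)

# Two environments, two prediction steps each
losses = [[1.0, 2.0], [4.0, 6.0]]
objective = scaled_objective(losses, scaling_factor=[1.0, 0.5])
print(objective)  # (1.0 * 3.0 + 0.5 * 10.0) / 2 = 4.0
```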
Adding necessary pieces
Now that our algorithm is defined, we can add it to the list of algorithms at the top of the objectives module.
OBJECTIVES = [
'ERM',
'IRM',
'VREx',
'SD',
'ANDMask',
'IGA',
'scaled_ERM',
]
Before being able to use the algorithm, we need to add the hyper parameters related to this algorithm in the hyperparams module. Note: the name of the function needs to be the name of the algorithm followed by _hyper.
def scaled_ERM_hyper(sample):
    """ scaled ERM objective hparam definition

    Args:
        sample (bool): If ``True``, hyper parameters are sampled randomly according to their given distributions. Defaults to ``False`` where the default value is chosen.
    """
    if sample:
        return {
            'max_scale': lambda r: r.uniform(1.,10.)
        }
    else:
        return {
            'max_scale': lambda r: 2.
        }
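The definition above returns a dictionary of functions rather than values, so something has to call each entry with a random state. The sketch below shows one way this could work; `resolve_hparams` is a hypothetical helper, and Python's `random.Random` stands in for whatever random state WOODS actually passes as `r` (the only requirement here is a `uniform(a, b)` method).

```python
import random

def scaled_ERM_hyper(sample):
    """Same definition as in the text: each value is a function of a random state."""
    if sample:
        return {'max_scale': lambda r: r.uniform(1., 10.)}
    else:
        return {'max_scale': lambda r: 2.}

def resolve_hparams(definition, seed):
    """Turn a dictionary of sampling functions into concrete values (hypothetical helper)."""
    r = random.Random(seed)
    return {name: draw(r) for name, draw in definition.items()}

default_hparams = resolve_hparams(scaled_ERM_hyper(sample=False), seed=0)
sampled_hparams = resolve_hparams(scaled_ERM_hyper(sample=True), seed=0)
print(default_hparams)  # {'max_scale': 2.0}
print(sampled_hparams)  # a value drawn uniformly between 1 and 10
```

Seeding the random state makes a sampled configuration reproducible, which is what allows a sweep to refer to a configuration by its hyper parameter seed.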
Run some tests
We can now run a simple test to check that everything is working as expected
pytest
Try the algorithm
Then we can run a training run to see how the algorithm performs on any dataset
python3 -m woods.scripts.main train \
--dataset Spurious_Fourier \
--objective scaled_ERM \
--test_env 0 \
--data_path ./data
Run a sweep
Finally, we can run a sweep to see how the algorithm performs on all the datasets
python3 -m woods.scripts.hparams_sweep \
--objective scaled_ERM \
--dataset Spurious_Fourier \
--data_path ./data \
--launcher dummy
Adding a Dataset
In this section, we will walk through the process of adding a dataset to the framework.
Defining the Dataset
We first define the dataset by creating a new class in the datasets module. In this example we will add flat_MNIST which is the MNIST dataset, but the image is fed to a sequential model pixel by pixel and the environments are different orders of the pixels.
First let’s define the dataset class and its init method.
class flat_MNIST(Multi_Domain_Dataset):
    """ Class for flat MNIST dataset

    Each sample is a sequence of 784 pixels.
    The task is to predict the digit

    Args:
        flags (argparse.Namespace): argparse of training arguments

    Note:
        The MNIST dataset needs to be downloaded, this is automatically done if the dataset isn't in the given data_path
    """
    ## Dataset parameters
    SETUP = 'seq'
    TASK = 'classification'
    SEQ_LEN = 28*28
    PRED_TIME = [783]
    INPUT_SHAPE = [1]
    OUTPUT_SIZE = 10
    ## Environment parameters
    ENVS = ['forwards', 'backwards', 'scrambled']
    SWEEP_ENVS = list(range(len(ENVS)))

    def __init__(self, flags, training_hparams):
        super().__init__()
        if flags.test_env is not None:
            assert flags.test_env < len(self.ENVS), "Test environment chosen is not valid"
        else:
            warnings.warn("You don't have any test environment")
        # Save stuff
        self.test_env = flags.test_env
        self.class_balance = training_hparams['class_balance']
        self.batch_size = training_hparams['batch_size']
        ## Import original MNIST data
        MNIST_tfrm = transforms.Compose([ transforms.ToTensor() ])
        # Get MNIST data
        train_ds = datasets.MNIST(flags.data_path, train=True, download=True, transform=MNIST_tfrm)
        test_ds = datasets.MNIST(flags.data_path, train=False, download=True, transform=MNIST_tfrm)
        # Concatenate all data and labels
        MNIST_images = torch.cat((train_ds.data.float(), test_ds.data.float()))
        MNIST_labels = torch.cat((train_ds.targets, test_ds.targets))
        # Create sequences of 784 pixels
        self.TCMNIST_images = MNIST_images.reshape(-1, 28*28, 1)
        self.MNIST_labels = MNIST_labels.long().unsqueeze(1)
        # Make the environment datasets
        self.train_names, self.train_loaders = [], []
        self.val_names, self.val_loaders = [], []
        for i, e in enumerate(self.ENVS):
            # Choose data subset
            images = self.TCMNIST_images[i::len(self.ENVS),...]
            labels = self.MNIST_labels[i::len(self.ENVS),...]
            # Apply environment definition
            if e == 'forwards':
                pass    # Keep the original pixel order
            elif e == 'backwards':
                images = torch.flip(images, dims=[1])
            elif e == 'scrambled':
                images = images[:, torch.randperm(28*28), :]
            # Make Tensor dataset and the split
            dataset = torch.utils.data.TensorDataset(images, labels)
            in_dataset, out_dataset = make_split(dataset, flags.holdout_fraction)
            # Only the non-test environments are used for training
            if i != self.test_env:
                in_loader = InfiniteLoader(in_dataset, batch_size=training_hparams['batch_size'])
                self.train_names.append(str(e) + '_in')
                self.train_loaders.append(in_loader)
            fast_in_loader = torch.utils.data.DataLoader(in_dataset, batch_size=64, shuffle=False, num_workers=self.N_WORKERS, pin_memory=True)
            self.val_names.append(str(e) + '_in')
            self.val_loaders.append(fast_in_loader)
            fast_out_loader = torch.utils.data.DataLoader(out_dataset, batch_size=64, shuffle=False, num_workers=self.N_WORKERS, pin_memory=True)
            self.val_names.append(str(e) + '_out')
            self.val_loaders.append(fast_out_loader)
        # Define loss function
        self.log_prob = nn.LogSoftmax(dim=1)
        self.loss = nn.NLLLoss(weight=self.get_class_weight().to(training_hparams['device']))
Note: you are required to define the following variables:
* SETUP
* SEQ_LEN
* PRED_TIME
* INPUT_SHAPE
* OUTPUT_SIZE
* ENVS
* SWEEP_ENVS
You are also encouraged to redefine the following variables:
* N_STEPS
* N_WORKERS
* CHECKPOINT_FREQ
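The environment logic of flat_MNIST can be illustrated without torch. The following sketch uses short Python lists in place of 784-pixel sequences; `apply_environment` is a hypothetical helper that mirrors the three branches of the class, and the strided slicing mirrors how the samples are shared between environments.

```python
import random

def apply_environment(sequence, env, permutation=None):
    """Reorder one pixel sequence according to the environment definition."""
    if env == 'forwards':
        return list(sequence)
    if env == 'backwards':
        return list(reversed(sequence))
    if env == 'scrambled':
        # The same permutation is shared by every sample of the environment
        return [sequence[j] for j in permutation]
    raise ValueError(f"Unknown environment: {env}")

ENVS = ['forwards', 'backwards', 'scrambled']

# Every environment gets a disjoint, strided subset of the samples
samples = list(range(7))
subsets = [samples[i::len(ENVS)] for i in range(len(ENVS))]
print(subsets)  # [[0, 3, 6], [1, 4], [2, 5]]

pixels = [10, 20, 30, 40]
permutation = random.Random(0).sample(range(len(pixels)), len(pixels))
print(apply_environment(pixels, 'backwards'))  # [40, 30, 20, 10]
print(apply_environment(pixels, 'scrambled', permutation))  # same pixels, shuffled order
```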
Adding necessary pieces
Now that our dataset is defined, we can add it to the list of datasets at the top of the datasets module.
DATASETS = [
# 1D datasets
'Basic_Fourier',
'Spurious_Fourier',
# Small images
"TMNIST",
# Small correlation shift dataset
"TCMNIST_seq",
"TCMNIST_step",
## EEG Dataset
"CAP_DB",
"SEDFx_DB",
## Financial Dataset
"StockVolatility",
## Sign Recognition
"LSA64",
## Activity Recognition
"HAR",
## Example
"flat_MNIST",
]
Before being able to use the dataset, we need to add the hyper parameters related to this dataset in the hyperparams module. Note: the names of the functions need to be the name of the dataset followed by _train and _model.
def flat_MNIST_train(sample):
    """ flat_MNIST training hparam definition

    Args:
        sample (bool): If ``True``, hyper parameters are sampled randomly according to their given distributions. Defaults to ``False`` where the default value is chosen.
    """
    if sample:
        return {
            'class_balance': lambda r: True,
            'weight_decay': lambda r: 0.,
            'lr': lambda r: 10**r.uniform(-4.5, -2.5),
            'batch_size': lambda r: int(2**r.uniform(3, 9))
        }
    else:
        return {
            'class_balance': lambda r: True,
            'weight_decay': lambda r: 0,
            'lr': lambda r: 1e-3,
            'batch_size': lambda r: 64
        }

def flat_MNIST_model():
    """ flat_MNIST model hparam definition """
    return {
        'model': lambda r: 'LSTM',
        'hidden_depth': lambda r: 1,
        'hidden_width': lambda r: 20,
        'recurrent_layers': lambda r: 2,
        'state_size': lambda r: 32
    }
Run some tests
We can now run a simple test to check that everything is working as expected
pytest
Try the dataset
Then we can run a training run to see how algorithms perform on your dataset
python3 -m woods.scripts.main train \
--dataset flat_MNIST \
--objective ERM \
--test_env 0 \
--data_path ./data
Run a sweep
Finally, we can run a sweep to see how the algorithms perform on your dataset
python3 -m woods.scripts.hparams_sweep \
--objective ERM \
--dataset flat_MNIST \
--data_path ./data \
--launcher dummy
Contributing
WOODS is still under development and is open to contributions. Just fork the repository and start coding! When you think you have something to contribute, open an issue or a pull request.
If you have a published algorithm that you want to be added as a benchmark, please open a pull request and we will be happy to add it to the list of available algorithms.
If you have a sequential dataset that you think has a generalization problem, please open a pull request and we will be happy to add it to the list of available datasets.
API Documentation
woods
woods.command_launchers module
Set of functions used to launch lists of python scripts
Summary
Functions:
Function | Description |
---|---|
dummy_launcher | Doesn't launch any scripts in commands, it only prints the commands. |
local_launcher | Launch all of the scripts in commands on the local machine serially. |
slurm_launcher | Parallel job launcher for computational clusters using the SLURM workload manager. |
Reference
- woods.command_launchers.dummy_launcher(commands)
Doesn’t launch any scripts in commands, it only prints the commands. Useful for testing.
Taken from : https://github.com/facebookresearch/DomainBed/
- Parameters
commands (List) – List of lists of strings, each of which makes up a Python script call
- woods.command_launchers.local_launcher(commands)
Launch all of the scripts in commands on the local machine serially. If a GPU is available, it will be used.
Taken from : https://github.com/facebookresearch/DomainBed/
- Parameters
commands (List) – List of lists of strings, each of which makes up a Python script call
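The two launchers above follow a simple pattern that can be sketched as follows. This is an illustrative approximation, not the code of the command_launchers module: it assumes each command is a list of strings as documented, and the `run` argument is a hypothetical hook added so the serial launcher can be tried without spawning real processes.

```python
import subprocess

def dummy_launcher(commands):
    """Print every command instead of running it."""
    for cmd in commands:
        print("Dummy launcher:", " ".join(cmd))

def local_launcher(commands, run=subprocess.call):
    """Run the commands one after the other and collect their exit codes.

    `run` is a hypothetical hook for this sketch; by default it executes
    the command with subprocess.call.
    """
    return [run(cmd) for cmd in commands]

commands = [
    ["python3", "-m", "woods.scripts.main", "train", "--dataset", "Spurious_Fourier"],
    ["python3", "-m", "woods.scripts.main", "train", "--dataset", "TCMNIST_seq"],
]
dummy_launcher(commands)  # only prints the two commands
```

Printing the commands first is a cheap way to check a sweep definition before committing compute to it.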
- woods.command_launchers.slurm_launcher(commands)
Parallel job launcher for computational clusters using the SLURM workload manager.
Launches all the jobs in commands in parallel according to the number of tasks in the slurm allocation. An example of SBATCH options:
#!/bin/bash
#SBATCH --job-name=<job_name>
#SBATCH --output=<job_name>.out
#SBATCH --error=<job_name>_error.out
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:4
#SBATCH --time=1-00:00:00
#SBATCH --mem=81Gb
Note
--cpus-per-task should match the N_WORKERS defined in datasets.py (default 4)
Note
There should be an equal number of --ntasks and --gres
- Parameters
commands (List) – List of lists of strings, each of which makes up a Python script call
woods.datasets module
Defining the benchmarks for OoD generalization in time-series
Summary
Classes:
- Fourier_basic dataset
- CAP Sleep stage dataset
- Class for Sleep Staging datasets with their data stored in a HDF5 file
- HDF5 dataset for EEG data
- Heterogeneity Activity Recognition Dataset (HHAR)
- InfiniteLoader is a torch.utils.data.IterableDataset that can be used to infinitely iterate over a finite dataset.
- Infinite Sampler for PyTorch.
- LSA64: A Dataset for Argentinian Sign Language
- Abstract class of a multi domain dataset for OOD generalization.
- PCL datasets
- SEDFx Sleep stage dataset
- Spurious_Fourier dataset
- Abstract class for Temporal Colored MNIST
- Temporal Colored MNIST Sequence
- Temporal Colored MNIST Step
- Temporal MNIST dataset
- Video dataset
Functions:
Function | Description |
---|---|
XOR | Returns a XOR b (the 'Exclusive or' gate) |
bernoulli | Returns a tensor of 1. (True) or 0. (False) resulting from the outcome of a Bernoulli random variable of parameter p. |
get_dataset_class | Return the dataset class with the given name. |
get_environments | Returns the environments of a dataset |
get_setup | Returns the setup of a dataset |
get_split | Generates the keys that are used to split a Torch TensorDataset into (1-holdout_fraction) / holdout_fraction. |
get_sweep_envs | Returns the list of test environments to investigate in the hyper parameter sweep |
make_split | Split a Torch TensorDataset into (1-holdout_fraction) / holdout_fraction. |
num_environments | Returns the number of environments of a dataset |
Reference
- woods.datasets.get_dataset_class(dataset_name)
Return the dataset class with the given name.
Taken from : https://github.com/facebookresearch/DomainBed/
- Parameters
dataset_name (str) – Name of the dataset to get the function of. (Must be a part of the DATASETS list)
- Returns
The __init__ function of the desired dataset that takes as input ( flags: parser arguments of the train.py script, training_hparams: set of training hparams from hparams.py )
- Return type
function
- Raises
NotImplementedError – Dataset name not found in the datasets.py globals
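A name-based lookup of this kind can be sketched as below. The documentation says the real function searches the globals of datasets.py; this sketch uses an explicit registry dictionary instead so that it stays self-contained, and the two placeholder classes are stand-ins rather than the real datasets.

```python
class Basic_Fourier:
    """Placeholder standing in for the real dataset class."""

class Spurious_Fourier:
    """Placeholder standing in for the real dataset class."""

# Hypothetical registry; the documented implementation looks names up in module globals
_REGISTRY = {cls.__name__: cls for cls in (Basic_Fourier, Spurious_Fourier)}

def get_dataset_class(dataset_name):
    """Return the dataset class with the given name."""
    if dataset_name not in _REGISTRY:
        raise NotImplementedError(f"Dataset not found: {dataset_name}")
    return _REGISTRY[dataset_name]

dataset_class = get_dataset_class("Spurious_Fourier")
print(dataset_class.__name__)  # Spurious_Fourier
```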
- woods.datasets.num_environments(dataset_name)
Returns the number of environments of a dataset
- Parameters
dataset_name (str) – Name of the dataset to get the number of environments of. (Must be a part of the DATASETS list)
- Returns
Number of environments of the dataset
- Return type
int
- woods.datasets.get_sweep_envs(dataset_name)
Returns the list of test environments to investigate in the hyper parameter sweep
- Parameters
dataset_name (str) – Name of the dataset to get the sweep environments of. (Must be a part of the DATASETS list)
- Returns
List of environments to sweep across
- Return type
list
- woods.datasets.get_environments(dataset_name)
Returns the environments of a dataset
- Parameters
dataset_name (str) – Name of the dataset to get the environments of. (Must be a part of the DATASETS list)
- Returns
list of environments of the dataset
- Return type
list
- woods.datasets.get_setup(dataset_name)
Returns the setup of a dataset
- Parameters
dataset_name (str) – Name of the dataset to get the setup of. (Must be a part of the DATASETS list)
- Returns
The setup of the dataset (‘seq’ or ‘step’)
- Return type
str
- woods.datasets.XOR(a, b)
Returns a XOR b (the ‘Exclusive or’ gate)
- Parameters
a (bool) – First input
b (bool) – Second input
- Returns
The output of the XOR gate
- Return type
bool
- woods.datasets.bernoulli(p, size)
Returns a tensor of 1. (True) or 0. (False) resulting from the outcome of a bernoulli random variable of parameter p.
- Parameters
p (float) – Parameter p of the Bernoulli distribution
size (int...) – A sequence of integers defining the shape of the output tensor
- Returns
Tensor of Bernoulli random variables of parameter p
- Return type
Tensor
- woods.datasets.make_split(dataset, holdout_fraction, seed=0, sort=False)
Split a Torch TensorDataset into (1-holdout_fraction) / holdout_fraction.
- Parameters
dataset (TensorDataset) – Tensor dataset that has 2 tensors -> data, targets
holdout_fraction (float) – Fraction of the dataset that goes into the validation set
seed (int, optional) – seed used for the shuffling of the data before splitting. Defaults to 0.
sort (bool, optional) – If True, the dataset is sorted after splitting. Defaults to False.
- Returns
The (1-holdout_fraction) part of the split, followed by the holdout_fraction part of the split
- Return type
TensorDataset
- woods.datasets.get_split(dataset, holdout_fraction, seed=0, sort=False)
Generates the keys that are used to split a Torch TensorDataset into (1-holdout_fraction) / holdout_fraction.
- Parameters
dataset (TensorDataset) – TensorDataset to be split
holdout_fraction (float) – Fraction of the dataset that goes into the out (validation) set
seed (int, optional) – seed used for the shuffling of the data before splitting. Defaults to 0.
sort (bool, optional) – If True, the dataset is sorted after splitting. Defaults to False.
- Returns
The in (1-holdout_fraction) keys of the split, followed by the out (holdout_fraction) keys of the split
- Return type
list
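The splitting described by make_split and get_split can be sketched with plain Python indices. This is a rough illustration under the assumption that the keys are shuffled with the given seed and then cut at the holdout fraction; `get_split_keys` is a hypothetical name, and the real functions operate on a TensorDataset.

```python
import random

def get_split_keys(n_samples, holdout_fraction, seed=0, sort=False):
    """Return (in_keys, out_keys): shuffled indices cut at the holdout fraction."""
    keys = list(range(n_samples))
    random.Random(seed).shuffle(keys)
    n_in = n_samples - int(n_samples * holdout_fraction)
    in_keys, out_keys = keys[:n_in], keys[n_in:]
    if sort:
        in_keys.sort()
        out_keys.sort()
    return in_keys, out_keys

in_keys, out_keys = get_split_keys(10, holdout_fraction=0.2, seed=0, sort=True)
print(len(in_keys), len(out_keys))  # 8 2
```

Using a fixed seed keeps the in/out split identical across training runs, so validation numbers from different runs of a sweep are comparable.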
- class woods.datasets.InfiniteSampler(sampler)
Bases:
torch.utils.data.sampler.Sampler
Infinite Sampler for PyTorch.
Inspired from : https://github.com/facebookresearch/DomainBed
- Parameters
sampler (torch.utils.data.Sampler) – Sampler to be used for the infinite sampling.
- class woods.datasets.InfiniteLoader(dataset, batch_size, num_workers=0, pin_memory=False)
Bases:
torch.utils.data.dataset.IterableDataset
InfiniteLoader is a torch.utils.data.IterableDataset that can be used to infinitely iterate over a finite dataset.
Inspired from : https://github.com/facebookresearch/DomainBed
- Parameters
dataset (Dataset) – Dataset to be iterated over
batch_size (int) – Batch size of the dataset
num_workers (int, optional) – Number of workers to use for the data loading. Defaults to 0.
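The idea of iterating forever over a finite dataset can be sketched with a generator. This is a simplified stand-in for the PyTorch-based classes above, not their implementation: `infinite_batches` is a hypothetical function that reshuffles the indices at every pass and drops incomplete batches.

```python
import itertools
import random

def infinite_batches(dataset, batch_size, seed=0):
    """Yield batches from `dataset` forever, reshuffling at every pass."""
    if batch_size > len(dataset):
        raise ValueError("batch_size cannot exceed the dataset size")
    rng = random.Random(seed)
    while True:
        order = list(range(len(dataset)))
        rng.shuffle(order)
        # Incomplete trailing batches are dropped
        for start in range(0, len(order) - batch_size + 1, batch_size):
            yield [dataset[i] for i in order[start:start + batch_size]]

data = ["a", "b", "c", "d"]
# The generator never ends, so take a fixed number of batches
batches = list(itertools.islice(infinite_batches(data, batch_size=2), 5))
print(len(batches))  # 5
```

An infinite iterator suits training loops that are defined by a number of steps (such as N_STEPS) rather than by epochs.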
- class woods.datasets.Multi_Domain_Dataset
Bases:
object
Abstract class of a multi domain dataset for OOD generalization.
Every multi domain dataset must redefine the important attributes: SETUP, PRED_TIME, ENVS, INPUT_SHAPE, OUTPUT_SIZE, TASK. The data dimension needs to be (batch_size, SEQ_LEN, *INPUT_SHAPE).
- N_STEPS = 5001
The number of training steps taken for this dataset
- Type
int
- CHECKPOINT_FREQ = 100
The frequency of results update
- Type
int
- N_WORKERS = 4
The number of workers used for fast dataloaders used for validation
- Type
int
- SETUP = None
The setup of the dataset (‘seq’ or ‘step’)
- Type
string
- TASK = None
The type of prediction task (‘classification’ or ‘regression’)
- Type
string
- SEQ_LEN = None
The sequence length of the dataset
- Type
int
- PRED_TIME = [None]
The time steps where predictions are made
- Type
list
- INPUT_SHAPE = None
The shape of the input (excluding batch size and time dimension)
- Type
int
- OUTPUT_SIZE = None
The size of the output
- Type
int
- DATA_PATH = None
Path to the data
- Type
str
- ENVS = [None]
The environments of the dataset
- Type
list
- SWEEP_ENVS = [None]
The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps
- Type
list
- loss_fn(output, target)
Computes the loss
- Parameters
output (Tensor) – prediction tensor
target (Tensor) – Target tensor
- get_class_weight()
Compute class weight for class balanced training
- Returns
list of weights of length OUTPUT_SIZE
- Return type
list
- get_train_loaders()
Fetch all training dataloaders and their ID
- Returns
List of string names of the data splits used for training, followed by the list of dataloaders of the data splits used for training
- Return type
list
- get_val_loaders()
Fetch all validation/test dataloaders and their ID
- Returns
List of string names of the data splits used for validation and test, followed by the list of dataloaders of the data splits used for validation and test
- Return type
list
- split_output(out)
Group data and prediction by environment
- Parameters
out (Tensor) – output from a model of shape ((n_env-1)*batch_size, len(PRED_TIME), output_size)
labels (Tensor) – labels of shape ((n_env-1)*batch_size, len(PRED_TIME), output_size)
- Returns
The reshaped output (n_train_env, batch_size, len(PRED_TIME), output_size), followed by the labels (n_train_env, batch_size, len(PRED_TIME))
- Return type
Tensor
- split_labels(labels)
Group data and prediction by environment
- Parameters
out (Tensor) – output from a model of shape ((n_env-1)*batch_size, len(PRED_TIME), output_size)
labels (Tensor) – labels of shape ((n_env-1)*batch_size, len(PRED_TIME), output_size)
- Returns
The reshaped output (n_train_env, batch_size, len(PRED_TIME), output_size), followed by the labels (n_train_env, batch_size, len(PRED_TIME))
- Return type
Tensor
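The grouping by environment amounts to cutting a concatenated batch into consecutive blocks of batch_size rows, one per training environment. The sketch below shows this with a Python list; `split_by_environment` is a hypothetical name, and it assumes the minibatches were concatenated environment by environment, as in the update example of the tutorial.

```python
def split_by_environment(rows, n_train_env):
    """Split a concatenated batch into n_train_env consecutive groups."""
    if len(rows) % n_train_env != 0:
        raise ValueError("batch is not divisible by the number of environments")
    batch_size = len(rows) // n_train_env
    return [rows[i * batch_size:(i + 1) * batch_size] for i in range(n_train_env)]

# Three environments with two samples each, concatenated in order
rows = ["a0", "a1", "b0", "b1", "c0", "c1"]
groups = split_by_environment(rows, n_train_env=3)
print(groups)  # [['a0', 'a1'], ['b0', 'b1'], ['c0', 'c1']]
```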
- class woods.datasets.Basic_Fourier(flags, training_hparams)
Bases:
woods.datasets.Multi_Domain_Dataset
Fourier_basic dataset
A dataset of 1D sinusoid signals to classify according to their Fourier spectrum.
- Parameters
flags (argparse.Namespace) – argparse of training arguments
training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file
Note
No download is required as it is purely synthetic
- SETUP = 'seq'
The setup of the dataset (‘seq’ or ‘step’)
- Type
string
- TASK = 'classification'
The type of prediction task (‘classification’ or ‘regression’)
- Type
string
- SEQ_LEN = 50
The sequence length of the dataset
- Type
int
- PRED_TIME = [49]
The time steps where predictions are made
- Type
list
- INPUT_SHAPE = [1]
The shape of the input (excluding batch size and time dimension)
- Type
int
- OUTPUT_SIZE = 2
The size of the output
- Type
int
- ENVS = ['no_spur']
The environments of the dataset
- Type
list
- SWEEP_ENVS = [None]
The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps
- Type
list
- class woods.datasets.Spurious_Fourier(flags, training_hparams)
Bases:
woods.datasets.Multi_Domain_Dataset
Spurious_Fourier dataset
A dataset of 1D sinusoid signals to classify according to their Fourier spectrum. Peaks that are spuriously correlated with the label are added to the Fourier spectrum of the signal. Different environments have different correlation rates between the labels and the spurious peaks in the spectrum.
- Parameters
flags (argparse.Namespace) – argparse of training arguments
training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file
Note
No download is required as it is purely synthetic
- SETUP = 'seq'
The setup of the dataset (‘seq’ or ‘step’)
- Type
string
- TASK = 'classification'
The type of prediction task (‘classification’ or ‘regression’)
- Type
string
- SEQ_LEN = 50
The sequence length of the dataset
- Type
int
- PRED_TIME = [49]
The time steps where predictions are made
- Type
list
- INPUT_SHAPE = [1]
The shape of the input (excluding batch size and time dimension)
- Type
int
- OUTPUT_SIZE = 2
The size of the output
- Type
int
- LABEL_NOISE = 0.25
Level of noise added to the labels
- Type
float
- ENVS = [0.1, 0.8, 0.9]
The correlation rate between the label and the spurious peaks
- Type
list
- SWEEP_ENVS = [0]
The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps
- Type
list
- super_sample(signal_0, signal_1)
Sample signals frames with a bunch of offsets
- class woods.datasets.TMNIST(flags, training_hparams)
Bases:
woods.datasets.Multi_Domain_Dataset
Temporal MNIST dataset
Each sample is a sequence of 4 MNIST digits. The task is to predict at each step if the sum of the current digit and the previous one is odd or even.
- Parameters
flags (argparse.Namespace) – argparse of training arguments
training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file
Note
The MNIST dataset needs to be downloaded, this is automatically done if the dataset isn’t in the given data_path
- N_STEPS = 5001
The number of training steps taken for this dataset
- Type
int
- SETUP = 'seq'
The setup of the dataset (‘seq’ or ‘step’)
- Type
string
- TASK = 'classification'
The type of prediction task (‘classification’ or ‘regression’)
- Type
string
- SEQ_LEN = 4
The sequence length of the dataset
- Type
int
- PRED_TIME = [1, 2, 3]
The time steps where predictions are made
- Type
list
- INPUT_SHAPE = [1, 28, 28]
The shape of the input (excluding batch size and time dimension)
- Type
int
- OUTPUT_SIZE = 2
The size of the output
- Type
int
- ENVS = ['grey']
The environments of the dataset
- Type
list
- SWEEP_ENVS = [None]
The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps
- Type
list
- plot_samples(TMNIST_labels)
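The TMNIST labelling rule can be written down in a couple of lines. In this sketch the digits are plain integers rather than images, and encoding an odd sum as 1 and an even sum as 0 is an assumption made for illustration; `parity_labels` is a hypothetical helper.

```python
def parity_labels(digits):
    """Label each step from the second one on: 1 if current + previous digit is odd, else 0."""
    return [(previous + current) % 2 for previous, current in zip(digits, digits[1:])]

sequence = [3, 4, 4, 7]  # SEQ_LEN = 4
labels = parity_labels(sequence)
print(labels)  # [1, 0, 1]
```

A sequence of 4 digits yields 3 labels, which matches PRED_TIME = [1, 2, 3]: no prediction is made at the first step because it has no previous digit.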
- class woods.datasets.TCMNIST(flags)
Bases:
woods.datasets.Multi_Domain_Dataset
Abstract class for Temporal Colored MNIST
Each sample is a sequence of 4 MNIST digits. The task is to predict at each step if the sum of the current digit and the previous one is odd or even. Color that is correlated with the label of the current step is added to the digits. The exact formulation is defined in the children of this class, either sequence-wise or step-wise.
- Parameters
flags (argparse.Namespace) – argparse of training arguments
Note
The MNIST dataset needs to be downloaded, this is automatically done if the dataset isn’t in the given data_path
- N_STEPS = 5001
The number of training steps taken for this dataset
- Type
int
- TASK = 'classification'
The type of prediction task (‘classification’ or ‘regression’)
- Type
string
- SEQ_LEN = 4
The sequence length of the dataset
- Type
int
- PRED_TIME = [1, 2, 3]
The time steps where predictions are made
- Type
list
- INPUT_SHAPE = [2, 28, 28]
The shape of the input (excluding batch size and time dimension)
- Type
list
- OUTPUT_SIZE = 2
The size of the output
- Type
int
- plot_samples(images, labels, name)
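The labeling rule described for Temporal Colored MNIST can be sketched in plain Python. This is only an illustration of the parity rule: the function name `parity_labels` and the encoding of an odd sum as 1 are assumptions made for this sketch, not taken from the WOODS source.

```python
def parity_labels(digits):
    """Return one label per prediction step for a sequence of digits.

    The label at step t (t >= 1) is the parity of digits[t] + digits[t - 1].
    Encoding odd as 1 and even as 0 is an assumption made for this sketch.
    """
    return [(digits[t] + digits[t - 1]) % 2 for t in range(1, len(digits))]


sequence = [3, 4, 4, 7]            # SEQ_LEN = 4 digits
labels = parity_labels(sequence)   # one label per step in PRED_TIME = [1, 2, 3]
print(labels)
```

No label exists for step 0 because it has no previous digit, which is consistent with PRED_TIME starting at 1.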
- class woods.datasets.TCMNIST_seq(flags, training_hparams)
Bases:
woods.datasets.TCMNIST
Temporal Colored MNIST Sequence
Each sample is a sequence of 4 MNIST digits. The task is to predict at each step if the sum of the current digit and the previous one is odd or even. Color that is correlated with the label of the current step is added to the digits.
The correlation of the color to the label is constant across sequences, and whole sequences are sampled from an environment definition
- Parameters
flags (argparse.Namespace) – argparse of training arguments
training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file
Note
The MNIST dataset needs to be downloaded; this is done automatically if the dataset isn’t in the given data_path
- SETUP = 'seq'
The setup of the dataset (‘seq’ or ‘step’)
- Type
string
- LABEL_NOISE = 0.25
Level of noise added to the labels
- Type
float
- ENVS = [0.1, 0.8, 0.9]
list of different correlation values between the color and the label
- Type
list
- SWEEP_ENVS = [0]
The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps
- Type
list
- color_dataset(images, labels, p, d)
Color the dataset
- Parameters
images (Tensor) – 3 channel images to color
labels (Tensor) – labels of the images
p (float) – correlation between the color and the label
d (float) – level of noise added to the labels
- Returns
colored images
- Return type
colored_images (Tensor)
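The roles of `p` and `d` can be illustrated without images. The sketch below assumes a Colored-MNIST-style recipe: each binary label is flipped with probability `d`, and the color then agrees with the noisy label with probability `p`. The helper `assign_colors` is hypothetical, and the actual `color_dataset` method may differ in how it applies noise and color.

```python
import random


def assign_colors(labels, p, d, seed=0):
    """Flip each binary label with probability d, then pick a color index
    that matches the (noisy) label with probability p.

    Returns (noisy_labels, colors); both are lists of 0/1 values.
    """
    rng = random.Random(seed)
    noisy_labels, colors = [], []
    for y in labels:
        if rng.random() < d:                    # label noise
            y = 1 - y
        c = y if rng.random() < p else 1 - y    # color agrees with label w.p. p
        noisy_labels.append(y)
        colors.append(c)
    return noisy_labels, colors


labels = [0, 1] * 5000
noisy, colors = assign_colors(labels, p=0.9, d=0.25)
agreement = sum(int(c == y) for c, y in zip(colors, noisy)) / len(noisy)
print(f"color/label agreement: {agreement:.2f}")
```

With a low `p` such as 0.1, the color becomes anti-correlated with the label, which is what makes such an environment a hard test case for a model that relies on color.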
- class woods.datasets.TCMNIST_step(flags, training_hparams)
Bases:
woods.datasets.TCMNIST
Temporal Colored MNIST Step
Each sample is a sequence of 4 MNIST digits. The task is to predict at each step if the sum of the current digit and the previous one is odd or even. Color that is correlated with the label of the current step is added to the digits.
The correlation of the color to the label varies across sequences, and time steps are sampled from an environment definition. By definition, the test environment is always the last time step in the sequence.
- Parameters
flags (argparse.Namespace) – argparse of training arguments
training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file
Note
The MNIST dataset needs to be downloaded; this is done automatically if the dataset isn’t in the given data_path
- SETUP = 'step'
The setup of the dataset (‘seq’ or ‘step’)
- Type
string
- LABEL_NOISE = 0.25
Level of noise added to the labels
- Type
float
- ENVS = [0.9, 0.8, 0.1]
list of different correlation values between the color and the label
- Type
list
- SWEEP_ENVS = [2]
The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps
- Type
list
- color_dataset(images, labels, env_id, p, d)
Color a single step ‘env_id’ of the dataset
- Parameters
images (Tensor) – 3 channel images to color
labels (Tensor) – labels of the images
env_id (int) – environment id
p (float) – correlation between the color and the label
d (float) – level of noise added to the labels
- Returns
all dataset with a new step colored
- Return type
colored_images (Tensor)
- split_output(out)
Group data and prediction by environment
- Parameters
out (Tensor) – outputs of the model to group by environment
- Returns
The reshaped data (n_env-1, batch_size, 1, n_classes)
- Return type
Tensor
- split_labels(labels)
Group data and prediction by environment
- Parameters
labels (Tensor) – labels of the data (batch_size, len(PRED_TIME))
- Returns
The reshaped labels (n_env-1, batch_size, 1)
- Return type
Tensor
- class woods.datasets.H5_dataset(h5_path, env_id, split=None)
Bases:
torch.utils.data.dataset.Dataset
HDF5 dataset for EEG data
The HDF5 file is expected to have the following nested dict structure:
{'env0': {'data': np.array(n_samples, time_steps, input_size), 'labels': np.array(n_samples, len(PRED_TIME))}, 'env1': {'data': np.array(n_samples, time_steps, input_size), 'labels': np.array(n_samples, len(PRED_TIME))}, ...}
This has the advantage of loading data only when it is needed, which saves RAM.
- Parameters
h5_path (str) – absolute path to the hdf5 file
env_id (int) – environment id key in the hdf5 file
split (list) – list of indices of the dataset that belong to the split. If ‘None’, all the data is used
- close()
Close the hdf5 file link
- class woods.datasets.EEG_DB(flags, training_hparams)
Bases:
woods.datasets.Multi_Domain_Dataset
Class for Sleep Staging datasets with their data stored in a HDF5 file
- Parameters
flags (argparse.Namespace) – argparse of training arguments
training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file
- CHECKPOINT_FREQ = 500
The frequency of results update
- Type
int
- SETUP = 'seq'
The setup of the dataset (‘seq’ or ‘step’)
- Type
string
- DATA_PATH = None
relative path to the hdf5 file
- Type
str
- get_class_weight()
Compute class weight for class balanced training
- Returns
list of weights of length OUTPUT_SIZE
- Return type
list
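One common way to obtain such weights is inverse class frequency. The sketch below is an assumption about the general technique, not the WOODS implementation: the function name `inverse_frequency_weights` is hypothetical and the normalization used in `get_class_weight` may differ.

```python
from collections import Counter


def inverse_frequency_weights(labels, n_classes):
    """Return a list of n_classes weights, each n_samples / (n_classes * count).

    Classes that never appear get weight 0.0 to avoid dividing by zero.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    return [
        n_samples / (n_classes * counts[c]) if counts[c] > 0 else 0.0
        for c in range(n_classes)
    ]


labels = [0, 0, 0, 0, 1, 1, 2, 2]
weights = inverse_frequency_weights(labels, n_classes=3)
print(weights)
```

Rare classes receive larger weights, so each class contributes roughly equally to a weighted loss.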
- class woods.datasets.CAP(flags, training_hparams)
Bases:
woods.datasets.EEG_DB
CAP Sleep stage dataset
The task is to classify the sleep stage from EEG and other signal modalities. This dataset uses only about half of the raw dataset because some measurements are incompatible. We use the 5 most commonly used machines in the database to create the 5 separate environments to train on. The machines were inferred by grouping together the recordings that had the same channels, and the final preprocessed data only includes the channels that are common to those 5 machines.
You can read more on the data itself and its provenance on Physionet.org:
- Parameters
flags (argparse.Namespace) – argparse of training arguments
training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file
Note
This dataset needs to be downloaded and preprocessed. This can be done with the download.py script.
- N_STEPS = 5001
The number of training steps taken for this dataset
- Type
int
- TASK = 'classification'
The type of prediction task (‘classification’ or ‘regression’)
- Type
string
- SEQ_LEN = 3000
The sequence length of the dataset
- Type
int
- PRED_TIME = [2999]
The time steps where predictions are made
- Type
list
- INPUT_SHAPE = [19]
The shape of the input (excluding batch size and time dimension)
- Type
list
- OUTPUT_SIZE = 6
The size of the output
- Type
int
- DATA_PATH = 'CAP/CAP.h5'
relative path to the hdf5 file
- Type
str
- ENVS = ['Machine0', 'Machine1', 'Machine2', 'Machine3', 'Machine4']
The environments of the dataset
- Type
list
- SWEEP_ENVS = [0, 1, 2, 3, 4]
The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps
- Type
list
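The relation between ENVS and SWEEP_ENVS can be sketched as a leave-one-environment-out loop. The helper `leave_one_out_splits` is hypothetical, and treating a `None` entry (as in SWEEP_ENVS = [None]) as "no test environment" is an assumption based on the attribute descriptions.

```python
ENVS = ['Machine0', 'Machine1', 'Machine2', 'Machine3', 'Machine4']
SWEEP_ENVS = [0, 1, 2, 3, 4]


def leave_one_out_splits(envs, sweep_envs):
    """Yield (test_env, train_envs) pairs, holding out one environment at a time.

    A None entry is assumed to mean that every environment is used for training.
    """
    for test_id in sweep_envs:
        if test_id is None:
            yield None, list(envs)
            continue
        train = [name for i, name in enumerate(envs) if i != test_id]
        yield envs[test_id], train


for test_env, train_envs in leave_one_out_splits(ENVS, SWEEP_ENVS):
    print(f"test on {test_env}, train on {train_envs}")
```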
- class woods.datasets.SEDFx(flags, training_hparams)
Bases:
woods.datasets.EEG_DB
SEDFx Sleep stage dataset
The task is to classify the sleep stage from EEG and other signal modalities. This dataset uses only about half of the raw dataset because some measurements are incompatible. We split the dataset into 4 environments to train on, each containing the data taken from a given age group.
You can read more on the data itself and its provenance on Physionet.org:
- Parameters
flags (argparse.Namespace) – argparse of training arguments
training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file
Note
This dataset needs to be downloaded and preprocessed. This can be done with the download.py script
- N_STEPS = 10001
The number of training steps taken for this dataset
- Type
int
- TASK = 'classification'
The type of prediction task (‘classification’ or ‘regression’)
- Type
string
- SEQ_LEN = 3000
The sequence length of the dataset
- Type
int
- PRED_TIME = [2999]
The time steps where predictions are made
- Type
list
- INPUT_SHAPE = [4]
The shape of the input (excluding batch size and time dimension)
- Type
list
- OUTPUT_SIZE = 6
The size of the output
- Type
int
- DATA_PATH = 'SEDFx/SEDFx.h5'
relative path to the hdf5 file
- Type
str
- ENVS = ['Age 20-40', 'Age 40-60', 'Age 60-80', 'Age 80-100']
The environments of the dataset
- Type
list
- SWEEP_ENVS = [0, 1, 2, 3]
The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps
- Type
list
- class woods.datasets.PCL(flags, training_hparams)
Bases:
woods.datasets.EEG_DB
PCL datasets
The task is to classify motor imagery from EEG and other signal modalities. The raw data comes from three PCL databases:
[ ‘PhysionetMI’, ‘Cho2017’, ‘Lee2019_MI’]
You can read more on the data itself and its provenance on:
This dataset needs to be downloaded and preprocessed. This can be done with the download.py script
- N_STEPS = 10001
The number of training steps taken for this dataset
- Type
int
- TASK = 'classification'
The type of prediction task (‘classification’ or ‘regression’)
- Type
string
- SEQ_LEN = 752
The sequence length of the dataset
- Type
int
- PRED_TIME = [751]
The time steps where predictions are made
- Type
list
- INPUT_SHAPE = [48]
The shape of the input (excluding batch size and time dimension)
- Type
list
- OUTPUT_SIZE = 2
The size of the output
- Type
int
- DATA_PATH = 'PCL/PCL.h5'
relative path to the hdf5 file
- Type
str
- ENVS = ['PhysionetMI', 'Cho2017', 'Lee2019_MI']
The environments of the dataset
- Type
list
- SWEEP_ENVS = [0, 1, 2]
The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps
- Type
list
- class woods.datasets.Video_dataset(data_paths, n_frames, transform=None, split=None)
Bases:
torch.utils.data.dataset.Dataset
Video dataset
Folder structure:
data_path
└── 001
    └─ 001
        ├── frame000001.jpg
        ├── ...
        └── frame0000{n_frames}.jpg
    └─ 002
    └─ (samples) ...
└── 002
    └─ 001
    └─ 002
    └─ (samples) ...
└── 003
└── (labels) ...
- Parameters
data_path (str) – path to the folder containing the data
n_frames (int) – number of frames in each video
transform (callable, optional) – Optional transform to be applied on a sample.
- read_images(selected_folder, use_transform)
Read images from a folder (single video consisting of n_frames images)
- Parameters
selected_folder (str) – path to the folder containing the images
use_transform (callable) – transform to apply on the images
- Returns
images tensor (n_frames, 3, 224, 224)
- Return type
Tensor
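The folder structure above suggests how the frames of one video could be located. The sketch below assumes that the first folder level is the label, the second is the sample, and that frame numbers are zero-padded to six digits. The helper `frame_paths` is hypothetical and is not part of Video_dataset.

```python
from pathlib import PurePosixPath


def frame_paths(data_path, label, sample, n_frames):
    """Build the expected frame paths for one video (label folder / sample folder)."""
    folder = PurePosixPath(data_path) / label / sample
    # Frames are assumed to be numbered from 1 to n_frames, padded to six digits.
    return [folder / f"frame{i:06d}.jpg" for i in range(1, n_frames + 1)]


paths = frame_paths("LSA64", "001", "002", n_frames=20)
print(paths[0])
print(paths[-1])
```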
- class woods.datasets.LSA64(flags, training_hparams)
Bases:
woods.datasets.Multi_Domain_Dataset
LSA64: A Dataset for Argentinian Sign Language
This dataset is composed of videos of different signers.
You can read more on the data itself and its provenance from its source:
- Parameters
flags (argparse.Namespace) – argparse of training arguments
training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file
Note
This dataset needs to be downloaded and preprocessed. This can be done with the download.py script
- Resources:
- N_STEPS = 5001
The number of training steps taken for this dataset
- Type
int
- CHECKPOINT_FREQ = 500
The frequency of results update
- Type
int
- SETUP = 'seq'
The setup of the dataset (‘seq’ or ‘step’)
- Type
string
- TASK = 'classification'
The type of prediction task (‘classification’ or ‘regression’)
- Type
string
- SEQ_LEN = 20
number of frames in each video
- Type
int
- PRED_TIME = [19]
The time steps where predictions are made
- Type
list
- INPUT_SHAPE = [3, 224, 224]
The shape of the input (excluding batch size and time dimension)
- Type
list
- OUTPUT_SIZE = 64
The size of the output
- Type
int
- DATA_PATH = 'LSA64'
path to the folder containing the data
- Type
str
- ENVS = ['001-002', '003-004', '005-006', '007-008', '009-010']
The environments of the dataset
- Type
list
- SWEEP_ENVS = [0, 1, 2, 3, 4]
The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps
- Type
list
- get_class_weight()
Compute class weight for class balanced training
- Returns
list of weights of length OUTPUT_SIZE
- Return type
list
- class woods.datasets.HHAR(flags, training_hparams)
Bases:
woods.datasets.Multi_Domain_Dataset
Heterogeneity Activity Recognition Dataset (HHAR)
This dataset is composed of measurements from wearable devices during different activities. The goal is to classify those activities (stand, sit, walk, bike, stairs up, stairs down).
You can read more on the data itself and its provenance from its source:
- Parameters
flags (argparse.Namespace) – argparse of training arguments
training_hparams (dict) – dictionary of training hyper parameters coming from the hyperparams.py file
Note
This dataset needs to be downloaded and preprocessed. This can be done with the download.py script
- Resources:
- N_STEPS = 5001
The number of training steps taken for this dataset
- Type
int
- CHECKPOINT_FREQ = 100
The frequency of results update
- Type
int
- SETUP = 'seq'
The setup of the dataset (‘seq’ or ‘step’)
- Type
string
- TASK = 'classification'
The type of prediction task (‘classification’ or ‘regression’)
- Type
string
- SEQ_LEN = 500
The sequence length of the dataset
- Type
int
- PRED_TIME = [499]
The time steps where predictions are made
- Type
list
- INPUT_SHAPE = [6]
The shape of the input (excluding batch size and time dimension)
- Type
list
- OUTPUT_SIZE = 6
The size of the output
- Type
int
- DATA_PATH = 'HHAR/HHAR.h5'
Path to the file containing the data
- Type
str
- ENVS = ['nexus4', 's3', 's3mini', 'lgwatch', 'gear']
The environments of the dataset
- Type
list
- SWEEP_ENVS = [0, 1, 2, 3, 4]
The environments that should be used for testing (One at a time). These will be the test environments used in the sweeps
- Type
list
woods.hyperparams module
Defining hyper parameters and their distributions for HPO
Summary
Functions:
ANDMask_hyper | ANDMask objective hparam definition
Basic_Fourier_model | Basic Fourier model hparam definition
Basic_Fourier_train | Basic Fourier model hparam definition
CAP_model | CAP model hparam definition
CAP_train | CAP model hparam definition
ERM_hyper | ERM objective hparam definition
Fish_hyper | Fish objective hparam definition
IGA_hyper | IGA objective hparam definition
IRM_hyper | IRM objective hparam definition
LSA64_model | LSA64 model hparam definition
LSA64_train | LSA64 model hparam definition
SANDMask_hyper | SANDMask objective hparam definition
SD_hyper | SD objective hparam definition
SEDFx_model | SEDFx model hparam definition
SEDFx_train | SEDFx model hparam definition
Spurious_Fourier_model | Spurious Fourier model hparam definition
Spurious_Fourier_train | Spurious Fourier model hparam definition
TCMNIST_seq_model | TCMNIST_seq model hparam definition
TCMNIST_seq_train | TCMNIST_seq model hparam definition
TCMNIST_step_model | TCMNIST_step model hparam definition
TCMNIST_step_train | TCMNIST_step model hparam definition
TMNIST_model | TMNIST model hparam definition
TMNIST_train | TMNIST model hparam definition
VREx_hyper | VREx objective hparam definition
get_model_hparams | Get the model related hyper parameters
get_objective_hparams | Get the objective related hyper parameters
get_training_hparams | Get training related hyper parameters (class_balance, weight_decay, lr, batch_size)
Reference
- woods.hyperparams.get_training_hparams(dataset_name, seed, sample=False)
Get training related hyper parameters (class_balance, weight_decay, lr, batch_size)
- Parameters
dataset_name (str) – dataset that will be trained on for the run
seed (int) – seed used if hyper parameters are sampled
sample (bool, optional) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- Raises
NotImplementedError – Dataset name not found
- Returns
Dictionary with hyper parameter values
- Return type
dict
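The interplay between `seed` and `sample` can be sketched as follows. Everything here is hypothetical: the function names, the default values and the sampling ranges are invented for illustration, and only the four keys come from the description above. Unlike the real `*_train(sample)` definitions, the sketch passes the random generator explicitly.

```python
import random


def example_train(sample, rng):
    """Hypothetical hparam definition: defaults, or random draws if sample is True."""
    if sample:
        return {
            'class_balance': True,
            'weight_decay': 0.0,
            'lr': 10 ** rng.uniform(-4.5, -2.5),        # log-uniform draw
            'batch_size': int(2 ** rng.uniform(3, 6)),  # between 8 and 64
        }
    return {'class_balance': True, 'weight_decay': 0.0, 'lr': 1e-3, 'batch_size': 64}


def get_example_training_hparams(seed, sample=False):
    """Seed a private generator so that sampled hparams are reproducible."""
    rng = random.Random(seed)
    return example_train(sample, rng)


defaults = get_example_training_hparams(seed=0)
draw_a = get_example_training_hparams(seed=1, sample=True)
draw_b = get_example_training_hparams(seed=1, sample=True)
print(defaults)
print(draw_a == draw_b)
```

Seeding a private generator means that the same seed always reproduces the same sampled configuration, which is what makes a random hyper parameter search repeatable.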
- woods.hyperparams.Basic_Fourier_train(sample)
Basic Fourier model hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.Spurious_Fourier_train(sample)
Spurious Fourier model hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.TMNIST_train(sample)
TMNIST model hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.TCMNIST_seq_train(sample)
TCMNIST_seq model hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.TCMNIST_step_train(sample)
TCMNIST_step model hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.CAP_train(sample)
CAP model hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.SEDFx_train(sample)
SEDFx model hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.LSA64_train(sample)
LSA64 model hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.get_model_hparams(dataset_name)
Get the model related hyper parameters
Each dataset has its own model hyper parameter definition
- Parameters
dataset_name (str) – dataset that will be trained on for the run
seed (int) – seed used if hyper parameters are sampled
sample (bool, optional) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- Raises
NotImplementedError – Dataset name not found
- Returns
Dictionary with hyper parameter values
- Return type
dict
- woods.hyperparams.Basic_Fourier_model()
Basic Fourier model hparam definition
- woods.hyperparams.Spurious_Fourier_model()
Spurious Fourier model hparam definition
- woods.hyperparams.TMNIST_model()
TMNIST model hparam definition
- woods.hyperparams.TCMNIST_seq_model()
TCMNIST_seq model hparam definition
- woods.hyperparams.TCMNIST_step_model()
TCMNIST_step model hparam definition
- woods.hyperparams.CAP_model()
CAP model hparam definition
- woods.hyperparams.SEDFx_model()
SEDFx model hparam definition
- woods.hyperparams.LSA64_model()
LSA64 model hparam definition
- woods.hyperparams.get_objective_hparams(objective_name, seed, sample=False)
Get the objective related hyper parameters
Each objective has its own hyper parameter definition
- Parameters
objective_name (str) – objective that will be used for training in the run
seed (int) – seed used if hyper parameters are sampled
sample (bool, optional) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- Raises
NotImplementedError – Objective name not found
- Returns
Dictionary with hyper parameter values
- Return type
dict
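The lookup-by-name behaviour, including the NotImplementedError for unknown names, can be sketched with a small dispatch table. The functions `ERM_example` and `IRM_example`, the `penalty_weight` key and its value are invented for this sketch. How the real module resolves names is not shown in this reference.

```python
def ERM_example(sample):
    """Hypothetical stand-in for an objective hparam definition with no hparams."""
    return {}


def IRM_example(sample):
    """Hypothetical stand-in; the key and value are invented for this sketch."""
    return {'penalty_weight': 1e2}


_DEFINITIONS = {'ERM': ERM_example, 'IRM': IRM_example}


def get_example_objective_hparams(objective_name, sample=False):
    """Look up the definition for objective_name, failing loudly if it is unknown."""
    if objective_name not in _DEFINITIONS:
        raise NotImplementedError(f"{objective_name} is not implemented")
    return _DEFINITIONS[objective_name](sample)


print(get_example_objective_hparams('IRM'))
```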
- woods.hyperparams.ERM_hyper(sample)
ERM objective hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.IRM_hyper(sample)
IRM objective hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.VREx_hyper(sample)
VREx objective hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.SD_hyper(sample)
SD objective hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.IGA_hyper(sample)
IGA objective hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.ANDMask_hyper(sample)
ANDMask objective hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.Fish_hyper(sample)
Fish objective hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
- woods.hyperparams.SANDMask_hyper(sample)
SANDMask objective hparam definition
- Parameters
sample (bool) – If True, hyper parameters are sampled randomly according to their given distributions. Defaults to False, in which case the default values are used.
woods.model_selection module
Defining the model selection strategies
Summary
Functions:
IID_validation | Perform the IID validation model selection on a single training run with NO TEST ENVIRONMENT and return the results
ensure_dict_path | Ensure that a path of a nested dictionary exists.
get_best_hparams | Get the best set of hyperparameters for a given record from a sweep and a selection method
get_chosen_test_acc | Get the test accuracy that will be chosen through the selection method for a given record from a sweep
test_domain_validation | Perform the test domain validation model selection on a single training run and return the results
train_domain_validation | Perform the train domain validation model selection on a single training run and return the results
Reference
- woods.model_selection.ensure_dict_path(dict, key)
Ensure that a path of a nested dictionary exists.
If it does, return the nested dictionary within. If it does not, create a nested dictionary and return it.
- Parameters
dict (dict) – Nested dictionary in which to ensure a path
key (str) – Key that must map to a dictionary
- Returns
nested dictionary
- Return type
dict
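A minimal re-implementation consistent with this description could look as follows. It is a sketch named `ensure_dict_path_sketch` to make clear that it is not copied from the WOODS source, and the record keys in the usage lines are invented.

```python
def ensure_dict_path_sketch(d, key):
    """Return d[key], creating an empty nested dictionary first if key is missing."""
    if key not in d:
        d[key] = {}
    return d[key]


records = {}
# Chaining calls builds a nested path one level at a time.
step = ensure_dict_path_sketch(ensure_dict_path_sketch(records, 'seed_0'), 'step_100')
step['train_acc'] = 0.91
print(records)
```

Because the nested dictionary is returned by reference, writing to it updates the outer structure in place.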
- woods.model_selection.get_best_hparams(records, selection_method)
Get the best set of hyperparameters for a given record from a sweep and a selection method
The way model selection is performed is by computing the validation accuracy of all training checkpoints. The definition of the validation accuracy is given by the selection method. Then using these validation accuracies, we choose the best checkpoint and report the corresponding hyperparameters.
- Parameters
records (dict) – Dictionary of records from a sweep
selection_method (str) – Selection method to use
- Returns
dict: flags of the chosen model training run for all trial seeds
dict: hyperparameters of the chosen model for all trial seeds
dict: validation accuracy of the chosen model run for all trial seeds
dict: test accuracy of the chosen model run for all trial seeds
- Return type
dict
- woods.model_selection.get_chosen_test_acc(records, selection_method)
Get the test accuracy that will be chosen through the selection method for a given record from a sweep
The way model selection is performed is by computing the validation accuracy of all training checkpoints. The definition of the validation accuracy is given by the selection method. Then using these validation accuracies, we choose the best checkpoint and report the test accuracy linked to that checkpoint.
- Parameters
records (dict) – Dictionary of records from a sweep
selection_method (str) – Selection method to use
- Returns
float: validation accuracy of the chosen models averaged over all trial seeds
float: variance of the validation accuracy of the chosen models across all trial seeds
float: test accuracy of the chosen models averaged over all trial seeds
float: variance of the test accuracy of the chosen models across all trial seeds
- Return type
float
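The four returned values are means and variances taken across trial seeds. The sketch below shows that aggregation on invented numbers. The layout of `chosen` is hypothetical, and the use of the population variance (`pvariance`) rather than the sample variance is an assumption.

```python
from statistics import mean, pvariance

# Hypothetical (validation accuracy, test accuracy) of the chosen checkpoint per trial seed.
chosen = {0: (0.90, 0.70), 1: (0.92, 0.74), 2: (0.88, 0.72)}

val_accs = [v for v, _ in chosen.values()]
test_accs = [t for _, t in chosen.values()]

# Same order as the return values listed above.
summary = (mean(val_accs), pvariance(val_accs), mean(test_accs), pvariance(test_accs))
print(summary)
```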
- woods.model_selection.IID_validation(records)
Perform the IID validation model selection on a single training run with NO TEST ENVIRONMENT and return the results
- The model selection is performed by computing the average accuracy over all domains at every training checkpoint and choosing the highest one.
best_step = argmax_{step in checkpoints}( mean(train_envs_acc) )
- Parameters
records (dict) – Dictionary of records from a single training run
- Returns
float: validation accuracy of the best checkpoint of the training run
float: validation accuracy of the best checkpoint of the training run
- Return type
float
Note
This is ONLY for sweeps with no test environments.
- woods.model_selection.train_domain_validation(records)
Perform the train domain validation model selection on a single training run and return the results
- The model selection is performed by computing the average accuracy over the training domains at every training checkpoint and choosing the highest one.
best_step = argmax_{step in checkpoints}( mean(train_envs_acc) )
- Parameters
records (dict) – Dictionary of records from a single training run
- Returns
float: validation accuracy of the best checkpoint of the training run
float: test accuracy of the best checkpoint (highest validation accuracy) of the training run
- Return type
float
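The argmax rule for train domain validation can be sketched on a toy record. The record layout (a dict keyed by step with `train_envs_acc` and `test_acc` entries) and the helper `select_checkpoint` are assumptions made for illustration. The real records produced by a WOODS training run may be organized differently.

```python
from statistics import mean

# Hypothetical record layout: step -> accuracies logged at that checkpoint.
records = {
    0:   {'train_envs_acc': [0.50, 0.52], 'test_acc': 0.49},
    100: {'train_envs_acc': [0.80, 0.84], 'test_acc': 0.61},
    200: {'train_envs_acc': [0.78, 0.90], 'test_acc': 0.58},
}


def select_checkpoint(records):
    """Return (best_step, validation accuracy, test accuracy) for the step
    with the highest mean accuracy over the training environments."""
    best_step = max(records, key=lambda step: mean(records[step]['train_envs_acc']))
    best = records[best_step]
    return best_step, mean(best['train_envs_acc']), best['test_acc']


print(select_checkpoint(records))
```

In this toy record the selected checkpoint is not the one with the highest test accuracy, which illustrates why the choice of selection method matters when reporting results.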
- woods.model_selection.test_domain_validation(records)
Perform the test domain validation model selection on a single training run and return the results
- The model selection is performed with the test accuracy of ONLY THE LAST CHECKPOINT OF A TRAINING RUN, so this function simply returns the test accuracy of the last checkpoint.
best_step = test_acc[-1]
- Parameters
records (dict) – Dictionary of records from a single training run
- Returns
float: validation accuracy of the training run, which is also the test accuracy of the last checkpoint
float: test accuracy of the last checkpoint
- Return type
float
woods.models module
Defining the architectures used for benchmarking algorithms
Summary
Classes:
ATTN_LSTM | A simple LSTM model with self attention
CRNN | Convolutional Recurrent Neural Network
EEGNet | The EEGNet model
LSTM | A simple LSTM model
MNIST_CNN | Hand-tuned architecture for extracting representation from MNIST images
MNIST_LSTM | A simple LSTM model taking inputs from a CNN.
deep4 | The DEEP4 model
Functions:
get_model | Return the dataset class with the given name
Reference
- woods.models.get_model(dataset, model_hparams)
Return the dataset class with the given name
- Parameters
dataset (str) – name of the dataset
model_hparams (dict) – model hyperparameters
- class woods.models.deep4(dataset, model_hparams)
Bases:
torch.nn.modules.module.Module
The DEEP4 model
- This is from the Braindecode package:
- Parameters
dataset (Multi_Domain_Dataset) – dataset that we will be training on
model_hparams (dict) – The hyperparameters for the model.
- input_size
The size of the inputs to the model (for a single time step).
- Type
int
- output_size
The size of the outputs of the model (number of classes).
- Type
int
- seq_len
The length of the sequences.
- Type
int
- forward(input, time_pred)
- training: bool
- class woods.models.EEGNet(dataset, model_hparams)
Bases:
torch.nn.modules.module.Module
The EEGNet model
This is a really small model ~3k parameters.
- This is from the Braindecode package:
- Parameters
dataset (Multi_Domain_Dataset) – dataset that we will be training on
model_hparams (dict) – The hyperparameters for the model.
- input_size
The size of the inputs to the model (for a single time step).
- Type
int
- output_size
The size of the outputs of the model (number of classes).
- Type
int
- seq_len
The length of the sequences.
- Type
int
- forward(input, time_pred)
- training: bool
- class woods.models.MNIST_CNN(input_shape)
Bases:
torch.nn.modules.module.Module
Hand-tuned architecture for extracting representation from MNIST images
- This was adapted from :
In our context, it is used to extract the representation from the images which are fed to a recurrent model such as an LSTM
- Parameters
dataset (Multi_Domain_Dataset) – dataset that we will be training on
model_hparams (dict) – The hyperparameters for the model.
input_size (int, optional) – The size of the input to the model. Defaults to None. If None, the input size is calculated from the dataset.
- EMBED_DIM = 32
Size of the output representation
- Type
int
- CNN_OUT_DIM = 288
Size of the representation after convolution, but before the fully connected (FC) layers
- Type
int
- forward(x)
Forward pass through the model
- Parameters
x (torch.Tensor) – The input to the model.
- Returns
The output representation of the model.
- Return type
torch.Tensor
- training: bool
- class woods.models.LSTM(dataset, model_hparams, input_size=None)
Bases:
torch.nn.modules.module.Module
A simple LSTM model
- Parameters
dataset (Multi_Domain_Dataset) – dataset that we will be training on
model_hparams (dict) – The hyperparameters for the model.
input_size (int, optional) – The size of the input to the model. Defaults to None. If None, the input size is calculated from the dataset.
- state_size
The size of the hidden state of the LSTM.
- Type
int
- recurrent_layers
The number of recurrent layers stacked on each other.
- Type
int
The number of hidden layers of the classifier MLP (after LSTM).
- Type
int
The width of the hidden layers of the classifier MLP (after LSTM).
- Type
int
Notes
All attributes need to be in the model_hparams dictionary.
- forward(input, time_pred)
Forward pass of the model
- Parameters
input (torch.Tensor) – The input to the model.
time_pred (torch.Tensor) – The time prediction of the input.
- Returns
The output of the model.
- Return type
torch.Tensor
- initHidden(batch_size, device)
Initialize the hidden state of the LSTM with a normal distribution
- Parameters
batch_size (int) – The batch size of the model.
device (torch.device) – The device to use.
- training: bool
- class woods.models.MNIST_LSTM(dataset, model_hparams, input_size=None)
Bases:
torch.nn.modules.module.Module
A simple LSTM model taking inputs from a CNN. (see: MNIST_CNN)
- Parameters
dataset (Multi_Domain_Dataset) – dataset that we will be training on
model_hparams (dict) – The hyperparameters for the model.
input_size (int, optional) – The size of the input to the model. Defaults to None. If None, the input size is calculated from the dataset.
- state_size
The size of the hidden state of the LSTM.
- Type
int
- recurrent_layers
The number of recurrent layers stacked on each other.
- Type
int
The number of hidden layers of the classifier MLP (after LSTM).
- Type
int
The width of the hidden layers of the classifier MLP (after LSTM).
- Type
int
Notes
All attributes need to be in the model_hparams dictionary.
- forward(input, time_pred)
Forward pass of the model
- Parameters
input (torch.Tensor) – The input to the model.
time_pred (torch.Tensor) – The time prediction of the input.
- Returns
The output of the model.
- Return type
torch.Tensor
- initHidden(batch_size, device)
Initialize the hidden state of the LSTM with a normal distribution
- Parameters
batch_size (int) – The batch size of the model.
device (torch.device) – The device to use.
- training: bool
- class woods.models.ATTN_LSTM(dataset, model_hparams, input_size=None)
Bases:
torch.nn.modules.module.Module
A simple LSTM model with self attention
- Parameters
dataset (Multi_Domain_Dataset) – dataset that we will be training on
model_hparams (dict) – The hyperparameters for the model.
input_size (int, optional) – The size of the input to the model. Defaults to None. If None, the input size is calculated from the dataset.
- state_size
The size of the hidden state of the LSTM.
- Type
int
- recurrent_layers
The number of recurrent layers stacked on each other.
- Type
int
The number of hidden layers of the classifier MLP (after LSTM).
- Type
int
The width of the hidden layers of the classifier MLP (after LSTM).
- Type
int
Notes
All attributes need to be in the model_hparams dictionary.
- forward(input, time_pred)
Forward pass of the model
- Parameters
input (torch.Tensor) – The input to the model.
time_pred (torch.Tensor) – The time prediction of the input.
- Returns
The output of the model.
- Return type
torch.Tensor
- initHidden(batch_size, device)
Initialize the hidden state of the LSTM with a normal distribution
- Parameters
batch_size (int) – The batch size of the model.
device (torch.device) – The device to use.
- training: bool
- class woods.models.CRNN(dataset, model_hparams, input_size=None)
Bases:
torch.nn.modules.module.Module
Convolutional Recurrent Neural Network
- This is inspired by the repository:
But here we use the ResNet50 architecture pretrained on ImageNet, and we use the ATTN_LSTM model on top of the outputs of the ResNet50 to make predictions.
- Parameters
dataset (Multi_Domain_Dataset) – dataset that we will be training on
model_hparams (dict) – The hyperparameters for the model.
The size of the first hidden layer of the CNN embedding.
- Type
int
The size of the second hidden layer of the CNN embedding.
- Type
int
- CNN_embed_dim
The size of the CNN embedding.
- Type
int
- forward(input, time_pred)
Forward pass through CRNN
- Parameters
input (Tensor) – shape [batch_size, seq_len, input_size]
time_pred (Tensor) – time prediction indexes
- training: bool
woods.objectives module
Defining domain generalization algorithms
Summary
Classes:
ANDMask | Learning Explanations that are Hard to Vary [https://arxiv.org/abs/2009.00329] AND-Mask implementation from [https://github.com/gibipara92/learning-explanations-hard-to-vary]
ERM | Empirical Risk Minimization (ERM)
Fish | Implementation of Fish, as seen in Gradient Matching for Domain Generalization, Shi et al. 2021.
IGA | Inter-environmental Gradient Alignment From https://arxiv.org/abs/2008.01883v2
IRM | Invariant Risk Minimization (IRM)
Objective | A subclass of Objective implements a domain generalization algorithm.
SANDMask | Learning Explanations that are Hard to Vary [https://arxiv.org/abs/2009.00329] AND-Mask implementation from [https://github.com/gibipara92/learning-explanations-hard-to-vary]
SD | Gradient Starvation: A Learning Proclivity in Neural Networks Equation 25 from [https://arxiv.org/pdf/2011.09468.pdf]
VREx | V-REx Objective from http://arxiv.org/abs/2003.00688
Functions:
get_objective_class | Return the objective class with the given name.
Reference
- woods.objectives.get_objective_class(objective_name)
Return the objective class with the given name.
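A name-based lookup of this kind can be sketched as follows. This is only an illustration under assumptions, not the package's implementation: the placeholder classes, the OBJECTIVES registry and the lookup_objective_class helper are hypothetical names, and the real function may resolve names differently or raise a different error for unknown names.

```python
class ERM:
    """Placeholder standing in for an objective class (hypothetical)."""


class IRM(ERM):
    """Placeholder subclass standing in for another objective (hypothetical)."""


# Hypothetical registry mapping objective names to their classes
OBJECTIVES = {cls.__name__: cls for cls in (ERM, IRM)}


def lookup_objective_class(objective_name):
    """Return the class registered under objective_name."""
    if objective_name not in OBJECTIVES:
        raise NotImplementedError(f"Objective not found: {objective_name}")
    return OBJECTIVES[objective_name]


objective_class = lookup_objective_class("IRM")
print(objective_class.__name__)
```

Returning the class rather than an instance lets the caller decide which constructor arguments to pass.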
- class woods.objectives.Objective(hparams)
Bases:
torch.nn.modules.module.Module
A subclass of Objective implements a domain generalization algorithm. Subclasses should implement the following methods: update and predict.
- backward(losses)
Computes the gradients for the model update.
Admits a list of unlabeled losses from the test domains: losses
- training: bool
- class woods.objectives.ERM(model, dataset, loss_fn, optimizer, hparams)
Bases:
woods.objectives.Objective
Empirical Risk Minimization (ERM)
- predict(all_x, ts, device)
- update(minibatches_device, dataset, device)
- training: bool
- class woods.objectives.IRM(model, dataset, loss_fn, optimizer, hparams)
Bases:
woods.objectives.ERM
Invariant Risk Minimization (IRM)
- update(minibatches_device, dataset, device)
- training: bool
- class woods.objectives.VREx(model, dataset, loss_fn, optimizer, hparams)
Bases:
woods.objectives.ERM
V-REx Objective from http://arxiv.org/abs/2003.00688
- update(minibatches_device, dataset, device)
- training: bool
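As a rough illustration of the V-REx idea, the loss below adds a variance-of-risks penalty to the mean of the per-environment losses, as described in the referenced paper. It is a plain-Python sketch under assumptions: vrex_loss and penalty_weight are hypothetical names, the losses are ordinary floats rather than tensors, and nothing here is confirmed to match what the update method of VREx does internally.

```python
def vrex_loss(env_losses, penalty_weight):
    """Mean risk plus a weighted variance-of-risks penalty."""
    n_envs = len(env_losses)
    mean = sum(env_losses) / n_envs
    # Population variance of the per-environment losses
    variance = sum((loss - mean) ** 2 for loss in env_losses) / n_envs
    return mean + penalty_weight * variance


# Identical losses across environments incur no penalty
print(vrex_loss([0.5, 0.5, 0.5], penalty_weight=10.0))
# Unequal losses are penalized in proportion to their spread
print(vrex_loss([0.2, 0.5, 0.8], penalty_weight=10.0))
```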
- class woods.objectives.SD(model, dataset, loss_fn, optimizer, hparams)
Bases:
woods.objectives.ERM
Gradient Starvation: A Learning Proclivity in Neural Networks Equation 25 from [https://arxiv.org/pdf/2011.09468.pdf]
- update(minibatches_device, dataset, device)
- training: bool
- class woods.objectives.ANDMask(model, dataset, loss_fn, optimizer, hparams)
Bases:
woods.objectives.ERM
Learning Explanations that are Hard to Vary [https://arxiv.org/abs/2009.00329] AND-Mask implementation from [https://github.com/gibipara92/learning-explanations-hard-to-vary]
- mask_grads(tau, gradients, params)
- update(minibatches_device, dataset, device)
- training: bool
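The gradient masking idea behind AND-Mask can be sketched in plain Python: a gradient component is kept only when its sign agrees across environments at least as strongly as the threshold tau. This is a simplified illustration under assumptions, not the mask_grads method itself: and_mask is a hypothetical helper, gradients are flat lists of floats instead of tensors, and any rescaling of the surviving components is left out.

```python
def and_mask(tau, env_gradients):
    """Average gradients, zeroing components whose signs disagree.

    env_gradients holds one flat list of gradient components per environment.
    """
    n_envs = len(env_gradients)
    masked = []
    for components in zip(*env_gradients):
        # Sign of each environment's gradient for this component: -1, 0 or 1
        signs = [(g > 0) - (g < 0) for g in components]
        # 1.0 means every environment agrees, 0.0 means the signs cancel out
        agreement = abs(sum(signs)) / n_envs
        mean_grad = sum(components) / n_envs
        masked.append(mean_grad if agreement >= tau else 0.0)
    return masked


gradients = [[1.0, -1.0, 0.5], [3.0, 2.0, 0.5]]
print(and_mask(1.0, gradients))
```

With tau set to 1.0 only components on which all environments agree in sign survive; lowering tau lets through components with partial agreement.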
- class woods.objectives.IGA(model, dataset, loss_fn, optimizer, hparams)
Bases:
woods.objectives.ERM
Inter-environmental Gradient Alignment From https://arxiv.org/abs/2008.01883v2
- update(minibatches_device, dataset, device)
- training: bool
- class woods.objectives.Fish(model, dataset, loss_fn, optimizer, hparams)
Bases:
woods.objectives.ERM
Implementation of Fish, as seen in Gradient Matching for Domain Generalization, Shi et al. 2021.
- create_copy(device)
- update(minibatches_device, dataset, device)
- training: bool
- class woods.objectives.SANDMask(model, dataset, loss_fn, optimizer, hparams)
Bases:
woods.objectives.ERM
Learning Explanations that are Hard to Vary [https://arxiv.org/abs/2009.00329] AND-Mask implementation from [https://github.com/gibipara92/learning-explanations-hard-to-vary]
- mask_grads(tau, k, gradients, params, device)
Masks take values in [0, 1] and form a set of updates for each parameter, based on the agreement of gradients coming from different environments.
- update(minibatches_device, dataset, device)
- training: bool
woods.train module
Defining the training functions that are used to train and evaluate models
Summary
Functions:
get_accuracies | Get accuracies for all splits using fast loaders
get_split_accuracy_seq | Get accuracy and loss for a dataset that is of the seq setup
get_split_accuracy_step | Get accuracy and loss for a dataset that is of the step setup
train | Train a model on a given dataset with a given objective
train_step | Train a single training step for a model
Reference
- woods.train.train_step(model, objective, dataset, in_loaders_iter, device)
Train a single training step for a model
- Parameters
model – nn model defined in models.py
objective – objective we are using for training
dataset – dataset object we are training on
in_loaders_iter – iterable of iterable of data loaders
device – device on which we are training
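The flow of a single step can be sketched as below. This is an illustration under stated assumptions rather than the actual function: it assumes in_loaders_iter behaves like zip(*loaders), so that one draw yields one (x, y) minibatch per training domain, and it uses a hypothetical RecordingObjective stub in place of a real objective. Plain lists stand in for tensors, so no device transfer takes place.

```python
class RecordingObjective:
    """Hypothetical stand-in that only records what update() receives."""

    def __init__(self):
        self.calls = []

    def update(self, minibatches_device, dataset, device):
        self.calls.append(minibatches_device)


def train_step_sketch(objective, dataset, in_loaders_iter, device):
    # One draw yields one (x, y) minibatch per training domain
    minibatches = [(x, y) for x, y in next(in_loaders_iter)]
    # A real implementation would move the tensors to `device` here
    objective.update(minibatches, dataset, device)


# Two toy "loaders", each yielding (inputs, labels) minibatches
loader_a = [([1, 2], [0, 1]), ([3, 4], [1, 0])]
loader_b = [([5, 6], [1, 1]), ([7, 8], [0, 0])]
in_loaders_iter = zip(loader_a, loader_b)

objective = RecordingObjective()
train_step_sketch(objective, dataset=None, in_loaders_iter=in_loaders_iter, device="cpu")
print(objective.calls)
```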
- woods.train.train(flags, training_hparams, model, objective, dataset, device)
Train a model on a given dataset with a given objective
- Parameters
flags – flags from argparse
training_hparams – training hyperparameters
model – nn model defined in models.py
objective – objective we are using for training
dataset – dataset object we are training on
device – device on which we are training
- woods.train.get_accuracies(objective, dataset, device)
Get accuracies for all splits using fast loaders
- Parameters
objective – objective we are using for training
dataset – dataset object we are training on
device – device on which we are training
- woods.train.get_split_accuracy_seq(objective, dataset, loader, device)
Get accuracy and loss for a dataset that is of the seq setup
- Parameters
objective – objective we are using for training
dataset – dataset object we are training on
loader – data loader of which we want the accuracy
device – device on which we are training
- woods.train.get_split_accuracy_step(objective, dataset, loader, device)
Get accuracy and loss for a dataset that is of the step setup
- Parameters
objective – objective we are using for training
dataset – dataset object we are training on
loader – data loader of which we want the accuracy
device – device on which we are training
woods.utils module
Set of utility functions used throughout the package
Summary
Functions:
check_file_integrity | Check for integrity of files from a hyper parameter sweep
get_cmap | Returns a function that maps each index in 0, 1, ..., n-1 to a distinct RGB color; the keyword argument name must be a standard mpl colormap name.
get_job_name | Generates the name of the output file for a training run as a function of the config
get_latex_table | Construct and export a LaTeX table from a PrettyTable.
plot_results | Plot results - accuracy and loss - w.r.t. training step
print_results | Print results from a results json file
setup_pretty_table | Set up the printed table that shows the results at each checkpoint
Reference
- woods.utils.get_cmap(n, name='hsv')
Returns a function that maps each index in 0, 1, …, n-1 to a distinct RGB color; the keyword argument name must be a standard mpl colormap name.
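The index-to-color mapping can be illustrated without matplotlib. The sketch below is a standard-library stand-in, not the function documented above: distinct_colors is a hypothetical name, and it spreads hues evenly with colorsys instead of looking up a named mpl colormap.

```python
import colorsys


def distinct_colors(n):
    """Return a function mapping an index i in 0..n-1 to an (r, g, b) tuple."""

    def color(i):
        # Spread the hues evenly around the color wheel
        return colorsys.hsv_to_rgb(i / n, 1.0, 1.0)

    return color


cmap = distinct_colors(4)
for index in range(4):
    print(index, cmap(index))
```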
- woods.utils.plot_results(results_path)
Plot results - accuracy and loss - w.r.t. training step
- Parameters
results_path (str) – path to a results json file coming from a training run
- woods.utils.print_results(results_path)
Print results from a results json file
- Parameters
results_path (str) – path to a results json file coming from a training run
- woods.utils.get_job_name(flags)
Generates the name of the output file for a training run as a function of the config
Seq setup: <objective>_<dataset>_<test_env>_H<hparams_seed>_T<trial_seed>.json
Step setup: <objective>_<dataset>_<test_env>_H<hparams_seed>_T<trial_seed>_S<test_step>.json
- Parameters
flags (dict) – dictionary of the config for a training run
- Returns
name of the output json file of the training run
- Return type
str
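The two naming patterns can be reproduced with a short helper. This is a sketch under assumptions: job_name_sketch is a hypothetical name, the dictionary keys are assumed to match the placeholders in the patterns above, and the choice between the seq and step forms is assumed here to depend on whether a test_step value is present, which may not be how the package decides.

```python
def job_name_sketch(flags):
    """Build the output file name from a config dictionary."""
    name = "{objective}_{dataset}_{test_env}_H{hparams_seed}_T{trial_seed}".format(**flags)
    # Assumption: a test step is only part of the name in the step setup
    if flags.get("test_step") is not None:
        name += "_S{}".format(flags["test_step"])
    return name + ".json"


seq_flags = {"objective": "ERM", "dataset": "HHAR", "test_env": 0,
             "hparams_seed": 1, "trial_seed": 2}
print(job_name_sketch(seq_flags))

step_flags = dict(seq_flags, test_step=3)
print(job_name_sketch(step_flags))
```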
- woods.utils.check_file_integrity(results_dir)
Check for integrity of files from a hyper parameter sweep
- Parameters
results_dir (str) – directory where sweep results are stored
- Raises
AssertionError – If there is a sweep file missing
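An integrity check of this kind boils down to asserting that every expected result file exists. The sketch below illustrates that idea only: check_files_sketch and its expected_names parameter are hypothetical, whereas the documented function takes just results_dir and presumably derives the expected files from the sweep configuration.

```python
import os
import tempfile


def check_files_sketch(results_dir, expected_names):
    """Raise AssertionError if any expected file is absent from results_dir."""
    missing = [name for name in expected_names
               if not os.path.isfile(os.path.join(results_dir, name))]
    assert not missing, f"Missing sweep files: {missing}"


with tempfile.TemporaryDirectory() as results_dir:
    open(os.path.join(results_dir, "run_a.json"), "w").close()
    check_files_sketch(results_dir, ["run_a.json"])  # passes silently
    try:
        check_files_sketch(results_dir, ["run_a.json", "run_b.json"])
    except AssertionError as err:
        print(err)
```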
- woods.utils.setup_pretty_table(flags)
Set up the printed table that shows the results at each checkpoint
- Parameters
flags (Namespace) – Namespace of the argparser containing the config of the training run
dataset (Multi_Domain_Dataset) – Dataset Object
- Returns
an instance of prettytable.PrettyTable
- Return type
PrettyTable
- woods.utils.get_latex_table(table, caption=None, label=None)
Construct and export a LaTeX table from a PrettyTable.
Inspired from : https://github.com/adasilva/prettytable
- Parameters
table (PrettyTable) – the table to export to LaTeX
caption (str, optional) – a caption for the table. Defaults to None.
label (str, optional) – a latex reference tag. Defaults to None.
- Returns
printable latex string
- Return type
str
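The conversion can be pictured with a small standalone helper. To stay free of dependencies, this sketch takes the header and rows as plain lists instead of a PrettyTable; latex_table_sketch is a hypothetical name and the exact LaTeX layout produced by the package may differ.

```python
def latex_table_sketch(field_names, rows, caption=None, label=None):
    """Return a printable LaTeX table built from a header and rows."""
    lines = [
        "\\begin{table}",
        "\\centering",
        "\\begin{tabular}{" + "c" * len(field_names) + "}",
        " & ".join(field_names) + " \\\\",
        "\\hline",
    ]
    for row in rows:
        lines.append(" & ".join(str(cell) for cell in row) + " \\\\")
    lines.append("\\end{tabular}")
    # Caption and label are only emitted when provided
    if caption is not None:
        lines.append("\\caption{" + caption + "}")
    if label is not None:
        lines.append("\\label{" + label + "}")
    lines.append("\\end{table}")
    return "\n".join(lines)


print(latex_table_sketch(["Objective", "Accuracy"], [["ERM", 0.5]],
                         caption="Results", label="tab:results"))
```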
Set of functions used to launch lists of python scripts |
|
Defining the benchmarks for OoD generalization in time-series |
|
Defining hyper parameters and their distributions for HPO |
|
Defining the model selection strategies |
|
Defining the architectures used for benchmarking algorithms |
|
Defining domain generalization algorithms |
|
Defining the training functions that are used to train and evaluate models |
|
Set of utility functions used throughout the package |
woods.scripts
woods.scripts.compile_results module
Compile results from a hyperparameter sweep and perform model selection strategies
See https://woods.readthedocs.io/en/latest/running_a_sweep.html to learn more about usage.
woods.scripts.download module
Directly download the preprocessed data
Summary
Functions:
CAP | Download the CAP dataset
HHAR | Download the HHAR dataset
LSA64 | Download the LSA64 dataset
PCL | Download the PCL dataset
SEDFx | Download the SEDFx dataset
Reference
- woods.scripts.download.CAP(data_path, mode)
Download the CAP dataset
- woods.scripts.download.SEDFx(data_path, mode)
Download the SEDFx dataset
- woods.scripts.download.PCL(data_path, mode)
Download the PCL dataset
- woods.scripts.download.HHAR(data_path, mode)
Download the HHAR dataset
- woods.scripts.download.LSA64(data_path, mode)
Download the LSA64 dataset
woods.scripts.fetch_and_preprocess module
This module lets you download the raw data and run the preprocessing yourself
You can directly download the preprocessed data with the download.py module. This module exists only for transparency about how the datasets are preprocessed. It also gives the most courageous users the opportunity to change the preprocessing approaches out of curiosity.
Note
The intention of releasing the benchmarks of woods is to investigate the performance of domain generalization techniques. Although some preprocessing tricks could lead to better OoD performance, this approach is not encouraged when using the WOODS benchmarks.
Summary
Classes:
CAP | Fetch the data from the PhysioNet website and preprocess it
PCL | Fetch the data using moabb and preprocess it
SEDFx | Fetch the PhysioNet Sleep-EDF Database Expanded Dataset and preprocess it
Functions:
HHAR | Fetch and preprocess the HHAR dataset
LSA64 | Fetch the LSA64 dataset and preprocess it
Reference
- class woods.scripts.fetch_and_preprocess.CAP(flags)
Bases:
object
Fetch the data from the PhysioNet website and preprocess it
The download is automatic but if you want to manually download:
wget -r -N -c -np https://physionet.org/files/capslpdb/1.0.0/
- Parameters
flags (argparse.Namespace) – The flags of the script
- files = [['physionet.org/files/capslpdb/1.0.0/nfle29', 'physionet.org/files/capslpdb/1.0.0/nfle7', 'physionet.org/files/capslpdb/1.0.0/nfle1', 'physionet.org/files/capslpdb/1.0.0/nfle5', 'physionet.org/files/capslpdb/1.0.0/n11', 'physionet.org/files/capslpdb/1.0.0/rbd18', 'physionet.org/files/capslpdb/1.0.0/plm9', 'physionet.org/files/capslpdb/1.0.0/nfle35', 'physionet.org/files/capslpdb/1.0.0/nfle36', 'physionet.org/files/capslpdb/1.0.0/nfle2', 'physionet.org/files/capslpdb/1.0.0/nfle38', 'physionet.org/files/capslpdb/1.0.0/nfle39', 'physionet.org/files/capslpdb/1.0.0/nfle21'], ['physionet.org/files/capslpdb/1.0.0/nfle10', 'physionet.org/files/capslpdb/1.0.0/nfle11', 'physionet.org/files/capslpdb/1.0.0/nfle19', 'physionet.org/files/capslpdb/1.0.0/nfle26', 'physionet.org/files/capslpdb/1.0.0/nfle23'], ['physionet.org/files/capslpdb/1.0.0/rbd8', 'physionet.org/files/capslpdb/1.0.0/rbd5', 'physionet.org/files/capslpdb/1.0.0/rbd11', 'physionet.org/files/capslpdb/1.0.0/ins8', 'physionet.org/files/capslpdb/1.0.0/rbd10'], ['physionet.org/files/capslpdb/1.0.0/n3', 'physionet.org/files/capslpdb/1.0.0/nfle30', 'physionet.org/files/capslpdb/1.0.0/nfle13', 'physionet.org/files/capslpdb/1.0.0/nfle18', 'physionet.org/files/capslpdb/1.0.0/nfle24', 'physionet.org/files/capslpdb/1.0.0/nfle4', 'physionet.org/files/capslpdb/1.0.0/nfle14', 'physionet.org/files/capslpdb/1.0.0/nfle22', 'physionet.org/files/capslpdb/1.0.0/n5', 'physionet.org/files/capslpdb/1.0.0/nfle37'], ['physionet.org/files/capslpdb/1.0.0/nfle3', 'physionet.org/files/capslpdb/1.0.0/nfle40', 'physionet.org/files/capslpdb/1.0.0/nfle15', 'physionet.org/files/capslpdb/1.0.0/nfle12', 'physionet.org/files/capslpdb/1.0.0/nfle28', 'physionet.org/files/capslpdb/1.0.0/nfle34', 'physionet.org/files/capslpdb/1.0.0/nfle16', 'physionet.org/files/capslpdb/1.0.0/nfle17']]
- remove_useless(flags)
Remove useless files
- string_2_label(string)
Convert string to label
- read_annotation(txt_path)
Read annotation file for the CAP dataset
- gather_EEG(flags)
Gets the intersection of common channels across all machines
- Returns
list of channels (strings)
- Return type
list
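Finding the channels shared by every recording is a set intersection. The sketch below shows the idea on hard-coded channel lists: common_channels is a hypothetical helper, the channel names are made up for illustration, and the real method reads the channels from the downloaded recordings instead.

```python
def common_channels(channels_per_recording):
    """Channels present in every recording, sorted for a stable order."""
    common = set(channels_per_recording[0])
    for channels in channels_per_recording[1:]:
        common &= set(channels)
    return sorted(common)


# Made-up channel lists for three recordings
recordings = [
    ["C4-A1", "F4-C4", "ECG"],
    ["C4-A1", "F4-C4"],
    ["F4-C4", "C4-A1", "EMG"],
]
print(common_channels(recordings))
```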
- class woods.scripts.fetch_and_preprocess.SEDFx(flags)
Bases:
object
Fetch the PhysioNet Sleep-EDF Database Expanded Dataset and preprocess it
The download is automatic but if you want to manually download:
wget -r -N -c -np https://physionet.org/files/sleep-edfx/1.0.0/
- Parameters
flags (argparse.Namespace) – The flags of the script
- remove_useless(flags)
Remove useless files
- string_2_label(string)
Convert string to label
- read_annotation(txt_path)
Read annotation file
- gather_EEG(flags)
Gets the intersection of common channels across all machines
- Returns
list of channels (strings)
- Return type
list
- woods.scripts.fetch_and_preprocess.HHAR(flags)
Fetch and preprocess the HHAR dataset
Note
You need to manually download the HHAR dataset from the source and place it in the data folder in order to preprocess it yourself:
- Parameters
flags (argparse.Namespace) – The flags of the script
- woods.scripts.fetch_and_preprocess.LSA64(flags)
Fetch the LSA64 dataset and preprocess it
Note
You need to manually download the LSA64 dataset from the source and place it in the data folder in order to preprocess it yourself:
- Parameters
flags (argparse.Namespace) – The flags of the script
- class woods.scripts.fetch_and_preprocess.PCL(flags)
Bases:
object
Fetch the data using moabb and preprocess it
- Source of MOABB:
- Parameters
flags (argparse.Namespace) – The flags of the script
Note
This is hell to run. It takes a while to download and requires a lot of RAM.
- relabel(l)
Converts labels from str to int
woods.scripts.hparams_sweep module
Perform a hyperparameter sweep
See https://woods.readthedocs.io/en/latest/running_a_sweep.html for usage.
Summary
Functions:
make_args_list | Creates a list of commands to launch all of the training runs in the hyper parameter sweep
Reference
- woods.scripts.hparams_sweep.make_args_list(flags)
Creates a list of commands to launch all of the training runs in the hyper parameter sweep
Heavily inspired from https://github.com/facebookresearch/DomainBed/blob/9e864cc4057d1678765ab3ecb10ae37a4c75a840/domainbed/scripts/sweep.py#L98
- Parameters
flags (dict) – arguments of the hyper parameter sweep
- Returns
list of strings: terminal commands that call the training runs of the sweep
list of dicts: the arguments for the training runs of the sweep
- Return type
list
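Expanding a sweep into individual runs is essentially a Cartesian product over the swept values. The following sketch illustrates that pattern only: make_args_list_sketch, the sweep keys and the command-line flags it emits are assumptions made for the example and are not taken from the package's actual argument parser.

```python
import itertools
import shlex


def make_args_list_sketch(sweep):
    """Return (commands, args) for every combination of swept values."""
    command_list, args_list = [], []
    combinations = itertools.product(
        sweep["dataset"], sweep["objective"], sweep["test_env"], range(sweep["n_trials"]))
    for dataset, objective, test_env, trial_seed in combinations:
        args = {"dataset": dataset, "objective": objective,
                "test_env": test_env, "trial_seed": trial_seed}
        # Hypothetical command line; flag names mirror the dictionary keys
        command = ["python3", "-m", "woods.scripts.main", "train"]
        for key, value in sorted(args.items()):
            command += [f"--{key}", str(value)]
        command_list.append(" ".join(shlex.quote(part) for part in command))
        args_list.append(args)
    return command_list, args_list


commands, args = make_args_list_sketch(
    {"dataset": ["HHAR"], "objective": ["ERM", "IRM"], "test_env": [0, 1], "n_trials": 2})
print(len(commands))
print(commands[0])
```

Keeping the argument dictionaries next to the command strings makes it easy to check later which runs of the sweep have completed.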
woods.scripts.main module
Script used for the main functionalities of the woods package
- There are 2 modes of operation:
training mode: trains a model on a given dataset with a given test environment using a given algorithm
test mode: tests an existing model on a given dataset with a given test environment using a given algorithm
- Raises
NotImplementedError – Some part of the code is not implemented yet
woods.scripts.visualize_results module
Visualize logs from a training run
woods.scripts.compile_results | Compile results from a hyperparameter sweep and perform model selection strategies
woods.scripts.download | Directly download the preprocessed data
woods.scripts.fetch_and_preprocess | This module lets you download the raw data and run the preprocessing yourself
woods.scripts.hparams_sweep | Perform a hyperparameter sweep
woods.scripts.main | Script used for the main functionalities of the woods package
woods.scripts.visualize_results | Visualize logs from a training run