Get Started with GANs for Image-to-Image Translation
An image domain is a set of images that share similar characteristics. For example, an image domain can be a group of images acquired in certain lighting conditions, or images with a common set of noise distortions.
Image-to-image translation is the task of transferring styles and characteristics from one image domain to another. The source domain is the domain of the starting image, and the target domain is the desired domain after translation. Applications of image-to-image translation for three sample image domains include:
Application | Source Domain | Target Domain |
---|---|---|
Day-to-dusk style conversion | Images acquired in the daytime | Images acquired at dusk |
Image denoising | Images with noise distortion | Images without visible noise |
Super-resolution | Low-resolution images | High-resolution images |
Select a GAN
You can perform image-to-image translation using deep learning generative adversarial networks (GANs). A GAN consists of a generator network and one or more discriminator networks that are trained simultaneously to maximize the overall performance. The objective of the generator network is to generate realistic images in the target domain that the discriminator cannot distinguish from real images in that domain. The objective of the discriminator networks is to correctly classify original training data as real and generator-synthesized images as fake.
The type of GAN to use depends on the training data.

Supervised GANs have a one-to-one mapping between images in the source and target domains. For an example, see Generate Image from Segmentation Map Using Deep Learning (Computer Vision Toolbox). In this example, the source domain consists of categorical images representing semantic segmentation maps, and the target domain consists of images of street scenes. The data set pairs every input segmentation map with a ground truth street scene image.

Unsupervised GANs do not have a one-to-one mapping between images in the source and target domains. For an example, see Unsupervised Day-to-Dusk Image Translation Using UNIT. In this example, the source and target domains consist of images captured in daytime and dusk conditions, respectively. However, the scene content of the daytime and dusk images differs, so a daytime image does not have a corresponding dusk image with identical scene content.
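The distinction affects how you organize the training data. The following minimal sketch, using the imageDatastore and combine functions, illustrates the two cases; the folder names are placeholder assumptions, not part of any shipped example.

```matlab
% Supervised (paired) training data: every source image has a matching
% target image, so combine the two datastores and read them in lockstep.
% The folder names are placeholders for your own data.
dsSource = imageDatastore("sourceImages");
dsTarget = imageDatastore("targetImages");
dsPaired = combine(dsSource,dsTarget);   % one-to-one mapping

% Unsupervised (unpaired) training data: read the two domains
% independently, with no correspondence between individual images.
dsDay  = imageDatastore("daytimeImages");
dsDusk = imageDatastore("duskImages");
```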
Create GAN Networks
Image Processing Toolbox™ offers functions that enable you to create popular GAN networks. You can optionally modify the networks by changing properties such as the number of downsampling operations and the type of activation and normalization. The table describes the functions that enable you to create and modify GAN networks.
Network | Creation and Modification Functions |
---|---|
pix2pixHD generator network [1] | A pix2pixHD GAN performs supervised learning. The network consists of a single generator and a single discriminator. Create a pix2pixHD generator network using the pix2pixHDGlobalGenerator function. Add a local enhancer to a pix2pixHD network using the addPix2PixHDLocalEnhancer function. |
CycleGAN generator network [2] | A CycleGAN network performs unsupervised learning. The network consists of two generators and two discriminators. The first generator takes images from domain A and generates images in domain B. The corresponding discriminator takes images generated by the first generator and real images in domain B, and attempts to correctly classify the images as real or fake. Conversely, the second generator takes images from domain B and generates images in domain A. The corresponding discriminator takes images generated by the second generator and real images in domain A, and attempts to correctly classify the images as real or fake. Create a CycleGAN generator network using the cycleGANGenerator function. |
UNIT generator network [3] | An unsupervised image-to-image translation (UNIT) GAN performs unsupervised learning. The network consists of one generator and two discriminators. The generator takes images in both domains, A and B, and returns four output images: two translated images (A-to-B and B-to-A) and two self-reconstructed images (A-to-A and B-to-B). The first discriminator takes a real and a generated image from domain A and returns the likelihood that the image is real. Similarly, the second discriminator takes a real and a generated image from domain B and returns the likelihood that the image is real. Create a UNIT generator network using the unitGenerator function. Perform image-to-image translation on a trained UNIT network using the unitPredict function. |
PatchGAN discriminator network [4] | A PatchGAN discriminator network can serve as the discriminator network for pix2pixHD, CycleGAN, and UNIT GANs, as well as custom GANs. The discriminator decides at a patch level whether an image is real or fake. By operating on patches instead of individual pixels, the PatchGAN focuses on the general style of the input rather than its specific content. Create a PatchGAN discriminator network using the patchGANDiscriminator function. You can also use the patchGANDiscriminator function to create a pixel discriminator network, which classifies each pixel as real or fake. |
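As a quick orientation, this sketch shows a minimal call to each creation function. The input sizes are illustrative assumptions; each function also accepts name-value arguments for the optional modifications described above.

```matlab
% pix2pixHD generator (supervised). The third element of the input size
% is the number of channels; for a segmentation-map input, this is
% typically the number of semantic classes in the one-hot encoding.
netPix2PixHD = pix2pixHDGlobalGenerator([512 1024 32]);

% CycleGAN generator (unsupervised). A complete CycleGAN uses two of
% these, one per translation direction.
netCycleGAN = cycleGANGenerator([256 256 3]);

% UNIT generator (unsupervised). A single network handles both
% translation directions.
netUNIT = unitGenerator([256 256 3]);

% PatchGAN discriminator, usable with any of the generators above.
netDisc = patchGANDiscriminator([256 256 3]);
```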
Some networks require additional modification beyond the options available in the network creation functions. For example, you may want to replace the addition layers with depth concatenation layers, or you may want the initial leaky ReLU layer of a UNIT network to have a scale factor other than 0.2. To refine an existing GAN network, you can use Deep Network Designer (Deep Learning Toolbox). For more information, see Build Networks with Deep Network Designer (Deep Learning Toolbox).
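For example, you can create a network and then open it in the app for interactive editing. This is a sketch, not a required step:

```matlab
% Create a CycleGAN generator, then open it in Deep Network Designer
% for interactive refinement. Depending on your release, you might
% need to pass layerGraph(netG) instead of the dlnetwork itself.
netG = cycleGANGenerator([256 256 3]);
deepNetworkDesigner(netG)
```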
If you need a network that is not available through the built-in creation functions, then you can create custom GAN networks from modular components. First, create the encoder and decoder modules, for example by using the blockedNetwork or pretrainedEncoderNetwork function. Then, combine the modules using the encoderDecoderNetwork function. You can optionally include a bridge connection, skip connections, or additional layers at the end of the network. For more information, see Create Modular Neural Networks.
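The following sketch builds a small custom encoder-decoder generator. The block definitions, input size, and option values are illustrative assumptions, not a prescribed architecture.

```matlab
% Encoder: four downsampling blocks; the number of filters doubles at
% each block (64, 128, 256, 512).
encoderBlock = @(block) [
    convolution2dLayer(3,2^(5+block),"Padding","same")
    reluLayer
    maxPooling2dLayer(2,"Stride",2)];
encoder = blockedNetwork(encoderBlock,4,"NamePrefix","encoder_");

% Decoder: four upsampling blocks that mirror the encoder.
decoderBlock = @(block) [
    transposedConv2dLayer(2,2^(10-block),"Stride",2)
    convolution2dLayer(3,2^(10-block),"Padding","same")
    reluLayer];
decoder = blockedNetwork(decoderBlock,4,"NamePrefix","decoder_");

% Combine the modules into a single network, adding skip connections
% between matching encoder and decoder stages.
net = encoderDecoderNetwork([256 256 3],encoder,decoder, ...
    "OutputChannels",3, ...
    "SkipConnections","concatenate");
```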
Train GAN Network
To train GAN generator and discriminator networks, you must use a custom training loop. Preparing a custom training loop involves several steps. For an example that shows the complete workflow, see Train Generative Adversarial Network (GAN) (Deep Learning Toolbox). Sketches of common loss functions and of the training loop itself follow the steps.
1. Create the generator and discriminator networks.

2. Create one or more datastores that read, preprocess, and augment training data. For more information, see Datastores for Deep Learning (Deep Learning Toolbox). Then, create a minibatchqueue (Deep Learning Toolbox) object for each datastore that manages the mini-batching of observations in a custom training loop.

3. Define the model gradients function for each network. The function takes as input a network and a mini-batch of input data, and returns the gradients of the loss. Optionally, you can pass extra arguments to the gradients function (for example, if the loss function requires extra information) or return extra arguments (for example, the loss values). For more information, see Define Custom Training Loops, Loss Functions, and Networks (Deep Learning Toolbox).

4. Define the loss functions. Certain types of loss functions are commonly used for image-to-image translation applications, although the implementation of each loss can vary; the sketch after these steps shows one common formulation.
   - Adversarial loss is commonly used by generator and discriminator networks. This loss relies on the pixelwise or patchwise difference between the correct classification and the classification predicted by the discriminator.
   - Cycle consistency loss is commonly used by unsupervised generator networks. This loss is based on the principle that an image translated from one domain to another, and then back to the original domain, should be identical to the original image.

5. Specify training options such as the solver type and the number of epochs. For more information, see Specify Training Options in Custom Training Loop (Deep Learning Toolbox).

6. Create the custom training loop that loops over mini-batches in every epoch. The loop reads each mini-batch of data, evaluates the model gradients using the dlfeval (Deep Learning Toolbox) function, and updates the network parameters.

7. Optionally, include display functions, such as plots of scores or batches of generated images, that enable you to monitor the training progress. For more information, see Monitor GAN Training Progress and Identify Common Failure Modes (Deep Learning Toolbox).
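To make the loss descriptions concrete, here is a sketch of both losses as functions operating on dlarray data. The exact formulation varies between GANs; this version assumes a least-squares adversarial loss and an L1 cycle consistency loss, which are common but not the only choices.

```matlab
function loss = generatorAdversarialLoss(discPredGenerated)
    % Least-squares adversarial loss for the generator: the generator
    % is rewarded when the discriminator scores its images as real (1).
    loss = mean((discPredGenerated - 1).^2,"all");
end

function loss = discriminatorAdversarialLoss(discPredReal,discPredGenerated)
    % The discriminator is rewarded for scoring real images as 1 and
    % generated images as 0.
    loss = mean((discPredReal - 1).^2,"all") + ...
        mean(discPredGenerated.^2,"all");
end

function loss = cycleConsistencyLoss(imageReal,imageCycled,lambda)
    % L1 distance between an image and its reconstruction after
    % translation to the other domain and back; lambda weights this
    % term relative to the adversarial loss.
    loss = lambda*mean(abs(imageCycled - imageReal),"all");
end
```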
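The following skeleton assembles the steps for a single generator-discriminator pair using the minibatchqueue, dlfeval, and adamupdate functions from Deep Learning Toolbox. The modelGradients function and the datastore ds are assumptions that stand in for your own implementations from steps 2 and 3.

```matlab
% Assumed already created: netG and netD (dlnetwork objects) and ds
% (a datastore of training images).
numEpochs     = 100;
miniBatchSize = 16;
learnRate     = 2e-4;

mbq = minibatchqueue(ds, ...
    "MiniBatchSize",miniBatchSize, ...
    "MiniBatchFormat","SSCB");   % spatial, spatial, channel, batch

% Trailing averages maintained by the Adam solver.
avgG = []; avgSqG = [];
avgD = []; avgSqD = [];
iteration = 0;

for epoch = 1:numEpochs
    shuffle(mbq);
    while hasdata(mbq)
        iteration = iteration + 1;
        X = next(mbq);

        % Evaluate the model gradients (and losses) with dlfeval so
        % that automatic differentiation can trace the computation.
        [gradG,gradD,lossG,lossD] = dlfeval(@modelGradients,netG,netD,X);

        % Update each network with the Adam solver.
        [netG,avgG,avgSqG] = adamupdate(netG,gradG,avgG,avgSqG, ...
            iteration,learnRate);
        [netD,avgD,avgSqD] = adamupdate(netD,gradD,avgD,avgSqD, ...
            iteration,learnRate);
    end
end
```

Updating the generator and discriminator with separate adamupdate calls keeps their solver states independent, which is the usual practice for adversarial training.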
References
[1] Wang, Ting-Chun, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs." In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[2] Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks." In 2017 IEEE International Conference on Computer Vision (ICCV), 2017.

[3] Liu, Ming-Yu, Thomas Breuel, and Jan Kautz. "Unsupervised Image-to-Image Translation Networks." In Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017.

[4] Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-Image Translation with Conditional Adversarial Networks." In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.