Deep Learning

Creating Artificially Generated Images with PyTorch GANs

Generative Adversarial Networks (GANs) are a powerful class of deep learning models that have revolutionized the field of generative modeling. GANs have been used to generate high-quality images, videos, music, and even text. In this hands-on project, I will walk you through the process of building a GAN to generate realistic images of faces.

Project overview

Our goal in this project is to build a GAN that can generate realistic images of faces. We will be using the CelebA dataset (available on Kaggle), which contains over 200,000 images of celebrity faces. The CelebA dataset is a popular benchmark dataset for GANs, and has been used in plenty of research papers.

The GAN we will build consists of two neural networks: a generator and a discriminator. The generator network takes in random noise as input and generates fake images, while the discriminator network takes in real and fake images and tries to distinguish between them. During training, the generator tries to generate images that fool the discriminator, while the discriminator tries to correctly identify real and fake images.

We will be using PyTorch (a popular deep learning framework and the one I personally like) to implement our GAN. PyTorch has excellent support for building GANs, with many pre-built components and examples available.

Preprocessing our data

Before we can train our GAN, we need to preprocess the CelebA dataset. We will use the torchvision package to download and preprocess the data. Specifically, we will resize all the images to 64x64 pixels, normalize the pixel values to the range [-1, 1], and convert the images to tensors. In technical terms, a tensor is a multi-dimensional array of data that is used to represent complex mathematical objects in deep learning and other scientific fields. Tensors can have any number of dimensions, ranging from a scalar value of 0 to n (a tensor of rank n). In deep learning, we use tensors to represent data such as images, audio, and text, as well as the weights and activations of neural network layers.


    import torch
    import torchvision.datasets as datasets
    import torchvision.transforms as transforms

    # Define data transforms
    transform = transforms.Compose([
        transforms.Resize((64, 64)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    # Load the CelebA dataset
    dataset = datasets.CelebA(root='./data', download=True, transform=transform)

Building the generator

Next, we will build the generator network. The generator takes in random noise as input and generates fake images. We will use a deep convolutional neural network (CNN) architecture for the generator, consisting of several layers of transposed convolutions, batch normalization, and ReLU activation functions.


    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self, latent_dim):
            super(Generator, self).__init__()
            
            # Input layer
            self.fc = nn.Linear(latent_dim, 4 * 4 * 512)
            self.bn1 = nn.BatchNorm2d(512)
            
            # Hidden layers
            self.conv1 = nn.ConvTranspose2d(512, 256, 4, 2, 1)
            self.bn2 = nn.BatchNorm2d(256)
            self.conv2 = nn.ConvTranspose2d(256, 128, 4, 2, 1)
            self.bn3 = nn.BatchNorm2d(128)
            self.conv3 = nn.ConvTranspose2d(128, 64, 4, 2, 1)
            self.bn4 = nn.BatchNorm2d(64)
            
            # Output layer
            self.conv4 = nn.ConvTranspose2d(64, 3, 4, 2, 1)
            self.tanh = nn.Tanh()
            
        def forward(self, z):
            x = self.fc(z)
            x = x.view(-1, 512, 4, 4)
            x = self.bn1(x)
            x = nn.functional.relu(x)
            x = self.conv1(x)
            x = self.bn2(x)
            x = nn.functional.relu(x)
            x = self.conv2(x)
            x = self.bn3(x)
            x = nn.functional.relu(x)
            x = self.conv3(x)
            x = self.bn4(x)
            x = nn.functional.relu(x)
            x = self.conv4(x)
            x = self.tanh(x)
            return x

The generator takes in a noise vector of size latent_dim as input and outputs a tensor of size (batch_size, 3, 64, 64), representing our fake images.

Building the discriminator

Next, we will build the discriminator network. The discriminator takes in real and fake images as input and tries to distinguish between them. We will use a CNN architecture for the discriminator, consisting of several layers of convolutions, batch normalization, and LeakyReLU activation functions.


    class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        
        # Input layer
        self.conv1 = nn.Conv2d(3, 64, 4, 2, 1)
        self.leaky_relu1 = nn.LeakyReLU(0.2)
        
        # Hidden layers
        self.conv2 = nn.Conv2d(64, 128, 4, 2, 1)
        self.bn2 = nn.BatchNorm2d(128)
        self.leaky_relu2 = nn.LeakyReLU(0.2)
        self.conv3 = nn.Conv2d(128, 256, 4, 2, 1)
        self.bn3 = nn.BatchNorm2d(256)
        self.leaky_relu3 = nn.LeakyReLU(0.2)
        self.conv4 = nn.Conv2d(256, 512, 4, 2, 1)
        self.bn4 = nn.BatchNorm2d(512)
        self.leaky_relu4 = nn.LeakyReLU(0.2)
        
        # Output layer
        self.fc = nn.Linear(4 * 4 * 512, 1)
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.leaky_relu1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.leaky_relu2(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.leaky_relu3(x)
        x = self.conv4(x)
        x = self.bn4(x)
        x = self.leaky_relu4(x)
        x = x.view(-1, 4 * 4 * 512)
        x = self.fc(x)
        x = self.sigmoid(x)
        return x

The discriminator takes in a tensor of size (batch_size, 3, 64, 64) as input and outputs a scalar value between 0 and 1, representing the probability that the input is a real image.

Training the GAN

Yes, we're almost there. Now that we have defined our generator and discriminator networks, we can train the GAN. During training, we alternate between updating the generator and discriminator networks. For each update, we sample a batch of noise vectors from a normal distribution and use the generator to generate fake images. We also sample a batch of real images from the CelebA dataset. We then train the discriminator on these real and fake images, and train the generator to generate images that fool the discriminator.


    # Define hyperparameters
    num_epochs = 200
    batch_size = 128
    lr = 0.0002
    betas = (0.5, 0.999)
    latent_dim = 100

    # Create the generator and discriminator
    generator = Generator(latent_dim).to(device)
    discriminator = Discriminator().to(device)

    # Define loss function and optimizers
    criterion = nn.BCELoss()
    gen_optimizer = optim.Adam(generator.parameters(), lr=lr, betas=betas)
    dis_optimizer = optim.Adam(discriminator.parameters(), lr=lr, betas=betas)

    # Load the dataset
    transform = transforms.Compose([
        transforms.Resize(64),
        transforms.CenterCrop(64),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    ])
    dataset = datasets.ImageFolder(root='celeba', transform=transform)
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    # Train the GAN
    for epoch in range(num_epochs):
        for i, (real_images, _) in enumerate(dataloader):
            batch_size = real_images.shape[0]
            
            # Train discriminator
            real_labels = torch.ones(batch_size, 1).to(device)
            fake_labels = torch.zeros(batch_size, 1).to(device)
            
            # Train on real images
            real_images = real_images.to(device)
            dis_optimizer.zero_grad()
            real_outputs = discriminator(real_images)
            dis_loss_real = criterion(real_outputs, real_labels)
            dis_loss_real.backward()
            
            # Train on fake images
            noise = torch.randn(batch_size, latent_dim, 1, 1).to(device)
            fake_images = generator(noise)
            fake_outputs = discriminator(fake_images.detach())
            dis_loss_fake = criterion(fake_outputs, fake_labels)
            dis_loss_fake.backward()
            
            # Update discriminator
            dis_loss = dis_loss_real + dis_loss_fake
            dis_optimizer.step()
            
            # Train generator
            gen_optimizer.zero_grad()
            noise = torch.randn(batch_size, latent_dim, 1, 1).to(device)
            fake_images = generator(noise)
            fake_outputs = discriminator(fake_images)
            gen_loss = criterion(fake_outputs, real_labels)
            gen_loss.backward()
            
            # Update generator
            gen_optimizer.step()
            
            # Print loss every 100 steps
            if (i+1) % 100 == 0:
                print('Epoch [{}/{}], Step [{}/{}], Discriminator Loss: {:.4f}, Generator Loss: {:.4f}'
                    .format(epoch+1, num_epochs, i+1, len(dataloader), dis_loss.item(), gen_loss.item()))

In the training code above, we define the hyperparameters for the GAN, create the generator and discriminator networks, define the loss function and optimizers, and load the CelebA dataset using PyTorch's ImageFolder and DataLoader classes.

We then train the GAN for a specified number of epochs. For each epoch, we iterate over the batches of the dataset and update the discriminator and generator networks. We first train the discriminator on a batch of real and fake images, computing the loss using binary cross entropy. We then update the discriminator's parameters using the Adam optimizer. Next, we train the generator to generate images that fool the discriminator, again using binary cross entropy to compute the loss. Finally, we update the generator's parameters using the Adam optimizer. At the end of each epoch, we print the discriminator and generator losses.

Finally, we save the generator's parameters to a file named generator.pth. We then generate 64 sample images using the generator and save them to a file named generated_images.png.

        
    # Save the generator
    torch.save(generator.state_dict(), 'generator.pth')

    # Generate sample images
    generator.eval()
    with torch.no_grad():
        noise = torch.randn(64, latent_dim, 1, 1).to(device)
        fake_images = generator(noise)
        save_image(fake_images, 'generated_images.png')

Use cases

There are several use cases for generating face images using GANs.

Entertainment: CGI is often used in the industry to create realistic human character in movies and video games. However, GANs could potentially be used to generate high quality human faces.
Facial recognition: GANs could be used to generate synthetic images of faces, which could then be used to augment existing datasets for training purposes. This could help improve the accuracy and robustness of these models, especially in scenarios where there is limited training data available.
Forensics:: Potential use to generate realistic images of suspects based on eyewitness descriptions. This could help law enforcement agencies identify suspects and solve crimes more quickly.

Conclusions

GANs have become an increasingly popular area of research in the field of deep learning, and have been used for a variety of tasks such as image generation, style transfer, and anomaly detection. While the basic principles of GANs are relatively simple, training them can be challenging and requires careful tuning of hyperparameters and network architecture.

Despite these challenges, GANs offer a powerful tool for generating high-quality, realistic images, and have the potential to revolutionize the field of computer vision. With further research and development, GANs may be able to generate not just images, but entire virtual worlds, with applications in fields such as gaming, virtual reality, and simulation.

To conform to standards of decency, I have chosen not to include the face images I generated in this blog post. However, if you encounter any issues with the code, please feel free to contact me for assistance.