Method


  1. As shown in Fig. 2, our proposed method adopts an attention-guided U-Net as the generator and uses a dual discriminator to direct global and local information.
  2. We also use a self feature preserving loss to guide the training process and preserve textures and structures.
  3. The U-Net generator is implemented with 8 convolutional blocks (see the PyTorch sketch after this list).
    • Each block consists of two 3×3 convolutional layers, followed by LeakyReLU and a batch normalization layer.
    • At the upsampling stage, we replace the standard deconvolutional layer with a bilinear upsampling layer followed by a convolutional layer, which mitigates checkerboard artifacts.
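
A minimal PyTorch sketch of these two building blocks (module names, channel arguments, and the exact placement of LeakyReLU/BatchNorm after each convolution are my assumptions, not taken from the paper):

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions, each followed by LeakyReLU and batch normalization."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.BatchNorm2d(out_ch),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.block(x)

class UpBlock(nn.Module):
    """Bilinear upsampling + convolution in place of a deconvolution,
    which helps avoid checkerboard artifacts."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.up(x)
```

The full generator would chain 8 such blocks in a U-Net layout with skip connections, which is omitted here.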

A. Global-Local Discriminators

  • For a low-light image with spatially varying lighting, e.g., a small bright region in an overall dark background, a global image discriminator alone is often unable to provide the desired adaptivity.

  • In addition to the image-level global discriminator, we add a local discriminator that takes randomly cropped patches from both the output and real normal-light images and learns to distinguish which are real.

    This ensures that all local patches of an enhanced image look like realistic normal-light patches.

    • For the global discriminator, we utilize the relativistic discriminator structure, which estimates the probability that real data is more realistic than fake data:

      $$D_{Ra}(x_r, x_f) = \sigma\left(C(x_r) - \mathbb{E}_{x_f \sim \mathbb{P}_{\text{fake}}}[C(x_f)]\right)$$

      where $C$ denotes the discriminator network and $\sigma$ the sigmoid function.

      We then replace the sigmoid function with the least-squares GAN (LSGAN) loss, which gives:

      $$\mathcal{L}_{D}^{\text{Global}}=\mathbb{E}_{x_{r} \sim \mathbb{P}_{\text{real}}}\left[\left(D_{Ra}\left(x_{r}, x_{f}\right)-1\right)^{2}\right]+\mathbb{E}_{x_{f} \sim \mathbb{P}_{\text{fake}}}\left[D_{Ra}\left(x_{f}, x_{r}\right)^{2}\right]$$

      $$\mathcal{L}_{G}^{\text{Global}}=\mathbb{E}_{x_{f} \sim \mathbb{P}_{\text{fake}}}\left[\left(D_{Ra}\left(x_{f}, x_{r}\right)-1\right)^{2}\right]+\mathbb{E}_{x_{r} \sim \mathbb{P}_{\text{real}}}\left[D_{Ra}\left(x_{r}, x_{f}\right)^{2}\right]$$

    • For the local discriminator, we randomly crop 5 patches from the output and real normal-light images each time, and the local discriminator learns to distinguish real patches from enhanced ones (see the sketch below).
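
A hedged sketch of these two pieces in PyTorch. It assumes the discriminator returns raw (pre-sigmoid) scores and that $D_{Ra}(x_r, x_f) = C(x_r) - \mathbb{E}[C(x_f)]$ once the sigmoid is dropped in favor of the least-squares loss; the patch size and helper names are illustrative:

```python
import torch

def relativistic_lsgan_d_loss(d_real, d_fake):
    """Discriminator loss: push real scores above the average fake score (target 1),
    and fake scores below the average real score (target 0)."""
    d_ra_real = d_real - d_fake.mean()
    d_ra_fake = d_fake - d_real.mean()
    return ((d_ra_real - 1) ** 2).mean() + (d_ra_fake ** 2).mean()

def relativistic_lsgan_g_loss(d_real, d_fake):
    """Generator loss: the symmetric form, pushing fake scores above the average real score."""
    d_ra_real = d_real - d_fake.mean()
    d_ra_fake = d_fake - d_real.mean()
    return ((d_ra_fake - 1) ** 2).mean() + (d_ra_real ** 2).mean()

def random_patches(images, num_patches=5, patch_size=32):
    """Randomly crop `num_patches` square patches from a batch of images (B, C, H, W)."""
    _, _, h, w = images.shape
    patches = []
    for _ in range(num_patches):
        top = torch.randint(0, h - patch_size + 1, (1,)).item()
        left = torch.randint(0, w - patch_size + 1, (1,)).item()
        patches.append(images[:, :, top:top + patch_size, left:left + patch_size])
    return torch.cat(patches, dim=0)
```

The local discriminator would then be trained on `random_patches(enhanced)` vs. `random_patches(real)` instead of the full images.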

B. Self Feature Preserving Loss

A pretrained VGG is typically used to define a perceptual loss; however, the perceptual loss is not sensitive to changes in image intensity.

New loss (self feature preserving loss): constrain the VGG features of the low-light input and its enhanced output to stay close to each other:

$$\mathcal{L}_{SFP}\left(I^{L}\right)=\frac{1}{W_{i, j} H_{i, j}} \sum_{x=1}^{W_{i, j}} \sum_{y=1}^{H_{i, j}}\left(\phi_{i, j}\left(I^{L}\right)-\phi_{i, j}\left(G\left(I^{L}\right)\right)\right)^{2}$$

  • $I^L$: the input low-light image.
  • $G(I^L)$: the generator's enhanced output.
  • $\phi_{i,j}$: the feature map extracted from VGG-16 pre-trained on ImageNet, where $i$ indexes the max-pooling stage and $j$ the convolutional layer after it; $W_{i,j}$ and $H_{i,j}$ are the dimensions of that feature map.
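
A sketch of this loss in PyTorch, assuming the features come from an early slice of torchvision's pretrained VGG-16 (the exact layer $\phi_{i,j}$ is an assumption) and that inputs are already preprocessed the way VGG expects:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class SelfFeaturePreservingLoss(torch.nn.Module):
    """MSE between VGG-16 feature maps of the low-light input and the enhanced output."""
    def __init__(self, layer_index=16):  # assumed slice; the paper specifies phi_{i,j}
        super().__init__()
        features = vgg16(weights="IMAGENET1K_V1").features[:layer_index].eval()
        for p in features.parameters():
            p.requires_grad = False  # VGG stays frozen; it only provides features
        self.features = features

    def forward(self, low_light, enhanced):
        feat_in = self.features(low_light)
        feat_out = self.features(enhanced)
        # mse_loss also averages over channels, which differs from the formula's
        # 1/(W_{i,j} H_{i,j}) normalization only by a constant factor.
        return F.mse_loss(feat_out, feat_in)
```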

Overall loss function:

$$Loss = \mathcal{L}^{Global}_{SFP} + \mathcal{L}^{Local}_{SFP} + \mathcal{L}^{Global}_{G} + \mathcal{L}^{Local}_{G}$$
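
For completeness, a sketch of how the four terms could be combined in one generator step, reusing the hypothetical helpers above. Equal weights are used, matching the formula as written; both adversarial terms reuse the relativistic helper here, although the notes only specify that structure for the global discriminator:

```python
def generator_objective(sfp_loss, d_global, d_local,
                        low, enhanced, real,
                        low_patches, enhanced_patches, real_patches):
    """Overall generator objective: global/local SFP terms plus global/local adversarial terms.

    `low_patches` and `enhanced_patches` are assumed to be cropped at identical locations,
    so the self feature preserving comparison is between corresponding regions.
    """
    loss_sfp_global = sfp_loss(low, enhanced)                  # L_SFP^Global
    loss_sfp_local = sfp_loss(low_patches, enhanced_patches)   # L_SFP^Local
    loss_g_global = relativistic_lsgan_g_loss(
        d_global(real), d_global(enhanced))                    # L_G^Global
    loss_g_local = relativistic_lsgan_g_loss(
        d_local(real_patches), d_local(enhanced_patches))      # L_G^Local
    return loss_sfp_global + loss_sfp_local + loss_g_global + loss_g_local
```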

C. U-Net Generator Guided with Self-Regularized Attention

  1. Intuitively, in a low-light image with spatially varying lighting conditions, we want to enhance the dark regions more than the bright ones.

  2. We take the illumination channel $I$ of the input RGB image, normalize it to [0, 1], and then use $1 - I$ as our self-regularized attention map.

    We then resize the attention map to match each intermediate feature map and multiply it element-wise with all intermediate feature maps (see the sketch below).
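
A sketch of the attention mechanism, assuming the illumination channel is approximated by a luminance-style weighted average of the RGB channels (the notes only say "illumination channel", so this is an assumption) and that the input is already in [0, 1]:

```python
import torch
import torch.nn.functional as F

def attention_map(rgb):
    """Build the self-regularized attention map 1 - I from a batch of RGB images in [0, 1]."""
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    illumination = 0.299 * r + 0.587 * g + 0.114 * b   # (B, 1, H, W); assumed proxy for I
    return 1.0 - illumination                           # darker regions get larger attention

def apply_attention(feature_map, attn):
    """Resize the attention map to the feature map's spatial size and multiply element-wise."""
    attn_resized = F.interpolate(attn, size=feature_map.shape[-2:],
                                 mode="bilinear", align_corners=False)
    return feature_map * attn_resized
```

In the generator, `apply_attention` would be applied to every intermediate feature map so that dark regions receive stronger enhancement.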