1. Introduction

  1. Much research has been devoted to improving image inpainting, either through image self-similarity or through deep generative models.


    However, these methods still fail when holes are large, or when the expected content inside the hole regions has complicated semantics, depth, or texture.

  2. These problems can be addressed if there happens to be a second reference image of the same scene that exposes some desired image content.

    This setting is referred to as reference-guided image inpainting.

    • target image: image with holes
    • source image: used as references
  3. Why does the reference-guided problem remain challenging?

    • the hole regions could be very large

    • the camera is uncalibrated and moves freely between the source and target views, which can induce large parallax

    • assumption: no more than two photos

    • there may exist regions in the source image that do not exist in target image


  4. multi-homography fusion pipeline

    • Assumption: there may be multiple depth planes inside the hole.


    Given a target and a source image:

    1. estimate the matched feature points between the 2 images
    2. cluster the inliers according to their estimated depths in the target image
    3. for each cluster estimate a single homography

3. Method

system pipeline

Note that M indicates the hole regions with value one, and elsewhere with zero.

  2. Propose multiple global homographies using the multi-homography proposal module, then locally adjust color and spatial misalignments in each proposal using our Color-Spatial Transformer (CST).

  3. Then we merge each proposal with the output $I_g$ from a single-image inpainting model using Single-Proposal Fusion (SPF), and finally selectively blend all the proposals.

3.1 multi-homography proposals


  • Compute the monocular depth $D_t$ of the non-hole region $I_t^M$, and cluster the matched feature points into N sub-groups using the depth values.

  • Each estimated homography $H_i$ aligns a different region within the hole.

  • SIFT: extract features

  • OANet: outlier rejection

  • Estimate the depth map $D_t$ from $I_t^M$ using a deep-learning-based monocular depth estimator.

  • We then cluster those points into a partition of N subsets $\{P_t^j\}$ by their depth.

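  • RANSAC: compute a homography matrix for each subset and for the full set, giving N+1 homography matrices.

    Then warp the source image with each of them to obtain a set of transformed source images.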
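
The clustering and per-cluster fitting steps above can be sketched in plain Python. This is a minimal illustration under several assumptions, not the paper's implementation: the matches (SIFT + OANet) and their depths are taken as given, equal-count depth bins stand in for whatever clustering the authors use, a plain least-squares fit replaces RANSAC, and coordinates are assumed to be normalized to roughly [0, 1] so the normal equations stay well conditioned. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def cluster_by_depth(depths: Sequence[float], n: int) -> List[List[int]]:
    """Split match indices into n equal-count bins of increasing depth (assumed stand-in)."""
    order = sorted(range(len(depths)), key=lambda i: depths[i])
    size = len(order)
    return [order[k * size // n:(k + 1) * size // n] for k in range(n)]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Least-squares homography with h33 fixed to 1 (needs >= 4 non-degenerate matches)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    # Normal equations: (A^T A) h = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Map a point through H using homogeneous coordinates."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def propose_homographies(src: Sequence[Point], dst: Sequence[Point],
                         depths: Sequence[float], n: int) -> List[Matrix]:
    """Return N per-cluster homographies followed by one fitted on all matches (N+1 total)."""
    groups = cluster_by_depth(depths, n) + [list(range(len(depths)))]
    return [fit_homography([src[i] for i in g], [dst[i] for i in g]) for g in groups]
```

Each returned matrix would then be used to warp the source image into one proposal. In practice a robust estimator such as RANSAC is needed, because real matches contain outliers that a plain least-squares fit cannot tolerate.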

3.2 Color-Spatial Transformation Module


  • We propose to learn the transformations at a lower resolution, and obtain the full-resolution coefficients by up-sampling.

  • Color Transformation

    • Learn an affine transformation from $I_s^i$ to $I_{sc}^i$

    $$A_{c}^{i}=\left[\begin{array}{ll} K_{c}^{i} & b_{c}^{i} \end{array}\right] \in \mathbb{R}^{W \times H \times 3 \times 4}$$

    Formally, for each pixel at location p,

    $$I_{sc}^{i}(p)=K_{c}^{i}(p)\, I_{s}^{i}(p)+b_{c}^{i}(p)$$

    • deep bilateral filtering
  • Spatial Transformation (ST)
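
To make the per-pixel color transformation concrete, the sketch below applies $I_{sc}(p) = K(p) I_s(p) + b(p)$ with coefficients that are first defined on a coarse grid and then up-sampled to full resolution. It is only an illustration under assumptions: the coefficients are hand-set rather than predicted by a network, and plain bilinear interpolation stands in for the deep bilateral filtering mentioned in the notes. The function names and the row-major 3×4 layout `[K | b]` are hypothetical choices.

```python
from typing import List

Coeffs = List[List[List[float]]]  # coeffs[y][x] holds 12 floats: row-major 3x4 [K | b]
Image = List[List[List[float]]]   # image[y][x] holds [r, g, b]


def upsample_bilinear(grid: Coeffs, out_h: int, out_w: int) -> Coeffs:
    """Bilinearly up-sample a coarse coefficient grid (stand-in for bilateral slicing)."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        fy = y * (in_h - 1) / max(out_h - 1, 1)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = x * (in_w - 1) / max(out_w - 1, 1)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            row.append([
                (1 - wy) * ((1 - wx) * a + wx * b) + wy * ((1 - wx) * c + wx * d)
                for a, b, c, d in zip(grid[y0][x0], grid[y0][x1],
                                      grid[y1][x0], grid[y1][x1])
            ])
        out.append(row)
    return out


def apply_color_affine(image: Image, coeffs: Coeffs) -> Image:
    """Apply I_sc(p) = K(p) I_s(p) + b(p) independently at every pixel p."""
    out = []
    for img_row, coef_row in zip(image, coeffs):
        out_row = []
        for rgb, a in zip(img_row, coef_row):
            out_row.append([
                a[4 * c] * rgb[0] + a[4 * c + 1] * rgb[1]
                + a[4 * c + 2] * rgb[2] + a[4 * c + 3]
                for c in range(3)
            ])
        out.append(out_row)
    return out


# Usage: a 1x2 coarse grid (identity on the left, +0.2 brightness on the right)
# is up-sampled to 1x3 and applied to a uniform gray image.
identity = [1, 0, 0, 0.0, 0, 1, 0, 0.0, 0, 0, 1, 0.0]
brighter = [1, 0, 0, 0.2, 0, 1, 0, 0.2, 0, 0, 1, 0.2]
full_res = upsample_bilinear([[identity, brighter]], 1, 3)
corrected = apply_color_affine([[[0.5, 0.5, 0.5]] * 3], full_res)
```

Predicting the coefficients at low resolution keeps the learned part cheap, while the up-sampling step lets the correction vary smoothly across the full-resolution image.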