TransFill

1. Introduction

Much research has been devoted to improving image inpainting, either through image self-similarity or through deep generative models.

These methods draw semantic information from the non-hole regions or learn it from large image collections.

They fail when the holes are large, or when the expected content inside the hole regions has complicated semantics, depth, or texture.
These problems can be addressed if there happens to be a second reference image of the same scene that exposes some desired image content.

This setting is referred to as reference-guided image inpainting.
- target image: image with holes
- source image: used as references
Why does the reference-guided problem remain challenging?
- the hole regions could be very large
- the camera is uncalibrated and moves freely between the source and target images,
  
  which induces large parallax
- assumption: no more than two photos
- there may exist regions in the source image that do not exist in the target image
  
  because images collected from the web or by other means differ in exposure time and lighting conditions
multi-homography fusion pipeline
- Assumption: there may be multiple depth planes inside the hole.
Proposal

Given a target and a source image:
1. estimate the matched feature points between the 2 images
2. cluster the inliers according to their estimated depths in the target image
3. for each cluster estimate a single homography

system pipeline

Note that the mask $M$ is one inside the hole regions and zero elsewhere.
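
A minimal NumPy sketch of this mask convention. It assumes the masked target is formed as $I_t^M = I_t \odot (1 - M)$, which the notes do not spell out; the function and array names are illustrative only.

```python
import numpy as np

def mask_target(target, hole_mask):
    """Zero out hole pixels of an H x W x 3 image.

    hole_mask is H x W, with 1 inside the hole and 0 elsewhere.
    Assumption: the masked target is I_t * (1 - M).
    """
    keep = 1.0 - hole_mask.astype(np.float32)
    return target.astype(np.float32) * keep[..., None]

# Toy example: a 4 x 4 white image with a 2 x 2 hole in the middle.
target = np.ones((4, 4, 3), dtype=np.float32)
hole_mask = np.zeros((4, 4), dtype=np.float32)
hole_mask[1:3, 1:3] = 1.0
masked = mask_target(target, hole_mask)
```

Pixels inside the hole become zero while every other pixel keeps its original value.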

A mask is applied to the target image.
We propose multiple global homographies using the multi-homography proposal module, then locally adjust color and spatial misalignments in each proposal using our Color-Spatial Transformer (CST).
Then we merge each proposal with the output $I_g$ from a single-image inpainting model using Single-Proposal Fusion (SPF), and finally selectively blend all the proposals.
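
The notes do not give the fusion formulas, so the following is only a sketch under stated assumptions. Each proposal is merged with $I_g$ by a per-pixel convex combination driven by a confidence map. The fused proposals are then averaged with normalized weights and pasted into the hole. `confidence` and `weights` are hypothetical inputs standing in for whatever the fusion networks predict.

```python
import numpy as np

def fuse_single_proposal(proposal, inpainted, confidence):
    """Assumed SPF-style merge: c * proposal + (1 - c) * I_g, per pixel."""
    c = np.clip(confidence, 0.0, 1.0)[..., None]
    return c * proposal + (1.0 - c) * inpainted

def blend_proposals(fused_list, weights, target, hole_mask):
    """Assumed final blend: normalized weighted average, composited into the hole."""
    w = np.stack(weights, axis=0).astype(np.float32)          # N x H x W
    w = w / np.maximum(w.sum(axis=0, keepdims=True), 1e-8)    # weights sum to 1 per pixel
    merged = (w[..., None] * np.stack(fused_list, axis=0)).sum(axis=0)
    m = hole_mask[..., None]
    return m * merged + (1.0 - m) * target                    # keep known pixels

# Toy example with two constant proposals on a 2 x 2 image.
target = np.zeros((2, 2, 3), dtype=np.float32)
hole_mask = np.array([[1, 0], [0, 0]], dtype=np.float32)
inpainted = np.full((2, 2, 3), 0.2, dtype=np.float32)
proposals = [np.full((2, 2, 3), 1.0, dtype=np.float32),
             np.full((2, 2, 3), 0.6, dtype=np.float32)]
confidences = [np.full((2, 2), 0.5, dtype=np.float32)] * 2
fused = [fuse_single_proposal(p, inpainted, c) for p, c in zip(proposals, confidences)]
weights = [np.ones((2, 2), dtype=np.float32)] * 2
result = blend_proposals(fused, weights, target, hole_mask)
```

Only the hole pixel changes in `result`; pixels outside the hole are copied from the target unchanged.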

multi-homography

Compute the monocular depth $D_t$ of the non-hole region $I_t^M$, and cluster the feature matching points into N sub-groups using the depth values.
Each estimated homography $H_i$ will align a different region within the hole.
SIFT: extract features
OANet: outlier rejection
Estimate the depth map $D_t$ from $I_t^M$ using a deep-learning-based monocular depth estimator.
We then cluster those points into a partition of N subsets $\{P_t^j\}$ by their depth.
RANSAC is then used to compute a homography matrix for each subset and for the full set, giving N+1 homography matrices.

Warping the source image with each homography then yields a set of transformed source images.
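
The multi-homography steps can be sketched in NumPy as follows. This is a simplified illustration and does not reproduce the paper's implementation. It assumes the matched points and their target depths are already available (SIFT, OANet, and the depth network are not reproduced). Quantile binning stands in for the depth clustering, and a plain least-squares DLT fit stands in for RANSAC. In practice a robust estimator such as OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag would typically take its place. All function names are hypothetical.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares DLT fit of H with dst ~ H @ src (needs >= 4 matches, no outlier rejection)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    h = vt[-1].reshape(3, 3)          # null-space vector = homography up to scale
    return h / h[2, 2]

def cluster_by_depth(depths, n_clusters):
    """Assign each match to a depth bin; quantile binning is an assumed stand-in."""
    edges = np.quantile(depths, np.linspace(0, 1, n_clusters + 1)[1:-1])
    return np.searchsorted(edges, depths, side="right")

def propose_homographies(src_pts, tgt_pts, tgt_depths, n_clusters):
    """Return N per-cluster homographies plus one fitted on all matches (N + 1 total)."""
    labels = cluster_by_depth(tgt_depths, n_clusters)
    hs = [fit_homography(src_pts[labels == j], tgt_pts[labels == j])
          for j in range(n_clusters)]
    hs.append(fit_homography(src_pts, tgt_pts))
    return hs

def warp_points(h, pts):
    """Map 2D points through a homography."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

# Synthetic scene: 8 matches on a near plane, 8 on a far plane.
rng = np.random.default_rng(0)
src_pts = rng.uniform(0, 100, size=(16, 2))
depths = np.array([1.0] * 8 + [5.0] * 8)
h_near = np.array([[1.0, 0.02, 5.0], [0.01, 1.0, -3.0], [1e-4, 0.0, 1.0]])
h_far = np.array([[0.9, 0.0, -2.0], [0.0, 1.1, 4.0], [0.0, 1e-4, 1.0]])
tgt_pts = np.vstack([warp_points(h_near, src_pts[:8]), warp_points(h_far, src_pts[8:])])
proposals = propose_homographies(src_pts, tgt_pts, depths, n_clusters=2)
```

Here `proposals[0]` and `proposals[1]` each fit one depth plane, while the last entry is the single global fit over all matches. Producing the transformed source images would additionally require resampling the image under each homography, for example with `cv2.warpPerspective`.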

We propose to learn the transformations at a lower resolution and obtain the full-resolution coefficients by up-sampling.
Color Transformation
- learn an affine transformation from $I_s^i$ to $I_{sc}^i$
$A_{c}^{i}=\left[\begin{array}{ll} K_{c}^{i} & b_{c}^{i} \end{array}\right] \in \mathbb{R}^{W \times H \times 3 \times 4}$

Formally, for each pixel at location p,

$I_{s c}^{i}(p)=K_{c}^{i}(p) I_{s}^{i}(p)+b_{c}^{i}(p)$
- deep bilateral filtering
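
A minimal NumPy sketch of the per-pixel affine color transform above, with coefficients defined on a low-resolution grid and then up-sampled. The coefficients here are hand-set rather than learned, nearest-neighbour repetition is an assumed stand-in for the up-sampling step (deep bilateral filtering is not implemented), and arrays use NumPy's H x W ordering rather than the W x H order in the formula.

```python
import numpy as np

def upsample_nearest(coeffs, factor):
    """Nearest-neighbour up-sampling of an h x w x 3 x 4 coefficient grid (stand-in only)."""
    return coeffs.repeat(factor, axis=0).repeat(factor, axis=1)

def apply_color_affine(image, coeffs):
    """Compute I_sc(p) = K(p) I_s(p) + b(p), where coeffs[..., :3] = K and coeffs[..., 3] = b."""
    k, b = coeffs[..., :3], coeffs[..., 3]
    return np.einsum("hwij,hwj->hwi", k, image) + b

# Low-resolution 2 x 2 grid: gain of 2 on every channel plus a bias of 0.1.
low_res = np.zeros((2, 2, 3, 4), dtype=np.float32)
low_res[..., :3] = 2.0 * np.eye(3, dtype=np.float32)
low_res[..., 3] = 0.1
full_res = upsample_nearest(low_res, factor=2)          # 4 x 4 x 3 x 4

image = np.full((4, 4, 3), 0.25, dtype=np.float32)
corrected = apply_color_affine(image, full_res)
```

With these coefficients every pixel maps from 0.25 to 2 * 0.25 + 0.1 = 0.6, which makes the per-pixel matrix-vector form easy to check by hand.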
Spatial Transformation (ST)