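Harris Corners
Types of image features
A and B lie in flat regions with no distinctive features, so there are many possible locations for them;
C and D are somewhat easier: they are edges of the building, so an approximate location can be found, but pinpointing the exact position is still hard. Edges are therefore better features, but still not good enough.
E and F are corners of the building and are easy to locate, because a patch taken at a building corner looks different no matter which direction it is moved in.
The blue rectangle marks a flat region: moving it in any direction leaves the pixel values inside the window unchanged;
The black rectangle marks an edge feature (Edges): moving it perpendicular to the edge (along the gradient direction) changes the pixel values, while moving it along the edge (parallel to the edge) does not;
The red rectangle is a corner (Corners): no matter which direction it is moved in, the pixel values change significantly.
Image features carry rich information about an image. Corners are among the better features in an image and are better suited to localization than edges.
Among all regions of an image, those where a small shift in any direction produces a large change in pixel values are where corner features lie.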
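The flat / edge / corner distinction above can be checked numerically. The following is a minimal sketch, not the Harris detector itself: it shifts a small window by one pixel in four directions and sums the squared pixel differences. The 10x10 test image, the window positions, and the helper names `window_change` and `change_range` are assumptions made for illustration.

```python
def window_change(img, top, left, size, dy, dx):
    """Sum of squared differences between a window and its copy shifted by (dy, dx)."""
    total = 0
    for r in range(top, top + size):
        for c in range(left, left + size):
            diff = img[r + dy][c + dx] - img[r][c]
            total += diff * diff
    return total


def change_range(img, top, left, size=3):
    """Smallest and largest change over one-pixel shifts up, down, left and right."""
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    changes = [window_change(img, top, left, size, dy, dx) for dy, dx in shifts]
    return min(changes), max(changes)


# Hypothetical 10x10 image: a bright square fills rows 4..9 and columns 4..9.
img = [[255 if (r >= 4 and c >= 4) else 0 for c in range(10)] for r in range(10)]

flat = change_range(img, 1, 1)    # window entirely inside the dark area
edge = change_range(img, 6, 3)    # window across the square's vertical left edge
corner = change_range(img, 3, 3)  # window over the square's top-left corner

print("flat  ", flat)    # no change in any direction
print("edge  ", edge)    # no change along the edge, large change across it
print("corner", corner)  # large change in every direction
```

Only the corner window has a non-zero minimum, which is the property the text uses to single out corners.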
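Harris corner detector
A corner is the intersection of two edges and marks the place where the direction of the two edges changes, so a small shift of a corner in any direction causes the region's gradient ...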
EnlightenGAN
Method
As shown in Fig. 2, our proposed method adopts an attention-guided U-Net as the generator and uses the dual-discriminator to direct the global and local information.
We also use a self feature preserving loss to guide the training process and maintain the textures and structures.
The U-Net generator is implemented with 8 convolutional blocks.
Each block consists of two $3\times 3$ convolutional layers, followed by LeakyReLU and a batch normalization layer.
At the upsampling stage, replac ...
RANSAC
1. Methods for solving the fundamental matrix
Direct linear transform (DLT)
For a pair of matched points $x_1=[u_1, v_1, 1]^T$ and $x_2=[u_2, v_2, 1]^T$, the epipolar constraint $x_2^TFx_1=0$ gives
$$
\left(\begin{array}{lll}
u_{2} & v_{2} & 1
\end{array}\right)\left[\begin{array}{lll}
F_{11} & F_{12} & F_{13} \\
F_{21} & F_{22} & F_{23} \\
F_{31} & F_{32} & F_{33}
\end{array}\right]\left(\begin{array}{c}
u_{1} \\
v_{1} \\
1
\end{array}\right)=0
$$
...
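Expanding the epipolar constraint $x_2^TFx_1=0$ for one match gives a single linear equation in the nine entries of $F$. The sketch below builds that equation's coefficient row and checks it against the direct product. The row-major ordering of $F$, the helper names `dlt_row` and `epipolar_residual`, and the example matrix (a skew-symmetric matrix corresponding to a pure horizontal translation) are assumptions chosen for illustration; in the usual approach, rows from at least eight matches are stacked and the system is solved in the least-squares sense.

```python
def dlt_row(x1, x2):
    """Coefficient row a such that a . f = x2^T F x1, with f = F flattened row by row."""
    u1, v1 = x1
    u2, v2 = x2
    return [u2 * u1, u2 * v1, u2,
            v2 * u1, v2 * v1, v2,
            u1, v1, 1.0]


def epipolar_residual(F, x1, x2):
    """Direct evaluation of x2^T F x1 with homogeneous coordinates."""
    p1 = [x1[0], x1[1], 1.0]
    p2 = [x2[0], x2[1], 1.0]
    return sum(p2[i] * F[i][j] * p1[j] for i in range(3) for j in range(3))


# Hypothetical F: points keep their row (v1 == v2) when moving between the views.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
f = [entry for row in F for entry in row]

x1, x2 = (10.0, 5.0), (7.0, 5.0)  # an assumed match lying on the same image row
row = dlt_row(x1, x2)
row_dot_f = sum(a * b for a, b in zip(row, f))

print(row_dot_f, epipolar_residual(F, x1, x2))  # both evaluate the same constraint
```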
Bilateral_Grid
Bilateral filter
A bilateral filter is a non-linear, edge-preserving, and noise-reducing smoothing filter for images. It replaces the intensity of each pixel with a weighted average of the intensities of nearby pixels.
The weights can be based on a Gaussian distribution. Crucially, they depend not only on the Euclidean distance between pixels, but also on the radiometric differences.
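$$
\begin{aligned}
bf(I)_{\mathbf{p}} &= \frac{1}{W_{\mathbf{p}}} \sum_{\mathbf{q} \in N(\mathbf{p})} G_{\sigma_{\mathrm{s}}}(\|\mathbf{p}-\mathbf{q}\|)\, G_{\sigma_{\mathrm{r}}}(|I_{\mathbf{p}}-I_{\mathbf{q}}|)\, I_{\mathbf{q}} \\
W_{\mathbf{p}} &= \sum_{\mathbf{q} \in N(\mathbf{p})} G_{\sigma_{\mathrm{s}}}(\|\mathbf{p}-\mathbf{q}\|)\, G_{\sigma_{\mathrm{r}}}(|I_{\mathbf{p}}-I_{\mathbf{q}}|)
\end{aligned}
$$
...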
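To make the roles of the spatial and range weights concrete, here is a minimal sketch of the bilateral filter on a 1D signal. The 1D simplification, the unnormalized Gaussian, the window `radius`, and the example signal are assumptions for illustration; the neighborhood $N(\mathbf{p})$ becomes the indices within `radius` of `p`.

```python
import math


def gaussian(x, sigma):
    """Unnormalized Gaussian; the normalization cancels in the ratio below."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))


def bilateral_1d(signal, radius, sigma_s, sigma_r):
    """Bilateral filter on a list of intensities."""
    n = len(signal)
    out = []
    for p in range(n):
        weighted_sum = 0.0
        w_p = 0.0
        for q in range(max(0, p - radius), min(n, p + radius + 1)):
            # Spatial weight from |p - q|, range weight from |I_p - I_q|.
            w = gaussian(p - q, sigma_s) * gaussian(signal[p] - signal[q], sigma_r)
            weighted_sum += w * signal[q]
            w_p += w
        out.append(weighted_sum / w_p)
    return out


# Hypothetical noisy step edge between index 4 and index 5.
noisy_step = [10, 12, 9, 11, 10, 100, 102, 99, 101, 100]
smoothed = bilateral_1d(noisy_step, radius=2, sigma_s=2.0, sigma_r=10.0)
print([round(v, 1) for v in smoothed])
```

Because the range weight is nearly zero across the step, samples on either side are averaged only with samples from their own side, so the noise is reduced while the edge stays sharp.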
event-representation
One may roughly categorize event representations into 4 modalities.
spike processing such as SNN
natively support sparse asynchronous data
difficult to train
require specialized hardware
analytical event representations
task-specific: do not generalize to a wide range of applications
intermediary representation
to be paired with machine learning methods in synchronous form.
events are transformed into a proxy 2D image-like or 3D video-frame-like representation ("proxy frames")
intensity image re ...
COLMAP
Quickstart
COLMAP provides an automatic reconstruction tool that simply takes a folder of input images and produces a sparse and dense reconstruction in a workspace folder.
Reconstruction > Automatic Reconstruction
If your images are located in path/to/project/images, you could select path/to/project as the workspace folder; after running the automatic reconstruction tool, the folder would look similar to this:
Structure-from-Motion
Structure-from-Motion (SfM) is the process of ...
calibration
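Monocular calibration
Camera coordinate transformations
World coordinate system: also called the measurement coordinate system, a 3D Cartesian coordinate system that serves as the reference for describing the spatial positions of the camera and the measured object.
Camera coordinate system: the origin is the camera's optical center, the X and Y axes are parallel to the X and Y axes of the image coordinate system, and the Z axis is the camera's optical axis.
Image coordinate system: the origin is the center of the CCD image plane, and the X and Y axes are parallel to the two perpendicular sides of the image plane. Unit: millimeters.
Pixel coordinate system: the origin is the top-left corner of the image plane, and the X and Y axes are parallel to the X and Y axes of the image (physical) coordinate system. Unit: pixels.
Pinhole camera model
The process by which a camera maps points in the 3D world onto the 2D image plane can be described by a geometric model. There are many such models; the simplest is called the pinhole model.
That is, the following relation holds: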
$$\frac{Z_c}{f} = \frac{X_c}{X'} = \frac{Y_c}{Y'}$$
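Solving the similar-triangle relation for the image-plane coordinates gives $X' = f X_c / Z_c$ and $Y' = f Y_c / Z_c$. A minimal sketch of that projection follows; the function name `project_pinhole` and the example values are assumptions, and units follow whatever unit $f$ is given in.

```python
def project_pinhole(point_cam, f):
    """Project a point (Xc, Yc, Zc) in camera coordinates onto the image plane.

    Uses Zc / f = Xc / X' = Yc / Y', i.e. X' = f * Xc / Zc and Y' = f * Yc / Zc.
    """
    Xc, Yc, Zc = point_cam
    if Zc <= 0:
        raise ValueError("the point must lie in front of the camera (Zc > 0)")
    return (f * Xc / Zc, f * Yc / Zc)


# Hypothetical point 4 units in front of a camera with focal length 2.
print(project_pinhole((2.0, 1.0, 4.0), f=2.0))  # (1.0, 0.5)
```

Doubling `Zc` halves both image coordinates, which is the "farther objects appear smaller" behavior of the model.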
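World $\rightarrow$ camera
This can be understood as the camera sitting at a position different from the world origin, with an angular offset as well (pitch, yaw, roll)
Camera $\rightarrow$ image
Image $\rightarrow$ ...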
DiffPool
Hierarchical Graph Representation Learning
Abstract
However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs.
DIFFPOOL is a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various GNN architectures.
The input nodes at the layer $l$ GNN module correspond to the clusters learned at the layer $l - 1$ GNN module.
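One way to picture "clusters at one layer become nodes at the next" is the coarsening step with a cluster assignment matrix $S$: cluster features are $S^T Z$ and the coarsened adjacency is $S^T A S$. The sketch below is illustrative only. It uses a hard 0/1 assignment on a 4-node path graph, whereas in DiffPool $S$ is a soft assignment and both $S$ and the node embeddings $Z$ are produced by GNNs; all names and values here are assumptions.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def coarsen(A, Z, S):
    """Pool a graph: cluster features S^T Z and cluster adjacency S^T A S."""
    St = transpose(S)
    X_next = matmul(St, Z)
    A_next = matmul(matmul(St, A), S)
    return X_next, A_next


# Hypothetical path graph 0-1-2-3 with one feature per node.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
Z = [[1.0], [2.0], [3.0], [4.0]]
# Assumed assignment: nodes 0 and 1 go to cluster 0, nodes 2 and 3 to cluster 1.
S = [[1, 0], [1, 0], [0, 1], [0, 1]]

X_next, A_next = coarsen(A, Z, S)
print(X_next)  # one feature row per cluster
print(A_next)  # 2x2 adjacency between clusters
```

The two output rows are the nodes that the next GNN layer would operate on.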
1. Introduction
This lack of hierarchical structure is especia ...
OANet
Learning Two-View Correspondences
1. Introduction
Until recently, most geometric matching pipelines focused on learning local feature detectors and descriptors.
Previous works exploited a PointNet-like architecture and Context Normalization (PointCN)
CONS
apply MLP on each point individually and cannot capture the local context.
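Neighboring pixels undergo similar motion $\rightarrow$ helpful for outlier rejection
Context Normalization encodes global information but ignores the characteristics that distinguish different points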
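For reference, Context Normalization can be sketched as normalizing each feature channel by its mean and standard deviation taken over all correspondences. The plain-list layout (N points by C channels), the `eps` term, and the function name are assumptions for illustration rather than the authors' implementation.

```python
import math


def context_normalize(features, eps=1e-5):
    """Normalize each channel across all N points (rows) of an N x C feature list."""
    n = len(features)
    channels = len(features[0])
    out = [[0.0] * channels for _ in range(n)]
    for c in range(channels):
        column = [row[c] for row in features]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        std = math.sqrt(var + eps)
        for i in range(n):
            # Every point is shifted and scaled by the same global statistics.
            out[i][c] = (column[i] - mean) / std
    return out


# Hypothetical features for three correspondences with two channels each.
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = context_normalize(features)
print(normalized)
```

Each point receives the same shift and scale, which is how global context enters and also why per-point differences are not modeled by this step.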
One of the challenges in mitigating the limitations above:
sparse matches ...
Graph-based
Graph-based Asynchronous Event Processing
1. Introduction
Since the output of an event camera is a sparse, asynchronous event stream, most works transform the event stream into:
regular 2D event frames
3D voxel grids
This loses the sparsity of the events and quantizes their timestamps
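The loss of sparsity and the timestamp quantization can be seen in a minimal voxel-grid sketch. The event tuple layout `(x, y, t, polarity)`, the equal-width time bins, and the function name are assumptions for illustration, not a specific published representation.

```python
def events_to_voxel_grid(events, width, height, num_bins):
    """Accumulate event polarities into a dense grid indexed as [bin][y][x]."""
    t_start = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    span = (t_end - t_start) or 1.0  # avoid dividing by zero for a single timestamp
    grid = [[[0 for _ in range(width)] for _ in range(height)] for _ in range(num_bins)]
    for x, y, t, polarity in events:
        # Quantization: the exact timestamp is replaced by a bin index.
        b = min(int((t - t_start) / span * num_bins), num_bins - 1)
        grid[b][y][x] += polarity
    return grid


# Hypothetical events: two at the same pixel 1 ms apart, one later elsewhere.
events = [(0, 0, 0.000, 1), (0, 0, 0.001, 1), (1, 1, 0.010, -1)]
grid = events_to_voxel_grid(events, width=2, height=2, num_bins=2)

num_cells = sum(len(row) for plane in grid for row in plane)
print(grid, num_cells)
```

The first two events end up in the same cell, so their 1 ms difference is no longer recoverable, and three events now occupy a dense grid of eight cells.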
Event-by-event processing:
SNN
Time-surface-based methods
Sensitive to hyperparameter tuning and difficult to train
Current event-based GNNs still process events in batches, at the cost of discarding the low-latency nature of event data.
Contributions
graph-based recursive algorithm
a novel incremental graph convolution
an event-specific ...
SIFT
SIFT: Scale-Invariant Feature Transform
https://www.cnblogs.com/wangguchangqing/p/4853263.html
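How can we tell whether two images contain the same information?
The figure below shows a point in the first image mapped to the semantically corresponding point in the second image
1. Building the difference-of-Gaussian pyramid
Left figure
Images of the same size form one octave, and each octave contains several layers
The images in the first octave are obtained by convolving with Gaussian kernels of different scales ($\sigma$)
This simulates objects looking larger when near and smaller when far; the role of the Gaussian kernel: sharp when near, blurred when far
The second octave is obtained by downsampling the first octave
The remaining octaves follow in the same way
Right figure
Subtracting adjacent layers within the same octave gives the Difference of Gaussian (DoG), i.e. the difference-of-Gaussian pyramid
Values recommended in the paper: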
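$$O = [\log_2(\min(M, N))] - 3$$
O is the number of octaves
M and N are the width and height of the original image
$$S = n + 3$$
Each octave has S layers
n is the number of images from which features are to be extracted
For example, differencing five images gives four; then, because extrema are sought in scale space, derivatives are needed, and the topmost and bottommost images ...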
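The two recommended values can be computed directly. The sketch below assumes that the square brackets in the formula for O mean rounding down, and it reports the number of DoG layers per octave as S - 1, since each DoG layer is the difference of two adjacent Gaussian layers. The function name and the example image size are hypothetical.

```python
import math


def sift_pyramid_shape(width, height, n):
    """Return (octaves O, Gaussian layers per octave S, DoG layers per octave)."""
    octaves = int(math.floor(math.log2(min(width, height)))) - 3  # assumes [.] is floor
    layers = n + 3
    dog_layers = layers - 1  # adjacent Gaussian layers are subtracted pairwise
    return octaves, layers, dog_layers


# Hypothetical 640x480 image with n = 3.
print(sift_pyramid_shape(640, 480, 3))  # (5, 6, 5)
```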
TransFill
1. Introduction
Much research has been devoted to improving image inpainting, either by image self-similarity or by deep generative models.
These methods obtain semantic information from the non-hole regions or learn it from large collections of images.
They fail when holes are large, or when the expected contents inside the hole regions have complicated semantics, depth, or texture.
These problems can be addressed if there happens to be a second reference image of the same scene that exposes some desired image content.
This is referred to as reference-guided image inpainting.
target imag ...