Tokiwa-17

Created 2022-02-27 | cv

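Image feature types. A and B lie in flat regions with no distinctive features, so there are many possible locations for them. C and D are somewhat easier: they are edges of the building, so we can find an approximate location, but pinpointing the exact position is still hard. Edges are therefore better features, but still not good enough. E and F are corners of the building, and their locations are easy to find, because a patch around a building corner looks different no matter which direction we move it. The blue rectangle marks a flat region: moving the window in any direction leaves the pixel values inside it unchanged. The black rectangle marks an edge feature (Edges): moving it perpendicular to the edge (the gradient direction) changes the pixel values, while moving it along the edge (parallel to the edge) does not. The red rectangle is a corner (Corners): moving it in any direction changes the pixel values substantially. Image features carry rich information about an image. Corner features are among the better features in an image and are more useful than edge features for localization. Among all regions of an image, those where a small shift in any direction produces a large change in pixel values are where corner features lie. Harris corner detector. A corner is the intersection of two edges and marks where the direction of the two edges changes, so a small shift of a corner in any direction causes the gradient of that region ...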
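
The shift-and-compare intuition in this excerpt can be sketched in a few lines of Python. This is only an illustration of the idea (it is closer to Moravec's shift test than to the full Harris response), and the synthetic images, window position, and function names are assumptions made for the example.

```python
def window_change(img, top, left, size, dy, dx):
    """Sum of squared differences between a window and the same window shifted by (dy, dx)."""
    total = 0
    for r in range(top, top + size):
        for c in range(left, left + size):
            diff = img[r + dy][c + dx] - img[r][c]
            total += diff * diff
    return total


def min_change_over_shifts(img, top, left, size):
    """Smallest change over the four one-pixel shifts; large only if every shift changes the window."""
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return min(window_change(img, top, left, size, dy, dx) for dy, dx in shifts)


# Hypothetical 9x9 test images: a flat region, a vertical edge, and a corner.
N = 9
flat = [[0] * N for _ in range(N)]
edge = [[1 if c >= 4 else 0 for c in range(N)] for _ in range(N)]
corner = [[1 if (r >= 4 and c >= 4) else 0 for c in range(N)] for r in range(N)]

# A 3x3 window centred on pixel (4, 4) in each image.
flat_score = min_change_over_shifts(flat, 3, 3, 3)
edge_score = min_change_over_shifts(edge, 3, 3, 3)
corner_score = min_change_over_shifts(corner, 3, 3, 3)
edge_across = window_change(edge, 3, 3, 3, 0, 1)  # shift across the edge
```

Only the corner window changes under every shift; the edge window changes when shifted across the edge but not along it.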

EnlightenGAN

Created 2022-02-23 | image-enhancement

Method As shown in Fig. 2, our proposed method adopts an attention-guided U-Net as the generator and uses a dual discriminator to direct the global and local information. We also use a self feature preserving loss to guide the training process and maintain the textures and structures. The U-Net generator is implemented with 8 convolutional blocks. Each block consists of two $3\times 3$ convolutional layers, followed by LeakyReLU and a batch normalization layer. At the upsampling stage, replac ...

RANSAC

Created 2022-02-21 | 3d vision

1. Methods for solving the fundamental matrix. Direct linear transform: for a pair of matched points $x_1=[u_1, v_1, 1]^T$, $x_2=[u_2, v_2, 1]^T$, the epipolar constraint $x_2^TFx_1=0$ gives $$\left(\begin{array}{lll} u_{2} & v_{2} & 1 \end{array}\right)\left[\begin{array}{lll} F_{11} & F_{12} & F_{13} \\ F_{21} & F_{22} & F_{23} \\ F_{31} & F_{32} & F_{33} \end{array}\right]\left(\begin{array}{c} u_{1} \\ v_{1} \\ 1 \end{array}\right)=0$$ ...
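
As a rough sketch of how one match contributes to the direct linear transform, the constraint $x_2^TFx_1=0$ can be written as a dot product between a 9-element coefficient row and the row-major entries of $F$. The function names and the example matrix below are assumptions made for illustration: the matrix is the skew-symmetric matrix of a pure translation along x with identity intrinsics, so matched points share the same $v$ coordinate.

```python
def epipolar_residual(F, x1, x2):
    """Evaluate x2^T F x1 for a 3x3 matrix F (list of rows) and homogeneous points."""
    Fx1 = [sum(F[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Fx1[i] for i in range(3))


def dlt_row(x1, x2):
    """Coefficients multiplying (F11, F12, ..., F33) in the constraint x2^T F x1 = 0."""
    return [x2[i] * x1[j] for i in range(3) for j in range(3)]


# Hypothetical example: F = [t]_x for t = (1, 0, 0), identity rotation and intrinsics.
F_translation = [[0.0, 0.0, 0.0],
                 [0.0, 0.0, -1.0],
                 [0.0, 1.0, 0.0]]
x1 = [2.0, 3.0, 1.0]   # point in image 1
x2 = [5.0, 3.0, 1.0]   # same row, shifted along x in image 2

f_flat = [F_translation[i][j] for i in range(3) for j in range(3)]
residual = epipolar_residual(F_translation, x1, x2)
residual_from_row = sum(a * b for a, b in zip(dlt_row(x1, x2), f_flat))
```

Stacking one such row per match yields a homogeneous linear system in the nine entries of $F$.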

Bilateral_Grid

Created 2022-02-21 | cv

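Bilateral filter A bilateral filter is a non-linear, edge-preserving, and noise-reducing smoothing filter for images. It replaces the intensity of each pixel with a weighted average of the intensities of nearby pixels. This weight can be based on a Gaussian distribution. Crucially, the weights depend not only on the Euclidean distance between pixels, but also on their radiometric differences. $$\begin{aligned} bf(I)_{\mathbf{p}} &= \frac{1}{W_{\mathbf{p}}} \sum_{\mathbf{q} \in N(\mathbf{p})} G_{\sigma_{\mathrm{s}}}(\|\mathbf{p}-\mathbf{q}\|)\, G_{\sigma_{\mathrm{r}}}(|I_{\mathbf{p}}-I_{\mathbf{q}}|)\, I_{\mathbf{q}} \\ W_{\mathbf{p}} &= \sum_{\mathbf{q} \in N(\mathbf{p})} G_{\sigma_{\mathrm{s}}}(\|\mathbf{p}-\mathbf{q}\|)\, G_{\sigma_{\mathrm{r}}}(|I_{\mathbf{p}}-I_{\mathbf{q}}|) \end{aligned}$$ ...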
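
The weighted average in the formula above can be sketched as a brute-force loop in plain Python. This is a minimal, unoptimized illustration, assuming a square neighbourhood of a given radius for $N(\mathbf{p})$ and unnormalized Gaussians (the normalization cancels in the ratio); the function and parameter names are made up for the example.

```python
import math


def gaussian(x, sigma):
    """Unnormalized Gaussian; the constant factor cancels after dividing by W_p."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))


def bilateral_filter(img, radius, sigma_s, sigma_r):
    """Brute-force bilateral filter on a 2D list of intensities."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for py in range(h):
        for px in range(w):
            acc, norm = 0.0, 0.0
            # N(p): square window around p, clipped at the image border.
            for qy in range(max(0, py - radius), min(h, py + radius + 1)):
                for qx in range(max(0, px - radius), min(w, px + radius + 1)):
                    spatial = gaussian(math.hypot(py - qy, px - qx), sigma_s)
                    rng = gaussian(abs(img[py][px] - img[qy][qx]), sigma_r)
                    weight = spatial * rng
                    acc += weight * img[qy][qx]
                    norm += weight
            out[py][px] = acc / norm  # norm >= 1 because q = p has weight 1
    return out


# Hypothetical step image: left half 0, right half 100.
step = [[0.0, 0.0, 0.0, 100.0, 100.0, 100.0] for _ in range(5)]
smoothed_step = bilateral_filter(step, radius=2, sigma_s=2.0, sigma_r=5.0)
constant = [[7.0] * 4 for _ in range(4)]
smoothed_constant = bilateral_filter(constant, radius=1, sigma_s=1.0, sigma_r=1.0)
```

With a small `sigma_r`, pixels on the other side of the step get a negligible range weight, so the edge survives the smoothing.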

event-representation

Created 2022-02-20 | Event Representation

One may roughly categorize event representations into 4 modalities. Spike processing, such as SNNs: natively supports sparse asynchronous data, but is difficult to train and requires specialized hardware. Analytical event representations: task-specific, so they do not generalize to a wide range of applications. Intermediary representations to be paired with machine learning methods in synchronous form: events are transformed into a proxy 2D image-like or 3D video frame-like representation ("proxy frames"). intensity image re ...

COLMAP

Created 2022-02-02 | Structure from Motion

COLMAP Quickstart COLMAP provides an automatic reconstruction tool that simply takes a folder of input images and produces a sparse and dense reconstruction in a workspace folder: Reconstruction > Automatic Reconstruction. If your images are located in path/to/project/images, you could select path/to/project as the workspace folder, and after running the automatic reconstruction tool, the folder would look similar to this: Structure-from-Motion Structure-from-Motion (SfM) is the process of ...

calibration

Created 2022-02-02 | 3D-vision

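Monocular calibration. Camera coordinate transformations. World coordinate system: also called the measurement coordinate system, a 3D Cartesian coordinate system that serves as the reference for describing the spatial positions of the camera and the object being measured. Camera coordinate system: the origin is at the camera's optical center, the X and Y axes are parallel to the X and Y axes of the image coordinate system, and the Z axis is the camera's optical axis. Image coordinate system: the origin is at the center of the CCD image plane, and the X and Y axes are parallel to the two perpendicular edges of the image plane. Unit: millimeters. Pixel coordinate system: the origin is at the top-left corner of the image plane, and the X and Y axes are parallel to the X and Y axes of the image physical coordinate system. Unit: pixels. Pinhole camera model. The process by which a camera maps coordinate points in the 3D world onto the 2D image plane can be described by a geometric model. There are many such models, and the simplest is called the pinhole model. It gives the following relation: $$\frac{Z_c}{f} = \frac{X_c}{X'} = \frac{Y_c}{Y'}$$ World $\rightarrow$ camera: this can be understood as the camera being placed at a position different from the world origin, and also having angular offsets (pitch, yaw, roll). Camera $\rightarrow$ image. Image $\rightarrow$ ...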
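
The pinhole relation above can be rearranged to $X' = fX_c/Z_c$ and $Y' = fY_c/Z_c$. A minimal sketch of that projection follows, assuming the sign convention of the relation as written (no image flip) and a point in front of the camera; the function name is made up for the example.

```python
def project_pinhole(point_cam, f):
    """Project a camera-frame point (Xc, Yc, Zc) onto the image plane at focal length f."""
    Xc, Yc, Zc = point_cam
    if Zc <= 0:
        raise ValueError("point must lie in front of the camera (Zc > 0)")
    # From Zc / f = Xc / X' = Yc / Y'
    return (f * Xc / Zc, f * Yc / Zc)


# Hypothetical example: a point 10 units in front of a camera with f = 5.
image_point = project_pinhole((2.0, 4.0, 10.0), 5.0)
```

The result is in the same physical units as `f` (for example millimeters); converting to pixel coordinates is a separate step.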

DiffPool

Created 2022-01-31 | GNN

Hierarchical Graph Representation Learning Abstract However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs. DIFFPOOL is a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various GNN architectures. The input nodes of the GNN module at layer $l$ correspond to the clusters learned by the GNN module at layer $l-1$. 1. Introduction This lack of hierarchical structure is especia ...

OANet

Created 2022-01-30 | Structure from Motion

Learning Two-View Correspondences 1. Introduction Until recently, most geometric matching pipelines focused on learning local feature detectors and descriptors. Previous works exploited a PointNet-like architecture and Context Normalization (PointCN). Cons: they apply an MLP to each point individually and cannot capture local context. Neighboring pixels also have similar motion $\rightarrow$ this helps outlier rejection. Context Normalization encodes global information but ignores the differences between individual points. One of the challenges in mitigating the limitations above: sparse matches ...

Graph-based

Created 2022-01-27 | Event Representation

Graph-based Asynchronous Event Processing 1. Introduction Since the output of an event camera is a sparse asynchronous event stream, most works transform the event stream into regular 2D event frames or 3D voxel grids, which loses the sparsity of the events and quantizes their timestamps. Event-by-event processing: SNNs and time-surface-based methods, which are sensitive to parameter tuning and hard to train. Current event-based GNNs still process events in batches, at the cost of discarding the low-latency nature of event data. Contributions: a graph-based recursive algorithm, a novel incremental graph convolution, an event-specific ...

SIFT

Created 2022-01-26

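SIFT: Scale-Invariant Feature Transform https://www.cnblogs.com/wangguchangqing/p/4853263.html How do we know whether two images contain the same information? The figure below shows a point in the first image mapped to the point with the same semantics in the second image. 1. Building the difference-of-Gaussians pyramid. Left figure: images of the same size form one group (octave), and each group has many layers. The images in the first group are obtained by convolving with Gaussian kernels of different scales ($\sigma$). This simulates near objects appearing large and far objects small; the role of the Gaussian kernel is to make near objects sharp and far objects blurry. The second group is obtained by downsampling the first group, and so on for the remaining groups. Right figure: subtracting two layers within the same group gives the Difference of Gaussian (DoG), which forms the DoG pyramid. Recommended values from the paper: $$O = [\log_2(\min(M, N))] - 3$$ where O is the number of groups and M, N are the width and height of the original image. $$S = n + 3$$ Each group has S layers, where n is the number of images from which we want to extract features. For example, five images yield 4 after differencing, and then, because extrema must be found in scale space, derivatives are needed, so the topmost and bottommost images ...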
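
The two recommended values can be computed directly. The sketch below assumes the square brackets in the formula for $O$ mean rounding down (floor), which the excerpt does not state; the function names are made up for the example.

```python
import math


def num_octaves(width, height):
    """O = [log2(min(M, N))] - 3, with [.] taken as floor (an assumption)."""
    return int(math.floor(math.log2(min(width, height)))) - 3


def layers_per_octave(n):
    """S = n + 3 Gaussian layers per group, for n layers used to extract features."""
    return n + 3


def dog_layers_per_octave(n):
    """Subtracting adjacent Gaussian layers leaves one fewer DoG layer."""
    return layers_per_octave(n) - 1


octaves_vga = num_octaves(640, 480)
gaussian_layers = layers_per_octave(2)
dog_layers = dog_layers_per_octave(2)
```

With `n = 2` this reproduces the example in the text: five Gaussian layers per group and four DoG layers after differencing.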

TransFill

Created 2022-01-26 | inpainting

1. Introduction Much research has been devoted to improving image inpainting, either through image self-similarity or deep generative models. These methods obtain semantic information from non-hole regions or learn from large collections of images. They fail in cases when holes are large, or the expected contents inside hole regions have complicated semantic depth, texture. These problems can be addressed if there happens to be a second reference image of the same scene that exposes some desired image content. This is referred to as reference-guided image inpainting. target imag ...