The core topics of robot research are navigation and positioning, path planning, obstacle avoidance, and multi-sensor fusion. There are several kinds of positioning technology; here we set the others aside and focus only on vision. Vision technology gives the robot "eyes" and can be divided into monocular, binocular (stereo), multi-camera, and RGB-D; the latter three can recover depth from the image. These "eyes" also underpin VO (visual odometry, monocular or stereo). As Wikipedia introduces it: in robotics and computer vision, visual odometry is a method of determining the position and attitude of a robot by analyzing and processing the associated image sequences.
Nowadays, thanks to the rapid development of digital image processing and computer vision, more and more researchers use cameras as the perception sensors of fully autonomous mobile robots. This is mainly because the earlier ultrasonic and infrared sensors provide a limited amount of perceptual information and poor robustness, shortcomings that a vision system can compensate for. The real world is three-dimensional, while the image projected onto the camera sensor (CCD/CMOS) is two-dimensional; the ultimate goal of visual processing is to extract the relevant three-dimensional world information from the perceived two-dimensional images.
The basic composition of the system: a CCD camera, a PCI image-acquisition interface, a PC and its peripherals.
In a CCD, lines of silicon imaging elements — photosensitive cells together with a charge-transfer device — are arranged on a substrate; by transferring the charges sequentially, the video signals of many pixels are read out in time-divided order. The resolution of images collected by an area-array CCD sensor can range from 32×32 up to 1024×1024 pixels and beyond.
Video digital signal processor
An image signal is generally a two-dimensional signal. An image is usually composed of 512×512 pixels (sometimes 256×256 or 1024×1024). With 256 gray levels (8 bits) per pixel in grayscale, or 3×8 bits per pixel for the roughly 16M colors obtained from the red, green and blue channels, one image amounts to 256 KB of data (grayscale) or 768 KB (color). To carry out the sensing, preprocessing, segmentation, description, recognition and interpretation stages of visual processing, the main mathematical operations involved can be summarized as follows:
(1) Point processing is often used for contrast enhancement, density nonlinear correction, thresholding, pseudo-color processing, etc. The input value of each pixel is mapped to an output value for that same pixel through some fixed relationship. For example, a logarithmic transform expands the contrast in dark areas.
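A minimal sketch of such a point operation — the logarithmic transform mentioned above — using only NumPy on a synthetic 8-bit image (the image values here are illustrative):

```python
import numpy as np

def log_transform(img: np.ndarray) -> np.ndarray:
    """Map each pixel through s = c * log(1 + r), stretching dark regions."""
    c = 255.0 / np.log1p(float(img.max()))          # scale so output stays in [0, 255]
    out = c * np.log1p(img.astype(np.float64))
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

# Synthetic dark image: most values clustered near zero.
img = np.array([[0, 10, 20],
                [30, 40, 255]], dtype=np.uint8)
out = log_transform(img)
# Dark pixels are pushed apart far more than bright ones (e.g. 10 maps near 110).
```

Because each output pixel depends only on the corresponding input pixel, point operations like this parallelize trivially.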
(2) Two-dimensional convolution is often used in image smoothing, sharpening, contour enhancement, spatial filtering, standard template-matching calculations, etc. If an M×M convolution kernel is convolved with the entire image, M² multiplications and (M²−1) additions are required to obtain the output for each pixel; even small convolution kernels therefore require a large number of multiply-add operations and memory accesses.
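A direct NumPy implementation makes the M² multiply-adds per output pixel explicit; here a 3×3 box (smoothing) kernel is applied over the "valid" region of a small synthetic image:

```python
import numpy as np

def convolve2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Direct 2-D convolution over the 'valid' region: M*M multiplies per output pixel."""
    m = kernel.shape[0]
    h, w = img.shape[0] - m + 1, img.shape[1] - m + 1
    flipped = kernel[::-1, ::-1]                    # true convolution flips the kernel
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + m, j:j + m] * flipped)
    return out

img = np.arange(25, dtype=np.float64).reshape(5, 5)
box = np.full((3, 3), 1.0 / 9.0)                    # 3x3 averaging (smoothing) kernel
smoothed = convolve2d(img, box)
# Each output value is the mean of the 3x3 neighborhood around that pixel.
```

Production code would use an optimized routine (e.g. `scipy.signal.convolve2d` or FFT-based convolution for large kernels), but the loop above is exactly the operation being counted in the text.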
(3) Two-dimensional orthogonal transforms. Commonly used two-dimensional orthogonal transforms include the FFT, Walsh, Haar and Karhunen-Loève (KL) transforms; they are often used in image enhancement, restoration, two-dimensional filtering, data compression, etc.
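As a sketch of transform-domain filtering, the snippet below low-pass filters an image through the 2-D FFT: transform, zero the high frequencies, and invert (NumPy only; the image is random test data):

```python
import numpy as np

def fft_lowpass(img: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the lowest `keep` frequencies around DC in each axis, then invert."""
    F = np.fft.fftshift(np.fft.fft2(img))           # spectrum with DC at the center
    mask = np.zeros_like(F)
    c0, c1 = F.shape[0] // 2, F.shape[1] // 2
    mask[c0 - keep:c0 + keep + 1, c1 - keep:c1 + keep + 1] = 1
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

img = np.random.default_rng(0).random((32, 32))
smooth = fft_lowpass(img, keep=4)
# The DC term survives, so the mean is preserved; high-frequency detail is removed.
```

The same transform/mask/inverse pattern applies to restoration and compression: only the mask (or the coefficient quantization) changes.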
(4) Coordinate transformation is often used for image magnification, rotation, translation, registration, geometric correction, and image reconstruction from projection values.
(5) Statistical calculation, such as computing the density histogram, the mean, and the covariance matrix. These statistics are often needed when performing histogram equalization, area calculation, classification, and the KL transform.
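The statistics above feed directly into operations such as histogram equalization; a minimal NumPy sketch on a synthetic low-contrast image (the value range 50–149 is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(50, 150, size=(64, 64)).astype(np.uint8)   # low-contrast image

hist = np.bincount(img.ravel(), minlength=256)                # density histogram
mean = img.mean()                                             # first-order statistic

# Histogram equalization: map each gray level through the normalized CDF.
cdf = hist.cumsum() / img.size
equalized = np.round(255 * cdf[img]).astype(np.uint8)
# The narrow 50..149 input range is spread across the full 0..255 scale.
```

The histogram is computed once per image, after which the equalization itself is just a point operation (a 256-entry lookup table).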
The working principle of a visual navigation and positioning system
Simply put, the system optically processes the environment around the robot. A camera first collects image information; the collected information is compressed and then fed to a learning subsystem composed of neural networks and statistical methods. This subsystem links the collected image information with the robot's actual position, accomplishing the robot's autonomous navigation and positioning function.
1) Camera calibration algorithms: recovering the 2D-3D mapping parameters.
Traditional camera calibration methods include the Faugeras calibration method, Tsai's two-step method, the direct linear transformation (DLT) method, Zhang Zhengyou's planar calibration method and Weng's iterative method. Self-calibration methods include the Kruppa-equation-based method, hierarchical stepwise self-calibration, the absolute-quadric-based method and Pollefeys' modulus constraint method. Active-vision calibration includes Ma Songde's three-orthogonal-translation method, Li Hua's planar orthogonal calibration method and Hartley's method of rotating the camera to find the internal parameters.
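Whatever method is used, what calibration recovers is the intrinsic matrix K (focal lengths and principal point) plus the extrinsics [R|t] — the 2D-3D mapping parameters. A minimal pinhole-projection sketch in NumPy; all numeric values below are illustrative assumptions, not calibrated results:

```python
import numpy as np

# Assumed intrinsics: focal lengths fx, fy (pixels) and principal point (cx, cy).
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

R = np.eye(3)                                   # camera aligned with the world frame
t = np.zeros(3)                                 # camera at the world origin

X_world = np.array([0.1, -0.05, 2.0])           # a 3-D point 2 m in front of the camera
X_cam = R @ X_world + t                         # world -> camera coordinates
u, v, w = K @ X_cam                             # homogeneous image coordinates
pixel = (u / w, v / w)                          # perspective division -> pixel (360, 220)
```

Calibration is the inverse problem: given many such pixel/3-D correspondences (e.g. from a checkerboard), solve for K, R and t.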
2) Machine vision and image processing:
a. Preprocessing: grayscale conversion, noise reduction, filtering, binarization, edge detection, etc.
b. Feature extraction: mapping from feature space to parameter space. Typical algorithms include the Hough transform, SIFT and SURF.
c. Image segmentation: e.g. in the RGB or HSI color space.
d. Image description and recognition
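The Hough transform named in step b is a concrete example of the feature-space-to-parameter-space mapping: each edge pixel votes for all (ρ, θ) line parameters it could lie on. A minimal NumPy sketch on a synthetic edge image:

```python
import numpy as np

def hough_lines(edges: np.ndarray, n_theta: int = 180):
    """Vote each edge pixel into (rho, theta) parameter space; return the accumulator."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))                 # max possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
        acc[rhos, np.arange(n_theta)] += 1              # one vote per theta per pixel
    return acc, thetas, diag

# Synthetic binary edge image containing one horizontal line (y = 5).
edges = np.zeros((20, 20), dtype=np.uint8)
edges[5, :] = 1
acc, thetas, diag = hough_lines(edges)
rho_idx, theta_idx = np.unravel_index(acc.argmax(), acc.shape)
# The peak lands near theta = 90 degrees with rho = 5, i.e. the line y = 5.
```

Collinear pixels that are scattered in image space concentrate into a single peak in parameter space, which is what makes the mapping useful for detection.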
3) Localization algorithms: filter-based localization algorithms mainly include the Kalman filter (KF), extended Kalman filter (EKF), unscented Kalman filter (UKF), particle filter (PF), sparse extended information filter (SEIF), and so on.
Monocular vision can also be fused with odometry. Taking the odometer readings as auxiliary information, triangulation is used to calculate the coordinates of feature points in the current robot coordinate frame. From the three-dimensional coordinates of the feature points in the current camera frame and their world coordinates in the map, the pose of the camera in the world frame is estimated. This reduces sensor cost, eliminates the accumulated error of the odometer, and makes the positioning result more accurate. In addition, compared with the inter-camera calibration required in stereo vision, this method only needs to calibrate a single camera's parameters, which improves the efficiency of the system.
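The triangulation step can be sketched with the linear (DLT) method: given a feature's pixel observations from two camera poses (here the second pose would come from odometry), solve a small homogeneous system for the 3-D point. The intrinsics, baseline and point below are illustrative assumptions:

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])                          # assumed intrinsics

def projection(R, t):
    return K @ np.hstack([R, t.reshape(3, 1)])           # 3x4 projection matrix

P1 = projection(np.eye(3), np.zeros(3))                  # first pose (origin)
P2 = projection(np.eye(3), np.array([-0.5, 0.0, 0.0]))   # second pose: 0.5 m baseline

X_true = np.array([0.2, 0.1, 4.0, 1.0])                  # homogeneous 3-D point
x1 = P1 @ X_true
x1 /= x1[2]                                              # observed pixel in view 1
x2 = P2 @ X_true
x2 /= x2[2]                                              # observed pixel in view 2

# DLT: stack two rows per view from x cross (P X) = 0, then take the null space.
A = np.vstack([x1[0] * P1[2] - P1[0],
               x1[1] * P1[2] - P1[1],
               x2[0] * P2[2] - P2[0],
               x2[1] * P2[2] - P2[1]])
_, _, Vt = np.linalg.svd(A)
X = Vt[-1]
X /= X[3]                                                # back to inhomogeneous coords
# X recovers the original point (0.2, 0.1, 4.0).
```

With noisy real observations the system is solved in a least-squares sense, and the odometry-supplied baseline is what fixes the otherwise unknown monocular scale.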
The basic process of the positioning algorithm:
A simple version of the algorithm can be easily implemented with OpenCV.
From the video stream obtained by the camera (mainly grayscale images; in stereo VO the images can be either color or grayscale), let It and It+1 be the images recorded at times t and t+1. The internal parameters of the camera, obtained through camera calibration, are computed once as fixed quantities with MATLAB or OpenCV.
Calculate the position and pose of the camera for each frame:
● Acquire the images It and It+1
● Undistort the acquired images
● Perform feature detection on image It with the FAST algorithm, and track those features into image It+1 with the KLT algorithm; if tracking loses features and their number falls below a threshold, run feature detection again
● Estimate the essential matrix of the two images with the 5-point algorithm and RANSAC
● Recover R and t by decomposing the essential matrix
● Estimate the scale information, and finally determine the rotation matrix and translation vector
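The geometric core of the last three steps is that the estimated essential matrix satisfies E = [t]×R, and every inlier correspondence obeys the epipolar constraint x2ᵀ E x1 = 0 (with x1, x2 in normalized camera coordinates). A NumPy check on synthetic data — the motion and points below are made up for illustration; a real pipeline would use OpenCV's `findEssentialMat`/`recoverPose`:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

theta = np.deg2rad(5.0)                                  # small yaw between frames
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.0, 0.2])                            # direction only: scale is lost
E = skew(t) @ R                                          # essential matrix

rng = np.random.default_rng(2)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(10, 3))    # random 3-D points, frame 1
x1 = X / X[:, 2:3]                                       # normalized coords, frame 1
Xc2 = (R @ X.T).T + t                                    # same points in frame 2
x2 = Xc2 / Xc2[:, 2:3]                                   # normalized coords, frame 2

residuals = np.abs(np.einsum('ij,jk,ik->i', x2, E, x1))  # x2^T E x1 per correspondence
# All residuals are ~0: every correspondence satisfies the epipolar constraint.
```

Because E is unaffected by scaling t, monocular VO recovers only the direction of translation; the final bullet's scale estimation (from odometry, a known baseline, or scene knowledge) is what fixes the magnitude.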