Advances and Challenges of UAV SfM-MVS Photogrammetry and Remote Sensing: A Short Review

Interest in Unmanned Aerial Vehicle (UAV)-sourced data and Structure-from-Motion (SfM) and Multi-View-Stereo (MVS) photogrammetry has expanded dramatically over the last decade, revolutionizing the fields of aerial remote sensing and mapping. This literature review provides a concise overview of the recent developments and applications of light-weight UAVs and of the widely-accepted SfM-MVS approach. Firstly, the advantages and limitations of UAV remote sensing systems are discussed, followed by an identification of the different UAV and miniaturised sensor models applied across numerous disciplines, showing the range of systems and sensor types utilised recently. Afterwards, a concise list of advantages and challenges of UAV SfM-MVS is provided and discussed. Overall, the accuracy and quality of the SfM-MVS-derived products (e.g. orthomosaics, digital surface models) depend on the quality of the UAV data set, the characteristics of the study area and the processing tools used. Continued development and investigation are necessary to better determine the quality, precision and accuracy of UAV SfM-MVS-derived outputs.


INTRODUCTION
Over the last decade the emergence of Unmanned Aerial Vehicle (UAV)-acquired observations has considerably contributed to the fields of aerial photogrammetry, remote sensing and mapping. Not only has UAV technology assisted in collecting data with higher spatio-temporal resolution than before, but it has also supported the development of enhanced algorithms for photogrammetric processing and remote sensing analysis, such as the Structure-from-Motion (SfM) and Multi-View Stereo (MVS) approach (Remondino et al., 2014). In parallel with the advance of light-weight consumer-grade UAV platforms and sensor miniaturisation, a dramatic expansion in research over a wide spectrum of disciplines has also been observed.
As a complementary piece of work to previous reviews (Eltner et al., 2016, Fonstad et al., 2013, Manfreda et al., 2018, Colomina and Molina, 2014), the review presented here highlights the benefits and drawbacks of UAV technology when applied to different photogrammetric and remote sensing applications. It presents the recent developments of various miniature remote sensing sensors and discusses the challenges and merits of the widely-accepted SfM-MVS process. This review constitutes an up-to-date, concise summary of UAV-based photogrammetry and remote sensing advances and considerations.

UAV PHOTOGRAMMETRY AND REMOTE SENSING APPLICATIONS
A new era of fine-scale remote sensing has emerged with the arrival of light-weight consumer-grade UAVs (<10 kg) (Berni et al., 2009, Sharma et al., 2013). UAVs are also known as aerial robots, drones and remotely piloted aircraft systems (Toth and Jóźków, 2016), and were most recently defined by the UK Civil Aviation Authority as small unmanned aircraft (SUA; CAP 393 (2019)). Originally employed by the military, the technology expanded notably into the civil sector in the 2000s, and it has been increasingly used for numerous commercial (e.g. recreation, cinematography) and research applications, including photogrammetry and remote sensing, due to its affordability and flexibility (Colomina and Molina, 2014).
Over the last two years, a range of consumer off-the-shelf (COTS) sensors, attached to UAVs to create aerial data acquisition systems, have provided observations of high spatio-temporal resolution for applications across a wide spectrum of photogrammetry and remote sensing, as listed in Table 1. Many sensors have been miniaturised and/or adapted to be fitted on a UAV platform, ranging from low-cost mass-market, amateur and professional sensors to sensors specifically developed for UAVs (Colomina and Molina, 2014). In particular, Van Blyenburgh (2013) identified 406 imaging and ranging instruments specifically designed for UAVs. Table 1 reports on UAV-based applications, found in the literature over the last two years, based on six sensor type categories: 1) COTS (either unmodified, detecting RGB, or modified to sense visible and near infra-red radiation); 2) multispectral; 3) hyperspectral (either imaging cameras or radiometers); 4) Light Detection And Ranging (LiDAR); 5) thermal; and 6) Synthetic Aperture Radar (SAR). Among these, COTS sensors have become the most widely used remote sensing tool to date (Torresan et al., 2017), due to their ease of use, low cost, compact size, portability, low weight and compact data storage (Rabatel et al., 2014). At the other extreme, SAR technology has been one of the most challenging to miniaturise and fix on lightweight UAVs (Aye et al., 2017). Nevertheless, due to rapid technological development and growing interest in this area, the other sensors (categories 2-6 above) are expected to become commonplace for data acquisition with lightweight UAVs. However, the high operational costs of a few instruments (e.g. LiDAR; Torresan et al. (2017)) still constitute a critical factor for monitoring applications that require repeated UAV surveys. Recent studies have also coupled UAV imagery with deep learning. Santos et al. (2019) compared three different region-based convolutional neural networks (R-CNNs) for identifying tree species from UAV-acquired RGB images. Another study, Vetrivel et al. (2018), incorporated CNNs with oblique imagery acquired from various UAV platforms to detect earthquake damage. They designed a deep learning architecture by combining a conventional supervised classification algorithm (e.g. a Support Vector Machine) with a CNN to classify UAV images into damaged and undamaged regions. The aforementioned studies demonstrate the recent trend of integrating UAV-borne data with advanced deep learning technologies.

UAV ADVANCES AND LIMITATIONS
This widespread exploitation of UAV technology can be attributed to a series of favourable factors, such as: data acquisition in (near) real time; lower operational cost than manned aerial surveys; user-defined temporal and spatial resolution (e.g. 5 cm spatial resolution in Figure 1); high-intensity data collection; and flexibility based on the sensor type on board. Advantages compared to in-situ surveys include repeated UAV surveys a) over hazardous areas (e.g. glaciers; Dall'Asta et al. (2017)); b) over specific targeted areas (e.g. pest outbreaks, Lehmann et al. (2015), or wildfires, Yuan et al. (2015)); and c) at optimal seasons (e.g. forest phenology; Berra et al. (2019)). Unlike satellite remote sensing, UAVs can also be particularly helpful in generating time-series data without being constrained by cloudy conditions (Torres-Sanchez et al., 2013). Also, UAV-based observations can complement multi-scale analyses from ground to airborne to satellite observations (Garzonio et al., 2017).
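The user-defined spatial resolution mentioned above follows a simple geometric relation: the ground sample distance (GSD) scales with flying height and the sensor's pixel pitch, and inversely with the lens focal length. A minimal sketch (the sensor values below are hypothetical, chosen only to reproduce a 5 cm GSD, and are not taken from any study cited here):

```python
# Ground sample distance (GSD) from flying height, focal length and pixel pitch.
# GSD = (pixel pitch * flying height) / focal length, converted to cm/pixel.

def gsd_cm(height_m: float, focal_mm: float, pixel_um: float) -> float:
    """Return the GSD in cm/pixel for a nadir-looking camera."""
    return (pixel_um * 1e-6) * height_m / (focal_mm * 1e-3) * 100.0

# A hypothetical 4.5 um pixel behind a 9 mm lens flown at 100 m:
print(round(gsd_cm(100.0, 9.0, 4.5), 2))  # -> 5.0 (cm/pixel)
```

Inverting the same relation gives the flying height needed to achieve a target resolution, which is a routine step in UAV flight planning.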
However, some limitations of UAVs include: a) relatively small area coverage compared to areas observed with manned aerial aircraft and satellites; b) data acquisitions that are rarely simultaneous with acquisitions from spaceborne sensors, which in turn can limit multi-scale monitoring analyses; c) operational constraints in high winds and/or during precipitation; d) significant investment in UAV pilot training; e) the time taken for in-house manufacturing of bespoke UAV systems; f) the flat terrain required for landing in the case of fixed-wing UAVs; g) payload constraints due to heavyweight sensor types and/or limited space on the aircraft's body; and h) limited flight endurance (Anderson and Gaston, 2013, Chabot and Bird, 2013). The challenge of surveying larger areas is due primarily to battery endurance, but is also often imposed by civil and federal aviation laws, such as the requirement to retain line-of-sight during operations (Torresan et al., 2017). To overcome these challenges, some scientists engineer custom-built platforms to serve particular purposes. For example, Jouvet et al. (2019) manufactured a power-efficient fixed-wing UAV lasting almost three hours while flying beyond the line of sight over glaciers in Greenland. They used a bungee catapult for take-off and a net for landing, for safety reasons. When engineering a UAV there is always the potential to develop robust equipment for specific needs, but cost and expertise are certainly required.
Regarding UAV acquisitions, there are some technical challenges when dealing with data of unprecedented spatial resolution, such as the influence of viewing geometry, geometric and radiometric calibration, atmospheric correction (Berni et al., 2009), and the georeferencing and mosaicking of the hundreds of images typically acquired during a single UAV flight (McGwire et al., 2013). These factors can diminish the capability to generate detailed quantitative information (Kelcey and Lucieer, 2012), critical for geomorphological applications and/or time-series analysis. Furthermore, a large volume of data can be acquired per study site, the size of which can increase significantly with the derivation of photogrammetric and/or remote sensing products, potentially resulting in information management problems (Rychkov et al., 2012). Despite the freedom of choosing which UAV and sensor(s) to employ, care is necessary regarding all the technical and operational processes needed to generate datasets of high quality with meaningful, interpretable information. Extra effort might be necessary when using non-scientific sensors (e.g. COTS), as these require further geometric and radiometric calibration (Berra et al., 2017).
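One common way the radiometric calibration of COTS imagery is handled in practice is the empirical line method: reflectance panels of known reflectance are imaged within the scene, and a linear digital number (DN) to reflectance relation is fitted per band. A minimal single-band sketch, with made-up panel values for illustration only:

```python
import numpy as np

# Empirical line method: fit a linear DN -> reflectance relation from
# calibration panels of known reflectance imaged in the scene.
# Panel DN and reflectance values below are synthetic.

panel_dn = np.array([30.0, 120.0, 210.0])          # mean DN over each panel
panel_reflectance = np.array([0.05, 0.30, 0.55])   # known panel reflectances

# Least-squares line through the panel measurements (gain and offset).
gain, offset = np.polyfit(panel_dn, panel_reflectance, 1)

def dn_to_reflectance(dn):
    """Convert raw digital numbers to surface reflectance."""
    return gain * dn + offset

print(round(float(dn_to_reflectance(120.0)), 3))  # ~0.30 for the mid panel
```

In a real workflow the fit is performed per spectral band, and per flight if illumination changes between surveys.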

Fixed and rotary-wing UAVs
An important aspect to take into consideration when choosing a UAV is which aircraft type is optimal for a specific study site: fixed- or rotary-wing (Figure 2). Unlike rotary-wing UAVs, fixed-wing UAVs have the advantage of covering larger areas. However, the target area must be close to relatively flat terrain with an open space allowing for safe take-off and landing. Conversely, rotary-wing UAVs are less demanding, as they are manoeuvrable and easy to take off and land (vertically), even in challenging environments such as steep, rugged slopes. They can also fly at lower heights, but they cover smaller areas (Anderson and Gaston, 2013, Chabot and Bird, 2013). Another advantage of rotary-wing UAVs is their ability to hover over a selected target for a pre-defined time, allowing for multiple measurements (e.g. for Bidirectional Reflectance Distribution Function (BRDF) investigation (Burkart et al., 2015)). Experience with operating UAVs has shown that it is easier to set up oblique image capture with rotary-wing than with fixed-wing UAVs. Rotary-wing UAVs can also offer a more flexible gimbal setup than fixed-wing UAVs for accommodating heavyweight sensors such as LiDAR and SAR (see example in Figure 2). As evidenced in Table 1, reported studies prefer rotary-wing UAVs for the deployment of all sensor types other than COTS (i.e. categories 2-6 in Table 1).

Consumer and survey-grade UAVs
UAVs are also classified into consumer and survey-grade with respect to the accuracy levels of the on-board positional sensors (Rehak and Skaloud, 2017), which is another critical aspect when choosing a UAV as a data acquisition system. Specifically, in a consumer-grade UAV the on-board Global Navigation Satellite System (GNSS) receiver is normally limited to a single frequency and provides positional accuracy of 5 m or better (Rehak and Skaloud, 2017). Typically, a UAV autopilot unit contains a small, low-grade Micro-Electro-Mechanical System Inertial Measurement Unit (MEMS-IMU), comprising three-axis accelerometers, gyroscopes and magnetometers, as well as a barometer. As these sensors are small in size, lightweight and inexpensive, they are prone to errors (gyro drift and accelerometer bias) that accumulate rapidly over time. Survey-grade UAVs carry a high-performance IMU or multiple MEMS-IMU sensors (Rehak and Skaloud, 2017) alongside dual-frequency GNSS and/or augmentation with Real Time Kinematic (RTK)-GNSS receivers (Carbonneau and Dietrich, 2017). Such UAVs rely on a ground control GNSS base station (with known coordinates) which sends corrections to the on-board GNSS receiver (Dall'Asta et al., 2017).
An RTK-integrated UAV allows for direct georeferencing (i.e. without the use of ground control points (GCPs), versus indirect georeferencing; see section 4.1) with the aid of the on-the-fly GNSS coordinates of the sensor exposures, thereby enabling automatic orientation of the photogrammetric image block. Even with this emerging technology, and compared to conventional airborne photogrammetric approaches with metric sensors, UAV approaches fitted with COTS sensors still cannot provide better than dm-level positional and arc-minute orientational accuracy, independently of the direct or indirect georeferencing approach, as noted in Rehak et al. (2013). An alternative approach is the use of precise point positioning (PPP) on consumer-grade UAVs with long flight durations to secure fixed ambiguities, as in Grayson et al. (2018). They implemented GPS PPP using satellite orbits and clock parameters from the International GNSS Service and conducted on-board GPS lever-arm calibration. This approach delivered cm-level horizontal precision of the UAV trajectory without the need for GCPs. Apart from the accuracy level, it is noteworthy that, with respect to operational costs, consumer-grade UAVs are generally far more affordable than survey-grade UAVs, owing to the mass-market demand.
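Conceptually, georeferencing a relatively oriented photogrammetric block amounts to estimating a seven-parameter similarity (Helmert) transform, i.e. scale, rotation and translation, that maps the arbitrary model frame onto surveyed coordinates. The sketch below recovers such a transform from point correspondences using Umeyama's closed-form least-squares solution; in real SfM-MVS pipelines this is done inside the bundle adjustment, and the synthetic "GCPs" here are illustrative only:

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares scale s, rotation R, translation t minimising
    ||dst - (s * src @ R.T + t)|| (Umeyama's closed-form solution)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)            # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / src_c.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Synthetic check: model-frame "GCPs" and their surveyed counterparts.
rng = np.random.default_rng(0)
src = rng.normal(size=(10, 3))                  # arbitrary model coordinates
th = np.deg2rad(30.0)
R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0,         0.0,        1.0]])
dst = 2.0 * src @ R_true.T + np.array([10.0, -5.0, 3.0])
s, R, t = similarity_transform(src, dst)        # recovers scale 2, R_true, t
```

With noisy GCP measurements the same estimator returns the best-fitting transform in the least-squares sense, which is why the precision of the surveyed control directly bounds the precision of the georeferenced products.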

UAV-BASED SFM-MVS
Together with the continuously emerging UAV technology, as discussed previously, contemporary processing approaches have been developed, blending well-known photogrammetric (e.g. Triggs et al. (2000)) and computer vision (e.g. Hirschmüller (2008)) algorithms (Colomina and Molina, 2014). This blend has resulted in the current state of the Structure-from-Motion (SfM) and Multi-View Stereo (MVS) processing pipeline (Remondino et al., 2014). SfM-MVS recovers the 3D geometry of an object or a scene (structure) viewed from multiple positions (multi-view) of a moving camera (motion) (Snavely et al., 2008). The SfM-MVS pipeline has expedited the automatic generation of high spatio-temporal resolution UAV products (Remondino et al., 2014, Snavely et al., 2008) in a time-efficient, cost-effective and user-friendly manner (Fonstad et al., 2013).

Typical SfM-MVS process
According to published studies and reviews (Eltner et al., 2016, Remondino et al., 2014, Snavely et al., 2008, Westoby et al., 2012, Granshaw and Fraser, 2015, James et al., 2017, Fonstad et al., 2013), the standard SfM-MVS pipeline can be summarized into three main phases, as follows:

Sparse point cloud reconstruction: Firstly, a point cloud of tie points (i.e. image observations, internal constraints; the sparse point cloud) is generated with feature-based matching via a self-calibrating bundle adjustment, without any a priori information. This step aligns the acquired images and establishes their relative orientation, reconstructing multiple stereo pairs based on epipolar geometry. In particular, a feature-based algorithm detects and matches corresponding points lying on epipolar lines across images. Subsets of images are incrementally aligned until the complete photogrammetric block is orientated. Outlier detection is recursively performed to eliminate erroneous point matches. The camera's interior (IOPs) and exterior orientation parameters (EOPs) are simultaneously determined through iterations in a least squares sense by minimizing a global reprojection error. This error quantifies the pixel differences between the initially detected corresponding points and those estimated and back-projected into all overlapping images of the photogrammetric block. Hence, space resection and intersection are resolved for every tie point, and a sparse point cloud with 3D coordinates in an arbitrary coordinate system is generated.

Georeferencing: Control information is necessary to scale and orientate the resultant sparse point cloud and photogrammetric block, determining the precise 3D shape of a surface. It is usually provided in the form of surveyed GCPs (indirect georeferencing, IG), or obtained from the positions and/or orientations of the camera exposure stations (direct georeferencing, DG). This information is used as weighted observations (i.e. external constraints) in conjunction with the tie points (i.e. internal constraints) in a least squares bundle adjustment, thereby re-estimating the camera's IOPs, EOPs and the 3D coordinates of the sparse point cloud in the desired coordinate system.

Dense point cloud (DPC) reconstruction: Given the epipolar geometry of the photogrammetric block already established in the first phase, disparities are computed at all pixels via image matching approaches such as semi-global matching (Hirschmuller, 2007). The pixels are back-projected to all images and triangulated (i.e. via spatial intersection) to form a 3D surface without abrupt irregularities, through gradient-based and energy-minimization algorithms. The SfM-MVS pipeline results in an RGB-coloured DPC that constitutes the raw form of a 3D surface representation. The georeferenced point cloud (either sparse or dense) can be exported and/or interpolated to generate a digital surface model (DSM), a digital elevation model (DEM) without vegetation (e.g. Figure 1), or a digital terrain model representing only the bare ground. Ultimately, this model can be used to generate orthophotos and orthomosaics.
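The spatial intersection step described above can be illustrated with a linear (DLT) triangulation of a single tie point observed in two views, together with the reprojection error that the bundle adjustment minimises. The camera matrices below are synthetic, chosen only to make the sketch self-contained:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen at pixel x1 in camera P1
    and pixel x2 in camera P2; returns its 3D coordinates."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)                 # null vector of A is the point
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point to pixel coordinates with camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Synthetic stereo pair: 1000 px focal length, 1 m baseline along x.
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 10.0])            # a ground point 10 m away
x1, x2 = project(P1, X_true), project(P2, X_true)
X_hat = triangulate(P1, P2, x1, x2)
err = np.hypot(*(project(P1, X_hat) - x1))      # reprojection error (pixels)
```

With noise-free observations the reprojection error is essentially zero; with real, noisy tie points it is exactly this quantity, summed over all points and images, that the self-calibrating bundle adjustment minimises.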
The SfM-MVS pipeline has been adopted in both commercial and freely available (e.g. Snavely et al. (2008)) software packages that offer automated routines designed for non-expert users. In recent years, Agisoft has gained popularity in the scientific community, as evidenced in Figure 3, mostly due to its user-friendly, almost "black-box", workflow (Eltner et al., 2016). The number of published studies has gradually increased relative to Pix4D, whereas there is a steady linear trend for MicMac use. It should be noted that the results in Figure 3 include UAV studies from many scientific communities, not only photogrammetry or remote sensing.

Advantages and challenges of SfM-MVS process
The SfM-MVS pipeline has become a standard workflow for processing UAV imagery as it can handle mixed image block geometries of non-vertical, unordered and marker-less images. This is mainly attributable to the feature-based image matching algorithms, which are able to generate a high number of image correspondences (usually >1000) regardless of the different image rotations, scales and baselines within the photogrammetric block (Fonstad et al., 2013).
However, numerous recent studies have revealed the presence of systematic errors in the automatic SfM-MVS pipeline (Carbonneau and Dietrich, 2017, Eltner et al., 2016, Harwin et al., 2015, James et al., 2017, Remondino et al., 2014). Such systematic errors usually originate from image sensor characteristics, the camera distortion models included within the SfM-MVS software, SfM-MVS software settings, imaging network configurations, GCP characteristics, surface texture, lighting and weather conditions, as well as over-parameterisation. For instance, low image overlap might yield mismatches during the initial step of the SfM-MVS pipeline and generate discontinuities in the reconstructed sparse point cloud. This, in turn, can destabilise the bundle adjustment solution, and errors can propagate into the DEMs (Harwin et al., 2015). Illumination differences are caused either by incorrect camera exposure settings or by variations in lighting during a UAV flight. Overexposing bright areas or underexposing dark areas can alter the distinctive properties of surface features, thereby adversely affecting tie point detection. In addition, parallel flight lines can cause systematic vertical bowl-shaped deformations in the resultant DEM. According to James and Robson (2014), these errors can be significantly reduced either by acquiring convergent images or by including evenly distributed GCPs in the SfM-MVS bundle adjustment. Further, Remondino et al. (2014) suggested that when GCPs constitute "ground truth" for the SfM-MVS pipeline, they should be independently surveyed, providing an estimated precision at least three times better than the expected results.
As several parameters are involved at different stages of the SfM-MVS pipeline, errors are propagated through the process (Eltner et al., 2016). Typical quality indicators of a photogrammetric process are provided by the covariance and correlation matrices computed in the bundle adjustment. A large number of observations from hundreds of images, and the many estimated parameters in the self-calibrating bundle adjustment, can impede the matrix inversion essential for covariance estimation. Thus, another possible source of systematic errors is over-parameterisation, which cannot be easily controlled with SfM-MVS software packages. In response, recent studies (Carbonneau and Dietrich, 2017, James et al., 2017) suggested analytical ways of quantifying the internal precision of the estimated IOPs within SfM-MVS, such as Monte Carlo tests, either to derive optimal combinations of camera distortion coefficients or to examine the optimal SfM-MVS software parameters (i.e. marker/tie point accuracies) and camera distortion models.
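The Monte Carlo idea can be sketched in miniature: repeatedly perturb synthetic observations with image-measurement noise, re-estimate the parameter of interest in each trial, and take the spread of the estimates as its internal precision. The example below does this for a single radial distortion coefficient under a simplified dr = k1·r³ model; all values are synthetic, and the cited studies perturb full bundle adjustments rather than this one-parameter fit:

```python
import numpy as np

# Monte Carlo precision of a radial distortion coefficient k1:
# simulate noisy radial displacements, re-fit k1 each trial, report spread.

rng = np.random.default_rng(42)
k1_true = -1.2e-7                      # synthetic coefficient (per pixel^2)
r = np.linspace(100.0, 2000.0, 50)     # radial distances from centre (pixels)
dr_true = k1_true * r**3               # simplified radial model dr = k1 * r^3

estimates = []
for _ in range(500):
    # add 0.2 px of image-measurement noise, then least-squares re-fit of k1
    dr_noisy = dr_true + rng.normal(scale=0.2, size=r.size)
    k1_hat, *_ = np.linalg.lstsq(r[:, None]**3, dr_noisy, rcond=None)
    estimates.append(k1_hat[0])

print(np.std(estimates))               # empirical (internal) precision of k1
```

The standard deviation of the 500 estimates is the Monte Carlo measure of internal precision; in the full-pipeline versions of this test, correlations between distortion coefficients and exterior orientation reveal themselves as inflated spreads.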
The aforementioned examples show how various factors affect the SfM-MVS pipeline. Further details can be found in Remondino et al. (2014) and Eltner et al. (2016). Nevertheless, isolating or correcting the exact source of errors is often challenging when using "black-box" software packages, as they rarely provide well-explained bundle adjustment reports.
Other challenges of the SfM-MVS process include: a) lengthy processing times (although this depends upon computational power) that can only be reduced by downsampling the original high spatial resolution of UAV imagery; b) the difficulty of visualising high-resolution point clouds in some GIS platforms; and c) areas with dense, complex vegetation, water bodies (with homogeneous image texture) or steep topography, which can often hinder tie point detection and, in turn, the 3D scene reconstruction. A recent trend is the combination of the SfM-MVS pipeline with deep learning CNNs to automate the vegetation filtering process in point clouds, as described in Gruszczyński et al. (2019). Such developments can improve DEM reconstruction, minimising vertical deformations due to very low vegetation captured by the UAV.
On the other hand, a key characteristic and advantage of SfM-MVS is that, during tie point detection and matching, it can overcome the aforementioned common challenges present in UAV images to a certain degree, as noted in James et al. (2017). It can also process images acquired with different camera settings or sensors (Snavely et al., 2008). As described in section 4.1, another great advantage of the SfM-MVS pipeline is the fully automated processing from feature extraction to scene geometry reconstruction and then to appealing, photo-realistic products for 3D visualisation, even for non-experts.
Overall, the accuracy and quality of the SfM-derived products depends on the quality of the data set, characteristics of the study area and processing tools used (James et al., 2017).

CONCLUSION
Over the last decade, UAV remote sensing platforms have become increasingly easy and fast to deploy on an operational basis. The success of UAVs in remote sensing and mapping applications has been due not only to technological developments in UAVs (including positioning systems) and sensors, but also to significant advances in data processing techniques, especially in SfM photogrammetry and computer vision. While the SfM-MVS approach has a number of advantages, it equally has a number of data collection and data processing challenges. With continuous interest in UAV-sourced images and SfM, and continuous development and investigation of the quality, precision and accuracy of outputs, this method has the potential to evolve, creating new opportunities and insights across all the sectors (science, industry and military) currently benefitting from it. Overall, this brief review provides non-expert users with a fundamental understanding of the advantages, but also the challenges and errors, associated with UAV SfM photogrammetry.