vision.middlebury.edu/stereo/eval3

Middlebury Stereo Evaluation - Version 3


Mouse over a table cell to see the produced disparity map. Clicking a cell blinks the ground truth for comparison. To change the table type, click the links below. For more information, see the description of new features.

Submit and evaluate your own results.

Set:	test dense, test sparse, training dense, training sparse
Metric:	bad 0.5, bad 1.0, bad 2.0, bad 4.0, avgerr, rms, A50, A90, A95, A99, time, time/MP, time/GD
Mask:	nonocc, all
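As a rough illustration of the accuracy metrics listed above, the sketch below computes them from a predicted disparity map and a ground-truth map. It assumes that "bad T" is the percentage of evaluated pixels whose absolute disparity error exceeds T pixels, that avgerr and rms are the mean absolute and root-mean-square errors, and that A50 to A99 are error quantiles. The function name `stereo_error_metrics`, the use of non-finite values to mark unknown ground truth, and the boolean `mask` argument (standing in for the nonocc mask) are assumptions made for this example, not the benchmark's own evaluation code.

```python
import numpy as np

def stereo_error_metrics(disp, gt, mask=None, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Summarize disparity errors over the evaluated pixels (illustrative sketch)."""
    disp = np.asarray(disp, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Assumption: pixels with unknown ground truth are stored as inf or NaN.
    valid = np.isfinite(gt)
    if mask is not None:
        # Assumption: mask is True where a pixel should be evaluated (e.g. non-occluded).
        valid &= np.asarray(mask, dtype=bool)

    err = np.abs(disp[valid] - gt[valid])
    if err.size == 0:
        raise ValueError("no valid pixels to evaluate")

    # "bad T": percentage of evaluated pixels with error larger than T pixels.
    metrics = {f"bad {t}": 100.0 * float(np.mean(err > t)) for t in thresholds}
    metrics["avgerr"] = float(err.mean())
    metrics["rms"] = float(np.sqrt(np.mean(err ** 2)))
    for q in (50, 90, 95, 99):
        metrics[f"A{q}"] = float(np.percentile(err, q))
    return metrics
```

The sketch assumes a dense, finite prediction; how invalid predicted pixels are treated in the sparse tables is not covered here.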


bad 1.0 error

bad 1.0 (%)

Weight

Date

Name

Res

Avg

Austr

AustrP

Bicyc2

Class

ClassE

Compu

Crusa

CrusaP

Djemb

DjembL

Hoops

Livgrm

Nkuba

Plants

Stairs

Dataset	MP	nd
Austr	5.6	290
AustrP	5.6	290
Bicyc2	5.6	250
Class	5.7	610
ClassE	5.7	610
Compu	1.5	256
Crusa	5.5	800
CrusaP	5.5	800
Djemb	5.7	320
DjembL	5.7	320
Hoops	5.7	410
Livgrm	5.9	320
Nkuba	5.5	570
Plants	5.6	320
Stairs	5.2	450

06/06/24

245

MoCha-V2

11.4

6.31

5.31

5.96

7.55

20.6

7.70

18.4

18.6

3.87

9.98

21.5

11.8

17.4

13.6

7.30

11/13/23

223

Selective-IGEV

11.4

6.63

4.77

7.39

9.05

23.4

10.7

15.8

13.9

4.86

13.5

23.0

10.8

14.3

11.5

13.5

11/10/22

195

DLNR

11.8

7.09

5.52

8.46

8.30

15.8

12.4

12.5

12.6

4.67

14.9

24.2

12.1

21.3

13.2

11.4

02/21/24

234

ClearDepth

12.1

8.35

6.69

7.73

8.76

22.3

11.3

12.4

13.0

4.96

12.5

26.8

10.6

20.5

14.4

10.9

01/31/24

230

HART

12.6

7.27

5.40

7.44

8.71

17.8

10.7

11.7

12.2

4.83

19.5

28.7

17.1

20.3

12.6

20.6

10/28/23

218

SAMTormer

12.7

7.87

6.37

6.46

8.21

16.6

10.6

14.1

13.5

4.98

16.8

27.8

14.1

25.0

14.0

15.0

06/13/22

168

EAI-Stereo

12.9

8.13

6.85

7.63

9.45

15.9

12.7

17.6

16.0

4.91

19.6

27.3

13.1

19.9

10.7

14.1

03/06/24

238

MFMstereo

13.0

8.61

7.22

9.95

8.51

22.3

12.0

16.1

15.1

4.85

14.6

28.2

13.5

21.1

11.1

11.9

03/04/24

236

AEACV

13.4

13.6

7.02

6.99

13.8

17.7

9.13

16.7

15.5

5.35

12.5

21.7

14.7

23.4

16.7

11.9

03/04/24

237

ET_Stereo

13.5

8.51

6.85

7.85

9.80

23.3

10.7

17.9

17.1

4.47

14.3

27.3

16.3

23.4

11.3

12.3

10/09/23

216

EGLCR-Stereo

13.6

10.8

6.20

7.89

12.1

28.3

12.8

15.9

16.0

4.86

14.7

27.0

12.8

22.9

12.4

11.6

02/28/24

235

AKD_Stereo

13.7

8.90

7.22

9.32

8.27

22.8

19.7

15.4

16.2

4.81

14.0

28.7

12.7

21.8

11.3

13.7

06/22/23

208

IGEV-Stereo

13.8

8.03

6.82

6.16

9.69

19.4

8.95

30.5

25.1

5.66

11.9

21.3

14.3

19.5

9.95

11.4

11/10/21

160

CREStereo

14.0

10.3

8.60

10.9

12.0

17.2

11.6

17.6

17.5

6.09

17.5

25.6

15.3

18.7

14.7

14.1

10/30/23

220

LoS

14.2

11.7

9.43

9.94

12.5

17.1

9.74

17.4

16.1

6.82

20.3

25.4

15.3

22.0

14.2

13.8

06/05/24

244

MGS-Stereo

14.3

7.69

7.73

8.43

12.1

24.3

11.3

18.2

19.1

6.08

16.1

29.3

14.2

19.9

13.5

19.9

07/26/21

154

RAFT-Stereo

15.1

8.87

7.38

8.64

9.77

22.3

13.3

21.3

20.3

5.24

18.3

30.0

15.7

25.0

12.5

19.2

11/07/22

194

ConvStereo

15.7

11.2

9.42

10.9

12.8

17.0

13.5

20.1

19.0

6.50

22.3

27.5

15.2

20.7

18.9

19.9

11/16/23

225

LoS_RVC

16.3

15.0

9.30

10.7

14.6

26.2

13.8

20.4

17.3

7.15

22.4

27.1

15.7

22.9

19.1

15.4

10/01/22

180

CREStereo++_RVC

16.5

10.1

8.43

10.9

15.5

20.0

12.9

24.6

26.2

6.40

18.0

26.3

18.0

23.6

14.8

16.0

08/13/23

213

Any-RAFT

16.8

11.0

8.89

10.5

12.8

21.8

19.8

19.1

16.6

7.15

23.5

32.0

20.5

23.7

16.3

22.2

02/05/24

231

CAS

17.1

11.2

10.8

11.1

13.4

20.7

12.5

17.7

16.9

10.3

35.2

27.5

18.3

25.8

19.5

20.1

03/05/21

143

LocalExp-RC

20.0

11.9

12.6

15.0

31.4

21.4

18.1

22.4

10.5

24.0

28.6

20.1

35.1

24.3

22.6

05/26/18

NOSS_ROB

20.1

12.3

11.3

13.2

14.8

30.7

20.3

22.9

24.8

10.2

23.1

28.7

20.9

32.0

22.7

22.6

12/19/19

103

CRLE

20.3

11.7

11.0

13.9

14.8

30.1

20.7

21.1

25.9

11.1

24.3

33.4

20.4

33.7

20.7

22.8

10/16/22

191

LMCR-Stereo

20.4

15.4

11.2

13.0

16.8

22.7

23.6

24.4

22.9

9.00

31.0

33.4

21.5

27.8

22.4

22.8

08/30/20

127

LE_PC

20.5

12.1

11.9

13.4

14.5

28.6

21.4

20.8

26.0

10.5

21.9

34.7

18.5

35.8

22.7

23.3

03/09/19

3DMST-CM

20.5

13.1

11.7

12.8

16.2

30.7

22.6

20.8

21.4

13.5

26.0

30.6

18.5

33.9

20.9

27.1

08/12/23

212

UGRU

20.5

8.79

8.43

6.75

19.9

26.3

10.9

41.2

44.4

124

6.15

17.4

41.1

20.3

24.4

18.8

17.0

11/21/21

162

Gwc_CoAtRS

20.6

13.1

12.9

13.7

15.1

20.9

23.4

25.1

23.8

12.0

32.7

36.8

20.5

26.7

20.4

25.6

10/28/20

131

HITNet

20.7

13.2

10.8

12.7

14.4

30.2

20.8

24.2

23.4

8.85

49.3

146

32.5

19.6

26.2

20.5

28.3

07/16/20

119

HLocalExp-CM

20.8

11.9

11.3

12.7

15.9

31.7

20.7

21.1

25.0

11.4

25.8

33.0

21.5

35.6

23.2

20.0

We propose local expansion moves for estimating dense 3D labels on a pairwise MRF. The data term uses a PatchMatch-like 3D slanted window formulation, where raw matching costs within a window are computed by MC-CNN-acrt and aggregated using guided image filtering. The smoothness term uses a pairwise curvature regularization term by Olsson et al. 2013.

06/22/17

LocalExp

21.0

12.8

11.5

11.7

14.7

30.8

21.1

20.1

26.2

11.1

24.6

33.7

22.9

35.6

24.1

25.1

03/11/24

239

StereoIM

21.3

9.03

7.90

5.22

17.8

59.2

143

11.6

44.5

109

44.8

126

6.95

15.3

41.8

15.9

22.5

12.9

27.2

11/12/23

222

SNDR

21.3

16.9

13.5

13.4

14.2

27.5

24.6

27.2

23.0

20.7

130

24.0

28.9

27.8

30.0

16.0

15.3

Methods that estimate optimal parameters for MRF stereo cannot be used directly to estimate parameters for local expansion moves stereo. To estimate the regularization weight for local expansion moves stereo, we propose probabilistic mixture models for the slanted patch matching terms and the curvature regularization terms.

06/23/21

151

ERW-LocalExp

21.4

12.9

11.3

12.2

18.0

31.9

24.7

17.6

26.2

11.3

24.3

33.2

21.6

35.7

24.8

27.5

07/23/21

153

HBP_ISP

21.5

14.0

12.5

13.8

17.6

34.1

23.0

19.0

24.2

12.9

25.9

34.9

24.1

33.4

24.2

19.7

05/31/18

CBMV_ROB

21.6

9.62

10.1

14.4

16.8

23.2

23.8

21.5

22.2

10.3

36.0

33.2

25.8

38.0

110

24.5

24.1

08/10/23

211

CroCo-Stereo

21.6

15.2

10.7

7.31

22.9

50.4

125

12.4

33.5

36.6

7.99

18.6

38.2

17.1

21.5

25.0

28.4

01/24/17

3DMST

22.0

12.4

10.6

15.8

18.4

29.9

24.2

21.3

28.0

12.4

27.4

37.1

23.1

34.1

20.5

26.4

11/15/23

224

D2Stereo

22.4

11.7

8.05

13.1

13.2

53.3

130

22.8

27.6

28.0

6.25

23.5

31.5

19.8

27.9

43.3

132

19.2

10/02/22

182

raft+_RVC

22.6

24.5

9.62

12.9

23.0

45.9

110

21.2

32.1

23.8

8.18

24.6

38.5

25.0

28.3

18.1

26.7

03/10/17

MC-CNN+TDSR

22.7

15.1

13.3

15.7

16.4

31.5

26.6

25.2

29.8

12.6

26.2

41.8

23.6

28.8

24.6

19.2

A 3D-label-based method with global optimization at the pixel level. A bilayer matching cost is employed by first matching small square windows and then aggregating over large irregular windows. Global optimization is carried out by fusing candidate proposals, which are generated from our specific superpixel structure.

05/12/16

PMSC

22.8

11.6

10.0

16.2

18.2

29.5

26.1

22.7

28.0

12.4

27.2

38.0

25.8

35.0

23.2

29.0

10/03/22

184

GEStereo_RVC

22.8

17.8

8.84

14.2

21.0

41.2

23.6

29.6

27.2

8.89

24.5

41.1

27.1

27.2

18.2

35.1

11/11/22

197

AnPM

23.0

15.7

11.7

17.0

40.5

24.0

25.0

19.0

11.1

29.4

32.5

24.7

31.5

39.4

116

27.3

04/22/21

144

LESC

23.1

12.8

13.2

13.5

18.0

34.9

23.0

28.2

25.1

12.9

26.0

37.4

24.9

36.6

101

24.6

26.9

10/03/22

185

iRaftStereo_RVC

24.0

18.0

16.0

110

15.4

19.3

27.9

25.8

29.1

26.6

14.4

36.3

38.4

24.3

29.0

25.5

27.7

12/07/23

226

4D-IteraStereo

24.2

11.4

10.3

13.2

16.8

36.1

25.9

44.0

107

46.0

133

9.51

25.4

34.8

18.1

29.4

23.5

23.3

11/06/22

193

24.4

17.3

13.7

15.3

16.4

29.0

25.4

28.6

25.7

23.9

146

26.2

41.5

27.3

37.3

106

17.5

32.9

01/12/19

EHCI_net

24.4

13.2

8.93

20.8

112

40.0

161

30.4

20.6

21.4

21.8

12.7

23.1

32.4

22.5

35.0

39.4

115

25.8

We extend the standard BP sequential technique to fully connected CRF models with geodesic distance affinity. We also propose a new approach to the BP marginal solution, which we call one-view-occlusion detection (OVOD). In contrast to the standard winner-takes-all (WTA) estimation, the proposed OVOD solution makes it possible to find occluded regions in the disparity map and simultaneously improve the matching result. As a result, we need to perform only one energy minimization process and can avoid the cost calculation for the second view and the left-right check procedure.

12/11/17

OVOD

24.6

13.1

10.1

14.4

20.2

35.6

26.9

30.1

34.4

10.1

28.4

39.9

28.5

31.9

24.3

35.1

12/22/20

140

SLCCF

24.6

12.6

11.6

14.7

17.8

34.3

23.7

24.6

27.1

16.9

26.4

42.5

32.6

36.7

102

28.3

32.3

An efficient stereo matching algorithm, which applies adaptive smoothness constraints using texture and edge information, is proposed in this work. First, we determine non-textured regions, on which an input image yields flat pixel values. In the non-textured regions, we penalize depth discontinuity and complement the primary CNN-based matching cost with a color-based cost. Second, by combining two edge maps from the input image and a pre-estimated disparity map, we extract denoised edges that correspond to depth discontinuity with high probabilities. Thus, near the denoised edges, we penalize small differences of neighboring disparities. The method uses the MC-CNN code for the matching cost computation only.

01/19/16

NTDE

24.9

18.1

14.3

16.2

17.8

33.8

27.4

29.6

33.1

10.5

28.3

40.2

26.8

31.9

26.1

34.4

We propose a feature ensemble network that leverages a deep convolutional neural network to perform matching cost computation and disparity refinement. For matching cost computation, a patch-based network architecture with a multi-size and multi-layer pooling unit is adopted to learn cross-scale feature representations. For disparity refinement, the initial optimal and sub-optimal disparity maps are incorporated and diverse base learners are applied.

10/12/17

FEN-D2DRR

24.9

14.1

11.9

15.1

19.1

34.9

28.4

31.7

30.2

12.2

26.7

40.2

28.9

31.7

24.4

39.1

02/28/18

SDR

25.1

17.4

13.8

15.7

16.6

29.6

25.5

29.2

26.4

25.5

156

26.9

42.5

28.2

38.0

109

18.0

36.0

10/19/16

LW-CNN

25.2

14.7

13.2

15.6

20.5

37.6

27.8

31.4

31.1

11.6

26.8

40.4

29.7

31.8

25.1

35.3

03/06/23

203

PCVNet

25.5

14.8

14.1

16.2

19.4

27.6

28.3

30.7

30.2

12.6

43.2

114

38.9

23.1

30.2

26.0

52.0

132

02/22/23

202

GLC_STEREO

25.8

13.7

10.7

12.8

19.2

29.5

14.3

56.6

146

62.2

191

11.5

29.1

40.8

17.4

23.0

29.6

17.3

Semi-Global Matching (SGM) uses an aggregation scheme to combine costs from multiple 1D scanline optimizations that tends to hurt its accuracy in difficult scenarios. We propose replacing this aggregation scheme with a new learning-based method that fuses disparity proposals estimated using scanline optimization. Our proposed SGM-Forest algorithm solves this problem using per-pixel classification. SGM-Forest currently ranks 1st on the ETH3D stereo benchmark and is ranked competitively on the Middlebury 2014 and KITTI 2015 benchmarks. It consistently outperforms SGM in challenging settings and under difficult training protocols that demonstrate robust generalization, while adding only a small computational overhead to SGM.

03/11/18

SGM-Forest

26.2

16.2

14.3

15.6

22.0

35.8

26.5

34.4

37.5

12.6

28.6

41.5

28.7

32.5

25.9

33.9

08/12/20

125

CFNet_RVC

26.2

34.0

121

13.8

15.1

21.0

37.5

27.7

35.0

31.0

12.9

27.1

42.0

26.3

30.9

30.2

27.6

11/11/22

196

ICVP

26.3

30.2

110

10.2

13.9

26.2

38.6

24.3

35.4

38.4

14.3

32.1

40.3

27.7

27.0

25.0

31.8

We propose a method that incorporates a surface normal constraint predicted by deep learning. With reliable disparities selected from a stereo matching method and an effective edge fusion strategy, we can faithfully convert the predicted surface normal map to a disparity map by solving a least squares system that maintains discontinuities. We use the raw matching cost of MC-CNN.

09/13/16

SNP-RSM

26.3

15.9

13.9

16.3

18.0

32.7

29.5

34.2

35.7

13.4

28.2

43.2

30.8

30.7

30.6

32.2

05/28/20

116

LEAStereo

26.4

23.4

14.7

16.9

23.2

27.3

25.0

31.0

32.4

16.8

37.8

40.1

27.3

31.6

28.4

37.4

08/09/20

122

UCNet

26.4

34.5

123

14.6

15.1

22.0

38.9

27.4

36.9

31.6

12.7

26.6

42.8

27.4

29.9

25.2

32.6

11/21/21

161

FENet

26.5

23.0

9.24

11.7

18.9

49.3

123

22.3

49.1

123

47.1

138

10.5

27.5

35.5

25.0

29.7

22.1

35.2

06/09/23

207

SSVM-CFPMF

26.7

22.4

13.8

16.2

20.8

25.4

30.2

37.5

25.2

154

32.4

42.2

29.7

39.2

121

21.2

32.1

05/25/22

167

LSMSW

26.8

16.6

15.0

16.7

19.7

34.2

27.0

37.8

39.6

12.7

28.1

43.4

29.2

31.1

29.0

33.6

We propose four efficient feature extractors based on convolutional neural networks for stereo matching cost computation. Two of them generate multiscale features with diverse receptive field sizes. These multiscale features are used to compute the corresponding multiscale matching costs. We then determine an optimal cost by combining the multiscale costs using edge information. On the other hand, the other two feature extractors produce uni-scale features by combining multiscale features directly through fully connected layers. Finally, after obtaining matching costs using one of the four extractors, we determine optimal disparities based on the cross-based cost aggregation and the semiglobal matching.

11/28/18

MSFNetA

26.9

19.2

14.1

18.0

20.8

33.5

28.8

36.7

38.5

12.5

31.7

42.3

27.0

30.5

29.2

32.5

10/25/21

159

SWFSM

26.9

16.6

15.0

16.6

19.6

34.4

27.2

38.0

40.0

102

12.5

27.7

43.8

29.5

31.2

28.9

33.7

03/07/23

204

GOAT18

27.0

15.7

16.0

112

20.7

110

15.2

28.7

36.7

155

25.9

24.4

20.9

132

32.5

42.8

29.4

27.6

37.7

106

46.6

118

05/28/16

APAP-Stereo

27.0

23.6

23.7

155

21.2

117

20.7

44.0

102

30.0

31.2

27.2

20.7

131

32.8

34.8

24.2

38.3

112

22.4

21.8

10/29/18

Dense-CNN

27.1

16.6

14.8

16.7

20.0

35.1

28.5

38.2

39.1

12.9

29.2

42.9

30.2

32.2

28.0

33.4

09/20/21

157

GANet-RSSM

27.2

27.5

15.9

106

15.4

20.9

43.0

26.7

37.6

34.6

12.5

28.2

41.5

26.2

31.7

33.2

30.6

Compute the matching cost with a convolutional neural network (accurate architecture). Then apply cross-based cost aggregation, semiglobal matching, a left-right consistency check, a median filter, and a bilateral filter. DETAILS: The network is similar to the one described in our CVPR paper, differing only in the values of some hyperparameters. The inputs to the network are two 11 x 11 image patches. Five convolutional layers with 3 x 3 kernels and 112 feature maps extract feature vectors from the input image patches. The two 112-length feature vectors are concatenated into a 224-length vector, which is passed through three fully-connected layers with 384 units each. The final (fourth) fully-connected layer projects the output to a single number: the matching cost. One important addition was the use of data augmentation techniques to increase the size of the training set. We tried to use as much training data as possible. Therefore we combined all of the 2001, 2003, 2005, 2006, and 2014 Middlebury datasets, obtaining 60 image pairs. For the newer datasets (2005, 2006, and 2014) we also used several illumination and exposure settings.
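The layer sizes in the description above can be checked with a small forward pass. The sketch below is only a shape walkthrough with random weights, not the authors' implementation: it assumes unpadded 3 x 3 convolutions (which is what reduces an 11 x 11 patch to a single 112-length vector after five layers), grayscale input, ReLU activations, no biases, and weights shared between the two patch branches. The names `conv3x3_valid`, `init_params`, `features`, and `matching_score` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3_valid(x, w):
    """Unpadded 3x3 cross-correlation: x is (H, W, Cin), w is (3, 3, Cin, Cout)."""
    h, wd, _ = x.shape
    out = np.zeros((h - 2, wd - 2, w.shape[-1]))
    for i in range(3):
        for j in range(3):
            out += x[i:i + h - 2, j:j + wd - 2, :] @ w[i, j]
    return out

def init_params(channels=1, maps=112, fc_units=384):
    """Random weights with the layer sizes given in the description."""
    conv, cin = [], channels
    for _ in range(5):                       # five conv layers, 112 feature maps each
        conv.append(rng.normal(0.0, 0.05, (3, 3, cin, maps)))
        cin = maps
    fc, fin = [], 2 * maps                   # 224-length concatenated vector
    for fout in (fc_units, fc_units, fc_units, 1):   # three 384-unit layers, then one output
        fc.append(rng.normal(0.0, 0.05, (fin, fout)))
        fin = fout
    return conv, fc

def features(patch, conv):
    """Map an 11 x 11 patch to a 112-length vector (11 -> 9 -> 7 -> 5 -> 3 -> 1)."""
    x = patch
    for w in conv:
        x = np.maximum(conv3x3_valid(x, w), 0.0)     # assumption: ReLU after each conv
    return x.reshape(-1)

def matching_score(left, right, params):
    """Concatenate both feature vectors and reduce them to a single number."""
    conv, fc = params
    v = np.concatenate([features(left, conv), features(right, conv)])
    for w in fc[:-1]:
        v = np.maximum(v @ w, 0.0)                   # assumption: ReLU in hidden layers
    return float((v @ fc[-1])[0])
```

Calling `matching_score` on two `(11, 11, 1)` arrays returns one float, which corresponds to the single-number output described above; with random weights the value itself is meaningless.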

08/28/15

MC-CNN-acrt

27.3

17.0

14.9

16.9

20.4

34.0

28.4

38.4

39.3

12.9

29.8

43.0

30.3

32.2

28.9

33.8

Cost aggregation plays a critical role in existing stereo matching methods. Generally, aggregating matching costs in homogeneous regions with similar disparities is beneficial to matching accuracy. However, previous approaches commonly use 3D convolutions for cost aggregation without considering the homogeneity of different regions. In this paper, we revisit cost aggregation in stereo matching from a perspective of disparity classification and propose a generic yet efficient Disparity Context Aggregation (DCA) module to improve the performance of CNN-based methods.

10/26/22

192

DCANet

27.3

25.0

16.9

121

16.9

21.5

35.9

27.5

39.1

34.1

15.1

32.2

42.9

29.5

31.1

25.2

33.4

We post-process the depth maps produced by Zbontar & LeCun's MC-CNN technique. We use a domain transform to compute an edge-aware variance measure of our confidence in the depth map, and then run our robust bilateral solver on that depth map and confidence with a Geman-McClure loss function. The MC-CNN is computed using the publicly available implementation (https://github.com/jzbontar/mc-cnn), which uses the GPU, and the robust bilateral solver is computed using our CPU implementation, which does not use the GPU and is written in vanilla C++.

11/03/15

MC-CNN+RBS

27.5

17.7

16.0

112

17.0

20.2

33.7

28.2

39.2

40.3

107

13.6

29.4

43.8

29.3

31.6

29.3

32.4

02/19/24

232

HCR

27.5

23.9

8.78

11.4

25.2

68.5

172

24.5

48.9

122

45.1

128

10.4

28.1

36.9

21.9

31.8

25.7

22.8

07/03/16

LPU

27.7

30.0

109

8.53

19.9

105

22.3

46.1

111

27.4

35.1

25.8

12.6

48.4

141

44.5

30.0

34.7

30.2

31.6

11/16/16

MCSC

28.3

30.0

108

12.9

23.3

132

24.0

22.5

33.3

123

34.3

12.3

11.0

50.5

115

34.7

110

33.3

29.2

49.3

124

01/26/16

MC-CNN-fst

28.4

20.8

15.2

100

19.4

104

23.5

39.5

31.1

104

36.5

35.4

12.9

33.7

44.3

31.4

33.7

29.4

34.1

09/01/22

176

GMStereo

28.4

16.1

14.6

18.8

35.3

139

36.9

16.0

31.7

35.3

22.5

138

36.7

36.4

30.8

38.6

115

29.9

38.1

04/18/23

205

DMCANet

28.5

23.9

11.6

13.5

20.5

40.4

32.9

116

42.1

100

36.1

15.6

32.4

39.7

26.8

33.4

39.7

118

32.9

06/05/18

CBMBNet

28.9

25.4

16.5

116

17.6

20.8

36.0

29.6

41.7

41.4

113

12.2

35.8

40.5

30.1

34.7

31.6

33.2

03/11/24

240

MIF-Stereo

29.0

17.6

16.6

117

16.7

26.7

101

32.9

31.1

105

35.1

17.2

103

37.2

51.8

124

26.6

32.8

34.2

41.8

101

Accurate disparity prediction is a hot spot in computer vision, and efficiently exploiting contextual information is the key to improving performance. In this paper, we propose a simple yet effective non-local context attention network (NLCANet) that exploits global context information by using attention mechanisms and semantic information for stereo matching. First, we develop a 2D geometry feature learning (GFL) module to obtain a more discriminative representation by taking advantage of multi-scale features and forming them into a variance-based cost volume. Then, we construct a non-local attention matching (NLAM) module using the non-local block and hierarchical 3D convolutions, which can effectively regularize the cost volume and capture global contextual information. Finally, we adopt a geometry refinement (GR) module to refine the disparity map and further improve performance. Moreover, we add a warping loss function to help the model learn the matching rule of the non-occluded region. Our experiments show that (1) our approach achieves competitive results on the KITTI and SceneFlow datasets in end-point error (EPE) and the fraction of erroneous pixels (D1); and (2) our proposed method performs particularly well in reflective regions and occluded areas.

08/11/20

124

NLCA_NET_v2_RVC

29.4

31.2

112

11.8

17.3

24.0

47.9

118

35.0

144

42.0

35.5

13.8

40.6

106

40.9

29.2

32.6

30.4

31.5

A robust solution for semi-dense stereo matching is presented. It utilizes two CNN models, one for computing the stereo matching cost and one for performing confidence-based filtering. Compared to existing CNN-based matching cost generation approaches, our method feeds additional global information into the network so that the learned model can better handle challenging cases, such as lighting changes and lack of texture. By utilizing non-parametric transforms, our method is also more self-reliant than most existing semi-dense stereo approaches, which rely heavily on parameter adjustment.

06/27/18

DCNN

29.4

17.5

14.7

18.1

22.0

31.4

29.6

44.7

111

41.2

111

14.0

35.9

48.5

107

34.0

105

32.2

32.3

36.1

A lightweight network with dilated ResNet feature extractor, a correlation cost volume run at a low resolution, and a refinement network to get a full resolution disparity output. Sparse disparity is processed from the dense disparity using a threshold on the network confidence output and a region grower to remove suspected bad disparities.

08/24/21

156

MMStereo

29.5

48.0

155

16.9

122

17.8

28.0

111

47.5

117

23.5

47.3

118

34.3

17.6

105

35.6

41.9

29.1

31.3

27.2

17.9

It has been proposed by many researchers that combining deep neural networks with graphical models can create more efficient and better regularized composite models. The main difficulties in implementing this in practice are associated with a discrepancy in suitable learning objectives as well as with the necessity of approximations for the inference. In this work we take one of the simplest inference methods, a truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model: We connect it to learning formulations with losses on marginals and compute the backprop operation. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs), allowing us to design a hierarchical model composing BP inference and CNNs at different scale levels. The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.

11/07/19

LBPS

30.7

15.6

15.8

105

19.1

100

21.2

34.8

35.0

146

34.5

31.7

15.6

35.2

44.1

33.4

102

44.1

149

38.1

108

60.8

152

We propose a robust learning-based method for stereo cost volume computation. We accomplish this by coalescing diverse evidence from a bidirectional matching process via random forest classifiers. We show that our matching volume estimation method achieves similar accuracy to purely data-driven alternatives and that it generalizes to unseen data much better. In fact, we used the same model trained on the Middlebury 2014 dataset to submit to the KITTI and ETH3D benchmarks.

11/13/17

CBMV

30.8

19.2

16.0

111

20.4

109

22.6

45.1

106

31.8

108

39.1

15.4

47.5

138

44.1

33.8

104

38.1

111

32.6

37.0

04/14/24

241

SMFormer

30.9

34.3

122

14.9

17.5

22.8

43.2

29.5

39.4

30.0

14.4

35.8

45.3

26.2

33.1

50.0

153

58.6

145

11/08/18

HSM-Net_RVC

31.2

34.9

125

14.2

19.4

103

26.8

104

46.4

114

32.8

114

35.3

28.1

17.5

104

43.2

113

45.8

34.2

107

36.9

105

38.7

111

42.7

103

10/02/22

181

MaskLacGwcNet_RVC

31.3

23.1

15.7

103

18.3

23.7

45.7

109

22.0

45.8

116

42.5

118

22.1

136

47.4

136

48.2

103

30.6

27.5

43.8

136

35.1

05/17/24

243

FormerRaft_RVC

31.4

32.4

114

18.1

127

21.0

114

29.2

118

39.4

16.7

42.2

101

43.9

122

23.7

145

38.5

47.2

26.4

32.7

37.4

103

45.4

112

01/27/22

163

UPFNet

31.5

27.4

12.1

15.8

23.8

57.0

140

30.3

40.3

35.8

18.7

115

45.8

128

44.5

35.0

113

36.2

34.3

49.4

125

08/08/22

172

UCFNet_RVC

31.6

31.0

111

12.7

14.5

20.0

44.2

103

28.1

45.1

113

47.2

140

14.2

30.3

49.4

109

27.7

51.5

185

37.5

104

38.6

01/08/24

229

GINet

31.7

37.3

128

14.3

16.5

25.4

45.5

108

28.3

54.8

138

48.5

145

14.1

32.2

42.3

26.7

31.0

35.8

43.5

106

02/27/22

164

MSTR

32.0

21.7

21.2

147

20.2

108

28.2

114

36.6

32.8

115

35.6

39.6

16.2

43.4

115

45.8

31.9

35.5

32.6

64.5

169

10/07/22

188

MCNet

32.1

33.0

115

13.1

18.8

27.3

107

44.7

105

33.8

128

42.5

102

36.1

18.6

114

39.4

103

49.7

110

34.7

111

34.0

30.2

56.2

141

11/17/20

136

RASNet

32.1

100

33.1

117

14.4

16.7

25.7

56.5

137

29.5

35.9

30.9

13.8

45.3

125

42.1

29.0

35.9

55.3

172

51.0

130

11/15/16

MC-CNN-WS

32.1

101

33.1

118

18.1

129

24.9

144

27.0

105

43.9

101

33.3

123

42.0

38.1

16.6

35.6

47.5

101

33.1

100

34.8

33.7

39.2

08/05/22

171

RDNet

32.3

102

29.2

106

14.2

18.4

27.6

109

41.4

33.7

126

44.4

108

40.6

109

23.6

144

40.1

105

49.0

108

35.8

119

31.8

33.5

40.2

05/20/20

114

AANet++

32.3

103

37.0

127

16.2

114

19.1

101

23.6

63.8

159

30.5

100

40.7

37.2

10.9

33.2

50.7

116

31.1

36.0

36.5

100

60.0

150

11/25/20

138

LPSC

32.4

104

20.6

19.8

138

18.4

25.5

46.1

113

30.6

101

39.0

41.4

114

16.6

38.0

47.4

100

35.4

115

43.2

142

37.6

105

42.4

102

11/14/19

101

HSM-Smooth-Occ

32.6

105

34.8

124

15.9

106

21.9

123

30.3

123

43.8

32.2

110

40.9

33.3

19.7

126

44.6

121

49.8

111

35.7

117

33.2

34.8

46.6

117

This approach triangulates the polygonized SLIC segmentations of the input images and optimizes a lower-layer MRF on the resulting set of triangles, defined by photo consistency and normal smoothness. The lower-layer MRF is solved by a quadratic relaxation method that iterates between PatchMatch and Cholesky decomposition. The lower-layer MRF is assisted by an upper-layer MRF defined on the set of triangle vertices, which exploits local 'visual complexity' cues and encourages smoothness of the vertices' splitting properties. The two layers interact through an alignment energy term that requires triangles sharing a non-split vertex to have their disparities agree on that vertex. Optimization of the whole model alternates between optimizations of the two layers until convergence, where the upper layer can be solved in closed form.

04/19/15

MeshStereo

32.9

106

18.8

16.8

119

23.1

130

33.3

134

30.4

32.9

117

40.2

37.7

19.0

117

44.6

120

50.1

113

33.4

101

43.7

146

39.1

113

39.5

10/03/22

183

CroCo_RVC

32.9

107

19.8

14.3

17.0

26.7

102

40.2

20.9

59.2

162

58.4

180

16.6

39.9

104

55.1

142

29.2

35.7

31.5

49.6

126

01/14/23

201

AASNet

33.0

108

34.9

126

15.1

19.9

107

24.5

42.8

34.0

135

41.9

39.3

18.2

110

34.6

48.2

103

35.8

118

33.2

49.2

150

41.5

We propose a novel deep stereo matching network and a new real-world stereo dataset of cluttered objects taken with a commercially available stereo sensor. We design a U-shaped architecture with various types of attention that more efficiently extracts global and local contexts from rectified image pairs, resulting in highly accurate disparities. Furthermore, its symmetric structure allows simultaneous estimation of both left and right disparities. It can also implicitly estimate the uncertainty, i.e. the confidence of the estimated disparities.

09/14/23

214

CASS

33.0

109

19.6

17.9

126

27.6

155

26.7

100

42.1

22.9

33.5

36.4

24.3

149

54.6

154

46.2

34.5

109

39.5

123

45.4

143

45.8

114

10/06/22

186

AGCVNet

33.1

110

28.1

103

12.9

17.8

27.6

108

43.9

100

34.0

133

50.8

129

43.8

121

22.4

137

39.4

102

48.2

105

36.2

123

33.3

36.0

39.5

12/12/22

200

KPEA-Stereo

33.4

111

29.1

105

15.5

102

19.1

26.3

41.5

27.7

49.4

124

40.1

104

22.1

135

40.7

107

48.4

106

33.0

36.5

100

39.1

112

57.1

142

10/06/22

187

GwcSlice

33.4

112

33.7

120

12.8

17.5

24.4

44.6

104

33.3

122

43.9

106

37.3

18.5

112

41.3

108

47.7

102

35.5

116

36.2

45.7

144

59.1

148

10/17/21

158

ACVNet

33.7

113

27.9

102

12.6

14.8

24.6

59.3

144

33.2

120

43.2

104

41.3

112

17.0

101

44.1

117

44.9

30.8

33.1

55.8

173

54.5

138

07/14/22

170

CRMV2

34.1

114

26.7

19.2

133

23.4

133

24.0

48.3

121

36.5

154

41.6

40.0

103

19.8

127

41.5

110

51.1

120

34.7

111

39.6

125

40.8

123

47.0

120

03/09/17

SGMEPi

34.2

115

20.4

20.0

140

24.4

138

29.4

119

37.0

34.5

140

40.2

40.5

108

18.7

115

45.8

127

53.8

132

38.6

134

43.3

145

38.2

109

41.7

100

11/16/20

135

SSCasStereo

34.7

116

57.2

174

17.0

124

19.9

105

30.2

122

73.0

198

29.3

47.7

119

29.9

19.2

124

67.5

183

45.7

34.2

108

37.9

108

32.2

28.9

12/18/15

INTS

34.8

117

40.7

134

16.8

118

22.9

129

31.1

127

54.0

133

32.7

112

45.4

114

38.9

17.9

107

44.1

118

50.9

117

36.0

122

40.9

129

33.6

48.6

122

12/02/22

198

GANet+ADL

35.2

118

42.0

138

10.6

16.0

27.9

110

49.0

122

23.5

63.7

175

54.3

161

16.0

44.7

123

46.2

31.5

29.7

45.1

141

63.0

168

09/09/20

129

AdaStereo

35.5

119

44.2

148

17.3

125

21.4

120

36.3

142

38.1

25.0

53.8

134

42.3

117

21.9

134

26.4

54.6

137

36.4

124

38.6

116

41.4

129

55.0

140

04/19/24

242

DCSE

35.6

120

37.3

129

13.8

17.6

29.7

120

65.8

164

30.9

102

56.2

144

55.5

170

19.1

119

44.6

121

46.1

33.5

103

34.2

33.1

49.7

127

A fast method for high-resolution stereo matching without exploring the full search space. Plane hypotheses are generated from sparse feature matches. Around each plane, a local plane sweep with +/- 3 disparity levels is performed to establish local disparity hypotheses via SGM using NCC matching costs. Finally, each pixel is assigned to one hypothesis using global optimization, again using SGM.

08/27/14

LPS

36.0

121

15.6

13.1

21.8

121

27.0

106

96.8

240

34.8

143

36.2

36.5

16.1

93.5

240

56.4

147

34.1

106

40.9

130

36.4

43.2

105

05/10/19

tMGM-16

36.2

122

20.3

16.9

120

22.7

127

37.1

146

40.0

36.0

150

51.9

130

45.8

132

16.4

38.5

51.8

125

44.0

168

38.6

116

36.9

102

60.6

151

10/07/14

IDR

36.4

123

56.1

171

11.8

19.3

102

37.7

149

57.6

141

34.2

137

55.4

140

40.1

105

17.0

100

45.9

129

52.2

127

36.0

121

39.0

118

39.5

117

39.4

10/30/23

219

LSTS

36.7

124

26.2

23.6

153

22.7

126

30.5

125

43.1

34.7

142

56.9

147

55.9

171

16.0

38.7

101

50.0

112

37.3

128

36.8

103

37.8

107

54.6

139

11/06/16

SPS

36.9

125

29.1

104

27.2

167

29.0

160

26.8

103

32.2

37.9

160

44.9

112

40.3

106

19.1

118

41.7

111

54.9

139

43.3

162

40.6

126

50.8

157

45.0

110

This model is trained on low-resolution data but targets high-resolution images. It uses a recurrent module to iteratively update a coarse disparity prediction. A special refinement module then makes a final adjustment. The recurrent update and the final refinement are applied in a patch-wise manner across the initial disparity.

03/05/21

142

ORStereo

37.2

126

60.6

191

19.6

136

21.2

118

37.1

145

59.6

146

33.5

125

50.0

125

30.4

16.4

49.1

144

58.4

157

38.5

132

40.9

128

40.5

122

47.0

119

09/22/22

178

DCstereo

37.3

127

27.7

101

24.5

159

24.6

141

30.4

124

43.5

29.1

50.1

127

47.3

141

22.7

139

43.7

116

51.4

123

37.6

130

51.0

182

43.7

135

45.0

111

The method generates multiple proposals on absolute and relative disparities from multi-segmentations. The proposals are coordinated by point-wise competition and pairwise collaboration within an MRF model. During inference, dynamic programming is performed in different directions with various step sizes.

10/13/15

MDP

37.4

128

38.6

132

21.8

148

27.8

156

35.2

138

54.0

132

30.4

47.9

120

42.0

116

24.6

152

49.5

148

51.1

119

35.0

114

43.9

148

41.1

125

43.7

107

In this work, we propose a learning-based method to denoise and refine disparity maps of a given stereo method. The proposed variational network arises naturally from unrolling the iterates of a proximal gradient method applied to a variational energy defined in a joint disparity, color, and confidence image space. Our method learns a robust collaborative regularizer that leverages the joint statistics of the color image, the confidence map, and the disparity map. Due to the variational structure of our method, the individual steps can be easily visualized, which makes the method interpretable. We can therefore provide insights into how our method refines and denoises disparity maps. The efficiency of our method is demonstrated on the publicly available stereo benchmarks Middlebury 2014 and KITTI 2015.

05/13/19

37.5

129

23.6

154

28.6

159

29.9

121

39.3

38.5

161

42.7

103

42.9

119

27.0

165

48.0

139

54.0

133

43.7

165

50.2

177

38.7

110

40.9

11/12/20

134

UnDAF-GANet

37.5

130

12.9

11.8

34.9

181

42.5

170

49.7

124

46.6

186

50.1

126

46.5

136

10.4

59.9

163

56.9

150

26.2

50.9

180

28.9

61.4

159

08/09/22

173

issga

37.6

131

33.5

119

25.7

162

20.7

110

39.1

157

40.2

31.3

106

52.9

131

50.8

150

15.3

29.1

66.7

192

39.1

137

39.1

120

40.3

121

62.0

163

The images are Census transformed and the Hamming distance is used as pixelwise matching cost. Aggregation is performed by a kind of dynamic programming along 8 paths that go from all directions through the image. Small disparity patches are invalidated. Interpolation is also performed along 8 paths.
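The Census transform and Hamming-distance cost described above can be sketched in a few lines. This is a minimal illustration only, not the code behind the SGM entry: the function names, the 3x3 window (`radius=1`), the "darker than centre" bit convention, and the list-of-rows image layout are all assumptions, and the 8-path aggregation is not shown.

```python
def census_transform(image, radius=1):
    """Return a per-pixel bit code comparing each neighbour with the centre.

    `image` is a list of rows of grey values. Border pixels (closer than
    `radius` to the edge) are left as 0 for simplicity.
    """
    height, width = len(image), len(image[0])
    codes = [[0] * width for _ in range(height)]
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            center = image[y][x]
            code = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue
                    # Append one bit: 1 if the neighbour is darker than the centre.
                    bit = 1 if image[y + dy][x + dx] < center else 0
                    code = (code << 1) | bit
            codes[y][x] = code
    return codes


def hamming_cost(left_codes, right_codes, y, x, disparity):
    """Cost of matching left pixel (y, x) to right pixel (y, x - disparity)."""
    return bin(left_codes[y][x] ^ right_codes[y][x - disparity]).count("1")
```

Because the code depends only on the ordering of grey values inside the window, the cost is unaffected by monotonic brightness changes between the two views, which is the usual reason for choosing Census over raw intensity differences.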

07/28/14

SGM

38.1

132

59.3

184

15.4

101

21.3

119

38.7

154

59.8

147

34.0

132

55.6

141

39.7

100

18.0

108

46.4

134

54.5

136

38.6

133

38.5

113

39.7

119

52.7

134

03/14/18

DTS

38.1

133

29.7

107

28.9

177

23.6

134

32.2

130

46.7

115

34.3

138

54.2

137

54.9

165

24.3

151

38.0

51.0

118

36.5

125

43.8

147

41.3

127

40.0

07/26/19

EdgeStereo

38.1

134

49.6

157

15.9

109

20.8

113

43.8

176

40.3

25.5

55.3

139

48.1

143

21.3

133

26.4

55.1

141

39.8

143

39.6

124

43.6

133

75.0

197

08/22/22

174

PSM-Aug

38.2

135

27.5

24.3

157

24.6

140

29.1

117

47.3

116

31.0

103

44.5

110

45.0

127

23.1

140

46.1

131

55.6

144

41.0

150

44.3

151

53.2

166

58.2

144

05/18/18

PDS

38.3

136

38.2

131

19.6

135

24.6

142

32.8

132

52.7

128

40.8

173

45.8

115

37.6

23.9

146

49.5

148

51.4

122

39.8

144

44.5

153

44.6

138

58.9

146

03/09/18

SGM_RVC

38.6

137

58.8

178

15.9

108

22.0

124

32.0

129

56.0

136

36.7

156

56.4

145

39.8

101

17.9

106

46.6

135

55.5

143

40.7

149

41.6

134

45.0

140

51.0

130

08/22/21

155

SDCO

38.6

138

51.3

161

17.0

123

21.8

122

36.8

144

53.4

131

33.8

129

59.0

161

47.1

139

24.3

150

46.3

132

57.2

151

40.2

145

41.9

137

39.2

114

34.3

04/17/15

TMAP

38.6

139

40.9

135

20.2

143

23.8

135

35.0

137

54.6

134

34.4

139

57.7

150

51.3

151

19.4

125

48.7

143

51.2

121

37.4

129

41.6

133

42.7

131

42.9

104

02/07/20

108

CasStereo

38.7

140

48.4

156

19.7

137

24.5

139

30.8

126

88.2

233

36.2

152

39.6

36.5

17.1

102

64.8

177

54.2

135

41.3

152

37.3

107

41.0

124

64.9

173

08/25/14

LPS

38.8

141

20.2

20.1

142

23.3

131

28.1

112

97.7

241

35.0

147

36.8

41.6

115

19.8

128

95.4

242

53.2

129

42.2

155

43.0

141

36.9

101

48.8

123

09/18/14

SNCC

38.8

142

64.6

199

15.1

21.2

116

39.8

159

61.2

153

32.7

113

60.9

170

43.0

120

18.3

111

38.5

56.6

148

38.1

131

38.6

114

44.2

137

45.7

113

We propose a novel method for stereo estimation, combining advantages of convolutional neural networks (CNNs) and optimization-based approaches. The optimization, posed as a conditional random field (CRF), takes local matching costs and consistency-enforcing (smoothness) costs as inputs, both estimated by CNN blocks. To perform the inference in the CRF we use an approach based on linear programming relaxation with a fixed number of iterations. We address the challenging problem of training this hybrid model end-to-end. We show that in the discriminative formulation (structured support vector machine) the training is practically feasible. The trained hybrid model with shallow CNNs is comparable to state-of-the-art deep models in both time and performance. The optimization part efficiently replaces sophisticated and not jointly trainable (but commonly applied) post-processing steps by a trainable, well-understood model.

03/22/17

JMR

39.4

143

20.9

145

29.7

162

39.0

156

48.1

119

38.8

162

59.4

163

59.1

185

19.2

123

38.3

50.4

114

37.2

127

52.7

188

35.0

46.5

116

04/28/23

206

ADStereo

39.9

144

37.5

130

35.5

192

34.0

176

43.4

173

45.2

107

33.7

127

43.5

105

45.8

131

26.8

162

41.4

109

75.6

217

45.4

175

36.9

104

39.9

120

27.4

10/08/22

189

MANet

39.9

145

47.9

154

14.8

22.8

128

31.8

128

56.8

139

37.5

159

53.6

132

41.1

110

23.5

142

48.2

140

56.7

149

43.4

163

43.2

143

50.9

158

62.0

164

This project proposes a stereo matching network based on a neural operator, which learns a mapping from the space of RGB image pairs to disparity space. The network lets users test images at any scale and customize the disparity range for different scenarios; it builds the cost volume dynamically from the chosen scale and disparity range.

02/20/24

233

DispNO

40.2

146

42.0

139

18.1

128

24.2

137

35.5

141

66.3

166

37.0

158

50.7

128

46.2

134

32.5

179

58.7

160

51.9

126

40.2

145

39.3

122

46.8

147

46.4

115

05/14/20

112

SRM

40.6

147

33.1

116

31.0

185

30.4

165

38.0

151

48.1

119

32.2

111

54.1

136

54.5

164

25.5

155

45.2

124

52.6

128

40.4

148

45.5

156

41.1

126

50.9

129

Unsupervised stereo matching methods have made significant strides recently. However, these approaches rely predominantly on the assumption of photometric consistency, which makes them sensitive to illuminance changes and weak in problematic areas such as occluded or textureless regions. To mitigate these limitations, this paper introduces a novel self-supervised dual-level framework named Dual-Net. The framework consists of two key components: self-supervised teacher training and student training based on knowledge distillation. Specifically, the teacher model is first trained in a self-supervised fashion with a focus on feature-space consistency and data augmentation consistency. On the one hand, features are robust to noise and luminance changes and remain discriminative even in textureless regions. On the other hand, a data augmentation consistency loss guides the model toward enhanced contextual awareness, leading to complete depth estimates in problematic regions. Then, the knowledge learned by the teacher model is distilled and transferred probabilistically to the student model. Guided by this distilled knowledge, the student model outperforms its teacher by a large margin.

01/08/24

228

DualNet

41.2

148

43.9

146

20.9

146

24.7

143

37.4

148

50.9

126

36.3

153

58.0

153

52.4

156

27.1

166

46.4

133

58.0

153

38.9

135

44.8

155

47.9

149

53.7

135

05/22/18

DN-CSS_ROB

41.3

149

52.9

163

16.3

115

23.8

136

28.8

115

67.1

170

33.0

119

58.2

154

44.6

125

24.1

148

68.7

188

61.8

173

35.9

120

41.7

135

52.3

162

65.3

175

07/28/14

SGM

41.8

150

62.7

194

13.9

19.1

45.8

181

64.1

161

33.2

120

68.2

190

44.0

123

18.2

109

44.2

119

58.6

158

44.8

172

39.0

119

47.1

148

68.4

178

10/13/22

190

42.5

151

31.5

113

28.0

174

31.1

167

39.4

158

37.1

31.3

106

48.0

121

45.6

130

36.8

191

68.4

184

60.3

165

42.7

158

42.1

138

55.1

171

64.6

171

12/05/22

199

Ct-Net

42.6

152

60.5

190

26.9

166

26.3

149

37.8

150

56.6

138

34.0

133

58.9

159

49.6

147

25.2

153

49.4

147

54.7

138

36.9

126

41.1

132

58.8

177

54.0

137

We propose a novel lightweight network for stereo estimation. The method uses densely connected layer structures to learn expressive features without the need for fully-connected layers or 3D convolutions. This leads to a network structure with only 0.37M parameters that still achieves competitive results. The post-processing consists of filtering, a consistency check, and hole filling.

11/10/20

132

FC-DCNN

42.9

153

43.3

144

26.7

164

30.8

166

38.7

153

58.9

142

46.2

185

53.8

133

53.4

158

26.2

159

47.5

137

53.4

130

40.2

145

46.3

164

46.5

146

50.8

128

07/21/20

121

AANet_RVC

42.9

154

42.5

140

20.6

144

22.6

125

28.9

116

77.0

211

41.3

176

58.3

156

51.6

152

15.6

42.8

112

63.3

181

39.7

142

47.2

170

57.2

175

80.5

216

07/17/20

120

GANetREF_RVC

43.1

155

43.9

146

15.7

104

21.0

115

28.1

113

70.4

181

36.9

157

58.6

157

57.8

177

19.2

122

55.3

155

55.9

145

39.1

138

50.8

179

61.1

180

75.1

198

This paper presents a novel unsupervised matching cost for stereo matching. Specifically, a two-branch convolutional sparse coding (CSC) model is used to learn the convolutional filter bank without ground truth disparity maps. The sparse representations over the learned filter bank are then used to measure the similarity between image patches: the stereo matching cost is the l1 distance between the sparse representations of the patches.

04/12/19

TCSCSM

43.3

156

64.7

200

24.9

160

31.1

168

45.3

180

63.8

158

35.8

149

55.8

142

45.3

129

26.5

161

49.6

150

56.3

146

45.6

177

46.4

165

44.8

139

43.9

108

OpenCV's "semi-global block matching" method; memory-efficient single-pass version. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in only 5 of 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.
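As a rough illustration of the window-based cost named in this description, the sketch below computes a sum of absolute differences (SAD) and picks the lowest-cost disparity at one pixel. It is not OpenCV's implementation and does not include the path aggregation, post filter, or hole-filling; the function names, the 3x3 window, and the list-of-rows image layout are assumptions made for the example.

```python
def sad_cost(left, right, y, x, disparity, radius=1):
    """Sum of absolute differences between the window around left (y, x)
    and the window around right (y, x - disparity)."""
    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - disparity])
    return total


def best_disparity(left, right, y, x, max_disparity, radius=1):
    """Winner-takes-all: the disparity with the lowest SAD cost at (y, x)."""
    # Only keep disparities whose right-image window stays inside the image.
    valid = [d for d in range(max_disparity + 1) if x - d - radius >= 0]
    return min(valid, key=lambda d: sad_cost(left, right, y, x, d, radius))
```

In a semi-global method these per-pixel costs would feed the dynamic-programming aggregation rather than being minimized directly; the winner-takes-all step here only makes the cost easy to inspect.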

07/28/14

SGBM1

43.4

157

55.1

168

29.3

179

28.1

158

33.0

133

81.1

223

34.2

136

57.0

148

52.3

154

18.6

113

65.6

179

59.5

162

41.2

151

41.9

136

43.6

134

64.8

172

03/02/22

165

AANet_Edge

44.1

158

50.9

158

23.2

152

27.0

153

34.6

136

65.6

163

40.4

168

54.0

135

46.4

135

19.2

121

50.8

151

59.0

160

46.6

182

41.0

131

63.0

185

84.5

227

09/27/22

179

FCDSN-DC

44.2

159

43.7

145

25.6

161

36.7

188

45.9

183

59.4

145

42.4

178

57.8

151

52.7

157

32.0

177

46.1

130

55.1

140

45.8

178

45.9

159

42.4

130

47.7

121

04/24/16

HLSC_cor

44.7

160

42.9

143

22.0

149

34.8

179

36.4

143

53.0

129

44.5

184

60.8

168

61.1

187

26.2

160

53.0

153

59.2

161

42.7

160

50.0

176

54.3

169

44.0

109

11/21/20

137

DecStereo

44.8

161

42.6

141

30.8

184

32.4

171

38.9

155

51.4

127

50.3

198

45.8

116

48.4

144

27.2

168

49.3

145

58.2

155

44.0

169

42.9

140

65.3

188

65.2

174

10/29/18

iResNet

44.9

162

54.1

166

19.0

131

31.5

169

41.4

168

60.0

150

32.2

109

58.9

159

47.8

142

35.6

187

73.5

205

53.5

131

39.0

136

42.5

139

56.7

174

71.4

184

07/25/14

SGBM1

44.9

163

62.5

193

24.4

158

26.0

148

40.7

163

87.9

232

33.8

130

66.6

185

54.0

159

16.4

68.8

189

63.1

180

42.4

156

40.7

127

45.1

142

59.9

149

09/10/14

LAMC_DSM

45.0

164

74.1

233

26.9

165

26.8

150

37.3

147

63.1

155

40.1

167

68.0

189

55.4

169

20.0

129

45.4

126

57.9

152

44.5

171

44.7

154

52.3

160

52.5

133

Correlation with five, partly overlapping windows on Census transformed images using Hamming distance as matching cost. A left-right consistency check ensures unique matches and filtering small disparity segments removes outliers. Interpolation is done within image rows with the lowest, valid neighboring disparity.
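The consistency check and row-wise interpolation described above can be sketched for a single scanline as follows. This is a hedged illustration, not the Cens5 code: the function names, the `tolerance` parameter, and the use of `-1` as the invalid marker are assumptions.

```python
def left_right_check(disp_left, disp_right, tolerance=1, invalid=-1):
    """Invalidate left disparities that the right disparity row contradicts.

    `disp_left[x]` maps left pixel x to right pixel x - disp_left[x];
    `disp_right[x]` holds the disparity estimated from the right view.
    """
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d
        if d < 0 or xr < 0 or xr >= len(disp_right):
            checked.append(invalid)
        elif abs(disp_right[xr] - d) <= tolerance:
            checked.append(d)
        else:
            checked.append(invalid)
    return checked


def fill_row_with_lowest_neighbor(row, invalid=-1):
    """Fill each invalid pixel with the lower of its nearest valid
    neighbours to the left and right in the same row."""
    filled = list(row)
    for x, d in enumerate(row):
        if d != invalid:
            continue
        left = next((v for v in reversed(row[:x]) if v != invalid), None)
        right = next((v for v in row[x + 1:] if v != invalid), None)
        options = [v for v in (left, right) if v is not None]
        if options:
            filled[x] = min(options)
    return filled
```

Choosing the lower neighbouring disparity reflects the common assumption that invalidated pixels are occluded background, which lies farther from the camera than the foreground that hides it.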

07/28/14

Cens5

45.1

165

64.4

198

23.8

156

25.9

147

41.2

165

62.2

154

40.6

169

65.7

180

55.0

166

23.5

142

52.4

152

60.4

166

45.4

176

43.3

144

49.5

151

60.8

153

10/10/18

DISCO

45.6

166

58.8

179

18.4

130

25.6

146

40.4

162

71.2

188

39.4

166

60.2

166

52.3

155

28.4

170

63.4

174

58.7

159

45.1

174

44.3

150

53.0

164

73.6

191

05/31/18

iResNet_ROB

45.9

167

47.3

153

19.5

134

29.3

161

32.2

131

59.8

148

33.9

131

57.1

149

54.0

160

30.9

174

66.7

182

62.5

176

41.9

154

46.0

160

69.9

204

81.2

218

11/11/18

MBM

46.5

168

59.4

185

26.1

163

34.8

180

43.5

175

61.2

152

43.2

183

58.3

155

50.0

148

31.9

176

60.4

164

59.5

163

44.8

173

46.4

166

50.5

155

62.1

166

12/18/18

MCV-MFC

46.7

169

53.9

165

19.9

139

29.8

163

38.0

152

84.9

229

35.4

148

55.9

143

46.6

137

36.5

190

80.1

229

54.2

134

39.3

139

47.1

169

63.0

184

71.6

185

07/25/14

SGM

47.0

170

59.8

186

29.2

178

33.3

174

43.3

171

59.9

149

42.7

182

58.6

158

50.3

149

35.8

189

60.6

166

61.5

171

43.4

164

48.2

174

51.6

159

61.0

155

The method comprises two main steps. First, we use adaptive support weights for local matching. Apart from the color similarity and geometric distance, the adaptive weight distribution favors pixels in the block matching with smaller cost. Besides, we use a multiscale strategy with invalidation criteria to reduce match ambiguity and computational time. Second, a global interpolation using a variational formulation is carried out. The energy functional penalizes deviations from the local disparity estimation at different scales.

02/15/19

DAWA-F

47.5

171

67.7

205

30.3

181

27.9

157

39.9

160

81.0

222

42.4

179

61.2

171

58.2

178

23.2

141

63.1

171

63.0

179

43.8

167

47.6

172

52.6

163

59.0

147

02/24/20

110

SGBMP

47.7

172

58.9

180

36.7

196

33.5

175

46.1

184

88.3

234

34.6

141

60.8

169

51.7

153

25.8

157

73.8

209

58.2

154

41.4

153

44.4

152

53.1

165

57.6

143

06/02/21

146

FADNet_RVC

47.7

173

42.8

142

19.0

132

27.2

154

34.6

135

63.4

156

33.0

118

67.4

187

70.0

206

27.1

166

75.2

215

62.6

177

42.7

158

48.0

173

65.0

187

81.9

220

06/26/23

209

CCL-Stereo

48.0

174

69.4

211

27.5

171

26.9

151

51.7

193

92.7

237

9.91

71.0

196

55.2

167

26.9

163

58.0

158

60.5

167

39.6

140

45.7

157

75.1

223

61.4

160

02/20/20

109

CRAR

48.2

175

51.2

159

35.6

193

36.6

186

44.1

177

46.1

112

49.2

194

60.3

167

58.8

183

34.1

182

57.5

157

63.3

181

46.2

181

46.1

161

50.0

153

64.6

170

06/04/21

147

RANet++

48.3

176

41.7

137

20.1

141

32.4

172

35.3

140

67.0

169

35.0

144

66.9

186

66.5

197

25.9

158

70.0

192

61.0

170

44.3

170

47.6

171

70.0

206

78.9

211

12/24/20

141

ACR-GIF-OW

48.6

177

62.7

195

31.7

186

35.7

185

42.0

169

66.0

165

41.3

175

64.0

176

56.6

173

34.7

184

63.9

176

61.6

172

45.9

179

46.8

168

50.7

156

61.4

158

04/09/15

PFS

48.7

178

76.8

235

40.4

202

25.3

145

61.5

210

81.5

226

47.7

190

66.4

184

49.4

146

19.1

120

48.6

142

60.5

169

42.4

157

54.1

193

41.3

127

53.8

136

05/09/21

145

ADSG

48.7

179

60.1

188

30.6

182

35.6

184

43.5

174

60.3

151

42.4

181

64.6

178

58.3

179

35.6

186

60.7

167

62.1

174

46.2

180

46.2

162

53.2

167

61.5

162

A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments smaller than 50 pixels are removed. (Improved results as of 9/14/2015 due to bug fix in color-to-gray conversion.)
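The removal of small disparity segments mentioned above can be illustrated with a flood fill. The sketch below is an assumption-laden example rather than the ELAS implementation: it treats a segment as a 4-connected region whose neighbouring disparities differ by at most `max_diff`, and the function name, `max_diff`, and the `-1` invalid marker are hypothetical; only the 50-pixel threshold comes from the description.

```python
from collections import deque


def remove_small_segments(disparity, min_size=50, max_diff=1, invalid=-1):
    """Invalidate connected disparity segments with fewer than `min_size` pixels."""
    height, width = len(disparity), len(disparity[0])
    result = [row[:] for row in disparity]
    seen = [[False] * width for _ in range(height)]
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or disparity[y0][x0] == invalid:
                continue
            # Breadth-first flood fill over 4-connected pixels of similar disparity.
            segment = [(y0, x0)]
            seen[y0][x0] = True
            queue = deque(segment)
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if seen[ny][nx] or disparity[ny][nx] == invalid:
                        continue
                    if abs(disparity[ny][nx] - disparity[y][x]) <= max_diff:
                        seen[ny][nx] = True
                        segment.append((ny, nx))
                        queue.append((ny, nx))
            if len(segment) < min_size:
                for y, x in segment:
                    result[y][x] = invalid
    return result
```

For example, a single outlier pixel surrounded by a smooth surface forms its own one-pixel segment and is invalidated, while the surrounding surface is kept as long as it reaches `min_size` pixels.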

09/14/15

ELAS

50.1

180

69.4

211

27.5

171

26.9

151

51.7

193

92.7

237

36.1

151

71.0

196

55.2

167

26.9

163

58.0

158

60.5

167

39.6

140

45.7

157

75.1

223

61.4

160

11/12/20

133

RLStereo

50.9

181

45.2

149

27.4

170

38.2

192

48.6

186

79.8

219

56.1

220

57.9

152

54.4

162

39.9

195

78.5

224

58.4

156

43.3

161

48.6

175

70.1

207

40.8

OpenCV's "semi-global block matching" method; memory-intensive 2-pass version, which can only handle the quarter-size images. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.

07/25/14

SGBM2

51.0

182

55.7

170

30.8

183

37.4

189

41.2

166

85.3

230

40.7

171

62.4

172

56.9

175

35.1

185

78.6

225

63.3

183

47.4

184

52.7

187

57.2

175

68.5

179

A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments smaller than 50 pixels are removed. Updated ELAS submission as a baseline for the Robust Vision Challenge (http://robustvision.net), replacing the original ELAS (H) entry.

03/26/18

ELAS_RVC

51.7

183

65.3

202

37.8

198

36.6

187

49.3

189

78.7

215

40.6

170

64.1

177

56.2

172

34.4

183

63.3

173

63.5

185

47.6

185

51.2

183

62.4

182

60.9

154

05/15/19

PSMNet_2000

51.8

184

46.6

151

22.0

150

34.3

177

55.7

202

63.9

160

39.3

165

76.4

218

77.3

233

31.1

175

77.4

222

59.9

164

43.8

166

50.6

178

54.4

170

78.2

207

12/30/19

104

F-GDGIF

52.1

185

59.2

183

28.1

175

37.9

191

54.6

201

54.9

135

40.7

171

75.9

214

70.1

207

29.2

172

62.8

169

72.2

201

47.9

186

51.5

186

53.4

168

75.5

200

Our approach is an extension of the ELAS (from Geiger et al.) algorithm. We extract edges and sample our candidate support points along them. For every two consecutive valid support points we create a (straight) line segment. We force the triangulation to include the set of line segments (constrained Delaunay) for a better preservation of the disparity discontinuity at the edges.

02/18/16

LS-ELAS

53.1

186

70.7

221

27.3

168

30.1

164

54.2

199

90.1

235

42.4

179

71.9

202

58.8

182

27.7

169

59.4

161

64.9

188

49.1

189

46.3

163

76.3

226

74.5

193

07/25/14

SGBM1

53.6

187

56.3

172

43.0

209

43.4

199

41.4

167

76.0

207

41.0

174

63.4

174

61.8

189

38.7

194

74.7

214

64.3

186

48.1

187

53.5

191

61.2

181

78.0

204

06/08/20

118

MANE

53.7

188

72.3

227

35.3

191

34.9

182

52.4

195

71.7

191

48.0

191

71.6

199

62.1

190

33.5

181

59.5

162

65.9

191

51.6

195

55.7

194

59.9

179

62.0

165

04/08/15

REAF

53.8

189

74.2

234

49.9

215

34.9

183

63.3

217

79.6

217

52.5

205

66.3

183

56.7

174

28.7

171

55.9

156

63.4

184

47.4

183

59.2

199

46.1

145

62.9

167

11/26/20

139

CooperativeStereo

54.0

190

54.8

167

27.8

173

37.5

190

43.4

172

82.1

227

49.8

196

65.6

179

58.5

181

42.1

196

87.5

236

65.6

189

51.0

192

53.0

189

70.8

212

61.3

157

05/31/18

FBW_ROB

54.7

191

58.0

177

22.4

151

31.8

170

40.9

164

87.2

231

39.1

164

75.9

216

67.8

201

35.8

188

63.4

175

72.3

203

53.9

199

59.0

197

76.0

225

82.6

223

11/07/18

IEBIMst

55.0

192

59.1

182

34.7

189

42.9

196

52.7

196

72.5

196

47.4

188

76.9

220

71.5

213

32.3

178

63.3

172

75.7

219

51.1

193

51.2

184

50.0

152

84.2

226

05/23/17

r200high

55.4

193

81.6

243

28.3

176

32.7

173

51.5

192

81.2

224

54.6

209

73.6

207

54.4

162

30.7

173

63.0

170

68.8

194

54.8

202

59.4

201

67.9

195

75.4

199

We propose "DeepPruner", a real-time stereo matching algorithm that combines the strengths of deep networks and search space pruning techniques. To this end, we developed a differentiable PatchMatch module that allows us to discard most disparities and generates a sparse representation of the cost volume. We then exploit this representation to learn which range to prune for each pixel. Our method achieves competitive results on the KITTI and SceneFlow datasets while running in real time at 62 ms. Moreover, we obtain first place (on overall rankings) in the Robust Vision Challenge. For more details, check out our paper and source code.

06/26/19

DeepPruner_ROB

57.1

194

63.7

197

39.3

201

45.0

203

50.6

191

77.1

212

55.4

213

60.1

165

57.2

176

47.5

205

77.6

223

65.9

190

51.3

194

55.8

195

72.6

214

72.9

189

10/26/23

217

StereoStar

57.3

195

41.4

136

30.0

180

39.0

193

44.6

178

63.5

157

50.0

197

78.2

226

75.5

224

44.7

200

60.5

165

79.4

232

60.0

208

68.6

229

70.7

210

66.3

176

05/19/20

113

SUWNet

57.5

196

52.4

162

35.0

190

43.0

197

45.3

179

70.7

185

54.7

210

71.7

200

71.0

211

47.6

206

72.2

201

62.3

175

49.5

190

59.3

200

79.6

232

67.7

177

11/10/23

221

GASNet

57.7

197

46.9

152

40.4

203

48.6

207

57.8

204

71.6

190

41.3

177

59.5

164

59.1

184

47.0

202

74.2

213

72.4

204

54.0

200

61.2

203

79.6

231

81.8

219

07/31/18

MotionStereo

58.4

198

80.7

242

38.8

200

43.7

200

56.5

203

70.0

177

55.1

211

78.2

227

67.3

199

37.1

192

61.0

168

72.5

205

57.0

206

53.5

191

70.5

209

61.0

155

07/14/21

152

MFN_USFDSRVC

58.5

199

55.6

169

33.6

188

42.3

195

49.3

188

64.2

162

47.6

189

76.8

219

76.6

229

47.0

203

70.6

195

76.5

224

50.6

191

53.5

190

81.5

235

78.3

209

08/10/20

123

CVANet_RVC

58.5

200

53.1

164

35.9

194

43.8

201

48.7

187

72.1

195

56.1

218

72.3

203

72.8

216

49.1

208

73.6

206

62.7

178

48.6

188

60.4

202

77.8

228

70.9

181

09/24/20

130

ACMC

58.7

201

73.7

232

41.3

205

47.2

204

52.8

197

74.3

201

52.5

206

69.5

193

60.4

186

44.4

199

71.1

196

67.0

193

56.6

205

61.5

205

67.3

192

75.6

201

01/07/20

107

ADSR_GIF

59.2

202

66.6

203

41.3

205

61.3

220

50.2

190

74.9

203

47.1

187

77.7

225

72.2

214

42.9

197

72.0

200

72.6

206

53.7

198

46.6

167

59.2

178

89.1

236

11/11/19

100

CACA-Net

59.7

203

51.3

160

42.6

208

48.2

205

54.5

200

70.6

184

61.6

235

68.2

191

67.5

200

47.1

204

69.6

191

71.1

197

56.5

204

62.2

207

66.4

191

80.3

215

No post-processing (no filtering, no hole-filling, no interpolation) is performed. The concept of intrinsic curves is revisited and used for (1) disparity search space reduction, giving an 83% reduction of the disparity range (individually for each pixel) directly at the original image resolution, without hierarchical search; and (2) reducing the ambiguities due to occluded pixels by integrating occlusion cues explicitly into the global energy function as a soft prior. The final energy minimization uses a semi-global approach along eight paths.

04/03/16

ICSG

59.8

204

80.5

240

32.6

187

34.3

178

60.3

208

84.0

228

53.7

208

81.1

234

62.2

192

32.9

180

65.4

178

75.1

214

58.1

207

61.7

206

74.5

221

87.0

232

09/28/15

R-NCC

59.8

205

40.0

133

27.3

169

41.9

194

45.8

182

79.6

218

55.4

216

87.2

241

75.4

223

38.0

193

70.0

193

81.4

236

61.4

209

50.9

180

85.5

240

86.2

231

05/23/20

115

CCNet

60.1

206

56.5

173

38.2

199

43.3

198

46.9

185

75.2

204

72.5

243

66.1

182

65.4

195

49.5

209

71.2

197

64.5

187

52.7

197

71.8

237

70.3

208

82.3

221

Stereo matching has attracted considerable study in recent years. The task is difficult because of visual discomfort, which affects the accuracy of disparity maps. Following the multistage pipeline used by most stereo matching algorithms (the taxonomy of D. Scharstein and R. Szeliski), this paper proposes an improved stereo matching algorithm that uses an Adaptive Weighted Bilateral Filter as the main filter in the cost aggregation stage, which preserves edges and is robust in plain-colour regions. In the matching cost computation stage, the window size of the sum of absolute differences (SAD) and the thresholds are adjusted, and a Median Filter is used as the main filter in the disparity refinement stage, to overcome limits on disparity map accuracy. Evaluation on the indoor scenes of the latest (2014) Middlebury dataset shows that the Adaptive Weighted Bilateral Filter applied in the proposed algorithm yields smooth disparity maps with good processing time.

03/06/19

SM-AWP

60.6

207

57.3

175

48.1

214

48.3

206

52.9

198

70.4

181

58.8

230

74.4

210

73.7

219

46.7

201

76.1

219

72.3

202

52.2

196

59.1

198

65.8

189

80.2

214

06/11/21

150

R3DCNN

61.1

208

61.6

192

50.7

217

43.8

202

71.0

237

70.1

178

49.1

193

81.1

234

79.6

238

44.2

198

65.7

180

73.9

209

54.1

201

65.8

214

52.3

161

72.9

189

08/03/23

210

62.0

209

68.3

207

56.1

221

60.1

217

61.1

209

67.1

170

48.9

192

62.8

173

61.7

188

57.6

214

69.3

190

70.4

195

65.0

217

65.6

212

62.7

183

71.0

182

Numerous CNN algorithms focus on pixel-wise matching cost computation, which is an important building block for many state-of-the-art algorithms. However, these architectures are limited to small, single-scale receptive fields and either use traditional methods for cost aggregation or ignore cost aggregation altogether. In this paper, we propose a novel architecture called the cascaded multi-scale and multi-dimension network (MSMD) to address both issues. First, we propose a new multi-scale matching cost computation sub-network, in which two receptive fields of different sizes are implemented in parallel. In this way, the network can make the best use of both variants to balance the trade-off between a larger receptive field and the loss of detail. Furthermore, we show that our multi-dimension aggregation sub-network, which contains 2D and 3D convolution operations, can provide rich context and semantic information for estimating an accurate initial disparity.

06/14/18

MSMD_ROB

62.3

210

59.0

181

45.7

212

52.2

209

58.4

205

66.3

167

56.5

221

71.8

201

69.0

202

57.1

212

71.7

199

71.8

200

63.8

214

65.8

213

68.5

196

71.1

183

01/17/19

FASW

63.1

211

69.5

213

56.0

220

60.2

218

63.2

216

69.4

174

49.5

195

66.0

181

63.7

194

58.7

216

68.6

187

71.5

199

64.7

216

65.9

215

64.9

186

71.7

186

06/07/21

148

FADNet++

63.6

212

45.7

150

40.7

204

54.4

210

58.5

206

76.2

208

58.3

228

74.0

209

72.9

217

50.4

210

86.7

233

73.4

207

56.2

203

61.3

204

83.8

239

85.9

230

04/04/17

DDL

63.9

213

70.7

219

56.8

223

62.2

224

63.5

218

69.8

175

51.2

199

67.8

188

63.7

193

60.4

219

71.6

198

71.2

198

65.0

218

65.9

215

66.1

190

69.7

180

08/29/22

175

MCP-HA-VQ

64.1

214

73.0

231

55.3

219

60.0

216

62.8

215

70.1

178

52.4

203

70.7

195

66.9

198

58.1

215

68.6

186

70.6

196

64.4

215

64.9

210

67.4

194

74.7

194

05/28/19

PWCA_SGM

65.3

215

71.6

222

58.3

230

61.2

219

62.1

214

74.9

202

52.1

202

69.4

192

65.9

196

61.1

225

75.4

217

73.7

208

66.2

219

68.8

231

67.3

193

72.3

188

01/15/17

IGF

66.2

216

69.9

217

57.1

225

61.3

220

65.8

221

72.0

194

55.4

214

72.7

205

69.4

204

60.4

220

73.2

203

75.3

216

67.0

221

67.1

221

69.0

199

74.5

192

This paper proposes a novel non-data-driven matching cost for dense correspondence based on sparse representation. The new matching cost can separate out sources of variation such as illumination and exposure, making it more suitable and selective for stereo matching. In addition, it can be used as an adaptive weight during cost calculation, improving the accuracy of the matching costs through weighting.

01/24/18

SMSSR

66.4

217

72.3

228

58.1

228

62.7

227

63.7

219

73.5

200

52.4

204

75.6

213

72.5

215

59.6

217

66.3

181

74.8

210

66.4

220

68.2

225

69.6

202

75.7

202

In stereo matching, cost filtering methods and energy minimization algorithms are considered two different techniques. Due to their global extent, energy minimization methods obtain good stereo matching results. However, they tend to fail in occluded regions, in which cost filtering approaches obtain better results. In this paper we combine both approaches with the aim of improving overall stereo matching results. We propose to perform stereo matching as a two-step energy minimization algorithm. We consider two MRF models: a fully connected model defined on the complete set of pixels in an image and a conventional locally connected model. We solve the energy minimization problem for the fully connected model, after which the marginal function of the solution is used as the unary potential in the locally connected MRF model.

01/21/15

TSGO

66.6

218

63.2

196

48.1

213

51.4

208

73.5

239

78.8

216

39.0

163

78.8

229

76.0

226

66.0

239

82.7

231

75.2

215

68.0

228

68.8

230

74.7

222

77.1

203

07/13/22

169

ACT

66.6

219

69.2

209

58.0

227

62.8

228

66.3

225

71.0

187

55.2

212

72.4

204

70.6

208

62.0

230

74.0

210

74.9

212

67.1

222

66.7

219

69.7

203

75.0

196

11/15/19

102

SPPSMNet

67.0

220

60.1

189

45.2

210

59.0

215

62.0

213

76.7

210

62.7

236

78.2

227

75.3

222

64.1

237

85.6

232

77.2

227

62.0

211

64.8

209

74.2

220

80.0

213

04/11/22

166

Z2ZNCC

67.1

221

69.7

215

58.6

231

64.8

235

65.9

222

71.3

189

55.4

215

72.8

206

70.6

209

63.2

233

73.0

202

75.1

213

68.4

230

68.5

228

69.9

204

71.8

187

01/02/20

105

PPEP-GF

67.1

222

70.6

218

58.7

232

61.3

222

66.3

226

71.8

193

53.6

207

75.4

212

73.6

218

60.3

218

70.5

194

74.9

211

68.0

229

68.3

227

70.8

211

78.2

208

05/01/18

PSMNet_ROB

67.3

223

60.0

187

51.1

218

56.3

213

59.6

207

77.4

213

60.5

234

77.1

221

75.2

221

57.6

213

90.7

238

76.4

222

62.0

212

64.2

208

82.1

237

85.9

229

We propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, based on the popular and widely used Semi-Global Matching algorithm. Specifically, we optimize the algorithm for embedded CUDA GPUs using massively parallel computing, and use NEON intrinsics to optimize it for vectorized SIMD processing on embedded ARM CPUs.

06/10/21

149

ReS2tAC

67.3

224

69.9

216

58.0

226

62.6

226

61.7

211

70.4

183

58.1

225

71.3

198

69.4

203

60.8

223

73.6

207

77.2

226

69.1

232

70.7

235

72.9

216

83.2

224

03/23/17

DSGCA

67.5

225

71.9

225

59.7

234

61.9

223

66.5

227

72.9

197

56.1

219

73.8

208

70.9

210

61.6

227

74.2

212

75.6

218

67.4

223

69.6

233

71.0

213

74.8

195

01/05/20

106

MTS2

67.6

226

70.7

219

36.5

195

54.4

211

69.3

234

98.8

242

58.2

227

81.3

237

77.4

234

48.1

207

91.7

239

82.7

238

62.0

210

58.9

196

81.9

236

91.5

239

In recent years, convolutional-neural-network based stereo matching methods have achieved significant gains compared to conventional methods in terms of both speed and accuracy. Current state-of-the-art disparity estimation algorithms require many parameters and large amounts of computational resources and are not suited to applications on edge devices. In this paper, we propose an end-to-end light-weight network (LWNet) for fast stereo matching, which consists of an efficient backbone with multi-scale feature fusion for feature extraction, a 3D U-Net aggregation architecture for disparity computation and a color guidance in 2D CNN for disparity refinement.

09/20/22

177

LWNet

67.7

227

64.9

201

50.2

216

57.8

214

66.2

224

68.9

173

56.6

222

75.9

217

79.1

237

55.0

211

75.5

218

78.1

228

68.8

231

65.5

211

81.3

234

91.1

238

04/27/16

JEM

68.0

228

66.9

204

61.7

239

64.6

234

66.9

229

70.9

186

57.6

224

75.3

211

74.3

220

63.0

232

73.3

204

76.3

221

67.9

226

66.3

217

68.6

197

79.7

212

This article presents a disparity map algorithm that improves depth map estimation using the Census Transform and a hierarchical segment-tree on each block. The stereo matching algorithm presented in this study comprises four steps: Cost Computation, Cost Aggregation, Optimization, and Post-Processing, all of which refine the final disparity map.

12/31/23

227

H-CENST

68.2

229

69.4

210

60.1

236

64.5

232

65.9

223

70.2

180

58.1

225

77.2

223

76.1

227

61.9

229

73.8

208

76.5

225

67.6

225

67.9

224

69.5

201

78.0

204

11/24/16

ADSM

68.3

230

68.5

208

56.8

222

62.5

225

67.0

230

78.0

214

51.2

199

79.5

233

76.6

230

61.4

226

77.2

221

80.6

235

67.5

224

66.5

218

69.4

200

85.6

228

09/03/20

128

LPSM

68.7

231

68.2

206

56.9

224

62.9

229

66.5

228

66.7

168

51.5

201

79.5

232

77.0

231

61.8

228

68.5

185

85.2

239

69.8

236

69.3

232

74.1

219

89.9

237

08/14/20

126

STTRV1_RVC

69.1

232

57.5

176

59.4

233

64.2

231

61.8

212

73.2

199

62.9

237

69.7

194

69.9

205

71.9

242

86.8

234

76.1

220

62.1

213

73.3

238

82.2

238

78.1

206

05/28/20

117

RTSMNet

69.5

233

72.2

226

45.6

211

55.8

212

63.7

220

90.6

236

68.6

241

81.2

236

78.8

236

60.7

221

79.1

226

80.1

233

68.0

227

67.1

220

78.1

229

80.6

217

04/17/18

ISM

69.6

234

69.6

214

60.1

235

66.3

238

67.6

231

71.7

192

59.2

232

79.0

230

77.1

232

62.3

231

74.0

210

79.0

231

69.6

233

68.2

226

68.7

198

88.5

234

09/27/23

215

FM-DT

69.8

235

71.7

224

58.1

229

65.8

237

69.5

235

75.3

205

56.7

223

77.3

224

75.6

225

61.0

224

75.3

216

76.5

223

69.8

235

67.4

222

77.1

227

89.0

235

06/14/17

DoGGuided

70.7

236

72.9

229

60.4

238

64.5

233

68.2

232

76.3

209

56.0

217

81.5

238

78.8

235

63.7

236

79.7

227

80.2

234

69.7

234

70.0

234

73.5

218

87.0

233

08/31/14

BSM

70.9

237

80.4

239

62.0

240

63.9

230

69.7

236

80.4

220

59.2

231

75.9

215

71.5

212

63.6

235

81.3

230

78.5

229

71.2

237

74.0

240

72.9

215

84.0

225

The computation of the sparse disparity maps is achieved by means of a 3D diffusion of the costs contained in the disparity space volume. The watershed segmentations of the left and right views control the diffusion process and valid measurements are obtained by cross-checking. The estimation of the dense disparity maps uses the sparse measurements as control points and is driven by a 3D watershed separating the disparity space volume into foreground and background pixels.

03/15/16

MPSV

71.4

238

78.8

238

65.0

241

65.8

236

69.3

233

76.0

206

58.4

229

79.2

231

76.5

228

65.6

238

76.8

220

78.6

230

72.2

238

73.4

239

72.9

217

78.5

210

03/13/20

111

MTS

74.1

239

76.9

236

41.6

207

68.9

240

74.3

240

94.9

239

63.8

239

84.9

239

80.0

239

60.8

222

88.0

237

85.7

240

76.5

240

67.5

223

88.8

241

93.3

240

08/31/16

SED

75.9

240

73.0

230

36.7

196

83.6

242

77.1

242

80.9

221

59.4

233

90.2

242

82.2

241

63.3

234

87.0

235

87.3

242

80.0

243

71.5

236

92.7

243

95.4

242

02/07/18

77.0

241

71.7

223

60.4

237

67.2

239

73.1

238

81.3

225

68.6

242

85.9

240

85.4

242

67.4

240

93.8

241

86.3

241

76.1

239

75.5

241

88.9

242

95.8

243

02/05/19

AMNet

77.5

242

78.3

237

84.7

243

75.9

241

76.3

241

70.0

176

64.6

240

77.1

221

80.1

240

79.0

243

80.1

228

81.7

237

77.5

241

76.5

242

80.5

233

82.5

222

10/23/16

SIGMRF

81.9

243

80.6

241

65.8

242

83.7

243

82.2

243

99.7

245

63.5

238

93.2

243

91.6

243

68.8

241

97.7

243

96.4

243

79.1

242

83.3

243

78.4

230

94.7

241

03/23/18

AVERAGE_ROB

98.8

244

98.0

245

98.1

245

98.3

244

99.0

244

99.0

243

98.8

244

98.9

244

99.1

245

99.4

244

99.4

244

99.5

244

99.8

244

96.6

244

98.8

245

99.4

245

03/23/18

MEDIAN_ROB

98.9

245

98.0

244

97.9

244

99.3

245

99.4

245

99.4

244

99.4

245

99.2

245

98.9

244

99.4

245

99.4

245

99.7

245

100.0

245

97.6

245

97.6

244

99.2

244

Reference list