
Real time vehicle detection and tracking on multiple lanes

Kristian Kovačić, Edouard Ivanjko, Hrvoje Gold
Department of Intelligent Transportation Systems

Faculty of Transport and Traffic Sciences, University of Zagreb
Vukelićeva 4, HR-10000 Zagreb, Croatia

[email protected], [email protected], [email protected]

ABSTRACT
Development of computing power and cheap video cameras has enabled today's traffic management systems to include more cameras and computer vision applications for transportation system monitoring and control. Combined with image processing algorithms, cameras are used as sensors to measure road traffic parameters such as flow and origin-destination matrices, and to classify vehicles. In this paper, the development of a system capable of measuring traffic flow and estimating vehicle trajectories on multiple lanes using only one static camera is described. Vehicles are detected as moving objects using foreground and background image segmentation. Adjacent pixels in the moving-objects image are grouped together, and a weight factor based on cluster area, cluster overlapping area and distance between multiple clusters is computed to enable multiple moving object tracking. To ensure real-time capability, the image processing computation is distributed between the CPU and GPU. The described system is tested using real traffic video footage obtained from Croatian highways.

Keywords
Multiple object detection, intelligent transportation system (ITS), vehicle detection, vehicle tracking, algorithm parallelization, trajectory estimation

1 INTRODUCTION
Video sensors or cameras combined with image processing algorithms are increasingly becoming the standard approach to today's road traffic monitoring and control. They have become robust enough for continuous measurement of road traffic parameters [Con14]. From the obtained video footage, high-level traffic information can be extracted, e.g. incident detection, vehicle classification, origin-destination (OD) matrix estimation, etc. This information is crucial in advanced traffic management systems from the domain of intelligent transportation systems (ITS).

In order to provide high-level traffic information using a computer vision system, vehicle detection has to be implemented first. The most often used current approaches are based on: (i) foreground/background (Fg/Bg) image segmentation methods, where moving (foreground) objects are separated from static (background) objects, as described in [Con12a] and [Con07]; (ii) optical flow computation of specific segments (moving clusters) in an image [Con00]; and (iii) vehicle detection

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

algorithms based on the Hough method [Con12b] or on Haar-like features [Con06]. For real-time systems, the most suitable approach is Fg/Bg image segmentation because of its low computational demands. Low computational demand is especially important for multiple object detection and tracking systems.

After a vehicle has been detected in an image, its movement is tracked by storing its pose (position and orientation) and its pose change for each time segment. Using the saved vehicle poses and pose changes, its trajectory can be estimated. Trajectory estimation can be performed using various approaches (mostly a prediction/correction framework with odometry as the motion model), and its basic task is to determine a mathematical function which best describes a given set of points (vehicle poses) in 2D space. Vehicle trajectory estimation can be separated into the following three parts: (i) finding a function f which best describes the given set of points in 2D space; (ii) transformation of parameters from the 2D to the 3D space model; and (iii) computing movement parameters such as vehicle direction, velocity and acceleration/deceleration in the 3D space model [Pon07].
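As a concrete illustration of part (i), a least-squares polynomial fit over the stored 2D poses is one common realization. The sketch below is a generic example under that assumption, not the specific method of [Pon07]:

```python
import numpy as np

def fit_trajectory(points, degree=2):
    """Fit y = f(x) to stored vehicle positions by least squares (illustrative).

    points -- sequence of (x, y) image positions collected while tracking
    Returns a polynomial that can be evaluated at new x values.
    """
    pts = np.asarray(points, dtype=np.float64)
    coeffs = np.polyfit(pts[:, 0], pts[:, 1], degree)
    return np.poly1d(coeffs)
```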

Typical commercial computer vision based traffic monitoring systems use one camera per lane to ensure accurate and robust traffic parameter measurement. This presents a drawback, since many cameras are needed for roads with multiple lanes, which makes such systems expensive. This paper tackles the mentioned problem


by modifying the image processing part to enable vehicle detection and tracking on multiple lanes in real time. Image processing is parallelized and its execution is distributed between the CPU and GPU. Thus, only one camera per road is needed, making such systems simpler and cheaper.

This paper is organized as follows. The second section describes the vehicle detection approach used in this paper. The third section describes the vehicle tracking algorithm. The fourth section presents the obtained results. The paper ends with a conclusion and a description of future work.

2 VEHICLE DETECTION
The basic workflow of the proposed system consists of four main parts: (i) image preprocessing; (ii) Fg/Bg image segmentation; (iii) pixel clusterization; and (iv) multiple object tracking. The task of the image preprocessing part is to enhance the image imported from a video stream using a blur filter. After preprocessing, the image is passed through Fg/Bg image segmentation to separate foreground (moving) from background (static) segments in the image. This method is based on creating a background model from a large number of preprocessed images and comparing it with the new preprocessed image. The Fg/Bg segmentation result is then passed through pixel clusterization, which computes the location of each object (vehicle) in the scene and tracks its trajectory through consecutive images (frames). The final part of the proposed system, multiple object tracking, also performs vehicle counting using markers defined in the scene.

In order to accurately detect, track and count vehicles, the obtained traffic video needs to satisfy the following requirements: (i) the camera perspective in the video footage must be constant over time (fixed mounted camera); (ii) all moving objects in the scene are vehicles (the system does not perform classification between detected moving objects); and (iii) the traffic video must not be previously processed with image enhancing algorithms (for example an auto-contrast function, which can degrade the system accuracy).

Algorithms for image downsampling, image preprocessing and Fg/Bg segmentation are executed entirely on the GPU. Today's GPUs provide high parallelization support for the mentioned algorithms and can reduce the execution time of basic image processing operations. The image preprocessing part of the system is performed with a single render call. In Fg/Bg segmentation, creation of the background model requires 8 render

Figure 1: CPU/GPU computation distribution.

calls to process one frame. Moving object detection (comparison of the background model and the current image) is performed with a single render call. Algorithms for pixel clusterization and vehicle tracking are not suitable to run on the GPU because of their structure and are therefore run entirely on the CPU: they contain many dynamic nestings and IF clauses and are too complex to run on the GPU in parallel. The mentioned image processing workflow and the distribution of computations between the CPU and GPU are given in Fig. 1.

2.1 Image Preprocessing
Every image imported from the road traffic video footage contains a certain percentage of noise. Noise complicates the vehicle detection process and significantly reduces the accuracy of the described system, so it needs to be minimized. For noise reduction, blur filters are commonly used; they reduce the number of details in the image, including noise. In the proposed system, a 4×4 Gaussian blur filter kernel is used for noise reduction. The workflow of image preprocessing is given in Fig. 2.

Figure 2: Image preprocessing work flow.
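The preprocessing step can be sketched as below. This is a minimal Python/OpenCV illustration, not the authors' GPU render-call implementation; the paper specifies a 4×4 kernel but not its coefficients, so normalized binomial weights are assumed here.

```python
import cv2
import numpy as np

# 4x4 blur kernel; binomial weights approximate a Gaussian (assumed values).
g = np.array([1.0, 3.0, 3.0, 1.0])
kernel = np.outer(g, g)
kernel /= kernel.sum()  # normalize so overall image brightness is preserved

def preprocess(frame):
    """Blur the imported frame to suppress noise before segmentation."""
    return cv2.filter2D(frame, -1, kernel)
```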

2.2 Foreground / Background Image Segmentation

After the imported image has been successfully preprocessed, Fg/Bg image segmentation is performed as shown in Fig. 3. As mentioned, this process consists of creating a background model of the scene and comparing the computed background model with the latest image imported from the video [Con13]. The background model is obtained using the following equation:

\[
BG_t = BG_{t-1} + \frac{\sum_{i=1}^{n} \operatorname{sign}\left(I_i - BG_{t-1}\right)}{n}, \tag{1}
\]

where BG_t represents the value of a specific pixel in the background model for the current frame, BG_{t-1} is the value of that pixel in the background model for the previous frame, I_i is the value of the pixel in the i-th stored image, and n is the number of stored images.
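A direct implementation of Eq. (1) is sketched below as a Python/NumPy illustration, whereas the original system performs this update in GPU render calls. The buffer of stored frames is an assumed structure:

```python
import numpy as np

def update_background(bg_prev, stored_frames):
    """Update the background model per Eq. (1).

    bg_prev       -- per-pixel background model from the previous frame (float)
    stored_frames -- the n most recent preprocessed frames (assumed buffer)
    """
    n = len(stored_frames)
    # sum of sign(I_i - BG_{t-1}) over the n stored images
    step = sum(np.sign(f.astype(np.float32) - bg_prev) for f in stored_frames)
    # each pixel drifts by at most 1 intensity level per update
    return bg_prev + step / n
```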

By comparing the mentioned pixels in imported images, every pixel in the currently processed image can be classified. If the difference between the current image pixel


value and the background model pixel value is larger than a specified threshold constant, the pixel is classified as part of a foreground object. Otherwise it is considered part of the background. The result of preprocessing and Fg/Bg image segmentation is given in Fig. 4.
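The classification rule can be written compactly as below; the threshold value 10 matches the constant reported in Section 4, while the mask convention (1 = foreground) is an assumption of this sketch:

```python
import numpy as np

def segment_foreground(frame, bg_model, threshold=10):
    """Label pixels as foreground where |I - BG| exceeds the threshold."""
    diff = np.abs(frame.astype(np.float32) - bg_model)
    return (diff > threshold).astype(np.uint8)  # 1 = moving, 0 = background
```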

2.3 Pixel Clusterization
After each pixel in the image is classified as part of a foreground object or as a segment of the background model, pixel clusterization needs to be performed. The used approach is based on marking all adjacent pixels that have the same pixel value as part of a specific cluster. Afterwards, all pixels within the same cluster are counted and the minimum and maximum values of their x and y coordinates are found. With this information, clusters can be represented as rectangles, with the rectangle center used as the cluster center.

In the proposed system, pixel clusterization is performed only on foreground pixels. Additionally, all clusters that do not contain enough pixels are discarded and excluded from further processing.
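This step corresponds to connected-component labeling of the foreground mask. A sketch using OpenCV's built-in labeling is given below; the minimum cluster size is an assumed value, as the paper does not state the exact discard threshold:

```python
import cv2

def clusterize(fg_mask, min_pixels=200):
    """Group adjacent foreground pixels into rectangular clusters.

    min_pixels is an assumed discard threshold (not specified in the paper).
    """
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(fg_mask)
    clusters = []
    for i in range(1, num):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_pixels:
            # the bounding-box center serves as the cluster center
            clusters.append({"box": (x, y, w, h),
                             "center": (x + w / 2.0, y + h / 2.0)})
    return clusters
```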

3 VEHICLE TRACKING
The clustering part returns a large number of clusters, i.e. objects or possible vehicles. To sort out objects

Figure 3: Fg/Bg image segmentation work flow: a) background model creation, and b) background model and current image comparison.

Figure 4: Original image (a) passed through preprocessing algorithm (b) and Fg/Bg segmentation (c).

Figure 5: Vehicle tracking and counting on two lanes.

that are not vehicles, filtering is applied. In the proposed system, spatio-temporal tracking of objects in the scene is used for filtering. Every currently tracked object in the scene is compared with each cluster detected in the current image. A cluster that does not match any of the previously detected objects is set as a new object. Cluster matching is performed by searching for the largest weight factor relating the cluster to a specific object; the cluster is assigned to the object with the highest weight factor. The appropriate weight factor w is computed using the following equations:

\[
w_{dist} = 1 - \frac{d - d_{min}}{d_{max} - d_{min}}, \tag{2}
\]
\[
w_{area} = 1 - \frac{a - a_{min}}{a_{max} - a_{min}}, \tag{3}
\]
\[
w_{cover} = \frac{a_{is}}{\max(a_{obj}, a_{cl})}, \tag{4}
\]
\[
w = \frac{w_{dist} + w_{area} + w_{cover}}{3}, \tag{5}
\]

where d is the distance between the location of the specific cluster and the estimated object location, d_min and d_max are the minimum and maximum distances between all clusters and the processed object, a is the difference between the cluster area (size) and the estimated object area, a_min and a_max are the minimum and maximum differences between all cluster areas and the estimated object area respectively, a_is is the intersection area between the cluster and the object, a_obj is the area of the object, and a_cl is the processed cluster area.

To compute the distance between the location of a specific cluster and the estimated object location, their geometric centers are used. Cluster and object areas are computed as the areas of their surrounding bounding boxes.
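Eqs. (2)-(5) combine into a single matching score; a minimal sketch is given below. The guards against zero-range normalization (the single-candidate case) are an added assumption:

```python
def match_weight(d, d_min, d_max, a, a_min, a_max, a_is, a_obj, a_cl):
    """Combined cluster-to-object matching weight per Eqs. (2)-(5)."""
    # Eq. (2): normalized distance weight (guard: single-cluster case)
    w_dist = 1.0 - (d - d_min) / (d_max - d_min) if d_max > d_min else 1.0
    # Eq. (3): normalized area-difference weight
    w_area = 1.0 - (a - a_min) / (a_max - a_min) if a_max > a_min else 1.0
    # Eq. (4): bounding-box overlap weight
    w_cover = a_is / max(a_obj, a_cl)
    # Eq. (5): equally weighted average
    return (w_dist + w_area + w_cover) / 3.0
```

Each detected cluster is scored against every tracked object with this function, and the cluster is assigned to the object yielding the highest w.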

4 EXPERIMENTAL RESULTS
The proposed system has been tested using real world traffic footage captured on a highway with two lanes near the city of Zagreb in Croatia. The camera was mounted above the highway and passing vehicles were recorded using a top view camera perspective, as given in Fig. 5. The duration of the test video was 10 [min]. The original video resolution is 1920×1080 pixels (RGB).


Approach            Metric      Total   Left lane   Right lane
Overlap check       Hits         126       65          61
                    FP / FN      0/6      0/5         0/1
                    Accuracy    95.6%    92.9%       98.4%
Trajectory check    Hits         129       68          61
                    FP / FN      1/4      0/3         1/1
                    Accuracy    96.2%    95.8%       96.8%
True vehicle count               132       70          62

Table 1: Counting results of the proposed system.

For the experimental results, two approaches for vehicle counting were tested. Both are based on markers (virtual vehicle detectors). Markers are placed in the bottom part of the scene on each lane, shown in Fig. 5 as yellow and red rectangles; yellow denotes an inactive marker and red an activated marker. The edges of the markers are aligned with the image x and y axes. When a vehicle passes through a marker and a hit is detected, the counter for that marker is incremented. The first approach checks if the object's trajectory passes through the marker; the second approach checks if an intersection between the marker and the object exists. Both approaches discard all objects whose trajectory direction is outside of a specific interval. In the performed test, all moving objects needed to have a direction between 90 and 270 [°] in order not to be discarded, and also needed to be present in the scene for more than 30 frames. The value of the threshold constant used in the Fg/Bg segmentation method is 10, and the number of consecutive images used when creating the background model (n) is 105. Blue lines in Fig. 5 represent computed vehicle trajectories. The experimental results are given in Tab. 1. FP represents false positive and FN false negative hits. The true vehicle count was acquired by manually counting all passed vehicles.
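The overlap-check approach reduces to an axis-aligned rectangle intersection test plus the direction and age filters described above. The sketch below is an interpretation of that logic; the object fields and the flag preventing double counting are assumptions:

```python
def rects_intersect(a, b):
    """Axis-aligned rectangle overlap test; boxes are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def update_counts(tracked_objects, markers, counts, min_age=30):
    """Overlap check: count an object once when it covers a lane marker.

    Objects must be tracked for more than min_age frames with a heading
    within 90-270 degrees, matching the filters used in the reported test.
    The 'counted' flag (an assumption) prevents double counting.
    """
    for obj in tracked_objects:
        if obj["counted"] or obj["age"] < min_age:
            continue
        if not (90.0 <= obj["direction"] <= 270.0):
            continue
        for k, marker in enumerate(markers):
            if rects_intersect(obj["box"], marker):
                counts[k] += 1
                obj["counted"] = True
                break
```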

In Fig. 6, the execution time is given for various resolutions, tested on a Windows 7 (64-bit) computer with

Figure 6: Comparison of execution time of the proposed system.

an Intel Core i7 CPU at 2.4 GHz, an NVIDIA Quadro K1000M video card (GPU) and 8 GB of RAM. In the experimental testing, both approaches (overlap and trajectory check) for vehicle counting had the same execution time. The tested application was compiled without optimization and with GPU support. Video importing was performed by the Microsoft DirectShow framework.

From the acquired results it can be concluded that real-time vehicle detection can be performed at SVGA and lower resolutions using a standard PC. At SVGA resolution, 37 [ms] is required to process a single frame, which enables a maximum frame rate of 27 [fps]. At QVGA resolution, 52 [fps] can be achieved, with 19 [ms] required to process a single frame. It can also be concluded that the approach with trajectory check gives better accuracy than the approach with overlap check. In the second test, the application was fully optimized by the compiler and the FFmpeg framework was used for video importing. The results achieved in the second test show that the application, with GPU support and the capability of executing on 8 CPU threads, can achieve much faster execution. At SVGA resolution, the execution time for each frame was 17 [ms], which enables processing of 58 [fps]. At QVGA resolution, the execution time was 7 [ms], which gives the capability of processing 142 [fps]. The highest resolution at which the application can still perform real-time image processing is HD720, with a frame rate of 35 [fps]. In Fig. 8, the ratio

Figure 7: Comparison of application execution time with and without GPU and multi-thread capabilities.


Figure 8: Execution time distribution between applied image processing tasks.

between the execution time of a specific image processing task and the overall execution time of the implemented application is given.

To determine the efficiency of the proposed system with GPU and CPU multi-thread support, the implemented application was modified to run without GPU support (only on the CPU). At SVGA resolution, the application execution time for each frame was 167 [ms] (5 [fps]) with multi-thread capability (8 working threads) and 612 [ms] (1 [fps]) without multi-thread capability (single thread). In Fig. 7, a comparison of the previously mentioned implementation versions is given. All developed versions of the application provide the same results regarding vehicle detection accuracy (number of hits, FP and FN detections).

5 CONCLUSION AND FUTURE WORK

In this paper, a computer vision based system for vehicle detection and tracking on multiple lanes is proposed. The developed system uses only one camera to detect and track vehicles on a road with multiple lanes. First testing results are promising, with a vehicle detection accuracy of over 95%. The methods used in the proposed system are easy to implement and to parallelize. They are also suitable for execution in GPU and CPU multi-thread environments, enabling the real-time capability of the proposed system. The implemented vehicle trajectory tracking currently does not use any vehicle dynamics and predicts the tracked vehicle pose for one succeeding frame only.

From the execution time distribution analysis it can be concluded that the Fg/Bg image segmentation algorithm is the slowest part of the application, consuming 76% of the total application execution time. Further optimization of this algorithm would lower the system requirements of the application and increase the maximum resolution at which real-time image processing can be achieved. It would also allow other complex algorithms (vehicle classification, license plate recognition, etc.) to be implemented in the system and executed in real time.

Future work consists of developing a multiple object tracking system which would estimate the vehicle trajectory based on a vehicle model with dynamics included. Additionally, it is planned to develop a system which will perform vehicle classification and thus separate vehicles by their type. This will enable detection and tracking of vehicles that are coming to a standstill at crossroads.

6 ACKNOWLEDGMENTS
This work has been supported by the IPA2007/HR/16IPO/001-040514 project "VISTA - Computer Vision Innovations for Safe Traffic", which is co-financed by the European Union from the European Regional Development Fund, and by the EU COST action TU1102 - "Towards Autonomic Road Transport Support Systems". The authors wish to thank Marko Ševrović and Marko Šoštarić for obtaining the road traffic video, and Sanja Palajsa for preparing the test video.

7 REFERENCES
[Con14] Pirc, J., and Gostiša, B. Comparison between license plate matching and Bluetooth signature reidentification systems for travel time estimation on highways, in Conf. proc. ISEP, 2014.

[Con06] Bai, H., Wu, J., and Liu, C. Motion and Haar-like features based vehicle detection, in Conf. proc. MMM, 2006.

[Con12a] Braut, V., Čuljak, M., Vukotić, V., Šegvić, S., Ševrović, M., and Gold, H. Estimating OD matrices at intersections in airborne video - a pilot study, in Conf. proc. MIPRO, pp. 977-982, 2012.

[Con13] Kovačić, K., Ivanjko, E., and Gold, H. Computer vision systems in road vehicles: a review, in Conf. proc. CCVW, pp. 25-30, 2013.

[Pon07] Ponsa, D., and López, A. Vehicle trajectory estimation based on monocular vision, Pattern Recognition and Image Analysis, Vol. 4477, pp. 587-594, 2007.

[Con00] Stanisavljević, V., Kalafatić, Z., and Ribarić, S. Optical flow estimation over extended image sequence, in Conf. proc. MELECON, Vol. 2, pp. 546-549, 2000.

[Con07] Tanaka, T., Shimada, A., Arita, D., and Taniguchi, R.I. A fast algorithm for adaptive background model construction using Parzen density estimation, in Conf. proc. AVSS, pp. 528-533, 2007.

[Con12b] Xiaoli, H., Dianfu, H., and Xing, Y. Vehicle window detection based on line features, in Conf. proc. IMSNA, Vol. 1, pp. 261-263, 2012.
