Robust and flexible hardware implementation of ITU-G4


Master’s thesis

Two years

Electronics Design

Title: Robust and flexible hardware implementation of ITU-G4

Aart Mulder

Campus Sundsvall, Holmgatan 10, SE-851 70 Sundsvall.
Campus Östersund, Kunskapens väg 8, SE-851 25 Östersund.

Phone: +46(0)771 97 50 00. Fax: +46(0)771 97 50 01.


Mid Sweden University, Electronics Design (EKS)

Examiner: Bengt Oelmann, [email protected]
Supervisor: Najeem Lawal, [email protected]
Author: Aart Mulder, [email protected]
Master of Science, Electronics Design, 180 credits
Semester, year: 1-2, 2014


Mid Sweden University, Department of Electronics Design.

Abstract

This project was carried out as thesis work during the last semester of my Master's studies in Electronics Design at Mid Sweden University. Firstly, it considers a robust and flexible implementation of ITU-G4 in hardware based on earlier work, and secondly, it covers a review of related work and an investigation into the weaknesses of two published designs. More specifically, it is an investigation into the robustness of the previously developed VHDL implementation of the ITU-G4 algorithm. This includes the design of a debug interface to track the compression process inside the FPGA. As the final result, in contrast to earlier work and other published designs, the ITU-G4 compression performs without glitches or crashes on specific patterns. The maximum frame rate the design can run at is 60 fps at a frame size of 752x480 and a clock rate of 33.3 MHz. The design is tested with three sets of images (easy, medium and complex), all of which are successfully compressed. This includes imperfect images of bar-codes and Q-codes, without the need for morphological preprocessing, whereas the published design needs preprocessing for medium and complex images to remove unexpected transitions.

The ITU-G4 implementation is available for download at opencores.com [18].

Keywords: VHDL, Image compression, Fax4, ITU-G4

December 29, 2014 1 Aart Mulder


Foreword

I am very grateful to my project supervisor Dr. Najeem Lawal, Mr Muhammad Imran, Okonkwo Onyedika Sunday and Pat Du Russel for their technical support, and to all my friends and family for their help and confidence at difficult times, which enabled me to complete this project successfully. Furthermore, I would like to thank the Department of Electronics Design, Mid Sweden University, for providing an enjoyable study environment and the necessary equipment to fulfill this challenge.

In advance I estimated that this project would be a great challenge, and now, afterwards, I can say that it has indeed forced me to take a serious step forward. Besides the technical knowledge and skills, the past three and a half years at Mid Sweden University have enriched my personal and social skills at least as much, due to the different environment compared to my home country.

This thesis work respects the rules about plagiarism and so does not contain any text copied from other articles or books. Furthermore, it does not include any animal abuse and does no harm to privacy. All electronic equipment used is certified according to standard safety regulations.


Contents

Abstract

Foreword

Table of Contents

List of figures

List of tables

1 Introduction

2 Project
2.1 Related works
2.1.1 Compression algorithms
2.1.2 Hardware implementation
2.2 Background
2.3 Task description
2.4 Motivation

3 Methodology
3.1 Functional testing of the system
3.2 System diversity
3.3 Resource optimization

4 Implementation
4.1 Debugger and camera simulator
4.2 Resource optimization
4.2.1 Splitting of the horizontal code
4.2.2 Byte segmentation
4.2.3 Transmission memory
4.3 Multi-platform evaluation
4.4 Client application

5 Results

6 Discussion

7 Conclusion

8 Future work

Appendix

A Set of test images

B Huffman tables

C Debugger diagram

D Transmission memory diagram

E Source code

Bibliography


List of Figures

2.1 The three different coding schemes.
2.2 ITU-T Rec. T.6 coding flow diagram.

3.1 Selection of the test set.

4.1 Block diagram of the FPGALink as available online.
4.2 Control path/state machine of the debugger.
4.3 Block diagram of the complete system with debugger and camera simulator.
4.4 Table view of the Huffman output.
4.5 Table view of the transmission memory contents.
4.6 The horizontal coding scheme.
4.7 Symbolic diagram of the old byte segmentation module.
4.8 Symbolic diagram of the new byte segmentation module.
4.9 Symbolic diagram of the variable-width-input transmission memory.
4.10 The client application.
4.11 The port selection window.

5.1 The ITU-G4 compressed test image with cropping effect due to memory size of 6000 for proposed v1 and 1000 for proposed v2.

6.1 Design comparison in a real world application.
6.2 Compression result of the complex image set with 100% success rate.

C.1 Data path of the Debugger.

D.1 Data path of the transmission memory module.


List of Tables

3.1 Comparison of simulation and debugging.

4.1 Huffman coding table.

5.1 Maximum clock speed of all the submodules in the design for a Spartan-3E 1200.
5.2 Comparison of the device utilization for the Spartan-3E 1200.
5.3 Comparison of the timing results for the Spartan-3E 1200.
5.4 Device utilization for the Spartan-6 LX45.
5.5 Timing results for the Spartan-6 LX45.
5.6 Energy comparison of the systems.

6.1 Memory usage per image group.

B.1 Make-up codes.
B.2 Termination codes.


Chapter 1

Introduction

This project was carried out as a 30 credit thesis project during the fourth and last semester of my Master's studies in Electronics Design at Mid Sweden University. The design is part of the eagle project, where it is used to compress black and white images before transmitting them via a radio link to the server, which reduces the transmission time and therefore the energy consumption. The project is divided into two parts: firstly, a robust and flexible implementation of ITU-G4 on an FPGA platform based on earlier work, and secondly, a review of related work and an investigation into the weaknesses of two published designs. The first part consists of two subparts: the design of a debug system and the optimization of earlier work.

The aim of the debug system is to simulate the camera data by streaming user defined images from a PC application to the FPGA and feeding them into the compression algorithm. Another function is the monitoring and modification of registers in the design. The greatest advantage of the debug system is that it makes it possible to examine the behaviour of the hardware design for specific images, and at a later stage of the eagle detection system it is even possible to test the image segmentation, ROI and compression algorithm simultaneously on hardware. Furthermore, the system can be used for hardware acceleration when connecting it as a hardware-in-the-loop system. An edited version of the debug client and client application would be sufficient for this.

Optimization of earlier work is necessary to reduce the device utilization and to support multiple platforms, since the earlier work was designed for the Spartan-3E. This can be done by splitting the horizontal tiff-runs into a white and a black part, in other words by time-multiplexing, in order to reduce the maximum tiff-code length to nearly half. This reduces the complexity of all the modules in the system and makes it possible to do the byte segmentation before storage, which reduces the memory usage by at least a factor of four.

The client application that handles the RS232 communication was developed during the 15 credit project preceding this thesis work because it was needed for verification purposes at that time; administratively, however, it is considered part of this thesis work. Its purpose is to offer a user-friendly interface that is operating system independent, can examine previously captured images and offers pan and zoom functionality.


Investigation and verification of the functionality of the system is to be carried out with a set of images divided into three categories: easy, medium and complex. Other options would be to deploy the system or to test it with every possible image. The images need to be as complex as possible from the point of view of tiff-runs, for example imperfect receipts or bar-codes. The debugger is well suited for the verification because it can stream images to the board and emulate the camera input.

The aim of the project is to create three different versions of the ITU-G4 implementation. The first one is dedicated to non-complex images that contain few transitions, which results in a low memory usage. The second version has maximum memory usage and is dedicated to all complex images. The last one is a rapid-prototyping version that can be used for hardware-in-the-loop operation.


Chapter 2

Project

This project is part of a bigger project that investigates the possibility of detecting the presence of eagles around wind mills with a vision system and interacting with the wind mills to stop them when an eagle comes too close. The vision system consists of a network of battery powered camera systems and a centralized server. The aim of this project in particular is to minimize the time and energy that it takes to transmit captured images to the server.

2.1 Related works

2.1.1 Compression algorithms

Colour image compression

The JPEG2000 standard, as described in article [11], is the successor of the old JPEG standard and has a broader field of application than JPEG. Although it has been available since 2000 and outperforms its predecessor in every way, it is not as popular. It supports both lossy and lossless compression, mono-tone, gray-scale and color images, Region of Interest, tiling and motion objects.

The aim of the Joint Photographic Expert Group (JPEG) was to replace the discrete cosine transform with a transform function that has higher performance in both quality and compression. The wavelet transform is adopted as the replacement, together with a ranging component and two coding blocks, Tier1 and Tier2, which make software and hardware implementation of JPEG2000 considerably complex [16]. Tier1 does entropy coding per code block and is a combination of fractional bit-plane coding (BPC) and binary arithmetic coding (BAC). Tier2 generates the layer and block summary information per code block. Furthermore, JPEG2000 is computationally intensive because Tier1 processes every bit and every bit-plane sequentially [2].

The efficient lossless compression of color images makes JPEG2000 a serious candidate for the compression of high resolution sky images. However, it is not interesting for battery powered vision nodes because of its high energy consumption.

Grayscale image compression

Bit-plane compression, as explained in article [13], is an image compression method used to compress images with at least 2 bit/pixel. The article describes how this methodology can be used to store 16 bit/pixel magnitude or 32 bit/pixel complex Synthetic Aperture Radar (SAR) data without loss. The algorithm consists of two stages. The first stage, as the name indicates, splits the image into n planes (n is the number of bit/pixel) with the same resolution as the original image. The second stage compresses the separate bit planes with a certain compression method; in this article ITU-G3 (1D) is used.
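The first stage can be illustrated with a short sketch. This is a minimal Python illustration of bit-plane splitting in general, not code from article [13]; the function names `split_bit_planes` and `merge_bit_planes` are hypothetical, and the second stage (coding each plane with ITU-G3) is left out.

```python
def split_bit_planes(image, bits_per_pixel):
    """Split a gray-scale image (list of rows of ints) into bi-level planes.

    Plane k holds bit k of every pixel, so each plane has the same
    resolution as the original image and contains only 0s and 1s.
    """
    return [
        [[(pixel >> k) & 1 for pixel in row] for row in image]
        for k in range(bits_per_pixel)
    ]


def merge_bit_planes(planes):
    """Rebuild the original image from its bit planes (lossless)."""
    height, width = len(planes[0]), len(planes[0][0])
    return [
        [sum(planes[k][y][x] << k for k in range(len(planes)))
         for x in range(width)]
        for y in range(height)
    ]


# Tiny 2x3 example image with 16 bit/pixel values.
image = [[0, 5, 65535], [1024, 3, 7]]
planes = split_bit_planes(image, 16)
restored = merge_bit_planes(planes)
```

Each of the 16 planes is a bi-level image that a fax-style coder could compress independently, and merging the planes again returns the original pixel values.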

The advantage of this method is that the bit planes can be compressed using a bi-level compression algorithm, which most importantly is less complex than gray-scale compression algorithms. In terms of hardware implementation, the bit plane compression method with ITU-G3 is easier to implement, has a lower device utilization and therefore a lower energy consumption, since it does the compression in one cycle. According to the article it is outperformed by Multiple-Pass GAL+Arithmetic Coding, JPEG-LS and JPEG2000 in the case of 16 bit/pixel real-valued images. However, in the case of 32 bit/pixel complex-valued images it performs at least as well as the JPEG-LS and JPEG2000 algorithms.

Binary image compression

The BACIC algorithm [9], which stands for Block Arithmetic Coding for Image Compression, was developed as a replacement for JBIG and ITU-G3, both of which have their disadvantages. ITU-G3 is easy to implement but does not achieve a high compression ratio, whereas JBIG is the opposite: it has a high compression ratio but is complex to implement and uses patented technologies, including the QM-coder patented by IBM. BACIC both has a high compression ratio and is easy to implement. It is a lossless compression algorithm developed for text documents and bi-level halftone images.

BACIC is based on Block Arithmetic Coding (BAC), which uses an adaptive probability table with a 12 bit context of previous pixel values as index. The content of the table is used to determine the value of p1, the probability that a pixel equals 1. BACIC can use two different templates for the 12 bit context, which are designed for text and bi-level halftone images respectively.

It is not within the scope of this work to compare different algorithms. Nevertheless, a comparison of BACIC and ITU-G4 would be necessary to see how they relate in terms of compression ratio, since the article compares BACIC with ITU-G3, which has a smaller compression ratio. The article does not give enough detail to determine whether the BACIC algorithm can be implemented in such a way that it compresses on the fly, so that no storage is needed before compression. Between the lines it suggests that the adaptive probability table is created first and then transferred into a code stream while the BAC coding tree is created on the fly.

2.1.2 Hardware implementation

Article [1] describes an FPGA implementation of the BACIC algorithm and presents a method to make the BACIC algorithm lossy, thereby increasing the compression ratio. This method is called Low-Latency Greedy Flipping Utilizing Forgetful Error Diffusion. In other words, it makes certain changes to the image that increase the compression ratio but are not visible to the eye.


The design is based on a Xilinx Virtex XCV-300 FPGA; it uses 92% (378998) of the device and has a maximum clock frequency of 41.259 MHz. The power dissipation is 546.61 mW and the compression time for an error diffused image is 203 ms. This results in an energy consumption per image of

$$0.546\ \mathrm{W} \times 0.203\ \mathrm{s} \approx 0.11\ \mathrm{J} \qquad (2.1)$$

The energy consumption of the ITU-G4 implementation presented in this work is

$$5\ \mathrm{V} \times 0.181\ \mathrm{A} \times 0.0167\ \mathrm{s} \approx 0.015\ \mathrm{J} \qquad (2.2)$$
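The two energy figures can be checked with a few lines of Python. This is only an arithmetic sketch: it assumes the power is expressed in watts (546.61 mW = 0.54661 W) and the supply current in amperes (0.181 A), which are the readings that reproduce the stated results of 0.11 J and 0.015 J. The helper name `energy_joules` is hypothetical.

```python
def energy_joules(power_w, time_s):
    """Energy in joules for a constant power draw over a given time."""
    return power_w * time_s


# BACIC on the Virtex XCV-300: 546.61 mW for 203 ms per image.
bacic_energy = energy_joules(0.54661, 0.203)

# ITU-G4 design: 5 V at an assumed 0.181 A for one frame at 60 fps (1/60 s).
itu_g4_energy = energy_joules(5.0 * 0.181, 0.0167)

# How many times more energy the BACIC design uses per image.
ratio = bacic_energy / itu_g4_energy
```

Under these assumptions the BACIC design uses roughly seven times more energy per image than the ITU-G4 design.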

Virtex XCV300

Resource     | On chip | Used | Used (%)
System gates | 411955  |      | 92
Logic gates  | 82944   |      |
CLB array    | 32x48   |      |
Logic cells  | 6912    |      |
User IO      | 316     |      |
BRAM bits    | 131k    |      |

Spartan-3E 1200

Resource      | On chip | Used | Used (%)
Slices        | 8672    | 1526 | 17
Slice FFs     | 17344   | 716  | 4
4 input LUTs  | 17344   | 2922 | 16
IO            | 250     | 24   | 9
BRAM bits     | 504k    | 162k | 32

According to the article, the BACIC algorithm with lossy compression compresses up to 8 times better than ITU-G4. However, this comparison is based on Dot Dither images, whereas ITU-G4 is designed for text documents. The compression ratio of lossless BACIC is similar to that of JBIG. So in terms of compression ratio ITU-G4 is outperformed by both JBIG and BACIC, but in terms of energy consumption and implementation complexity they are not interesting for a Visual Sensor Network. The energy used for compression could instead be used to do client-side image processing.

2.2 Background

ITU-G4 was developed for lossless compression of fax messages but is nowadays used for compression of images as well. It uses a combination of horizontal and vertical encoding; the horizontal mode is also known as run-length encoding. The algorithm looks for transitions in the current line and relates them to the preceding line, the reference line. Figure 2.1 shows an example of the three situations that can occur. At the start of a line a0 is placed at the imaginary position before the first pixel, so position 0 if we start counting at 1, and a1 is the first transition to the right of a0 in the current line. The positions b1 and b2 are located on the reference line, where b1 is the first transition to the right of a0 whose colour is opposite to that of a0, and b2 is the transition following b1. If no such transition exists, b1 is placed at the imaginary position after the last pixel (right side); the same goes for b2. An imaginary white line is used as reference for the first line. If the first transition (a1) is within three pixels of b1 then vertical coding is used, otherwise horizontal coding is used. The decoder expects a1 to be before or at the same position as b2, so in the case of Figure 2.1a, where b2 lies to the left of a1, pass mode is used instead of vertical or horizontal coding. The process described in this paragraph is shown schematically in Figure 2.2.
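The mode selection described in this paragraph can be sketched in software. The Python below is a simplified model that follows the general ITU-T T.6 rules (pass mode when b2 lies left of a1, vertical mode when a1 is within three pixels of b1, horizontal mode otherwise); it is not the VHDL design of this thesis, it returns mode names instead of Huffman code words, and the function names are hypothetical.

```python
WHITE, BLACK = 0, 1


def pixel(line, i):
    """Pixel colour at index i; positions left of the line are white."""
    return WHITE if i < 0 else line[i]


def next_change(line, start, colour=None):
    """Index of the first changing element to the right of `start`.

    A changing element is a pixel whose colour differs from its left
    neighbour. If `colour` is given, only changes to that colour count.
    Returns len(line), the imaginary position after the last pixel,
    when no such element exists.
    """
    for i in range(start + 1, len(line)):
        changed = pixel(line, i) != pixel(line, i - 1)
        if changed and (colour is None or line[i] == colour):
            return i
    return len(line)


def coding_modes(reference, current):
    """List the coding modes chosen for `current` given `reference`."""
    modes = []
    a0, colour = -1, WHITE          # imaginary white pixel before the line
    width = len(current)
    while a0 < width:
        opposite = BLACK if colour == WHITE else WHITE
        a1 = next_change(current, a0, opposite)
        b1 = next_change(reference, a0, opposite)
        b2 = next_change(reference, b1)
        if b2 < a1:                 # pass mode: skip the run above
            modes.append(("pass",))
            a0 = b2
        elif abs(a1 - b1) <= 3:     # vertical mode: code a1 relative to b1
            modes.append(("vertical", a1 - b1))
            a0, colour = a1, opposite
        else:                       # horizontal mode: two run lengths
            a2 = next_change(current, a1)
            modes.append(("horizontal", a1 - max(a0, 0), a2 - a1))
            a0 = a2
    return modes


reference = [0, 0, 1, 1, 0, 0, 0, 0]
current = [0, 0, 0, 1, 1, 0, 0, 0]
modes = coding_modes(reference, current)
```

In this example the black run is shifted one pixel to the right with respect to the reference line, so both of its edges are coded in vertical mode with an offset of one, followed by a vertical code with offset zero for the end of the line.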


(a) Pass mode. (b) Vertical mode. (c) Horizontal mode.

Figure 2.1: The three different coding schemes.

[17]


Figure 2.2: ITU-T Rec. T.6. coding flow diagram

[17]


In Figure 2.2 the term facsimile block is used to denote a fax message or image. According to Yahoo Education [6] the definition of facsimile is "An exact copy or reproduction of a document", and according to The Free Dictionary [7] it is another word for fax.

2.3 Task description

1. Creating a debug interface.

• Create a PC application and VHDL module that simulate the camera input.

• This means that the camera clock is bypassed, so it can be stopped and the registers can be read out at any time in order to find problems in the ITU-G4 compression at any location in the image, i.e. it replaces the ChipScope module.

• Extend the PC application with a readout of the registers.

• The reason for creating this debug interface is that ChipScope is not available when testing the ITU-G4 algorithm on an Altera FPGA, and the Nexys2 development board has too little memory to do a good analysis.

2. Optimizing the Fax4 module.

3. Optimizing the Huffman module.

4. Optimizing the Byte segmentation.

5. Optimizing the Transmission memory module.

6. Functional test of the design with the set of images defined in the Methodology chapter (Chapter 3).

7. Testing the design on different ISE versions, different Xilinx FPGA families and other FPGAs.

• Client side software application


2.4 Motivation

This project is part of a bigger project that investigates the possibility of detecting the presence of eagles around wind mills with a vision system and interacting with the wind mills to stop them when an eagle comes too close. The vision system consists of a network of battery powered camera systems and a centralized server. This kind of network is often called a wireless VSN (vision sensor network). The challenge in wireless VSNs is to create a system with as long a lifetime as possible. Image processing on the client side consumes too much power, as concluded in earlier research done by PhD student Imran Muhammad. The solution is to do as little processing as possible on the client and to send the data in an efficient way to the server, where processing power is not a constraint. The most common solution is to use JPEG compression and send the raw image in compressed form to the server. The advantage is a short development time because the algorithm is already available in VHDL; the disadvantage is that JPEG is lossy. However, the aim of the eagle project is to capture color sky images and segment them into monotone images where the sky is white and the eagles are black. This requires a monotone compression algorithm. Fahad Lateef has done a Master's thesis project in which he compared different compression algorithms and concluded that ITU-G4 was the most suitable algorithm for this specific project. The disadvantages of this approach are a long development time, because the algorithm is not implemented yet, and a higher energy consumption due to segmentation on the client. However, even more energy is saved due to the reduced transmission time. Furthermore, ITU-G4 is lossless.


Chapter 3

Methodology

3.1 Functional testing of the system

The biggest challenge of implementing ITU-G4 in VHDL is probably to eliminate the small errors that show up when testing the design in a real world situation due to the random input data. To verify the system, the following three methods are considered.

Real world deployment

The most realistic test, considering that the system is designed to be used for eagle monitoring, is to deploy the system where it is going to be used later on. However, this is a very time consuming method since one needs to maintain and check the system regularly. Furthermore, there are no areas near Sundsvall with an eagle population.

All images

The ultimate method would be to simulate the design with every possible image in order to reach an error probability of 0%. The disadvantage of this method is that the number of test images for a resolution of 752 × 480 equals:

$$2^{752 \times 480} = 2^{360960} \approx 6 \cdot 10^{108659} \qquad (3.1)$$

which is a practically impossible number of images to verify.
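The order of magnitude in equation (3.1) can be verified without constructing the huge integer itself, by working with base-10 logarithms. The following Python lines are a small check of the arithmetic only; the variable names are chosen for illustration.

```python
import math

width, height = 752, 480
pixels = width * height                      # one bit per pixel

# log10 of the number of distinct bi-level images, 2 ** pixels.
log10_count = pixels * math.log10(2)

exponent = math.floor(log10_count)           # power of ten
mantissa = 10 ** (log10_count - exponent)    # leading factor
```

This gives a leading factor of about 6.1 and an exponent of 108659, in agreement with the value stated above.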

Set of test images

The third method is to use a set of chosen test images in order to obtain a certain probability of success. This image set must be based on the environment where the design is going to be used, or be as complex as possible if there are no complex images in the deployment environment. This method is chosen because it is the most suitable for this project. The set is divided into three groups: easy, medium and complex, where the complex group contains images with bar-codes and Q-codes, which are considered the most complex from the point of view of compression. Some of them contain a kind of dust at the edges, making them even more complex. The complete set is presented in Appendix A.


(a) Easy (b) Medium (c) Complex

Figure 3.1: Selection of the test set.

3.2 System diversity

The ITU-G4 algorithm is to be implemented in three different variations or system setups. All of them are full ITU-G4 implementations but with different properties. The first is dedicated to images of the easy set so that the memory is of minimum size. The second has maximum memory usage to guarantee compression of all the images in the test set. The last one is dedicated to rapid prototyping and contains a debug interface capable of in-circuit camera simulation and register and BRAM lookup. The last system setup makes it possible to do hardware acceleration when connecting the FPGA in a hardware-in-the-loop setup.

1. Minimized memory usage for the compression of easy images.

2. Maximum memory usage for the compression of complex images.

3. Rapid prototype setup for camera simulation, register and BRAM lookup.

The reason for developing the debug interface is that successful behavioural simulation does not guarantee successful execution on hardware and that the methods offered by Xilinx do not suit the needs of this project, of which camera emulation is the most important. The methods offered by Xilinx are Post-Translate, Post-Map and Post-Place & Route simulation, of which the last one has exactly the same behaviour as an FPGA. Furthermore, Xilinx offers ChipScope, an in-circuit analyzer with a graph-based interface. Table 3.1 shows a comparison of the available verification and debug methods.


Method: Xilinx Post-Place & Route simulation or ChipScope

Advantages:

• Quick setup

• No additional programming needed

Disadvantages:

• Fixed to Xilinx FPGAs

• Difficult in analysis (only graphs)

• Very heavy for a PC and therefore time consuming (valid for Post-Place & Route)

Method: On board debugger with self-made PC interface application

Advantages:

• Platform independent

• Fast in execution

• In-circuit camera simulation

• BRAM examination in the form of a table or image (PC application)

• Register read and write

• Very suitable for future extensions such as Region-Of-Interest and Change-Coding

Disadvantages:

• Initial setup is very time consuming

Table 3.1: Comparison of simulation and debugging.

Despite the long setup time, the on board debugger with a self-made PC interface application gives the most benefits for both current and future developments. Setting up the debugger is a sub-project in itself because it is not a simple plug-and-play library. More background information on this is given in section 4.1.

3.3 Resource optimization

Device utilization is decreased by splitting the horizontal ITU-G4 code into two separate parts, the black part and the white part. This shrinks the code length from 45 bit to 28 bit, which affects all modules except the serial port and capture manager. Because of the time multiplexing, the design complexity increases slightly for the ITU-G4 run detector, whereas it decreases significantly for the other modules.

Chapter 4

Implementation

4.1 Debugger and camera simulator

FPGALink is an open source package that supports setting up a communication link between a PC application and an FPGA for the exchange of live data such as debug information or measurement data. It consists of a PC-side plugin (FPGALink DLL) and a VHDL module (CommFPGA Module), as shown in Figure 4.1. The third, small block represents the USB chip that carries out the USB-to-parallel and USB-to-JTAG conversion.

Figure 4.1: Block diagram of FPGALink as available online [19].

The camera simulator is built on top of this package, as shown in Figure 4.3 by the two gray blocks inside PC Debug Application and CamSim. It has a separate control path and data path. The data path consists of a line buffer, an x and a y counter and a delay counter, see Figure C.1 in Appendix C. The control path, shown in Figure 4.2, works as follows: it waits until a complete line has been received and stored in the line buffer, then sends this data out over the pix_o signal synchronously with the pclk_o signal, then creates a delay to simulate the line gap period, and finally returns to waiting for the line buffer. Furthermore, the rsync_o signal is driven high while the line buffer contents are sent out, and the fsync_o signal is driven high when pixel (1, 1) is sent out. The time between two lines depends on how fast the pixel data is sent from the PC application.

Figure 4.2: Control path/state machine of the debugger.
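The control path described above can be imitated in software. The following is only a behavioural sketch in Python, not the VHDL of the CamSim module: the state names, the `gap_cycles` parameter and the `camsim` function are hypothetical, and waiting for the line buffer is modelled as taking no clock cycles.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_LINE = auto()  # wait until a complete line is in the line buffer
    SEND_LINE = auto()  # clock the buffered pixels out on pix_o
    LINE_GAP = auto()   # idle to imitate the line gap period


def camsim(lines, gap_cycles=2):
    """Yield (pix_o, rsync_o, fsync_o) for every simulated pclk_o cycle.

    `lines` stands in for the rows streamed by the PC application; each row
    is assumed to be a non-empty list of 0/1 pixels.
    """
    pending = list(lines)
    line_buffer = []
    state, x, y, delay = State.WAIT_LINE, 0, 0, 0
    while True:
        if state is State.WAIT_LINE:
            if not pending:
                return  # the PC sent no further lines
            line_buffer = pending.pop(0)
            x = 0
            state = State.SEND_LINE
        elif state is State.SEND_LINE:
            # Counters are 0-based here, so pixel (1, 1) is x == 0, y == 0.
            fsync = 1 if (x == 0 and y == 0) else 0
            yield line_buffer[x], 1, fsync
            x += 1
            if x == len(line_buffer):
                y += 1
                delay = gap_cycles
                state = State.LINE_GAP
        else:  # State.LINE_GAP
            if delay > 0:
                yield 0, 0, 0  # rsync_o low during the gap
                delay -= 1
            else:
                state = State.WAIT_LINE
```

Calling `list(camsim([[1, 0], [0, 1]]))` produces one tuple per pixel clock, with rsync_o high during each line and fsync_o high only on the very first pixel.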

The CamSim module also acts as a debugger in that it makes certain register and state machine values visible to the PC application. It stores the Huffman output for later examination, has a side door to the transmission memory and reads the state of the Capture manager state machine.

Figure 4.3: Block diagram of the complete system with debugger and camera simulator.

Figures 4.4 and 4.5 show the user interface of the PC application. The upper line is for selecting the *.xsvf file and programming the FPGA, and the line below it for selecting the image to be streamed to the camera simulator, which is shown in the tab called Image Viewer. The second tab (Transmitted BW Image) shows the black/white version of the image in the previous tab, which is of course identical if that image does not contain colours. The third tab (Tx Mem Image) shows the image constructed from the data shown in the last tab (Tx Mem), which contains a table with the contents of the transmission memory. The Seg In Mem tab contains a table with the Huffman output that is temporarily stored in a memory inside the CamSim module. The three buttons Single line, Stop CAM Sim and Start CAM Sim control the streaming of the image data to the FPGA. The button Read Tx Memory, as its name indicates, reads the transmission memory contents, and the button Read Seg In Memory reads the Huffman output. The list of variables/registers on the right side is updated periodically, and the text field at the bottom is a debug log.

The Seg In Mem and Tx Mem tabs have proven to be very helpful in locating errors because they show in which module an error occurs, and the tables can be compared with the outcome of the ISim simulation.

Figure 4.4: Table view of the Huffman output.

Figure 4.5: Table view of the transmission memory contents.

The FPGALink package comes with a makefile that is meant to be used to synthesize any VHDL project that includes the FPGALink package, in order to prevent problems caused by incorrect settings when synthesizing with Xilinx ISE. Together with the Sigasi [20] VHDL plugin for Eclipse, which offers dynamic linking and code highlighting, this makes it possible to develop VHDL inside the user-friendly environment of Eclipse. The disadvantage is that the makefile needs to be edited manually: the Xilinx ISE path and project path must be set and the necessary VHDL files must be listed. The project can then be built by configuring the External Build option to call the makefile.

When connecting to the board by calling the connect function of the DLL, a temporary firmware is loaded into the USB chip that is suited for further FPGALink library calls. This firmware is then used to program the FPGA and can only handle xsvf files, which is another reason that the makefile must be used to build the project.

4.2 Resource optimization

4.2.1 Splitting of the horizontal code

The ITU-G4 algorithm codes transitions in an image according to Table 4.1, where the pass mode and vertical mode have a maximum code length of four and seven bit respectively. The horizontal mode, however, has a maximum length of 45 bit, which the byte segmentation handles at once. This means a high resource usage due to the complexity of the multiplexer and a very inefficient memory usage, since the memory has an element size of 45 bit of which, in the worst case, only 1 bit (vertical mode) is used. This in turn makes the byte segmentation module too slow to be placed in front of the transmission memory and thus forces the memory to be 45 bit wide.

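| Mode | Elements to be coded | Code word |
|---|---|---|
| Pass | | 0001 |
| Horizontal | | 001 + M(a0a1) + M(a1a2) |
| Vertical | a1 just under b1, a1b1 = 0 | 1 |
| Vertical | a1 to the right of b1, a1b1 = 1 | 011 |
| Vertical | a1 to the right of b1, a1b1 = 2 | 000011 |
| Vertical | a1 to the right of b1, a1b1 = 3 | 0000011 |
| Vertical | a1 to the left of b1, a1b1 = 1 | 010 |
| Vertical | a1 to the left of b1, a1b1 = 2 | 000010 |
| Vertical | a1 to the left of b1, a1b1 = 3 | 0000010 |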

Table 4.1: Huffman coding table.
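As an illustration of how Table 4.1 is applied, the sketch below selects the coding mode for one transition. It is a minimal Python model, not the Fax4 module itself. The code words are those of Table 4.1; the selection rule (pass mode when b2 lies to the left of a1, vertical mode when a1 is at most three pixels from b1, horizontal mode otherwise) is taken from the ITU-T T.6 recommendation rather than from this chapter, and the function name `select_mode` is hypothetical.

```python
PASS_CODE = "0001"
HORIZONTAL_PREFIX = "001"  # followed by M(a0a1) and M(a1a2)

# Key: a1 - b1 in pixels; positive means a1 lies to the right of b1.
VERTICAL_CODES = {
    0: "1",
    1: "011", 2: "000011", 3: "0000011",
    -1: "010", -2: "000010", -3: "0000010",
}


def select_mode(a1, b1, b2):
    """Return (mode, code word or prefix) for one coding step.

    a1 is the next changing pixel on the coding line, b1 and b2 are the
    changing pixels on the reference line (T.6 naming).
    """
    if b2 < a1:
        return "pass", PASS_CODE
    offset = a1 - b1
    if abs(offset) <= 3:
        return "vertical", VERTICAL_CODES[offset]
    # The run-length codes M(a0a1) and M(a1a2) still have to be appended.
    return "horizontal", HORIZONTAL_PREFIX
```

For example, `select_mode(7, 5, 9)` yields the vertical code for a1 two pixels to the right of b1, whereas `select_mode(20, 10, 25)` falls back to horizontal mode.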

The code length can be shortened by splitting the horizontal code into two parts. This is possible because the black and white parts are encoded sequentially (see Figure 4.6), which makes it possible to apply time multiplexing and reduce the maximum code length to 28 bit. The time multiplexing gives a small increase in design complexity because the TIFF-run detector and Huffman encoder need to know whether they are handling the first part “001 + M(a0a1)” or the second part “M(a1a2)”, although their total code width is reduced.

Figure 4.6: The horizontal coding scheme.
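The figures of 45 and 28 bit can be reproduced from the code lengths in Appendix B. The Python sketch below is one way to arrive at them, assuming that a run is coded as at most one make-up code plus one terminating code and that the longest code words are those of Tables B.1 and B.2 (9 and 8 bit for white, 13 and 12 bit for black). The decomposition itself is an interpretation and is not spelled out in the text.

```python
# Longest code words in bits, read from Tables B.1 and B.2.
MAX_CODE_BITS = {
    "white": {"make_up": 9, "terminating": 8},
    "black": {"make_up": 13, "terminating": 12},
}
HORIZONTAL_PREFIX_BITS = 3  # the "001" prefix of the horizontal mode


def max_run_bits(colour):
    """Longest M(...) code for one run: make-up code plus terminating code."""
    lengths = MAX_CODE_BITS[colour]
    return lengths["make_up"] + lengths["terminating"]


def unsplit_width():
    """Width needed when 001 + M(a0a1) + M(a1a2) is handled as one word."""
    return HORIZONTAL_PREFIX_BITS + max_run_bits("white") + max_run_bits("black")


def split_width():
    """Width needed when the two runs are handled one after the other."""
    longest_run = max(max_run_bits("white"), max_run_bits("black"))
    first_part = HORIZONTAL_PREFIX_BITS + longest_run  # 001 + M(a0a1)
    second_part = longest_run                          # M(a1a2)
    return max(first_part, second_part)
```

Under these assumptions `unsplit_width()` gives 45 and `split_width()` gives 28, the worst case being a black run directly after the prefix.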

4.2.2 Byte segmentation

In addition to being very resource heavy, the first version of the ITU-G4 implementation did not synthesize well for the Spartan-6 platform because of the complex byte segmentation module. The synthesizer for the Spartan-6 appears to differ from the Spartan-3 version in that it does not recognize the functional purpose of the byte segmentation and optimizes it in such a way that it becomes non-functional. In reaction to that, it optimizes all other modules as well, resulting in a non-functioning design. This problem disappeared when the code size was scaled down from 45 to 28 bit. The optimization also increased the throughput of the byte segmentation so that it can be placed in front of the Transmission memory (see Figure 4.3). The throughput is increased even further by the multi-byte output functionality. The shift register width is the sum of the maximum number of bits left over in the shift register and the maximum input data width (7 + 28 = 35).
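The shift register arithmetic can be checked with a small software model. The following Python sketch is not the VHDL byte segmentation module. It only shows how variable-length code words are packed into bytes and why 7 + 28 = 35 bit suffice. The class name, the MSB-first bit order and the zero padding in `flush` are assumptions made for the example.

```python
class ByteSegmenter:
    """Packs variable-length code words (MSB first) into whole bytes."""

    MAX_CODE_BITS = 28
    SHIFT_REGISTER_BITS = 7 + MAX_CODE_BITS  # at most 7 leftover bits + input

    def __init__(self):
        self.bits = 0   # shift register contents as an integer
        self.count = 0  # number of valid bits in the register

    def push(self, code):
        """Append one code word given as a bit string; return complete bytes."""
        if not 0 < len(code) <= self.MAX_CODE_BITS:
            raise ValueError("code word must be 1..28 bit long")
        self.bits = (self.bits << len(code)) | int(code, 2)
        self.count += len(code)
        assert self.count <= self.SHIFT_REGISTER_BITS
        out = []
        while self.count >= 8:  # multi-byte output: up to four bytes per push
            self.count -= 8
            out.append((self.bits >> self.count) & 0xFF)
        self.bits &= (1 << self.count) - 1  # keep only the leftover bits
        return bytes(out)

    def flush(self):
        """Emit the leftover bits, padded with zeros to a full byte."""
        if self.count == 0:
            return b""
        byte = (self.bits << (8 - self.count)) & 0xFF
        self.bits = self.count = 0
        return bytes([byte])
```

With 7 bits already in the register, pushing a 28 bit code word fills exactly 35 bit and releases four bytes at once, which is the case the four parallel transmission memories have to absorb.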

The previous Byte segmentation module (Figure 4.7) contained a bug that corrupted the output when new input arrived on every clock cycle for more clock cycles than the FIFO size, because the shift register needed two clock cycles (read FIFO and write output) to handle data. This problem has been solved by spreading the read and write operations over the rising and falling clock edges (Figure 4.8). The parallel de-multiplexer in the middle merges the signals of both clock-driven processes, since two processes cannot drive the same signal in VHDL. This solution has made the FIFO obsolete.

Figure 4.7: Symbolic diagram of the old byte segmentation module.

Figure 4.8: Symbolic diagram of the new byte segmentation module.

4.2.3 Transmission memory

Multi-byte output of the Byte segmentation is only feasible if the Transmission memory can handle a variable number of input bytes, which a normal memory cannot. The solution was found in a Multiple-In-Multiple-Out multiplexer followed by four equally sized memories. The data is interleaved over the memories such that incoming data is stored in: RAM1 pos 1, RAM2 pos 1, RAM3 pos 1, RAM4 pos 1, RAM1 pos 2, etc.

Figure 4.9: Symbolic diagram of the variable-width-input transmission memory.
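The storage order described above amounts to a round-robin over four banks. The Python sketch below models only that addressing scheme. The class name `InterleavedMemory` and its methods are hypothetical, and bank and position indices are 0-based, so RAM1 pos 1 corresponds to `banks[0][0]`.

```python
class InterleavedMemory:
    """Byte memory spread over several equally sized banks, round-robin."""

    def __init__(self, depth, banks=4):
        self.depth = depth
        self.banks = [bytearray(depth) for _ in range(banks)]
        self.count = 0  # total number of bytes written so far

    def write(self, data):
        """Store 0..len(banks) bytes in one call, as one multi-byte output."""
        n = len(self.banks)
        if len(data) > n:
            raise ValueError("more bytes than banks in one write")
        if self.count + len(data) > self.depth * n:
            raise OverflowError("transmission memory full")
        for byte in data:
            # Byte number k goes to bank k mod n at position k div n.
            self.banks[self.count % n][self.count // n] = byte
            self.count += 1

    def read_all(self):
        """Return the stored bytes in the order in which they were written."""
        n = len(self.banks)
        return bytes(self.banks[i % n][i // n] for i in range(self.count))
```

Because consecutive bytes always land in different banks, a write of up to four bytes never addresses the same bank twice, which is what allows the hardware to store them in parallel.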

4.3 Multi-platform evaluation

The design has been tested and verified to work properly on both the Xilinx Spartan-3E and Spartan-6 platforms, in contrast to the previous design, which only runs on a Spartan-3E because the synthesizer behaves differently for the two families.

4.4 Client application

The client application is shown in Figure 4.10. It is a straightforward application that connects to the serial port, sends out a new-frame command and waits for data, which is stored in a TIFF file. When a complete image has been received, it is displayed in the image viewer, which has pan and zoom capabilities for image inspection. Images stored in the same folder are listed in the panel on the right side. The storage folder can be changed by typing the path or by using the button on the right, which opens a path selection window. The text window below the image viewer is a debug log of the received data. The three buttons at the bottom left are, respectively, connect/disconnect, single mode and continuous mode. The connect/disconnect button opens the port configuration window shown in Figure 4.11. The application contains a timeout timer, currently set to 10 seconds, that cancels the communication if the server (FPGA) does not reply quickly enough. The communication can be cancelled at any time by pressing the Esc key.

The application is developed on the Qt platform with C++ as the underlying language and runs on both Windows and Ubuntu. It uses two threads, one for the GUI and one for the serial communication, in order to keep the GUI responsive while communication is ongoing. The operating-system-specific code for the serial port is selected with preprocessor defines, and no specific libraries are used for it. The source can be built with an installed version of Qt; so far it has only been tested with 4.8, but according to the Qt community any version should work. On Ubuntu the standard GCC compiler is used for the build, whereas on Windows MinGW or Visual Studio must be installed.

Figure 4.10: The client application.

Figure 4.11: The port selection window.

Chapter 5

Results

Table 5.1 shows the maximum clock speed of every module. The Fax4 module, which determines the TIFF runs, is at least 1.6 times slower than all other modules except for the Capture manager and CamSim, which both instantiate the Fax4 module. The relatively low clock speed is due to the fact that the ITU-G4 standard was developed for software implementation and not for hardware/FPGA.

| Fax4 | Huffman | Byte segmentation | Transmission memory | Serial port | Capture manager | CamSim |
|---|---|---|---|---|---|---|
| 37.5 MHz | 104.6 MHz | 60.4 MHz | 187.6 MHz | 187.6 MHz | 37.6 MHz | 39.8 MHz |

Table 5.1: Maximum clock speed of all the submodules in the design for a Spartan-3E 1200.

The BRAM utilization of 9 shown in Table 5.2 decreases to 5 if the transmission memory is disregarded. The 5 BRAMs are occupied by the FIFO in the byte segmentation (1 BRAM), the transition buffer in the Fax4 (2 BRAMs) and the Huffman table, which uses two BRAMs in ROM mode. The transmission memory module uses at least 4 BRAMs because of the four 8-bit parallel memories shown in Figure 4.9. A Spartan-3E BRAM in 8-bit mode has 2048 memory positions, which results in a minimum transmission memory size of 2048 × 4 = 8192 bytes. For small memory requirements, however, it is preferable to use distributed RAM.

| Spartan-3E 1200 | On chip | Proposed (v2) used | Proposed (v1) used | Published [12] used |
|---|---|---|---|---|
| Number of slices | 8672 | 1526 (17%) | 7452 (85%) | 1872 (21%) |
| Number of slice Flip Flops | 17344 | 716 (4%) | 672 (3%) | 722 (4%) |
| Number of 4 input LUTs | 17344 | 2922 (16%) | 14235 (82%) | 3566 (20%) |
| Number used as logic | | 2912 | 14235 | 3560 |
| Number used as shift registers | | 10 | 11 | 6 |
| Number of IOs | | 35 | 31 | 33 |
| Number of bonded IOBs | 250 | 24 (9%) | 25 (10%) | 32 (12%) |
| Number of BRAMs | 28 | 9 (32%) | 28 (100%) | 5 (17%) |
| Number GCLKs | 24 | 1 (4%) | 2 (8%) | 2 (8%) |

Table 5.2: Comparison of the device utilization for the Spartan-3E 1200.

| Spartan-3E 1200 | Proposed (v2) | Proposed (v1) | Published [12] |
|---|---|---|---|
| Maximum frequency | 38.697 MHz | 42.387 MHz | 81.024 MHz |
| Minimum input arrival time before clock | 8.055 ns | 6.968 ns | 4.028 ns |
| Maximum output required time after clock | 7.155 ns | 6.227 ns | 5.273 ns |
| Maximum combinatorial path delay | 6.429 ns | 5.989 ns | 5.310 ns |

Table 5.3: Comparison of the timing results for the Spartan-3E 1200.

| Spartan-6 LX45 | On chip | Proposed (v2) used | Used (%) |
|---|---|---|---|
| Number of slice registers | 54576 | 743 | 1 |
| Number of slice LUTs | 27288 | 3959 | 14 |
| Number used as logic | 27288 | 3949 | 14 |
| Number used as memory | 6408 | 10 | 1 |
| Number of IOs | | 48 | |
| Number of bonded IOBs | 218 | 22 | 10 |
| Number of BRAMs | 116 | 5 | 4 |
| Number GCLKs | 16 | 1 | 6 |

Table 5.4: Device utilization for the Spartan-6 LX45.

| Spartan-6 LX45 | Proposed (v2) |
|---|---|
| Maximum frequency | 54.135 MHz |
| Minimum input arrival time before clock | 4.731 ns |
| Maximum output required time after clock | 6.544 ns |
| Maximum combinatorial path delay | No path found |

Table 5.5: Timing results for the Spartan-6 LX45.

The memory size of 8192 in proposed version 1 and 8192 in proposed version 2 cannot be compared directly, since the memories are positioned differently in the two designs and therefore hold different data sizes. In version 1 the memory has space for 8192 TIFF transitions, which occupies 26 BRAMs, in contrast to 8192 bytes of segmented TIFF transitions in version 2, which occupies 4 BRAMs. How many bits a transition uses depends on the transition type (horizontal, vertical, pass) and is therefore image specific.

Figure 5.1b illustrates the improvement in memory usage: the proposed v2 uses 82% less transmission memory than v1 for this particular image, while also lowering the logic usage by over 75% at the cost of a 10% lower clock speed.

(a) Typical ITU-G4 test image. (b) Compressed image with cropping side effect.

Figure 5.1: The ITU-G4 compressed test image with cropping effect due to a memory size of 6000 for proposed v1 and 1000 for proposed v2.

Table 5.6 compares the systems in terms of energy per frame. Note that the published platform is a dedicated circuit board, whereas the proposed platform is an evaluation board with unnecessary components and a camera running on 5 V instead of 3.3 V.

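| System | Compression algorithm | Resolution (W×H) | Power (W) | Frame rate (FPS) | Energy per frame (mJ) | Logics used (%) | BRAM |
|---|---|---|---|---|---|---|---|
| Proposed-v2 | ITU-G4 | 752×480 | 0.9 | 60 | 15 | 17 | 5 |
| Published [12] | ITU-G4 | 640×400 | 0.67 | 48 | 13.9 | 35 | 3 |
| Published [1] | BACIC | N/A | 0.55 | 5 | 109 | 92 | N/A |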

Table 5.6: Energy comparison of the systems.
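The energy column in Table 5.6 is consistent with dividing the power by the frame rate. The short Python sketch below shows that relation. The function name is hypothetical, and the assumption is that the power figure holds for the whole frame period.

```python
def energy_per_frame_mj(power_w, frame_rate_fps):
    """Energy per frame in millijoule: power divided by frame rate."""
    return 1000.0 * power_w / frame_rate_fps


# (power in W, frame rate in FPS) as listed in Table 5.6
systems = {
    "Proposed-v2": (0.9, 60),
    "Published [12]": (0.67, 48),
    "Published [1]": (0.55, 5),
}

for name, (power_w, fps) in systems.items():
    print(f"{name}: {energy_per_frame_mj(power_w, fps):.2f} mJ per frame")
```

With the rounded power values of the table this gives 15 mJ for the proposed design and slightly higher values than listed for the two published designs, which suggests that the table entries were derived from less rounded power figures.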

Chapter 6

Discussion

(a) Original (b) Proposed (c) Published [12]

Figure 6.1: Design comparison in a real world application.

Splitting up the horizontal code of 45 bit into two time-multiplexed codes of at most 28 bit has resulted in multiple advantages and one small disadvantage. The disadvantage is that the maximum clock frequency has decreased by 10%; the BRAM and logic usage, however, have decreased by 80% and 75% respectively. Another advantage compared to earlier publications is that this version runs well on multiple platforms. Development of the debugger has taken some time but also saved time during fault finding and has overall led to a shorter implementation time, especially when tracking down failures that only appear in hardware and not during simulation.

Figure 6.1 shows that this design maintains a full ITU-G4 implementation, whereas the compared SENTIOF-CAM system gives incorrect output. One can argue that an industrial vision setup does not have to deal with such complicated and noisy images because of a better optical system and morphological operations, but the aim of this project was to create a full ITU-G4 implementation that is robust and functions in a real-world situation. The advantage of this approach is that the design can easily be adapted to any application. The SENTIOF-CAM system does work properly for the easy image set, and for the medium set as well when the images are preprocessed with a morphological operation. The disadvantage of morphology is that the fingers at the tips of the wings of the eagle in Figure 6.2a disappear, which is unfortunate as they are of great importance for bird classification.

(a) Medium 2 (b) Complex 1 (c) Complex 2

(d) Complex 3 (e) Complex 4 (f) Complex 5

Figure 6.2: Compression result of the complex image set with 100% success rate.

Table 6.1 shows the memory requirements for the different image sets. Based on these, the system can be divided into three setups, as described in the Methodology chapter: a low memory usage version for the easy and medium groups, a full memory usage version for the complex images, and finally the rapid prototyping version that contains a debug interface and has hardware-in-the-loop functionality.

| Image type | Max size in bytes | Max BRAM utilization |
|---|---|---|
| Easy | 600 bytes | Preferably distributed RAM |
| Medium | 1000 bytes | Preferably distributed RAM |
| Complex | 12000 bytes | 8 BRAMs |

Table 6.1: Memory usage per image group.
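The BRAM count for the complex images follows from the four-bank layout of the transmission memory. The Python sketch below is a small sizing aid under the assumptions stated in the Results chapter (four parallel 8-bit memories, 2048 positions per Spartan-3E BRAM in 8-bit mode). The function name `brams_needed` is hypothetical.

```python
import math

BANKS = 4              # parallel 8-bit memories in the transmission memory
BYTES_PER_BRAM = 2048  # Spartan-3E BRAM positions in 8-bit mode


def brams_needed(total_bytes):
    """Number of BRAMs needed to hold `total_bytes` spread over all banks."""
    bytes_per_bank = math.ceil(total_bytes / BANKS)
    brams_per_bank = math.ceil(bytes_per_bank / BYTES_PER_BRAM)
    return BANKS * brams_per_bank
```

For 12000 bytes each bank has to hold 3000 bytes, which needs two BRAMs per bank and therefore 8 in total. Any requirement up to 8192 bytes occupies the minimum of 4 BRAMs, which is why distributed RAM is preferred for the easy and medium sets.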

Compared with the BACIC algorithm, discussed in the two articles [9] and [1], the ITU-G4 algorithm performs considerably worse in terms of compression ratio. However, it compresses instantly, i.e. the image is compressed 4 clock cycles after the last pixel has left the camera, whereas the BACIC algorithm needs to run multiple sequential stages: first to collect the statistics and second to create the codes. With the Low-Latency Greedy Flipping Utilizing Forgetful Error Diffusion method it needs two full runs over the complete image before creating the codes. In terms of time, the FPGA awake time is therefore at least 2 times shorter for ITU-G4 than for the BACIC algorithm. In terms of energy usage the saving depends on the device utilization as well.

An interesting observation made while reading the different articles is that the compression algorithm proposed in each article always performs better than the others. This raises the question of how reliable scientific articles are. Furthermore, there is no single ideal algorithm that performs best in every situation. While reading articles, one must examine thoroughly what kind of patterns are used in the test images. Among the algorithms examined, JPEG2000 has the widest application area in terms of compression ratio. According to article [1], BACIC has a compression ratio that is 8 times higher than that of ITU-G4. However, this result is based on Dot Dithered test images, which are very specific.

The ITU-G4 implementation is available for download at opencores.com [18].

Chapter 7

Conclusion

This thesis work has fulfilled 3 out of the 4 main tasks. The debug interface has been successfully implemented and immediately proved its usefulness in finding errors that showed up while working on the other tasks. The Fax4 and byte segmentation modules have been optimized in such a way that the device utilization has been reduced by 75% and the BRAM usage by up to 80%, at the cost of a 10% decrease in clock speed. The main reason for these large improvements is the reduction of the maximum code length from 45 to 28 bit, which considerably reduces the complexity of the (de)multiplexers, not for the programmer but for the synthesizer. Furthermore, the design has been verified to function properly on the Spartan-6 platform as well.

The approach of first designing and/or optimizing block diagrams suited for digital design and then translating these into VHDL code has resulted in a shorter development time. Initially, designing the block diagrams took a considerable amount of time. However, this time was saved many times over at the coding stage because problems were tackled at an earlier stage.

The aim of this project, to optimize the ITU-G4 implementation of an earlier work and to design a debug interface and camera input emulator, has been successfully completed in double the time of a 30 credit thesis work. Furthermore, a properly functioning client application that handles the transmitted ITU-G4 stream has been developed. Comparison of the simulation and hardware results in the tests shows that the ITU-G4 compression works as it should.

The developed camera emulator could be of good use for further development of the eagle project, where stereo vision is going to be used. Images pre-designed in, for example, Matlab could be streamed to both systems. In this way the segmentation, recognition and communication between both FPGA platforms can be tested while inputting predefined images. This creates a situation that is as close as possible to reality while working in a controlled environment.

What I have learned most from this thesis work is never to give up or put the project aside, even though there were times when I was stuck and no one could help me out. The solution was most often to take a short break, after which an answer would appear.

Chapter 8

Future work

• More intense testing protocol.

• Region of interest coding (ROI).

• Change coding.

• Bit plane encoding for gray scale images, and possibly colour images.

• Extending the debug PC application to support video files as input.

Appendices

Appendix A

Set of test images

(a) Easy 1 (b) Medium 1 (c) Complex 1

(d) Easy 2 (e) Medium 2 (f) Complex 2

(g) Easy 3 (h) Medium 3 (i) Complex 3

(j) Easy 4 (k) Medium 4 (l) Complex 4

(m) Easy 5 (n) Medium 5 (o) Complex 5

Appendix B

Huffman tables

| White run length | Code word | Black run length | Code word |
|---|---|---|---|
| 64 | 11011 | 64 | 0000001111 |
| 128 | 10010 | 128 | 000011001000 |
| 192 | 010111 | 192 | 000011001001 |
| 256 | 0110111 | 256 | 000001011011 |
| 320 | 00110110 | 320 | 000000110011 |
| 384 | 00110111 | 384 | 000000110100 |
| 448 | 01100100 | 448 | 000000110101 |
| 512 | 01100101 | 512 | 0000001101100 |
| 576 | 01101000 | 576 | 0000001101101 |
| 640 | 01100111 | 640 | 0000001001010 |
| 704 | 011001100 | 704 | 0000001001011 |
| 768 | 011001101 | 768 | 0000001001100 |
| 832 | 011010010 | 832 | 0000001001101 |
| 896 | 011010011 | 896 | 0000001110010 |
| 960 | 011010100 | 960 | 0000001110011 |
| 1024 | 011010101 | 1024 | 0000001110100 |
| 1088 | 011010110 | 1088 | 0000001110101 |
| 1152 | 011010111 | 1152 | 0000001110110 |
| 1216 | 011011000 | 1216 | 0000001110111 |
| 1280 | 011011001 | 1280 | 0000001010010 |
| 1344 | 011011010 | 1344 | 0000001010011 |
| 1408 | 011011011 | 1408 | 0000001010100 |
| 1472 | 010011000 | 1472 | 0000001010101 |
| 1536 | 010011001 | 1536 | 0000001011010 |
| 1600 | 010011010 | 1600 | 0000001011011 |
| 1664 | 011000 | 1664 | 0000001100100 |
| 1728 | 010011011 | 1728 | 0000001100101 |

Table B.1: Make-up codes [17].

| White run length | Code word | Black run length | Code word |
|---|---|---|---|
| 0 | 00110101 | 0 | 0000110111 |
| 1 | 000111 | 1 | 010 |
| 2 | 0111 | 2 | 11 |
| 3 | 1000 | 3 | 10 |
| 4 | 1011 | 4 | 011 |
| 5 | 1100 | 5 | 0011 |
| 6 | 1110 | 6 | 0010 |
| 7 | 1111 | 7 | 00011 |
| 8 | 10011 | 8 | 000101 |
| 9 | 10100 | 9 | 000100 |
| 10 | 00111 | 10 | 0000100 |
| 11 | 01000 | 11 | 0000101 |
| 12 | 001000 | 12 | 0000111 |
| 13 | 000011 | 13 | 00000100 |
| 14 | 110100 | 14 | 00000111 |
| 15 | 110101 | 15 | 000011000 |
| 16 | 101010 | 16 | 0000010111 |
| 17 | 101011 | 17 | 0000011000 |
| 18 | 0100111 | 18 | 0000001000 |
| 19 | 0001100 | 19 | 00001100111 |
| 20 | 0001000 | 20 | 00001101000 |
| 21 | 0010111 | 21 | 00001101100 |
| 22 | 0000011 | 22 | 00000110111 |
| 23 | 0000100 | 23 | 00000101000 |
| 24 | 0101000 | 24 | 00000010111 |
| 25 | 0101011 | 25 | 00000011000 |
| 26 | 0010011 | 26 | 000011001010 |
| 27 | 0100100 | 27 | 000011001011 |
| 28 | 0011000 | 28 | 000011001100 |
| 29 | 00000010 | 29 | 000011001101 |
| 30 | 00000011 | 30 | 000001101000 |
| 31 | 00011010 | 31 | 000001101001 |
| 32 | 00011011 | 32 | 000001101010 |
| 33 | 00010010 | 33 | 000001101011 |
| 34 | 00010011 | 34 | 000011010010 |
| 35 | 00010100 | 35 | 000011010011 |
| 36 | 00010101 | 36 | 000011010100 |
| 37 | 00010110 | 37 | 000011010101 |
| 38 | 00010111 | 38 | 000011010110 |
| 39 | 00101000 | 39 | 000011010111 |
| 40 | 00101001 | 40 | 000001101100 |
| 41 | 00101010 | 41 | 000001101101 |
| 42 | 00101011 | 42 | 000011011010 |
| 43 | 00101100 | 43 | 000011011011 |
| 44 | 00101101 | 44 | 000001010100 |
| 45 | 00000100 | 45 | 000001010101 |
| 46 | 00000101 | 46 | 000001010110 |
| 47 | 00001010 | 47 | 000001010111 |
| 48 | 00001011 | 48 | 000001100100 |
| 49 | 01010010 | 49 | 000001100101 |
| 50 | 01010011 | 50 | 000001010010 |
| 51 | 01010100 | 51 | 000001010011 |
| 52 | 01010101 | 52 | 000000100100 |
| 53 | 00100100 | 53 | 000000110111 |
| 54 | 00100101 | 54 | 000000111000 |
| 55 | 01011000 | 55 | 000000100111 |
| 56 | 01011001 | 56 | 000000101000 |
| 57 | 01011010 | 57 | 000001011000 |
| 58 | 01011011 | 58 | 000001011001 |
| 59 | 01001010 | 59 | 000000101011 |
| 60 | 01001011 | 60 | 000000101100 |
| 61 | 00110010 | 61 | 000001011010 |
| 62 | 00110011 | 62 | 000001100110 |
| 63 | 00110100 | 63 | 000001100111 |

Table B.2: Termination codes [17].

Appendix C

Debugger diagram

Figure C.1: Data path of the Debugger.

Appendix D

Transmission memory diagram

Figure D.1: Data path of the transmission memory module.

Appendix E

Source code

The source code is handed in as a zip file together with the report.[21] [4] [12] [9] [1] [13] [11] [21] [4] [15] [8] [14] [3] [5] [10] [22]

Bibliography

[1] J. Pyle, A. Savakis, and M. Lukowiak. FPGA implementation of a lossless to lossy bitonal image compression system. 2008 15th International Conference on Mixed Design of Integrated Circuits and Systems, June 2008, pp. 563-566, June 19, 2008.

[2] Tinku Acharya and Ping-Sing Tsai. JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures. Wiley, 2004, ISBN: 0-471-48422-9.

[3] M. Ahmadvand and A. Ezhdehakosh. A New Pipelined Architecture for JPEG2000 MQ-Coder. Proceedings of the World Congress on Engineering and Computer Science 2012 Vol II, WCECS 2012, October 24-26, 2012, San Francisco, USA.

[4] Alireza Aminlou (Member, IEEE), Hossein Badakhshannoory, M.R. Hashemi, and Omid Fatemi (Member, IEEE). A New Discrete Wavelet Transform Architecture with Minimum Resource Requirements. 2006 IEEE International Conference on Electro/Information Technology, May 2006, pp. 470-473.

[5] Charilaos Christopoulos (Senior Member, IEEE), Athanassios Skodras (Senior Member, IEEE), and Touradj Ebrahimi (Member, IEEE). The JPEG2000 still image coding system: an overview. IEEE Transactions on Consumer Electronics, Vol. 46, No. 4, November 2000.

[6] Yahoo Education. Facsimile definition according to Yahoo Education. 9 January, 2013.

[7] The Free Encyclopedia. Facsimile definition according to The Free Dictionary. 9 January, 2013.

[8] Charles G. Boncelet Jr. Block Arithmetic Coding for Markov Sources. IEEE Transactions on Information Theory, Sept. 1991.

[9] Maire D. Reavy (Member, IEEE) and Charles G. Boncelet (Member, IEEE). An algorithm for compression of bilevel images. IEEE Transactions on Image Processing, May 2001, Vol. 10(5), pp. 669-676, May 5, 2001.

[10] Huaqing Mao, Zhiwen Hu, Li Zhu, and Hang Qin. PNG File Decoding Optimization Based Embedded System. The Paper is partially supported by Project of Wenzhou Science & Technology Bureau (Grant No. S20110012) and the excellent scientists Programs of Hubei Province in China, September 2012.

[11] S. Medouakh and Z-E. Baarir (University of Mohamed Khider of Biskra, Algeria). Study of the Standard JPEG2000 in Image Compression. International Journal of Computer Applications, 2011, Vol. 18(1), p. 27, March 1, 2011.

[12] Mattias O’Nils, Muhammad Imran, Khurram Shahzad, Naeem Ahmad, Najeem Lawal, and Bengt Oelmann. Energy Efficient SRAM FPGA based Wireless Vision Sensor Node: SENTIOF-CAM. 20 January, 2013.

[13] Delores M. Etter, Robert W. Ives, and Thad B. Welch. Bit-Plane Compression of High Dynamic Range SAR Imagery. IEEE CCECE2002, May 2002, Vol. 1, pp. 347-352, November 3, 2002.

[14] S. Sahami and M.G. Shayesteh. Bi-level image compression technique using neural networks. IET Image Process., 2012, Vol. 6, Iss. 5, pp. 496-506, doi: 10.1049/iet-ipr.2011.0079, 2012.

[15] Andreas E. Savakis. Evaluation of Lossless Compression Methods for Gray Scale Document Images. Proceedings 2000 International Conference on Image Processing, Vol. 1, pp. 136-139, September 2000.

[16] David S. Taubman and Michael W. Marcellin. JPEG2000 Image Compression Fundamentals, Standards and Practice. Springer Science+Business Media, LLC, 2002, ISBN: 978-1-4613-5245-7.

[17] International Telecommunication Union (ITU). ITU-T (CCITT) T.6: Facsimile coding schemes and coding control functions for Group 4 facsimile apparatus, 6 December, 2012.

[18] http://opencores.org/project,bw_tiff_compression. ITU-G4 implementation download link at the opencores.org website, 9 January, 2013.

[19] http://www.swaton.ukfsn.org/docs/fpgalink/vhdl_paper.pdf. Block diagram of the fpgalink communication interface, 10 February, 2014.

[20] www.sigasi.com. The Sigasi VHDL Eclipse plugin. 11 February, 2014.

[21] K. Varma, A. E. Bell, H. B. Damecharia, and J. E. Carletta. A Fast JPEG2000 EBCOT TIER-1 Architecture that preserves coding efficiency. 2006 International Conference on Image Processing, Oct. 2006, pp. 3297-3300.

[22] Zhuo Wei, Zhong Shu, and YaJuan Xie. Image lossless compression and secure transmission system based on integer wavelet transform. Second International Conference on MultiMedia and Information Technology, September 2012.
