
Dissertation

Convergence Analysis of Distributed

Consensus Algorithms

ausgeführt zum Zwecke der Erlangung des akademischen Grades eines Doktors der technischen

Wissenschaften

eingereicht an der Technischen Universität Wien

Fakultät für Elektrotechnik

von

Ondrej Slučiak
Poštová 6,

SK–81106 Bratislava, Slowakei

geboren am 22. Jänner 1984 in Bratislava (SK)

Matrikelnummer: 0828844

Wien, im Juni 2013


Die Begutachtung dieser Arbeit erfolgte durch:

1. Univ.Prof. Dipl.-Ing. Dr.techn. Markus Rupp

Institute of Telecommunications

Technische Universität Wien

2. Assoz. Prof. Dipl.-Ing. Dr. Wilfried Gansterer, Privatdoz.

Research Group Theory and Applications of Algorithms

Universität Wien


Acknowledgements

First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance,

support and help I have received from him throughout all the years. I also thank him for the attitude

and freedom in my research, which I always appreciated and fully enjoyed.

I am also grateful to Prof. Wilfried Gansterer for his expert and detailed insight during our joint

work.

A very special thank you goes to Carolina del Socorro Reyes Membreno, without whom the years

spent at the Institute would not have been so beautiful and from whom I learnt more than from anyone

else. I enjoyed sharing every minute with her inside the office as well as outside.

I would also like to thank Ondrej Hlinka for many interesting and fruitful debates in the train

between Vienna and Bratislava, as well as on other occasions. The joint work that emerged from those debates has been very inspiring to me, and I am grateful for that.

I am also very grateful to Hana Straková for the brief moments of the “small but our” cooperation.

Financial support for this research came from the FWF under Awards S10609 and S10611 within

the National Research Network SISE and is gratefully acknowledged.

My thanks go to all my colleagues, friends, co-runners, and co-sailors, with whom I had the privilege to

spend a wonderful time, especially to Geetha, Arrate, Edhem, Thibault, Robert, Qi, Stefan, Markus,

Stefan, Florian, Gregor.

I also have to thank my dear Anička for her encouragement, support, and for being there for me

during the last months of writing my thesis.

Last but not least, I would like to thank my family for their support and for taking care of the important things while I was “fooling around” in Vienna.


Bez cieľa bežím Líščím údolím,
v studenom šere chytám ľahký krok.
Ešte za tamtú zákrutu,
ešte po tento roh –
neviem, kam až to vydržím.
Zrazu, rozmazaný, pohľadom cez pot,
blíži sa ku mne svietiaci dom starých priateľov.
A tak má môj beh cieľ.

Ivan Štrpka: Dežo Ursíny, Silvestrovský beh, album 4/4, 1983 (Opus).

Running without a goal through the Fox Valley,
in the cold dusk I’m catching a light pace.
Till that curve,
till this corner –
I don’t know till when I can hold.
Suddenly, blurry, looking through the sweat,
a shining house of my old friends is approaching me.
And thus, my run has a goal.


Abstract

Inspired by new emerging technologies and networks of devices with high collective computational power, I focus my work on distributed algorithms. While each device runs a relatively simple algorithm of low complexity, the group of interconnected units (agents) exhibits a behavior of high complexity. Typically, such units have their own memory and processing unit, are interconnected, and are capable of exchanging information with each other. More specifically, this work focuses on distributed consensus algorithms. Such algorithms allow the agents to coordinate their behaviour and to distributively find a common agreement (consensus). To understand and analyze their behaviour, it is necessary to analyze the convergence of the consensus algorithm, i.e., under which conditions the algorithm reaches a consensus and under which it does not. Naturally, the communication channel can change, and the agents may function asynchronously and improperly. All such errors may lead to severe problems and influence the reached consensus. Note that the target platform of the algorithms presented in this thesis is a wireless sensor network. Nevertheless, since the area of potential applications of distributed algorithms is, in general, much broader, the results of this thesis may also be applicable elsewhere, without significant changes. The work can be divided into two main parts.

First, I focus on the convergence analysis of distributed consensus algorithms. At the beginning, I review spectral graph theory and classical results on the convergence of the average consensus algorithm. Next, I propose a unifying framework for describing distributed algorithms. In this framework, I analyze the behaviour of a quantized consensus algorithm and show bounds on the quantization error. Furthermore, I discuss asynchronous consensus algorithms and derive necessary conditions for their convergence. Later on, I also derive bounds on the mixing weights of such asynchronous algorithms. The asynchronous consensus algorithms are analyzed using the concept of so-called “state-transition” matrices. The first part of the thesis is concluded with the analysis of a dynamic consensus algorithm, where I prove novel bounds on the convergence time and rate, and propose a generalized dynamic consensus algorithm.

In the second part, I propose two novel distributed algorithms which directly utilize the consensus algorithms discussed in the first part. The “likelihood consensus” algorithm allows to distributively compute a joint likelihood function, and thus to distributively solve statistical inference problems (e.g., target tracking). The distributed Gram-Schmidt orthogonalization algorithm, on the other hand, can find a set of orthogonal vectors from general vectors stored at the nodes. As an application of the orthogonalization algorithm, I further propose algorithms for estimating the size of a network.


Kurzfassung

Motiviert durch eine Vielzahl neuartiger Technologien und Netze aus Geräten mit hoher kollektiver Leistungsfähigkeit, konzentriere ich mich in der vorliegenden Arbeit auf die Probleme verteilter Algorithmen. Während die einzelnen Teilnehmer dieser Netze nur sehr beschränkte Fähigkeiten aufweisen, wird die hohe Gesamtleistungsfähigkeit durch die Gruppe bestimmt. Üblicherweise verfügen die Geräte über einen eigenen Speicher und eine Prozessoreinheit und können über ihre Verbindungen Informationen austauschen. Genau genommen konzentriert sich die vorliegende Arbeit auf verteilte Konsens-Algorithmen. Diese Algorithmen erlauben es den Teilnehmern, ihr Verhalten zu koordinieren und einen gemeinschaftlichen Konsens zu finden. Um ihr Verhalten besser zu verstehen, ist es notwendig, das Konvergenzverhalten der Algorithmen zu analysieren, das heißt, unter welchen Umständen die Teilnehmer des Netzes einen Konsens erzielen und unter welchen nicht. Es ist dabei natürlich anzunehmen, dass die Verbindungen sich über die Zeit verändern, die Teilnehmer asynchron kommunizieren und Einzelne sich möglicherweise auch fehlerhaft verhalten. All diese Fehler können zu ernsthaften Problemen führen und den Konsens beeinflussen. Auch wenn das betrachtete Netz ein Funksensornetz darstellt, ist die Anwendung der verteilten Algorithmen viel breiter; die erzielten Ergebnisse mögen auch auf andere Gebiete übertragbar sein. Die vorliegende Arbeit kann in zwei Abschnitte geteilt werden.

Zum einen konzentriere ich mich auf die Konvergenzanalyse verteilter Konsens-Algorithmen. Ich beginne mit der spektralen Graphentheorie und präsentiere klassische Konvergenzergebnisse des mittelnden Konsens-Algorithmus. Als Nächstes schlage ich eine allgemeine Methode zur Beschreibung verteilter Algorithmen vor. Mit dieser Methode analysiere ich das Verhalten quantisierter Konsens-Algorithmen und leite Grenzen des Quantisierungsfehlers her. Weiters diskutiere ich asynchrone Methoden und leite notwendige Konvergenzbedingungen her. Später gebe ich auch Grenzen der Gewichte solcher asynchronen Algorithmen an. Die asynchronen Algorithmen werden mit Hilfe sogenannter Zustandsmatrizen analysiert. Der erste Teil der vorliegenden Arbeit wird mit der Analyse des dynamischen Konsens-Algorithmus abgeschlossen, wobei ich neuartige Grenzen zur Konvergenzrate und Konvergenzzeit aufzeige und einen allgemeinen dynamischen Konsens-Algorithmus vorschlage.

Im zweiten Teil der Arbeit schlage ich zwei neue verteilte Algorithmen vor, die auf den Erkenntnissen des ersten Teils beruhen. Der sogenannte “Likelihood Consensus”-Algorithmus erlaubt es, eine Joint-Likelihood-Funktion verteilt zu berechnen und damit statistische Inferenz-Methoden (beispielsweise zur Zielnachführung) zu berechnen. Der verteilte Gram-Schmidt-Orthogonalisierungs-Algorithmus ist in der Lage, aus einer Menge von Vektoren, die auf den Knoten des Netzes verteilt vorliegt, einen Satz orthogonaler Vektoren zu finden. Eine Anwendung hierzu besteht in der Schätzung der Netzgröße, wie weiter ausgeführt wird.


Contents

1 Introduction
  1.1 Outline and Contributions
  1.2 Notation
    1.2.1 General notation
    1.2.2 Network notation

2 Convergence of Distributed Consensus Algorithms
  2.1 Average consensus algorithm
    2.1.1 Convergence of the average consensus algorithm
    2.1.2 Selection of weight matrix W
  2.2 General framework for describing distributed algorithms
    2.2.1 Homogeneously distributed algorithms
    2.2.2 Linear HDA
    2.2.3 Examples
  2.3 Convergence of quantized consensus algorithms
    2.3.1 Simulations
    2.3.2 Steady state of Censi’s algorithm
    2.3.3 Bounds on the drift from the mean
  2.4 Analysis of convergence: Algebraic approach
    2.4.1 Asynchronous model
    2.4.2 State update analysis
    2.4.3 Convergence analysis based on state matrix S
    2.4.4 When a node fails
  2.5 Almost sure convergence: Relaxed projection mappings
    2.5.1 Concept of projections
    2.5.2 Application on various consensus algorithms
    2.5.3 Simulations
  2.6 Dynamic average consensus algorithm
    2.6.1 Definition and proof of convergence
    2.6.2 Bounds on convergence time and rate
    2.6.3 Dynamic consensus algorithm with memory
  2.7 Conclusion

3 Likelihood Consensus
  3.1 System model and sequential Bayesian estimation
  3.2 Approximation of the joint likelihood function
    3.2.1 Least squares approximation
  3.3 Likelihood consensus
    3.3.1 Distributed calculation of the approximate JLF – the LC algorithm
    3.3.2 Distributed calculation of the exact JLF
  3.4 Sequential likelihood consensus
    3.4.1 The SLC algorithm
  3.5 Distributed target tracking using likelihood consensus
    3.5.1 LC-based distributed particle filter
    3.5.2 Communication requirements
    3.5.3 Simulations
  3.6 Conclusion

4 Distributed Gram-Schmidt Orthogonalization
  4.1 Gram-Schmidt orthogonalization
    4.1.1 Existing distributed methods
  4.2 DC–CGS
  4.3 Simulations
    4.3.1 Orthogonality and factorization error
    4.3.2 Initial data distribution
    4.3.3 Numerical sensitivity
    4.3.4 Robustness to link failures
    4.3.5 Performance comparison with existing algorithms
  4.4 Distributed network size estimation
    4.4.1 Determining network size
    4.4.2 Distributed estimation of network size
    4.4.3 Simulations
  4.5 Conclusion

5 Conclusions and Outlook
  5.1 Outlook and future research

A Diffusion and Average Consensus

B Spectral Norm
  B.1 Norm
    B.1.1 Induced norm
    B.1.2 Matrix norm
  B.2 Spectral norm

C Perron-Frobenius Theorem

D Particle Filter

Bibliography


Chapter 1

Introduction

Distributed computing environments are studied in the literature in many, often very different, research areas and from various viewpoints. From the point of view of information technology, hardware design, and computer networks, a distributed system is usually considered to be a system of independently working interconnected computers, computer clusters, or peer-to-peer networks, with independent memory and interprocess communication with messages passed between the processes. Furthermore, the processes must interact with one another to meet a common goal [1, 2].

Curiously, the tendency to find a common goal in a distributed fashion can be observed in nature in many various scenarios [3–7] (Fig. 1.1) and seems to be an integral part of the physical processes of our world (see Appendix A).

Figure 1.1: Examples of distributed algorithms in nature. Panels: a bird swarm avoiding a predator (source: http://disruptivethinkers.blogspot.sk/2012/01/swarms-you-can-run-but-you-cant-hide.html); skydivers creating a formation (source: http://yournews.cbc.ca/mediadetail/7255901-New-Canadian-Record-for-larges, photo from the Parachutisme Nouvel Air Facebook page); diffusion (source: Janka Marková, www.jancaphotography.weebly.com).

Distributed algorithms describe and determine the behaviour of all such networks which interconnect independently working units – animals, humans, atoms, computers. Also, one of the areas of computer science and hardware platforms where distributed algorithms are essential and necessary for proper functioning are Wireless Sensor Networks (WSNs) [8–11]. Wireless sensors are typically battery-equipped devices which monitor physical quantities such as temperature, light, sound, and pressure. They usually have a short lifetime, low computational power, and a limited communication range. Throughout my thesis, I aim especially at wireless sensor networks as a potential target hardware platform of the proposed distributed algorithms.


Figure 1.2: Memsic© wireless sensor. Frequency band: 2405 MHz – 2480 MHz; Flash memory: 128 kB; Transmit data rate: 250 kbit/s; RF power: 2 mW; Operating system: TinyOS.

Therefore, I often refer to the working units as “sensors” (see Fig. 1.2). However, with the recent advances in hardware technology, the increased popularity of smart mobile phones equipped with many types of sensors, and future Network-on-Chip paradigms, distributed algorithms have become even more necessary and in demand. The findings of this work may be similarly applicable in such applications. Therefore, the term “sensor” does not strictly represent only a sensor (node) in a wireless sensor network, but I use the term interchangeably, meaning any independently working agent/platform.

Since distributed algorithms work on some underlying communication network which defines the connections between the sensors (nodes), the topology influences the properties of such algorithms. From a topology point of view, there exist two basic approaches:

centralized – sensors measure and collect information from the environment, and then send the data to a central (fusion) node. The fusion center may control the sensors and process the measured data. Intuitively, the underlying communication topology is a star, and thus there is a central point of failure. Though this is disadvantageous, the sensors may be very simple, by default asynchronous, and the algorithms do not have to be very sophisticated. Thus, this approach may be effective in some scenarios.

decentralized (fully distributed) – each sensor (node) has its own processing capability, and the sensors may often be mutually exchanged (unless some heterogeneous network is considered). There is no single point of failure, and by adding and removing nodes in the network, this approach is fully scalable. However, the design of algorithms may be complicated, and synchronicity is often required.

In my thesis, I will distinguish two approaches to the synchronization of nodes in a network:

asynchronous – each node has its own local clock, which determines the sending and receiving of messages, computation cycles, and other operations. Such an approach is more robust, but it is more difficult to design reliable algorithms for it. A typical example of asynchronous distributed algorithms are the so-called gossip (epidemic) algorithms [12].

synchronous – all nodes work according to a global clock. This clock can either be set and maintained by some synchronizing node, or the nodes are mutually synchronized by themselves. Algorithms assuming such behaviour are typically easier to analyze, but they are very sensitive to any asynchronicity. A typical example of synchronous distributed algorithms are the so-called consensus algorithms [13].

Usually, distributed aggregation algorithms (gossip/consensus) serve as basic building blocks for more sophisticated algorithms, e.g., load balancing [14], clock synchronization [15], target tracking [16], subspace estimation [17], and radio resource allocation [18].

To summarize, when considering distributed algorithms, it is necessary to keep in mind that they possess many advantages, but also disadvantages. Typical advantages of distributed processing include:

• large distributed computational power,

• fault tolerance/robustness – the system is capable of replacing the functionality of erroneous nodes with functioning nodes,

• scalability,

• economic efficiency.

On the other hand, disadvantages often are:

• limited capabilities of nodes,

• limited transmission capacity,

• often heterogeneous, dynamic networks,

• complicated implementation and analysis of algorithms (e.g., data flow).

In what follows, my thesis focuses mostly on fully distributed algorithms with synchronously working wireless sensors. Also, as briefly mentioned at the beginning, those distributed algorithms which converge to a common goal – a consensus – will be the main topic of this thesis.

1.1 Outline and Contributions

This thesis is divided into five chapters, with four additional appendices. The thesis may be further divided into two main logical parts. The first part mostly deals with consensus algorithms and their convergence analysis, while the second part presents the applications of the consensus algorithms in more sophisticated distributed algorithms.

The outline of the thesis together with the main contributions is listed below. I also provide references to my previously published work on which the chapters are based.

• Chapter 1 is an introductory chapter, where I introduce the topic, mention my contribution and motivation, as well as define the notation used in the thesis.

• Chapter 2 deals with the convergence analysis of consensus algorithms. First, I recall the classical spectral-graph-theory approach in Sec. 2.1. In the subsequent Section 2.2 of the chapter, I provide a novel framework for encompassing any distributed algorithm. In this framework I define a class of homogeneously distributed algorithms and show that the consensus algorithms follow this formalism. This later allows to provide bounds (tighter than previously known) for a quantized consensus algorithm in Sec. 2.3. (This part is based on the previous publications [19, 20].)

A discussion of the convergence of asynchronous consensus algorithms in terms of linear spaces is provided in the subsequent Section 2.4. By defining a so-called state-transition matrix, I provide a novel insight into the necessary convergence conditions in the case of asynchronous consensus algorithms. (This part is based on the previous publication [21].)

To provide further conditions on the mixing weights of consensus algorithms, especially the asynchronous ones and those with random transmissions, I use the convergence conditions of the so-called relaxed projection mapping algorithms, and find the mixing weights corresponding to the underrelaxed and overrelaxed cases (see Sec. 2.5). (This part is based on the previous publication [22].)

In the last Section 2.6 of Chapter 2, I discuss a different class of consensus algorithms, the so-called distributed dynamic average consensus algorithms. Here, I provide novel bounds on the convergence time and rate of such algorithms, and propose a generalization of the algorithm. (This part is based on the previous publications [23, 24].)

• Chapter 3 presents a novel likelihood consensus algorithm which is capable of distributively computing a joint likelihood function in statistical inference problems. By approximating the joint likelihood function in some linear basis (see Sec. 3.2), it is possible to perform an average consensus algorithm on the coefficients of this approximation. This leads to the global knowledge of the joint likelihood function at each node – the likelihood consensus algorithm (see Sec. 3.3). If the coefficients vary slowly, a sequential calculation of the coefficients using the dynamic consensus algorithm is possible (see Sec. 3.4). This global knowledge, in turn, allows to run more sophisticated statistical inference algorithms, e.g., target tracking (see Sec. 3.5), which outperform other state-of-the-art algorithms in many regards. (This part is based on the previous publications [23, 25–27].)

• In Chapter 4, I present a novel distributed Gram-Schmidt orthogonalization algorithm, which again utilizes the dynamic consensus algorithm. By rewriting the classical Gram-Schmidt orthogonalization in an iterative manner, it is possible to compute all orthogonal columns and projection coefficients simultaneously (see Sec. 4.2). I provide the convergence proof as well as extensive simulations in Sec. 4.3. I further apply this algorithm to the distributed estimation of the number of nodes in a network, without any a priori knowledge (see Sec. 4.4). (This part is based on the previous publications [24, 28, 29].)

• Chapter 5 rounds off my thesis with conclusions and a future outlook.

This thesis also contains four appendices which provide additional information on the topic:

• Appendix A proves the connection between the physical diffusion process and distributed consensus algorithms.

• Appendix B provides the terminology and definitions of vector and matrix norms, more specifically aiming at the spectral norm of matrices, which is the essential term in the convergence analysis of consensus algorithms.

• Appendix C reviews the Perron-Frobenius theorem for nonnegative matrices, which is the main theorem upon which the classical spectral-graph convergence analysis of distributed consensus algorithms is based.

• Appendix D briefly explains the concept of particle filters, which provide a Monte Carlo approximation of sequential Bayesian state estimation. Particle filters are used in Chapter 3 in the target tracking application based on the likelihood consensus algorithms.

List of publications:

O. Slučiak, T. Hilaire, and M. Rupp. A general formalism for the analysis of distributed algorithms. In Proc. of the 35th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 2890–2893, Dallas, USA, Mar. 2010

O. Slučiak and M. Rupp. Steady-state analysis of a quantized average consensus algorithm using state-space description. In Proc. of European Signal Processing Conference (EUSIPCO), pages 199–203, Aalborg, Denmark, Aug. 2010

O. Hlinka, O. Slučiak, F. Hlawatsch, P. M. Djuric, and M. Rupp. Likelihood consensus: Principles and application to distributed particle filtering. In Rec. of the 44th Asilomar Conf. on Signals, Systems, and Computers, pages 349–353, Pacific Grove, CA, USA, Nov. 2010

O. Slučiak and M. Rupp. Reaching consensus in asynchronous WSNs: Algebraic approach. In Proc. of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3300–3303, Prague, Czech Rep., May 2011

O. Hlinka, O. Slučiak, F. Hlawatsch, P. M. Djuric, and M. Rupp. Distributed Gaussian particle filtering using likelihood consensus. In Proc. of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3756–3759, Prague, Czech Rep., May 2011

O. Slučiak, O. Hlinka, M. Rupp, F. Hlawatsch, and P. M. Djuric. Sequential likelihood consensus and its application to distributed particle filtering with reduced communications and latency. In Rec. of the 45th Asilomar Conf. on Signals, Systems, and Computers, pages 1766–1770, Pacific Grove, CA, USA, Nov. 2011

O. Slučiak and M. Rupp. Almost sure convergence of consensus algorithms by relaxed projection mappings. In Proc. of Statistical Signal Processing Workshop (SSP), pages 632–635, Ann Arbor, MI, USA, Aug. 2012

O. Hlinka, O. Slučiak, F. Hlawatsch, P. M. Djuric, and M. Rupp. Likelihood consensus and its application to distributed particle filtering. IEEE Trans. on Signal Processing, 60(8):4334–4349, Aug. 2012

O. Slučiak, H. Straková, M. Rupp, and W. N. Gansterer. Distributed Gram-Schmidt orthogonalization based on dynamic consensus. In Rec. of the 46th Asilomar Conf. on Signals, Systems, and Computers, pages 1207–1211, Pacific Grove, CA, USA, Nov. 2012

O. Slučiak and M. Rupp. Network Size Estimation using Distributed Orthogonalization. IEEE Signal Processing Letters, 20(4):347–350, Apr. 2013

O. Slučiak, H. Straková, M. Rupp, and W. N. Gansterer. Dynamic average consensus and distributed orthogonalization. IEEE Trans. on Signal Processing, 2013. (submitted; [Online] Available: http://publik.tuwien.ac.at/files/PubDat 216777.pdf)

1.2 Notation

In this section, I introduce the notation which will be used throughout this thesis.

1.2.1 General notation

Throughout the thesis, I will use the following general notation.

Scalars are denoted by italic lowercase letters, e.g., x. Special letters are reserved for the notion of time – k – and the node index – i. Next, boldface lowercase letters denote column vectors, e.g., x, and boldface uppercase letters denote matrices, e.g., X. The operation “◦” denotes an element-wise multiplication of two vectors, i.e., x ◦ y ≡ {x_i y_i}, ∀i, or matrices, i.e., X ◦ Y ≡ {x_{i,j} y_{i,j}}, ∀i, j; the operation “⊗” denotes the Kronecker product, i.e.,

    A \otimes B \equiv \begin{pmatrix} a_{1,1}B & a_{1,2}B & \dots & a_{1,m}B \\ a_{2,1}B & a_{2,2}B & \dots & a_{2,m}B \\ \vdots & & \ddots & \vdots \\ a_{n,1}B & a_{n,2}B & \dots & a_{n,m}B \end{pmatrix}.

Moreover, especially in Chapter 4, the notation x_j denotes the j-th column vector of a matrix X. Furthermore, the operation [·]_· “picks” an element from a vector or matrix, i.e., [x_j]_i denotes the i-th element of the j-th column vector of a matrix X, i.e., [x_j]_i ≡ [X]_{i,j} ≡ x_{i,j}.

A vector of all zeros is denoted as 0 and a vector of all ones as 1 (the size should be clear from the context). The matrix I_N represents the identity matrix of size N (ones on the diagonal), and a matrix of all ones will be denoted as 11^⊤, i.e., 11^⊤ ≡ 1 ⊗ 1^⊤.

The operation |·| represents the cardinality of a set (or, as usual, the absolute value of a scalar). The operation ‖·‖ represents the appropriate norm (see Appendix B).
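For readers who prefer to see these conventions operationally, the element-wise product, the Kronecker product, and the element-picking operation map directly onto standard NumPy operations; the following small sketch (the example arrays are arbitrary) illustrates them:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
    B = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 5.0, 6.0])

    elementwise = x * y       # x ◦ y: element-wise product of two vectors
    hadamard = A * B          # A ◦ B: element-wise product of two matrices
    kron = np.kron(A, B)      # A ⊗ B: Kronecker product, here of size 4 x 9
    x_j = A[:, 1]             # x_j: the j-th column vector of A (0-based index)
    entry = A[0, 1]           # [x_j]_i ≡ [A]_{i,j}: element picking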

1.2.2 Network notation

A network is, in general, represented by a directed graph G(k) = (V(k), E(k)), where V(k) is a set of |V(k)| vertices (nodes) and E(k) is a set of |E(k)| edges (links), which are two-element subsets of V(k), i.e., an edge is associated with two ordered vertices (e = (u, v) ∈ E(k); u, v ∈ V(k)). Typically, the graph is supposed to have no self-loops (e = (u, u) ∉ E(k)) and each element of E(k) is unique.

In my thesis, I will mostly restrict myself to static networks, i.e., G(k) = (V(k), E(k)) is constant. Nevertheless, e.g., in Sections 2.4.3 and 2.5, and where stated, I will also consider dynamic or time-varying networks, meaning that G(k) can be time-dependent, which models the cases of link and node failures, rewiring of nodes, or additions of nodes in the network. According to the usual practice, I will denote |V| as N.


Furthermore, Pred(v) and Succ(v) denote the sets of all preceding and succeeding vertices of node v, i.e.,

    Pred(v) \triangleq \{ u \in V \mid \exists\, e = (u, v) \in E \},    (1.1)
    Succ(v) \triangleq \{ w \in V \mid \exists\, e = (v, w) \in E \}.    (1.2)

In the case of an undirected graph, the notion of neighborhood stands for the set of vertices

    N_v = \{ u \in V \mid (u, v) \in E \lor (v, u) \in E \} \equiv Pred(v) \cup Succ(v).

I further associate the graph G with a numbering scheme (i.e., a bijection that associates each node with a unique number i ∈ {1, . . . , N}), so that a node can serve as an index of a matrix. Then it is possible to define the ingoing-adjacency matrix of G, denoted A_G ∈ {0, 1}^{N×N}:

    [A_G]_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{if } (i, j) \notin E. \end{cases}    (1.3)

Since, in most parts of this thesis, the considered network (graph) does not change over time and has all links bidirectional, the adjacency matrix is simply denoted as A.

Furthermore, the in-degree diagonal matrix of G, denoted D_G ∈ R^{N×N}, is defined by:

    [D_G]_{i,j} = \begin{cases} |Pred(i)| & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}    (1.4)

In the case of an undirected static graph, the notation simplifies to D.

Here, the degree (valency) d_i of a node i is equal to the number of its neighbors, i.e., d_i ≡ |N_i|. Furthermore, the Laplacian matrix L is defined as L = D − A [30]. The Laplacian has the following relevant properties (among others):

• L is always positive semidefinite (all eigenvalues are larger than or equal to zero),

• L = M^⊤M, where M is an incidence matrix describing the connections between the vertices and edges,

• L1 = 0, i.e., the matrix L has at least one eigenvalue equal to 0 with corresponding eigenvector 1,

• the smallest non-zero eigenvalue defines the spectral gap (see Sec. 2.1.1),

• the number of zero eigenvalues corresponds to the number of connected components in the graph, i.e., if there is only one eigenvalue equal to zero, the graph is strongly connected.

A strongly connected graph is a graph where for any two vertices i, j there is a directed path between the vertices at each time instant. A weakly connected graph is a graph where for any two vertices i, j there is a directed path between the vertices at least once during the time iterations k. A disconnected graph is a graph with some vertices i, j for which there is never a path connecting them.
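The adjacency matrix, the degree matrix, and the Laplacian, together with the properties listed above, are easy to verify numerically. The following sketch (an arbitrary 5-node undirected graph, chosen only for illustration) builds L = D − A and checks the listed properties:

    import numpy as np

    # Arbitrary undirected 5-node graph given by its adjacency matrix A (Eq. (1.3)).
    A = np.array([[0, 1, 0, 0, 1],
                  [1, 0, 1, 0, 0],
                  [0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0]], dtype=float)
    D = np.diag(A.sum(axis=1))     # degree matrix, Eq. (1.4)
    L = D - A                      # Laplacian matrix

    mu = np.sort(np.linalg.eigvalsh(L))
    print(np.allclose(L @ np.ones(5), 0))   # L1 = 0
    print(np.all(mu >= -1e-12))             # positive semidefinite
    print(np.sum(np.isclose(mu, 0)))        # one zero eigenvalue -> graph is connected
    print(mu[1])                            # algebraic connectivity (spectral gap)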


In this thesis I often consider the following network topologies:

fully connected (complete) – a network topology where all nodes are directly connected to all other nodes.

star – a network topology where all nodes are connected only to one central node. This topology models a network with a fusion center.

geometric – a network topology where each (randomly deployed) node is connected to all nodes within some radius. This topology models a wireless sensor network.

grid – a network topology where nodes are deployed on a (jittered) grid and communicate with neighbors within some radius, similarly to the geometric topology.

regular – a network topology where each node has the exact same number of neighbors (all nodes have the same degree).

ring – a special case of the regular topology, where each node has degree 2.

random – a topology with randomly deployed nodes (see Fig. 1.3).

Figure 1.3: Example of a communication network.
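For the simulations referenced throughout the thesis, such topologies have to be generated. As one example, a geometric topology can be sampled as a random geometric graph; the sketch below (the function name, node count, and radius are arbitrary choices) returns its adjacency matrix:

    import numpy as np

    def geometric_topology(N, radius, seed=0):
        """Random geometric graph: N nodes dropped uniformly in the unit square,
        connected whenever their distance is below `radius` (models a WSN)."""
        rng = np.random.default_rng(seed)
        pos = rng.random((N, 2))
        dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
        A = ((dist <= radius) & (dist > 0)).astype(float)   # no self-loops
        return A, pos

    A, pos = geometric_topology(N=20, radius=0.35)
    print(A.sum(axis=1).min(), A.sum(axis=1).max())   # node degrees vary across the network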


Chapter 2

Convergence of Distributed Consensus Algorithms

Distributed computing environments, such as ad-hoc WSNs and computer clusters, provide many advantages over a centralized processing solution, larger computation power and robustness against node failures being examples [1, 2].

However, comparing ad-hoc wireless sensor networks with computer clusters, it is clear that the first-mentioned solution is much more difficult to implement reliably. Since the nodes in sensor networks usually have no information about the whole network topology and have only a small computation power, and the communication links are of limited bandwidth, proper distributed algorithms which always converge to the desired result need to be designed carefully.

Well-known algorithms that solve problems of distributed agreement are the so-called gossip-based [12, 31, 32] and consensus-based [3, 13, 33] algorithms, or other diverse (combined) consensus approaches [34, 35]. Although they all compute an agreement in a network in a fully distributed fashion, they differ in their approaches. While in the gossip (epidemic) algorithms, as the name suggests, the information is spread from node to node (hop-by-hop) randomly and asynchronously, the consensus algorithms are assumed to broadcast the information in a synchronous way. Since the main focus of this work is the consensus algorithms, I first provide a description of the average consensus algorithm in Sec. 2.1, which will be referenced many times throughout this thesis.

As mentioned, since the consensus and gossip algorithms have the same goal, i.e., to compute an agreement, it is of interest to describe such algorithms in a unifying framework. Also, as I will show later, these algorithms serve very well as building blocks for more sophisticated algorithms [17, 24, 27]. Some common “language”, which encompasses and helps to analyze distributed algorithms in general, is therefore also of interest (see Sec. 2.2).

Moving from idealized “theoretical” algorithms to their “real-world” implementations, one can observe that, typically, all distributed algorithms assume infinite-precision computation in every node and unlimited bandwidth on the communication links. However, as mentioned before, in many real-world implementations these assumptions do not hold and additional errors are introduced. Such errors are caused by the following constraints:

• Quantized measurements – sensors have limited sensing capabilities [36, 37].

• Quantized computation – sensors have limited computation precision, usually with fixed-point representation [38].

• Quantized communication links – sensors are able to transmit only a few bits due to power constraints and limited bandwidth [39, 40].

All these perturbations caused by quantization can dramatically change the accuracy, convergence speed, and stability of the distributed algorithms.

It is therefore of interest to investigate the impact of these imperfections on the performance of distributed algorithms. In order to propose solutions that are more robust and to also study the impact of the implementation in depth, I introduce a unifying description of distributed algorithms (Sec. 2.2) which allows to give answers to such questions as how the quantization influences the steady state of an algorithm (Sec. 2.3).

Also, a typical simplification in the modeling of a WSN [13, 19, 41] and in the design of distributed algorithms [17, 25, 26] is that at a certain time instant t_k all N nodes of the WSN transmit synchronously, so that a specific node i receives new information from all its neighbors in the neighborhood N_i, i.e.,

    x_i(t_k) = f\big( x_{N_i}(t_{k-1}) \big), \quad \text{for } i = 1, 2, \dots, N.    (2.1)

The notation in Eq. (2.1) indicates that the state x_i of node i is changed by the state information collected in a vector x_{N_i} from its neighborhood N_i. First of all, the sensor nodes are typically low-energy devices that need to be woken up only if they are in need of operation. This makes it rather difficult to synchronize the communication between them. On the other hand, even if such synchronization has been established [15, 42–44], concurrent transmission of all nodes would lead to channel congestion and communication might not be possible. Also note that typically nodes do not direct their transmission to a specific node, but rather transmit radially to all the nodes in their neighborhood. Thus, in a small fraction of time, a lot of channel resources or network capacity is required, while at all other times no capacity would be needed. Alternatively, the instant t_k may represent a time epoch in which all nodes have to transmit one after the other, thus avoiding interference and utilizing spectrum resources more efficiently. Then, after the operation of the last node is completed, the update may take place. This, however, requires careful synchronization [15, 42] or a controlling base station [10], which is exactly what is preferable to avoid. Note that the notation for global time, t_k, is changed to a simple event counter k, as the time itself is of no further importance.

algorithm based on the classical Perron-Frobenius theorem (see App. C), an alternative viewon the convergence of consensus algorithms is provided in Sec. 2.4 and Sec. 2.5. The linear-algebra viewpoint in Sec. 2.4 allows to provide necessary conditions for convergence of consensusalgorithms even in asynchronous scenarios. Furthermore, in Sec. 2.5 I show the similaritiesbetween consensus algorithms and so-called projection algorithms, and by mapping one toanother I provide bounds on the mixing parameters.Although the consensus algorithms may be useful in applications which are aimed to com-

pute an agreement of a static, invariant measure, applications which require computation ofa consensus of time-varying observations may be even more challenging. Such tasks can besolved by so-called dynamic average consensus algorithms, which I discuss in Sec. 2.6 and whereI also provide bounds on convergence time and rate of such algorithms.


2.1 Average consensus algorithm

Distributed consensus algorithms appear in nature and human interactions in many various forms, each time a group of independently existing units solves a common problem or behaves in groups, e.g., [3–7]. Such units typically have knowledge only about their nearest neighbors and have no global information about the group. Based on the behaviour of their neighbors, they adapt and update their own state (e.g., position, speed). Moreover, the algorithm is called an average consensus algorithm if the units reach some common average value of all local states. Most commonly, such behaviour is observed in everyday life each time some substances or liquids are mixed or heated, and the matter stabilizes at some average consistency, color, or temperature. Physically, this process is described by the heat/diffusion differential equation, which, as shown in Appendix A, can be interpreted as an average consensus algorithm, i.e., basically the same algorithm which describes the behaviour of birds and bees in swarms.

From a mathematical point of view, the distributed average consensus algorithm can be formulated as a difference equation, i.e.,

    x_i(k) = \sum_{j=1}^{N} [W(k)]_{i,j} \, x_j(k-1), \quad \forall i = 1, 2, \dots, N,    (2.2)

or, more compactly (globally),

    x(k) = W(k)\, x(k-1),    (2.3)

with some x(0) ∈ C^{N×1}, and where x(k) = (x_1(k), x_2(k), . . . , x_N(k)) is the vector of all stacked local states x_i(k) at iteration k, and W(k) ∈ R^{N×N} is a matrix describing the network (environment). The properties of this matrix influence the convergence of the algorithm, i.e., whether the limit lim_{k→∞} x(k) exists, and also the speed of convergence, i.e., how fast such a limit is reached.

The distributed linear¹ average consensus algorithm can then be formalized as follows:

Algorithm 2.1: Distributed Linear Average Consensus Algorithm

At time k = 0, each node i (i = 1, 2, . . . , N) measures (stores) a scalar value x_i(0).
For time k = 1, 2, . . . :

1. Each node i receives data (scalars) from its neighbors, i.e., {x_j(k−1)}_{j∈N_i}.

2. Node i then multiplies all data received from its neighbors j ∈ N_i with the appropriate weights [W(k)]_{i,j}. Its own local data are weighted by the weight [W(k)]_{i,i}. The selection of the weights is discussed in Sec. 2.1.2.

3. Each node i then combines the weighted received data with its own data, i.e.,

    x_i(k) = [W(k)]_{i,i}\, x_i(k-1) + \sum_{j \in N_i} [W(k)]_{i,j}\, x_j(k-1).

4. Node i broadcasts the data xi(k).

¹ Note that Eq. (2.2) is a linear combination of the received and the stored values.
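To see Algorithm 2.1 in action, it can be simulated directly from the local per-node perspective. The following sketch is purely illustrative: it assumes a small 4-node path graph, the constant weight model discussed in Sec. 2.1.2, and arbitrary initial measurements:

    import numpy as np

    # Hypothetical 4-node path graph 0 - 1 - 2 - 3 and its neighborhoods N_i.
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    N = len(neighbors)

    # Constant weight model (Sec. 2.1.2) with eps < 1/Delta, here Delta = 2.
    eps = 0.4
    W = np.zeros((N, N))
    for i, nbrs in neighbors.items():
        for j in nbrs:
            W[i, j] = eps
        W[i, i] = 1.0 - eps * len(nbrs)

    def consensus_step(x, W, neighbors):
        """One synchronous round of Algorithm 2.1, written node by node."""
        x_new = np.empty_like(x)
        for i, nbrs in neighbors.items():
            # steps 1-3: weight the received neighbor data and combine with own data
            x_new[i] = W[i, i] * x[i] + sum(W[i, j] * x[j] for j in nbrs)
        return x_new

    x = np.array([1.0, 5.0, 3.0, 7.0])       # initial measurements x_i(0), mean = 4.0
    for k in range(200):
        x = consensus_step(x, W, neighbors)   # step 4: broadcast, then repeat
    print(x)                                  # every entry is close to 4.0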


In the following section, I discuss the convergence of the average consensus algorithm in the classical (spectral graph theory) sense and show possible selections of the weight matrix W(k) which I use later throughout this thesis. Since my work focuses mostly on the convergence in static, invariant networks, I also consider the matrix W(k) to be invariant over the iterations, i.e., if not explicitly mentioned, from now on W(k) = W.

2.1.1 Convergence of the average consensus algorithm

As mentioned, the convergence of the algorithm in Eq. (2.3) for constant W means that the limit lim_{k→∞} x(k) exists for all x(0) < ∞, i.e.,

    \lim_{k \to \infty} x(k) = \lim_{k \to \infty} W^k x(0) = x^{\star},    (2.4)

and therefore the convergence of the algorithm depends only on the properties of the matrix W. If the limit lim_{k→∞} W^k exists, then the algorithm converges. Here, I provide the classical proof of the convergence of the average consensus algorithm, based on the results of the so-called Perron-Frobenius theorem (see App. C) and spectral graph theory.

Proof. Performing the eigendecomposition [45] of the matrix W, i.e.,

    W = U \Lambda U^{-1},    (2.5)

where U is the matrix of eigenvectors and Λ is the (diagonal) matrix of eigenvalues, leads to

    W^k = U \Lambda^k U^{-1},    (2.6)

thus the eigenvalues Λ = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_N) influence the convergence and the whole behaviour of the average consensus algorithm.

Moreover, it can be concluded that the absolute value of the largest eigenvalue must not be bigger than 1 for the limit lim_{k→∞} Λ^k to exist. Since the matrix W describes the topology of the underlying communication network, there is yet another constraint, i.e., it must be based on the adjacency matrix (Eq. (1.3)) which describes the connections between the nodes.

Furthermore, from spectral graph theory [46, 47] it is known that the so-called Laplacian matrix L (see Introduction Sec. 1.2.2) possesses the following desired properties:

1. It is positive semidefinite, i.e., the eigenvalues of the Laplacian matrix L are µ_i ≥ 0, i = 1, 2, . . . , N.

2. For undirected strongly connected graphs, the rows and columns sum up to zero, i.e., L1 = 0, 1^⊤L = 0^⊤.

Moreover, from the Perron-Frobenius theorem (see App. C), which states (among others) that for any irreducible matrix [45]:

1. there is a unique real-valued vector u > 0 such that Wu = λ_1 u,

2. λ_1 has geometric and algebraic multiplicity 1,

3. lim_{k→∞} \frac{1}{\lambda_1^k} W^k = u v^⊤, where u and v are the right and left eigenvectors corresponding to λ_1 such that v^⊤u = 1,


Figure 2.1: Example of the eigenvalues λ_i of a ring topology graph for various parameters ε (eigenvalues shown in the complex plane, Re z vs. Im z, for ε = 0.5/∆, ε = 0.75/∆, and the worst case ε = 1/∆).

it can be concluded that a matrix W (sometimes called a Perron matrix [13]) of the form

    W = I - \varepsilon L, \quad 0 < \varepsilon < \frac{1}{\Delta},    (2.7)

satisfies all the properties. Here, ∆ is the maximum node degree in the network, and the matrix W has eigenvalues λ_i = 1 − εµ_i, with the simple (single) eigenvalue λ_1 = 1 (µ_1 ≡ µ_min = 0)² and corresponding eigenvector u_1 = c1, c ≠ 0. Note that 1^⊤W = 1^⊤ and W1 = 1, and thus the matrix W is a so-called doubly stochastic matrix. The Gershgorin theorem [48] further states that all eigenvalues of L lie inside the circle |z − ∆| ≤ ∆ (z ∈ C), thus the largest eigenvalue satisfies µ_max ≤ 2∆. In general, for ε < 1/∆, εµ_max < 2∆/∆ = 2, and thus λ_1 = 1 corresponds to the largest (simple) eigenvalue in absolute value (1 > λ_N = 1 − εµ_max > 1 − 2 > −1)³ (Fig. 2.1).

Notice that since the eigenvector u_1 corresponding to λ_max = 1 is normed, v_1^⊤ u_1 = 1, it follows that u_1 = c1 = \frac{1}{\sqrt{N}}\,1.

Summarizing these results, observe that

    \lim_{k \to \infty} x(k) = \lim_{k \to \infty} U \Lambda^k U^{-1} x(0)
    = \lim_{k \to \infty} \begin{pmatrix} \tfrac{1}{\sqrt{N}}\mathbf{1} & u_2 & \cdots \end{pmatrix}
      \begin{pmatrix} 1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_N \end{pmatrix}^{\!k}
      \begin{pmatrix} \tfrac{1}{\sqrt{N}}\mathbf{1}^\top \\ v_2^\top \\ \vdots \end{pmatrix} x(0)
    = \frac{1}{N}\mathbf{1}\mathbf{1}^\top x(0) \equiv \overline{x(0)}\,\mathbf{1}.    (2.8)

Thus, in each node the algorithm converges to the average of the initial values x(0).

² Note that λ_N ≡ λ_min and µ_N ≡ µ_max; thus, λ_min corresponds to µ_max and vice versa.
³ In case ε = 1/∆, provided the graph is bipartite, two eigenvalues with absolute value 1 may be obtained. For non-bipartite graphs, however, the maximum eigenvalue µ_max is always smaller than 2∆, thus the algorithm converges also for ε = 1/∆ [47, 49].
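The limit (2.8) can also be checked numerically: for a weight matrix built according to Eq. (2.7), the matrix power W^k approaches the averaging matrix (1/N)11^⊤. A minimal sketch (with an arbitrary connected 5-node graph and ε chosen slightly below 1/∆) is:

    import numpy as np

    # Arbitrary connected 5-node graph.
    A = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 1, 0],
                  [1, 1, 0, 0, 1],
                  [0, 1, 0, 0, 1],
                  [0, 0, 1, 1, 0]], dtype=float)
    L = np.diag(A.sum(axis=1)) - A

    Delta = A.sum(axis=1).max()        # maximum degree
    eps = 0.9 / Delta                  # 0 < eps < 1/Delta, Eq. (2.7)
    W = np.eye(5) - eps * L

    Wk = np.linalg.matrix_power(W, 1000)
    print(np.allclose(Wk, np.ones((5, 5)) / 5))   # W^k -> (1/N) 1 1^T, hence Eq. (2.8)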


Second largest eigenvalue of matrix W

From spectral graph theory it is known that the second smallest eigenvalue (the algebraic connectivity) of the Laplacian matrix L reveals much about the graph itself and also about the behaviour of the average consensus algorithm. First of all, if µ_2 = 0 (λ_2 = 1), the graph is not connected [46]. Since the smallest eigenvalue of the Laplacian matrix is µ_min = 0, the second smallest eigenvalue is sometimes also called the spectral gap (the “gap” between the smallest and the second smallest eigenvalue).

In terms of the matrix W this corresponds to its second largest eigenvalue, i.e., max{|λ_2|, |λ_N|}⁴. From now on, I denote the second largest eigenvalue (in absolute value) of the matrix W as λ_2.

As shown in [13, 49], the spectral radius (see App. B) of the difference between W and \frac{1}{N}\mathbf{1}\mathbf{1}^\top is also equal to the second largest eigenvalue λ_2, i.e.,

    \rho\left( W - \frac{1}{N}\mathbf{1}\mathbf{1}^\top \right) = \lambda_2.    (2.9)

Also, it can be derived [50] that λ_2 determines the speed of convergence, i.e.,

    \left\| x(k) - \frac{1}{N}\mathbf{1}\mathbf{1}^\top x(0) \right\|_2
    = \left\| W^k x(0) - \frac{1}{N}\mathbf{1}\mathbf{1}^\top x(0) \right\|_2
    = \left\| \left( W^k - \frac{1}{N}\mathbf{1}\mathbf{1}^\top \right) x(0) \right\|_2
    \leq \left\| W^k - \frac{1}{N}\mathbf{1}\mathbf{1}^\top \right\|_2 \left\| x(0) \right\|_2
    \leq \lambda_2^k \left\| x(0) \right\|_2.

Thus, the error decays exponentially with λ_2. Clearly, the smaller λ_2, the faster the convergence.
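Numerically, λ_2 can be obtained as the spectral radius of W − (1/N)11^⊤ (Eq. (2.9)), and the exponential decay bound can be verified along the iterations. The sketch below (a 6-node ring with ε = 1/3, both arbitrary choices) does exactly that:

    import numpy as np

    def second_largest_eigenvalue(W):
        """lambda_2 = spectral radius of W - (1/N) 1 1^T, cf. Eq. (2.9)."""
        N = W.shape[0]
        return np.max(np.abs(np.linalg.eigvals(W - np.ones((N, N)) / N)))

    # 6-node ring with the constant weight model, eps = 1/3 < 1/Delta (Delta = 2).
    N = 6
    A = np.zeros((N, N))
    for i in range(N):
        A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
    W = np.eye(N) - (1.0 / 3.0) * (np.diag(A.sum(axis=1)) - A)

    lam2 = second_largest_eigenvalue(W)
    x0 = np.random.rand(N)
    x = x0.copy()
    for k in range(1, 51):
        x = W @ x
        err = np.linalg.norm(x - x0.mean() * np.ones(N))
        assert err <= lam2 ** k * np.linalg.norm(x0) + 1e-12   # bound derived above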

2.1.2 Selection of weight matrix W

In the previous section I showed how to create the weight matrix W based on the Laplacian matrix (Eq. (2.7)). Nevertheless, in the discussion about the second largest eigenvalue I considered the matrix W as is, without saying anything about the Laplacian matrix. Intuitively, one can argue that the matrix W can also be selected in other ways. Here, I summarize a few possible weight models, which I use later in this thesis:

Constant weight model – defined as follows (cf. Eq. (2.7)):

    [W]_{i,j} = \begin{cases} \varepsilon & \text{if } (i, j) \in E \\ 1 - \varepsilon d_i & \text{if } i = j \\ 0 & \text{otherwise,} \end{cases}    (2.10)

with 0 < ε < 1/∆, where ∆ = max{d_i} is the maximum degree in the network. In the case of non-bipartite graphs (0 < ε ≤ 1/∆), the model with ε = 1/∆ is called the maximum-degree weight model. Note that ∆ < N, and thus ε can be bounded by 0 < ε ≤ 1/N [13].

⁴ Note that in the case of (2.7): |λ_N| > |λ_2| if µ_max > \frac{2}{\varepsilon} − µ_2.


Metropolis weight model – defined as follows [51]:

    [W]_{i,j} = \begin{cases} \frac{1}{1 + \max\{d_i, d_j\}} & \text{if } (i, j) \in E \\ 1 - \sum_{j' \neq i} [W]_{i,j'} & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}    (2.11)

Optimized convex model – a model proposed by Xiao and Boyd [49], which optimizes the weights W with respect to λ_2, i.e.,

    minimize    λ_2
    subject to  W ∈ L,  1^⊤W = 1^⊤,  W1 = 1.

Thus, this weight model achieves the fastest convergence speed in terms of λ_2.

For the sake of completeness, there exist also other weight models, e.g., [13, 52–54].
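Of the above models, the Metropolis weights are convenient in practice because each node only needs its own degree and the degrees of its neighbors. A minimal construction from an adjacency matrix (illustrative only; the function name is arbitrary) is:

    import numpy as np

    def metropolis_weights(A):
        """Metropolis weight matrix, Eq. (2.11), from a 0/1 adjacency matrix A."""
        N = A.shape[0]
        d = A.sum(axis=1)
        W = np.zeros((N, N))
        for i in range(N):
            for j in range(N):
                if A[i, j] > 0:                      # (i, j) is an edge
                    W[i, j] = 1.0 / (1.0 + max(d[i], d[j]))
            W[i, i] = 1.0 - W[i].sum()               # diagonal takes the remaining mass
        return W

    # For any connected undirected graph, the result is symmetric and doubly
    # stochastic, so it can be used directly in the iteration x(k) = W x(k-1).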

In the following chapters (Chap. 3 and Chap. 4) I will use these models frequently. In the next sections I provide different views on the convergence of the distributed algorithms.

2.2 General framework for describing distributed algorithms

In this section, I present a general unifying description of distributed algorithms [19], considering not only the distributed average consensus (Sec. 2.1). This high-level framework allows to map local, node-based algorithms onto a single global, network-based form. As a first consequence, the new description makes it possible to analyze their transient and steady-state behaviour by classical methods. A further consequence is the analysis of implementation issues as they appear due to quantization in computing and in communication links. Exemplarily, I apply this method to several distributed aggregation algorithms: the push-sum algorithm and the average consensus as well as its quantized form, and furthermore examine the effects of quantization noise which is introduced by the bandwidth-limited communication links and the finite-precision computation ability of every node.

From the processing point of view, the operations are assumed to be performed in the following manner. At the beginning of step k, each node receives data from its neighbors, then uses them in its algorithm, and sends some data at the end of step k to its neighbors. They will receive them at the beginning of step k + 1 (see Fig. 2.2).

I define the following:

• z_i(k) is the result of the main algorithm of node i at step k.

• u_i(k) is the measurement (observation) of node i at step k.

• y_{j→i}(k) is the data sent from node j to node i. Node j sends them at the end of step k − 1, and node i receives them at the beginning of step k.


Figure 2.2: Principle of a distributed algorithm locally at a node (timeline over steps k − 1, k, k + 1, showing the received data y_{j→i}(k), the locally stored state y_{i→i}(k) ≡ x_i(k), the computed result z_i(k), and the transmitted data y_{i→j}(k + 1)).

In some algorithms (e.g., average consensus), the result of the computation at node i, z_i(k), is equal to the data sent to its neighbors, y_{i→j}(k), but in general this is not always the case. Note that z(k), u(k), and y(k) can be scalars, vectors, matrices, or a collection (list) of scalars/vectors/matrices (data types). Without loss of generality, I consider them as column vectors. Their dimensions are constant in time. I will also refer to y_{i→i}(k + 1), the data computed at step k by node i and sent to itself, as a state. Moreover, at each step k, nodes communicate with some (not necessarily all) of their neighbors. I denote the following sets:

1. Send_i(k) ⊆ Succ(i): set of nodes (in the neighborhood of i) to which node i is going to send messages at step k (Succ – “successors”; Sec. 1.2.2).

2. Rec_i(k) ⊆ Pred(i): set of nodes (in the neighborhood of i) from which node i received messages at step k (Pred – “predecessors”; Sec. 1.2.2).

If there is no failure in the communication links, then

∀i ∈ V :   Rec_i(k) ≡ {j ∈ V | i ∈ Send_j(k − 1)},

otherwise the equivalence becomes an inclusion (subset). Note also that it is equivalent to define E′(k) ⊆ E as the subset of currently communicating edges (in that case, Send_i(k) ≡ Succ_{E′(k)}(i) and Rec_i(k) ≡ Pred_{E′(k)}(i)). Then G′(k) = (V, E′(k)) denotes the sub-network at time step k.

For every node i, at every time step k, a local distributed algorithm is composed of the following procedures (see Fig. 2.2):

1. Define the set Send_i(k) (for broadcast, Send_i(k) ≡ Succ(i)).

2. Receive the data {y_{j→i}(k)}_{j∈Rec_i(k)} from the communicating neighbors.

3. Compute the computation result z_i(k), the transmission data {y_{i→j′}(k + 1)}_{j′∈Send_i(k)}, and the local data to store y_{i→i}(k + 1) (data to be sent to itself) from the current measurement u_i(k), the received data {y_{j→i}(k)}_{j∈Rec_i(k)}, and the locally stored data y_{i→i}(k).

4. Send the data {y_{i→j′}(k + 1)}_{j′∈Send_i(k)} to the selected neighbors Send_i(k).
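To make these four procedures concrete, the following Python sketch (my own illustration; names such as update_rule and mailbox are hypothetical placeholders standing for the node algorithm f_i, g_i, h_i and the communication layer) executes one step k of the schedule for all nodes:

def run_step(states, measurements, send_sets, mailbox, update_rule):
    # states[i]       : locally stored data y_{i->i}(k) of node i
    # measurements[i] : u_i(k)
    # send_sets[i]    : Send_i(k), the neighbors node i transmits to (step 1)
    # mailbox[i]      : messages {y_{j->i}(k)} received at step k (step 2)
    results, new_mailbox = {}, {i: [] for i in states}
    for i in states:
        # step 3: compute z_i(k), the outgoing message, and the new local state
        z_i, message, new_state = update_rule(measurements[i], mailbox[i], states[i])
        results[i], states[i] = z_i, new_state
        for j in send_sets[i]:          # step 4: send to the selected neighbors
            new_mailbox[j].append(message)
    return results, new_mailbox         # new_mailbox is delivered at step k+1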


2.2.1 Homogeneously distributed algorithms

Definition 2.1. A Homogeneously Distributed Algorithm (HDA) is a local distributed algorithm in a network where:

• each node can be replaced by any other node (all nodes have the same functionality),
• each node sends a message with the same data-type (scalar/vector/etc.) to its neighbors, i.e., ∀i ∈ V, ∀j ∈ Succ(i): y_{i→j}(k) ≡ y_i(k),

which can be formalized as follows:

z_i(k) = f_i( u_i(k), {y_j(k)}_{j∈Rec_i(k)}, x_i(k) ),                               (2.12)
x_i(k) = g_i( u_i(k − 1), {y_j(k − 1)}_{j∈Rec_i(k−1)}, x_i(k − 1) ),                 (2.13)
y_i(k) = h_i( u_i(k − 1), {y_j(k − 1)}_{j∈Rec_i(k−1)}, x_i(k − 1) ),                 (2.14)

where x_i(k) ≡ y_{i→i}(k) is consistent with a state notation.

The size of the set {y_{j→i}(k)}_{j∈Rec_i(k)} may change at each time k, but the functions f_i(·), g_i(·), h_i(·) are the same⁵, capable of accepting three inputs of time-varying sizes. Since the size does not depend on the node, I denote by n_X, n_Y, and n_U the sizes of the column vectors x_i(k), y_i(k), and u_i(k), respectively.

2.2.2 Linear HDA

Definition 2.2 (Linear HDA). If the update strategy (2.13) and the communication strategy (2.14) are linear functions, then the algorithm is said to be linear homogeneously distributed and can be described as follows:

z_i(k) = f_i( u_i(k), {y_j(k)}_{j∈Rec_i(k)}, x_i(k) ),                               (2.15)
x_i(k) = α_i Σ_{j∈Rec_i(k−1)} y_j(k − 1) + β_i x_i(k − 1) + θ_i u_i(k − 1),           (2.16)
y_i(k) = γ_i Σ_{j∈Rec_i(k−1)} y_j(k − 1) + δ_i x_i(k − 1) + ϑ_i u_i(k − 1),           (2.17)

where α_i ∈ R^{n_X×n_Y} (receptivity), β_i ∈ R^{n_X×n_X} (self-transmissivity), θ_i ∈ R^{n_X×n_U} (absorptivity), γ_i ∈ R^{n_Y×n_Y} (transmissivity), δ_i ∈ R^{n_Y×n_X} (distributivity), and ϑ_i ∈ R^{n_Y×n_U} (emissivity) are constant matrices.

Remark: I do not consider the linearization of the outputs of the local algorithm (Eq. (2.15)), since this equation is not involved in the iteration loop, but represents only the results of the local algorithm at a node.

⁵Note that Eq. (2.14) may be understood as a generalization of Eq. (2.1) in the sense that x_{N_i}(t_{k−1}) ≡ {y_j(k − 1)}_{j∈Rec_i(k−1)} and the neighborhood N_i also contains i (self-loop). Intermediate measurements u_i(k) (Eq. (2.14)) are not explicitly considered in Eq. (2.1).


It is possible to aggregate the vectors {x_i(k)}_{i∈V} in a column vector x(k) ∈ R^{(N·n_X)×1} (and the vectors {y_i(k)}_{i∈V} in a column vector y(k) ∈ R^{(N·n_Y)×1}, respectively). To make the connection between local and global algorithms, I denote:

α ≜ diag(α_1, . . . , α_i, . . . , α_N) ∈ R^{(N·n_X)×(N·n_Y)},        (2.18a)
β ≜ diag(β_1, . . . , β_i, . . . , β_N) ∈ R^{(N·n_X)×(N·n_X)},        (2.18b)
θ ≜ diag(θ_1, . . . , θ_i, . . . , θ_N) ∈ R^{(N·n_X)×(N·n_U)},        (2.18c)
γ ≜ diag(γ_1, . . . , γ_i, . . . , γ_N) ∈ R^{(N·n_Y)×(N·n_Y)},        (2.18d)
δ ≜ diag(δ_1, . . . , δ_i, . . . , δ_N) ∈ R^{(N·n_Y)×(N·n_X)},        (2.18e)
ϑ ≜ diag(ϑ_1, . . . , ϑ_i, . . . , ϑ_N) ∈ R^{(N·n_Y)×(N·n_U)}.        (2.18f)

Proposition 2.1 (Global algorithm). Using the set of parameters (2.18a)–(2.18f), a global algorithm (the aggregation of the algorithms of all the nodes) can be formulated as:

z_i(k) = f( u_i(k), {y_j(k)}_{j∈Rec_i(k)}, x_i(k) ),                                 (2.19)
x(k) = α ( A_{G′(k−1)} ⊗ I_{n_Y} ) y(k − 1) + β x(k − 1) + θ u(k − 1),                (2.20)
y(k) = γ ( A_{G′(k−1)} ⊗ I_{n_Y} ) y(k − 1) + δ x(k − 1) + ϑ u(k − 1),                (2.21)

where the communication links are implicitly described by the adjacency matrix A_{G′(k)}.
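As a purely illustrative sketch (my own, not from the thesis), the global recursion (2.20)–(2.21) amounts to two matrix–vector updates per step; all parameter matrices are assumed to be built as in (2.18a)–(2.18f) by the caller:

import numpy as np

def linear_hda_step(x, y, u, A_comm, alpha, beta, theta, gamma, delta, vartheta, n_y):
    # One global iteration of Eqs. (2.20)-(2.21); A_comm is the adjacency matrix
    # A_{G'(k-1)} of the currently communicating links.
    mix = np.kron(A_comm, np.eye(n_y)) @ y        # (A_{G'} (x) I_{n_Y}) y(k-1)
    x_new = alpha @ mix + beta @ x + theta @ u
    y_new = gamma @ mix + delta @ x + vartheta @ u
    return x_new, y_new

The examples below (push-sum, average consensus, Censi's algorithm) all fit this single routine by an appropriate choice of the block-diagonal parameter matrices.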

2.2.3 Examples

In the following, I provide examples of algorithms which can be considered as linear HDAs (Sec. 2.2.2).

Push-Sum algorithm

The push-sum protocol [55] is a simple algorithm for distributed averaging. It belongs to the class of gossip-based algorithms [12]. According to the proposed formalism, it is a linear homogeneously distributed algorithm with the following properties:

n_X = n_Y = 2,   x_i(0) = ( u_i(0), 1 )^⊤,   u(k) = 0 for k > 0,

and

x(k) = y(k) = ( sum, weight )^⊤  (see footnote 7),    z_i(k) = x^{(1)}(k) / x^{(2)}(k) = sum / weight,   ∀k > 0.

At time step k, each node randomly selects only one of its neighbors. Therefore, A_{G′(k)} contains at most one “1” in every row. It can then be found that

α = β = γ = δ = (1/2) I_{2N}.

⁷There is no difference between the data kept and the data sent.
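A minimal simulation sketch of the push-sum protocol (my own illustration; the random neighbor selection and the ratio z_i(k) = sum/weight follow the description above):

import numpy as np

def push_sum(u0, neighbors, num_steps, rng=np.random.default_rng(0)):
    # x_i = (sum_i, weight_i), initialized as (u_i(0), 1).
    N = len(u0)
    x = np.column_stack([np.asarray(u0, float), np.ones(N)])
    for _ in range(num_steps):
        x_new = 0.5 * x                      # each node keeps half of (sum, weight)
        for i in range(N):
            j = rng.choice(neighbors[i])     # node i pushes the other half to one neighbor
            x_new[j] += 0.5 * x[i]
        x = x_new
    return x[:, 0] / x[:, 1]                 # z_i = sum/weight -> average of u0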


Average consensus

Average consensus (see Sec. 2.1) is also a distributed algorithm for solving distributed averaging problems. It can be formalized as

n_X = n_Y = 1,   u(k) = 0 for k > 0,   z(k − 1) = x(k) = y(k),

and the nodes are initialized by x(0) = u(0). The nodes broadcast their data, therefore A_{G′(k)} = A_G, ∀k > 0, and

β = δ = I_N − ε D_G,    α = γ = ε I_N,

where ε > 0 is the step-size and D_G is the degree matrix of the graph G.

Real-valued average consensus over noisy quantized channels

In [39], Censi and Murray proposed a new strategy to compute the average consensus with quantized communication while preserving convergence (not necessarily to the true average). Their algorithm is based on integrating the quantized communication error and re-injecting it into the system. Following the notation from [39] (the superscript “c” indicates Censi’s equations, [39, Tab. I]):

ᶜx(k) = ( I − (η/∆) D ) ᶜx(k − 1) + (η/∆) A ᶜy(k − 1),
ᶜc(k) = ᶜc(k − 1) + ( ᶜy(k − 1) − ᶜx(k − 1) ),
ᶜy(k) = ψ( ᶜx(k) − ᶜc(k) ),

where ψ(·) is a quantization function which introduces a quantization noise ζ(k) (see Sec. 2.3.3); parameter η ∈ (0, 1) (constant consensus weight model (2.7), see Sec. 2.1.2); ∆ is the maximum node degree in the network; D the diagonal degree matrix; and A the adjacency matrix.

This set of equations can be rewritten in the formalism as follows:

n_X = 3,  n_Y = 1,

x(k) = ( x^{(1)}(k), x^{(2)}(k), x^{(3)}(k) ) ≜ ( ᶜx(k), ᶜy(k), ᶜc(k) ),   y(k) ≡ ᶜy(k),   z(k − 1) = x^{(1)}(k) ≡ ᶜx(k),        (2.22)

with parameters

α = I_N ⊗ ( η/∆, η/∆, 0 )^⊤ =: I_N ⊗ L,                                               (2.23a)


β = (η/∆) D ⊗ K + I_N ⊗ [ 1 0 0 ; 2 −1 −1 ; −1 1 1 ],   with K ≜ [ −1 0 0 ; −1 0 0 ; 0 0 0 ],        (2.23b)
δ = (η/∆) D ⊗ ( −1 0 0 ) + I_N ⊗ ( 2 −1 −1 ) =: (η/∆) D ⊗ M + I_N ⊗ N,                               (2.23c)
γ = (η/∆) I_N.                                                                                        (2.23d)

Nodes are initialized by x(0) = ( u(0)^⊤, 0_N^⊤, 0_N^⊤ )^⊤, and they communicate with all neighbors at each time, i.e., A_{G′(k)} = A_G, ∀k ≥ 0.

The idea behind this approach is that the quantized communication error is fed back into the system, thus preserving convergence to a steady state which can, however, differ from the true average of the initial state (see the simulations ahead, Fig. 2.3). In general, having no assumptions on the noise ζ(k) except boundedness, it can model any quantization noise on the links as well as any independent disturbance in transmission. Further on, I refer to this algorithm as “Censi’s algorithm”.
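The three recursions translate directly into a short simulation. The following Python sketch is my own illustration (with rounding to the nearest integer as the quantizer ψ and the initialization (ᶜx, ᶜy, ᶜc) = (u(0), 0, 0) as stated above):

import numpy as np

def censi_consensus(u0, A, eta, num_steps, quantizer=np.round):
    # cx: local estimates, cy: quantized messages, cc: accumulated quantization error.
    N = len(u0)
    d = A.sum(axis=1)
    D, Delta = np.diag(d), d.max()
    cx = np.asarray(u0, dtype=float)
    cy = np.zeros(N)
    cc = np.zeros(N)
    for _ in range(num_steps):
        cx_old, cy_old = cx, cy
        cx = (np.eye(N) - (eta / Delta) * D) @ cx_old + (eta / Delta) * A @ cy_old
        cc = cc + (cy_old - cx_old)
        cy = quantizer(cx - cc)          # psi(.) introduces the quantization noise zeta(k)
    return cx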

2.3 Convergence of quantized consensus algorithms

Based on the formalism presented in the previous section, in this section I analyze the impact of quantization on the consensus algorithms. Let us define the first (µ) and second (σ², Ψ) order moments [56] of a noise vector ξ(k) by:

µ_ξ ≜ E{ξ(k)},                                                        (2.24)
Ψ_ξ ≜ E{ (ξ(k) − µ_ξ)(ξ(k) − µ_ξ)^⊤ },                                (2.25)
σ²_ξ ≜ E{ (ξ(k) − µ_ξ)^⊤(ξ(k) − µ_ξ) } = trace(Ψ_ξ),                  (2.26)

where E{·} and trace(·) are the mean and the trace operator, respectively. Let us collect x(k) and y(k), i.e.,

Γ(k) ≜ ( x(k)^⊤, y(k)^⊤ )^⊤.                                          (2.27)

Then the algorithm can be formulated as a state-space system

Γ(k) = [ β   α(A_{G′(k−1)} ⊗ I_{n_Y}) ;  δ   γ(A_{G′(k−1)} ⊗ I_{n_Y}) ] Γ(k − 1) + [ θ ; ϑ ] u(k − 1),        (2.28)

where the first block matrix is denoted by P(k − 1) and the input matrix [ θ ; ϑ ] by Q.

For a realistic implementation, two terms need to be added:

• ξ′(k) – noise due to the computations,
• ξ(k) – noise due to the quantization of the sent data (applies only to y(k)).

The distortion in the measurement is implicitly contained in u(k).


Remark: These noise sequences are determined by the implementation scheme. In a fixed-point scheme, these noise sequences can be modelled as independent white Gaussian random noise processes with given moments defined by the word-lengths used in the algorithm and by the algorithm itself [57, 58].

The implemented system is then given by

Γ⋆(k) = P(k − 1) Γ⋆(k − 1) + Q u(k − 1) + R ζ(k − 1),                  (2.29)

where R can be, in general, any linear transformation of the noise term ζ(k)⁸.

For example, for “Censi’s algorithm” (see Section 2.2.3), by definition of the algorithm,

R = [ I_N ⊗ ( 0, 1, 0 )^⊤ ;  I_N ],                                    (2.30)

because the same communication noise applies to x^{(2)}(k) and y(k) (cf. Eq. (2.22)).

Proposition 2.2. Consider the case where P (Eq. (2.28)) is constant in time. Then the term ∆Γ⋆(k) ≜ Γ⋆(k) − Γ(k) is the noise added to Γ(k) and satisfies

∆Γ⋆(k) = P ∆Γ⋆(k − 1) + R ζ(k − 1).                                    (2.31)

Thus, the first- and second-order moments of ∆Γ⋆(k) in the steady state are given by

µ_∆Γ⋆ = µ_ζ (I − P)^{−1} R,    σ²_∆Γ⋆ = trace(Ω),                      (2.32)

where Ω is the solution of the Lyapunov equation Ω = P Ω P^⊤ + R Ψ_ζ R^⊤.

Proof. Considering Γ⋆(k) to be statistically independent of ζ(k), then for k → ∞:

E[ ∆Γ⋆(k) ∆Γ⋆(k)^H ] = E[ P ∆Γ⋆(k − 1) ∆Γ⋆(k − 1)^H P^⊤ ] + E[ R ζ(k − 1) ζ(k − 1)^H R^⊤ ]
                     = P E[ ∆Γ⋆(k − 1) ∆Γ⋆(k − 1)^H ] P^⊤ + R E[ ζ(k − 1) ζ(k − 1)^H ] R^⊤,

i.e., with Ω ≜ E[ ∆Γ⋆(k) ∆Γ⋆(k)^H ] and Ψ_ζ ≜ E[ ζ(k − 1) ζ(k − 1)^H ],

Ω = P Ω P^⊤ + R Ψ_ζ R^⊤.

Note that the noise ζ(k) is added to ∆Γ⋆ through the state-space system (P, R, I, 0). Classical results on noise sequences passing through linear systems apply here [57, 59].

⁸ζ(k) contains ξ(k) and/or ξ′(k); e.g., for the algorithm [39] (Section 2.2.3): ζ(k) ≡ ξ(k).
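Under the stated independence assumption, (2.32) can also be evaluated numerically via the discrete Lyapunov equation. The sketch below is my own illustration and assumes a strictly stable P (spectral radius below one); for the averaging algorithms discussed in the following remark, where P has an eigenvalue at one, the modified expression derived in Sec. 2.3.2 applies instead:

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def steady_state_noise_moments(P, R, mu_zeta, Psi_zeta):
    # Mean drift (column-vector form of (2.32)) and steady-state variance trace(Omega),
    # where Omega solves Omega = P Omega P^T + R Psi_zeta R^T.
    I = np.eye(P.shape[0])
    mean_drift = np.linalg.solve(I - P, R @ mu_zeta)
    Omega = solve_discrete_lyapunov(P, R @ Psi_zeta @ R.T)
    return mean_drift, np.trace(Omega)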


Remark: For distributed algorithms solving averaging problems, matrix P always contains one eigenvalue of value 1. In that case, the term (I − P) is not invertible, and one can conclude that the added noise ζ(k) must be of zero mean for the algorithm to converge. However, the term (I − P)^{−1} R can be computed when R works as a stabilizing term, i.e., when lim_{K→∞} Σ_{k=0}^{K} P^k R exists. This is also the case for “Censi’s algorithm” with R defined as in (2.30), as I will show later (see Sec. 2.3.2).

Thus, using this approach one can exactly compute the “drift from the mean” (µ_∆Γ⋆) and the “average disagreement” (σ²_∆Γ⋆), for which Censi, though from a different point of view, derived only loose bounds (see Tab. 2.1).

As stated, although in this case matrix P is on the stability boundary, i.e., contains an eigenvalue 1, matrix R works as a stabilizing term, thus ensuring convergence for any (e.g., quantization) non-zero-mean noise, as can be observed from the simulations (see Fig. 2.3 ahead) and as is proved in Sec. 2.3.2.

2.3.1 Simulations

To verify the above-mentioned theoretical relations for average consensus and quantized average consensus, simulations for several network topologies were performed.

For unquantized average consensus (see Sec. 2.2.3, Average consensus), the convergence with an added zero-mean noise process was studied, and it was shown that the theoretical bounds (Eq. (2.32)) hold rather precisely (see Fig. 2.3a).

For average consensus with quantized communication (see Sec. 2.2.3, Censi’s algorithm), the convergence of the algorithm even for non-zero-mean noise was verified (see Fig. 2.3b). It can be observed that this additive noise can be of any type (not only, e.g., round-off noise). The theoretical results of Proposition 2.2 were confirmed by simulations and compared with the bounds derived by Censi (see Tab. 2.1). It must be noted, however, that (µ_ζ, Ψ_ζ) were computed from the true error noise added, while Censi’s bounds are predicted. Note, however, that the estimate of (µ_ζ, Ψ_ζ) of the quantization noise, as well as of any additive noise, can be calculated in advance (see Sec. 2.3.3), and thus Eq. (2.32) still holds.

Also, the algorithms formulated in the proposed formalism (see Sec. 2.2.3) were compared with the original algorithms and shown to be, value by value, identical.

network topology (N = 10)   µ_∆Γ⋆ theor.   Censi’s bound on the mean drift [39]   σ²_∆Γ⋆ theor.   Censi’s average disagreement [39]
star                        0.0032         0.05                                   0.0000585       1.2247
complete                    0.0121         0.05                                   0.00000908      1.2247

Table 2.1: Comparison of Censi’s bounds vs. the proposed exact theoretical approach for two topologies.


(a) Behaviour of node 1 – average consensus (see Sec. 2.2.3): µ_∆Γ⋆^(sim) = 0, µ_∆Γ⋆^(theor) = 0, σ²_∆Γ⋆^(sim) = 3.5, σ²_∆Γ⋆^(theor) = 3.53. (Plot of x_1(k) over k, with and without zero-mean noise; true average value 46.324.)

(b) Behaviour of the first two nodes – “Censi’s algorithm” (see Sec. 2.2.3): µ_∆Γ⋆^(sim) = 0.50, µ_∆Γ⋆^(theor) = 0.49, σ²_∆Γ⋆^(sim) = 82.86, σ²_∆Γ⋆^(theor) = 82.86. (Plot of x_i(k), i = 1, 2, over k; steady-state and true average values marked.)

Figure 2.3: Simulation for a regular network with random initial values; N = 10, ∆ = 6.


2.3.2 Steady state of Censi’s algorithm

As was shown in [20], under some assumptions on the network topology and the noise on the links, Censi’s algorithm always converges to a steady state.

Theorem 2.1. Using the state-space Equation (2.31) with parameters (2.23a)–(2.23d), and satisfying the conditions:

1. the graph G is strongly connected,
2. the random noise process on the links ζ(k) is bounded and converges to ζ̄,

Censi’s algorithm (Section 2.2.3) asymptotically converges to a steady state:

∆Γ⋆ = P_∞ ∆Γ⋆(0) + ( [ I_N ⊗ (0, 0, 1)^⊤ ; 0_N ] + [ (1/N) 1 1^⊤ A ⊗ L ; (1/N) 1 1^⊤ (η/∆) A ] ) ζ̄,        (2.33)

where the first term is the mean value and the second term, d, is the drift from the mean, and where P_∞ = [ P_∞,1  P_∞,2 ; P_∞,3  P_∞,4 ] with

P_∞,1 = [ (1/N) 1 1^⊤ ( (η²/∆²) A D − ( (η/∆) D − I )² ) ] ⊗ K + [ (1/N) 1 1^⊤ A ] ⊗ L N,        (2.34)
P_∞,2 = [ (1/N) 1 1^⊤ A ] ⊗ L,                                                                    (2.35)
P_∞,3 = [ (1/N) 1 1^⊤ ( (η²/∆²) A D − ( (η/∆) D − I )² ) ] ⊗ M + [ (1/N) 1 1^⊤ (η/∆) A ] ⊗ N,     (2.36)
P_∞,4 = (1/N) 1 1^⊤ (η/∆) A,                                                                      (2.37)

and ζ̄ being determined by the statistics of the noise ζ(k); A is the adjacency matrix and D the degree matrix of the given topology. K, L, M, N are the corresponding matrices from (2.23a)–(2.23d).

Proof. Taking Eq. (2.31), for time K it can be written:

∆Γ⋆(K) = P^K ∆Γ⋆(0) + Σ_{k=0}^{K−1} P^k R ζ(K − 1 − k).

After inserting the parameters (2.23a)–(2.23d) into P, knowing that (AB) ⊗ (CD) = (A ⊗ C)(B ⊗ D), and applying K ≥ 2 multiplications, this yields

P^K = [ P_{K,1}  P_{K,2} ;  P_{K,3}  P_{K,4} ],                                    (2.38)


where

P_{K,1} = [ ( (η/∆)(A − D) + I )^{K−2} ( (η²/∆²) A D − ( (η/∆) D − I )² ) ] ⊗ K + [ ( (η/∆)(A − D) + I )^{K−2} A ] ⊗ L N,
P_{K,2} = [ ( (η/∆)(A − D) + I )^{K−1} A ] ⊗ L,
P_{K,3} = [ ( (η/∆)(A − D) + I )^{K−2} ( (η²/∆²) A D − ( (η/∆) D − I )² ) ] ⊗ M + [ ( (η/∆)(A − D) + I )^{K−2} (η/∆) A ] ⊗ N,
P_{K,4} = ( (η/∆)(A − D) + I )^{K−1} (η/∆) A,

where K, L, M, N are as in (2.23a)–(2.23d).

Since the term ( (η/∆)(A − D) + I )^K is the only term growing with K, I only need to show that this term converges as K → ∞. As the term is identified as the so-called weight matrix W (see Sec. 2.1.2), which, for strongly connected graphs, has a trivial maximum eigenvalue λ_1 = 1 with corresponding eigenvector u_1 = (1/√N) 1 (see Sec. 2.1.1), it follows by eigendecomposition that

lim_{K→∞} ( (η/∆)(A − D) + I )^K = lim_{K→∞} ( U Λ U^{−1} )^K = lim_{K→∞} ( U Λ^K U^{−1} ) = U diag(1, 0, . . . , 0) U^{−1} = u_1 u_1^⊤ = (1/N) 1 1^⊤.        (2.39)

Thus, I have proved that P_∞ converges. Now I also prove the convergence of the drift from the mean. First, I assume that the disturbance noise ζ(k) asymptotically converges to a value ζ̄, i.e.,

lim_{k→∞} | ζ(k) − ζ̄ | < ǫ.                                           (2.40)

As shown by Censi [39], for deterministic quantization this assumption holds only in the mean, i.e., converged states tend to oscillate around a common mean ζ̄. However, if the links are disturbed by an independent noise, or a probabilistic quantization scheme is used, e.g., [60], this assumption holds accurately. Secondly, I will show that P^k R → 0 for some k ≥ k_0 ≫ 0, i.e., the drift from the mean equals

Σ_{k=0}^{K−1} P^k R ζ(K − k − 1) = Σ_{k=0}^{k_0−1} P^k R ζ̄ + Σ_{k=k_0}^{K−1} P^k R ζ(K − k − 1),        (2.41)

where the second sum vanishes as k_0 → K and ζ(k) = 0 for k < 0.


Now, let me determine Σ_{k=0}^{k_0−1} P^k R. Multiplying P by R, I obtain

P R = [ I_N ⊗ (0, −1, 1)^⊤ + A ⊗ L ;  (η/∆) A − I_N ],                                             (2.42)

respectively, by taking the power K ≥ 2,

P^K R = (η/∆) [ (A − D) ( (η/∆)(A − D) + I )^{K−2} A ⊗ L ;  (A − D) ( (η/∆)(A − D) + I )^{K−2} (η/∆) A ].        (2.43)

From (2.39) it follows that

( (η/∆)(A − D) + I ) u = 1 · u   ⇒   (A − D) u = 0,                                 (2.44)

with u = u_1 = (1/√N) 1,

and therefore, for K ≥ k_0,

lim_{K→∞} P^K R = lim_{K→∞} (η/∆) [ (A − D) ( (η/∆)(A − D) + I )^{K−2} A ⊗ L ;  (A − D) ( (η/∆)(A − D) + I )^{K−2} (η/∆) A ]
                = [ (A − D) u u^⊤ A ⊗ L ;  (A − D) u u^⊤ (η/∆) A ] = [ 0_{3N×N} ; 0_{N×N} ] = 0,        (2.45)

since (A − D) u = 0.

This means that for K large enough, i.e., K ≥ k_0, P^K R → 0, and since P^k R is bounded and decreasing for all k, the sum Σ_{k=0}^{∞} P^k R must also exist. Thus, the term R can be interpreted as a “stabilizing” term without which the sum would not exist.

I can then directly show that, for K → ∞,

lim_{K→∞} Σ_{k=0}^{K−1} P^k R = R + P R + P² R + · · ·
  = [ I_N ⊗ (0, 1, 0)^⊤ ; I_N ] + [ I_N ⊗ (0, −1, 1)^⊤ + A ⊗ (η/∆, η/∆, 0)^⊤ ; (η/∆) A − I_N ]
    + [ (A − D) Σ_{k=0}^{∞} ( (η/∆)(A − D) + I )^k A ⊗ (η/∆, η/∆, 0)^⊤ ;  (A − D) Σ_{k=0}^{∞} ( (η/∆)(A − D) + I )^k (η/∆) A ].        (2.46)


Thus, for 0 < η < 1 and K ≥ k_0,

Σ_{k=0}^{K−1} P^k R
  = [ I_N ⊗ (0, 1, 0)^⊤ ; I_N ] + [ I_N ⊗ (0, −1, 1)^⊤ + A ⊗ (η/∆, η/∆, 0)^⊤ ; (η/∆) A − I_N ]
    + [ (∆/η) ( (1/N) 1 1^⊤ − I ) A ⊗ (η/∆, η/∆, 0)^⊤ ;  (∆/η) ( (1/N) 1 1^⊤ − I ) (η/∆) A ]
  = [ I_N ⊗ (0, 0, 1)^⊤ ; 0_N ] + [ (1/N) 1 1^⊤ A ⊗ (η/∆, η/∆, 0)^⊤ ;  (1/N) 1 1^⊤ (η/∆) A ].        (2.47)

Thus, Theorem 2.1 is proved completely.

Note that the elements of Eq. (2.47) are bounds on the steady state of ( x^{(1)}(k), x^{(2)}(k), x^{(3)}(k), y(k) )^⊤ (cf. Eq. (2.22)).

It must be noted that, comparing Eq. (2.32) with Eq. (2.33), ζ̄ corresponds to µ_ζ, and the term (2.47) replaces the non-invertible term (I − P)^{−1} R for Censi’s algorithm. As mentioned before, the term (I − P) is not invertible in this case. Nonetheless, I made the assumption that matrix R acts as a stabilizing term, thus ensuring convergence. Taking the result of Theorem 2.1, I can conclude that this assumption was correct. After finding a general solution for the steady state, I can easily compute bounds on the drift from the mean.

2.3.3 Bounds on the drift from the mean

In Eq. (2.33) I assumed that the noise sequence ζ(k) converges to some value ζ̄ (in the sense of Eq. (2.40)). However, as mentioned before, in the case of deterministic quantization noise this value depends on the step size η, the initial states, the quantization scheme, and also on the topology. Therefore, it is not easy to estimate it beforehand (see Tab. 2.3). However, bounds on the drift from the mean can be derived straightforwardly. In Tab. 2.2 I recall the bounds on ζ̄ for a few of the simplest quantization schemes. Note that when considering Censi’s bound, I consider the bound as defined by Censi [39], i.e., ηβ where |q(x) − x| < β (see Tab. 2.2), for any quantization function q(·).

quantization scheme          lower bound   upper bound
round to nearest integer     −0.5          0.5
round up                     0             1
round down                   −1            0

Table 2.2: Lower and upper bounds on ζ̄ (Eq. (2.33)) for three simple quantization schemes.


Topology    quantization scheme   N    ζ̄_ave
complete    rounding              10   0.0128 ± 0.463
                                  30   −0.0077 ± 0.459
            ceiling               10   0.49 ± 0.468
                                  30   0.502 ± 0.459
star        rounding              10   0.0034 ± 0.49
                                  30   0.0106 ± 0.48
            ceiling               10   0.4922 ± 0.4932
                                  30   0.504 ± 0.396
ring        rounding              10   −0.0093 ± 0.468
                                  30   −0.004 ± 0.46
            ceiling               10   0.5139 ± 0.455
                                  30   0.5005 ± 0.403
geometric   rounding              9    0.0012 ± 0.47
                                  30   −0.000462 ± 0.45
            ceiling               9    0.4988 ± 0.4721
                                  30   0.4925 ± 0.4612

Table 2.3: Average ζ̄ (Eq. (2.33)) after 1 000 randomly initialized runs, for various topologies and two different quantization schemes. Observe that the quantization error ζ̄ may vary dramatically (∼ ±0.5).

A-priori bounds on the drift from the mean

For several typical network topologies, an adjacency matrix A can be generated and a-priori bounds on the drift from the mean can be provided, both if the sent data is rounded to the nearest integer (see Tab. 2.4) and if the data is rounded up (ceiling operation) (see Tab. 2.5). The values are for x^{(1)}(k), i.e., for the data stored in the nodes (cf. Eq. (2.22)).

Topology    N    simulated worst-case steady-state value   bound Eq. (2.47)   Censi’s bound
complete    10   0.0459                                     ±0.05              0.05
            30   −0.0459                                    ±0.05              0.05
star        10   0.0077                                     ±0.01              0.05
            30   −0.003                                     ±0.0033            0.05
ring        10   0.045                                      ±0.05              0.05
            30   −0.045                                     ±0.05              0.05
geometric   9    0.0315                                     ±0.033             0.05
            30   0.0382                                     ±0.0408            0.05

Table 2.4: Bounds on the drift d_x(k), Eq. (2.33). Quantization scheme – rounding to the nearest integer, η = 0.1, worst case of the true steady-state value after 1 000 randomly initialized runs.



Topology    N    simulated worst-case steady-state value (max/min)   bound Eq. (2.47) (max/min)   Censi’s bound
complete    10   0.0705 / 0.0188                                     0.1 / 0                      0.1
            30   0.0961 / 0.0042                                     0.1 / 0                      0.1
star        10   0.0188 / 0.0011                                     0.02 / 0                     0.1
            30   0.0051 / 0.0015                                     0.00666 / 0                  0.1
ring        10   0.0953 / 0.0045                                     0.1 / 0                      0.1
            30   0.078 / 0.0236                                      0.1 / 0                      0.1
geometric   9    0.0648 / 0.0019                                     0.066 / 0                    0.1
            30   0.0778 / 0.0048                                     0.0817 / 0                   0.1

Table 2.5: Bounds on the drift d_x(k), Eq. (2.33). Quantization scheme – ceiling, η = 0.1, worst case of the true steady-state value after 1 000 randomly initialized runs (maximum/minimum value).

In Fig. 2.4, a typical behaviour of Censi’s algorithm is shown for the first seven states in the case of a geometric random network with 30 nodes, including the bounded steady-state phase.

Figure 2.4: Example of the convergence behaviour of x_i(k) (i = 1, 2, . . . , 7) and the computed bounds (2.33) for a random geometric network topology; N = 30, ∆ = 6.


2.4 Analysis of convergence: Algebraic approach

As mentioned in the introduction of this chapter, for many reasons it is more feasible to operate with asynchronous transmissions, that is, each node independently decides at a certain time instant to transmit to its neighbors, regardless of who receives this information. From the perspective of the transmitting node, its neighbors may or may not receive the information, and once they receive it correctly, they may update their own state, possibly followed by another transmission of their own. On a practical note, such a transmission scheme requires sending not only the information, but also a source node identifier as well as a message number, to ensure that repeatedly received information at the receiving node is not interpreted as new but simply discarded. Also, as the transmission may be unreliable, some error control, such as an automatic repeat request (ARQ) protocol between nodes to handle duplicate and lost messages, might be required [61].

Therefore, in this section, I move away from purely synchronous consensus algorithms and analyze the convergence of an asynchronous consensus algorithm, described by a state-transition matrix, in the mean and mean square sense. I put aside the graph-theoretic analysis of consensus algorithms; thus, unlike Sec. 2.1, I do not analyze the properties of the weight matrix W. The convergence analysis provided here involves only methods of linear algebra and knowledge of linear vector spaces.

2.4.1 Asynchronous model

I will consider asynchronous linear updates performed every time information is received in the neighborhood of a transmitting node i, cf. Eq. (2.1). Using the example of the average consensus algorithm (see Sec. 2.1), I will show the mechanics of such a network. Take, for example, node i that receives from its neighbor node m a value x_m. Node i would then take its internal state x_i and compute

x_i(k) = α x_i(k − 1) + (1 − α) x_m(k − 1)
       = x_i(k − 1) + (1 − α) ( x_m(k − 1) − x_i(k − 1) ),                           (2.48)

where α ∈ (0, 1) is the so-called mixing parameter. All other states remain unchanged by such an operation. One can describe such a basic state transition by a matrix operation S_im on the current state vector that contains the collected state information of the entire network:

x(k) = S_im x(k − 1).                                                 (2.49)

The state-transition matrix S_im ∈ R^{N×N} has a specific form: it is essentially a unit matrix with ones on its diagonal, except [S_im]_{i,i} = α and [S_im]_{i,m} = 1 − α (all remaining off-diagonal entries are zero). One can thus define


a matrix of transitions at time k, i.e., S(k) ∈ S = {S_im}_{∀(i,m)∈E(k)}, where S is the set of all allowed transitions in the network. Note that I slightly abuse the notation, since at each time k the matrix S_im may be different, depending on the selected pair (i, m) at time k.

If node i simultaneously at time k transmits to its neighbors l, m, n, there are three concatenated updates, i.e., S(k) = S_ni S_li S_mi, the order of which is meaningless⁹, i.e., S_ni S_li S_mi = S_mi S_li S_ni. The neighborhood of node i defines how many different columns such an entry potentially has.

Note that receiving new information does not necessarily mean that an update is required. A node could also collect received messages from neighborhood nodes and perform the update after a certain time period [63]. However, this requires some logic to decide how long to wait and when to stop waiting. A simpler strategy is to perform an update every time new information is received. In this model I assume that a node i starts transmitting its next message (repeatedly) with probability p_i (for example, p_1 = · · · = p_N ≡ p = 1/N). If indeed several nodes, say l, m, n, transmit their results to node i simultaneously, one can describe this as a concatenation S_il S_im S_in, the order of which is now relevant. Clearly, synchronous updates can be included in this formulation as well.

The proper functioning of such a WSN thus depends on the sequence of matrices S(k), which occur randomly. Thus, it is of interest to know whether the result of such a random sequence with initial states x(0) leads to a consensus, that is, x(k) → γ (x(0)^⊤ 1) 1 as k → ∞, or, even more specifically, the average consensus with γ = 1/N. Without a formal proof I can already conclude intuitively one important property: there always exists a sequence of operations S_im that will not lead to such a result. Thus, obviously, the WSN cannot be guaranteed to converge in the worst case. This conclusion requires analyzing such networks in a stochastic context.

Note further that the values of α may vary and need not be fixed. In fact, every node pair (i, m) can have its own α_{i,m}, which can also vary in time, denoted α_{i,m}(k). Such variations of the updates clearly include link failures [63, 64], for which α simply turns to one for a certain time, and also include quantization effects [20, 65].
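A small simulation sketch of this asynchronous model (my own illustration): at every event a directed pair (i, m) ∈ E is drawn at random and the corresponding update of Eq. (2.48) is applied; time-varying α_{i,m}(k) or link failures could be emulated by changing alpha per event:

import numpy as np

def async_consensus(x0, edges, alpha, num_events, rng=np.random.default_rng(0)):
    # edges: list of directed pairs (i, m) meaning "node i receives from node m".
    x = np.asarray(x0, dtype=float)
    for _ in range(num_events):
        i, m = edges[rng.integers(len(edges))]
        x[i] = alpha * x[i] + (1.0 - alpha) * x[m]   # Eq. (2.48); all other states unchanged
    return x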

2.4.2 State update analysis

Matrix S_im is thus the central element of the further analysis. Since the matrix is a so-called row-stochastic matrix, it is well known that its largest eigenvalue λ_1 is equal to 1, with the corresponding eigenvector 1. As the matrix has a simple form, it is not difficult to compute all its eigenvalues and corresponding eigenvectors.

Lemma 2.1. Matrix S_im in Equation (2.49) has the following right eigenvalues and eigenvectors:

λ_1 = 1,   v_1^⊤ = 1^⊤,
λ_j = 1,   v_j^⊤ = (0, . . . , 0, 1, 0, . . . , 0), with the one at entry j, for all j ∉ {i, m},
λ_m = α,   v_m^⊤ = (0, . . . , 0, 1, 0, . . . , 0), with the one at entry i.                       (2.50)

⁹Matrices {S_ji}_{∀j} are simultaneously diagonalizable [62, Def. 1.3.11], since they share the same right eigenvectors (see Lemma 2.1), i.e., S_pi S_ri = V Λ_p V^{−1} V Λ_r V^{−1} = V Λ_p Λ_r V^{−1} = V Λ_r Λ_p V^{−1} = V Λ_r V^{−1} V Λ_p V^{−1} = S_ri S_pi.


Proof. Since matrix S_im is a block-diagonal matrix, its characteristic polynomial is det(S_im − λI) = (1 − λ)^{N−1}(α − λ) = 0, i.e., the eigenvalues are λ_1 = 1, with multiplicity N − 1, and λ_2 = α, with multiplicity 1. Since the rows of S_im sum up to 1, one eigenvector is 1. Also, since matrix S_im is block diagonal, the other eigenvectors corresponding to these eigenvalues are straightforward to find.

In other words, the first eigenvector has all-ones entries. It is “transmitted” without any change, that is, λ = 1 (= λ_1), and remains unchanged over the iterations. Furthermore, the matrix of N states has N − 1 eigenvectors that are unit vectors; N − 2 of them (all but those with entries at the i-th and m-th positions) have eigenvalue one and thus do not change on such entries. Finally, the unit vector with its single entry at position i is also an eigenvector, but only a fraction equal to λ_m = α of such an input is transmitted.

In the following I investigate under which conditions such a WSN can arrive at the desired result. I will not employ explicit knowledge of topologies here, but will use well-known concepts of linear vector spaces instead.

Definition 2.3. Let us name the linear hull of all eigenvectors with corresponding eigenvalue λ = 1 the solution space of node i:

S_i = span{ 1, {v_j}_{∀j ∉ {i,m}} }.                                   (2.51)

Lemma 2.2. A necessary condition for a WSN with asynchronous updates S_im (Eq. (2.49)) to converge to a consensus is that the intersection of all solution spaces is spanned by exactly one eigenvector, 1, i.e., ∩_i S_i = span{1}.

Proof. Intuitively, if the zero vector were the only solution, the WSN would asymptotically result in all zeros. If the intersection of all spaces were spanned by more vectors than 1, other solutions would be possible and thus the solution would not be unique. Therefore, it is necessary to have the vector 1 for reaching a common (unique) consensus.

Note that Lemma 2.2 provides only a necessary but not a sufficient condition. It is also worth noting that in [51] a more profound proof with a similar definition, involving paracontracting Metropolis weight matrices, has been shown, offering conditions for reaching consensus very similar to those in Lemma 2.2.


Example 2.1. Let us assume a WSN with N = 4. Two pairs of nodes, (1, 2) and (3, 4), are connected (bidirectionally), while the pairs themselves are not connected to each other (see Fig. 2.5). Thus, the following matrices exist:

S_12 = [ α  1−α  0  0 ;  0  1  0  0 ;  0  0  1  0 ;  0  0  0  1 ],      S_21 = [ 1  0  0  0 ;  1−α  α  0  0 ;  0  0  1  0 ;  0  0  0  1 ],

S_34 = [ 1  0  0  0 ;  0  1  0  0 ;  0  0  α  1−α ;  0  0  0  1 ],      S_43 = [ 1  0  0  0 ;  0  1  0  0 ;  0  0  1  0 ;  0  0  1−α  α ].

All four operations share the eigenvectors 1 and v^⊤ = (0, 0, 1, 1); the intersection of the solution spaces is thus span{1, v}, and the solution is not unique.

Example 2.2. Let us now consider Example 2.1 with a single additional connection from node 3 to node 1 (see Fig. 2.5). In addition to the previous matrices, there exists yet another matrix

S_13 = [ α  0  1−α  0 ;  0  1  0  0 ;  0  0  1  0 ;  0  0  0  1 ].

Now the intersection of the solution spaces reduces to the linear hull of the eigenvector 1.

Figure 2.5: The network topologies of Example 2.1 (two disconnected pairs, nodes 1–2 and 3–4) and Example 2.2 (with the additional link from node 3 to node 1).
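The condition of Lemma 2.2 can also be checked numerically: the solution space S_i is the eigenspace of S_im to the eigenvalue 1, i.e., the null space of S_im − I, so the intersection of all solution spaces is the common null space of the stacked matrices. The sketch below is my own illustration; applied to the four matrices of Example 2.1 it reports a two-dimensional intersection, and with S_13 added (Example 2.2) a one-dimensional one, i.e., span{1}:

import numpy as np

def consensus_condition_holds(S_list, tol=1e-10):
    # The intersection of all solution spaces equals the null space of the stacked
    # matrices S - I; Lemma 2.2 requires this null space to be exactly span{1}.
    N = S_list[0].shape[0]
    stacked = np.vstack([S - np.eye(N) for S in S_list])
    dim_intersection = N - np.linalg.matrix_rank(stacked, tol=tol)
    return dim_intersection == 1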

Note that these two examples clearly show the properties of such an asynchronous algorithm. In Example 2.2, a weakly connected graph is considered: node 3, and indirectly node 4 (via 3), deliver information to node 1, and indirectly to node 2 (via 1), but from nodes 1 and 2 no information is returned to nodes 3 and 4. Nevertheless, a consensus is found, clearly “injected” from nodes 3 and 4 into nodes 1 and 2 (see Sec. 2.4.3 ahead). Some further important properties of the state-transition matrix S_im are summarized in the following lemma.

the following lemma.

Lemma 2.3. The state-transition matrix Sim defined in Eq. (2.49) has the following properties(

I− 1

N11⊤

)

Sim =

(

I− 1

N11⊤

)

Sim

(

I− 1

N11⊤

)

S⊤im

(

I− 1

N11⊤

)

=

(

I− 1

N11⊤

)

S⊤im

(

I− 1

N11⊤

)

.

Proof. The proof follows straightforwardly from the fact that the matrix Sim is row-stochastic.


2.4.3 Convergence analysis based on state matrix S

While there are some general convergence results for networks with fixed synchronous updates [13], the results for a time-variant (switching) network following the update (Eq. (2.3)) are more limited. Typically, results are known for convergence to the average consensus in the mean or mean square sense (e.g., [63, 66]). In [51] it was proposed to use the notion of the joint spectral radius of paracontracting matrices to compute the convergence rate. Let us now consider a chain of update events, assuming that a transmission from node i to node m occurs with probability p_{i,m}. For simplicity, let us assume that the various probabilities p_{1,2}, . . . , p_{N,N−1} remain constant over time. After K updates, the following state vector exists:

x(K) = ∏_{k=1}^{K} S(k) x(0),                                         (2.52)

where the matrices S(k) ∈ {S_im} are selected randomly with probabilities p_{i,m}. As mentioned before (Sec. 2.4.1), the order in which the matrices occur plays an important role. As the time average and the ensemble average lead to different results, it can be concluded that x(k) is a non-ergodic random process [67]. From here, the convergence in the mean can be found:

E[x(K)] = ∏_{k=1}^{K} E[S(k)] x(0) = ( Σ_{i,m} p_{i,m} S_im )^K x(0) = S̄^K x(0),   where S̄ ≜ Σ_{i,m} p_{i,m} S_im.        (2.53)

It can be recognized that matrix S̄ maintains its largest eigenvalue at one, with the eigenvector 1. After decomposing S̄ = QΣQ^{−1}, with Σ containing the eigenvalues χ_i ordered from largest to smallest, and with lim_{K→∞} Σ^K = Σ̄, a matrix with a single one element in its top left corner, it follows that¹⁰

lim_{K→∞} E[x(K)] = Q Σ̄ Q^{−1} x(0) ≜ x̄.                              (2.54)

Due to the independence of the events, the expectation appears in the various product terms. Thus, given the various α_{i,m} and corresponding probabilities p_{i,m}, the sum term can be computed and the eigenvalues analyzed. Certainly, one eigenvalue remains associated with its eigenvector 1. The remaining eigenvalues and corresponding eigenvectors may change, though. For the above Example 2.2, with all p_{i,m} = 1/5 and α = 0.5, the eigenvalues χ_i = {0.74, 0.8, 0.96, 1} are obtained. Note, however, that in this example

Q Σ̄ Q^{−1} = [ 0 0 1 1 ;  0 0 1 1 ;  0 0 1 1 ;  0 0 1 1 ],

thus obtaining the average of the third and fourth nodes, not including the information of the first two nodes. Whether the result is the desired average of all nodes’ state information strongly depends on the structure of the WSN, as it directly influences Q.

¹⁰Note that matrix Q is, in general, not equal to matrix U (cf. Eq. (2.5)).


As soon as the WSN has symmetric bidirectional connections, more precisely if p_{i,m}(1 − α_{i,m}) = p_{m,i}(1 − α_{m,i}), Q becomes unitary and the average consensus is found in the mean (independent of the choice of α). This allows the result to be summarized as follows:

Theorem 2.2. The asynchronous consensus algorithm as described by Eq. (2.49) converges to a consensus in the mean if the conditions of Lemma 2.2 are satisfied. If, furthermore, p_{i,m}(1 − α_{i,m}) = p_{m,i}(1 − α_{m,i}), the average consensus (1/N) x(0)^⊤ 1 is obtained asymptotically in the mean.

Note that the last condition introduces a certain symmetry, so that the state-transition matrix in the mean becomes a stochastic matrix for which, in particular, a left eigenvector 1 exists. Once such symmetry is guaranteed, many more explicit statements on convergence and convergence time are possible [35, 63].

Practically speaking, convergence in the mean does not have much impact. As it is an ensemble average, it just means that averaging over an ensemble of runs (or even graphs) results in the desired mean. Due to the, in general, non-ergodic behaviour, each individual realization may be far off from the desired consensus.

This is the reason to analyze the behaviour in terms of convergence in the mean square sense instead. In [66] such an analysis has been applied to the standard average consensus algorithm, obtained by ensemble averaging over a set of graphs representing switching networks. As the obtained Mean Square Error (MSE) value is not zero, it shows that for such networks the individual result may be off from the desired average consensus. In [66] the MSE with respect to a fixed consensus, say c1, was computed, i.e.,

MSE_f(k) = E[ ‖x(k) − c1‖²₂ ].

Note, however, that the value c is not fixed over all runs. As the runs are not ergodic, the asymptotic outcome will be different every time, and in consequence a different c appears for each run. Thus, the previous metric MSE_f(k) does not rightfully describe the behaviour in terms of reaching any consensus, but rather explains the deviation with respect to a fixed (expected) consensus.

Now let me also compute the MSE(k) for a consensus being the average of all values at time instant k, i.e.,

c_k = ( 1^⊤ x(k) ) / N ≜ x̄(k).                                        (2.55)

For the MSE it holds:

MSE(k) = E[ ( x(k) − x̄(k) 1 )^⊤ ( x(k) − x̄(k) 1 ) ] = E[ x(k)^⊤ ( I − (1/N) 1 1^⊤ ) x(k) ]
       = E[ x(k − 1)^⊤ S(k)^⊤ ( I − (1/N) 1 1^⊤ ) S(k) x(k − 1) ]
       = E[ ( x(k − 1) − x̄(k − 1) 1 )^⊤ B ( x(k − 1) − x̄(k − 1) 1 ) ],                       (2.56)

where B ≜ S(k)^⊤ ( I − (1/N) 1 1^⊤ ) S(k).

The last term is obtained using the properties of Lemma 2.3. Thus, I just proved the following.


Theorem 2.3. A WSN with asynchronous updates according to Eq. (2.48), satisfying the necessary condition of Lemma 2.2, converges to a consensus in the mean square sense iff all eigenvalues of

E[B] = Σ_{i,m} p_{i,m} S_im^⊤ ( I − (1/N) 1 1^⊤ ) S_im                                        (2.57)

are smaller than one.¹¹
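Theorem 2.3 yields a directly computable test. The sketch below (my own illustration) forms E[B] from a set of transition matrices and occurrence probabilities and returns its largest eigenvalue; it can be used to reproduce the values quoted next for Examples 2.1 and 2.2:

import numpy as np

def mean_square_test(S_list, probs):
    # Forms E[B] of Eq. (2.57) and checks whether all of its eigenvalues are below one.
    N = S_list[0].shape[0]
    J = np.eye(N) - np.ones((N, N)) / N                     # I - (1/N) 1 1^T
    EB = sum(p * S.T @ J @ S for p, S in zip(probs, S_list))
    lam_max = np.max(np.linalg.eigvalsh((EB + EB.T) / 2))   # E[B] is symmetric
    return lam_max < 1.0, lam_max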

In Example 2.2 the largest eigenvalue is 0.95, while in Example 2.1 the largest eigenvalue is 1, and thus no consensus in the mean-square sense is reached. For the case of a fixed value c, i.e., the true average,

c = ( 1^⊤ x(0) ) / N ≜ x̄(0),                                          (2.58)

the MSE takes the form

MSE_f(k) = E[ ‖x(k) − x̄(0) 1‖²₂ ] = E[ ‖( x(k) − x̄(k) 1 ) + ( x̄(k) − x̄(0) ) 1‖²₂ ]
         = E[ ‖x(k) − x̄(k) 1‖²₂ ]
           + E[ x̄(k) x(k)^⊤ 1 − x̄(0) x(k)^⊤ 1 − 2 x̄(k)² N + 2 x̄(k) x̄(0) N + 1^⊤ x(k) x̄(k) − 1^⊤ x(k) x̄(0) ]
           + E[ N | x̄(k) − x̄(0) |² ]
         = E[ ‖x(k) − x̄(k) 1‖²₂ ] + E[ N | x̄(k) − x̄(0) |² ]
         = MSE(k) + N E[ | x̄(k) − x̄(0) |² ],                                                   (2.59)

where the middle (cross) term vanishes since 1^⊤ x(k) = N x̄(k).

It follows straightforwardly that for the steady state it holds that

lim_{k→∞} E[ ‖x(k) − x̄(0) 1‖²₂ ] = lim_{k→∞} ( MSE(k) + N E[ | x̄(k) − x̄(0) |² ] )
                                 = lim_{k→∞} N E[ | x̄(k) − x̄(0) |² ] = N ( lim_{k→∞} E[ x̄(k)² ] − x̄(0)² ).

Thus, MSE_f(k) grows linearly with the size of the network and, in general, is determined by the limit lim_{k→∞} E[ x̄(k)² ]. Obviously, in the case when lim_{k→∞} E[ x̄(k)² ] = x̄(0)², the MSE_f(k) asymptotically goes to zero.

2.4.4 When a node fails

In case of a node failure, several scenarios are of interest.

1. Assume that node m is dead and thus not transmitting any more. In this case the desired solution, that is, the average of N values, has to shrink to the new average of N − 1 values. The solution space then shrinks by one dimension. Thus, one expects the WSN to adapt to a new solution. However, if the condition of Lemma 2.2 becomes violated,

¹¹For the sake of completeness, a similar result can be found in [68], where the rate of convergence of the so-called mean-square deviation (cf. Eq. (2.56)) is determined by the eigenvalue λ_max( E[ W(k)^⊤ ( I − (1/N) 1 1^⊤ ) W(k) ] ), where W(k) is a random weight matrix such that in every iteration k the communication network changes randomly (neighbors are picked randomly), but it is different from matrix S(k) (cf. Eq. (2.57)). (Note that W(k) here denotes a random matrix, while W(k) in Sec. 2.1 is any time-varying, not necessarily random, matrix.)


a consensus cannot be reached any more. This allows one to define a robustness condition by counting how many nodes, at minimum, can die before the condition is violated.

2. Assume that node m is not updating internally but keeps sending a fixed value, say x_f, whenever it is activated to transmit. This would change the solution space S_m, but not necessarily the intersection of all solution spaces, as the eigenvector 1 remains in S_m. A consensus is indeed reached, but instead of the average consensus it will be x_f. This shows how simple it is to make a WSN misbehave: simply by planting a “bad” node, it would dominate the entire WSN and solely define its result.

3. Assume that node m is misbehaving by sending wrong values, for example, it passes on the received values without any change. This would also change the solution space significantly. As long as the condition of Lemma 2.2 is not violated, this case may not lead to severe problems, unless the node turns into the previous case.

2.5 Almost sure convergence of consensus algorithms by relaxed projection mappings

Following the theory introduced in the previous Sec. 2.4, and by regarding the state-transition matrices (Eq. (2.49)) as projections onto some spaces, I now extend the convergence analysis using the so-called relaxed projection mapping approach [22].

2.5.1 Concept of projections

Projection algorithms have recently attracted a lot of attention [69–73]. Consider the so-called Relaxed Projection Mapping Algorithm (RPMA):

x(k) = x(k − 1) + µ_k ( P_k(x(k − 1)) − x(k − 1) ),                                  (2.60)

which, in general, iteratively maps a vector x(k) in a Hilbert space H onto a closed linear subspace 𝒫_k defined by the projection P_k, with 0 < µ_k < 2 [74]. Moreover, following the terminology of the RPMA, if 0 < µ_k < 1 the projection is underrelaxed, and if 1 < µ_k < 2 the projection is overrelaxed. For the case µ_k = 1, x(k − 1) is directly projected onto x(k) (see Fig. 2.6). The notation here suggests that the projection P_k is any function mapping [74], i.e., P_k : H → 𝒫_k. Nevertheless, later on (see Sec. 2.5.2), I assume P_k to be a matrix, i.e., P_k. It is known that if a finite set of projections, say P_i, i ∈ {1, 2, . . . , M}, with corresponding subspaces 𝒫_i, is applied, typically in a fixed order, the algorithm converges to a solution lying in the intersection of all such subspaces [75, 76] (Fig. 2.6), i.e.,

lim_{k→∞} x(k) = x⋆ ∈ 𝒫⋆ := ∩_i 𝒫_i.

I am, however, interested in the behaviour of the algorithm if the projections P_i are randomly chosen. To describe this, I will prove the following theorem.

Theorem 2.4. If the projections P_i onto linear subspaces 𝒫_i are from a finite set and selected randomly, each with a probability p_i > 0, then the steady-state solution of the relaxed projection algorithm (2.60) lies with probability one in the intersection of the subspaces ∩_i 𝒫_i.


Figure 2.6: Relaxed projection mapping principle (red – overrelaxed case): successive (relaxed) projections of x(0) onto the subspaces 𝒫_1 and 𝒫_2 approach the intersection 𝒫⋆.

Proof. To revise the theory: it was shown by von Neumann [75] and by Práger [77] that if there is a set of M = 2 projections P_i, i ∈ {1, 2}, the alternating product of the projections converges strongly¹², i.e.,

lim_{k→∞} (P_1 P_2)^k → P⋆,   P⋆ x ∈ ∩_i 𝒫_i   ∀x ∈ H.

The proof was extended to a cyclic iteration of a finite number M of projections by Halperin [76], i.e.,

lim_{k→∞} (P_1 P_2 · · · P_M)^k → P⋆,   P⋆ x ∈ ∩_i 𝒫_i   ∀x ∈ H.

Amemiya and Ando [78] later showed that a sequence x(k) (in a real Hilbert space H) defined by general iterations x(k) = P_k x(k − 1) converges weakly for any sequence of P_k. Bauschke [79] provided a proof of strong convergence as long as all projections from a finite set are selected infinitely often. However, a proof of strong convergence for the case of randomly selected projections P_k ∈ {P_i}, i ∈ {1, 2, . . . , M}, remains an open issue¹³. Let us now randomly (independently) select the projection P_i with probability p_i > 0 and construct L sufficiently long chains C(m) of N_m > M elements (m = 1, 2, . . . , L), i.e.,

C(m) = ∏_{k=1}^{N_m} P_{m_k},                                         (2.61)

with m_k ∈ {1, 2, . . . , M}. The probability that in the m-th chain of N_m elements there is at least one specific projection P_l, l ∈ {1, 2, . . . , M}, is given by

P(there exists at least one P_l) = 1 − (1 − p_l)^{N_m}.

¹²A sequence of points x(k) in a Hilbert space H converges weakly to a point x⋆ ∈ H if lim_{k→∞} ⟨x(k), y⟩ = ⟨x⋆, y⟩, ∀y ∈ H. A sequence of points x(k) in a Hilbert space H converges strongly to a point x⋆ ∈ H if lim_{k→∞} ‖x(k) − x⋆‖ = 0.

¹³For completeness, I note that a novel insight for proving the strong convergence in the particular case of orthogonal projections (not considered here) was proposed by Baillon and Bruck [80, 81].


The probability that all M projections appear at least once in the m-th chain is given by

P(there exists at least one P_l for every l = 1, 2, . . . , M) = 1 − Σ_{l=1}^{M} (1 − p_l)^{N_m}.

If the number of elements N_m of the m-th chain grows, the probability that all projections occur at least once tends to one, i.e.,

lim_{N_m→∞} P(there exists at least one P_l for every l = 1, 2, . . . , M) = 1.        (2.62)

Applying such chains (2.61) many times, thus assuring that every given projection has appeared in the chain, one finds that

lim_{L→∞} ∏_{m=1}^{L} lim_{N_m→∞} ∏_{k=1}^{N_m} P_{m_k} → P⋆,   P⋆ x ∈ ∩_i 𝒫_i   ∀x ∈ H.        (2.63)

Thus, with probability one (almost surely), the solution of the RPMA belongs to the intersection of the subspaces.
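A minimal numerical illustration of Theorem 2.4 (my own sketch, not from the thesis): projection matrices onto linear subspaces are drawn independently with probabilities p_i and the relaxed iteration (2.60) is applied; the iterate settles in the intersection of the subspaces:

import numpy as np

def rpma(x0, projectors, probs, mu, num_steps, rng=np.random.default_rng(0)):
    # projectors: list of projection matrices P_i onto the subspaces; 0 < mu < 2.
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        Pk = projectors[rng.choice(len(projectors), p=probs)]
        x = x + mu * (Pk @ x - x)          # relaxed projection step, Eq. (2.60)
    return x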

2.5.2 Application on various consensus algorithms

I now discuss the consequences of Theorem 2.4 in the case of three simple consensus algorithms.

Asynchronous Approach

I will first consider asynchronous updates, performed every time information is received in the neighborhood of a transmitting node i (see Sec. 2.4.1). Let us recall this algorithm in the state-transition matrix formalism (Eq. (2.49)):

x(k) = S_im^(A) x(k − 1),                                             (2.64)

where S_im^(A) has the same structure as S_im in Eq. (2.49), with node-specific mixing parameters, i.e., [S_im^(A)]_{i,i} = α_i, [S_im^(A)]_{i,m} = 1 − α_i, and ones on the remaining diagonal entries.

Note that I assume individual values α_i for each node, and the values could even vary over time (events)¹⁴. Such variations of the updates clearly include link failures [64], for which α_i simply turns to one for a certain time, and quantization effects [20]. To each of the existing transition matrices I associate a probability of its occurrence p_{ij} such that Σ_{ij} p_{ij} = 1. As the transition matrices occur randomly, the typical approach to analyze such a network is to study the convergence in the mean and mean square sense (see Sec. 2.4.3).

¹⁴As in Sec. 2.4.1, I abuse the notation and drop the time index k in S_{ij}^{(·)}, although the selected pair (i, j) ∈ E(k) differs at different times.


I now describe the learning (transient) process of the network in terms of the relaxed projection mapping (2.60). For this, I define a projection matrix T_ij associated with the transition matrix S_ij such that S_ij = I − (1 − α_i) T_ij. The projections P_k in Eq. (2.60) can be identified as P_k(x) ≡ P_ij x = (I − T_ij) x and µ_i = 1 − α_i (note that values −1 < α_i < 0 correspond to the overrelaxed case, while 0 < α_i < 1 corresponds to the underrelaxed case; Sec. 2.5.1). Thus,

x(k) = x(k − 1) + µ_k ( P_k(x(k − 1)) − x(k − 1) )
     = x(k − 1) + µ_k ( (I − T_ij) x(k − 1) − x(k − 1) )
     = ( I − µ_k T_ij ) x(k − 1),

and from Theorem 2.4 the convergence is guaranteed.

Lemma 2.4. The asynchronous update algorithm (2.64) converges almost surely to a consensus, i.e., lim_{k→∞} x(k) = c1, if 0 < α_i < 1, the established connections appear with probabilities p_{ij} > 0, and the intersection of the solution spaces of all transition matrices contains only span{1}.

Conjecture 2.1. Under some circumstances and conditions on the probability distribution and the network topology, the algorithm (2.64) converges to a consensus also in the overrelaxed case, i.e., if −1 < α_min ≤ α_i < 0, as expected from the conditions on the general relaxed projection mapping algorithm.

Note that Conjecture 2.1 is also supported, for example, by the convex weight selection proposed in [49], where some weights are allowed to take negative values.

Asynchronous Bidirectional Approach

If simultaneous bidirectional connections between nodes (pair-wise) are allowed, the learning is assumed to be more efficient. In this case the transition matrix takes on the form

x(k) = S_im^(B) x(k − 1),                                             (2.65)

where S_im^(B) is a symmetric pair-wise update matrix with [S_im^(B)]_{i,i} = [S_im^(B)]_{m,m} = α_i, [S_im^(B)]_{i,m} = [S_im^(B)]_{m,i} = 1 − α_i, and ones on the remaining diagonal entries.

Similarly to the case before, one can define S_ij^(B) = I − 2(1 − α_i) T_ij, where T_ij again describes a projection matrix. It can then be noticed that µ_i = 2(1 − α_i), and thus:

Lemma 2.5. The asynchronous bidirectional update algorithm (2.65) converges almost surely to a consensus, i.e., lim_{k→∞} x(k) = c1, if 0 < α_i < 1, the established connections appear with probabilities p_{ij} > 0, and the intersection of the solution spaces of all transition matrices contains only span{1}. Moreover, since the matrices S_ij^(B) are doubly stochastic, they satisfy the so-called conservation property [82], and thus c = (1/N) Σ_i x_i(0) (the average of the initial values).


Synchronous Approach

I finally extend my analysis to the case in which each node interacts in one time instant with all its neighboring nodes and updates its weights simultaneously. Let us assume that node i has three neighbors n, m, l. Its transition matrix would then exhibit the following form:

x(k) = S_im^(S) x(k − 1),                                             (2.66)

where row i of S_im^(S) contains the entries [S_im^(S)]_{i,i} = α_i, [S_im^(S)]_{i,m} = β_im, [S_im^(S)]_{i,n} = β_in, and [S_im^(S)]_{i,l} = β_il, while all other rows are those of the identity matrix,

for which I assume that β_im, β_in, β_il > 0 and, to satisfy the condition that the matrix S_im^(S) is row-stochastic, the condition Σ_j β_ij = 1 − α_i must also hold. Therefore, also for −1 < α_i < 0, all elements of S_ij^(S) are smaller than 1 in absolute value, and thus the contraction property [82] holds.

Lemma 2.6. The synchronous update algorithm (2.66) converges almost surely to a consensus, i.e., lim_{k→∞} x(k) = c1, if −1 < α_i < 1, the established connections appear with probabilities p_{ij} > 0, and the intersection of the solution spaces of all transition matrices contains only span{1}.

2.5.3 Simulations

I now show simulation results for Algorithms (2.64), (2.65), and (2.66) as functions of the mixing parameter α_i. I assume a static WSN of random geometric topology with N = 40 nodes. Transmissions are selected randomly with uniform distribution. The results show the averaged mean-squared error (MSE) of randomly initialized values x(0) (with uniform distribution) across the nodes (disagreement σ²(k)). Note that the averaging of the MSE is performed over 100 averaging cycles with different initializations x(0).

It can be observed in Fig. 2.7a that in some cases Algorithm (2.64) converges to a consensus also in the overrelaxed case (cf. α_i = −0.2), as expected from Conjecture 2.1, which is based on the conditions of the RPMA. The conditions on the network topology and initial values for which this conjecture holds remain, however, an open issue. Fig. 2.7b shows, as expected from Lemma 2.5, that for α_i < 0 Algorithm (2.65) diverges. Next, since Algorithm (2.66) satisfies the contraction property [82] for all S_ij^(S), even for α_i < 0, convergence is guaranteed for all −1 < α_i < 1 (see Fig. 2.7c), which is also in agreement with the conditions of the RPMA (0 < µ_k < 2). Furthermore, notice that since Algorithm (2.66) mixes the information from the neighbors most rapidly (in one iteration with all neighbors), its convergence rate is the highest in comparison to the other algorithms.

Note that the error floors of each algorithm are caused only by the numerical computation accuracy.

Page 54: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

42 CHAPTER 2. CONVERGENCE

αi = −0.2

0 500 1 000 1 500 2 000 2 500 3 000

αi=−0.8αi=−0.6αi=−0.4αi=−0.2αi=0.2αi=0.4αi=0.6αi=0.8

1040

1020

100

10−20

10−40

σ2(k)

k

(a) Averaged MSE σ2(k) for various αi (Alg. (2.64)).

0 500 1 000 1 500 2 000 2 500 3 000

αi=−0.8αi=−0.6αi=−0.4αi=−0.2αi=0.2αi=0.4αi=0.6αi=0.8

1040

1020

100

10−20

10−40

σ2(k)

k

(b) Averaged MSE σ2(k) for various αi (Alg. (2.65)).

0 500 1 000 1 500 2 000 2 500 3 000

αi=−0.8

αi=−0.6

αi=−0.4

αi=−0.2

αi=0.2

αi=0.4

αi=0.6

αi=0.8

105

100

10−5

10−10

10−15

10−20

10−25

10−30

σ2(k)

k

(c) Averaged MSE σ2(k) for various αi (Alg. (2.66)).

Figure 2.7: Averaged MSE (disagreement) among the nodes through 100 averaging cycles,geometric network topology with N = 40.

Page 55: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

2.6. DYNAMIC AVERAGE CONSENSUS 43

2.6 Dynamic average consensus algorithm

In the previous chapters the convergence of various types of distributed average consensusalgorithms was analyzed. Nevertheless, all mentioned consensus algorithms were designed tocompute some agreement of static, invariant values. This may be useful for finding an average,sum, or a maximum value of some constant state, e.g., average temperature, size of an arbitrarystate, number of nodes in a network.

However, there exist many applications which depend on a consensus (average) ofa varying state, e.g., tracking a moving target by sensors, measuring the average of a varyingtemperature field. Therefore, it has been of interest to design an algorithm which is capableof “tracking” the consensus in time (over iterations). A solution to this problem represent theso-called dynamic average consensus algorithms [23,83–87].

In this chapter I review the algorithm, analyze its properties, and provide a proof of itsconvergence as it was proposed in [24].

2.6.1 Definition and proof of convergence

In contrast to the “static” average consensus algorithm, see Sec. 2.1, which computes the mean(average) of constant initial values, the dynamic consensus algorithm is able to track the meanof a time-varying signal (see Fig. 2.8). According to [88], where a continuous version of thedynamic average consensus has been defined, in [24] an analogous discrete dynamic averageconsensus algorithm was proposed.

0

0 10 20 30 40 50 60 70 80 90

100

100

−100

−200

200

300

400

500

600

700

iterations k

xi(k),s i(k);i=

1,2,...,5

Figure 2.8: Typical behaviour of dynamic average consensus. Dotted lines are the trackedsignals si(k) on each node, solid lines are the internal “mean” signals xi(k) on each node.

Page 56: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

44 CHAPTER 2. CONVERGENCE

Theorem 2.5. For a (strongly) connected network, assuming a bounded input signal

s(k) = (s1(k), . . . , si(k), . . . , sN (k))⊤ with decaying time differences s(k) = s(k) − s(k − 1),

the internal state x(k) = (x1(k), . . . , xi(k), . . . , xN (k)) following the update equation

x(k) = W [x(k − 1) +s(k)] , (2.67)

with x(0) = s(0), converges to the instant average s(k) of the input signal s(k), i.e.,

limk→∞

(x(k)− s(k)) = 0.

Note that matrixW is the same as in the case of “static” consensus algorithm, see Sec. 2.1.

This can be formalized as follows:

Algorithm 2.2: Distributed Dynamic Average Consensus Algorithm

At time k = 0, each node i (i = 1, 2, . . . , N), measures a scalar value si(0).For time k = 1, 2, . . . , each node i

1. measures a value si(k) and makes a difference with previously measured value si(k − 1),i.e., ∆si(k) = si(k)− si(k − 1),

2. combines the measurement difference∆si(k) with previously stored internal state xi(k−1),i.e., ψi(k) = xi(k − 1) + ∆si(k).

3. The temporary internal state ψi(k) is broadcast to all neighbors j ∈ Ni.

4. The internal state at node i is then calculated from the local ψi(k) and neighboringtemporary states ψjj∈Ni

as:

xi(k) = [W]i,iψi(k) +∑

j∈Ni

[W]i,jψj(k),

where the weightsW are selected as in Sec. 2.1.2.

Proof. Following the reasoning from [88], let us define the error function:

e(k) = x(k)− 1

N11⊤s(k). (2.68)

Applying the Z-transform, Eq. (2.68) can be transformed in the Z-domain asE(z) =X(z)− 1/N11⊤S(z). Thus, transforming (2.67), it follows that

E(z)=W(X(z)z−1+x(0) + S(z)− z−1S(z)− s(0))− 1N 11⊤S(z)

=(I− z−1W

)−1W(I− z−1I

)S(z)− 1

N 11⊤S(z).

From the fact that the maximal eigenvalue of W, λ1 = 1, with eigenvector u1 = 1/√N1

(see Sec. 2.1), the error transfer function takes the form

Hes(z) = E(z)S(z) =

(I− z−1W

)−1W(I− z−1I

)− 1

N 11⊤

= U

0 0 . . .

0λ2(1−z−1)1−λ2z

−1 . . .

. . .

U−1,

where U contains the eigenvectors of W. Since |λi| < 1, for all i = 2,. . . ,N , all poles of the

Page 57: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

2.6. DYNAMIC AVERAGE CONSENSUS 45

transfer function are inside the unit circle, the transfer function is stable. Applying the finalvalue theorem15, for any bounded input signal s(k) (with at most one pole at 1 in the Z-domain)it follows that

limz→1

(1− z−1)E(z) = limz→1

U

0λ2(1−z−1)1−z−1λ2

. . .

U−1

(1− z−1

)S(z)

︸ ︷︷ ︸

∆S(z)

.

Knowing Z∆S(z) = s(k) − s(k − 1) ≡ ∆s(k), I assume that ∆s(k) ≤ e−ck; c > 0. TakingZ-transform ∆S(z) = 1

1−e−cz−1 , the pole is again inside the unit circle, and thus, the limitexists, i.e., lim

z→1(1− z−1)E(z) = 0. Thus, the error e(k)→0 as k→∞.

2.6.2 Bounds on convergence time and rate

Although theoretical bounds on convergence time and rate of static consensus are well studied[13,49,50,82], theoretical bounds for various types of dynamic consensus algorithms are studiedonly rarely [83,87]. However, these bounds are not directly applicable for this algorithm, sincethey consider slightly different types of dynamic consensus algorithms than [88], upon whichthis algorithm is based.Therefore, in this section I provide new definitions of the convergence time and rate for the

dynamic consensus algorithms as defined by Eq. (2.67) as well as theorems and proofs for thebounds. Note that these bounds crucially depend on the tracked signal s(k).

Convergence time of the dynamic consensus algorithm

The convergence time of the static average consensus algorithm can be defined [49,50] as

T(s)N (ǫ) , min

τ :‖x(k)− x∗‖∞‖x(0)− x∗‖∞

≤ ǫ, ∀k > τ, ∀x(0) 6= c1, c ∈ R

,

where x∗ = limk→∞

x(k).

In a similar way I define the convergence time of the dynamic consensus algorithm.

Definition 2.4 (Convergence time). Given matrix W with eigenvalues 1 = λmax > λ2 >

· · · > λmin > −1; then the convergence time of the distributed dynamic average consensus

algorithm (2.67) is defined as

T(d)N (ǫ) , min

τ : ‖x(k)− s(k)‖2 ≤ ǫ, ∀k > τ ; ǫ > 0

.

Theorem 2.6. Assuming a bounded input signal, such that ‖s(k)‖2 ≤ a, ∀k, then for theerror it holds that

‖x(k)− s(k)‖2 ≤ a

(

λk2λ2 + 1

λ2 − 1+

2λ21− λ2

)

, (2.69)

where s(k) = 1N 11⊤s(k) and λ2 is the second largest eigenvalue of matrixW.

15The final value theorem [89] in the Z-domain: limk→∞

e(k)= limz→1

(1− z−1)E(z).

Page 58: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

46 CHAPTER 2. CONVERGENCE

λ2 = 0.95λ2 = 0.9λ2 = 0.7λ2 = 0.5λ2 = 0.3

5

15

25

35

00

10

20

20

30

40

40 60 80 100iterations k

bounds(2.69)and(2.70)

(a) Bounds on the error for a constant bounded in-put signals, Eq. (2.69) (solid line) and Eq. (2.70)(dot-dashed line).

λ2 = 0.95λ2 = 0.9λ2 = 0.7λ2 = 0.5λ2 = 0.3

5

1

2

3

4

6

7

8

9

00

10

20 40 60 80 100iterations k

bounds(2.72)and(2.73)

(b) Bounds on the error for a decaying bounded in-put signal, Eq. (2.72) (solid line) and Eq. (2.73)(dot-dashed line).

Figure 2.9: Bounds on the errors for various λ2; parameters: a = 1, b = e1, c = 0.1.

Corollary 1. If matrixW is positive (semi-)definite (λmin ≥ 0),

‖x(k)− s(k)‖2 ≤ a

(λk2

λ2 − 1+

2λ2 − λ221− λ2

)

. (2.70)

Note that it can be observed from (2.69) that,

limk→∞

‖x(k)− s(k)‖2 ≤ a2λ2

1− λ2. (2.71)

This means that the error remains bounded as time k grows and is proportionally influencedby the maximum amplitude a of the input signal and the topology characterized by λ2 (seeFig. 2.9a). Nevertheless, it must be noted that e.g., in case of constant input signal s(k) = a,from Theorem 2.5 it follows that e(k) → 0. Thus, the bound Eq. (2.69) is very loose.

In case that ‖s(k)‖2 ≤ ab−ck, ∀k (a, b, c > 0), the following upper bounds can be obtained.

Theorem 2.7. Assuming a bounded decaying input signal, such that ‖s(k)‖2 ≤ ab−ck, ∀k;(a, b, c > 0), then for the error it holds that

‖x(k)− s(k)‖2 ≤ ab−ck

(

(λ2bc)k

λ2 + 1

λ2bc − 1+λ2b

c + λ21− λ2bc

)

, (2.72)

where s(k) = 1N 11⊤s(k) and λ2 is the second largest eigenvalue of matrixW.

Corollary 2. If matrixW is positive (semi-)definite (λmin ≥ 0),

‖x(k)− s(k)‖2 ≤ ab−ck

((λ2b

c)k

λ2bc − 1+λ2(b

c − λ2bc + 1)

1− λ2bc

)

. (2.73)

Unlike Eq. (2.71), in this case limk→∞

‖x(k)− s(k)‖2 = 0 (cf. Theorem 2.5) (see Fig. 2.9b).

Page 59: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

2.6. DYNAMIC AVERAGE CONSENSUS 47

Proof. Let us rewrite x(k) according to (2.67) and s(k) ≡ 1N 11⊤s(k). Applying the triangle

inequality it can be shown that (recall that: x(0) = s(0)):

‖x(k)− s(k)‖2 =

∥∥∥∥∥∥

k−1∑

j=1

(Wk−j+1 −Wk−j)s(j) +Ws(k)− 1

N11⊤s(k)

∥∥∥∥∥∥2

∥∥∥∥∥∥

k−1∑

j=1

(Wk−j+1 −Wk−j)s(j)

∥∥∥∥∥∥2

+

∥∥∥∥Ws(k)− 1

N11⊤s(k)

∥∥∥∥2

≤k−1∑

j=1

∥∥∥W

k−j+1 −Wk−j∥∥∥2‖s(j)‖2 +

∥∥∥∥W− 1

N11⊤∥∥∥∥2

‖s(k)‖2 .

Using eigendecomposition of the matrix W = UΛU−1, with Λ = diag(1, λ2, . . . , λmin), andknowing the fact that for any unitary matrix ‖Ux‖2 = ‖x‖2 then

k−1∑

j=1

∥∥∥W

k−j+1 −Wk−j∥∥∥2‖s(j)‖2 +

∥∥∥∥W − 1

N11⊤∥∥∥∥2

‖s(k)‖2 =

k−1∑

j=1

∥∥∥Λ

k−j+1 −Λk−j∥∥∥2‖s(j)‖2 +

∥∥∥∥∥Λ−

(1 0 . . .

0 0 . . ....

. . .

)∥∥∥∥∥2

‖s(k)‖2=

=k−1∑

j=1

∥∥∥∥∥∥∥∥

0

λk−j2 (λ2 − 1)

. . .

λk−jmin (λmin − 1)

∥∥∥∥∥∥∥∥2

‖s(j)‖2 +

∥∥∥∥∥∥∥∥

0

λ2

. . .

λmin

∥∥∥∥∥∥∥∥2

‖s(k)‖2 ≤

≤ |λmin − 1|k−1∑

i=1

λi2 ‖s(k − i)‖2 + λ2 ‖s(k)‖2 . (2.74)

Proof of Theorem 2.6. Assuming that ‖s(k)‖2≤a; ∀k, a > 0, from Eq. (2.74) one obtains

|λmin − 1|k−1∑

i=1

λi2 ‖s(k − i)‖2 + λ2 ‖s(k)‖2 ≤ a

(

|λmin − 1| λk2 − λ2λ2 − 1

+ λ2

)

. (2.75)

In caseW is not positive definite, i.e., −1 < λmin < 0, but |λ2| ≥ |λmin|, then |λmin − 1| ≤(1 + λ2), which leads to the upper bound

‖x(k)− s(k)‖2 ≤ a

(

λk2λ2 + 1

λ2 − 1+

2λ21− λ2

)

. (2.76)

In caseW is positive definite, I use the fact that |λmin − 1|≤ 1, and thus,

‖x(k)− s(k)‖2 ≤ a

(λk2

λ2 − 1+

2λ2 − λ221− λ2

)

.

Proof of Theorem 2.7. Assuming that ‖s(k)‖2 ≤ab−ck; ∀k; a, b, c > 0, I obtain from Eq. (2.74)

|λmin − 1|k−1∑

i=1

λi2 ‖s(k − i)‖2 + λ2 ‖s(k)‖2 ≤ ab−ck

(

|λmin − 1| (λ2bc)k − λ2b

c

λ2bc − 1+ λ2

)

.

Page 60: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

48 CHAPTER 2. CONVERGENCE

In the same manner as in the proof of Theorem 2.6 (above), for a non-positive definitematrixW it can be shown that

‖x(k)− s(k)‖2 ≤ ab−ck

(

(λ2bc)k

λ2 + 1

λ2bc − 1+λ2b

c + λ21− λ2bc

)

,

and for a positive (semi-)definite matrixW

‖x(k)− s(k)‖2 ≤ ab−ck

((λ2b

c)k

λ2bc − 1+λ2(b

c + 1− λ2bc)

1− λ2bc

)

.

This concludes the proofs of Theorem 2.6 and Theorem 2.7.

Note that to obtain the bounds on the convergence time as defined by Def. 2.4, one canfurther bound Eq. (2.72) by some ǫ>0 and find the appropriate τ . However, finding such boundcorresponds to finding roots of an exponential polynomial16, and therefore such solution, tomy best knowledge, can be obtained only numerically.Similarly to the bounds on convergence time, the bounds on convergence rate are defined.

Convergence rate of the dynamic consensus algorithm

Similarly to the definition of the convergence rate for a static average consensus algorithm[49,50], i.e.,

ρ(s) , supx(0) 6= c1

c ∈ R

limk→∞

(‖x(k)− x∗‖∞‖x(0)− x∗‖∞

)1/k

, (2.77)

where x∗ = limk→∞

x(k), the convergence rate for the dynamic average consensus algorithm can

be defined as follows17.

Definition 2.5 (Convergence rate). Given matrix W with eigenvalues 1 = λmax > λ2 >

· · · > λmin > −1, then the convergence rate of the distributed dynamic average consensus

algorithm (2.67) is defined as

ρ(d) , limk→∞

(‖x(k)− s(k)‖2)1/k . (2.78)

Theorem 2.8. Assuming a bounded input signal, such that ‖s(k)‖2 ≤ a, ∀k, then for theconvergence rate it holds that

ρ(d) = 1. (2.79)

Theorem 2.9. Assuming a bounded decaying input signal, such that ‖s(k)‖2 ≤ ab−ck, ∀k;(a, b, c > 0), then for the convergence rate it holds that

ρ(d) =

b−c if 0 < |λ2bc| < 1

λ2 if |λ2bc| > 1. (2.80)

16Exponential polynomial of type: c1xk1 + c2x

k2 < ǫ. Literature regarding finding the roots of exponential

polynomials include [90,91].17It is known for the “static” consensus algorithm that ρ(s) = λ2 (see Sec. 2.1.1).

Page 61: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

2.6. DYNAMIC AVERAGE CONSENSUS 49

Proof of Theorem 2.8 and Theorem 2.9. The proofs follow from the proofs of the convergencetime and computing the given limit (2.78), i.e.,.

ρ(d) , limk→∞

(‖x(k)− s(k)‖2)1/k

= limk→∞

(

a

(

λk2λ2 + 1

λ2 − 1+

2λ21− λ2

))1/k

= limk→∞

e1klog(λk

2λ2+1λ2−1

+2λ21−λ2

)= e

0 log2λ21−λ2 = e0 = 1.

and respectively.

ρ(d) , limk→∞

(‖x(k)− s(k)‖2)1/k

= limk→∞

(

ab−ck

(

(λ2bc)k

λ2 + 1

λ2bc − 1+λ2b

c + λ21− λ2bc

))1/k

= b−c limk→∞

elog

(

(λ2bc)k

λ2+1λ2b

c−1+

λ2bc+λ2

1−λ2bc

)

k ,

and having two cases:

(1) |λ2bc| < 1: ρ(d) = b−ce0 log

λ2bc+λ2

1−λ2bc = b−c,

(2) |λ2bc| > 1: ρ(d) = b−c limk→∞

elog

(

(λ2bc)k

λ2+1λ2b

c−1+

λ2bc+λ2

1−λ2bc

)

k

l’Hospital’srule= b−c lim

k→∞e

log(λ2bc)

1−λ2b

c+λ2λ2+1

(λ2bc)−k

= b−celog(λ2bc) = λ2.

This concludes the proofs.

2.6.3 Dynamic consensus algorithm with memory

I here now present a slightly different dynamic consensus algorithm which takes into accountmore preceding values of the input signal [23] and allows more consensus iterations within onetime step k. As in the previous section, the goal is to track a time-varying sum (average) ofinput signals, i.e., s(k) =

∑Ni=1 si(k) (s(k) =

1N s(k)).

As before, let xi(k) denote a time-dependent internal state at sensor i, initialized withxi(0) = si(0). At a given time k ≥ 1, a temporary internal state ψ(0)

i (k) is first calculated ateach sensor i from the previous state xi(k− 1) and the current and past inputs si(k − l)L−1

l=0

according to

ψ(0)i (k) = µxi(k − 1) + (1−µ)

L−1∑

l=0

ωl si(k − l) . (2.81)

Here, ωl are temporal weights (or filter coefficients) and µ ∈ [0, 1] is a (global) tuning parameterthat controls the contribution of the previous internal state xi(k − 1) to ψ(0)

i (k). Note thatunlike the mixing parameters in “static” consensus algorithms (e.g., ε in Eq. (2.10)) whichlocally “mix” the received data at the sensor, the tuning parameter is known globally to allthe sensors and “tunes” only the contribution of the filtered input signal s(k) to the internalstate x(k). Next, I≥1 consensus iterations are performed at each sensor, i.e.,

ψ(m)i (k) = [W]i,iψ

(m−1)i (k) +

i′∈Ni

[W]i,i′ψ(m−1)i′ (k) , m = 1, 2, . . . , I, (2.82)

Page 62: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

50 CHAPTER 2. CONVERGENCE

where Ni denotes a predefined set of neighbors of sensor i and the [W]i,i′ are suitable weightsthat depend on the network topology (see Sec. 2.1.2). The new internal state is given byxi(k) = ψ

(I)i (k). Finally, the desired estimate of the sum si(k) is obtained at sensor i as

si(k) = Nxi(k). (Note that the number of sensors N is assumed known at each sensor;distributed methods for computing N are available, see Sec. 4.4 ahead.)Using (2.81), (2.82), and xi(k) = ψ

(I)i (k), the internal-state update relation for the entire

network (globally) follows the recursion (cf. Eq. (2.67))

x(k) = WI

[

µx(k − 1) + (1−µ)L−1∑

l=0

ωl s(k − l)

]

, (2.83)

with the vectors x(k) , (x1(k), . . . , xi(k), . . . xN (k))⊤ and s(k) , (s1(k), . . . , si(k), . . . , sN (k))⊤

and weight matrixW selected as mentioned in Sec. 2.1.2.This can be viewed as a generalization of the dynamic consensus algorithms proposed in the

previous section and as proposed for example in [85,87,88]. The main difference is the fact thatthe input sequence s(k) is filtered by a nonrecursive filter with coefficients ωl. Furthermore,the tuning parameter µ allows an adjustment of the influence of the previous internal statex(k−1) relative to that of the filtered input:

∑L−1l=0 ωls(k− l). Note that Eq. (2.67) is a special

case of Eq. (2.83), with parameters selected as: L = 2, coefficients ω0 = 1, ω1 = −1; numberof consensus iterations per time step I = 1; and the tuning parameter µ = 0.5 is included inthe weight matrixW.Procedures for choosing the optimal filter length L, filter coefficients ωl, tuning parameter µ,

and number of consensus iterations I, remain, however, an open issue.Note that this form of dynamic consensus algorithm will be used later in the Sec. 3.4 where

I also briefly discuss the influence of consensus iterations I on the performance of the givenalgorithm.

2.7 Conclusion

To summarize this chapter, I provide a chart showing the relations between the sections andbrief descriptions of their contents (solid arrows = section directly follows the previous section;dashed arrows = section is only related to the previous section):

General framework (Sec. 2.2): A general unifying formalism for describing any distributed al-gorithm. Simple algorithms solving a distributed averaging problem are comparedby a set of few parameters. Based on this framework, the impact of a noise onthe consensus algorithms is studied in the latter Sec. 2.3.

Quantized consensus algorithm (Sec. 2.3): Bounds on the quantization error of the steadystate of a quantized consensus algorithm derived from the framework.

Average consensus (Sec. 2.1): Classical convergence analysis of the average consensus al-gorithm based on spectral graph theory. The selection of mixing weights andthe definition of convergence speed follows straightforwardly from this theory.The average consensus algorithm is included as a special case of so-called linearhomogeneously distributed algorithm in the proposed framework.

Dynamic average consensus (Sec. 2.6): Extended average consensus algorithm for track-ing a time-varying average. Novel bounds on the convergence error and rateare provided as well as a generalization of the dynamic consensus algorithm.

Page 63: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

2.7. CONCLUSION 51

Algebraic approach (Sec. 2.4): Convergence in the mean and mean square sense of an asyn-chronous consensus algorithms is analyzed in terms of state-transition ma-trices, without any knowledge about spectral graph theory.

Relaxed projections (Sec. 2.5): Extending the analysis of state-transition matrices by map-ping them onto relaxed projections, bounds on the mixing weights ofasynchronous consensus algorithms are provided.

Although I provided only brief examples showing the capability of the formalism in Sec. 2.2,the framework might be used for comparing many different algorithms, based only on theset of few parameters (defining the relations between the transmission, internal computation,measurements). As the most potential usage of such formalism, I see also in some automated“mapper/optimizer” which could map any algorithm onto a network of nodes and optimizeit according to some criteria. This, however, is out of scope of my thesis. Yet, the tightbounds on the quantization error of a quantized consensus algorithm (Sec. 2.3), were derivedusing directly the framework. Also, not included in this thesis, the rewriting of the dynamicconsensus (Sec. 2.6) in the proposed formalism, could possibly help to analyze the impact ofquantization noise on a quantized dynamic consensus algorithm.The idea of state-transition matrices, on the other hand, is not as general as the previous

formalism, and includes only consensus algorithms. Nevertheless, unlike the classical analysis(Sec. 2.1), I was able to find the necessary conditions for convergence of the asynchronousconsensus algorithm (Sec. 2.4). However, the sufficient conditions for convergence remainan open issue. Also, an extension including the dynamic average consensus may lead to anasynchronous version of the algorithm.Furthermore, regarding the state-transition matrices as projections onto some linear spaces

(Sec. 2.5), I showed the bounds on the mixing parameters of the asynchronous consensusalgorithm.Finally, although the derived bounds on the convergence time and rate of the dynamic

average consensus algorithm are novel, they are too loose, and tighter bounds would be morepractical. Moreover, the generalized dynamic consensus algorithm with memory needs furtherinvestigation, especially in terms of finding optimal parameters. Building on the results fromthe theory of adaptive filters might potentially lead to the desired answers. However, for thetime being, this remains an issue for future research.

In the next chapters, I provide more sophisticated algorithms which are built on the simpleconsensus algorithms presented in this chapter.

Page 64: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received
Page 65: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

Chapter 3

Likelihood Consensus

The algorithm presented in this chapter1 takes advantage of a simple consensus algorithm,which is able to distributively compute an average over the values in the network, as presentedin the previous chapter, to compute an approximation of a joint likelihood function. The jointlikelihood function is the key element in the statistical estimation problem which I will considerlater in the chapter. Due to the consensus algorithm, the computation requires communicationsonly with the neighboring nodes and operates without any routing protocol.

This chapter is organized as follows. In Sec. 3.1 I define the system model and brieflyreview Bayesian estimation considered throughout this chapter. In Sec. 3.2 I propose the wayto approximate the joint likelihood function suitable for distributed implementation. Laterin Sec. 3.3 I use this approximation in the Likelihood Consensus (LC) algorithm. The ex-tension of the likelihood consensus algorithm based on the dynamic consensus algorithm –Sequential Likelihood Consensus (SLC) – is provided in the consequent Sec. 3.4. As an appli-cation of the likelihood consensus and the sequential likelihood consensus algorithms a targettracking application is proposed in the Sec. 3.5.

Notation

For the sake of clarity and the number of used indices, I change the notation of time inbraces – “(k)” – to a subscript index – “k”, which is considered to be an “estimation time ofthe sequential Bayesian estimation” and may differ from the consensus “iteration time” (seeSec. 3.3 ahead). The sensor index will be denoted, as throughout this thesis, i. Also lateron, the notation f(zk,i|xk) suggests that xk is a random vector. However, for the likelihoodconsensus method to be presented in Section 3.3, xk is also allowed to be deterministic, inwhich case the notation f(zk,i;xk) would be more appropriate (see e.g. [96]).

1The work, which this chapter is based upon, was done jointly in collaboration with O. Hlinka et al.[23,25–27].

53

Page 66: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

54 CHAPTER 3. LIKELIHOOD CONSENSUS

3.1 System model and sequential Bayesian estimation

At a given discrete time instant k, each sensor estimates a global M -dimensional statexk = (xk,1, xk,2, . . . , xk,M )⊤ ∈ R

M based on all sensor measurements. The state evolvesaccording to the state-transition model:

xk = gk(xk−1,υk−1), (3.1)

where gk(·, ·) is, in general, a nonlinear function, υk is an independent white “driving” noisewhose Probability Density Function (pdf) is known. The models describing the evolution ofthe state include:

• random-accelaration model – which is commonly used in target tracking applications [26,92].State xk evolves according to equation

xk = Gxk−1 +Hυk−1 (3.2)

where υk is the driving noise of the state xk; matrices G and H define the relations betweenthe states.For example, if the state is a moving object in a 2D plane with position (xk, yk) and

velocity (xk, yk), the state is xk = (xk, yk, xk, yk)⊤ and in case υk ∼ N (02, σ

2vI2) the given

matrices may be [92]: G =

1 0 1 0

0 1 0 1

0 0 1 0

0 0 0 1

, H =

0.5 0

0 0.5

1 0

0 1

.

• random-walk model – which models only the time variances in position but not the velocityof the state. In a special case to Eq. (3.2), the state evolves according to

xk = xk−1 + υk−1. (3.3)

The statistical assumptions on the gk(·, ·) and υk (cf. Eq. (3.1)) determine the probabilisticformulation by the state-transition pdf f(xk|xk−1).I further assume that at time k, the i-th sensor acquires an Nk,i-dimensional measurement

zk,i ∈ RNk,i , which is related to the (global) state xk via the measurement model

zk = hk(xk,νk), (3.4)

where hk(·, ·) is again, in general, a non-linear function, and νk is a white measurement noisethat is independent from the past and current states and of the driving noise υk (Eq. (3.1)),and whose pdf is known. Typical measurement models include

• inverse decay model – which models a group of p transmitted signals each with amplitudeAp and which decay from the transmitter with the distance, i.e.,

zk,i = hk,i(xk) + νk,i, with hk,i(xk) =P∑

p=1

Ap∥∥∥

(p)k − ςk,i

∥∥∥

κ , (3.5)

Page 67: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

3.1. SYSTEM MODEL AND SEQUENTIAL BAYESIAN ESTIMATION 55

where (p) is the position of the p-th transmitter, ςk,i position of the i-th receiver (sensor)at time k; κ is the path loss exponent [93]. Here, νk,i ∼ N (0, σ2ν) is independent zero-mean Gaussian noise at node i and time k. Such measurement models lead to the locallikelihood function

f(zk,i|xk) =1

2πσ2νexp

(

− 1

2σ2ν[zk,i − hk,i(xk)]

2

)

, (3.6)

with hk,i(xk) given by Eq. (3.5).

• bearings-only model – which measures only the direction to the target [94], i.e., in a 2Dplane,

h(p)i (xk) = arctan

ςi,1 −

(p)k,1

ςi,2 − (p)k,2

, (3.7)

where xk = ((p)k,1,

(p)k,2) is a 2D position of the p-th target (emitter) at time k;

(ςi,1, ςi,2) is 2D position of the i-th receiver (sensor).

The relationship between zk,i and xk is described by the local likelihood function f(zk,i|xk),which in the case of an inverse decay measurement model with Gaussian noise takes the form ofEq. (3.6). The relationship between the all-sensors measurement vector zk,(z⊤k,1, z

⊤k,2, . . . , z

⊤k,N )⊤

and xk is described by the Joint Likelihood Function (JLF) f(zk|xk). All zk,i are assumed condi-tionally independent given xk, thus, the JLF is the product of all local likelihood functions, i.e.,

f(zk|xk) =N∏

i=1

f(zk,i|xk) . (3.8)

In case of the inverse decay model with Gaussian noise (Eq. (3.6)) the JLF takes the form

f(zk|xk) =N∏

i=1

f(zk,i|xk) =1

(2πσ2ν)N

exp

(

− 1

2σ2ν

N∑

i=1

[zk,i − hk,i(xk)]2

)

. (3.9)

Furthermore, the notation z1:k,(z⊤1 , z⊤2 , . . . , z

⊤k )

⊤ stands for the vector of all measurementsof all sensors up to time k.In the sequel, I will use the following assumptions. First, the current state xk is conditionally

independent of all past measurements, z1:k−1, given the previous state xk−1, i.e.,

f(xk|xk−1, z1:k−1) = f(xk|xk−1) . (3.10)

Second, the current measurement zk is conditionally independent of all past measurements,z1:k−1, given the current state xk, i.e.,

f(zk|xk, z1:k−1) = f(zk|xk) . (3.11)

Finally, sensor i knows the state-transition pdf f(xk|xk−1) and its own local likelihood functionf(zk,i|xk) as well as the prior pdf f(x0) of the initial state x0, but it does not know the locallikelihood functions of the other sensors, i.e., f(zk,i′ |xk) for i′ 6= i.

Page 68: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

56 CHAPTER 3. LIKELIHOOD CONSENSUS

Sequential Bayesian state estimation

I briefly review sequential Bayesian state estimation [95], which will be considered as a motivat-ing application of the likelihood consensus method later on. At time k, each sensor estimatesthe current state xk from the measurements z1:k of all sensors up to time k. For this task,I will use the Minimum Mean Square Error (MMSE) estimator [96],

xMMSEk , Exk|z1:k =

xkf(xk|z1:k) dxk , (3.12)

which is implemented at each sensor. Here, a major problem–even in a centralized scenario–is tocalculate the posterior pdf f(xk|z1:k). Using (3.10) and (3.11), the current posterior f(xk|z1:k)can be obtained sequentially from the previous posterior f(xk−1|z1:k−1) and the JLF f(zk|xk)

by means of the following temporal recursion [95]:

f(xk|z1:k) =f(zk|xk)

f(xk|xk−1)f(xk−1|z1:k−1)dxk−1

f(zk|z1:k−1). (3.13)

It is worth noting that for nonlinear/non-Gaussian cases, the computational complexity of se-quential MMSE state estimation as given by (3.12) and (3.13) is typically prohibitive.A computationally feasible approximation is provided by the Particle Filter (PF) [95, 97, 98]which will be used later in the target-tracking application. In a PF, the (non-Gaussian) pos-terior f(xk|z1:k) is represented by a set of samples (particles) x

(j)k , (j = 1, 2, . . . , J), and

corresponding weights w(j)k (see Appendix D).

As can be seen from (3.12) and (3.13), obtaining the global estimate xMMSEk at eachsensor presupposes that each sensor knows the JLF f(zk|xk) as a function of the state xk

(zk is observed and thus fixed, and f(xk−1|z1:k−1) used in (3.13) was calculated by each sensorat the previous time k−1). In particular, a PF approximation of xMMSEk relies on the point-

wise evaluation of the JLF at the particles x(j)k — i.e., on the evaluation of f(zk|x(j)

k ) — to

obtain the weights w(j)k . Since each sensor knows only its local likelihood function f(zk,i|xk),

a distributed method for calculating the JLF at each sensor is needed. Such method is proposedin Section 3.3 ahead.

It is important to note that, although I consider distributed sequential Bayesian estimationand distributed particle filtering as a motivating application [25–27], the proposed methodcan also be used for other distributed statistical inference tasks that require the pointwiseevaluation of the JLF at the individual sensors.

3.2 Approximation of the joint likelihood function

I consider the generalized approximation of the JLF as it was proposed in [99], which doesnot assume a specific form of the local likelihood functions or of the local measurement func-tions. Likelihood functions of exponential form [27] and measurement functions with additiveGaussian noise [25] are included as special cases.

Page 69: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

3.2. APPROXIMATION OF THE JOINT LIKELIHOOD FUNCTION 57

Assuming that local likelihood function f(zk,i|xk) > 0 for all xk, then by taking the loga-rithm of the JLF Eq. (3.8) yields

log f(zk|xk) =N∑

i=1

log f(zk,i|xk) . (3.14)

It might be tempting to directly use a consensus algorithm to compute the Eq.(3.14). However,a consensus-based distributed calculation of the sum in Eq. (3.14) is impossible because theterms of the sum depend on the unknown state xk. Therefore, the following (finite-dimensional)basis expansion approximations of order R of the local log-likelihood functions is used:

log f(zk,i|xk) ≈R∑

r=1

αk,i,r(zk,i)ϕk,r(xk) , (3.15)

where the αk,i,r(zk,i) are expansion coefficients that contain all sensor-local information (inparticular, the respective sensor measurement zk,i) and the ϕk,r(xk) are fixed, sensor-indepen-dent basis functions that are assumed to be known to all sensors. A simple example of (3.15)would be a multivariate polynomial. Inserting the approximation (3.15) into (3.14) leads to

log f(zk|xk) ≈R∑

r=1

ak,r(zk)ϕk,r(xk) , (3.16)

with

ak,r(zk) ,N∑

i=1

αk,i,r(zk,i) r ∈ 1, 2, . . . , R . (3.17)

Finally, by exponentiating (3.16), locally at each sensor, the following approximation of theJLF, denoted f(zk|xk), is obtained:

f(zk|xk) ≈ f(zk|xk) , exp

(R∑

r=1

ak,r(zk)ϕk,r(xk)

)

. (3.18)

Therefore, a sensor that knows the coefficients ak,r(zk) is able to evaluate the approximateJLF f(zk|xk) for all values of xk. Indeed, the collection of all ak,r(zk), i.e.,tk(zk) , (ak,1(zk) · · · ak,R(zk))⊤, can be viewed as a sufficient statistic [100] within the lim-its of the approximation (3.16). Finally, due to the sum expression (3.17), and the fact thatαk,i,r(zk,i) can be computed locally at each sensor, the sufficient statistic tk(zk) can be com-puted in a distributed way using any consensus algorithm.

These facts result in the Likelihood consensus algorithm for a distributed calculation of theapproximate JLF f(zk|xk) (see Sec. 3.3).

Note that the basis functions ϕk,r(·) include polynomials, Fourier bases, and others.The choice of the basis functions affects not only the accuracy of the approximation, butlater also the computational complexity and communication requirements of the Likelihoodconsensus algorithm.

Page 70: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

58 CHAPTER 3. LIKELIHOOD CONSENSUS

Example—polynomial approximation. A simple example of a basis expansion approxi-mation (3.15) is given by the polynomial approximation

log f(zk,i|xk) ≈Rp∑

r=0

αk,i,r

M∏

m=1

xrmk,m , (3.19)

where r , (r1, r2, . . . rM ) ∈ 0, 1, . . . , RpM ; Rp is the degree of the multivariate vector-valued polynomial;

∑Rp

r=0 is short for∑Rp

r1=0 · · ·∑Rp

rM=0 with the constraint∑M

m=1rm ≤ Rp;and αk,i,r ∈ R

q is the coefficient vector associated with the basis function (monomial)ϕk,r(xk) =

∏Mm=1 x

rmk,m (here, xk,m denotes the mth entry of xk). Eq. (3.19) can be written in

the form of (3.15) by a suitable index mapping

(r1 · · · rM )∈0, 1, . . . , RpM ↔ r ∈ 1, 2, . . . , R, where R =

(Rp +M

Rp

)

=(Rp +M)!

Rp!M !.

(3.20)This polynomial basis expansion will be further considered in Section 3.5.3.

3.2.1 Least squares approximation

A convenient method for calculating the approximations αk,i,rRr=1 in (3.15) is given by Least

Squares (LS) fitting [101–103]. Consider J data pairs(

x(j)k,i , log f(zk,i|x

(j)k,i))J

j=1, where the

state points x(j)k,i are chosen to “cover” those regions of the xk space RM where the JLF is

expected to be evaluated when estimating xk. In particular, in the distributed particle filterapplication to be considered in Sec. 3.5, the x(j)

k,i will be the predicted particles. With LS fitting,the coefficients αk,i,r(zk,i) are calculated such that the sum of the squared approximation errors

at the state points x(j)k,i , i.e.,

∑Jj=1

[

log f(zk,i|x(j)k,i)−

∑Rr=1 αk,i,r(zk,i)ϕk,r(x

(j)k,i)]2, is minimized.

To describe the solution to this minimization problem, the coefficient vectorαk,i(zk,i) , (αk,i,1(zk,i), αk,i,2(zk,i), . . . , αk,i,R(zk,i))

⊤ ∈ RR is defined.

Furthermore, let

Φk,i ,

ϕk,1(x(1)k,i ) · · · ϕk,R(x

(1)k,i )......

ϕk,1(x(J)k,i ) · · · ϕk,R(x

(J)k,i )

∈ R

J×R,

Ak,i ,(

log f(zk,i|x(1)k,i ), log f(zk,i|x

(2)k,i ), . . . , log f(zk,i|x

(J)k,i ))⊤

∈ RJ .

Then the LS solution for the coefficients αk,i,r(zk,i)Rr=1 is given by [101]

αk,i =(Φ⊤

k,iΦk,i

)−1Φ⊤

k,iAk,i .

Here, I assume that J ≥ R and that the columns of Φk,i are linearly independent, so that

Φ⊤k,iΦk,i is nonsingular. Note that J ≥R means that the number of state points x(j)

k,i is notsmaller than the number of basis functions ϕk,r(xk), for any given time k and sensor i. In ourexample application in Sec. 3.5.3.1 ahead, J ≫ R, and thus, the inversion of Φ⊤

k,iΦk,i is alsonumerically stable. Nevertheless, in case of devices with limited memory (such as sensors), thesize of J and R must be constrained, and numerical issues may appear.

Page 71: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

3.3. LIKELIHOOD CONSENSUS 59

3.3 Likelihood consensus

I provide here a method for distributed calculation of the approximate Joint Likelihood Func-tion (JLF) based on the approximation proposed in the previous Section 3.2. The distributedcomputation is conducted using a simple distributed consensus algorithm (see Sec. 2.1). In thissection I also present an extension based on the dynamic consensus algorithm (see Sec. 2.6.3)for sequential updating of the JLF, and thus reducing the communication overhead.

3.3.1 Distributed calculation of the approximate JLF – the LC algorithm

Based on the sum expressions (3.17), the sufficient statistic tk(zk) , (ak,1(zk) · · · ak,R(zk))⊤can be computed at each sensor by means of a distributed, iterative consensus algorithm thatrequires only communications between neighboring sensors. Here, I use a linear consensusalgorithm (see Sec. 2.1) for simplicity; however, other consensus algorithms (e.g., a gossipalgorithm [12, 55]) could be used as well. In what follows, since the consensus iterations maydiffer from the (sequential Bayesian) state estimation iterations k, the superscript (m) denotesthe consensus iteration index. I explain the operations performed by a fixed sensor i; note thatsuch operations are performed by all sensors simultaneously.At time k, to compute ak,r(zk) =

∑Ni=1 αk,i,r (Eq. (3.17)) sensor i first initializes its local

internal “state” as ψ(0)k,i , αk,i,r(zk,i). Since αk,i,r(zk,i) is available at sensor i, no communi-

cation is required at this initialization stage. Then, at the m-th iteration of the consensusalgorithm (m = 1, 2, . . .), the following two steps are performed by sensor i:

• Using the previous local internal state ψ(m−1)k,i and the previous neighbor internal states

ψ(m−1)k,i′ , i′ ∈ Ni (which were received by sensor i at the previous iteration), the localinternal state of sensor i is updated according to

ψ(m)k,i = [W]i,i ψ

(m−1)k,i +

i′∈Ni

[W]i,i′ ψ(m−1)k,i′ ,

or compactly (global point of view)

ψ(m)k = Wψ

(m−1)k ,

where ψ(m)k ,

(

ψ(m)k,1 , ψ

(m)k,2 , . . . , ψ

(m)k,N

)⊤with

ψ(0)k , (αk,1,r(zk,1), αk,2,r(zk,2), . . . , αk,N,r(zk,N ))⊤. The choice of the weights W is dis-cussed in Sec. 2.1.2.

• The new local internal state ψ(m)k,i is broadcast to all neighbors i

′∈Ni.

These two steps are repeated in an iterative manner until a desired degree of convergence isreached.If the communication graph of the sensor network is connected, the internal state ψ(m)

i ofeach sensor i converges to the average, i.e.,

limm→∞

ψ(m)k,i =

1

N

N∑

i′=1

αk,i′,r(zk,i′) =1

Nak,r(zk) (3.21)

Page 72: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

60 CHAPTER 3. LIKELIHOOD CONSENSUS

as m→∞ (see Sec. 2.1.1). Therefore, after convergence is reached, the internal states ψ(m→∞)k,i

of all sensors are equal and hence a consensus on the value of 1N ak,r(zk) is achieved. For

a finite number I of iterations, the states ψ(I)k,i will be (slightly) different for different sensors i

and also from the desired value 1N ak,r(zk). In what follows, I assume that I is sufficiently large

so that Nψ(I)k,i ≈ ak,r(zk) with sufficient accuracy, for all i. Note that in order to calculate the

coefficient ak,r(zk) from ψ(I)k,i , each sensor needs to know N . This information may be provided

to each sensor beforehand, or some distributed algorithm for counting the number of sensorsmay be employed (see Sec. 4.4 ahead).More compactly, the Likelihood Consensus (LC) algorithm can be summarized as follows:

Algorithm 3.1: Likelihood Consensus (LC)

At time k, the following steps are performed by sensor i (analogous steps are performed byall sensors simultaneously).

1. Calculate the coefficients αk,i,r(zk,i)Rr=1 of the approximations (3.15) (any approx-imation method can be used, e.g, the LS approximation from Sec. 3.2.1). The basisfunctions ϕk,r(xk)Rr=1 are assumed to be known to sensor i.

2. Consensus algorithm—ak,r(zk): For each r = 1, 2, . . . , R:

a) Initialize the local state as ψ(0)k,i = αk,i,r(zk,i).

b) For m = 1, 2, . . . , I (here, I is a predetermined iteration count or determined bythe condition that

∣∣ψ

(m)k,i − ψ

(m−1)k,i

∣∣ falls below a given threshold):

• Update the local state according to

ψ(m)k,i = [W]i,iψ

(m−1)k,i +

i′∈Ni

[W]i,i′ψ(m−1)k,i′

.• Broadcast the new state ψ(i)

k,i to all neighbors i′ ∈ Ni.

c) Calculate ak,r(zk) , Nψ(I)k,i .

3. Finally, by substituting ak,r(zk) for ak,r(zk) in (3.18), sensor i obtains the approximateJLF f(zk|xk) for any given value of xk.

Because one consensus algorithm has to be executed for each ak,r(zk), (r = 1, 2, . . . , R)the number of consensus algorithms that are executed in parallel is Nc = R. This is also thenumber of real numbers broadcast by each sensor in each iteration of the LC algorithm. Thenumber of real numbers transmitted in the entire network at time k is then

CLC = NINc = NIR,

where I is the number of consensus iterations performed by each consensus algorithm of theLC algorithm. It is important to note that R does not depend on the dimensions Nk,i of themeasurement vectors zk,i, and thus the communication requirements of the LC do not dependon Nk,i. This is particularly advantageous in the case of high-dimensional measurements.However, R usually grows with the dimension M of the state vector xk (cf. Eq. (3.20)).

Page 73: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

3.3. LIKELIHOOD CONSENSUS 61

In particular, if the M -dimensional basis ϕk,r(xk)Rr=1 is constructed as the M -fold tensor

product of a 1D basis ϕk,r(x)Rr=1, then R = RM .

Note in the case of a Gaussian measurement noise and the polynomial approximation, it canbe shown (see [25]) that the number of consensus algorithms executed in parallel (cf Eq. (3.20))becomes

Nc =

(

2Rp +M

2Rp

)

− 1, (3.22)

where Rp is the degree of the (multivariate) polynomial.

Note:

A special case exists, when the local likelihood functions f(zk,i|xk) belong to the class of expo-nential families, as the likelihood consensus is then performed only on the exponent (sufficientstatistic), i.e., log f(zk,i|xk) = log eg(zk,i|xk) = g (zk,i|xk) (see Eq. (3.9)). More details andconcrete approximations for such case can be found in [26,27,104].

3.3.2 Distributed calculation of the exact JLF

The basis expansion approximations (3.15) can be avoided if the JLF f(zk|xk) has a specialstructure. In that case, the exact JLF can be computed in a distributed way, up to errors thatare only due to the limited number of consensus iterations performed.

Let tk(zk) =(tk,1(zk) · · · tk,P (zk)

)⊤be a sufficient statistic for the estimation problem cor-

responding to f(zk|xk). According to the Neyman-Fisher factorization theorem [96], f(zk|xk)

can then be written as

f(zk|xk) = f1(zk)f2(tk(zk),xk

). (3.23)

Typically, the factor f1(zk) can be disregarded since it does not depend on xk and is henceirrelevant to statistical inference. Thus, tk(zk) encompasses the total measurement zk, in sensethat a sensor which knows tk(zk) and f2(·, ·) is able to evaluate the JLF f(zk|xk) (up to anirrelevant factor) for any value of xk. Suppose further that the components of tk(zk) have theform

tk,p(zk) =N∑

i=1

ηk,i,p(zk,i) , p = 1, 2, . . . , P , (3.24)

with arbitrary functions ηk,i,p(·) and that sensor i knows its own functions ηk,i,p(·) but notηk,i′,p(·), i′ 6= i. Based on the sum expression (3.24), consensus algorithms can be then used,as described in Sec. 3.3.1, with obvious modifications, to calculate tk(zk) and, thus, the JLFf(zk|xk) in a distributed manner.

An example where an exact calculation of the JLF is possible is the case where f(zk|xk)

factorizes according to (3.8) and the local log-likelihood log f(zk,i|xk) at each sensor i can be ex-actly represented using expansions of the form (3.15), i.e., log f(zk,i|xk) =

∑Rr=1 αk,i,rϕk,r(xk),

or equivalently, f(zk,i|xk) = exp(∑R

r=1 αk,i,r(zk,i)ϕk,r(xk))

. Thus, no (least-square) approx-

imation is necessary. This is a special case of (3.23) and (3.24), having the following struc-

Page 74: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

62 CHAPTER 3. LIKELIHOOD CONSENSUS

ture (cf. (3.18)):

f2(tk(zk),xk) = exp

P∑

p=1

tk,p(zk)ϕk,p(xk)

,

with P = R and the elements of tk(zk) have the form (cf. (3.17))

tk,p(zk) ≡ ak,p(zk) =N∑

i=1

αk,i,p(zk,i), p ∈ 1, 2, . . . , R.

Thus, the elements tk,p(zk) are of form (3.24), with ηk,i,p(zk,i) = αk,i,p(zk,i).

3.4 Sequential likelihood consensus

As in previous sections, I consider decentralized sequential state estimation in a WSN, usingonly local processing and low-rate, local inter-sensor communications. To obtain a globalestimate at each sensor, the sensors must know the joint (all-sensors) likelihood function (JLF).In the previous sections I proposed the Likelihood Consensus (LC) method for a distributedapproximate computation of the JLF.In this section, I provide an extension of the LC method for computing an approximation of

the JLF sequentially. I refer to this algorithm as Sequential Likelihood Consensus (SLC) [23].The SLC is based on the modified dynamic consensus algorithm (see Sec. 2.6.3), of which onlya single iteration is performed per time step. By merging the recursion steps of the SLC (“m”)with those of the sequential estimation procedure (“k”), it is possible to significantly reducethe number of transmitted messages and thus decrease the communication load as well as boththe inter-sensor communications and the latency.As shown, due to the sum expression (3.17), the sufficient statistic ti(zi) can be computed

in a distributed way using a consensus algorithm (see Sec. 2.1). The resulting LC scheme fora distributed calculation of the approximate JLF f(zi|xi) requires each sensor to communicateonly with its neighbors within a predetermined radius, and it avoids the use of complex routingprotocols. However, the consensus algorithm must be sufficiently converged, which means thata sufficient number of consensus iterations must be executed for each posterior update (3.13)(i.e., for each k). Since each consensus iteration contributes to the latency and the amount ofinter-sensor communications of the sequential state estimation scheme, this may be a limitingfactor in practice, especially for real-time applications such as target tracking. Therefore,I next incorporate the dynamic consensus algorithm that allows an approximate sequentialcalculation of the sum in (3.17) using only a single consensus iteration for each k.

3.4.1 The SLC algorithm

The sequential likelihood consensus is an evolution of the LC in which the sufficient statistictk(zk) is updated sequentially, in a single iteration per time k, based on its previous value andthe current measurements. The sequential likelihood consensus uses the dynamic consensusalgorithm of Sec. 2.6.3 with a single consensus iteration I = 1 to perform an approximatedistributed calculation of the sum in (3.17). The sequential likelihood consensus algorithm canbe summarized as follows.

Page 75: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

3.4. SEQUENTIAL LIKELIHOOD CONSENSUS 63

Algorithm 3.2: Sequential Likelihood Consensus (SLC)

At each time step k ≥ 1, each sensor i performs the following tasks:

1. The coefficients αk,i,r(zk,i)Rr=1 of the approximation (3.15) are calculated, e.g., by usingleast squares fitting as discussed in Sec. 3.2.1.

2. Dynamic consensus algorithm (see Sec. 2.6.3)–The following steps are executed simulta-neously for each r∈1, 2, . . . , R:

a) If k =1, the internal state of sensor i is initialized as ζi,0,r = αi,1,r(zi,1).

b) A temporary internal state is computed according to Eq. (2.81)

ψk,i,r = µ ζk−1,i,r + (1−µ)αk,i,r(zk,i), (3.25)

where µ ∈ [0, 1] is a tuning parameter (see Sec. 2.6.3).

c) The temporary internal state ψk,i,r is broadcast to all neighbors i′∈ Ni.

d) The internal state ζk,i,r is calculated from the local and neighboring temporaryinternal states, ψk,i,r and ψk,i′,ri′∈Ni

, by a single consensus step (cf. (2.82)):

ζk,i,r = [W]i,iψk,i,r +∑

i′∈Ni

[W]i,i′ψk,i′,r . (3.26)

The weights [W]i,i′ depend on the network topology (see Sec. 2.1.2)

e) The internal state ζk,i,r is scaled as ak,i,r(zk) , Nζk,i,r.

3. By substituting ak,i,r(zk) for ak,r(zk) and evaluating (3.18), sensor i is able to computethe approximate JLF f(zk|xk) for any value of xk. (Note that the ak,i,r(zk) for different imay differ slightly.)

From a global point of view, let ζk,r , (ζk,1,r, ζk,2,r, . . . , ζk,N,r)⊤ collect the internal states

of all sensors. Combining (3.25) and (3.26), the following internal-state update relation for theentire network (globally) is obtained (cf. Eq. (2.83)):

ζk,r = W [µ ζk−1,r + (1−µ)αk,r(zk)] , (3.27)

with the coefficient vector αk,r(zk) ,(αk,1,r(zk,1), . . . , αk,N,r(zk,N )

)⊤.

The SLC update (3.27) is a simplified form of the dynamic consensus algorithm (2.83),with internal-state vector x(k) ≡ ζk,r and “tracked” input vector s(k) ≡ αk,r(zk), filter lengthL=1, filter coefficient ω0=1, and a single consensus iteration, I=1 (cf Eq. (2.83)). Assumingαk,r(zk) is not changing rapidly in time, the dynamic consensus is able to track

1Nαk,r(zk). As

I will show in Section 3.5.3.1, this simplified algorithm still leads to satisfactory performanceif the time variation of αk,r(zk) is not too large.

The SLC scheme requires the transmission of R real numbers by each sensor at each time k.The (nonsequential) LC scheme (Sec. 3.3) requires the transmission of RI real numbers by eachsensor at time k, where I is the number of consensus iterations performed within time step k.Therefore, using the SLC instead of the LC can lead to significant savings in inter-sensor

Page 76: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

64 CHAPTER 3. LIKELIHOOD CONSENSUS

communications, in addition to the reduced latency. The SLC and the LC share the advantagethat the number of transmissions does not depend on the measurement dimensions Nk,i; thisfeature is particularly attractive in the case of high-dimensional measurements.

3.5 Distributed target tracking using likelihood consensus

In this section I provide an application of the likelihood consensus (LC) as well as the sequentiallikelihood method (SLC) proposed in the previous sections. I consider a distributed targettracking application when multiple targets move in an area covered by a sensor network. Eachsensor measures a received signal (amplitude), exchange the information with the neighboringsensors, and estimate the position of the emitters (targets). As throughout this thesis, I considersensors to be low-power with low-transmission range, using no routing protocols whatsoeverto disseminate the data.

For this goal I will use the sequential Bayesian estimation (see Sec. 3.1) where the targetwill be estimated using the MMSE estimator (Eq. (3.12)). Also as I mentioned in Sec. 3.1,since it is computationally infeasible to evaluate Eq. (3.12), for the approximation of the jointlikelihood function JLF, I will use the so-called Particle Filter (PF) (see Appendix D), i.e.,

xk ,

xkfδ(xk|z1:k)dxk =

xk

J∑

j=1

w(j)k δ

(

xk − x(j)k

)

dxk =J∑

j=1

w(j)k x

(j)k , (3.28)

where x(j)k Jj=1 is the set of the “particles” (samples) and w

(j)k Jj=1 are the weights associated

with the particles. Each time a new measurement zk is obtained, new particles and weightsare calculated by the PF algorithm.

3.5.1 LC-based distributed particle filter

As shown in the previous Sections 3.3 and 3.4, the LC (SLC) scheme is able to distributivelycompute an approximated joint likelihood funtion necessary for the MMSE estimate. Here Ishow how to apply this approach to a joint likelihood function represented by the particle filterapproximation which is practical and computationally feasible.

Page 77: theses.eurasip.org · 2014. 5. 23. · Acknowledgements First and foremost, I would like to thank my advisor, Prof. Markus Rupp, for the guidance, support and help I have received

3.5. DISTRIBUTED TARGET TRACKING USING LIKELIHOOD CONSENSUS 65

Algorithm 3.3: LC–based Distributed Particle Filter (LC–DPF)

The local PF at each sensor i is initialized at time k = 0 by J particles x(j)0,iJj=1 drawn from

the prior pdf f(x0). The weights are initially equal, i.e., w(j)0,i ≡ 1/J for all j.

At time k > 0, the local PF at sensor i performs the following steps, which are identicalfor all i.

1. From each x(j)k−1,i, a new, predicted particle x

(j)k,i is randomly sampled from

f(xk|x(j)k−1,i) ≡ f(xk|xk−1)

∣∣xk−1= x

(j)k−1,i

.

2. An approximation f(zk|xk) of the JLF f(zk|xk) is computed by means of the LikelihoodConsensus (LC) as described in Sec. 3.3.1. This step requires communications with neigh-boring sensors. The local approximation at sensor i can be calculated by means of LSfitting as described in Sec. 3.2.1, using the predicted particles

x(j)k,i

J

j=1.

3. The weights associated with the predicted particles x(j)k,i obtained in Step 1 are calculated

according to

w(j)k,i =

f(zk|x(j)k,i)

J∑

j′=1

f(zk|x(j′)k,i )

, j = 1, 2, . . . , J . (3.29)

This involves the approximate JLF f(zk|xk) calculated in Step 2, which is evaluated atall predicted particles x(j)

k,i.

Note that the set (x(j)k,i , w

(j)k,i)Jj=1 provides a particle representation of the current global

posterior f(xk|z1:k).

4. From the particle representation(

x(j)k,i, w

(j)k,i

)J

j=1, an approximation of the global MMSE

state estimate (3.12) is computed according to (3.28), i.e.,

xk,i =

J∑

j=1

w(j)k,ix

(j)k,i .

5. The particle representation(

x(j)k,i, w

(j)k,i

)J

j=1is resampled, which produces J new resam-

pled particles x(j)k,i with identical weights. The particles x

(j)k,i are obtained by sampling

with replacement from the setx(j′)k,i

J

j′=1, where x(j′)

k,i is sampled with probability w(j′)k,i .

Through the above recursion, each sensor obtains a global quasi-MMSE state estimatethat involves the past and current measurements of all sensors. Because of the use of LC,this is achieved without communicating between distant sensors or employing complex routingprotocols. Also, no particles, local state estimates, or measurements need to be communicatedbetween sensors. The local PF algorithms running at different sensors are identical. Therefore,any differences between the state estimates xk,i at different sensors i are only due to the randomsampling of the particles (using nonsynchronized local random generators) and errors causedby insufficiently converged consensus algorithms.


In case the particle filter is replaced with a so-called Gaussian Particle Filter (GPF) [26], the complexity of the algorithm reduces further in that no resampling (Step 5) is required. In the simulations in Sec. 3.5.3 I denote such a modified version of the LC-based distributed particle filter as LC–based Distributed Gaussian Particle Filter (LC–DGPF). A centralized (non-distributed) GPF is referred to as Centralized Gaussian Particle Filter (CGPF).

Similarly to the LC-based distributed particle filter, by replacing Step 2 in the Algorithm with the Sequential Likelihood Consensus (see Sec. 3.4), one can describe an SLC–based Distributed Particle Filter (SLC–DPF).

3.5.2 Communication requirements

I now discuss the communication requirements of the LC-based distributed PF (LC–DPF). For comparison, I also consider the Centralized Particle Filter (CPF), in which all sensor measurements are transmitted to a fusion center, and a Straightforward Distributed Particle Filter (S–DPF) implementation in which the measurements of each sensor are transmitted to all other sensors. Note that with the S–DPF, each sensor performs exactly the same PF operations as does the fusion center in the CPF scheme.

For the CPF, communicating all sensor measurements at time k to the fusion center requires the transmission of a total of \sum_{i=1}^{N} H_i N_{k,i} real numbers within the sensor network [105]. Here, H_i denotes the number of communication hops from sensor i to the fusion center, and N_{k,i} is the dimension of the measurement z_{k,i}. Additional information needs to be transmitted to the fusion center if the fusion center does not possess prior knowledge of the JLF. Finally, if the state estimate calculated at the fusion center is required to be available at the sensors, additional M H' real numbers need to be transmitted at each time k. Here, H' denotes the number of communication hops needed to disseminate the state estimate throughout the network. Thus, the total number of real numbers transmitted within the entire network at time k is

C_{CPF} = \sum_{i=1}^{N} H_i N_{k,i} + M H'.

A problem of the CPF using multihop transmission is that all data pass through a small subset of sensors surrounding the fusion center, which can lead to fast depletion of the batteries of these sensors.

With the S–DPF, disseminating the measurements of all sensors at time k to all other sensors requires the transmission of

C_{S–DPF} = \sum_{i=1}^{N} H''_i N_{k,i}

real numbers [105], where H''_i is the number of communication hops required to disseminate the measurement of sensor i throughout the network. Again, additional information needs to be transmitted if the JLF is not known to all sensors.

Finally, the proposed LC–DPF requires the transmission of

C_{LC–DPF} = N I N_c


real numbers at each time k, where I is the number of consensus iterations performed by each consensus algorithm and N_c is the number of consensus algorithms executed in parallel (see Section 3.3.1). In contrast to the CPF and S–DPF, this number of transmissions does not depend on the measurement dimensions N_{k,i}; this makes the LC–DPF particularly attractive in the case of high-dimensional measurements. Another advantage of the LC–DPF is that no additional communications are needed (e.g., to transmit local likelihood functions between sensors). Furthermore, the LC–DPF does not require multihop transmissions or routing protocols since each sensor simply broadcasts information to its neighbors. This makes the LC–DPF particularly suited to wireless sensor networks with dynamic network topologies (e.g., moving sensors or a time-varying number of active sensors): in contrast to the CPF and S–DPF, there is no need to rebuild routing tables each time the network topology changes.

On the other hand, the computational complexity of the LC–DPF is higher than that of the S–DPF because the approximation described in Section 3.3.1 needs to be computed at each sensor. Overall, the LC–DPF performs more local computations than the S–DPF in order to reduce communications; this is especially true for high-dimensional measurements and/or high-dimensional parametrizations of the local likelihood functions. Since the energy consumption of local computations is typically much smaller than that of communication, the total energy consumption is reduced and thus the operation lifetime is extended. This advantage of the LC–DPF comes at the cost of a certain performance loss (compared to the CPF or S–DPF) due to the approximate JLF used by the local PFs. This will be analyzed experimentally in Section 3.5.3.

It is worth noting that in [26] a modification of the LC–DPF was proposed which requires a much smaller number of particles at each node – the Reduced-complexity Likelihood-consensus Gaussian Particle Filter (R–LC–DGPF) – and which I consider in the simulations in Sec. 3.5.3. Such an approach reduces the computational complexity at each node dramatically, and requires only a slightly increased communication cost:

C_{R–LC–DGPF} = N I \left( N_c + M + \frac{M(M+1)}{2} + 1 \right),

where M is again the dimension of the estimated state x_k.

For the sake of completeness, I recall that the Sequential Likelihood Consensus (see Sec. 3.4.1) requires the exchange of only

C_{SLC} = N N_c

real numbers within the entire network at each time k; it thus leads to significant savings in the inter-sensor communications.
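As a quick numerical illustration of the expressions above, the following sketch evaluates the per-time-step transmission counts for the different schemes. The hop counts and measurement dimensions passed in are illustrative inputs chosen by the caller, not values taken from the simulations below.

```python
import numpy as np

def comm_costs(H, N_meas, M, H_prime, H_flood, I, Nc):
    """Real numbers transmitted per time step k (cf. the formulas above).

    H       : hops H_i from each sensor to the fusion center (length-N array)
    N_meas  : measurement dimensions N_{k,i}                 (length-N array)
    M       : dimension of the state x_k
    H_prime : hops H' to disseminate the estimate back to all sensors
    H_flood : hops H''_i to flood sensor i's measurement through the network
    I, Nc   : consensus iterations and number of parallel consensus algorithms
    """
    N = len(H)
    C_cpf      = np.dot(H, N_meas) + M * H_prime          # centralized PF
    C_sdpf     = np.dot(H_flood, N_meas)                  # straightforward distributed PF
    C_lcdpf    = N * I * Nc                                # LC-DPF
    C_rlcdgpf  = N * I * (Nc + M + M * (M + 1) / 2 + 1)   # R-LC-DGPF
    C_slc      = N * Nc                                    # SLC
    return C_cpf, C_sdpf, C_lcdpf, C_rlcdgpf, C_slc
```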

3.5.3 Simulations

In this section I assess the performance of the distributed target tracking algorithm based on the likelihood consensus (LC–DPF) as well as on the sequential likelihood consensus (SLC–DPF) and compare it with other target tracking algorithms.


3.5.3.1 LC-DPF-based target tracking

I assume P = 2 targets moving in a 2D plane of size 40 m × 40 m, each with dynamics as described by Eq. (3.2) with

G_p = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad H_p = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad (p \in \{1, 2\}).

Each of the targets emits a signal with amplitude A_p = 10, and the inverse decay measurement model (Eq. (3.5)), with κ = 1 and measurement noise variance σ²_ν = 0.05, is assumed. The initial prior pdf f(x_0^{(p)}) = N(µ_0^{(p)}, C_0) is different for the two targets, with µ_0^{(1)} = (36, 36, −0.05, −0.05)^⊤ and µ_0^{(2)} = (4, 4, 0.05, 0.05)^⊤ (see in Fig. 3.1 the targets starting in the upper right corner and lower left corner); C_0 = diag(1, 1, 0.001, 0.001). The network consists of N = 25 acoustic sensors that are deployed on a jittered grid. The communication radius of each sensor is 18 m and the Metropolis weights (Eq. (2.11)) are used for the consensus algorithm.

[Figure: true and estimated trajectories of the two targets in the 40 m × 40 m plane (axes x [m], y [m]).]

Figure 3.1: Example of the tracked targets using the LC–DPF.

Due to the considered additive Gaussian noise, the measurement function h_{k,i}(x_k) (Eq. (3.5)) is approximated by a polynomial of degree R_p = 2 (Eq. (3.22)). This results in the following approximation of the exponent S_k(z_k, x_k) of the JLF (Eq. (3.19)):

\log \tilde{f}(z_k | x_k) \approx \tilde{S}_k(z_k, x_k) = \sum_{i=1}^{25} \sum_{\substack{r = (r_1, r_2, r_3, r_4) \\ r_1 + r_2 + r_3 + r_4 \le 4}} \alpha_{k,i,r}(z_{k,i}) \big(x_k^{(1)}\big)^{r_1} \big(y_k^{(1)}\big)^{r_2} \big(x_k^{(2)}\big)^{r_3} \big(y_k^{(2)}\big)^{r_4}.

To obtain the coefficients α_{k,i,r} I use LS fitting (see Sec. 3.2.1). Notice that M = 4, since I estimate only the positions of the targets and not the velocities, i.e., x_k ≡ (x_k^{(1)}, y_k^{(1)}, x_k^{(2)}, y_k^{(2)}).

This leads to N_c = \binom{4+4}{4} − 1 = 69 consensus algorithms that are executed in parallel, each using I = 8 iterations, unless stated otherwise.
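The count of parallel consensus instances is simply the number of non-constant monomials in the four estimated variables up to the given total degree; a one-line check of this binomial count (a sketch, not part of the original simulations):

```python
from math import comb

M, deg = 4, 4                 # four estimated position variables, total polynomial degree 4
Nc = comb(M + deg, M) - 1     # number of non-constant monomials of degree <= deg
print(Nc)                     # -> 69 parallel consensus algorithms
```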


[Figure: RMSE_k [m] (panel (a)) and track loss adjusted RMSE_k [m] (panel (b)) versus time k ∈ [0, 200] for GSHL-DPF [106], OC-DPF [107], FRG-DPF [108], LC–DGPF, and CGPF.]

Figure 3.2: (a) RMSE_k and (b) track loss adjusted RMSE_k versus time k for the proposed LC–DGPF, for the CGPF, and for state-of-the-art distributed PFs (GSHL-DPF [106], OC-DPF [107], and FRG-DPF [108]). All distributed PFs use eight consensus iterations.

I apply the following performance measures:

• RMSE_k – a time-dependent root-mean-square error of the targets' position estimates \hat{x}_{k,i}. This is computed as the square root of the average of ‖\hat{x}_{k,i}^{(p)} − x_k^{(p)}‖² over the two targets p ∈ {1, 2}, all sensors i = 1, 2, ..., 25, and 5 000 independent simulation runs.

• Average Root Mean Squared Error (ARMSE) – computed as the square root of the average of RMSE_k² over all 200 simulated time instants k.

• σARMSE – denotes the standard deviation of an i-dependent error defined as the square root of the average of ‖\hat{x}_{k,i}^{(p)} − x_k^{(p)}‖² over the two targets (p ∈ {1, 2}), all 200 time instants k, and 5 000 simulation runs. This measures the error variation across the sensors.
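A minimal sketch of how these measures can be evaluated, assuming the squared estimation errors have already been collected in an array indexed by run, time, sensor, and target (the array layout is an assumption for illustration only):

```python
import numpy as np

# err[r, k, i, p] = || x_hat^(p)_{k,i} - x^(p)_k ||  for run r, time k, sensor i, target p
def rmse_curves(err):
    rmse_k = np.sqrt(np.mean(err**2, axis=(0, 2, 3)))   # RMSE_k: average over runs, sensors, targets
    armse  = np.sqrt(np.mean(rmse_k**2))                # ARMSE: average of RMSE_k^2 over time
    return rmse_k, armse
```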

The “track loss percentage” is defined as the percentage of simulation runs during which the estimation error at time k = 200 exceeded 5 m, which is half the average inter-sensor distance. Such simulation runs were excluded in the calculation of the “track loss adjusted” RMSE_k, ARMSE, and σARMSE.

In Fig. 3.2, I compare the RMSE_k and the track loss adjusted RMSE_k of the proposed LC–DGPF with those of the CGPF and the state-of-the-art distributed PFs (GSHL-DPF [106], OC-DPF [107], FRG-DPF [108]). In terms of track loss adjusted RMSE_k (Fig. 3.2(b)), LC–DGPF outperforms GSHL-DPF and OC-DPF, and it performs almost as well as FRG-DPF and CGPF. The increase in RMSE_k over time in Fig. 3.2(a) is caused by the lost tracks.

Table 3.1 summarizes the estimation performance (ARMSE, track loss adjusted ARMSE, σARMSE, track loss adjusted σARMSE, and track loss percentage) and the communication requirements of the proposed consensus-based distributed PFs (LC–DPF, LC–DGPF, and R–LC–DGPF), of the state-of-the-art consensus-based distributed PFs (GSHL–DPF [106], OC–DPF [107], and FRG–DPF [108]), and of the centralized methods (CPF and CGPF). The “communication requirements” are defined as the total number of real numbers transmitted (over one hop between neighboring sensors) at one time instant within the entire network. For the centralized methods (CPF and CGPF), I used multi-hop routing of


                  ARMSE    Track loss      σARMSE   Track loss       Track loss    Communication
                   [m]     adjusted         [m]     adjusted         percentage    requirements
                           ARMSE [m]                σARMSE [m]          [%]

LC–DPF            0.6225    0.5424         0.0860    0.0222            0.95          13 800
LC–DGPF           0.6187    0.5387         0.0889    0.0205            0.7           13 800
R–LC–DGPF         0.5531    0.5204         0.0005    0.0005            0.46          22 800
GSHL-DPF [106]    1.3022    1.2841         0.0032    0.0032            0.74           8 800
OC-DPF [107]      0.9992    0.8399         0.0022    0.0024            1.1            8 800
FRG-DPF [108]     0.5553    0.5335         0         0                 0.2          400 000
CPF               0.4975    0.4975         –         –                 0                770
CGPF              0.5156    0.5086         –         –                 0.18             770

Table 3.1: Estimation performance and communication requirements of the proposed consensus-based distributed PFs (LC–DPF, LC–DGPF, and R–LC–DGPF), of state-of-the-art consensus-based distributed PFs (GSHL–DPF, OC–DPF, and FRG–DPF), and of centralized PFs (CPF and CGPF).

measurements and sensor locations from every sensor to the fusion center (located in one of the corners of the network). Furthermore, the estimates calculated at the fusion center are disseminated throughout the network, such that every sensor obtains the centralized estimate.

Table 3.1 also shows that the track loss adjusted ARMSEs of the proposed distributed PFs are quite similar and that they are close to those of the centralized methods; they are slightly higher than that of FRG-DPF, slightly lower than that of OC-DPF, and about half that of GSHL-DPF. For FRG-DPF, σARMSE is zero, since max and min consensus algorithms² are employed to ensure identical results at each sensor. Furthermore, σARMSE is higher for LC–DPF and LC–DGPF than for R–LC–DGPF, GSHL–DPF, and OC–DPF. This is because R–LC–DGPF, GSHL–DPF, and OC–DPF employ a consensus step whereby Gaussian approximations of the partial/local posterior pdfs are combined to obtain a global posterior, thus achieving a tighter coupling between the sensors. By contrast, the local PFs of LC–DPF and LC–DGPF operate completely independently; only the JLF is computed in a distributed way using the LC scheme. Note, however, that the ARMSE and track loss adjusted ARMSE of LC–DPF and LC–DGPF are lower than for GSHL–DPF and OC–DPF. Finally, the track loss percentages of the proposed distributed PFs are below 1% and similar to those of GSHL–DPF, OC–DPF, and FRG–DPF. As a consequence, the ARMSEs are generally very close to the track loss adjusted ARMSEs.

It is observed that the communication requirements of the distributed PFs are much higher than those of the centralized methods. This is due to the used low-dimensional (scalar) measurements and the fact that each local likelihood function is parametrized only by the sensor location, i.e., three real numbers must be transmitted in one hop. For high-dimensional measurements and/or a different parametrization of the local likelihood functions, resulting in about 190 or more real numbers to be transmitted in one hop, the opposite will be true.

²Max and min consensus algorithms are algorithms which distributively find the maximum and minimum value in a network.


Note that even when the consensus-based methods require more communications, they may be preferable over centralized methods because they are more robust (no possibility of fusion center failure), they require no routing protocols, and each sensor obtains an approximation of the global posterior (in the centralized schemes, each sensor obtains from the fusion center only the state estimate). It is furthermore evident that the communication requirements of the proposed distributed PFs are higher than those of GSHL-DPF and OC-DPF but much lower than those of FRG–DPF. Note, however, that the communication requirements of FRG–DPF depend on the number of particles and thus could be reduced by using fewer particles, whereas those of the other methods do not depend on the number of particles. Finally, among the proposed distributed PFs, the communication requirements of R–LC–DGPF are higher by about 65% than those of LC–DPF and LC–DGPF.

For a deeper analysis of the target tracking application based on the LC-based Distributed Particle Filter (DPF) I refer the reader to [16, 104].

3.5.3.2 SLC-DPF-based target tracking

For the SLC–DPF-based target tracking I slightly change the simulation setting. I simulate one moving target evolving according to the same dynamics as in Sec. 3.5.3.1. The target emits a sound of constant amplitude A = 1, which is sensed by N = 25 acoustic amplitude sensors randomly deployed on a jittered grid within a rectangular region of (normalized) size 1 × 1. Each sensor communicates with other sensors within a radius of 0.45. The (scalar) measurement z_{k,i} of sensor i is again given by the inverse decay model (Eq. (3.5)) with noise variance σ²_ν = 0.5.

For the SLC, \log f(z_{k,i} | x_k) is approximated by a (multivariate) polynomial of degree R_p = 4, which results in a basis expansion (3.15) of dimension R = \binom{4+2}{4} − 1 = 14. Therefore, 14 real numbers are broadcast by each sensor to its neighbors at each time k. The polynomial approximation is again calculated by means of least squares fitting (see Sec. 3.2.1). The consensus algorithm uses Metropolis weights (Eq. (2.11)). Each local PF employs J = 5 000 particles.

Analogously to Sec. 3.5.3.1, the estimation performance is measured by the root-mean-square error (RMSE_k) of the state estimates \hat{x}_{k,i}. At each time k, RMSE_k is computed as the square root of the averaged squared estimation error, where the averaging is over all sensors and 200 simulation runs. I also compute the ARMSE by averaging RMSE_k² over all 1 500 simulated time instants k and taking the square root.

Fig. 3.3a depicts the dependence of the ARMSE of the proposed SLC-based distributed PF (abbreviated SLC–DPF) on the tuning parameter µ (see Sec. 3.4). It can be seen that the lowest ARMSE is achieved for µ = 0.7. Therefore, unless stated otherwise, this value is used throughout the simulations.

Fig. 3.3b shows the temporal evolution of RMSE_k. The proposed SLC–DPF is compared with the LC–DPF (see Sec. 3.5.1), which uses one or eight consensus iterations. SLC–DPF performs significantly better than LC–DPF with one consensus iteration; note that the communication requirements are equal in this case. Furthermore, SLC–DPF performs slightly worse than LC–DPF with eight iterations; however, here the communication requirements and latency of SLC–DPF are lower by a factor of eight (see Sec. 3.5.2).


[Figure with four panels: (a) ARMSE of SLC–DPF versus the tuning parameter µ; (b) RMSE_k of SLC–DPF and LC–DPF (one and eight consensus iterations) versus time k; (c) RMSE_k of SLC–DPF, the state-of-the-art distributed PFs MA–DPF [109] and FRG–DPF [108], and CPF versus time k; (d) ARMSE of M–SLC–DPF (µ = 0.2 and µ = 0.7) and LC–DPF versus the number of consensus iterations I.]

Figure 3.3: Performance of the SLC-based DPF.

In Fig. 3.3c, I compare SLC–DPF with the state-of-the-art distributed PFs of [109] and [108] (referred to as MA–DPF and FRG–DPF, respectively) and with a centralized PF (CPF) that processes all measurements at a fusion center. All PFs use J = 5 000 particles. Both MA–DPF and FRG–DPF use eight consensus iterations.

The performance of SLC–DPF is not dramatically poorer than that of CPF, and roughly similar to that of MA–DPF and FRG–DPF. However, the communication requirements of SLC–DPF are significantly lower than those of MA–DPF and FRG–DPF: the total counts of real numbers transmitted per time step by SLC–DPF, MA–DPF, and FRG–DPF in the entire network (all sensors) are 350, 10³, and 10⁶, respectively.

Finally, I consider a Modified SLC–based Distributed Particle Filter (M–SLC–DPF) that employs multiple consensus iterations per time step k, that is, I ≥ 1, similar to LC–DPF and also to the general dynamic consensus in (2.83).

Note that LC–DPF is a special case of M–SLC–DPF with µ = 0.

In Fig. 3.3d, I compare the ARMSE of M–SLC–DPF (with µ = 0.7 and µ = 0.2) and LC–DPF, plotted as a function of the number I of consensus iterations. For both methods,


each sensor broadcasts RI real numbers to its neighbors. It is apparent that for I = 1 and I = 2, M–SLC–DPF with µ = 0.7 performs best; for I = 3 and I = 4, M–SLC–DPF with µ = 0.2 performs best; and for I ≥ 5, LC–DPF (i.e., µ = 0) performs best. These results suggest that the optimal value of µ decreases to 0 as I increases.

3.6 Conclusion

The likelihood consensus and the sequential likelihood consensus represent two algorithms built on the simple average and the dynamic average consensus algorithms, respectively. As I showed, these can be efficiently used even in more complicated applications, e.g., distributed target tracking. This chapter also showed the importance of the right selection of the consensus algorithm. The dynamic consensus algorithm used in the SLC, in comparison to the “static” consensus algorithm used in the LC, brings many advantages, but also disadvantages.

In the following table I summarize the properties of both likelihood consensus algorithms:

LC
  Advantages:
  • Based on the average consensus algorithm → no routing protocol, fully distributed.
  • Only a small number of coefficients, defining the approximated likelihood function, is transmitted.
  • Applicable to any class of likelihood functions.
  • Advantageous for high-dimensional state estimation.
  Disadvantages:
  • Synchronicity must be ensured.
  • Necessity to wait for consensus to converge before proceeding with the next estimation step.

SLC
  Advantages:
  • Based on the dynamic average consensus algorithm → no routing protocol, fully distributed.
  • No need to wait for consensus before proceeding with the next estimation step.
  • Lower communication requirements than in the LC.
  Disadvantages:
  • Synchronicity must be ensured.
  • Higher estimation error than in the LC.
  • If the estimated state changes rapidly, the estimation is not possible.

The convergence analysis of the average consensus algorithms from the previous Chapter 2 also holds here. Nevertheless, studying the convergence of the distributed sequential Bayesian estimation based on the consensus algorithm is a non-trivial problem and remains unsolved here. Especially in the case of the sequential likelihood consensus, estimating the bounds on the input signal for the dynamic consensus, i.e., the coefficients of the basis used in the approximation of the joint likelihood function, would certainly be of further interest. Consequently,


the bounds on the convergence of the dynamic consensus algorithm derived in Sec. 2.6 may possibly support the derivation of the convergence bounds for the sequential state estimation.

Note also that the algorithms presented here require synchronously working nodes with synchronous observations and transmissions. This can be an unavoidable obstacle in real applications. Therefore, an asynchronous algorithm might be more practical and desirable.


Chapter 4

Distributed Gram-Schmidt Orthogonalization

The problem of orthogonalizing a set of vectors is a well-known problem in linear algebra. By collecting the set of vectors in a matrix A, several orthogonalization methods are possible. One example is computing the so-called QR factorization (decomposition), A = QR, with a matrix Q having orthonormal columns, and an upper triangular matrix R containing the coefficients of the basis transformation [45, 48, 110]. In the signal processing area, QR factorization is widely used in many applications, e.g., when solving linear least-squares problems [111–113].

From an algorithmic point of view, there exist many methods for computing the QR factorization, with different numerical properties. A standard approach is to use the Gram-Schmidt orthogonalization algorithm (see Sec. 4.1), which computes a set of orthonormal vectors spanning the same space as the given set of vectors. Other methods involve Householder reflections or Givens rotations [48]; although numerically more stable, they lack applicability in iterative (thus distributed) methods [110].

The optimization of QR factorization algorithms for specific target hardware has been addressed frequently in the literature (e.g., [114, 115]). Parallel algorithms for computing the QR factorization, which are applicable for reliable systems with fixed, regular, and globally known topology, have been investigated extensively (e.g., [116–118]). In contrast, the investigation of distributed QR factorization algorithms designed for loosely coupled distributed systems with independently operating distributed memory nodes and with possibly unreliable communication links has only started recently [112, 119].

Besides parallel algorithms, as also mentioned in Chapter 1, there are two other potential approaches for a distributed computation in a network. The standard – centralized – approach, which first collects the data from all nodes and performs the computation with all data at the fusion center, has severe disadvantages in WSNs. Complex routing, uneven energy consumption (since the nodes close to the fusion center have to relay all the data from other nodes), and the presence of a single point of failure are some of them. Another approach is to consider distributed algorithms for fully decentralized networks without any fusion center, where all nodes have the same functionality and each of them communicates only with its neighbors. Such an approach is typical for sensor-actuator networks or autonomous swarms of robotic networks [10]. In the following, I focus on such algorithms for decentralized networks.


In this chapter¹, a novel distributed QR factorization algorithm based on Classical Gram-Schmidt orthogonalization using the Dynamic Consensus algorithm (see Sec. 2.6) – DC–CGS [28] – is presented. A proof of its convergence as well as a thorough analysis of its properties is provided.

As before, the algorithm does not require any fusion center and assumes only local communication between neighboring nodes without any global knowledge about the network topology. In contrast to existing approaches, the DC–CGS algorithm computes all elements of the new orthogonal basis simultaneously and, as the algorithm proceeds, the values at all nodes are refined iteratively and approximate the exact values of Q and R. Therefore, it can deliver an estimate of the full matrix result at any moment of the computation.

Notation

I denote element-wise division of two vectors as z = x ⊘ y, i.e., z_i = x_i / y_i, ∀i, element-wise multiplication of two vectors as z = x ⊙ y, i.e., z_i = x_i y_i, ∀i, and of two matrices as Z = X ⊙ Y. The operation X ⊛ Y is defined as follows: having two matrices X = (x_1, x_2, ..., x_m) ∈ R^{n×m} and Y = (y_1, y_2, ..., y_m) ∈ R^{n×m}, the resulting matrix² Z = X ⊛ Y is a stacked matrix of all matrices Z_i such that Z_i = (x_1, x_2, ..., x_i) ⊙ ((1, 1, ..., 1) ⊗ y_{i+1}), where the all-ones vector has i entries (⊗ denotes the Kronecker product; i = 1, 2, ..., m−1), i.e.,

Z = (\underbrace{x_1 ⊙ y_2}_{Z_1}, \underbrace{x_1 ⊙ y_3, x_2 ⊙ y_3}_{Z_2}, ..., \underbrace{x_1 ⊙ y_m, ..., x_{m−2} ⊙ y_m, x_{m−1} ⊙ y_m}_{Z_{m−1}}),

thus creating a big matrix containing combinations of column vectors; Z ∈ R^{n×(m²−m)/2}. This later corresponds in the DC–CGS algorithm to the off-diagonal elements of matrix R.
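As a small sketch of the ⊛ operation (with the column ordering defined above and element-wise multiplication denoted ⊙), assuming matrices are stored as NumPy arrays:

```python
import numpy as np

def box_product(X, Y):
    """Z = X ⊛ Y: stack the columns x_{j'} ⊙ y_j for all pairs j' < j,
    ordered as (x1⊙y2, x1⊙y3, x2⊙y3, ..., x_{m-1}⊙y_m). Assumes m >= 2."""
    n, m = X.shape
    cols = [X[:, jp] * Y[:, j] for j in range(1, m) for jp in range(j)]
    return np.column_stack(cols)          # shape (n, (m**2 - m) // 2)
```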

4.1 Gram-Schmidt orthogonalization

Given a matrix A = (a_1, a_2, ..., a_m) ∈ R^{n×m}, n ≥ m, Classical Gram-Schmidt Orthogonalization (CGS) [45, 48] computes a matrix Q ∈ R^{n×m} with orthonormal columns and the upper-triangular matrix R ∈ R^{m×m}, such that A = QR. Denoting

Q = (q_1  q_2  ...  q_m),    R = \begin{pmatrix} ⟨q_1, a_1⟩ & ⟨q_1, a_2⟩ & \cdots & ⟨q_1, a_m⟩ \\ 0 & ⟨q_2, a_2⟩ & ⟨q_2, a_3⟩ & \cdots \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & ⟨q_m, a_m⟩ \end{pmatrix},    (4.1)

where

q_j = \frac{u_j}{‖u_j‖_2}, \quad j = 1, 2, ..., m,    (4.2)

and

u_j = a_j − \sum_{j'=1}^{j−1} \frac{⟨q_{j'}, a_j⟩}{⟨q_{j'}, q_{j'}⟩} q_{j'}, \quad j = 1, 2, ..., m,    (4.3)

with ‖u‖_2 = \sqrt{\sum_{i=1}^{n} u_i^2} and ⟨q, a⟩ = \sum_{i=1}^{n} q_i a_i.

¹The work this chapter is based upon was done jointly in collaboration with H. Straková et al. [24, 28].
²Note that x_m and y_1 do not contribute to the resulting matrix Z.
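A minimal centralized reference implementation of Eqs. (4.1)–(4.3), given here only to fix the notation; it is a sketch of the textbook procedure (for production use, library QR routines with better numerical properties should be preferred):

```python
import numpy as np

def classical_gram_schmidt(A):
    """CGS QR factorization A = Q R following Eqs. (4.1)-(4.3)."""
    n, m = A.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for j in range(m):
        u = A[:, j].astype(float).copy()
        for jp in range(j):
            R[jp, j] = Q[:, jp] @ A[:, j]      # <q_{j'}, a_j>; q_{j'} already has unit norm
            u -= R[jp, j] * Q[:, jp]           # Eq. (4.3)
        Q[:, j] = u / np.linalg.norm(u)        # Eq. (4.2)
        R[j, j] = Q[:, j] @ A[:, j]            # diagonal entry <q_j, a_j> = ||u_j||
    return Q, R
```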


[Figure: geometric illustration showing how u_2 and u_3 are obtained from a_2 and a_3 by subtracting the projections ⟨q_1, a_2⟩/⟨q_1, q_1⟩ q_1 and ⟨q_1, a_3⟩/⟨q_1, q_1⟩ q_1 + ⟨q_2, a_3⟩/⟨q_2, q_2⟩ q_2, with a_1 = u_1.]

Figure 4.1: Principle of the Gram-Schmidt orthogonalization. From general vectors a_1, a_2, a_3, the set of orthogonal vectors q_1, q_2, q_3 is obtained.

Fig. 4.1 illustrates the principle of the Gram-Schmidt orthogonalization process, where from a given set of vectors a_i a set of orthogonal (orthonormal) vectors q_i is computed.

It is known that the CGS algorithm is numerically sensitive, depending on the singular values (condition number) of matrix A, and that it can produce vectors q_j far from orthogonal when matrix A is close to being rank-deficient, even in floating-point precision [110]. Numerical stability can be improved by other methods, e.g., the modified Gram-Schmidt method, Householder transformations, or Givens rotations [48, 110].

4.1.1 Existing distributed methods

A straightforward redefinition of the CGS in a distributed way suitable for a WSN follows from the definition of the ℓ₂ norm, i.e., ‖u‖₂ = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2} (cf. (4.2)), and inner products, ⟨q, a⟩ = q_1 a_1 + q_2 a_2 + \cdots + q_n a_n (cf. (4.3)), where the summations can be computed by any distributed aggregation algorithm³ (e.g., average consensus [13], gossiping algorithms [55], etc.), depending only on communication with the neighbors.
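As an illustration of this building block (cf. also footnote 3 below), the following sketch computes ‖u‖₂² distributively with a static average consensus: each node i starts with its local contribution u_i², repeatedly averages with its neighbors through a given weight matrix W, and finally scales by N to obtain the global sum. This is a conceptual sketch, not the gossip-based push-sum used in [112, 119].

```python
import numpy as np

def consensus_squared_norm(u_local, W, iters=100):
    """Approximate ||u||_2^2 = sum_i u_i^2 via static average consensus.

    u_local : length-N array, entry i held by node i
    W       : N x N doubly stochastic consensus weight matrix (e.g., Metropolis weights)
    """
    N = len(u_local)
    x = u_local**2            # local initial values u_i^2
    for _ in range(iters):
        x = W @ x             # one consensus iteration (exchange with neighbors)
    return N * x              # every node's estimate of ||u||_2^2
```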

Nevertheless, to my knowledge, all existing distributed orthogonalization algorithms are based on the gossip-based push-sum algorithm [55]. Specifically, in [112] the authors used a distributed CGS based on gossiping for solving a distributed LS problem, and in [119] a gossip-based distributed algorithm for Modified Gram-Schmidt orthogonalization (MGS) was designed and analyzed. The authors also provided a quantitative comparison to existing parallel algorithms for QR factorization. A slight modification of the latter algorithm was introduced in [120], which I later use for comparison. I denote the two gossip-based distributed Gram-Schmidt orthogonalization algorithms as G–CGS [112] and G–MGS [120], respectively. Both methods compute the columns of matrix Q sequentially, meaning that the algorithm needs to wait for the push-sum to converge before proceeding with the next column.

³E.g., using the static average consensus algorithm (see Sec. 2.1), ‖u‖₂² = N lim_{k→∞} W^k (u ⊙ u) = \sum_{i=1}^{N} u_i^2.


4.2 Distributed classical Gram-Schmidt based on dynamic average consensus – DC–CGS

As mentioned in Sec. 4.1.1, the Gram-Schmidt orthogonalization method can be computed in a distributed way by using any distributed aggregation algorithm. I refer to a CGS algorithm based on the Static Consensus algorithm (see Sec. 2.1) as SC–CGS. SC–CGS as well as G–CGS [112] and G–MGS [120] have the following substantial drawback.

In Gram-Schmidt orthogonalization methods, the computation of the norms ‖u_j‖ and the inner products ⟨q_{j'}, a_j⟩, ⟨q_{j'}, q_{j'}⟩, occurring in matrices Q and R (cf. Eq. (4.1)), depends on the norms and inner products computed from previous columns of the input matrix A. Therefore, each node i must wait until the estimates of the previous norms ‖u_{j'}‖ (j' < j) have achieved an acceptable accuracy before processing the next norm ‖u_j‖. The same also holds for computing the inner products. I here present a novel approach overcoming this drawback by using the dynamic consensus algorithm, which processes time-varying input values. In this case, the time-varying input is the estimates (approximations) of the appropriate norms and inner products. Thus, all elements of the orthonormal basis Q are computed simultaneously and, as the algorithm proceeds, the values at all nodes are iteratively refined, eventually converging to the desired Q and R matrices. Therefore, the algorithm delivers estimates of Q and R at any intermediate stage.

Rewriting Eqs. (4.2) and (4.3) (Sec. 4.1), which implicitly contain these dependencies, leads to

q_j(k) = \frac{u_j(k)}{\sqrt{N \, \bar{u}_j(k−1)}}, \quad j = 1, 2, ..., m,    (4.4)

u_j(k) = a_j − p_j(k), \quad j = 1, 2, ..., m,    (4.5)

where \bar{u}_j(k) is the approximation of \frac{1}{N} ‖u_j‖_2^2 \, \mathbf{1} at time k and

p_j(k) = \sum_{j'=1}^{j−1} \frac{\bar{p}^{(2)}_l(k−1)}{\bar{q}_{j'}(k−1)} ⊙ q_{j'}(k−1),    (4.6)

with \bar{p}^{(2)}_l(k) being the approximation of the off-diagonal inner products \frac{1}{N} ⟨q_{j'}, a_j⟩ \mathbf{1} (∀ j' < j; l = j' + \frac{(j−1)(j−2)}{2}) of matrix R (cf. Eq. (4.1)) and \bar{q}_{j'}(k) the approximation of \frac{1}{N} ⟨q_{j'}, q_{j'}⟩ \mathbf{1} at time k. Similarly, I define \bar{p}^{(1)}_j(k) to be the approximation of \frac{1}{N} ⟨q_j, a_j⟩ \mathbf{1}. Using the dynamic average consensus algorithm from Sec. 2.6, \bar{u}_j(k), \bar{q}_j(k), \bar{p}^{(1)}_j(k), and \bar{p}^{(2)}_l(k) converge to \frac{1}{N} ‖u_j‖_2^2 \mathbf{1}, \frac{1}{N} ⟨q_{j'}, q_{j'}⟩ \mathbf{1}, \frac{1}{N} ⟨q_j, a_j⟩ \mathbf{1}, and \frac{1}{N} ⟨q_{j'}, a_j⟩ \mathbf{1} (for all j' < j), respectively.

I furthermore assume that the matrices A ∈ R^{n×m} and Q ∈ R^{n×m} are distributed over the network row-wise, meaning that each node stores at least one row of matrix A and the corresponding rows of matrix Q, and each node stores the whole matrix R ∈ R^{m×m}. In case n > N, more rows must be stored at a node and each node must sum the data locally before broadcasting to its neighbors.

Notation: Here [A]_{n'_i} ∈ R^{|n'_i|×m} and [Q(k)]_{n'_i} ∈ R^{|n'_i|×m} represent the rows of the matrices A and Q at a given node i at time k. If more rows are stored at one node, [A]_{n'_i} and [Q(k)]_{n'_i} denote matrices, otherwise they are row vectors. Matrix R^{(i)}(k) ∈ R^{m×m} represents the whole matrix R at node i at time k.

The DC–CGS algorithm can be formalized as follows:


Algorithm 4.1: Distributed Gram-Schmidt Orthogonalization based on the Dynamic Consensus Algorithm (DC–CGS)

The input matrix A = (a_1, a_2, ..., a_m) ∈ R^{n×m} with n ≥ m is distributed row-wise across the nodes. If n > N, some nodes store more than one row. Each node computes the rows of Q corresponding to the stored rows of A and an estimate of the whole matrix R.

Used indices: i = 1, 2, ..., N (sensors); j = 1, 2, ..., m (columns); l = 1, 2, ..., (m²−m)/2 (off-diagonal elements of R); n'_i ⊂ {n'}_{n'=1}^{n} (set of row indices stored at node i).

Variables from the global (network) point of view:
U(k) = (u_1(k), ..., u_j(k), ..., u_m(k)) ∈ R^{n×m};  Q(k) = (q_1(k), ..., q_j(k), ..., q_m(k)) ∈ R^{n×m};
Ū(k) = (\bar{u}_1(k), ..., \bar{u}_j(k), ..., \bar{u}_m(k)) ∈ R^{N×m};  Q̄(k) = (\bar{q}_1(k), ..., \bar{q}_j(k), ..., \bar{q}_m(k)) ∈ R^{N×m};
P^{(1)}(k) = (p^{(1)}_1(k), ..., p^{(1)}_j(k), ..., p^{(1)}_m(k)) ∈ R^{N×m};
P^{(2)}(k) = (p^{(2)}_1(k), ..., p^{(2)}_l(k), ..., p^{(2)}_{(m²−m)/2}(k)) ∈ R^{N×(m²−m)/2}.

• Initialization at each node i (if n = N, n'_i ≡ i; ∀ j = 1, 2, ..., m):
  [u_j(0)]_{n'_i} = [a_j]_{n'_i},  [q_j(0)]_{n'_i} = [a_j]_{n'_i},
  [\bar{u}_j(0)]_i = Σ_{∀n'_i} [A ⊙ A]_{n'_i, j},  [\bar{q}_j(0)]_i = Σ_{∀n'_i} [A ⊙ A]_{n'_i, j},
  [p^{(1)}_j(0)]_i = Σ_{∀n'_i} [A ⊙ A]_{n'_i, j},  [p^{(2)}_l(0)]_i = Σ_{∀n'_i} [A ⊛ A]_{n'_i, l}.

• Repeat for k = 1, 2, ... (j = 1, 2, ..., m):

  1. Compute locally at each node i:
     [p_j(k)]_{n'_i} = Σ_{j'=1}^{j−1} ( [p^{(2)}_{j'+(j−1)(j−2)/2}(k−1)]_i / [\bar{q}_{j'}(k−1)]_i ) [q_{j'}(k−1)]_{n'_i},
     [u_j(k)]_{n'_i} = [a_j]_{n'_i} − [p_j(k)]_{n'_i},
     [q_j(k)]_{n'_i} = [u_j(k)]_{n'_i} / \sqrt{N [\bar{u}_j(k−1)]_i}.

  2. At each node i store:
     [Q(k)]_{n'_i} = ( [q_1(k)]_{n'_i}, ..., [q_j(k)]_{n'_i}, ..., [q_m(k)]_{n'_i} ),
     R^{(i)}(k) = N \begin{pmatrix} [p^{(1)}_1(k)]_i & [p^{(2)}_1(k)]_i & \cdots & [p^{(2)}_{(m²−3m+4)/2}(k)]_i \\ 0 & [p^{(1)}_2(k)]_i & [p^{(2)}_3(k)]_i & \cdots \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & [p^{(1)}_m(k)]_i \end{pmatrix}.

  3. At node i, aggregate the data for the dynamic consensus (cf. (2.67)) (if n = N, n'_i ≡ i):
     [ψ^{(1)}_j]_i = [\bar{u}_j(k−1)]_i + Σ_{n'_i} [u_j(k) ⊙ u_j(k)]_{n'_i} − [u_j(k−1) ⊙ u_j(k−1)]_{n'_i},
     [ψ^{(2)}_j]_i = [\bar{q}_j(k−1)]_i + Σ_{n'_i} [q_j(k) ⊙ q_j(k)]_{n'_i} − [q_j(k−1) ⊙ q_j(k−1)]_{n'_i},
     [ψ^{(3)}_j]_i = [p^{(1)}_j(k−1)]_i + Σ_{n'_i} [q_j(k) ⊙ a_j]_{n'_i} − [q_j(k−1) ⊙ a_j]_{n'_i},
     [ψ^{(4)}_l]_i = [p^{(2)}_l(k−1)]_i + Σ_{n'_i} [Q(k) ⊛ A]_{n'_i, l} − [Q(k−1) ⊛ A]_{n'_i, l};
     compactly, [Ψ(k)]_i = [X(k−1)]_i + [∆S(k)]_i.

  4. From the global point of view (i = 1, 2, ..., N): broadcast Ψ(k) = ([Ψ(k)]_1^⊤, [Ψ(k)]_2^⊤, ..., [Ψ(k)]_N^⊤)^⊤ ∈ R^{N×(m²+5m)/2}, i.e.,
     ( Ū(k), Q̄(k), P^{(1)}(k), P^{(2)}(k) ) = X(k) = W Ψ(k).


Proof of convergence of DC–CGS. (From the global point of view.) For the first column vector, j = 1, u_1(k) = a_1 is obtained, and thus, as k → ∞, \bar{u}_1(k) = \frac{1}{N} ‖a_1‖² \mathbf{1}, and thus also q_1(k) = \frac{a_1}{‖a_1‖}, \bar{q}_1(k) = \frac{1}{N} \mathbf{1}, p^{(1)}_1(k) = \frac{1}{N} ‖a_1‖ \mathbf{1}, and p^{(2)}_1(k) = \frac{1}{N} ⟨a_1, a_2⟩ \mathbf{1}.

Furthermore, for all j > 1, it can be recognized that the elements depend only on the first column (j = 1); e.g., by Eq. (4.5), u_2(k) = a_2 − \frac{p^{(2)}_1(k−1)}{\bar{q}_1(k−1)} ⊙ q_1(k−1). Thus, since the elements of the first vector s_1(k) (where S(k) = (s_1(k), ..., s_l(k), ..., s_{(m²+5m)/2}(k)), see Step 3 of the algorithm) are bounded (constant), all elements of the subsequent vectors s_l are bounded as well, and thus the conditions of Theorem 2.5 (Sec. 2.6) are fulfilled. Therefore, as the elements \bar{u}_j(k), \bar{q}_j(k), p^{(1)}_j(k), p^{(2)}_j(k) converge, the algorithm converges.

It is worth noting that, as shown by Theorem 2.6 (Sec. 2.6), if the input signal (S(k) in Step 3 of the algorithm) does not decrease, the error remains only bounded (Eq. (2.71)) and does not go to zero. As I will show in the simulations in Sec. 4.3, this is often the case under the used numerical precision and given input matrix A.

Note that instead of knowing the number of nodes N and using it as a normalization constant, an additional weight vector ω(k) ∈ R^{N×1} can be transmitted, i.e., ψ^{(0)}(k) = ω(k) and [Ψ(k)]_i = [(ψ^{(0)}(k), ψ^{(1)}(k), ψ^{(2)}(k), ψ^{(3)}(k), ψ^{(4)}(k))]_i, such that ω(0) = (1, 0, ..., 0)^⊤, and Eq. (4.4) would change only slightly⁴, i.e.,

q_j(k) = \frac{u_j(k)}{\sqrt{\frac{1}{ω(k)} ⊙ \bar{u}_j(k−1)}}.    (4.7)

Note that the normalization constant N (or ω(k), respectively) affects only⁵ the orthonormality of the columns of matrix Q(k), and in case only orthogonality is sufficient, it is possible to omit this constant. One can thus overcome the necessity of knowing the number of nodes, or reduce the amount of transmitted data in the network, respectively. This fact will be used later in Sec. 4.4.

4.3 Simulations

In the simulations I consider a connected WSN with N = 30 nodes. I explore the behaviour of DC–CGS for various topologies: fully connected (each node is connected to every other node), regular (each node has the same degree d), and geometric (each (randomly deployed) node is connected to all nodes within some radius ρ – a WSN model). If not stated otherwise, the selected random input matrix A ∈ R^{300×100} has uniformly distributed elements from the interval [0, 1] and a low condition number κ(A) = 35.7. In Sec. 4.3.3, however, I investigate the influence of various input matrices with different condition numbers on the algorithm's performance.

Also, except for Sections 4.3.3 and 4.3.4, I use the Metropolis weights (Eq. (2.11)) for the consensus weight matrix.

The confidence intervals were computed from several instantiations using a bootstrap method [121].

⁴lim_{k→∞} ω(k) = 1/N.
⁵Not considering numerical properties.


[Figure with two panels versus iteration k: (a) orthogonality error log₁₀ ‖I − Q^⊤Q‖₂; (b) factorization error log₁₀(‖A − QR^{(i)}‖₂ / ‖A‖₂) for each node i = 1, 2, ..., 30.]

Figure 4.2: Example of the orthogonality and factorization errors for each node i for a geometric topology with d = 8.533, N = 30.

4.3.1 Orthogonality and factorization error

As performance metrics in the simulations I use:

• relative factorization error – ‖A − Q(k)R^{(i)}(k)‖₂ / ‖A‖₂ – which measures the accuracy of the QR factorization,

• orthogonality error – ‖I − Q(k)^⊤ Q(k)‖₂ – which measures the orthogonality of the matrix Q(k) (see Step 2 of the algorithm).

Note that both errors are calculated from the network (global) perspective and, as depicted, they are not known locally at the nodes, since only R^{(i)}(k) is local at each node, whereas Q(k) is distributed row-wise across the nodes – [Q(k)]_{n'_i}. From now on, I simplify the notation by dropping the index k in Q(k) and R^{(i)}(k). The simulation results for a geometric topology with an average node degree d = 8.533 are depicted in Fig. 4.2. Since both errors behave almost identically (compare Fig. 4.2a and Fig. 4.2b), and since each node i can compute a local factorization error ‖[A]_{n'_i} − [Q]_{n'_i} R^{(i)}‖₂ / ‖[A]_{n'_i}‖₂ from its local data, I infer that such local error evaluation might also be used as a local convergence criterion in practice. I use this fact later in Sec. 4.4.
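A sketch of the two global metrics and of the local criterion mentioned above (here rows_i stands for the set of row indices n'_i stored at node i; the function names are illustrative):

```python
import numpy as np

def orthogonality_error(Q):
    """Spectral-norm orthogonality error ||I - Q^T Q||_2."""
    return np.linalg.norm(np.eye(Q.shape[1]) - Q.T @ Q, 2)

def factorization_error(A, Q, R_i):
    """Global relative factorization error ||A - Q R^(i)||_2 / ||A||_2."""
    return np.linalg.norm(A - Q @ R_i, 2) / np.linalg.norm(A, 2)

def local_factorization_error(A, Q, R_i, rows_i):
    """Local criterion computable at node i from its own rows and its R^(i)."""
    return (np.linalg.norm(A[rows_i] - Q[rows_i] @ R_i, 2)
            / np.linalg.norm(A[rows_i], 2))
```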


[Figure: orthogonality error log₁₀ ‖I − Q^⊤Q‖₂ versus iteration k; panel (a) regular topology (d = 5), panel (b) geometric topology (d = 8.533); curves for the “uniform” distribution and for the cases where most of the data is stored in a node with minimum or maximum degree.]

Figure 4.3: Convergence for networks with different topology and initial data distribution: either all nodes store the same amount of data (“uniform”) or most of the data is stored in one node (with minimum or maximum degree). In case of the regular topology, the nodes i, j are picked randomly.

Note that the error in the beginning stage in Fig. 4.2 is caused by the disagreement (unequal values) and the not yet converged norms and inner products across the nodes. I also observe that the error floor⁶ is highly influenced by the network topology, the weights of matrix W, and the condition number of the input matrix A. I investigate these properties in Sec. 4.3.3.

4.3.2 Initial data distribution

If n > N, some nodes store more than one row of A. Thus, before performing the distributed summation (broadcasting to neighbors), every node has to locally sum the values from its local rows.

Simulations show that the convergence behaviour of DC–CGS strongly depends on the distribution of the rows across the network (see Fig. 4.3). I investigate the following cases:

(1) each node stores ten rows of A (“uniform” case),

(2) 271 rows are stored in a node with the lowest degree, the other 29 rows in the remaining 29 nodes,

(3) 271 rows are stored in a node with the highest degree, the rest in the remaining 29 nodes.

Note that in case of a regular topology, i.e., all nodes have the same degree, cases (2) and (3) are the same (cf. Fig. 4.3a).

Observe that not only the initial distribution of the data influences the convergence behaviour but also the topology of the underlying network. In the case of a regular topology (Fig. 4.3a) the influence of the distribution is small and relatively weak in terms of convergence time, but stronger in terms of the final accuracy achieved. I recognize that the difference between the nodes comes only from the variance of the values in the input matrix A. On the other hand, in case of a highly irregular geometric topology (see Fig. 4.3b), where the node with the most neighbors stores most of the data, the algorithm converges much faster than in the case when most of the data is stored in a node with only a few neighbors.

6Error at which the algorithm stalls at given computational precision.


[Figure: orthogonality error log₁₀ ‖I − Q^⊤Q‖₂ versus iteration k for geometric, regular, and complete topologies.]

Figure 4.4: “Uniform” distribution for different topologies. (Bold-face line is the mean value across six different uniform data distributions. Shaded areas are 95% confidence intervals.)

Furthermore, Fig. 4.4 shows that in the “uniform” case, the algorithm behaves slightly differently for different distributions of the rows (although still having ten rows in each node – globally, this means that the rows of matrix A are interchanged). The results are shown for six different placements of the data across the nodes for three different topologies, where I depict the mean value and the corresponding 95% confidence intervals of the simulated orthogonality error. As can be observed, in case of a fully connected topology, the data distribution is of no importance, since all the nodes exchange data in every step with all other nodes. In case of a geometric topology, however, the convergence of the algorithm is influenced by the data distribution, even if every node contains the same number of rows (ten rows in each node). This can be recognized by the larger confidence intervals of the orthogonality error. Nevertheless, the speed of convergence in all of these cases is higher than in the case when most data is stored in the “sparsest” node (cf. Fig. 4.3b). In case of a regular topology the difference is small and only due to the numerical accuracy of the mixing parameters.

4.3.3 Numerical sensitivity

As explained in Sec. 4.1, the classical Gram-Schmidt orthogonalization possesses some undesirable numerical properties [48, 110]. In comparison to the centralized algorithm, the numerical stability of the DC–CGS is furthermore influenced by the precision of the mixing weight matrix W, the network topology, and the distribution of the input matrix A. In this section I provide simulation results showing these dependencies.

Weights

As mentioned in Sec. 2.1.2, matrix W can be selected in many ways. Mainly, the selection of the weights influences the speed of convergence. Unlike in the other simulations, where I used the Metropolis weights, Eq. (2.11), here I select constant weights for matrix W (see Sec. 2.1.2), i.e.,

[W]_{ij} = \begin{cases} \frac{η}{N} & \text{if } (i,j) ∈ E, \\ 1 − \frac{η}{N} d_i & \text{if } i = j, \\ 0 & \text{otherwise,} \end{cases}    (4.8)

where η ∈ (0, 1]. Note that I use \frac{1}{N} instead of \frac{1}{∆} (cf. Eq. (2.10)).


[Figure: orthogonality error log₁₀ ‖I − Q^⊤Q‖₂ versus iteration k for η ∈ {0.1, 0.2, 0.5, 0.7, 1}; panels: (a) fully connected topology, (b) regular topology, (c) geometric topology.]

Figure 4.5: Influence of different constant weights η (Eq. (4.8)) on the algorithm accuracy and convergence speed for three different topologies, averaged over ten different input matrices. (Shaded areas are 95% confidence intervals.)

Such weights, in general, lead to slower convergence. However, it can also be observed in Fig. 4.5 that the weights influence not only the speed of convergence but also the numerical accuracy of the algorithm (different error floors). This may be caused, as I showed in Eq. (2.71), by the different eigenvalues λ₂ of the mixing matrix W. This can be observed in Fig. 4.5, where with an increasing λ₂ (from (a) to (c)), for the same η, the error floor increases as well.


[Four figures, each plotting an error versus iteration k for mixing-weight precisions vpa-4, vpa-8, vpa-16, vpa-32, and “infinite”.]

Figure 4.6: Influence of the numerical precision of the mixing weights on the algorithm performance (orthogonality error log₁₀ ‖I − Q^⊤Q‖₂). Geometric topology, matrix A with low condition number (κ(A) = 1.04).

Figure 4.7: Difference in the orthogonality error, log₁₀ | ‖I − Q^⊤Q‖₂^{(vpa-i)} − ‖I − Q^⊤Q‖₂^{(Inf)} |, for the case of 16 and 32 decimal digits versus the “infinite” precision (converted to double).

Figure 4.8: Influence of the numerical precision of the mixing weights on the algorithm performance (orthogonality error log₁₀ ‖I − Q^⊤Q‖₂). Geometric topology, matrix A with higher condition number (κ(A) = 76.33).

Figure 4.9: Orthogonality error difference, log₁₀ | ‖I − Q^⊤Q‖₂^{(vpa-16)} − ‖I − Q^⊤Q‖₂^{(Inf)} |, for the case of 16 decimal digits versus the “infinite” precision (converted to double). Note that in comparison to Fig. 4.7 the difference between the “infinite” precision and more than 16 digits is below the machine precision (exactly the same results).

Mixing precision

Another factor influencing the algorithm performance is the numerical precision of the mixing weights W. Here, I simulate the case of a geometric topology with the Metropolis weight model, where the weights are of a given precision – characterized by the number of variable decimal digits (4, 8, 16, 32, “Infinite”)⁷.

⁷The simulations were performed in Matlab R2011b 64-bit using the Symbolic toolbox with variable precision arithmetic. “Infinite” precision denotes weights represented as an exact ratio of two numbers. The depicted result after “infinite” precision multiplication was converted to double precision.


Comparing Fig. 4.6 with Fig. 4.8, it may be noticed that the numerical precision of the mixing weights has a bigger influence in cases when the input matrix is worse conditioned. Also, in Fig. 4.7 and Fig. 4.9 the difference between the orthogonality errors for various precisions may be observed. Observe also that for a matrix A with higher condition number, a higher mixing precision has a bigger impact on the result.

Fig. 4.6 shows that the error floor moves with the mixing precision. However, note that even for the “infinite” mixing precision, the orthogonality error stalls at an accuracy lower than the used machine precision – taking into account also the conversion to double precision (∼ 10⁻¹²). As mentioned, this is caused by the properties of the dynamic consensus and the fact that the input signal (S(k) in Step 3 of the algorithm) in the given precision may stop decreasing, so that the error (Eq. (2.68)) may not go to zero. In such a case, the orthogonality error may also stop decreasing, reach some error floor (which can be much higher than the machine precision, as can be observed in the simulations), and remain constant. Then, the bound for constant input signals (see Eq. (2.71)) applies.

Condition numbers

It is well known that the classical Gram-Schmidt orthogonalization can become numerically unstable [110]. In cases when the input matrix A is ill-conditioned (high condition number) or rank-deficient (the matrix contains linearly dependent columns), the computed vectors Q can be far from orthogonal even when computed in high precision.

In this section I investigate the influence of the condition number of the input matrix A on the accuracy of the orthogonality. The condition number is defined with respect to inversion as the ratio of the largest and smallest singular values. In comparison to the classical (centralized) Gram-Schmidt orthogonalization I observe (Fig. 4.10a) that the DC–CGS algorithm behaves similarly, although it does not reach the accuracy of either the SC–CGS or the centralized algorithm (even in the fully connected network). This is caused mainly by the dynamic consensus algorithm, whose performance is influenced by the network topology and the choice of the mixing weights precision (see the previous simulations). I find in all simulations that the orthogonality error in the first phase can reach very high values (due to divisions by numbers close to zero), which may influence the numerical accuracy in the final phase.

I further observe that the algorithm requires matrix A to be very well-conditioned even for the fully connected network. Unlike for the other methods, the factorization error in case of DC–CGS has the same characteristics as the orthogonality error and is also influenced by the condition number of the input matrix, see Fig. 4.10b. Although, as I noted in Sec. 4.3.1, the orthogonality and the factorization error of DC–CGS behave almost identically, the dependence of the factorization error on the condition number κ(A) needs to be studied further.

Fig. 4.10 also shows that the G–MGS is the most robust method in comparison to the others. This is caused by the usage of the modified Gram-Schmidt orthogonalization instead of the classical method.


[Figure with six panels, each plotting an error versus log₁₀ κ(A) for CGS (centralized), SC–CGS, DC–CGS, G–CGS [112], and G–MGS [120]: (a) orthogonality error, complete topology; (b) factorization error, complete topology; (c) orthogonality error, regular topology, d = 5; (d) factorization error, regular topology, d = 5; (e) orthogonality error, geometric topology, d = 8.53; (f) factorization error, geometric topology, d = 8.53.]

Figure 4.10: Impact of the condition number κ(A) of matrix A on the orthogonality and factorization errors. Averaged over 10 matrices for each condition number. Different topologies. (Both axes are in logarithmic scale. Shaded areas are 95% confidence intervals.)


4.3.4 Robustness to link failures

For distributed algorithms it is very important that the algorithm be robust against network failures. Typical failures in WSNs are message losses or link failures, which occur for many reasons, e.g., channel fading, channel congestion, message collisions, moving nodes, or a dynamic topology.

[Figure: orthogonality error log₁₀ ‖I − Q^⊤Q‖₂ versus iteration k for link failure rates of 0%, 5%, 25%, 50%, 60%, and 80%; panels: (a) fully connected topology, (b) geometric topology, (c) regular topology.]

Figure 4.11: Robustness to link failures for different percentages of failed links at every time step. Constant weight model with η = 1, i.e., the fastest option (see Fig. 4.5). (Shaded areas are 95% confidence intervals.)


I model link failures as a temporary drop-out of a bidirectional connection between two nodes, meaning that no message can be transmitted between the nodes. In every time step I randomly remove some percentage of links in the network. As a weight model I picked the constant weights model, Eq. (4.8), due to its property that every node can compute the weights locally at each time step based only on the number of received messages (di). Thus, no global knowledge is required. However, the nodes must still work synchronously and the transmissions must be pairwise8.
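To make the failure model concrete, the following sketch (an illustrative assumption, not the simulation code of this section) runs a plain average consensus with the constant-weight update x_i ← x_i + η Σ_{j∈N_i(k)} (x_j − x_i), i.e., W(k) = I − ηL(k) as in Appendix A, while every bidirectional link is dropped independently with probability p_fail in each time step. The topology and η are chosen here conservatively for stability, whereas Fig. 4.11 uses η = 1.

import numpy as np

rng = np.random.default_rng(1)
N = 20
# Symmetric random topology with a path backbone so that the full network is connected.
A = np.triu(rng.random((N, N)) < 0.3, 1)
A[np.arange(N - 1), np.arange(1, N)] = True
A = A | A.T

x = rng.standard_normal(N)      # initial local values
target = x.mean()
eta = 0.05                      # conservative constant weight (illustrative choice)
p_fail = 0.5                    # fraction of links dropped in every time step

for k in range(3000):
    drop = np.triu(rng.random((N, N)) < p_fail, 1)
    Ak = (A & ~(drop | drop.T)).astype(float)   # surviving bidirectional links at step k
    deg = Ak.sum(axis=1)
    x = x + eta * (Ak @ x - deg * x)            # x <- (I - eta*L(k)) x, doubly stochastic

print("max deviation from the initial average:", np.abs(x - target).max())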

From Fig. 4.11 I conclude that the algorithm is very robust: even if in every time step a big percentage (up to 60%) of the links is dropped, the algorithm still achieves some accuracy (at least 10^−2; Fig. 4.11b).

It is worth noting that moving nodes and a dynamic network topology can be modelled in the same way. I therefore argue that the algorithm is robust also in such scenarios (assuming that synchronicity is guaranteed).

4.3.5 Performance comparison with existing algorithms

Here I compare the new DC–CGS algorithm with SC–CGS and the gossip-based algorithms G–CGS and G–MGS introduced in Sec. 4.1.1. Although all approaches have iterative aspects, the cost per iteration strongly differs for each algorithm. Thus, instead of providing a comparison in terms of the number of iterations to converge, I compare the communication cost needed for achieving a certain accuracy of the result. I investigate the total number of messages sent as well as the total amount of data (real numbers) exchanged.

Simulation results for various topologies are shown in Figs. 4.12 and 4.13. The gossip-based approaches exchange, in general, less data (Fig. 4.13), but since their message size is much smaller than in DC–CGS, the total number of messages sent is higher (Fig. 4.12).

Because the message size of SC–CGS is even smaller than in the gossip-based approaches, it sends the highest number of messages. Since the energy consumption in a WSN is mostly determined by the number of transmissions [8,122], it is better to transmit as few messages as possible (regardless of the payload size); therefore, DC–CGS is the most suitable method for a WSN scenario. However, notice that in many cases DC–CGS does not achieve the same final accuracy of the result as the other methods.

Note that in fully connected networks SC–CGS delivers a highly accurate result from the beginning, because within the first iterations all nodes exchange the required information with all other nodes.

Tab. 4.1 summarizes the total communication cost and local memory requirements of the algorithms. However, due to the different parameters it is difficult to rank the approaches in a general case. The requirements depend especially on the topology of the underlying network, the number of iterations I(·) to converge in the consensus-based algorithms, and the number of rounds R needed for convergence of the push-sum in the gossip-based approaches. For example, in a fully connected network R = O(log N) [55] and I(s) = 1. Thus, SC–CGS requires O(m^2 N) messages sent as well as data exchanged, whereas the gossip-based approaches need O(mN log N) messages and O(m^2 N log N) data.

8 If there is a link, nodes see each other and immediately exchange messages. From a mathematical point of view, this implies that the weight matrix W is doubly stochastic [48] in every time step.


[Figure 4.12: Total number of transmitted messages in the network vs. orthogonality error log10 ‖I − Q⊤Q‖2 (axes in logarithmic scale). Panels: (a) fully connected topology, (b) geometric topology (d = 8.53), (c) geometric topology (d = 24.46), (d) regular topology (d = 5); curves: DC–CGS, SC–CGS, G–CGS [112], G–MGS [120].]

Note that G–CGS and G–MGS have theoretically identical communication cost; however, G–MGS is numerically more stable (see Fig. 4.10) and achieves a higher final accuracy (see Figs. 4.12 and 4.13). In the case of DC–CGS and a fully connected network, one can interpret DC–CGS as m consecutive static consensus algorithms (one for each column), thus I(d) = O(m), and the number of transmitted messages is O(mN) and the amount of data O(m^3 N). Nevertheless, theoretical convergence bounds of DC–CGS (bounds on I(d)), originating from the bounds derived in Sec. 2.6.2, have to be investigated further.


[Figure 4.13: Total number of transmitted real numbers (data) in the network vs. orthogonality error log10 ‖I − Q⊤Q‖2 (axes in logarithmic scale). Panels: (a) fully connected topology, (b) geometric topology (d = 8.53), (c) geometric topology (d = 24.46), (d) regular topology (d = 5); curves: DC–CGS, SC–CGS, G–CGS [112], G–MGS [120].]

Algorithm      | Total number of sent messages | Total amount of data (real numbers) | Local memory requirements per node
DC–CGS         | N·I(d)                        | N·I(d)·(m²+5m)/2                    | O(mn/N + m²)
SC–CGS         | N·I(s)·(m+1)m/2               | N·I(s)·(m+1)m/2                     | O(mn/N + m²)
G–CGS [112]    | N·R·(2m−1)                    | N·R·(m²+5m−2)/2                     | O(nm/N)
G–MGS [120]    | N·R·(2m−1)                    | N·R·(m²+5m−2)/2                     | O(nm/N)

Table 4.1: Comparison of various distributed QR factorization algorithms. I(d) – number of iterations of dynamic consensus, I(s) – number of iterations of static consensus, R – number of rounds per push-sum, N – number of nodes, m – number of columns of the input matrix.
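As a quick plausibility check of these orders, the per-algorithm expressions from Tab. 4.1 can be evaluated for concrete values (a hedged sketch: the fully connected assumptions R ≈ log N, I(s) = 1, I(d) ≈ m are taken from the discussion above, and the concrete N and m are arbitrary example values).

import numpy as np

N, m = 50, 10                      # nodes and columns (illustrative values)
R = int(np.ceil(np.log(N)))        # push-sum rounds in a fully connected network, R = O(log N)
I_s, I_d = 1, m                    # static / dynamic consensus iterations (fully connected case)

costs = {
    # algorithm: (messages sent, real numbers exchanged), as in Tab. 4.1
    "DC-CGS": (N * I_d,                   N * I_d * (m**2 + 5 * m) / 2),
    "SC-CGS": (N * I_s * (m + 1) * m / 2, N * I_s * (m + 1) * m / 2),
    "G-CGS":  (N * R * (2 * m - 1),       N * R * (m**2 + 5 * m - 2) / 2),
    "G-MGS":  (N * R * (2 * m - 1),       N * R * (m**2 + 5 * m - 2) / 2),
}
for name, (msgs, data) in costs.items():
    print(f"{name:7s} messages = {msgs:10.0f}   data = {data:10.0f}")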


4.4 Distributed network size estimation

A typical requirement in distributed algorithms, especially those based on consensus algorithms, is that the number of nodes N is known at each node beforehand.

Known methods for obtaining the number of nodes require, however, either some node tagging, where each node has a predefined ID (set by a manufacturer) and the nodes store tables containing these IDs, or, similarly, an ordered numbering scheme is used, in which the nodes exchange and store only the maximal value (max-consensus). Obviously, the drawback of these methods is the necessity of pre-set “keys” on the nodes [123].

Another option is to use an average consensus algorithm. Initializing one node to the value 1 and all others to 0, the algorithm converges to 1/N at each node [55, 124]. The drawback of this method is the preference of one node (leader) over the others. The leader can be pre-set or found using a max-consensus algorithm [124]. Although it may be argued that this approach is better and simpler than the previous ones, they both possess, fundamentally, the same disadvantages.

Probabilistic approaches have also been proposed (e.g., [125, 126]), where the estimation accuracy grows with the network size and typically a-priori knowledge of some parameters is required. Such algorithms are also based on some type of consensus algorithm, e.g., max-consensus [126], and may have complicated stopping criteria [125]. Nevertheless, these methods can provide good estimates in the case of large and dynamic networks.

Here I present an application of the distributed orthogonalization algorithms [24, 28, 119] for obtaining the number of nodes (network size) N in a WSN without any predefined node IDs, non-uniform initialization of the values in the network, or any other a-priori knowledge, with simple local stopping criteria [29].
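For comparison, the leader-based counting scheme mentioned above can be sketched in a few lines (an illustrative simulation only; the topology, the weight rule W = I − ηL and the number of iterations are assumptions of this example): one node starts with the value 1, all others with 0, average consensus drives every local value towards 1/N, and each node estimates N as the reciprocal of its own value.

import numpy as np

rng = np.random.default_rng(2)
N = 30
# Ring topology with a few random chords (illustrative network).
A = np.zeros((N, N), dtype=bool)
A[np.arange(N), (np.arange(N) + 1) % N] = True
A |= A.T
for _ in range(20):
    i, j = rng.integers(0, N, 2)
    if i != j:
        A[i, j] = A[j, i] = True

deg = A.sum(axis=1)
eta = 1.0 / (deg.max() + 1)                 # any 0 < eta < 1/d_max keeps W nonnegative
W = np.eye(N) - eta * (np.diag(deg) - A)    # W = I - eta*L, symmetric and doubly stochastic

x = np.zeros(N)
x[0] = 1.0                                  # the "leader" starts with 1, all others with 0
for _ in range(2000):
    x = W @ x                               # average consensus: every entry tends to 1/N

print("estimated N at node 5:", 1.0 / x[5], "  true N:", N)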

4.4.1 Determining network size

Applying the ideas from the previous section, where the goal was to factorize a matrix A ∈ R^{N×m} such that A = QR (with matrix Q = (q1, q2, . . . , qm) ∈ R^{N×m} containing orthogonal vectors), I propose an algorithm for finding the number of nodes N in a network. The main idea is the following: since matrix Q ∈ R^{N×m} is a matrix of orthogonal vectors, there exist at most N independent vectors spanning the linear space R^N, i.e., m ≤ N. Thus, finding the maximum number of independent vectors is equivalent to finding the number of nodes N in the network.
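This idea is easy to verify in a centralized fashion before distributing it (a hedged illustration, not the distributed algorithm itself; the tolerance eps and the QR-based projection are choices made only for this sketch): keep appending one fresh random value per node as a new global column and orthogonalize; the first column that is (numerically) linearly dependent on the previous ones reveals that the number of independent columns, i.e., the network size N, has been reached.

import numpy as np

rng = np.random.default_rng(3)
N = 12                                   # true number of nodes (unknown to the nodes)
eps = 1e-8

A = rng.standard_normal((N, 2))          # each node holds two random values -> two global columns
while True:
    Q, _ = np.linalg.qr(A[:, :-1])       # orthonormal basis of all previous columns
    a_new = A[:, -1]
    residual = a_new - Q @ (Q.T @ a_new) # Gram-Schmidt step for the newest column
    if np.linalg.norm(residual) < eps:   # newest column is linearly dependent -> stop
        break
    A = np.hstack([A, rng.standard_normal((N, 1))])  # every node appends one fresh random value

print("detected network size:", A.shape[1] - 1, "  true size:", N)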

4.4.2 Distributed estimation of network size

Assuming a static, synchronous, connected network, I propose two algorithms utilizing the “static” (DNS–SC) and the dynamic (DNS–DC) consensus. Both algorithms are applications of the algorithms for distributed Gram-Schmidt orthogonalization which I introduced in the previous Sec. 4.2. Hereafter, the vector q ∈ R^{N×1} denotes an approximation of the inner product of two consecutive vectors qj and qj′ at all the nodes, i.e., q = (q1, q2, . . . , qN)⊤ (as defined in Eq. (4.9) in Alg. 4.2, and respectively ψ(4) in Alg. 4.3 ahead); thus, if qj ⊥ qj′, then q ≈ (0, 0, . . . , 0)⊤.

DNS–SC

Considering the DNS–SC (see Alg. 4.2), the number of consensus iterations I(s) must be set such that the consensus algorithm has sufficiently converged. This can be obtained by predefining the number of iterations, or by observing the difference of values (error) in consecutive iterations; I discuss the influence of the number of iterations I(s) in Sec. 4.4.3.


Algorithm 4.2: Distributed Network Size estimation using Static Consensus (DNS–SC)

At each node i initiate two random values (a_{i,1}, a_{i,2}), i.e., from a global point of view the input matrix A = (a1, a2) ∈ R^{N×2} (m = 2). (The notation [W]_{i,1:N} denotes the i-th row of matrix W, i.e., transmission with the neighboring nodes of node i.)

• Initialization at node i: [q]_i ≡ q_i = 0, l = 1.

• While |[q]_i| < ǫ (0 < ǫ ≪ 1), at each node i perform:

  1. Repeat for columns j = l, l+1, . . . , m:

     a) At each node set [q_j]_i = [a_j]_i.

     b) Repeat for columns j′ = 1, 2, . . . , j−1: Using a consensus algorithm, compute the sum of [s^(1)]_i ≜ [q_{j′}]_i [a_j]_i (approximation of the inner product ⟨q_{j′}, a_j⟩), i.e., at each node i after I(s) consensus iterations obtain

        [R(i)]_{j′,j} = [W^{I(s)}]_{i,1:N} s^(1),
        [q_j]_i = [q_j]_i − [q_{j′}]_i [R(i)]_{j′,j}.

     c) Using a consensus algorithm, compute the sum of [s^(2)]_i ≜ [q_j]_i [q_j]_i (approximation of ‖q_j‖_2²), i.e., at each node i after I(s) consensus iterations obtain

        [R(i)]_{j,j} = [W^{I(s)}]_{i,1:N} s^(2),
        [q_j]_i = [q_j]_i / √([R(i)]_{j,j}).

  2. Using a consensus algorithm, compute the sum of [s^(3)]_i ≜ [q_j]_i [q_{j−1}]_i (approximation of the inner product ⟨q_j, q_{j−1}⟩), i.e., at each node i after I(s) consensus iterations obtain

        [q]_i = [W^{I(s)}]_{i,1:N} s^(3).    (4.9)

  3. At each node i = 1, 2, . . . , N add a new random number c_i, i.e., globally

     a) l = m,  b) m = m + 1,  c) a_m = (c_1, c_2, . . . , c_N)^⊤.

• The detected number of nodes N is equal to m.

The necessity of waiting for the consensus algorithm to converge before computing the next step (see Steps 1b, 1c, and 2 in DNS–SC) may be considered a drawback. On the other hand, this approach is robust, and if the synchronicity among the nodes is guaranteed, the number of nodes is always obtained exactly (see the simulations in Sec. 4.4.3 ahead). As the local stopping criterion at node i for the algorithm I propose two approaches:

(1) observe the first occurrence of a non-orthogonal vector, i.e., when |[q]_i| ≥ ǫ (denoted as DNS–SC; Alg. 4.2);

(2) locally at each sensor i compute the local factorization error, ‖a_i − q_i R(i)‖ (denoted as DNS–SC–b in the simulations in Sec. 4.4.3).

In the case of DNS–SC–b there is no need to compute q (cf. (4.9)); thus, the overall number of sent messages at each node is reduced from O(I(s)(N−1)(N+2)) to O(I(s)(N−1)(N+1)).
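The role of the consensus accuracy can be seen in a compact global simulation (a hedged sketch: a real deployment would run the consensus by message passing, and the weight matrix W = I − L/(d_max + 1) used here is only a convenient stand-in for Eq. (4.8); the dependence threshold is also a choice of this example). Every inner product of the Gram-Schmidt steps is replaced by the i-th entry of W^{I(s)} applied to the vector of local products, so each node works with its own, slightly perturbed, value; reducing I(s) reproduces the premature-stopping effects discussed in Sec. 4.4.3.

import numpy as np

rng = np.random.default_rng(4)
N, I_s, eps = 10, 1000, 1e-3        # network size, consensus iterations, dependence threshold

# Connected random topology with a path backbone and a doubly stochastic weight matrix.
A = np.triu(rng.random((N, N)) < 0.4, 1)
A[np.arange(N - 1), np.arange(1, N)] = True
A = (A | A.T).astype(float)
deg = A.sum(axis=1)
W = np.eye(N) - (np.diag(deg) - A) / (deg.max() + 1.0)
Wk = np.linalg.matrix_power(W, I_s)        # row i = what node i "sees" after I_s iterations

def avg_at(v, i):
    """Node i's approximation of mean(v) after I_s consensus iterations."""
    return Wk[i] @ v

Q, m = [], 0
while True:
    a = rng.standard_normal(N)             # every node contributes one fresh random value
    q = a.copy()
    for qp in Q:                           # classical Gram-Schmidt, coefficients via consensus
        r = np.array([avg_at(qp * a, i) for i in range(N)])   # node i only ever uses entry i
        q = q - qp * r
    nrm = np.array([avg_at(q * q, i) for i in range(N)])
    if np.all(nrm < eps * np.array([avg_at(a * a, i) for i in range(N)])):
        break                              # new column is (numerically) dependent: rank = N reached
    Q.append(q / np.sqrt(nrm))             # locally normalized column
    m += 1

print("detected N =", m, "  true N =", N)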


Note that, similarly as in [124], the average consensus algorithm in DNS–SC can easily be replaced by any other aggregation algorithm, e.g., an asynchronous push-sum algorithm [55, 119], thus making the algorithm asynchronous.

DNS–DC

To reduce the number of necessary transmissions, a novel orthogonalization algorithm based on the dynamic consensus algorithm was proposed (see Sec. 4.2). Originating from this approach, I present an algorithm for obtaining the number of nodes (denoted as DNS–DC; see Alg. 4.3):

Algorithm 4.3: Distributed Network Size estimation using Dynamic Consensus (DNS–DC)

At each node i initiate two random values (a_{i,1}, a_{i,2}), i.e., from a global point of view the input matrix A = (a1, a2) ∈ R^{N×2} (m = 2).

• Initialization at node i (∀j = 1, 2, . . . , m; k = 0):

  [u_j(0)]_i = [a_j]_i,  [q_j(0)]_i = [a_j]_i,
  [ũ_j(0)]_i = [A ⊙ A]_{i,j},  [q̃_j(0)]_i = [A ⊙ A]_{i,j},
  [p_l(0)]_i = [A ⊛ A]_{i,l},  [q(0)]_i = [a1]_i [a2]_i.

• While k < I(d):

  1. k = k + 1.

  2. Compute locally at each node i (j = 1, 2, . . . , m):

     [p_j(k)]_i = Σ_{j′=1}^{j−1} [p_{j′+(j−1)(j−2)/2}(k−1)]_i [q_{j′}(k−1)]_i / [q̃_{j′}(k−1)]_i,
     [u_j(k)]_i = [a_j]_i − [p_j(k)]_i,
     [q_j(k)]_i = [u_j(k)]_i / √([ũ_j(k−1)]_i).

  3. At each node i = 1, 2, . . . , N store a row of Q(k), i.e., q_i(k) = ([q_1(k)]_i, [q_2(k)]_i, . . . , [q_m(k)]_i).

  4. Aggregate the data for the dynamic consensus:

     [ψ^(1)_j]_i = [ũ_j(k−1)]_i + [u_j(k) ⊙ u_j(k)]_i − [u_j(k−1) ⊙ u_j(k−1)]_i,
     [ψ^(2)_j]_i = [q̃_j(k−1)]_i + [q_j(k) ⊙ q_j(k)]_i − [q_j(k−1) ⊙ q_j(k−1)]_i,
     [ψ^(3)_l]_i = [p_l(k−1)]_i + [Q(k) ⊛ A]_{i,l} − [Q(k−1) ⊛ A]_{i,l},
     [ψ^(4)]_i = [q(k−1)]_i + [q_m(k) ⊙ q_{m−1}(k)]_i − [q_m(k−1) ⊙ q_{m−1}(k−1)]_i,

     compactly, [Ψ(k)]_i = [X(k−1)]_i + [∆S(k)]_i.

  5. Broadcast Ψ(k) = ([Ψ(k)]_1^⊤, [Ψ(k)]_2^⊤, . . . , [Ψ(k)]_N^⊤)^⊤ ∈ R^{N×((m²+3m)/2+1)} to the neighbors, i.e.,

     (Ũ(k), Q̃(k), P(k), q(k)) ≡ X(k) = W Ψ(k).

  6. If (|[q(k)]_i| < ǫ ∧ k > m), then at each node i = 1, 2, . . . , N add a new random number c_i (c = (c_1, c_2, . . . , c_N)^⊤; m = m + 1) and set k = 0, i.e.,

     a_m = c,  ũ_m(0) = c ⊙ c,  q̃_m(0) = c ⊙ c,  u_m(0) = c,  q_m(0) = c,  q(0) = q_m(0) ⊙ q_{m−1}(0).

• The detected number of nodes N is equal to m.


The main modification to the algorithm from Sec. 4.2 is that here there is no need to compute orthonormal vectors (orthogonal vectors are sufficient) and matrix R need not be stored. Also, since the proposed decision rule for adding a new column (a new random value at each node) is given by the inner product of the last two consecutive column vectors qm and qm−1, this has to be computed distributively as well (see ψ(4)). If the absolute value of the inner product |[q(k)]_i| at each node is smaller than a given ǫ, a new value is added and a new vector qm is found again. If the absolute value of the inner product does not drop below ǫ within I(d) consensus iterations, the algorithm stops and the number of nodes is given by the number of stored values m at each node. The number of transmitted messages is dramatically reduced, i.e., each node has to broadcast O(I(d)N) messages (cf. O(I(s)N^2) in DNS–SC).

As mentioned in Sec. 4.3.1, since the factorization error of DNS–DC behaves in the same manner as the orthogonality error, observing the factorization error locally at each node may serve as a stopping criterion for the orthogonality error. In that case, it is necessary to compute the full matrix R, but not the value of q(k) (ψ(4) in Step 4 of DNS–DC). Analogously to DNS–SC–b, I denote this modified algorithm as DNS–DC–b in the simulations in Sec. 4.4.3 ahead.

Note that, as I showed in Sec. 4.3.3, the numerical accuracy of the distributed orthogonalization algorithm highly depends on the condition number of the input matrix and on the network topology. Therefore, when adding random numbers (c_i) at all nodes, there is no guarantee that the input (global) matrix A will not be ill-conditioned, which may lead to a premature stopping of the DNS–DC and an imprecise determination of the network size, an effect which can be observed in the simulations.

4.4.3 Simulations

Throughout the simulations, I use the Metropolis weight matrix W [28, 51]. Note that also for obtaining these weights the knowledge of N is not necessary. Also, I set I ≜ I(s) = I(d).
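The Metropolis construction is indeed purely local; the following function is an illustrative sketch (using the common variant W_ij = 1/(1 + max(d_i, d_j)) for neighboring nodes), in which each pair of neighbors only needs to exchange its degree, so N is not required.

import numpy as np

def metropolis_weights(adj):
    """Metropolis weight matrix for a symmetric 0/1 adjacency matrix (no self-loops)."""
    N = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            if adj[i, j]:
                # each link weight uses only the two incident nodes' degrees
                W[i, j] = W[j, i] = 1.0 / (1.0 + max(deg[i], deg[j]))
    # the remaining mass stays at the node itself
    W[np.arange(N), np.arange(N)] = 1.0 - W.sum(axis=1)
    return W

# Small sanity check on a 4-node path graph.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
W = metropolis_weights(adj)
print(W.sum(axis=0), W.sum(axis=1))   # both all-ones: W is doubly stochastic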

A typical behaviour of the orthogonality error is depicted in Fig. 4.14, where it can be observed that after some iterations the orthogonality error does not decrease anymore. This means that after adding the 9th column, the algorithm cannot find within the last I iterations any vector orthogonal to the previous vectors Q and stops.

I further study the correctness of the detected number of nodes while changing the size of the network from N = 2 to N = 120, for two different topologies – complete (fully connected)9 and geometric (randomly deployed nodes communicating only with the nodes within some radius (WSN)). I consider connected, static networks with synchronous transmissions. In the case of a geometric topology I observe, e.g., in Fig. 4.15c, that with an increasing number of nodes the performance of the algorithms DNS–DC and DNS–DC–b worsens. For a complete topology, the performance depicted in Fig. 4.15a and Fig. 4.15b is independent of the number of nodes (the lines overlap). Algorithms DNS–SC and DNS–SC–b perform well in all cases. It can also be observed that the stopping criterion ǫ influences the algorithm precision.

9 A complete topology is simulated only to allow for comparisons. Naturally, it is straightforward to obtain the number of nodes in a complete topology just by counting the received messages at every node.


[Figure 4.14: Example of the evolution of the orthogonality error log10 ‖N·I − Q(k)⊤Q(k)‖2 over the iterations (DNS–DC) in a geometric network with average degree d = 4, N = 8, ǫ = 0.0001; the markers 2–8 indicate the added columns and I(d) the per-column iteration window.]

The difference between the two approaches is caused by the usage of the dynamic consensus algorithm and, as mentioned in Sec. 4.4.2, by its undesirable numerical properties. In Fig. 4.15 I compare the proposed algorithms with a state-of-the-art probabilistic algorithm [126], which uses a max-consensus, with an overall number of transmitted numbers O(M·Im). Parameters M and γ are defined in [126] and influence the maximum likelihood estimation. Parameter Im is the number of max-consensus iterations and may differ from I (see [126]).

Furthermore, in Tab. 4.2 and Tab. 4.3 I show the dependence on the number I of iterations of the consensus algorithms. The shown error is a relative, average detection error over 12 independent initializations for the given ranges of the number of nodes (N ∈ (0, 40), N ∈ (41, 80), N ∈ (81, 120)), defined as E_init{|N̂ − N|/N}, where N̂ denotes the detected number of nodes. It can be observed from Tab. 4.2 that for a complete topology the correct number is obtained for very few consensus steps I. On the other hand, in the case of a geometric topology, Tab. 4.3, even after many consensus iterations the relative detection error decreases only very slowly or remains constant. In both cases the best performance is achieved for N < 40, as can also be seen from Fig. 4.15a–Fig. 4.15f. The average relative error of Terelius' algorithm [126] remains almost constant (∼ 8% for the selected parameters M = 100, γ = 0).

Observe that DNS–SC and DNS–SC–b always perform better than DNS–DC and DNS–DC–b, but at the cost of higher communication. The estimation error in the case of DNS–DC and DNS–DC–b is caused by the instability of the algorithm, since by adding random numbers there is no guarantee that the global factorized matrix has appropriate numerical properties.


[Figure 4.15: Detected number of nodes vs. true number of nodes; 95% confidence intervals based on 12 runs; curves: Terelius [126], DNS–SC, DNS–SC–b, DNS–DC, DNS–DC–b, true N. Panels: (a) complete topology, ǫ = 0.001, I = 5; (b) complete topology, ǫ = 0.0001, I = 5; (c) geometric topology, ǫ = 0.0001, I = 100; (d) geometric topology, ǫ = 0.0001, I = 1000; (e) geometric topology, ǫ = 0.00001, I = 100; (f) geometric topology, ǫ = 0.00001, I = 1000. For the algorithm of [126] (Terelius): M = 100, γ = 0, Im = I.]


                 N \ I  |      5       20       80      100
DNS–SC          10–40   |      0        0        0        0
                41–80   |      0        0        0        0
                81–120  |      0        0        0        0
DNS–SC–b        10–40   |      0        0        0        0
                41–80   |      0        0        0        0
                81–120  |      0        0        0        0
DNS–DC          10–40   | 0.0142   0.0142   0.0345   0.0313
                41–80   | 0.0017   0.0017   0.0016   0.0015
                81–120  | 0.0013   0.0013   0.0013   0.0018
DNS–DC–b        10–40   | 0.0412   0.0412   0.0187   0.0327
                41–80   | 0.0030   0.0030   0.0026   0.0032
                81–120  | 0.0035   0.0035   0.0035   0.0036
Terelius [126]  10–40   | 0.0848   0.0839   0.0790   0.083
(M=100, γ=0,    41–80   | 0.0830   0.0783   0.0819   0.08
Im=I)           81–120  | 0.0802   0.0854   0.0817   0.0833

Table 4.2: Relative estimation error. Complete topology, ǫ = 0.0001.

                 N \ I  |    100      500    1 000    2 000
DNS–SC          10–40   | 0.0371   0.0145   0.0245   0.0075
                41–80   |      0        0        0        0
                81–120  |      0        0        0        0
DNS–SC–b        10–40   | 0.0099        0        0        0
                41–80   |      0        0        0        0
                81–120  |      0        0        0        0
DNS–DC          10–40   | 0.0386   0.0188   0.0253   0.0184
                41–80   | 0.1559   0.1494   0.1364   0.1188
                81–120  | 0.3226   0.3115   0.3005   0.2908
DNS–DC–b        10–40   | 0.0183   0.0038   0.0025   0.0025
                41–80   | 0.1610   0.1495   0.1322   0.1122
                81–120  | 0.3478   0.3266   0.3373   0.3085
Terelius [126]  10–40   | 0.0966   0.0834   0.0828   0.0800
(M=100, γ=0,    41–80   | 0.0806   0.0824   0.0847   0.0787
Im=I)           81–120  | 0.0825   0.0868   0.0840   0.0872

Table 4.3: Relative estimation error. Geometric topology, ǫ = 0.0001.


4.5 Conclusion

In the following table I summarize the advantages and disadvantages of the algorithms proposed in this chapter.

DC–CGS
  Advantages:
  • Small number of transmitted messages required.
  • The whole matrices Q and R are computed simultaneously ⇒ local stopping condition.
  • Robust to link failures.
  Disadvantages:
  • High message payload in comparison to other algorithms.
  • Sensitive to the network topology and synchronicity.
  • Sensitive to the condition number of the input matrix.

DNS–SC
  Advantages:
  • No a-priori knowledge needed at the nodes.
  • Number of nodes in a network estimated more precisely than by DNS–DC and DNS–DC–b.
  Disadvantages:
  • Higher communication cost than DNS–DC and DNS–DC–b.
  • Based on the “static” average consensus ⇒ necessity to wait for the consensus to converge before computing the next column vector (no intermediate result).
  • Suitable only for static networks.

DNS–SC–b
  Advantages:
  • Slightly lower communication cost than DNS–SC → stopping condition based on the factorization error.
  Disadvantages:
  • Necessity to compute matrix R.

DNS–DC
  Advantages:
  • No a-priori knowledge needed at the nodes.
  • Based on the dynamic average consensus algorithm ⇒ intermediate result.
  • Low number of transmitted messages.
  Disadvantages:
  • Number of nodes in a network estimated less precisely than by DNS–SC and DNS–SC–b.
  • Higher message payload than in DNS–SC and DNS–SC–b.
  • Suitable only for static networks.

DNS–DC–b
  Advantages:
  • Slightly lower communication cost than DNS–DC → stopping condition based on the factorization error.
  Disadvantages:
  • Necessity to compute matrix R.

As I showed, the dynamic consensus algorithm used in the DC–CGS brings many advantages, but also disadvantages. The simultaneous computation of all the elements of the matrices Q and R makes it possible to transmit a very low number of messages (with a high payload) and to evaluate the orthogonality and factorization error at any time step. Consequently, this leads to a simple local stopping condition. On the other hand, it also brings additional sensitivity issues


to the Gram-Schmidt orthogonalization method. In particular, the influence of the network topology and of the data distribution on the algorithm performance is crucial. Therefore, a more robust version of the algorithm, possibly not originating from the classical Gram-Schmidt method, or using a different dynamic consensus algorithm, would be of interest. The influence of the condition number on the factorization error also needs to be investigated further, and the orthogonalization of complex-valued vectors remains a topic for further investigation.

The algorithms for finding the number of nodes in a network – DNS–SC and DNS–DC (and their “–b” versions) – present a straightforward application of the distributed orthogonalization methods. Although they require no knowledge about the network, and each node can locally evaluate the estimated number of nodes, the methods are not suitable for networks with a time-varying number of nodes. Such a method would have to be capable of “tracking” the changing dimension of the linear vector space. The solution to this problem remains an open issue.


Chapter 5

Conclusions and Outlook

In this thesis I focused on the convergence analysis of consensus algorithms and their application in other, more sophisticated distributed algorithms. Although consensus algorithms have been studied extensively in the literature, with emerging new technologies and the spread of devices with large distributed computational power they have regained wide research interest. In the past few years, an increased effort in the design and analysis of distributed algorithms can be observed in many different areas – from control theory and computer science to signal processing, economics, or even social studies. In my thesis I answered some previously open questions and provided novel insight into various problems regarding distributed algorithms and, in particular, distributed consensus algorithms.

Many previous studies have proposed different methods for finding optimal mixing weights to reach the fastest convergence, estimating bounds on the convergence speed, or finding optimal quantizations. They have achieved this typically by referring to results from spectral graph theory. Optimization and analysis of more complicated distributed algorithms in a general framework have been barely investigated. Nevertheless, the need for a “language” which could help to optimize distributed algorithms from the hardware point of view (e.g., similarly to VHDL [127]) has been substantial. That was also my original motivation for creating the framework presented in Chapter 2. The formulation of a quantized distributed algorithm in this formalism led to novel bounds on the quantization error. Nevertheless, further implications and questions regarding the practicality of the framework remain open.

In distributed consensus algorithms, it is typical (and necessary) to consider synchronously working nodes. However, in practical applications this may be a substantial issue. Since I focused mostly on the analysis of consensus algorithms considering real implementation issues, I also looked into consensus algorithms with asynchronous data exchange. While the results from classical spectral graph theory are not directly applicable here, I proposed a very simple notion of state transitions. Having each transmission described, from the global point of view, by a simple state-transition matrix, I looked at the convergence conditions of iterations defined by such matrices. Essentially, this resulted in necessary vector-space conditions on the convergence of any asynchronous consensus algorithm. Yet, having only a necessary condition, I found further conditions on the mixing parameters of such an asynchronous algorithm.


With the help of the relaxed projection mapping algorithm, I successfully solved this question by interpreting the state-transition matrices as projections onto certain vector spaces. Simple bounds on the mixing parameters then followed straightforwardly. Nevertheless, the convergence of asynchronous distributed consensus algorithms depends on many factors, e.g., the order and probability in which the transmissions occur, and therefore sufficient conditions for the convergence are non-trivial to find and are not discussed in this thesis.

In Chapter 2, I also defined and briefly analyzed the dynamic consensus algorithm. Following a known proof of convergence of a continuous dynamic average consensus algorithm, I presented a proof of a discrete version in terms of the Z-transform. For this algorithm I also provided novel bounds on the convergence time and rate. Moreover, a generalized version of the dynamic consensus algorithm has also been proposed, for which, however, a proper convergence analysis still needs to be carried out.

Despite their simple behaviour, in the subsequent chapters I showed that consensus algorithms are well suited for designing distributed algorithms which require learning some common global knowledge. Basically, any algorithm which contains a summation or averaging of some values can be made distributed using a consensus algorithm. However, a straightforward usage of the average consensus algorithm may not lead to optimal distributed algorithms. For example, by employing the dynamic consensus algorithm instead of the classical consensus algorithm, the resulting distributed algorithm may gain more favorable features. A designer of distributed algorithms should keep this in mind.

Therefore, in Chapter 3, I first looked at the application of the classical average consensus algorithm and later extended the algorithm by using the dynamic consensus algorithm. Utilizing these consensus algorithms, a novel method for the distributed computation of a so-called joint likelihood function was found. This later served as a building block for a novel distributed target tracking algorithm which outperforms, in many regards, state-of-the-art distributed target tracking algorithms.

In Chapter 4, I proposed yet another original algorithm applying a dynamic consensus algorithm. By rewriting the well-known classical Gram-Schmidt orthogonalization method into a form of iteration-dependent quantities, I overcame the biggest drawback of previously known orthogonalization algorithms – the necessity to wait until the simple consensus algorithm converges. As preferred, the whole orthonormal matrix Q and the upper-triangular matrix R are iteratively refined with each consensus iteration. This further led to a low number of transmitted messages, which is highly preferable in distributed algorithms with limited communication capability. Moreover, the algorithms for estimating the size of a network are a straightforward application of the proposed distributed orthogonalization algorithm. Nevertheless, although the dynamic consensus algorithm brought many advantages, in other aspects the performance of the orthogonalization algorithm was worse than that of previously known algorithms.

In the next section, I summarize ideas and open issues for future research which have arisen during my work.


5.1 Outlook and future research

As mentioned, the analysis and development of new distributed algorithms has become even more demanding in the past few years. Therefore, besides the open issues that I already mentioned in the conclusions of each chapter, here I propose a few further open and challenging problems for future research:

• Implementations of different consensus algorithms on various hardware platforms, and their comparisons with other methods (e.g., gossip-based algorithms, consensus propagation), have been very rare. Such comparisons could, however, objectively answer important questions about which distributed agreement algorithm to use in which scenario.

• An implementation of the proposed algorithms in a physical WSN (Chapter 3 and Chapter 4) and a comparison with the simulated results would also be beneficial.

• Although there exist methods for the optimal selection of the mixing weight matrix in the classical average consensus algorithm, all known weights require some knowledge about the network (number of nodes, number of neighbors, etc.). A weight selection that would not require any network knowledge, possibly based only on the received data, is a challenging problem.

• Further analysis and optimal selection of the parameters in the case of the generalized dynamic consensus algorithm is also of future interest.

• In the case of the dynamic average consensus algorithm, an asynchronous version would be very desirable, since it could lead to many new interesting algorithms. The comparison with the so-called diffusion algorithms [128] is also a topic for future research.

• Currently, there are almost no consensus algorithms derived from the theory of so-called fractional order differential equations [129]. This theory should, however, make it possible to design asynchronous consensus algorithms in a straightforward manner. Research in this direction is, nevertheless, challenging.

• Incorporating asynchronous consensus algorithms (or possibly the diffusion algorithms) into the sequential likelihood consensus algorithm and the DC–CGS algorithm, introduced in this thesis, is also of future interest.

• A more robust version of the distributed orthogonalization algorithm, probably not based on the classical Gram-Schmidt orthogonalization, which would be less sensitive to the properties of the input matrix, is also an unsolved question.

• A transfer of the proposed distributed Gram-Schmidt orthogonalization onto a so-called Networks-on-Chip platform [130] could possibly lead to new interesting and practical applications of the orthogonalization algorithm.


Appendices


Appendix A

Diffusion and Average Consensus

In this chapter I show that the average consensus algorithm physically represents a diffusion process.

The differential equation of diffusion (heat equation) on a 2D lattice is known to be [131]

∂φ(x, y; t)/∂t = D ∇²_{x,y} φ(x, y; t),    (A.1)

where D is the diffusion coefficient and the Laplacian ∇²_{x,y} = ∂²/∂x² + ∂²/∂y².

After discretizing the differential equation in time and space, assuming a 2D orthogonal lattice (x_i = iδ, y_j = jδ; i, j ∈ Z), i.e.,

∂φ(x, y; t)/∂x ≈ (φ(x + δ, y; t) − φ(x, y; t))/δ,
∂²φ(x, y; t)/∂x² ≈ (φ((i+1)δ, jδ; t) − 2φ(iδ, jδ; t) + φ((i−1)δ, jδ; t))/δ²,
∂φ(x, y; t)/∂t |_{t=kǫ} = (φ(x, y; (k+1)ǫ) − φ(x, y; kǫ))/ǫ,

having δ = 1, ǫ = 1 and denoting φ(i, j; k) = φ_{i,j}(k), Eq. (A.1) can be rewritten as

φ_{i,j}(k+1) = φ_{i,j}(k) + (ǫD/δ²) [φ_{i+1,j}(k) − 2φ_{i,j}(k) + φ_{i−1,j}(k) + φ_{i,j+1}(k) − 2φ_{i,j}(k) + φ_{i,j−1}(k)],

where the constant ǫD/δ² is denoted by ε (the update uses the 5-point stencil of φ_{i,j} with its lattice neighbors φ_{i±1,j} and φ_{i,j±1}), and stacking all node indices i, j → n leads to

φ_n(k+1) = φ_n(k) + ε Σ_{m∈N_n} (φ_m(k) − φ_n(k))


and finally, from a global (network) point of view, it can be found that

Φ(k+1) = Φ(k) + ε (AΦ(k) − DΦ(k)) = Φ(k) − ε (D − A) Φ(k) = (I − ηL) Φ(k) ≜ W Φ(k),

thus obtaining the equation of the average consensus (cf. Eq. (2.3)) with W as in Eq. (2.7):

Φ(k+1) = W Φ(k).
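This equivalence is easy to check numerically (an illustrative sketch on an arbitrary 1-D path lattice with η = 0.25; not part of the thesis simulations): one explicit finite-difference step of the discretized heat equation coincides with one multiplication by W = I − ηL.

import numpy as np

# 1-D lattice (path graph) with unit spacing; phi holds the node "temperatures".
n = 50
phi = np.zeros(n)
phi[n // 2] = 1.0                     # initial heat spike in the middle

# Graph Laplacian of the path and the consensus matrix W = I - eta*L.
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(axis=1)) - A
eta = 0.25                            # plays the role of the combined constant eps*D/delta^2
W = np.eye(n) - eta * L

# Explicit finite-difference diffusion step (interior nodes), ends handled separately.
diff_step = phi + eta * (np.roll(phi, 1) - 2 * phi + np.roll(phi, -1))
diff_step[0]  = phi[0]  + eta * (phi[1] - phi[0])
diff_step[-1] = phi[-1] + eta * (phi[-2] - phi[-1])

cons_step = W @ phi                   # one average-consensus iteration
print("max difference between the two updates:", np.abs(diff_step - cons_step).max())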


Appendix B

Spectral Norm

Since throughout this thesis I use many different norms (vector/matrix), I provide here a brief review of the theory of norms. Note that especially the spectral norm (Sec. B.2) is a crucial term in the convergence analysis, due to its connection with the speed of convergence of a distributed average consensus algorithm.

B.1 Norm

Let S (C^n) be an n-dimensional (linear) vector space with column vectors x ∈ S (∈ C^n). Then a function p ≜ ‖ · ‖ : S → R is called a norm if it has the following properties:

1. ‖x‖ ≥ 0 with ‖x‖ = 0 iff x = 0 (positive definiteness)

2. ‖αx‖ = |α|‖x‖ (scalability)

3. ‖x+ y‖ ≤ ‖x‖+ ‖y‖ (triangle inequality)

Examples:

• l_p norm: ‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p}

• l_∞ norm: ‖x‖_∞ = max_{i=1,2,...,n} |x_i|

• L_p norm: ‖x(t)‖_p = (∫_a^b |x(t)|^p dt)^{1/p} for 1 ≤ p < ∞

• L_∞ norm: ‖x(t)‖_∞ = sup_{t∈[a,b]} |x(t)|

B.1.1 Induced norm

If a norm is induced (defined/described) by a function, the norm is called an induced norm (by the function; usually an inner product).

Examples:

• A norm induced by an inner product: ‖x‖_2 = √⟨x|x⟩


  ⇒ l_2 is an induced norm (l_p for p ≠ 2 is not an induced norm).

• A norm of a matrix can be induced as ‖A‖ = max_{‖x‖≤1} ‖Ax‖

• A norm induced by a weighted inner product: ‖x − y‖²_W = (x − y)^H W (x − y)

B.1.2 Matrix norm

For a matrix A ∈ S^{m×n} a matrix norm is an operation with the following properties (the same as for a vector norm):

1. ‖A‖ ≥ 0, ‖A‖ = 0 iff A = 0

2. ‖αA‖ = |α|‖A‖

3. ‖A+B‖ ≤ ‖A‖+ ‖B‖

Moreover, if m = n and the norm also satisfies the property that

‖AB‖ ≤ ‖A‖‖B‖,

then the norm is called a sub-multiplicative norm. General conditions on norms to be sub-multiplicative can be found in [132].

The matrix norm of a matrix A is then a measure of how the changes in a vector x are “magnified” by the matrix A (‖Ax‖).

Examples:

• Let us define a norm ‖A‖_∗ = max |A_ij|; then for A = B = 11^⊤ (matrix of all ones), ‖AB‖_∗ = N > ‖A‖_∗‖B‖_∗ = 1, thus this norm is not a sub-multiplicative norm.

• ‖A‖_p = ‖A‖_{p,induced} = sup_{x≠0} ‖Ax‖_p/‖x‖_p = sup_{x≠0} ‖A (x/‖x‖_p)‖_p = sup_{‖x‖_p=1} ‖Ax‖_p

• ‖A‖_{1,induced} = sup_{‖x‖_1=1} ‖Ax‖_1

• ‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^m |A_ij| ⇒ ‖A‖_1 ≠ ‖A‖_{1,induced}!

• ‖A‖_∞ = max_{1≤i≤m} Σ_{j=1}^n |A_ij| (maximum of row sums)

• Frobenius norm: ‖A‖_F = (Σ_{i=1}^m Σ_{j=1}^n |A_ij|²)^{1/2} = √(trace(A^H A)) = √(σ_1² + σ_2² + · · · + σ²_{min{m,n}})

• Spectral norm: if p = 2 and m = n, the induced matrix norm is called the spectral norm, i.e., the largest singular value of A (the square root of the largest eigenvalue of A^H A)


B.2 Spectral norm

The spectral norm of a matrix A is defined as

‖A‖_2 = sup_{x≠0} ‖Ax‖_2/‖x‖_2 = sup_{‖x‖_2=1} ‖Ax‖_2 = √(ρ(A^H A)) = √(max_i |λ_i(A^H A)|).

Proof. Having an induced norm (inner product), and for simplicity computing ‖A‖_2² (the square root will be introduced at the end), i.e.,

‖A‖_2² = sup_{‖x‖_2=1} ‖Ax‖_2² = sup_{‖x‖_2=1} ⟨Ax|Ax⟩ = sup_{‖x‖_2=1} x^H (A^H A) x,   with B ≜ A^H A.

For the symmetric B, it is known that the eigenvalues must be real and non-negative, and the eigenvectors are orthogonal. Denote the eigenvalues of B by λ_i and the eigenvectors by v_i, i.e., B v_i = λ_i v_i ⇒ v_i^H B v_i = v_i^H λ_i v_i = λ_i v_i^H v_i = λ_i.

Further, rewrite x in the orthogonal basis of vectors v_i (matrix V) ⇒ x = Vy, with some coordinates y_i. It is clear that to have ‖x‖_2 = 1, also ‖y‖_2 = 1. This is because V is orthogonal (1 = ‖x‖_2² = ⟨x|x⟩ = ⟨Vy|Vy⟩ = y^H V^H V y = ‖y‖_2², since V^H V = I).

Therefore,

‖A‖_2² = max_{‖x‖_2=1} x^H A^H A x = max_{‖x‖_2=1} x^H B x = max_{y; ‖y‖_2=1} y^H Λ y,

i.e., it is necessary to find the vector y which maximizes the quadratic form:

max_y y^H Λ y = max_y Σ_i λ_i y_i² ≤ λ_max max_y Σ_i y_i² = λ_max (= λ_1),   since Σ_i y_i² = 1,

i.e., y_max = (1, 0, . . . , 0)^⊤. Thus,

‖A‖_2 = √(max_i |λ_i(A^H A)|) ≜ √(ρ(A^H A)),

where λ_i are the eigenvalues of A^H A.

In case A^H = A, ‖A‖_2 = √(ρ(A)ρ(A)) = ρ(A), where ρ(A) is the so-called spectral radius of matrix A.
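A quick numerical confirmation of these identities (an illustrative sketch; the random-sampling bound only approaches the supremum from below):

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))

sigma_max = np.linalg.svd(A, compute_uv=False)[0]       # largest singular value
lam_max = np.max(np.abs(np.linalg.eigvals(A.T @ A)))    # largest eigenvalue of A^H A
induced = max(np.linalg.norm(A @ (v / np.linalg.norm(v)))
              for v in rng.standard_normal((5000, 4)))  # sup over (random) unit vectors

print("largest singular value       :", sigma_max)
print("sqrt(max eigenvalue of A^T A):", np.sqrt(lam_max))
print("max over random unit x       :", induced, "(lower bound)")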


Appendix C

Perron-Frobenius Theorem

In this chapter I review the Perron-Frobenius theorem for non-negative matrices, which serves as a basis for the classical proof of convergence of average consensus algorithms and provides some insight into the properties of the weight matrices used in average consensus algorithms (see Sec. 2.1). The first results concerning positive matrices were due to Oskar Perron [133] and were later extended by Ferdinand Georg Frobenius [134] to non-negative irreducible matrices. Although the Perron-Frobenius theorem applies to any non-negative matrices, for the sake of this thesis it is extremely intriguing to reveal the connection between the general theory of matrices (Perron-Frobenius theorem) and the graphs corresponding to these matrices. Thus, perhaps surprisingly, from properties of matrices a lot can also be concluded about properties of graphs (networks).

A positive matrix W is a matrix with all elements larger than zero, i.e., [W]_ij > 0, ∀i, j, or shortly W > 0. Non-negative matrices may further contain also zero elements, i.e., [W]_ij ≥ 0, ∀i, j; shortly W ≥ 0. Here, I always consider a real square matrix W ∈ R^{N×N}.

A permutation matrix P is a square matrix which contains exactly one element 1 in each row and each column [62]. Such a matrix is clearly orthogonal, i.e., P^⊤ = P^{−1}.

An irreducible matrix W is a matrix for which there is no permutation matrix P permuting the coordinates into a block upper-triangular form, i.e., PWP^{−1} ≠ (W1 W2; 0 W3) [45, 47, 62, 135]. This, in turn, means that for a matrix W ≥ 0, for every i, j there exists some k > 0 (possibly depending on i, j) such that [W^k]_ij > 0. Considering W to be a (weighted) adjacency matrix, from graph theory it is known that this is true if and only if there is a path of length k in the graph from node i to node j (the graph is strongly connected) [47, 136].

The period of an irreducible matrix W is the greatest common divisor of the integers k for which [W^k]_ii > 0 (independent of i), that is, the greatest common divisor of the lengths of the loops in the corresponding graph (the number of edges along a path from node i back to itself).

A primitive matrix W is a square non-negative matrix some power of which is strictly positive, i.e., for W ≥ 0 there is some k for which W^k > 0 (for all i, j). The difference between a primitive and an irreducible matrix is that every primitive matrix is irreducible and every irreducible matrix of period 1 (aperiodic) is primitive. There exist also other equivalent conditions for checking whether a matrix is irreducible and primitive [62, 137].


Example 1: Matrix W1 = (1 0; 1 1) is non-negative, but reducible, since there is a P = (0 1; 1 0), or, it is clear that W1^k = (1 0; k 1) (i.e., [W1^k]_{1,2} = 0 for any k). Also, the corresponding graph is not strongly connected (see Fig. C.1a).

On the other side, matrix W̃1 = (1 1; 1 0) is irreducible, since, e.g., W̃1^2 = (2 1; 1 1) (i.e., [W̃1^k]_{i,j} > 0 for all i, j at some time k). Also, the corresponding graph is strongly connected. Therefore, since for k = 2, W̃1^2 > 0 for all i, j, the matrix W̃1 is a primitive matrix (see Fig. C.1b).

[Figure C.1: Example 1. (a) G(W1) is not strongly connected, thus W1 is reducible (there is no path from Node 1 to Node 2). (b) G(W̃1) is strongly connected, thus W̃1 is irreducible and also primitive (Node 1 can reach itself in one hop, Node 2 in two hops; the period of the irreducible matrix is 1, therefore it is primitive).]

Example 2: Matrix W2 = (0 1 0; 1 0 1; 0 1 0) is irreducible, since the corresponding graph is strongly connected (see Fig. C.2a). Or, it can be checked that W2^2 = (1 0 1; 0 2 0; 1 0 1); thus, if k1 = 1, [W2^{k1}]_{i,j} > 0 for (i, j) = (1, 2), (2, 1), (2, 3), (3, 2), and if k2 = 2, [W2^{k2}]_{i,j} > 0 for (i, j) = (1, 1), (1, 3), (2, 2), (3, 1), (3, 3), i.e., (W2^{k1} + W2^{k2}) > 0, ∀i, j. However, since k1 ≠ k2, and also from Fig. C.2a (the greatest common divisor of the lengths of the loops is 2), it can be concluded that the matrix W2 is not primitive.

However, if one self-loop is added, i.e., W̃2 = (1 1 0; 1 0 1; 0 1 0), and one takes k = 4, W̃2^4 = (6 4 3; 4 5 1; 3 1 2), it can be found that W̃2^4 > 0 and the matrix is thus primitive (also the greatest common divisor of the lengths of the loops is 1; see Fig. C.2b).

[Figure C.2: Example 2. (a) G(W2) is strongly connected, thus W2 is irreducible; each node reaches itself only in 2 hops (the period of W2 is 2), thus the matrix is not primitive. (b) G(W̃2) is primitive, since Node 1 reaches itself in one hop and Nodes 2 and 3 in two hops; the period is therefore 1, and the irreducible matrix is also primitive.]


Theorem 3.10 (Perron-Frobenius theorem, [47]). Let W ≥ 0 be an irreducible matrix. Then there is a (unique) positive real number λ0 with the following properties:

(i) There is a real vector u > 0 with Wu = λ0 u.

(ii) λ0 has geometric and algebraic multiplicity 1 (λ0 is geometrically and algebraically simple).

(iii) λ0 is maximal in modulus among all eigenvalues of W, i.e., for each eigenvalue λj of W it holds that |λj| ≤ λ0. If W is primitive, then |λj| < λ0 for all λj ≠ λ0. In general, if W has period d, then W has precisely d eigenvalues λj with |λj| = λ0, namely λj = λ0 e^{2πij/d}, j = 0, 1, . . . , d − 1. In fact, the entire spectrum of W is invariant under rotation of the complex plane over an angle 2π/d about the origin.

(iv) Any nonnegative left or right eigenvector of W has eigenvalue λ0. Suppose t ∈ R and x ≥ 0, x ≠ 0. If Wx ≤ tx, then x > 0 and t ≥ λ0; moreover, t = λ0 if and only if Wx = tx. If Wx ≥ tx, then t ≤ λ0; moreover, t = λ0 if and only if Wx = tx.

(v) If 0 ≤ S ≤ W or if S is a principal minor of W, and S has eigenvalue σ, then |σ| ≤ λ0; if |σ| = λ0, then S = W.

(vi) Given a complex matrix S, let |S| denote the matrix with elements [|S|]_ij = |s_ij|. If |S| ≤ W and S has eigenvalue σ, then |σ| ≤ λ0. If equality holds, then |S| = W, and there are a diagonal matrix E with diagonal entries of absolute value 1 and a constant c of absolute value 1 such that S = cEWE^{−1}.

Proof. Since the proof is non-trivial and lengthy, I refer the reader to the following references: standard proofs are provided, for example, in [45, 47, 138–141]. A simple proof for the case of positive symmetric matrices was provided in [142]. A proof using a variant of the so-called inverse power method is provided in [143]. A geometric proof based on the Brouwer fixed-point theorem can be found in [144]. Last but not least, MacCluer [145] provides an overview of the proofs.

Note that if I compare the eigenvalues of the examples, i.e.,

λ(W1) = (1, 1)   (since W1 is not irreducible, the theorem does not hold),    λ(W̃1) = (1.618, −0.618),
λ(W2) = (1.4142, 0, −1.4142),    λ(W̃2) = (1.8019, 0.445, −1.247),

it can be observed that the theorem holds (e.g., for W2: property (iii) for the non-primitive case with period 2; λj = √2 e^{2πij/2} = ±√2).
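These numbers are easy to reproduce (an illustrative sketch using the example matrices from above; the primitivity test relies on the fact that a primitive N×N matrix has a strictly positive N²-th power, while a non-primitive one never becomes all-positive):

import numpy as np

W1  = np.array([[1, 0], [1, 1]])                      # reducible
W1t = np.array([[1, 1], [1, 0]])                      # irreducible and primitive
W2  = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])     # irreducible, period 2
W2t = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 0]])     # primitive (self-loop at Node 1)

for name, M in [("W1", W1), ("W1~", W1t), ("W2", W2), ("W2~", W2t)]:
    lam = np.linalg.eigvals(M)
    k = M.shape[0] ** 2                               # (N-1)^2 + 1 <= N^2 suffices
    primitive = np.all(np.linalg.matrix_power(M, k) > 0)
    print(f"{name}: eigenvalues = {np.round(lam, 4)}, M^{k} > 0 (primitive): {primitive}")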


Appendix D

Particle Filter

Particle filtering is a Monte Carlo approximation of optimal sequential Bayesian state estimation (see Sec. 3.1) [16, 98, 104, 146]. The main task of sequential Bayesian estimation is to estimate the state x_k recursively from the observations z_k. In general, there are three probability density functions of interest:

1. state-transition pdf f(x_k|x_{k−1}) – conditional pdf of the current state given the previous state. This pdf models the assumed behaviour of the system.

2. likelihood pdf f(z_k|x_k) – conditional pdf of the current measurement given the state. This pdf describes the assumed measurement model (see Sec. 3.1).

3. posterior pdf f(x_k|z_{1:k}) – conditional pdf of the current state given all past and present measurements. From f(x_k|z_{1:k}) one can calculate various estimates of some function F(x_k) given z_{1:k}, e.g., the MMSE estimator of F(x_k) = x_k is obtained as

   x̂_k^{MMSE} ≜ E{x_k|z_{1:k}} = ∫ x_k f(x_k|z_{1:k}) dx_k   (k = 1, 2, . . . ).    (D.1)

   This measure is typically unknown and needs to be calculated.

As mentioned in Sec. 3.1, the current posterior pdf can be calculated from the previous posterior f(x_{k−1}|z_{1:k−1}). This recursion consists of two steps: prediction and update.

In the prediction step, the “predicted” posterior f(x_k|z_{1:k−1}) is calculated from the previous posterior f(x_{k−1}|z_{1:k−1}) using the state-transition pdf f(x_k|x_{k−1}) according to

f(x_k|z_{1:k−1}) = ∫ f(x_k|x_{k−1}) f(x_{k−1}|z_{1:k−1}) dx_{k−1}.    (D.2)

In the update step, the predicted posterior pdf f(x_k|z_{1:k−1}) is “updated” by the current measurement to the current posterior pdf f(x_k|z_{1:k}) according to

f(x_k|z_{1:k}) = f(z_k|x_k) f(x_k|z_{1:k−1}) / f(z_k|z_{1:k−1}),    (D.3)

where the normalizing constant is f(z_k|z_{1:k−1}) = ∫ f(z_k|x_k) f(x_k|z_{1:k−1}) dx_k.

Note that the recursion Eq. (D.3) is initialized by the prior pdf f(x_k|z_{1:k})|_{k=0} = f(x_0).


Typically, it is computationally infeasible to directly calculate Eqs. (D.1)–(D.3). The exception is the case of linear state-transition and measurement models with Gaussian noises and a Gaussian prior pdf. In such a case the Kalman filter directly updates the mean and the covariance of these pdfs in a recursive manner, and the mean directly yields the MMSE estimate [104].

Particle filter algorithm

For the general, nonlinear/non-Gaussian case, the so-called Particle Filter (PF) provides a solution to the problem. The key idea behind the PF is to approximate the posterior pdf by a sequence of J discrete weighted Dirac delta functions (i.e., a Monte Carlo representation) [104, 146], i.e.,

f_δ(x_k|z_{1:k}) ≜ Σ_{j=1}^J w_k^{(j)} δ(x_k − x_k^{(j)}).    (D.4)

With such an approximation, computations of expectations (which involve complicated integrals) are simplified to summations, e.g.,

x̂_k^{MMSE} ≈ x̂_k ≜ ∫ x_k f_δ(x_k|z_{1:k}) dx_k = Σ_{j=1}^J w_k^{(j)} ∫ x_k δ(x_k − x_k^{(j)}) dx_k = Σ_{j=1}^J w_k^{(j)} x_k^{(j)}.    (D.5)

The problem then remains in finding the appropriate “particles” x_k^{(j)}, which are mostly located in regions with significant probability mass, and the associated weights w_k^{(j)}. Essentially, the PF performs a recursive updating of the previous particles x_{k−1}^{(j)} and weights w_{k−1}^{(j)} using the current measurements z_k. More specifically, the particle representation {(x_k^{(j)}, w_k^{(j)})}_{j=1}^J is used to approximate the prediction step (D.2) and the update step (D.3). This is obtained by means of sequential importance sampling.

The principle of importance sampling is the following. Since the posterior pdf f(x_k|z_{1:k}) is unknown, it is impossible to generate a sample x_k^{(j)} from it. Nevertheless, suppose that another (proposal) pdf q(x_k|z_{1:k}) is known, which is different from f(x_k|z_{1:k}), but from which it is possible to generate samples x_k^{(j)} (having the current measurement z_k), such that

x_k^{(j)} ∼ q(x_k|x_{k−1}^{(j)}, z_{1:k}) = q(x_k|x_{k−1}, z_{1:k})|_{x_{k−1}=x_{k−1}^{(j)}}.

To “fit” the original pdf f(x_k|z_{1:k}) at the “points” x_k^{(j)}, the weights w_k^{(j)} need to be found such that w_k^{(j)} ∝ f(x_k^{(j)}|z_{1:k}) / q(x_k^{(j)}|z_{1:k}). Observe that the pair {(x_k^{(j)}, w_k^{(j)})}_{j=1}^J represents the posterior pdf f(x_k|z_{1:k}). It can be derived¹ that this same pair represents also the marginalized joint posterior pdf f(x_k, x_{k−1}|z_{1:k}) (marginalization with respect to x_{k−1}; notice that the weight w_k^{(j)} remains the same, i.e., w_k^{(j)} ∝ f(x_k^{(j)}|z_{1:k}) / q(x_k^{(j)}|z_{1:k}) ∝ f(x_k^{(j)}, x_{k−1}^{(j)}|z_{1:k}) / q(x_k^{(j)}, x_{k−1}^{(j)}|z_{1:k})).

¹ Having f(x_k, x_{k−1}|z_{1:k}) represented by {((x_k^{(j)}, x_{k−1}^{(j)}), w_k^{(j)})}_{j=1}^J, it follows that:

∫ f(x_k, x_{k−1}|z_{1:k}) dx_{k−1} ≈ ∫ Σ_j w_k^{(j)} δ((x_k, x_{k−1}) − (x_k^{(j)}, x_{k−1}^{(j)})) dx_{k−1} = Σ_j w_k^{(j)} δ(x_k − x_k^{(j)}) ∫ δ(x_{k−1} − x_{k−1}^{(j)}) dx_{k−1} = Σ_j w_k^{(j)} δ(x_k − x_k^{(j)}),

since ∫ δ(x_{k−1} − x_{k−1}^{(j)}) dx_{k−1} = 1. Thus, after the marginalization (dropping the past particles x_{k−1}^{(j)}), the weight w_k^{(j)} does not change.


Since the proposal pdf q(\cdot) is chosen such that it factorizes as q(x_k, x_{k-1} \mid z_{1:k}) = q(x_k \mid x_{k-1}, z_k) \, q(x_{k-1} \mid z_{1:k-1}), the following recursion holds:

w_k^{(j)} \propto \frac{f(x_k^{(j)} \mid z_{1:k})}{q(x_k^{(j)} \mid z_{1:k})} \propto \frac{f(x_k^{(j)}, x_{k-1}^{(j)} \mid z_{1:k})}{q(x_k^{(j)}, x_{k-1}^{(j)} \mid z_{1:k})} \overset{\text{“Eq. (D.3)”}}{\propto} \frac{f(z_k \mid x_k, x_{k-1}) \, f(x_k, x_{k-1} \mid z_{1:k-1})}{q(x_k^{(j)}, x_{k-1}^{(j)} \mid z_{1:k})} \overset{\text{“chain rule”}}{\propto} \frac{f(z_k \mid x_k^{(j)}) \, f(x_k^{(j)} \mid x_{k-1}^{(j)})}{q(x_k^{(j)} \mid x_{k-1}^{(j)}, z_k)} \cdot \frac{f(x_{k-1}^{(j)} \mid z_{1:k-1})}{q(x_{k-1}^{(j)} \mid z_{1:k-1})} \propto \frac{f(z_k \mid x_k^{(j)}) \, f(x_k^{(j)} \mid x_{k-1}^{(j)})}{q(x_k^{(j)} \mid x_{k-1}^{(j)}, z_k)} \, w_{k-1}^{(j)}.

To obtain the proper pdf, the weights have to be further normalized [98], i.e.,

w_k^{(j)} = \frac{w_k^{(j)}}{\sum_{j'=1}^{J} w_k^{(j')}} .
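As a side remark not made explicitly in the text: for the popular bootstrap choice of the proposal, q(x_k \mid x_{k-1}, z_k) = f(x_k \mid x_{k-1}), the transition pdf cancels in the recursion above and the nonnormalized weight reduces to w_{k-1}^{(j)} f(z_k \mid x_k^{(j)}). A minimal sketch of this weight update and normalization (assuming the likelihood values f(z_k \mid x_k^{(j)}) have already been evaluated):

import numpy as np

def update_weights(prev_weights, likelihoods):
    # Bootstrap-proposal weight recursion: nonnormalized weight is
    # w_{k-1}^(j) * f(z_k | x_k^(j)); then normalize so the weights sum to 1.
    w = prev_weights * likelihoods
    return w / np.sum(w)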

There exist many modifications of particle filters [92, 104, 108], differing in the number of particles J required for a given accuracy and in the choice of the proposal pdf. Here I recall the basic PF algorithm [104], also referred to as the Sequential Importance Resampling (SIR) filter (see Fig. D.1), which computes an approximation of the MMSE estimate \hat{x}_k^{\mathrm{MMSE}}:

Particle Filter Algorithm

(Sequential Importance Resampling (SIR))

At time k = 0, the algorithm is initialized by J particles x_0^{(j)} (j ∈ {1, 2, . . . , J}), drawn from the prior pdf f(x_0). The weights are initially equal, i.e., w_0^{(j)} = 1/J.

At each time k ≥ 1, the following steps are repeated:

1. Prediction step: For each previous particle x_{k-1}^{(j)}, a new particle x_k^{(j)} is sampled from a suitably chosen proposal pdf q(x_k \mid x_{k-1}^{(j)}, z_k) ≡ q(x_k \mid x_{k-1}, z_k) \big|_{x_{k-1} = x_{k-1}^{(j)}}.

2. Update step: Nonnormalized weights associated with the particles x_k^{(j)} drawn in Step 1 are calculated according to

w_k^{(j)} = w_{k-1}^{(j)} \, \frac{f(z_k \mid x_k^{(j)}) \, f(x_k^{(j)} \mid x_{k-1}^{(j)})}{q(x_k^{(j)} \mid x_{k-1}^{(j)}, z_k)} , \qquad j \in \{1, 2, \ldots, J\}. \qquad (D.6)

The weights are then normalized according to

w_k^{(j)} = \frac{w_k^{(j)}}{\sum_{j'=1}^{J} w_k^{(j')}} . \qquad (D.7)

The set of particles and weights \{(x_k^{(j)}, w_k^{(j)})\}_{j=1}^{J} represents the current posterior f(x_k \mid z_{1:k}).

3. Calculation of estimate: From the particles and weights \{(x_k^{(j)}, w_k^{(j)})\}_{j=1}^{J}, an approximation of the MMSE state estimate is computed according to (D.5), i.e.,

\hat{x}_k^{\mathrm{MMSE}} = \sum_{j=1}^{J} w_k^{(j)} \, x_k^{(j)} . \qquad (D.8)

4. Resampling: The set \{(x_k^{(j)}, w_k^{(j)})\}_{j=1}^{J} can be resampled if needed (due to the so-called degeneracy problem, where after a few iterations all but one particle will have negligible weight, see e.g. [98]). The resampled particles are obtained by sampling with replacement from the set \{x_k^{(j)}\}_{j=1}^{J}, where x_k^{(j)} is sampled with probability w_k^{(j)}. This produces J resampled particles x_k^{(j)}; the weights are redefined to be identical, i.e., w_k^{(j)} = 1/J.
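The following compact sketch (not from the thesis) ties the four steps together for an illustrative scalar random-walk model x_k = x_{k-1} + u_k, z_k = x_k + v_k with Gaussian noises and the bootstrap proposal; the model, noise levels, prior, and measurement values are placeholder assumptions:

import numpy as np

rng = np.random.default_rng(0)

def sir_step(particles, weights, z, sigma_u, sigma_v):
    J = particles.size
    # Step 1 (prediction): sample new particles from the bootstrap proposal f(x_k | x_{k-1})
    particles = particles + sigma_u * rng.standard_normal(J)
    # Step 2 (update): nonnormalized weights, Eq. (D.6); with the bootstrap
    # proposal only the (Gaussian) likelihood f(z_k | x_k) remains
    weights = weights * np.exp(-0.5 * ((z - particles) / sigma_v) ** 2)
    weights = weights / np.sum(weights)            # normalization, Eq. (D.7)
    # Step 3 (estimate): MMSE approximation, Eq. (D.8)
    x_hat = np.sum(weights * particles)
    # Step 4 (resampling): sample with replacement, then reset the weights to 1/J
    idx = rng.choice(J, size=J, p=weights)
    return particles[idx], np.full(J, 1.0 / J), x_hat

# Initialization at k = 0: J particles drawn from a placeholder prior, equal weights
J = 500
particles = rng.standard_normal(J)                 # f(x_0) assumed standard normal
weights = np.full(J, 1.0 / J)
for z in [0.3, 0.5, 0.9]:                          # placeholder measurements z_1, z_2, z_3
    particles, weights, x_hat = sir_step(particles, weights, z, sigma_u=0.2, sigma_v=0.5)
    print(x_hat)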


[Figure D.1 (diagram): sampling/prediction (Step 1), computation of the importance weights (Step 2), and resampling/moving of the particles (Step 4) map the particle representation of f(x_k \mid z_{1:k}) to that of f(x_{k+1} \mid z_{1:k+1}); the initialization is q(x \mid z_{1:k}) \big|_{k=0} ≡ f(x_0).]

Figure D.1: Principle of sequential importance resampling (SIR).


Bibliography

[1] S. Ghosh. Distributed Systems: An Algorithmic Approach. Chapman & Hall/CRC, 2007.

[2] G. Coulouris, J. Dollimore, T. Kindberg, and G. Blair. Distributed Systems: Concepts and Design. Addison Wesley, 5th edition, 2011.

[3] M. H. DeGroot. Reaching a Consensus. Journal of the American Statistical Association, 69(345), Mar. 1974.

[4] A. Okubo. Dynamical aspects of animal grouping: Swarms, schools, flocks, and herds. Advances in Biophysics, 22:1–94, 1986.

[5] D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:440–442, Jun. 1998.

[6] J. Lin, A. S. Morse, and B. D. O. Anderson. The multi-agent rendezvous problem. In Proc. of the 42nd IEEE Conference on Decision and Control, pages 1508–1513, 2003.

[7] J. Li and A. H. Sayed. Modeling bee swarming behavior through diffusion adaptation with asymmetric information sharing. EURASIP Journal on Advances in Signal Processing, 18(1):17, 2012.

[8] R. Shorey, A. Ananda, M. C. Chan, and W. T. Ooi. Mobile, Wireless, and Sensor Networks: Technology, Applications, and Future Directions. John Wiley & Sons, Inc., Hoboken, NJ, 2006.

[9] W. Dargie and C. Poellabauer. Fundamentals of Wireless Sensor Networks: Theory and Practice. John Wiley and Sons, Ltd., 2010.

[10] A. Nayak and I. Stojmenovic. Wireless Sensor and Actuator Networks: Algorithms and Protocols for Scalable Coordination and Data Communication. John Wiley & Sons, Inc., Hoboken, NJ, 2010.

[11] N. M. Freris, H. Kowshik, and P. R. Kumar. Fundamentals of large sensor networks: Connectivity, capacity, clocks, and computation. Proceedings of the IEEE, 98(11):1828–1846, Nov. 2010.

[12] D. Shah. Gossip Algorithms. Foundations and Trends® in Networking, 3(1):1–125, 2009.


[13] R. Olfati-Saber, J. A. Fax, and R. M. Murray. Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1):215–233, Jan. 2007.

[14] G. Cybenko. Dynamic Load Balancing for Distributed Memory Multiprocessors. Journal of Parallel and Distributed Computing, 7(2):279–301, Oct. 1989.

[15] F. Sivrikaya and B. Yener. Time Synchronization in Sensor Networks: A Survey. IEEE Network Magazine, special issue on ‘Ad Hoc Networking: Data Communications & Topology Control’, 18(4):45–50, 2004.

[16] O. Hlinka, F. Hlawatsch, and P. M. Djuric. Distributed Particle Filtering in Agent Networks. IEEE Signal Processing Magazine, 30(1):61–81, Jan. 2013.

[17] C. Reyes, T. Hilaire, and C. F. Mecklenbräuker. Distributed projection approximation subspace tracking based on consensus propagation. In The 3rd Intl. Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pages 340–343, Aruba, Dec. 2009.

[18] P. Di Lorenzo and S. Barbarossa. Swarming Algorithms for Distributed Radio Resource Allocation. IEEE Signal Processing Magazine, 30(3):144–154, 2013.

[19] O. Slučiak, T. Hilaire, and M. Rupp. A general formalism for the analysis of distributed algorithms. In Proc. of the 35th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 2890–2893, Dallas, USA, Mar. 2010.

[20] O. Slučiak and M. Rupp. Steady-state analysis of a quantized average consensus algorithm using state-space description. In Proc. of European Signal Processing Conference (EUSIPCO), pages 199–203, Aalborg, Denmark, Aug. 2010.

[21] O. Slučiak and M. Rupp. Reaching consensus in asynchronous WSNs: Algebraic approach. In Proc. of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3300–3303, Prague, Czech Rep., May 2011.

[22] O. Slučiak and M. Rupp. Almost sure convergence of consensus algorithms by relaxed projection mappings. In Proc. of Statistical Signal Processing Workshop (SSP), pages 632–635, Ann Arbor, MI, USA, Aug. 2012.

[23] O. Slučiak, O. Hlinka, M. Rupp, F. Hlawatsch, and P. M. Djuric. Sequential likelihood consensus and its application to distributed particle filtering with reduced communications and latency. In Rec. of the 45th Asilomar Conf. on Signals, Systems, and Computers, pages 1766–1770, Pacific Grove, CA, USA, Nov. 2011.

[24] O. Slučiak, H. Straková, M. Rupp, and W. N. Gansterer. Distributed Gram-Schmidt orthogonalization based on dynamic consensus. In Rec. of the 46th Asilomar Conf. on Signals, Systems, and Computers, pages 1207–1211, Pacific Grove, CA, USA, Nov. 2012.

[25] O. Hlinka, O. Slučiak, F. Hlawatsch, P. M. Djuric, and M. Rupp. Likelihood consensus: Principles and application to distributed particle filtering. In Rec. of the 44th Asilomar Conf. on Signals, Systems, and Computers, pages 349–353, Pacific Grove, CA, USA, Nov. 2010.

[26] O. Hlinka, O. Slučiak, F. Hlawatsch, P. M. Djuric, and M. Rupp. Distributed Gaussian particle filtering using likelihood consensus. In Proc. of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3756–3759, Prague, Czech Rep., May 2011.

[27] O. Hlinka, O. Slučiak, F. Hlawatsch, P. M. Djuric, and M. Rupp. Likelihood consensus and its application to distributed particle filtering. IEEE Trans. on Signal Processing, 60(8):4334–4349, Aug. 2012.

[28] O. Slučiak, H. Straková, M. Rupp, and W. N. Gansterer. Dynamic average consensus and distributed orthogonalization. IEEE Trans. on Signal Processing, 2013. (submitted; [Online] Available: http://publik.tuwien.ac.at/files/PubDat 216777.pdf).

[29] O. Slučiak and M. Rupp. Network Size Estimation using Distributed Orthogonalization. IEEE Signal Processing Letters, 20(4):347–350, Apr. 2013.

[30] Fan R. K. Chung. Spectral Graph Theory, volume 92. American Mathematical Society, 1997.

[31] R. Karp, C. Schindelhauer, S. Shenker, and B. Vocking. Randomized rumor spreading. In Proc. Annual Symp. on Foundations of Computer Science, pages 564–574, CA, USA, Nov. 2000.

[32] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized Gossip Algorithms. IEEE Trans. on Information Theory, 52(6):2508–2530, Jun. 2006.

[33] R. Carli. Topics on the average consensus problem. PhD thesis, University of Padova, Italy, 2008.

[34] C. C. Moallemi and B. Van Roy. Consensus Propagation. IEEE Trans. on Information Theory, 52(11):4753–4766, Nov. 2006.

[35] T. C. Aysal, M. E. Yildiz, A. D. Sarwate, and A. Scaglione. Broadcast gossip algorithms for consensus. IEEE Trans. on Signal Processing, 57(7):2748–2761, Jul. 2009.

[36] I. D. Schizas, A. Ribeiro, and G. B. Giannakis. Consensus in ad hoc WSNs with noisy links – part I: Distributed estimation of deterministic signals. IEEE Trans. on Signal Processing, 56(1):350–364, 2008.

[37] A. Ribeiro. Distributed Quantization-Estimation for Wireless Sensor Networks. PhD thesis, Faculty of the graduate school of the University of Minnesota, 2005.

[38] J. Kenyeres, M. Kenyeres, M. Rupp, and P. Farkas. WSN implementation of the average consensus. In 11th European Wireless Conference – Sustainable Wireless Technologies (European Wireless), Vienna, Austria, Apr. 2011.


[39] A. Censi and R. M. Murray. Real-valued average consensus over noisy quantized channels. In American Control Conference (ACC ’09), pages 4361–4366, June 2009.

[40] A. Kashyap, T. Basar, and R. Srikant. Quantized consensus. In IEEE International Symposium on Information Theory, pages 635–639, Seattle, WA, USA, Jul. 2006.

[41] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice Hall, Inc., 1997.

[42] K. Römer. Time synchronization in ad hoc networks. In MobiHoc ’01: Proc. of the 2nd ACM International Symposium on Mobile Ad Hoc Networking & Computing, pages 173–182, Long Beach, CA, USA, Oct. 2001.

[43] Y.-Ch. Wu, Q. Chaudhari, and E. Serpedin. Clock synchronization of wireless sensor networks. IEEE Signal Processing Magazine, 28(1):124–138, Jan. 2011.

[44] J.-S. Kim, J. Lee, E. Serpedin, and K. Qaraqe. Robust clock synchronization in wireless sensor networks through noise density estimation. IEEE Trans. on Signal Processing, 59(7):3035–3047, Jul. 2011.

[45] C. D. Meyer. Matrix Analysis & Applied Linear Algebra. SIAM, 2000.

[46] M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(2):298–305, 1973.

[47] A. E. Brouwer and W. H. Haemers. Spectra of graphs. Springer, 2012.

[48] Gene H. Golub and Charles F. Van Loan. Matrix computations (3rd ed.). Johns Hopkins University Press, Baltimore, MD, USA, 1996.

[49] L. Xiao and S. Boyd. Fast linear iterations for distributed averaging. Systems & Control Letters, 53:65–78, 2004.

[50] A. Olshevsky and J. N. Tsitsiklis. Convergence speed in distributed consensus and averaging. SIAM J. Control and Optimization, 48(1):33–55, 2009.

[51] L. Xiao, S. Boyd, and S. Lall. A scheme for robust distributed sensor fusion based on average consensus. In International Conference on Information Processing in Sensor Networks, pages 63–70, Los Angeles, USA, Apr. 2005.

[52] L. Georgopoulos and M. Hasler. Nonlinear Average Consensus. In International Symposium on Nonlinear Theory and its Applications, Sapporo, Japan, Oct. 2009.

[53] V. Schwarz and G. Matz. Nonlinear average consensus based on weight morphing. In Proc. of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3129–3132, Mar. 2012.

[54] V. Schwarz and G. Matz. Mean-square optimal weight design for average consensus. In Proc. IEEE SPAWC-2012, volume 1, pages 374–378, Jun. 2012.


[55] D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In Proc. of 44th Annual IEEE Symposium on Foundations of Computer Science, pages 482–491, Oct. 2003.

[56] A. Papoulis and S. U. Pillai. Probability, random variables, and stochastic processes (4th Ed.). McGraw-Hill, 2002.

[57] T. Hilaire, D. Ménard, and O. Sentieys. Bit accurate roundoff noise analysis of fixed-point linear controllers. In Proc. IEEE Int. Symposium on Computer-Aided Control System Design (CACSD’08), Sept. 2008.

[58] B. Widrow. Statistical analysis of amplitude quantized sampled-data systems. Trans. AIEE, 2(79):555–568, 1960.

[59] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw Hill, 1991.

[60] T. C. Aysal, M. Coates, and M. Rabbat. Distributed average consensus using probabilistic quantization. In Proc. of the 14th IEEE Workshop on Statistical Signal Processing (SSP), pages 640–644, Madison, WI, USA, Aug. 2007.

[61] A. Boukerche. Algorithms and Protocols for Wireless, Mobile Ad Hoc Networks. John Wiley & Sons, Inc., 2009.

[62] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1990.

[63] R. Olfati-Saber and R. M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Trans. on Automatic Control, 49(9):1520–1533, Sep. 2004.

[64] S. Kar and J. M. F. Moura. Distributed Average Consensus in Sensor Networks with Random Link Failures. In Proc. of the 32nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 2, pages II–1013–1016, Apr. 2007.

[65] P. Frasca, R. Carli, F. Fagnani, and S. Zampieri. Average consensus on networks with quantized communication. Int. Journal of Robust and Nonlinear Control, 19(16):1787–1816, Nov. 2009.

[66] S. S. Pereira and A. Pages-Zamora. Mean square convergence of consensus algorithms in random WSNs. IEEE Trans. on Signal Processing, 58(5):2866–2874, May 2010.

[67] R. M. Gray. Probability, Random Processes, and Ergodic Properties (2nd Ed.). Springer, 2009.

[68] D. Jakovetic, J. Xavier, and J. M. F. Moura. Weight optimization for consensus algorithms with correlated switching topology. IEEE Trans. on Signal Processing, 58(7):3788–3801, Jul. 2010.


[69] I. Yamada and N. Ogura. Adaptive projected subgradient method for asymptotic minimization of sequence of nonnegative convex functions. Numerical Functional Analysis and Optimization, 25(7&8):593–617, 2004.

[70] K. Slavakis, I. Yamada, and N. Ogura. The adaptive projected subgradient method over the fixed point set of strongly attracting nonexpansive mappings. Numerical Functional Analysis and Optimization, 27(7&8):905–930, 2006.

[71] K. Slavakis, S. Theodoridis, and I. Yamada. Online kernel-based classification using adaptive projection algorithms. IEEE Trans. on Signal Processing, 56(7):2781–2796, 2008.

[72] K. Slavakis, S. Theodoridis, and I. Yamada. Adaptive constrained learning in reproducing kernel Hilbert spaces: The robust beamforming case. IEEE Trans. on Signal Processing, 57(12):4744–4764, 2009.

[73] K. Slavakis, Y. Kopsinis, and S. Theodoridis. Revisiting adaptive least-squares estimation and application to online sparse signal recovery. In Proc. of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4292–4295, Prague, Czech Rep., May 2011.

[74] S. Theodoridis, K. Slavakis, and I. Yamada. Adaptive learning in a world of projections. IEEE Signal Processing Magazine, pages 97–123, Jan. 2011.

[75] J. von Neumann. Functional Operators Vol. II: The Geometry of Orthogonal Spaces, volume 22 of Ann. Math. Stud. Princeton University Press, Princeton, NJ, 1950. [Reprint of mimeographed lecture notes first distributed in 1933].

[76] I. Halperin. The product of projection operators. Acta Sci. Math. (Szeged), 23:96–99, 1962.

[77] M. Práger. Über ein Konvergenzprinzip im Hilbertschen Raum. Czechoslovak Math. J., 10(2):271–282, 1960.

[78] I. Amemiya and T. Ando. Convergence of random products of contractions in Hilbert space. Acta Sci. Math. (Szeged), 26(3–4):239–244, 1965.

[79] H. H. Bauschke. A norm convergence result on random products of relaxed projections in Hilbert space. Trans. of the American Mathematical Society, 347(4):1365–1373, 1995.

[80] J.-B. Baillon and R. E. Bruck. On the random product of orthogonal projections in Hilbert space. Nonlinear Analysis and Convex Analysis (Niigata 1998), pages 126–133, 1998.

[81] R. E. Bruck. On the random product of orthogonal projections in Hilbert space II. Contemp. Math., 513:65–98, 2010.

[82] P. Denantes, F. Bénézit, P. Thiran, and M. Vetterli. Which distributed averaging algorithm should I choose for my sensor network? Proc. IEEE Infocom ’08, Apr. 2008.


[83] R. A. Freeman, P. Yang, and K. M. Lynch. Stability and convergence properties of dynamic average consensus estimators. In 45th IEEE Conference on Decision and Control, pages 338–343, Dec. 2006.

[84] W. Ren. Consensus seeking in multi-vehicle systems with a time-varying reference state. In Proc. of the 2007 American Control Conference, pages 717–722, Jul. 2007.

[85] P. Braca, S. Marano, and V. Matta. Running consensus in wireless sensor networks. In Proc. Int. Conf. Inf. Fusion (FUSION 2008), pages 152–157, Cologne, Germany, Jun.–Jul. 2008.

[86] V. Schwarz, C. Novak, and G. Matz. Broadcast-based dynamic consensus propagation in wireless sensor networks. In Proc. 43rd Asilomar Conf. on Sig., Syst., Comp., pages 255–259, Pacific Grove, CA, Nov. 2009.

[87] M. Zhu and S. Martínez. Discrete-time dynamic average consensus. Automatica, 46(2):322–329, 2010.

[88] D. P. Spanos, R. Olfati-Saber, and R. M. Murray. Dynamic consensus on mobile networks. In Proc. 16th IFAC World Congress, Prague, Czech Republic, Jul. 2005.

[89] S. Elaydi. An Introduction to Difference Equations (3rd Ed.). Springer, 2005.

[90] J. F. Ritt. On the zeros of exponential polynomials. Transactions of The American Mathematical Society, 31(4):680–686, 1929.

[91] H. Wolter. On roots of exponential terms. Mathematical Logic Quarterly, 39:96–102, 1993.

[92] J. H. Kotecha and P. M. Djuric. Gaussian particle filtering. IEEE Trans. on Signal Processing, 51(10):2592–2601, 2003.

[93] X. Sheng and Y. H. Hu. Maximum likelihood multiple-source localization using acoustic energy measurements with wireless sensor networks. IEEE Trans. Signal Process., 53(1):44–53, Jan. 2004.

[94] G. M. Hoffmann and C. J. Tomlin. Mobile sensor network control using mutual information methods and particle filters. IEEE Transactions on Automatic Control, 55(1):32–47, 2010.

[95] B. Ristic and S. Arulampalam. Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House, Boston, MA, 2004.

[96] S. M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall PTR, Upper Saddle River, NJ, 1998.

[97] A. Doucet, N. De Freitas, and N. Gordon. Sequential Monte Carlo Methods in Practice. Springer, New York, NY, 2001.


[98] M. S. Arulampalam, S. Maskell, N. Gordon, T. Clapp, D. Sci, T. Organ, and S. A. Adelaide. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process., 50(2):174–188, Feb. 2002.

[99] O. Hlinka, F. Hlawatsch, and P. M. Djuric. Likelihood consensus-based distributed particle filtering with distributed proposal density adaptation. In Proc. of the 37th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3869–3872, Kyoto, Japan, Mar. 2012.

[100] S. M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall, Upper Saddle River, NJ, 1993.

[101] A. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, PA, 1996.

[102] P. R. Bevington and D. K. Robinson. Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill, New York, NY, 2003.

[103] G. Allaire and S. M. Kaber. Numerical Linear Algebra. Springer, New York, NY, 2008.

[104] O. Hlinka. Distributed Particle Filtering in Networks of Agents. PhD thesis, Vienna University of Technology, 2012.

[105] M. Coates. Distributed particle filters for sensor networks. In Proc. IPSN-04, pages 99–107, Berkeley, CA, Apr. 2004.

[106] D. Gu, J. Sun, Z. Hu, and H. Li. Consensus based distributed particle filter in sensor networks. In Proc. IEEE ICIA-08, pages 302–307, Zhangjiajie, P. R. China, Jun. 2008.

[107] B. N. Oreshkin and M. J. Coates. Asynchronous distributed particle filter via decentralized evaluation of Gaussian products. In Proc. ISIF Int. Conf. Information Fusion, Edinburgh, UK, Jul. 2010.

[108] S. Farahmand, S. I. Roumeliotis, and G. B. Giannakis. Set-membership constrained particle filter: Distributed adaptation for sensor networks. IEEE Trans. Signal Process., 59(9):4122–4138, Sep. 2011.

[109] A. Mohammadi and A. Asif. Consensus-based distributed unscented particle filter. In Proc. IEEE SSP-11, pages 237–240, Nice, France, Jun. 2011.

[110] L. N. Trefethen and D. Bau, III. Numerical Linear Algebra. SIAM: Society for Industrial and Applied Mathematics, Philadelphia, PA, 1997.

[111] J. M. Lees and R. S. Crosson. Bayesian ART versus conjugate gradient methods in tomographic seismic imaging: An application at Mount St. Helens, Washington. In Antonio Possolo, editor, Spatial Statistics and Imaging, volume 20, pages 186–208. IMS Lecture Notes-Monograph Series, Hayward, CA, 1991.

[112] C. Dumard and E. Riegler. Distributed sphere decoding. In International Conference on Telecommunications ICT ’09, pages 172–177, 2009.


[113] G. Tauböck, M. Hampejs, P. Švač, G. Matz, F. Hlawatsch, and K. Gröchenig. Low-complexity ICI/ISI equalization in doubly dispersive multicarrier systems using a decision-feedback LSQR algorithm. IEEE Trans. on Signal Processing, 59(5):2432–2436, 2011.

[114] K.-J. Cho, Y.-N. Xu, and J.-G. Chung. Hardware efficient QR decomposition for GDFE. In IEEE Workshop on Signal Processing Systems, pages 412–417, Oct. 2007.

[115] X. Wang and M. Leeser. A truly two-dimensional systolic array FPGA implementation of QR decomposition. ACM Trans. Embed. Comput. Syst., 9(1):3:1–3:17, Oct. 2009.

[116] Alfredo Buttari, Julien Langou, Jakub Kurzak, and Jack Dongarra. Parallel tiled QR factorization for multicore architectures. In Proceedings of the 7th International Conference on Parallel Processing and Applied Mathematics, pages 639–648. Springer-Verlag, 2008.

[117] James Demmel, Laura Grigori, Mark Frederick Hoemmen, and Julien Langou. Communication-optimal parallel and sequential QR and LU factorizations. Technical report, no. UCB/EECS-2008-89, EECS Department, University of California, Berkeley, 2008.

[118] F. Song, H. Ltaief, B. Hadri, and J. Dongarra. Scalable tile communication-avoiding QR factorization on multicore cluster systems. In International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–11, 2010.

[119] Hana Straková, Wilfried N. Gansterer, and Thomas Zemen. Distributed QR factorization based on randomized algorithms. In Proc. of the 9th International Conference on Parallel Processing and Applied Mathematics, Part I, volume 7203 of Lecture Notes in Computer Science, pages 235–244, 2012.

[120] Hana Straková and Wilfried N. Gansterer. A distributed eigensolver for loosely coupled networks. In 21st Euromicro Int. Conference on Parallel, Distributed, and Network-Based Processing (PDP), Belfast, N. Ireland, Mar. 2013. [Online] Available: http://eprints.cs.univie.ac.at/3483/.

[121] B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall/CRC Monographs on Statistics & Applied Probability 57, 1994.

[122] P. Rost and G. Fettweis. On the transmission-computation-energy tradeoff in wireless and fixed networks. eprint arXiv:1008.4565, Aug. 2010.

[123] C. Budianu, S. Ben-David, and L. Tong. Estimation of the number of operating sensors in large-scale sensor networks with mobile access. IEEE Trans. on Signal Processing, 54(5):1703–1715, May 2006.

[124] I. Shames, T. Charalambous, C. N. Hadjicostis, and M. Johansson. Distributed network size estimation and average degree estimation and control in networks isomorphic to directed graphs. In 50th Annual Allerton Conference on Communication, Control, and Computing, 2012.


[125] C. Baquero, P. S. Almeida, R. Menezes, and P. Jesus. Extrema propagation: fast distributed estimation of sums and network sizes. IEEE Trans. on Par. and Dist. Systems, 23(4):668–675, Apr. 2012.

[126] H. Terelius, D. Varagnolo, and K. H. Johansson. Distributed size estimation of dynamic anonymous networks. In 51st IEEE Conference on Decision and Control, pages 5221–5227, Dec. 2012.

[127] B. Mealy and F. Tappero. Free range VHDL. http://www.freerangefactory.org, 2013.

[128] A. H. Sayed, S.-Y. Tu, J. Chen, X. Zhao, and Z. J. Towfic. Diffusion strategies for adaptation and learning over networks. IEEE Signal Processing Magazine, 30(3):155–171, 2013.

[129] W. Sun, Y. Li, C. Li, and Y. Chen. Convergence speed of a fractional order consensus algorithm over undirected scale-free networks. Asian Journal of Control, 15(4):1–11, 2013.

[130] L. Benini and G. De Micheli. Networks on chips: A new SoC paradigm. IEEE Computer, 35(1):70–78, 2002.

[131] S. Sardellitti, M. Giona, and S. Barbarossa. Fast distributed average consensus algorithms based on advection-diffusion processes. IEEE Trans. on Signal Processing, 58(2):826–842, Feb. 2010.

[132] P. Lancaster. A note on sub-multiplicative norms. Numerische Mathematik, 19:206–208, 1972.

[133] O. Perron. Zur Theorie der Matrices. Mathematische Annalen, 64:248–263, 1907.

[134] G. Frobenius. Über Matrizen aus nicht negativen Elementen. Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin, pages 456–477, 1912.

[135] H. Schneider. The concepts of irreducibility and full indecomposability of a matrix in the works of Frobenius, König and Markov. Linear Algebra and Its Applications, 18:139–162, 1977.

[136] A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. SIAM, classics in applied mathematics edition, 1994.

[137] N. Keyfitz and H. Caswell. Applied Mathematical Demography, volume 14. Springer, 3rd edition, 2005.

[138] R. B. Bapat and T. E. S. Raghavan. Nonnegative Matrices and Applications. Cambridge University Press, Encyclopedia of mathematics and its applications edition, 1997.

[139] M. Boyle. Notes on the Perron-Frobenius theory of nonnegative matrices. http://yaroslavvb.com/papers/boyle-notes.pdf, 2005.


[140] S. Khim. The Frobenius-Perron theorem. http://www.math.uchicago.edu/~may/VIGRE/VIGRE2007/REUPapers/FINALAPP/Khim.pdf, 2007.

[141] S. Sternberg. Dynamical Systems. Dover Publications, 2010.

[142] F. Ninio. A simple proof of the Perron-Frobenius theorem for positive symmetric matrices. J. Phys. A: Math. Gen., 9(8):1281–1282, 1976.

[143] G. W. Stewart. Perron-Frobenius theory: A new proof of the basics. Technical report, Univ. of Maryland, College Park, 1994.

[144] A. Borobia and U. R. Trías. A geometric proof of the Perron-Frobenius theorem. Revista Matematica de la Universidad Complutense de Madrid, 5(1), 1992.

[145] C. R. MacCluer. The many proofs and applications of Perron’s theorem. SIAM Review, 42(3):487–498, 2000.

[146] P. M. Djuric, J. H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. F. Bugallo, and J. Miguez. Particle filtering. IEEE Signal Processing Magazine, 20(5):19–38, 2003.