Georg-August-Universität Göttingen, Zentrum für Informatik (Center for Computer Science)
ISSN 1612-6793, Number ZFI-MSC-2010-11

Master's thesis in the degree programme "Angewandte Informatik" (Applied Computer Science)

Development of a Mobile Social Networking Platform Supporting Decentralized Data Storage Optimized by Social Trust

David Koll

at the Chair for Computer Networks

Bachelor's and Master's theses of the Zentrum für Informatik at the Georg-August-Universität Göttingen

10 November 2010


Georg-August-Universität Göttingen
Zentrum für Informatik
Goldschmidtstraße 7
37077 Göttingen
Germany

Tel. +49 (5 51) 39-17 2010
Fax +49 (5 51) 39-1 44 15
Email [email protected]
WWW www.informatik.uni-goettingen.de

I hereby declare that I have written this thesis independently and have used no sources or aids other than those indicated.

Göttingen, 10 November 2010

Development of a Mobile Social Networking Platform Supporting Decentralized Data Storage Optimized by Social Trust

David Koll

November 10th 2010

Supervised by Prof. Dr. Fu, Computer Networks Group

Georg-August-Universität Göttingen

Abstract

Due to their centralized architecture, current Online Social Networks (OSNs) like Facebook or Twitter suffer from multiple deficiencies, which are of administrative and technical nature. For example, users are subject to censorship and do not control their own data. Furthermore, the strict requirement for connectivity to a central server prevents participation in the OSN in the case of only intermittent connectivity. To overcome these shortcomings, this thesis presents a peer-to-peer (P2P) architecture based on decentralized data storage that allows its users to control access to their personal data precisely. Moreover, users are no longer subject to arbitrary decisions by the OSN provider. Furthermore, the strict connectivity requirements are mitigated.

As the central server is eliminated in a P2P architecture, the availability of user data is of concern. Users will not switch to a decentralized approach if data is less available than in current OSNs. As participants of an OSN are only periodically online, their data has to be mirrored at a set of remote nodes to achieve high availability. It is important to select these nodes carefully, as nodes of a P2P network are heterogeneous and therefore provide different capabilities like online time, storage space or bandwidth. Previous approaches realizing a decentralized OSN did not focus on this problem. In contrast, this work presents a node selection process that efficiently selects a set of nodes for the task of mirroring data based on a simple trust model. It is shown that, in comparison to previous approaches, the number of mirroring nodes is kept at a minimum, while data availability is increased.

Contents

List of Figures
List of Acronyms

1 Introduction

2 Foundations
  2.1 Client-Server-Model and Peer-to-Peer
  2.2 Middleware
  2.3 Delay Tolerant Networks
  2.4 Distributed Hash Tables
    2.4.1 Pastry
  2.5 Attribute Based Encryption
    2.5.1 Encryption Routines
    2.5.2 Publishing and Retrieving Data

3 Related Work
  3.1 Safebook
    3.1.1 Evaluation
  3.2 PeerSoN
  3.3 Persona
    3.3.1 Applications
    3.3.2 Evaluation
  3.4 Diaspora
  3.5 Haggle
    3.5.1 Architecture
    3.5.2 Evaluation

4 Architecture Design
  4.1 Use Cases and Requirements
    4.1.1 Joining the Network
    4.1.2 Building a Social Graph
    4.1.3 Exchange of Social Context
    4.1.4 Group Contents
    4.1.5 General Requirements
  4.2 Important Design Choices
    4.2.1 Decentralized Data Storage
    4.2.2 Using Attribute Based Encryption
    4.2.3 Data Dissemination
  4.3 System Overview
    4.3.1 A Decentralized System
    4.3.2 Middleware and Applications
  4.4 Node Architecture
    4.4.1 Social Graph Maintenance
    4.4.2 Decentralized Data Storage
    4.4.3 Security and Privacy
    4.4.4 Data Dissemination

5 Implementation
  5.1 Implementation Goals
    5.1.1 Portation to Mobile Phones
  5.2 The GEMSTONE Node
    5.2.1 Overview
    5.2.2 Implementation of Modules
  5.3 Functionality of the GEMSTONE Prototype
  5.4 A Demo Application
  5.5 Metrics
  5.6 Deployment Considerations

6 Node Selection Process
  6.1 Discussion of the State of the Art
  6.2 GEMSTONE Node Selection - GEMNOSE
  6.3 Evaluation
    6.3.1 Methodology
    6.3.2 Finding a Parameter Weighting
    6.3.3 GEMNOSE Availability Evaluation
    6.3.4 Possible Improvements

7 Conclusion and Outlook

Bibliography

List of Figures

3.1 Safebook Architecture [10]
3.2 Haggle Resolution Process [25]
3.3 Haggle Node Architecture [25]

4.1 The GEMSTONE System
4.2 GEMSTONE Node Architecture
4.3 Relation Request Process
4.4 Mirroring Process
4.5 Mapping of Identifiers

5.1 GEMSTONE Implementation Overview
5.2 Package Core - UML Class Diagram
5.3 End-to-End Object Communication Example
5.4 Package Application - UML Class Diagram
5.5 Package ObjectHandling - UML Class Diagram
5.6 Package Connection - UML Class Diagram
5.7 Package DataStorage - UML Class Diagram
5.8 Package NodeSelection - UML Class Diagram
5.9 Package SocialGraph - UML Class Diagram
5.10 Package Interface - UML Class Diagram
5.11 Screenshot of the demo application
5.12 Possible deployment scenario: a Facebook wrapper application

6.1 Node availability distribution
6.2 Data availability by parameter weighting
6.3 Average number of mirroring nodes by parameter weighting
6.4 Data availability introducing srj with weight γ = 0.2
6.5 Mirroring nodes by parameter weighting
6.6 Close-up view of data availability, α = 0.1, β = 0.7, γ = 0.2
6.7 Close-up view of the average number of mirroring nodes, α = 0.1, β = 0.7, γ = 0.2
6.8 Data availability with a more optimistic availability assumption, and α = 0.1, β = 0.7, γ = 0.2
6.9 Average number of mirroring nodes with a more optimistic availability assumption, α = 0.1, β = 0.7, γ = 0.2
6.10 Number of dropped profiles by parameter weighting
6.11 Number of dropped profiles by drop policy
6.12 Data availability by strategy
6.13 Average number of mirroring nodes by strategy

List of Acronyms

ABE Attribute Based Encryption

ACK Acknowledgment

ACL Access Control List

AMSK ABE Master Secret Key

API Application Programming Interface

APK ABE Public Key

AS Access Structure

ASCII American Standard Code for Information Interchange

ASK Attribute Secret Key

CAN Content Addressable Network

CSM Client-Server Model

DHT Distributed Hash Table

DO Data Object

DS Data Store

DTN Delay Tolerant Network

EQ Event Queue

FIFO First in, First out


GUI Graphical User Interface

IP Internet Protocol

LOC Lines of Code

MOSN Mobile Online Social Networking

ND Node Description

OS Operating System

OSN Online Social Network

P2P Peer-to-Peer

PKC Public-Key Cryptography

PKI Public-Key Infrastructure

RTT Round Trip Time

SHA Secure Hash Algorithm

SS Storage Space

TCP Transmission Control Protocol

TIS Trusted Identification Service

TPK Traditional Public-Key Cryptography

TTL Time-to-Live

TR Trust and Reputation

UML Unified Modeling Language

XML Extensible Markup Language


1 Introduction

Over the past couple of years Online Social Networks (OSNs) like Facebook, Flickr or MySpace have seen a boost in popularity and thereby become an omnipresent medium of communication in modern society. Despite an ever-growing number of users consuming as well as generating content - 50% of currently 500 million Facebook users log on to Facebook on any given day¹ - these services still scale very well and thus offer an enjoyable user experience. In 2008, both Facebook and MySpace achieved an availability of above 99% with the maximum period of downtime being 0.4 hours and 1.4 hours respectively [28]. With the creation of applications for devices with limited resources (i.e., mobile devices), OSNs have increased their coverage even further. For instance, Facebook recently announced 150 million users of their Facebook Mobile service. This number is growing rapidly as the capabilities in terms of computing power, bandwidth and battery lifetime increase due to the introduction of modern smartphones.

However, current OSNs suffer from a number of serious flaws. Due to their centralized architecture OSNs are not only vulnerable to external attacks but also to internal misuse of personal data by the OSN provider itself, which endangers the users’ privacy:

• Current privacy agreements are often opt-out processes, of which the users are not always aware. Also, recent history shows that these agreements have not always been enforced strictly. Deployed in 2007, Facebook Beacon² is the most important example of misuse of personal data by an OSN provider. The application recorded a user’s online shopping activities and informed their friends on Facebook without the user’s consent. After this practice led to a controversy in the online community as well as a class action lawsuit against Facebook in 2009³, Facebook shut down the application in September 2009.⁴

• An example of external attacks is the disclosure of 1.6 million profiles in the German network SchülerVZ.⁵ With the majority of users in SchülerVZ being underage, this shows the threat to the part of the population that may not be able to assess the value of their most personal data correctly.

¹ http://www.facebook.com/press/info.php?statistics, retrieved Nov. 8th 2010
² http://www.facebook.com/press/releases.php?p=9166, retrieved Nov. 8th 2010
³ http://www.beaconclasssettlement.com/, retrieved Nov. 8th 2010
⁴ http://www.theregister.co.uk/2009/09/23/facebook_beacon_dies/, retrieved Nov. 8th 2010

In addition to privacy concerns, in a client-server model (CSM) the user needs to connect to the server to be able to participate in the network. This is of importance for mobile users, who might be disconnected from the infrastructure for a longer period of time, e.g., while on holidays or working in secluded areas. Moreover, users in centralized OSNs are subject to censorship limiting the right to freedom of speech [7]. Finally, any OSN provider might change its terms of use at will, e.g., by introducing a usage fee. In that case users of the particular service would be forced to choose between paying the fee and quitting the network.

To mitigate these flaws an open-source peer-to-peer (P2P) architecture is needed. By combining decentralized storage of private user data with encryption, users can regain control by determining who can access which parts of their private data with fine granularity. Moreover, by using delay tolerant networks (DTN) and the corresponding routing protocols, the strict requirement for a connection to the OSN infrastructure no longer applies. This is crucial for Mobile Online Social Networking (MOSN), which adds location based services and augmented reality to social networking. Furthermore, this approach fits the model of current OSN applications, in which the users alone are consuming and generating most of the content in a P2P fashion. This allows for accurate intellectual property rights management and enables an exploitation of user locality [7]. As no centralized entity exists, users are safe from changes in the terms of use.

This thesis presents the design and implementation of a socio-aware middleware realizing such a P2P architecture as foundation for the GEMSTONE (Generic Middleware for Online Social Networks) project by supporting fully decentralized data storage. Additionally, ideas on how to realize privacy and multiple ways of data dissemination including DTN routing are presented, although their implementation is out of scope of this thesis.

On top of this middleware any kind of application utilizing the same social information may be built, superseding multiple accounts for multiple OSNs. For example, such an application could measure user movements and interactions while preserving user anonymity, providing data sets for further research. Applications communicate with GEMSTONE via an easy-to-implement ASCII protocol, allowing the development of GEMSTONE applications in any programming language. For this thesis a simple application demonstrating the functionality of the first GEMSTONE prototype was created.

⁵ http://www.spiegel.de/netzwelt/web/0,1518,692822,00.html, retrieved Nov. 8th 2010

As there is no central data storage provider in a decentralized OSN, all data has to be stored and managed by the users of the OSN themselves. Given that these users are not online continuously, it is not sufficient for a user to store his own data on his device alone. Therefore, each user needs to select a number of users (i.e., nodes) to provide his data whenever the user himself is not reachable.
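Why a single copy is not sufficient can be illustrated with a small calculation. The following is a minimal sketch, not the node selection process of this thesis: it assumes that every node is online independently of the others with a fixed probability, and the function names and the greedy rule are hypothetical choices made only for illustration.

```python
from math import prod


def data_availability(owner_online, mirror_online):
    """Probability that at least one copy of the data is reachable.

    Assumption (illustrative only): the owner and all mirrors are online
    independently of each other with the given probabilities.
    """
    all_offline = (1.0 - owner_online) * prod(1.0 - p for p in mirror_online)
    return 1.0 - all_offline


def mirrors_needed(owner_online, candidates, target):
    """Hypothetical greedy rule: add the most available candidates first
    until the target availability is reached (or candidates run out)."""
    chosen = []
    for p in sorted(candidates, reverse=True):
        if data_availability(owner_online, chosen) >= target:
            break
        chosen.append(p)
    return chosen


if __name__ == "__main__":
    # An owner who is online half of the time, mirrored at two similar nodes.
    print(data_availability(0.5, [0.5, 0.5]))
    # One highly available mirror can replace several weak ones.
    print(mirrors_needed(0.5, [0.5, 0.9, 0.2], target=0.9))
```

Under this independence assumption, a few well-chosen mirrors raise availability far more than many poorly available ones, which is the motivation for selecting mirroring nodes carefully.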

The selection of suitable nodes for storage of personal data is difficult: since nodes are heterogeneous, they offer different data availability (i.e., online time) and resources like storage space or bandwidth. Both are parameters that influence not only the user experience, but also the robustness and scalability of the system itself. Besides, users have no incentive to store data on their local data storage for other users they do not even know. Thus, it is hard to find the most suitable nodes for data storage out of a set of candidate nodes.

Previous approaches to a decentralized OSN did not address these difficulties [8] or selected mirroring nodes purely based on social relations [10]. This thesis shows how to minimize the required number of user data replicas in the system while providing high data availability. To achieve this, a process that selects nodes based on a parameter weighting of availability, social relations and user experience is introduced. By using simulations the most efficient parameter weighting is determined to achieve close to constant data availability while still keeping the number of replicas in the system at a minimum.

The remainder of this thesis is structured as follows: Chapter 2 introduces the core definitions and foundations the thesis is built on. Subsequently, a review of related work is presented in Chapter 3, followed by a detailed description of the system architecture in Chapter 4 and the implementation in Chapter 5. The node selection process including an evaluation is presented in Chapter 6. Finally, this thesis ends with a conclusion and an outlook on future work in Chapter 7.

2 Foundations

This chapter introduces basic foundations required for the comprehension of this work. Starting with definitions of P2P and Middleware, it will also present introductions to Delay Tolerant Networks (DTN), Distributed Hash Tables (DHT) and Attribute Based Encryption (ABE). To stay in the scope of this work only brief definitions and introductions are given. For more detailed information on the following topics, the reader may refer to the literature [20].

2.1 Client-Server-Model and Peer-to-Peer

Both CSM and P2P are widely used models describing the relationships between participants of a network. In the CSM, there are two roles, called client and server. Typically, one central, continuously online server handles multiple clients, who can request services from the server by communicating via a predefined protocol. One client may be connected to multiple servers providing different services. Servers are usually nodes with higher resources than clients. If the connection to the server is lost on either client or server side, the service is no longer available to the client.

Contrary to the CSM, in P2P each node (peer) acts as both client and server while being available only periodically. This implies that each peer shares resources with other peers in proportion to its own resources to provide the same services available in the CSM. Therefore, in P2P a central server is non-existent and peers exchange data directly. If a peer disconnects, the service is still available for the remaining peers.

2.2 Middleware

A middleware is a piece of software connecting several applications with each other. It usually provides a set of functions allowing these applications to interact with each other.

2.3 Delay Tolerant Networks

A DTN¹ [12] is a network architecture allowing communication between nodes without access to the infrastructure. This type of networking is needed in several scenarios characterized by only intermittent connectivity, including mobile networks in general as well as sensor networking or military ad hoc networks. In these networks a node might not have connectivity to the infrastructure and therefore needs another way to transmit data to other nodes. Hence, in DTNs nodes exchange data on an opportunistic basis, i.e., when they move in range of each other. Multiple data forwarding algorithms - mostly based on Bluetooth - have been proposed [18, 21, 24, 26]. These algorithms exploit patterns of mobility to send messages from a source to a destination. Therefore, nodes that have been determined to be able to get the message closer to its destination may carry messages not targeted at themselves when participating in a DTN (local forwarding). Still, as the name implies, DTNs are networks of high delays, as it may take several days to transmit a message to its destination depending on the characteristics (e.g., node mobility or target behaviour) of the network itself.

2.4 Distributed Hash Tables

DHTs offer one possibility to manage decentralized data storage. A DHT provides look-up functionality for decentralized networks, where a requesting participant can retrieve the value associated with a given key efficiently. In a DHT, each participating node is responsible for a partition of the keyspace, i.e., it has to respond to queries requesting the value(s) for keys in that partition of the keyspace (keyspace-partitioning). Hence, each node represents a bucket of a traditional hash table. The keyspace itself is a set of keys, e.g., the set of hash values of 1024-bit public keys. The nodes are then connected via an overlay network to provide the look-up service.

¹ http://www.dtnrg.org/, retrieved Nov. 8th 2010

Look-up Procedure. A participating node R in a social network may want to find out which nodes are mirroring the data of a friend F. Given a keyspace consisting of the hash values of unique user IDs, e.g., public keys, R produces $k_{uid}$, the hash of F’s user ID. Afterwards it sends get($k_{uid}$) to any node N that is part of the overlay network. Each node maintains a set of neighbours in a routing table. N by definition either is responsible for $k_{uid}$ itself or has a neighbour in the routing table that is closer to being responsible for $k_{uid}$ than N itself. It therefore chooses the node closest to $k_{uid}$ and forwards the request there. If there is no closer node in the routing table of a node N′, this node is responsible for $k_{uid}$ and the request has reached its destination. Otherwise, this procedure is repeated recursively. The procedure to put a key-value pair into the table is executed analogously via put($k_{uid}$, value).
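The look-up procedure can be sketched as greedy forwarding on a toy ring. This is a simplified illustration under stated assumptions, not the behaviour of any particular DHT: the 16-bit keyspace, the circular distance metric, the `Node` class and the `link_ring` helper are all hypothetical, and each node only knows its two ring neighbours.

```python
import hashlib

KEY_BITS = 16                 # toy keyspace; real DHTs use 128 bits or more
KEY_SPACE = 1 << KEY_BITS


def key_for(user_id):
    """Hash a user ID into the toy keyspace."""
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % KEY_SPACE


def distance(a, b):
    """Circular distance between two points of the keyspace."""
    d = abs(a - b)
    return min(d, KEY_SPACE - d)


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.neighbours = []  # routing table (here: ring neighbours only)
        self.store = {}       # key-value pairs this node is responsible for

    def next_hop(self, key):
        """Return a strictly closer neighbour, or None if this node is responsible."""
        best = min([self] + self.neighbours,
                   key=lambda n: distance(n.node_id, key))
        return None if best is self else best

    def get(self, key):
        hop = self.next_hop(key)
        return self.store.get(key) if hop is None else hop.get(key)

    def put(self, key, value):
        hop = self.next_hop(key)
        if hop is None:
            self.store[key] = value
        else:
            hop.put(key, value)


def link_ring(nodes):
    """Give every node its predecessor and successor on the ring."""
    ordered = sorted(nodes, key=lambda n: n.node_id)
    for i, node in enumerate(ordered):
        node.neighbours = [ordered[i - 1], ordered[(i + 1) % len(ordered)]]


if __name__ == "__main__":
    nodes = [Node(i) for i in (100, 20000, 40000, 60000)]
    link_ring(nodes)
    k_uid = key_for("friend-F")
    nodes[0].put(k_uid, ["mirror-1", "mirror-2"])
    print(nodes[2].get(k_uid))   # any entry node reaches the same responsible node
```

With only ring neighbours the route length grows linearly with the number of nodes; the richer routing tables of real DHTs exist precisely to shorten it.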

Key properties. Besides decentralization there are more key properties of DHTs. The first is scalability: a DHT has to scale with a great number of nodes. This means that the degree of a node, i.e., the number of neighbours in the table, and the route length to determine a responsible node need to be kept at reasonable amounts to prevent excessive maintenance overhead and keep look-up times low. Most DHTs are designed to guarantee $O(\log n)$ for both cases.

The second key property is robustness / fault tolerance: the DHT has to be able to deal with nodes joining and leaving. Furthermore, a node leaving or joining the DHT must affect only its neighbour nodes in the keyspace. Otherwise the whole DHT would have to be repartitioned, a bandwidth-intensive process in which all nodes would have to exchange all key-value pairs. Keeping this process limited to the neighbouring nodes is therefore essential, especially in mobile networking where devices only have limited resources.

In this thesis only the Pastry DHT [31] is viewed in detail. This is because FreePastry 2.1² was chosen as the look-up dictionary for GEMSTONE, since it provides an easy-to-use Java package and the project is actively maintained. For information on other DHTs like Kademlia, Chord, OpenDHT, or CAN the reader may refer to the literature [22, 29, 30, 33].

2.4.1 Pastry

Nodes are arranged in a circular 128-bit node key (identifier) space in Pastry. The identifier is assigned randomly as a new node joins the system and is considered a sequence of digits with base $B$, with $B$ being a configuration parameter, which the authors suggest as $B = 2^b$ with $b = 4$. In Pastry each peer maintains a leaf set and a neighbourhood set in addition to its routing table. The routing table consists of $\log_B N$ rows, each holding $B - 1$ entries. An entry in row $n$ shares a prefix of length $n$ with the local node but differs in the $(n+1)$th digit. This prefix is used for a longest-prefix matching routing protocol.
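To give a feeling for these quantities, the following minimal sketch computes the number of digits per identifier and the approximate routing table dimensions for a given network size. The function names are hypothetical, and rounding the row count up to the next integer is an assumption made for illustration.

```python
import math

ID_BITS = 128  # size of the Pastry identifier space in bits


def digits_per_id(b=4):
    """Number of base-2^b digits in a 128-bit identifier."""
    return ID_BITS // b


def routing_table_shape(n_nodes, b=4):
    """Approximate (rows, entries per row) for a network of n_nodes peers.

    Rows are rounded up to the next integer (assumption for illustration).
    """
    base = 2 ** b
    rows = math.ceil(math.log(n_nodes, base))
    return rows, base - 1


def row_of(local_id, other_id):
    """Row in which other_id would be stored: length of the shared prefix."""
    row = 0
    for x, y in zip(local_id, other_id):
        if x != y:
            break
        row += 1
    return row


if __name__ == "__main__":
    print(digits_per_id())                  # hexadecimal digits per identifier
    print(routing_table_shape(1_000_000))   # table shape for one million nodes
    print(row_of("65a1fc", "65b212"))       # shared prefix "65" gives row 2
```

With $b = 4$ the identifiers are simply 32 hexadecimal digits, which is why the examples use hexadecimal strings.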

² http://www.freepastry.org/FreePastry/, retrieved Nov. 8th 2010

The neighbourhood set M consists of node identifiers and IP addresses of the |M| closest peers to the local node. The closeness of a node is determined by using a scalar proximity metric such as the IP routing distance or the geographic distance. The leaf set L contains the |L| nodes with the numerically closest node identifiers, of which |L|/2 entries are nodes with larger IDs and |L|/2 entries are nodes with smaller IDs. The cardinalities of M and L typically are $B$ or $2 \cdot B$.

Routing. The routing itself is done as follows: if a node faces a routing request it first checks whether or not the key falls within the range of its leaf set. If that is the case, the message is forwarded directly to the node in the leaf set that is closest to the key, which is the destination of the request. If the key does not fall within the range of its leaf set, the request is forwarded using the routing table. In the case of a node failure or missing entry in the routing table, the request is forwarded to a node that shares a prefix with the key at least as long as the local node does and that is numerically closer to the key than the local node. The cost for a routing request in Pastry is $O(\log N)$ steps in the DHT with N being the number of nodes in the system.
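The three cases of the routing decision can be sketched as a single function. This is an illustrative simplification and not FreePastry code: identifiers are short hexadecimal strings, the wrap-around of the circular identifier space is ignored, and the routing table is modelled as a hypothetical dictionary mapping (row, digit) to a node identifier.

```python
def shared_prefix_len(a, b):
    """Number of leading digits two identifiers have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def num(node_id):
    """Numeric value of a hexadecimal identifier."""
    return int(node_id, 16)


def next_hop(local, key, leaf_set, routing_table):
    """Pick the node a request for `key` is forwarded to (simplified sketch)."""
    if key == local:
        return local
    gap = lambda n: abs(num(n) - num(key))

    # Case 1: the key lies within the range covered by the leaf set.
    candidates = leaf_set + [local]
    if leaf_set and min(map(num, candidates)) <= num(key) <= max(map(num, candidates)):
        return min(candidates, key=gap)

    # Case 2: use the routing table entry for the next differing digit.
    l = shared_prefix_len(local, key)
    entry = routing_table.get((l, key[l]))
    if entry is not None:
        return entry

    # Case 3: fall back to any known node with an equally long prefix
    # that is numerically closer to the key than the local node.
    known = leaf_set + list(routing_table.values())
    closer = [n for n in known
              if shared_prefix_len(n, key) >= l and gap(n) < gap(local)]
    return min(closer, key=gap) if closer else local


if __name__ == "__main__":
    leaf_set = ["6598", "65a9"]
    table = {(0, "d"): "d13d", (1, "7"): "6712", (2, "b"): "65b2"}
    for key in ("65a8", "d46a", "65bf", "6c00"):
        print(key, "->", next_hop("65a1", key, leaf_set, table))
```

Each hop taken via the routing table extends the shared prefix by at least one digit, which is where the logarithmic number of steps comes from.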

Robustness. If a new node X wants to join the Pastry network it needs to inform the other nodes of its presence and initialize its routing table. For this it needs to know at least one node A already participating in the network, e.g., based on some external mechanism. It then sends a JOIN message targeted at its own identifier. This request is then routed through the network to a node Z with the identifier numerically closest to X’s identifier. Upon receiving the request all nodes on the path from X to Z send their routing tables to X, who uses that information to build its own routing table as follows: X most probably does not share a prefix with A, but may take the first row $A_0$ of A’s routing table, since it does not need to share any prefix with these entries. As the message is routed via longest-prefix matching afterwards, the nth node on the path to Z shares a prefix of length n with X. Therefore, X uses the nth row of that node’s routing table as its own nth row. X also receives the leaf set of Z, since Z has the numerically closest identifier and its leaf set is therefore suitable for X as well. The authors moreover assume X to be in proximity to A and therefore identify A’s neighbourhood set as suitable to initialize X’s neighbourhood set. Finally, X sends its routing table to each node in the leaf set, neighbourhood set and routing table. The recipients update their own tables based on this information.

Rowstron et al. [31] define that a node has left the network if its immediate neighbours in the identifier space can no longer communicate with it. To update their leaf set the neighbours of the failed peer contact the live peer with the largest index on the side of the leaf set the node was on and request its leaf set L′. L′ partly overlaps with the local node’s leaf set L. The local node then inserts a suitable peer of L′ into L after verifying that it is still alive by contacting it. To fix the routing table in case of a failing entry the local node contacts another peer in the same row as the failed entry. It asks that node for its entry at the position of the failed entry. To keep the neighbourhood set up-to-date each node periodically contacts all the entries of its neighbourhood set. If one of these does not respond the local node asks other nodes to transmit their neighbourhood sets and selects the closest entry of these sets as its new neighbour.

2.5 Attribute Based Encryption

In ABE as introduced by Sahai et al. [6] the encrypter defines an access structure (AS) over a set of attributes, e.g., (’neighbor’ AND ’football fan’), and encrypts some data with that AS. Each contact of the encrypter holds an attribute secret key (ASK) that grants access to the data if it satisfies the condition of the AS, e.g., it contains the attributes ’neighbor’ and ’football fan’. In addition to that, each user has an ABE master secret key (AMSK) and ABE public key (APK). Combined with standard public key cryptography (PKC) this approach can be used to provide user privacy and security of personal data.
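The condition an access structure expresses can be modelled as a small boolean expression tree. The sketch below only checks whether a set of attributes satisfies an AS; it performs no cryptography at all, whereas real ABE enforces the same condition mathematically during decryption. The tuple representation and the function name are hypothetical.

```python
def satisfies(access_structure, attributes):
    """Check whether a set of attributes fulfils an access structure.

    An access structure is either an attribute name (str) or a tuple
    ("AND" | "OR", operand, operand, ...). Illustrative model only.
    """
    if isinstance(access_structure, str):
        return access_structure in attributes
    op, *operands = access_structure
    results = (satisfies(x, attributes) for x in operands)
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    raise ValueError("unknown operator: %r" % (op,))


if __name__ == "__main__":
    policy = ("AND", "neighbor", "football fan")
    print(satisfies(policy, {"neighbor", "football fan", "colleague"}))
    print(satisfies(policy, {"neighbor"}))
```

A contact whose ASK carries only the attribute ’neighbor’ would therefore not be able to decrypt data encrypted under the example policy.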

2.5.1 Encryption Routines

Defining a Relationship. In particular, to define a relationship between two users Alice and Bob, Alice creates $K = Alice.ASK_{'friend'}$, the ASK that grants access to any content that is published for the group ’friend’. Afterwards, Alice computes $C = TEncrypt(Bob.TPK, K)$, i.e., encrypts K with Bob’s traditional public key (TPK), and transmits C to Bob, who can then decrypt C using his traditional private key.

Transitive Relationships. Defining groups based on groups defined by another user is possible as well (transitive relationships). E.g., Alice may want to encrypt data to Bob’s friends. To do so, Alice creates $A = Alice.ASK_{'bob-friend'}$ and encrypts it with the AS ’friend’ using Bob.APK.

Assign Rights to Identity. Another routine is to assign rights to an identity. E.g., this may be used to allow a certain user to write data onto a disk. To achieve this Alice updates the Access Control List (ACL) of the disk with (Bob.TPK, rights). If Bob then issues a store command, the disk starts a challenge-response protocol to authenticate Bob, allowing him to write onto the resource afterwards.

Assign Rights to a Group. The same procedure may be applied to a group. Alice needs to create a new PKC key pair and encrypts this key pair with an AS defining the group. The PKC key pair then becomes the group identity and can be treated by assigning rights to an identity as described above. To remove a user from such a group, re-keying is required. In the opposite case, i.e., a new user joining the group, a condition, e.g., keyYear ≤ 2009, can be included in the AS to prevent newly joined members from reading older data.

2.5.2 Publishing and Retrieving Data

Data is encrypted with a symmetric key. This key is encrypted with an ABE key corresponding to the group allowed to read the data, with the group being specified by an AS. The encrypted data is then stored on the disk. To announce the data, references are created and distributed to their recipients by applications. These references contain instructions on how to obtain the data (tag and disk fields) and the decryption keys (key-tag and key-store fields).

If a data item i is encrypted with the symmetric key s and a user u1 wants to read i, u1 needs to retrieve s using the key-tag and key-store fields in the reference. s will be encrypted with an AS in the domain of an APK and stored at H(AS, APK). Both the AS and the APK can be inferred from s. If u1 is allowed to read the data, it may use its ASK to decrypt s and then use s to decrypt i. u1 also stores s along with its own public key for future use.

If another user u2 wants to publish some data for a group of users specified by an AS in the ABE domain of APK (this domain may belong to u2 or any other user), u2 looks up a possibly already existing key s on its own disk. If the key exists, u2 decrypts it using its own private key. Otherwise, u2 constructs a new key s, encrypts it with its own PKC public key and stores it on its own disk at location H(AS, APK). It also stores s encrypted by an AS and its APK at location H′(AS, APK), with H′ differing from H. Now every member of the group defined by the AS can access s. Afterwards, u2 encrypts the data using s and is then able to publish a reference to the data and decryption key via an application.
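The bookkeeping of the publishing step, i.e., where the two encrypted copies of s are placed, can be sketched as follows. The sketch makes several assumptions that are not taken from Persona: SHA-256 with a domain-separation label is used to obtain two different hash functions H and H′, the disk is a plain dictionary, and the two `wrap_*` callables are placeholders that merely tag the key instead of encrypting it.

```python
import hashlib


def location(access_structure: str, apk: bytes, domain: bytes) -> str:
    """Hash (AS, APK) into a storage location; `domain` separates H from H'."""
    h = hashlib.sha256()
    for part in (domain, access_structure.encode("utf-8"), apk):
        h.update(len(part).to_bytes(4, "big"))  # length prefix avoids ambiguity
        h.update(part)
    return h.hexdigest()


def H(access_structure: str, apk: bytes) -> str:
    """Location of the copy encrypted with the publisher's own PKC key."""
    return location(access_structure, apk, b"owner-copy")


def H_prime(access_structure: str, apk: bytes) -> str:
    """Location of the copy encrypted with the AS in the domain of APK."""
    return location(access_structure, apk, b"group-copy")


def publish_key(disk: dict, access_structure: str, apk: bytes,
                new_key: bytes, wrap_for_owner, wrap_for_group) -> bool:
    """Store both wrapped copies unless an owner copy already exists.

    Returns True if a new key was stored, False if an existing one is reused.
    """
    owner_slot = H(access_structure, apk)
    if owner_slot in disk:
        return False
    disk[owner_slot] = wrap_for_owner(new_key)
    disk[H_prime(access_structure, apk)] = wrap_for_group(new_key)
    return True


# Placeholders only: these tag the key and perform NO encryption.
fake_pkc_wrap = lambda key: b"PKC:" + key
fake_abe_wrap = lambda key: b"ABE:" + key

disk = {}
first = publish_key(disk, "friend", b"alice-apk", b"s1", fake_pkc_wrap, fake_abe_wrap)
second = publish_key(disk, "friend", b"alice-apk", b"s2", fake_pkc_wrap, fake_abe_wrap)
print(first, second, len(disk))  # True False 2
```

The second call finds the existing owner copy and therefore reuses the key instead of creating a new one, which mirrors the look-up described in the text.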


3 Related Work

A number of existing works are dedicated to the realisation of decentralized OSNs. These works focus on different challenges of such networks, ranging from data storage and the preservation of privacy to networking with mobile devices. This chapter provides an overview of the most important works on this topic.

3.1 Safebook

Safebook [10, 11] is based on two design principles: building a decentralized P2P structure to give users control over their data, and privacy and trust management exploiting real-life trust.

To achieve this, Safebook introduces a three-tier architecture as depicted in Figure 3.1: the user-centered social network layer that represents the members of the OSN and their relations, the P2P substrate that implements application services like look-up services, and the Internet providing communication and transport services. Each party in Safebook is represented by a node which is seen differently as member, peer node or host node in each of these layers. In addition to these layers, Safebook is built on several main components.

Matryoshka. Each node builds its own Matryoshka (an onion-like concentric ring of nodes) at the social network layer. This Matryoshka consists of several shells built around the node. The innermost shell consists of nodes that are direct contacts of the local node. Each of these nodes mirrors the local node’s data in encrypted form. Safebook thus does not use any measurements of the capabilities of nodes, but uses the social relations to other users as the single criterion to determine the best nodes for remote data storage. The outer shells’ nodes do not necessarily have any relationship with the local (core) node, but each node in an intermediate shell has a social relationship to at least one node in the one-hop-neighbour shell(s). This way, radial paths are created on which messages targeted at the core node can be relayed to the innermost shell. The nodes in the outermost shell are entry points, since they act as gateways for all data requests targeted at the core node.


Figure 3.1: Safebook Architecture [10]

Since all nodes in the Matryoshka are only aware of their one-hop neighbours, this system provides privacy for the core node.

P2P system. Safebook uses a DHT as look-up system to find the entry points of a Matryoshka. Currently, Kademlia [22] is used and the nodes are arranged according to their pseudonyms.

Trusted identification service (TIS). The TIS is a centralized trusted third party guaranteeing that each user gets no more than one unique identifier in each category of identifiers. To this end, an out-of-band procedure that relies on real-life mechanisms like face-to-face meetings is required. This way, every node requesting membership has to know at least one member of Safebook who can verify its identity in order to create an account.

3.1.1 Evaluation

The most critical point in decentralized OSNs is data availability. Data availability in Safebook depends on the span factor and the number of shells in the Matryoshka. With a span factor of 2, the innermost shell needs 13 mirroring contacts with 3 Matryoshka shells, or 23 with 4 Matryoshka shells, to reach an availability of 90% for a path from at least one entry point to at least one mirroring node. Therefore, there is still a 10% failure rate for a data request with these numbers, even with a uniform availability distribution of nodes in which each node is supposed to be online 30% of the time. This can only be reduced by increasing the number of mirroring nodes in the innermost shell. However, the authors do not evaluate their system for availability rates approaching 100%.

3.2 PeerSoN

PeerSoN [8] is another approach to a decentralized OSN. The main goals of PeerSoN are overcoming the privacy issues while preserving the features of existing OSNs and reducing the connectivity requirements to participate in such a network. To achieve privacy, PeerSoN relies on an existing Public-Key Infrastructure (PKI) in the current state of the project. Furthermore, users are able to exchange data when they meet and carry data for each other to spread the information through the network, reducing connectivity requirements.

The PeerSoN prototype implements a two-tiered architecture in which one tier serves as a look-up service while the other consists of the peers and contains the user data (e.g., profiles). For the look-up service OpenDHT [30] is used, which in the authors’ view should be replaced by a dedicated DHT in the future due to its deficiencies: In OpenDHT, entries are limited in space and lifetime. Not all the nodes of the OSN are responsible for entries in the DHT. This is because some nodes are less suited for this task than others, e.g., mobile phones and other devices with fewer resources.

To achieve high data availability, the authors propose a selection based on cliques with mutual storage agreements [32]. The selection is based purely on the locally known availability of the nodes. Data availability of above 99% is achieved in simulations with this approach.

3.3 Persona

Persona [2] presents a solution for decentralized OSNs that strongly focuses on privacy and relies on encryption mechanisms. Persona uses group key management based on a combination of traditional PKC with ABE to preserve the user’s privacy as described in Chapter 2.5. The keys generated by the corresponding cryptographic operations are used to encrypt two types of objects, namely user data, e.g., personal information, and abstract resources, e.g., a user’s Facebook-like wall. Users store their encrypted data on their storage spaces (SS), which are abstract resources as well, for groups of users they can define themselves. For each abstract resource there is an ACL in which the owner of the resource may define different levels of access for different groups. Each user is identified by his public key and exchanges this key with socially related users via an out-of-band mechanism.

3.3.1 Applications

Users interact via applications in Persona. Applications have to store the metadata they create on some storage service, which may be offered by the application itself or by the user. Therein lies a major weakness of Persona: Without obtaining information about its users that can be used for advertising, a storage service would not have any incentive to provide free storage space. As the authors themselves state, current OSNs are popular partly because they are free. It is very doubtful that users would choose a costly OSN over the current free OSNs. It is also unlikely that enough users can provide reliable storage services themselves to maintain the system. The authors propose a system with advertisement via the Doc application (see below) to solve this. The user does not need to trust the storage service to keep his data confidential, as he relies on the encryption routines described above to achieve this. Instead, he trusts the SS to store and provide the data reliably on request while preventing unauthorized deletion or overwriting. The SS application provides put and get methods. To execute the former, the requesting user has to authenticate via the ACL and the corresponding challenge-response protocol, while the latter does not need authentication, as the data is encrypted and only authorized users are able to decrypt it.

The authors identify multi-reader/writer applications using collaborative data, like the Facebook wall, as the main method of communication in existing OSNs. They present Doc, a generic multi-reader/writer application template that can be used to implement several of these applications. Doc maintains a Page which other users may read or write on depending on their access rights to the Page, which is again organized by the use of ABE as described above. For instance, to grant read access, Alice produces an ASK that allows Bob to read the data to the extent allowed by the attributes used for the ASK (i.e., Alice uses the define relationship routine).

3.3.2 Evaluation

The authors evaluate their implementation using real data of over 65,000 profiles obtained from a Facebook crawl. They parse these profiles into single data units to be able to encrypt very small amounts of data if the user wishes to do so. They also measure the size of each of these data items and find that, while most of the data items are small, many pages contain large items as well.

Furthermore, experiments are conducted for two states of Persona, called cold and warm. The cold scenario represents the initial state of Persona, which means that group symmetric keys must be retrieved from an SS and decrypted afterwards, whereas the warm scenario represents the state in which all keys have been cached. The cold scenario is run three times with an increasing number of user-defined groups (1, 10 and 100 groups). The data items of each Facebook profile are encrypted and stored. Following this, a page containing references to all the data items is retrieved. Finally, these references are resolved. This includes the fetching of the required keys for the cold scenario and the fetching of all data items for both scenarios. Then the data is decrypted, signatures are verified and the profile page is rendered.

For these experiments Baden et al. [2] find that the time needed to load a page increases linearly with the number of elements on that page. The median page load time is 2.3 seconds, with the maximum load time being 13.7 seconds. Furthermore, the cold and warm scenarios load in similar time. From this the authors conclude that retrieving decryption keys is not too expensive. Also, the splitting of the data into very small data items represents the worst case, as users may group items in practice. If that is the case, fewer fetches, and thus less accumulated round-trip time (RTT), would be required. Moreover, the authors find the encrypted data to be substantially larger than the data in plain text, which leads to greater requirements on storage and network resources. Finally, experiments on an iPhone 1G are conducted to test Persona on a device with less computational power and battery life. They conclude that a mobile Persona is generally possible, based on measurements of encryption times. However, the authors do not evaluate Persona on the iPhone in terms of bandwidth and power consumption, two limiting factors in mobile communications.

3.4 Diaspora

Diaspora is another approach to a decentralized OSN. It was announced in spring 2010 as a project of four US students, and the source code of the first version was released in September 2010.¹ However, this version was criticized for several weaknesses, e.g., poor scalability and the usage of uncommon technologies like Ruby/MongoDB.² A new version is scheduled for October 2010.

¹ http://www.joindiaspora.com/2010/09/15/developer-release.html, retrieved Nov. 8th 2010

3.5 Haggle

Haggle [25] is a networking architecture that deals with opportunistic communication between mobile devices. The Haggle project³ itself is funded by the European Union (EU) and operated by around 25 researchers distributed among 8 project partners. It was started in 2006 and planned to run for four years until 2010. Haggle was used in the MobiClique [27] experiments to evaluate mobile social networking regarding DTN.

3.5.1 Architecture

Haggle proposes a data-centric, layerless, event-driven architecture that provides functionality for neighbour discovery, resource management, data storage and data dissemination in an opportunistic environment. Applications can be built on top of Haggle and do not need to care about this functionality anymore.

In traditional host-centric networks, hosts are able to look up the address of the receiver and send data there using the underlying infrastructure. In an opportunistic environment without connection to the infrastructure, like DTNs, such a look-up is not possible at all times. Therefore, in a data-centric network, nodes map data to other nodes interested in that data as they are encountered.

To achieve this, the concept of a data object (DO) is introduced in Haggle. A DO is a tuple (metadata, data) where data is, e.g., an email or a picture, and metadata is a set of attributes of the form name = value. The concept of the DO replaces traditional packets, with packet headers being replaced by unified metadata. Each node then maintains a weighted relation graph of all its DOs, where the weight of a relation increases with the number of attributes two DOs share. In this graph, DOs can also represent nodes. These DOs are called node descriptions (ND). An ND may be related to other NDs, meaning that the two nodes share a number of interests, or to normal DOs, meaning that the node corresponding to the ND is interested in the DO. There are two different resolution primitives triggered in the relation graph as shown in Figure 3.2: Either an application inserts a new DO or a DO is received from a neighbour. In case of the former, the node resolves those nodes that are interested in the DO (i.e., the NDs that have a relation with the DO in the graph) and pushes the data to these nodes in order of rank, where the rank is determined by the weight of the relation. It is possible to limit the dissemination of the DO to the n top-ranked nodes or by setting a minimum weight requirement, e.g., to prevent network congestion.

Figure 3.2: Haggle Resolution Process [25]

² http://www.diaspora-news.net/2010/09/16/first-impressions-of-diaspora-dev-preview/, retrieved Nov. 8th 2010

³ http://www.haggleproject.org/, retrieved Nov. 8th 2010

In case of the latter, the node inserts the DO into its own graph and then resolves the DOs matching the received DO. The matched DOs are then sent to the corresponding node in order of rank. This is because connectivity might not be available for a long time and the best matches should be prioritized. E.g., this resolution happens if a node receives an ND from its neighbour.
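The first resolution primitive (an application inserts a new DO) can be sketched as follows. This is an illustration only and not Haggle's implementation: the weight is assumed to be simply the count of shared name = value attributes, node descriptions are modelled as plain attribute dictionaries, and the names `weight` and `resolve_targets` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Attributes = Dict[str, str]


def weight(a: Attributes, b: Attributes) -> int:
    """Number of name=value attributes that both metadata sets share."""
    return sum(1 for name, value in a.items() if b.get(name) == value)


def resolve_targets(data_object: Attributes,
                    node_descriptions: Dict[str, Attributes],
                    top_n: Optional[int] = None,
                    min_weight: int = 1) -> List[Tuple[str, int]]:
    """Rank interested nodes by relation weight, highest first.

    `top_n` and `min_weight` are the two dissemination limits from the text.
    """
    ranked = [(node, weight(data_object, nd))
              for node, nd in node_descriptions.items()]
    ranked = [(node, w) for node, w in ranked if w >= min_weight]
    ranked.sort(key=lambda item: (-item[1], item[0]))  # ties broken by name
    return ranked if top_n is None else ranked[:top_n]


photo = {"type": "picture", "topic": "football", "city": "Goettingen"}
nodes = {
    "alice": {"topic": "football", "city": "Goettingen"},
    "bob": {"topic": "football"},
    "carol": {"topic": "chess"},
}

print(resolve_targets(photo, nodes))           # [('alice', 2), ('bob', 1)]
print(resolve_targets(photo, nodes, top_n=1))  # [('alice', 2)]
```

Carol shares no attribute with the photo and is therefore never a target; limiting to the top-ranked node restricts the push to Alice.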

Forwarding. In Haggle, there are two ways of forwarding DOs to other nodes. Firstly, there is interest forwarding, in which DOs are sent to an interested node as it is encountered. This is epidemic dissemination within the interest community. Secondly, there is delegate forwarding, in which a DO may be forwarded to a node that is not a member of the interest community if it is likely that this will improve the dissemination within the interest community. For instance, this may be used if an interest community is not connected. Delegate forwarding is realised in PROPICMAN [24], the forwarding protocol implemented in the Haggle reference implementation.

Node Architecture. To realise the data-centric concept, the authors propose a layerless node architecture as depicted in Figure 3.3, consisting of a kernel, managers and modules. In this architecture the kernel implements an event queue (EQ) where managers or external sockets may register events. Moreover, it provides a data store (DS). The DS is read-accessible by all managers and stores the relation graph with all DOs, NDs and attributes. Each of the managers has tasks and responsibilities. According to these, a manager has write-access to some parts of the DS. Managers are also interested in events appearing in the EQ.

Figure 3.3: Haggle Node Architecture [25]
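The kernel/manager interplay can be illustrated with a small event loop. The following is a hedged sketch, not Haggle's code: the class `Kernel`, its methods, the event names `do_received` and `do_stored`, and the two example managers are hypothetical, and access rights on the data store are not modelled.

```python
from collections import defaultdict, deque
from typing import Any, Callable, DefaultDict, Deque, Dict, List, Tuple


class Kernel:
    """Toy kernel: an event queue (EQ) plus a data store (DS)."""

    def __init__(self) -> None:
        self.event_queue: Deque[Tuple[str, Any]] = deque()
        self.interests: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)
        self.data_store: Dict[str, Any] = {}

    def register_interest(self, event_type: str,
                          handler: Callable[[Any], None]) -> None:
        """A manager declares interest in one type of event."""
        self.interests[event_type].append(handler)

    def post(self, event_type: str, payload: Any) -> None:
        """Managers or external sockets append an event to the EQ."""
        self.event_queue.append((event_type, payload))

    def run(self) -> None:
        """Dispatch events in FIFO order until the queue is empty."""
        while self.event_queue:
            event_type, payload = self.event_queue.popleft()
            for handler in self.interests[event_type]:
                handler(payload)


kernel = Kernel()
log: List[str] = []


def data_manager(data_object: dict) -> None:
    """Stores an incoming DO and announces that it is available."""
    kernel.data_store[data_object["id"]] = data_object
    kernel.post("do_stored", data_object["id"])


def forwarding_manager(do_id: str) -> None:
    """Reacts to newly stored DOs; here it only records the event."""
    log.append(f"consider forwarding {do_id}")


kernel.register_interest("do_received", data_manager)
kernel.register_interest("do_stored", forwarding_manager)
kernel.post("do_received", {"id": "do-1", "metadata": {"topic": "football"}})
kernel.run()
print(log)  # ['consider forwarding do-1']
```

The managers never call each other directly; they are coupled only through events, which is the essence of the layerless design described above.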

3.5.2 Evaluation

The authors conduct experiments focusing on several main criteria of mobile communications. Regarding power consumption, they find that with WiFi in best performance mode and broadcast beacons sent every 5 seconds, the battery of the HTC Touch phone used only lasts 2-3 hours, while it lasts for 10-14 hours in best battery mode. With Bluetooth scans at an average interval of 60 seconds, the battery lasted 7-10 hours. They also reason that Bluetooth is more suitable due to the increased range over WiFi best battery mode and an acceptable trade-off between battery consumption and service capabilities. Regarding the delivery success rate, Nordström et al. [25] find that it highly depends on the interaction with other devices. For example, a phone isolated in another room at the office only received 20% of the data after 7 hours.


4 Architecture Design

As discussed in Chapter 1, centralized OSNs suffer from multiple deficiencies weakening the protection of an individual user from being exploited by the OSN provider. Moreover, the user’s participation in the OSN is prevented under certain circumstances. To mitigate these shortcomings, the design of a completely decentralized, socio-aware P2P architecture as a foundation for the GEMSTONE project is presented in this chapter.

Firstly, use cases and requirements of an OSN will be reviewed in Section 4.1, revealing the functionality an OSN should provide. Subsequently, there will be a discussion of important design choices in Section 4.2, dealing with the question of how to realize the required functionality. An overview of the system will be given in Section 4.3, describing the relations between a node and applications as well as between the nodes themselves. Finally, the design of a single node architecture will close this chapter.

4.1 Use Cases and Requirements

The development of an OSN requires reflecting on the use cases of such a network. The use cases describe which actions a participant of the network may want to take. Based on these use cases, the required functionality of the network can be derived. In addition, general requirements on OSNs are discussed.

4.1.1 Joining the Network

To participate in OSNs, the users have to be able to create accounts and join the network, independent of a specific location.

Derived Requirements. The system has to take care of account creation and provide a procedure to join the OSN. This procedure has to be available unconstrained by the user’s location.


4.1.2 Building a Social Graph

As social networks are mappings of the social contacts of their users, the users maintain relationships with each other. These relationships may be on different levels, since each user maintains a couple of close relationships, for example to his family and a few close friends, but also a lot of superficial acquaintances. As time passes, relationships between users change as new friendships are established (e.g., by looking up old friends), while others end.

Derived Requirements. Users have to be able to search for other users as well as to add and remove friends. Moreover, it has to be possible to categorize friends (e.g., “close friend”, “family”, or “colleague”).

4.1.3 Exchange of Social Context

A social network gives users the opportunity to exchange their social context after they have joined the network successfully. Users want to share a certain, flexible amount of information about themselves with other users. This information contains personal profile information (e.g., age, interests, educational information, friend lists), pictures, videos or memberships in groups. However, users might not want to share all of their social context with all users of the OSN, but only with their friends or family. As the social context changes over time, users want to be able to update their interests, educational information et cetera from time to time. From another perspective, users want to access the information shared by others.

Derived Requirements. In an OSN the social context of a user needs to be stored in a way that guarantees availability. Functionality to update the data is required. Access to this data has to be controllable at a fine granularity based on the type of the social relation between the data requester and the data owner. Users must be able to request the information that is shared by other users.

4.1.4 Group Contents

As mentioned above, users can be members of groups, some of which may be private (i.e., invite only). Inside these groups, discussions related to the groups’ topic are held. Besides that, groups maintain a description of their purpose and a list of users who are subscribed to that particular group as publicly available information. One or multiple users act as moderators/group admins.

Derived Requirements. The discussions need to be available and up-to-date for all group members at any time. Content and group structures must not be lost when a group member leaves the group. There has to be at least one admin/moderator at any time. Private groups need some kind of invitation/authentication process.

4.1.5 General Requirements

Besides requirements based on the use cases described above, there is general functionality the system should provide.

Privacy. Each user should keep control over his data and decide which information he wants to share. Hence, the sensitive data (e.g., social context, user status messages) of users has to be stored in a way that preserves privacy while satisfying the above requirements. While a message is being sent from one user to another, no third party should be able to read its content.

Synchronisation. Users of an OSN may have multiple devices. These devices might be mobile (e.g., mobile phone, laptop) or stationary (e.g., desktop computer). To get the most recent data available on all the devices of a user, a synchronisation process is required.

4.2 Important Design Choices

This section discusses the important design choices that the development of GEMSTONE faces in order to fulfill the requirements derived in Chapter 4.1. This includes decisions on decentralized data storage, data encryption and data dissemination.

4.2.1 Decentralized Data Storage

Due to the decision in favor of decentralized data storage as pointed out in Chapter 1, an option on how to realize that type of storage has to be chosen. There are several options available.


(1) Maintenance of own Data only. In this approach, each user only maintains his own data on his local device(s).

(2) Personal Storage Space for each User. This approach is implemented by Persona [2] as described in Chapter 3.3. Each participant of the decentralized OSN can choose a personal storage space to store his data on. Access to this storage space is granted or denied based on an ACL. The storage space is a highly available node, most likely a server.

(3) Storage of Data in DHT. A different solution on where to store the data is to store it completely in a DHT. Since each DHT consists of (key, value) entries, each user stores his data as the value field for his globally unique key.

(4) Storage on Mirroring Nodes. A modification of the above procedure is to use mirroring nodes that store the personal data on behalf of the local node. In this approach, the DHT serves as a look-up directory only, so that each requesting node can look up information on where to find the requested data if the local node itself is not reachable.

Some of these approaches have major weaknesses: In (1), data is not available as soon as the maintaining user is not online anymore. Given that users in an OSN are online only periodically [15], this leads to low data availability. In (3), all profile data has to be moved to a different node as soon as the node responsible for that entry quits the DHT. In scenarios with high node churn, this leads to huge amounts of traffic within the DHT and is therefore not feasible, especially in mobile networks, in which nodes may have limited resources. This effect is worsened because mirroring in the DHT is still needed for the case of unexpected node departures.

(2) seems to be a well-thought-out approach to guarantee high data availability without producing a lot of overhead. However, it is unlikely that every user can provide a reliable storage service to maintain the system. This would only fit the technically inclined users of the OSN. As described in 3.3, the realization of this data storage procedure would require payments from users or at least advertisements for those users who choose a public storage service. In current centralized OSNs, advertising is done in a personalized fashion, which is more valuable for investors [16]. If personal data is stored in an encrypted fashion as proposed in [2], it is no longer possible to personalize advertising. Hence, it is unlikely that investors will switch from current OSNs to Persona storage services.

Therefore, we follow (4) for our system. Each node selects a number of mirroring nodes to achieve high data availability while not overburdening the DHT. For details on how the storage process is designed, please refer to 4.3. However, it is even possible to implement a system similar to (2) in GEMSTONE. Please refer to Chapter 6 for further details.
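The look-up behaviour of option (4) can be sketched as follows. The sketch is a simplified illustration and not the GEMSTONE implementation: the DHT is modelled as a dictionary that maps an owner to the list of its mirroring nodes, node availability is a plain set, and the function name `fetch_profile` is hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def fetch_profile(owner: str,
                  directory: Dict[str, List[str]],
                  online: Set[str],
                  stores: Dict[str, Dict[str, bytes]]
                  ) -> Optional[Tuple[str, bytes]]:
    """Try the owner first, then the mirrors listed in the directory.

    Returns (serving node, encrypted data) or None if nobody is reachable.
    """
    candidates = [owner] + directory.get(owner, [])
    for node in candidates:
        if node in online and owner in stores.get(node, {}):
            return node, stores[node][owner]
    return None


# The directory (stand-in for the DHT) stores only WHERE the data lives.
directory = {"alice": ["bob", "carol"]}

# Every mirror holds Alice's data in encrypted form.
stores = {
    "alice": {"alice": b"<encrypted profile>"},
    "bob": {"alice": b"<encrypted profile>"},
    "carol": {"alice": b"<encrypted profile>"},
}

print(fetch_profile("alice", directory, {"carol"}, stores))
# ('carol', b'<encrypted profile>')
print(fetch_profile("alice", directory, set(), stores))
# None
```

Because the directory holds only references, a node leaving the DHT requires moving a small entry rather than the profile data itself, which is the advantage over option (3) argued above.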


4.2.2 Using Attribute Based Encryption

How to deal with personal data is one of the most important questions in OSNs. In a centralized OSN, all data is stored on the OSN provider’s servers, with the provider being responsible for the security of that data. As pointed out in Chapter 1, there were numerous leaks of private data in the past. In a decentralized OSN, all data is basically accessible to everyone. This is why encryption is needed in our system.

One option to achieve encryption is by relying on a Public Key Infrastructure (PKI). Each user has a secret/public-key pair to encrypt and decrypt profile data. However, this only grants access to personal data on an “all-or-nothing” basis: A requesting user either has the right to access the personal data of a user completely or has no right to access any data of that user at all.

Since we want to provide fine access granularity, PKI alone does not suffice and needs to be combined with ABE as introduced in 2.5. This way, a user can determine what portion of his personal data can be accessed by a certain group of users based on an attribute he can define himself (define-relationship encryption routine). Also, (private) groups can be defined by encrypting data with the assign rights to group routine.

4.2.3 Data Dissemination

Users should be able to participate in our system independently of their location. Therefore, decisions on how to communicate if infrastructure is available as well as in the case of only intermittent connectivity have to be made. Also, the form of communication between applications has to be determined.

Use Available Infrastructure or DTN Routing

Our design choice for the case of available infrastructure is easy: If such infrastructure can be used, it is used in preference to any opportunistic networking that may be available as well. If, however, no infrastructure is available, DTN forwarding is required. We chose to integrate Haggle [25] and its PROPICMAN [24] forwarding algorithm as described in Chapter 3.5 over implementing a DTN forwarding algorithm ourselves. This way, we can easily integrate DTN routing. If new research questions arise, it is still possible to integrate a custom algorithm into Haggle. A detailed description of how data is disseminated can be found in Chapter 4.4.

Transparent Data Exchange

Applications running on top of GEMSTONE should be able to implement their own functionality that goes beyond the functionality of our system. GEMSTONE allows data exchange transparent to the middleware between two nodes running the same application. Therefore, each application is able to create arbitrary content that is forwarded to its destination by the middleware without requiring the middleware to be aware of the structure of the content or the content itself. For details on the implementation of this design choice please refer to Chapter 5.
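The idea of transparent data exchange can be sketched with an envelope whose payload is opaque to the middleware. The following is an illustration under assumptions and not the GEMSTONE prototype: the names `Envelope`, `forward`, `encode` and `decode` are hypothetical, JSON is chosen arbitrarily as the application-level encoding, and delivery is simulated by an in-memory dictionary.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Envelope:
    app_id: str       # which application the payload belongs to
    destination: str  # target node
    payload: bytes    # opaque to the middleware


Inboxes = Dict[Tuple[str, str], List[bytes]]


def forward(envelope: Envelope, inboxes: Inboxes) -> None:
    """Middleware side: deliver by (node, application), never parse payload."""
    key = (envelope.destination, envelope.app_id)
    inboxes.setdefault(key, []).append(envelope.payload)


def encode(message: dict) -> bytes:
    """Application side: any encoding works, as long as the peer knows it."""
    return json.dumps(message, sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> dict:
    """Application side on the receiving node."""
    return json.loads(payload.decode("utf-8"))


inboxes: Inboxes = {}
forward(Envelope("chess", "bob", encode({"move": "e2e4"})), inboxes)

received = decode(inboxes[("bob", "chess")][0])
print(received)  # {'move': 'e2e4'}
```

Only `encode` and `decode` know the message format; the `forward` function would work unchanged for any other application.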

4.3 System Overview

With our design choices in mind, we propose GEMSTONE, a fully decentralized, privacy-preserving and socio-aware middleware. This section presents a general system overview describing two core relations: firstly, the relation of nodes among each other and secondly, the relation between nodes and applications. Those relations will be discussed in Chapters 4.3.1 and 4.3.2 respectively.

4.3.1 A Decentralized System

Since we chose to develop a decentralized system in 4.2, there is no central server in GEMSTONE, as opposed to current OSNs like Facebook or Twitter. As Figure 4.1 shows, the nodes alone form the network in GEMSTONE. A node can be run on any form of networking-enabled device. This may be a desktop PC or laptop as well as a PDA or mobile phone.

Also, highly available nodes with appropriate resources (e.g., servers) may act as GEMSTONE nodes. The effect of this is discussed in Chapter 6. Note however, that this is different from a server in a centralized OSN, since such a server is logically a standard node without any privileges like access to personal data (due to the encryption of this data). This guarantees the privacy of user data and eliminates the problem of personal data misuse by an OSN provider. Therefore, providers pursuing commercial interests based on user data will not allocate any highly available nodes. However, research has shown that there are altruistic users in P2P networks [13, 17] who might want to act as such a node without personal gain.

All nodes together form the system in a ring structure. However, only nodes with connectivity to the infrastructure form the DHT ring serving as directory as described in Chapter 4.2, while nodes only connected to the overlay via DTN do not execute any administrative tasks. This is because such nodes are unreliable and would therefore decrease the system’s robustness. Still, these nodes can participate in GEMSTONE without infrastructure connectivity via DTN forwarding.

Figure 4.1: The GEMSTONE System (several GEMSTONE nodes, each running a different subset of the applications App1 to App5; one node with no active applications)


4.3.2 Middleware and Applications

Each node may run several applications on top of GEMSTONE as shown in Figure 4.1. These applications only communicate with the middleware. The middleware handles all application requests and, if necessary, forwards requests to other nodes. These requests include all functionality discussed in Chapter 4.1, e.g., adding or removing friend relations, searching for other users, requesting profiles of other users or updating the social context of the node.

Thanks to transparent data exchange as introduced in Chapter 4.2, the requests reachingbeyond the functionality of GEMSTONE can be of arbitrary form, only constrained by theability to be decoded by the application on the other end of the request.

Finally, note that a node does not have to run any application, e.g., if the node wants toact as an altruistic node as described above.

4.4 Node Architecture

Now that the architecture of the system as a whole has been described, this section presents the architecture of a single node. A GEMSTONE node consists of several modules as depicted in Figure 4.2, each of which is responsible for one well-defined area of tasks. In our design, there are currently four modules, which are described below. For the actual implementation of these modules in the GEMSTONE prototype, please refer to Chapter 5.

[Labels in Figure 4.2: Application; Social Graph Maintenance; Decentralized Data Storage; Node Selection; Trust; Social Relation; History; Decision; Security & Privacy; Data Dissemination; Infrastructure; DTN.]

Figure 4.2: GEMSTONE Node Architecture

4.4.1 Social Graph Maintenance

The Social Graph Maintenance module is responsible for all requests dealing with manipulations of the social graph. This includes adding, deleting and editing relations.

Adding Relations

Since relations are mutual in an OSN, the addition of a relation includes a process of requesting, confirming and denying relations as well. Therefore, a friend request has to be forwarded to the target of the new relation. On the other side of the request, the user has to be informed of the requested relation. The middleware has to act according to the decision of the user, who might confirm or deny the relation. In both cases, the decision must be forwarded to the issuer of the request and be dealt with on the issuer’s side. This procedure is depicted in Figure 4.3.

[Figure 4.3 shows a requesting node and a target node, each running GEMSTONE with applications on top, and the following steps:]

(1) User adds a new relation via App2
(2) Add unconfirmed relation to local social graph
(3) Send friend request to target
(4) Add unconfirmed relation to local social graph
(5) Present request to user
(6) Update local social graph according to decision
(7) Inform requester about decision
(8) Update local social graph according to decision
(9) Present decision to user

Figure 4.3: Relation Request Process
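As an illustration, the handshake of Figure 4.3 can be sketched in JAVA as follows. This is a minimal sketch and not the prototype's code: the class and method names are hypothetical, the network transfer between the two nodes is replaced by direct method calls, and removing a denied relation from both graphs is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the relation request handshake (not the thesis code). */
final class RelationRequestSketch {
    enum State { UNCONFIRMED, CONFIRMED }

    /** A node with its local social graph: peer social ID -> relation state. */
    static final class Node {
        final String socialId;
        final Map<String, State> graph = new HashMap<>();

        Node(String socialId) {
            this.socialId = socialId;
        }

        /** Steps 1-4: the pending relation is recorded on both sides. */
        void requestRelation(Node target) {
            graph.put(target.socialId, State.UNCONFIRMED);  // step 2 (requester)
            target.graph.put(socialId, State.UNCONFIRMED);  // steps 3-4 (send elided)
        }

        /** Steps 6-8: the target decides and both graphs are updated. */
        void decide(Node requester, boolean confirm) {
            apply(graph, requester.socialId, confirm);      // step 6 (target)
            apply(requester.graph, socialId, confirm);      // steps 7-8 (requester)
        }

        // Assumption: a denied relation is simply dropped from the graph.
        private static void apply(Map<String, State> g, String peer, boolean confirm) {
            if (confirm) {
                g.put(peer, State.CONFIRMED);
            } else {
                g.remove(peer);
            }
        }
    }
}
```

Steps 5 and 9 (presenting the request and the decision to the user) are the responsibility of the applications and are therefore left out of the sketch.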

Deleting Relations

The deletion of a social relation is performed analogously to the process of adding one. However, ending a relation does not require the confirmation of the target.

Editing Relations

In our system, editing a relation means changing the category of that relation, e.g., when a work colleague becomes a good friend. Editing relations is required to achieve flexible granularity of data access as described in Chapter 4.2, since access is based on attributes (i.e., categories) a related user owns. Therefore, an application informs the middleware of the changed relation, which then updates the social graph. As relations are assessed differently among users, this is a purely local process.

Access to Data

With ABE in mind, all manipulations on the social graph require the assignment of additional or fewer rights to users as described in Chapter 2.5, depending on whether the relation to a user was intensified (adding, editing “upwards”) or loosened (deleting, editing “downwards”). This is handled by the Privacy module specified in Chapter 4.4.3.

Synchronization with Data Storage Module

Each change in the social graph is also an update on the basic GEMSTONE profile. As an update on the basic profile always implies storing it anew on the mirroring nodes (see Chapter 4.4.2), the Data Storage Module is informed of any updates.

4.4.2 Decentralized Data Storage

This module handles all tasks required to achieve a robust system with regard to data storage in a decentralized fashion. It is responsible for all operations on the DHT and for distribution of the node’s data to its mirroring nodes as well as storing data for other nodes. Furthermore, it provides functionality to update the local node’s profile.

DHT Operations

Operations on the DHT are multifaceted. Firstly, the module realizes the join requirement described in Chapter 4.1 via a procedure that joins a node into the overlay. To this end, the node needs to know at least one member of the overlay in order to bootstrap. This is achieved by a list of bootstrapping nodes that is delivered with the system. The node is henceforward identified by a SHA-1 hash of its public key, which serves as its unique social ID.

Secondly, the module implements a look-up functionality for the DHT which is used to learn about other participants of the network, their interfaces and their mirroring nodes. This way, nodes can request personal data from the mirroring nodes of their request target. The look-up functionality is also important for the node selection process as presented in Chapter 6, as nodes learn about candidates for decentralized data storage this way.

Thirdly, the module takes care of keeping the DHT entry of the local node up-to-date, so that other nodes can communicate with the local node after looking this data up. Moreover, the list of mirroring nodes tells other nodes where to send data destined for the local node in case of its absence.

Mirroring of Personal Data

One of the central aspects of GEMSTONE is that personal data of the local node N is mirrored at a set of nodes MN = {M1,...,Mn} to achieve high data availability. The Decentralized Data Storage module handles this task in the following way: N requests to store its profile at each node in MN as depicted in Figure 4.4. After receiving an ACK it directly sends its profile to each node in the set (Steps 1-3 in Figure 4.4). Afterwards, N inserts information about which nodes are mirroring for it in the DHT (Step 4). The inserted (key, value) pair is (social ID(N), social ID(M1); ...; social ID(Mn)). In order to keep data traffic low, N keeps information about MN in its local storage. That way, N does not need to update the DHT after every update on its profile, but only if MN has changed, which it detects by comparing the local information about MN with the one stored in the DHT.

[Figure 4.4 shows node N, a requesting node R, the mirroring nodes M1 to Mn and the DHT entry (socialID(N), socialID(M1); ...; socialID(Mn)), with the following steps:]

(1) N requests to store data at MN = {M1...Mn}
(2) MN send ACK
(3) N sends data directly to MN
(4) N stores info about MN in DHT on responsible node
(5) N disconnects
(6) R fails to connect to N
(7) R looks up MN in DHT
(8) R requests data from each node in MN
...
(10) N reconnects, looks up MN in DHT
(11) N gets profile and updates from any node in MN

Figure 4.4: Mirroring Process

If N is now disconnected, a requesting node R will not be able to request the profile directly from N. R will then look up MN in the DHT (Step 7) and retrieve the data from one of the Mi ∈ MN (Step 8).

When N reconnects, it cannot simply request its profile from the Mi ∈ MN recorded in its local storage. This is due to a synchronisation problem: N might have used another device in between, and the information about MN might be outdated on the particular device N is using at that moment. Therefore, N needs to look up MN in the DHT again (Step 10) and request its own data from one of the Mi ∈ MN afterwards (Step 11).
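The bookkeeping around the DHT entry for MN can be sketched as follows. This is only an illustration under stated assumptions: the DHT is replaced by an in-memory map, and the class and method names are hypothetical and do not stem from the prototype. The sketch shows that the DHT is written only when the locally known set MN differs from the stored entry, and that both R (Step 7) and a reconnecting N (Step 10) use the same look-up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the mirror directory; a Map stands in for the DHT. */
final class MirrorDirectorySketch {
    /** Key: social ID of N. Value: social IDs of its mirroring nodes MN. */
    private final Map<String, Set<String>> dht = new HashMap<>();
    private int dhtWrites = 0;

    /** Step 4: publish MN, but write only if it differs from the stored entry. */
    boolean publishIfChanged(String socialId, Set<String> localMirrors) {
        Set<String> stored = dht.get(socialId);
        if (localMirrors.equals(stored)) {
            return false; // MN unchanged, no DHT traffic needed
        }
        dht.put(socialId, new LinkedHashSet<>(localMirrors)); // defensive copy
        dhtWrites++;
        return true;
    }

    /** Steps 7 and 10: look up the current set of mirroring nodes. */
    Set<String> lookupMirrors(String socialId) {
        Set<String> stored = dht.get(socialId);
        return stored == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(stored);
    }

    /** Number of writes that actually reached the (simulated) DHT. */
    int dhtWrites() {
        return dhtWrites;
    }
}
```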

At any time, one or multiple of N’s mirroring nodes can be unavailable. This reveals the problem of synchronisation between the mirroring nodes, as the versions of N’s personal data may differ among them.

Update Concept

GEMSTONE solves this problem with the concept of updates: each node that mirrors the personal data of another node also stores updates destined for that node. These updates are request objects which, e.g., indicate a new friend request. Since each Mi ∈ MN has mirroring nodes itself, these nodes store updates destined for N as well, as these updates are encapsulated in requests to Mi.

If Mi now reconnects, it receives all updates destined for itself, but also all updates destined for the nodes it mirrors for. This way, all mirroring nodes hold the same tuple of (data, updates) for N whenever they are online.
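A mirroring node therefore needs a small store of pending updates per mirrored node. The following sketch illustrates one possible shape of such a store. It is an assumption-laden illustration: the class name is hypothetical, updates are represented as plain strings instead of request objects, and persistence to disk is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the update store kept by a mirroring node. */
final class UpdateStoreSketch {
    /** Pending updates, keyed by the social ID of the node they are destined for. */
    private final Map<String, List<String>> pending = new HashMap<>();

    /** Called when an update for a mirrored (possibly offline) node arrives. */
    void store(String targetId, String update) {
        List<String> list = pending.get(targetId);
        if (list == null) {
            list = new ArrayList<>();
            pending.put(targetId, list);
        }
        list.add(update);
    }

    /** Returns a copy of all updates for the target, in order of arrival. */
    List<String> updatesFor(String targetId) {
        List<String> list = pending.get(targetId);
        return list == null ? new ArrayList<String>() : new ArrayList<>(list);
    }
}
```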

Node Selection

With the storage process in mind, the set of mirroring nodes MN needs to be determined in a way that selects the most suitable nodes for this task. This influences multiple key properties of the whole system:

Scalability. The more available the data stored on each Mi ∈ MN is, the fewer replicas will be needed in the system to achieve high data availability. Therefore, traffic decreases, as personal data has to be distributed to fewer nodes. Also, if a node has established a well-known set of nodes, it does not need to update its mirroring node information in its DHT entry frequently.

Robustness. The more suited the nodes are for the mirroring task, the lower the chance of a lost profile is. This increases data availability and therefore robustness.

The node selection process is one of the main research questions of this thesis and is discussed in Chapter 6. Group structures and content can be stored at mirroring nodes in exactly the same way.

4.4.3 Security and Privacy

Concerning security and privacy in our system, we propose to make use of existing solutions.

Preserving Privacy

To achieve privacy, GEMSTONE, like Persona [2], relies on ABE. Among other advantages pointed out in Chapter 4.2, this allows access to personal data to be as flexible as needed.

This means that the GEMSTONE basic profile can be split up into multiple pieces (e.g., personal information, friend lists, groups). Each piece of data is encrypted with a symmetric key. This key is encrypted with an ABE key corresponding to the group (e.g., a certain category of social relations) allowed to read the data, with the group being specified by an AS. Given this encryption routine, leaking of private data is improbable.

However, relying on ABE may face challenges. Since Persona [2] does not evaluate ABE on critical features of mobile phones, the encryption process might be too costly in terms of bandwidth and power consumption. Most importantly, social relations change over time. This requires frequent re-keying, which might be too costly when considering mobile phones.

Authenticity of Users

Another critical point is the authenticity of users: Basically, an impostor could fake the identity of another person by providing wrong identity information and act maliciously in that person’s name.

However, this is a general problem of OSNs, in which there is no required real-life authentication.¹ A centralized real-life authentication as required in Safebook [10] would be obstructive to the deployment of a platform and furthermore contradict the intentions of a completely decentralized system like GEMSTONE.

Our solution is to use a trust and reputation (TR) system where users can rate other users in terms of authenticity, behaviour and more aspects. However, the design of such a system is out of the scope of this thesis, as it is prone to a number of attacks [34] and has to be well-designed to be robust regarding these attacks.

4.4.4 Data Dissemination

Lastly, the Data Dissemination module handles incoming as well as outgoing data. Basically, there are two tasks that need to be covered to handle all data: keeping track of active interfaces and mapping social IDs to addressable identifiers.

Keeping Track of Active Interfaces

The Data Dissemination module periodically checks which interfaces are active. This is required to provide the possibility of participation in the OSN even with only intermittent connectivity as indicated in Chapter 1. Depending on which interfaces are active, a node either joins the DHT ring (IP connectivity given) or tries to route messages via DTN forwarding based on Haggle [25] (no IP connectivity given). If multiple interfaces connect the local device with the infrastructure (e.g., WiFi and 3G), the most efficient one is chosen in terms of parameters like bandwidth or battery consumption.
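The decision described above can be sketched as a small selection routine. The sketch is hypothetical: the thesis does not specify how bandwidth and battery consumption are weighed, so a single numeric cost per interface is assumed here, and all names are chosen for illustration only.

```java
import java.util.List;

/** Hypothetical sketch of the interface selection; the cost model is an assumption. */
final class InterfaceSelectionSketch {
    enum Mode { DHT_RING, DTN_FORWARDING }

    /** An active interface with an assumed aggregate cost (lower is better). */
    static final class Iface {
        final String name;
        final boolean ipConnectivity;
        final double cost;

        Iface(String name, boolean ipConnectivity, double cost) {
            this.name = name;
            this.ipConnectivity = ipConnectivity;
            this.cost = cost;
        }
    }

    /** Returns the cheapest interface with IP connectivity, or null if there is none. */
    static Iface cheapestIp(List<Iface> active) {
        Iface best = null;
        for (Iface i : active) {
            if (i.ipConnectivity && (best == null || i.cost < best.cost)) {
                best = i;
            }
        }
        return best;
    }

    /** IP connectivity given: join the DHT ring. Otherwise: fall back to DTN forwarding. */
    static Mode chooseMode(List<Iface> active) {
        return cheapestIp(active) != null ? Mode.DHT_RING : Mode.DTN_FORWARDING;
    }
}
```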

¹ http://twitter.pbworks.com/Fakers, retrieved Nov. 8th 2010

[Figure 4.5 shows node N, a sending node R and the DHT entry (socialID(N), identifier1(N); ...; identifierN(N)), with the following steps:]

(1) N keeps an up-to-date list of its addressable identifiers in DHT
(2) R wants to send data to N
(3) R looks up N’s identifier list in DHT
(4) R sends data directly to N

Figure 4.5: Mapping of Identifiers

Identifier Mapping

Actually sending data to another user of GEMSTONE via an interface requires the mapping of a social ID to an addressable identifier. The resolution process works as shown in Figure 4.5. A node N keeps an up-to-date list of its addressable identifiers in the DHT. These entries have a certain Time-To-Live (TTL) for the case of a node being disconnected from the DHT or network, respectively, and therefore being unable to update its own identifier list. If another node R wants to send data (e.g., a profile update) to N, it checks its own connectivity and simply looks up N’s available identifiers in the DHT. Afterwards, it sends the data relying on routing protocols of the network stack depending on the technology used.
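The effect of the TTL can be illustrated with the following sketch. It is not taken from the prototype: the DHT is again replaced by an in-memory map, the names are hypothetical, and the current time is passed in explicitly so that the expiry behaviour is easy to follow. An entry that has not been refreshed within its TTL is treated as absent, which signals to the sender that the target is currently not reachable directly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the identifier mapping with TTL; a Map stands in for the DHT. */
final class IdentifierDirectorySketch {
    private static final class Entry {
        final List<String> identifiers;
        final long expiresAtMillis;

        Entry(List<String> identifiers, long expiresAtMillis) {
            this.identifiers = identifiers;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    /** Key: social ID. Value: addressable identifiers plus expiry time. */
    private final Map<String, Entry> dht = new HashMap<>();

    /** Step 1: a node publishes (or refreshes) its list of addressable identifiers. */
    void publish(String socialId, List<String> identifiers, long nowMillis, long ttlMillis) {
        dht.put(socialId, new Entry(new ArrayList<>(identifiers), nowMillis + ttlMillis));
    }

    /** Step 3: resolve a social ID; expired entries are dropped and yield an empty list. */
    List<String> lookup(String socialId, long nowMillis) {
        Entry e = dht.get(socialId);
        if (e == null || nowMillis >= e.expiresAtMillis) {
            dht.remove(socialId);
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(e.identifiers);
    }
}
```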

5 Implementation

This chapter describes the development of a first prototype of the GEMSTONE middleware. First, the goals of the implementation will be given. Afterwards, the implementation of the first GEMSTONE prototype will be presented, followed by a discussion of its capabilities. Also, a demo application utilizing the prototype will be shown. Finally, an example of how this implementation could be deployed will be illustrated.

5.1 Implementation Goals

This section specifies which parts of the architecture can be realized within the scope of this thesis. Hence, a discussion of the desired functionality of the prototype is required.

To stay in the scope of this thesis, not all of the modules designed in Chapter 4 can be implemented in more than a rudimentary fashion. However, there is some required functionality to enable the platform itself:

• Any new node has to be able to join the overlay. Hence, the directory for looking up information has to be implemented.

• Nodes have to be able to communicate with each other. Therefore, at least one technology for communication has to be implemented.

• Nodes have to be able to communicate with applications. This has to be possible in an easy-to-implement way, since this is one factor favouring a successful deployment of our system in the future. This is because developers will be discouraged from implementing any application if the development of applications is too complicated.

• Decentralized data storage has to be realized according to Chapter 4.4.2. Without a working data storage process our system is not feasible.

• In addition to data storage, manipulations on the social graph have to be possible. The platform will not evolve with missing functionalities to, e.g., add relations.

On account of this, the following implementation of modules as described in Chapter 4 emerges:

• Decentralized data storage will be supported.

• Social graph manipulations will be supported.

• Security and privacy are out of the scope of this thesis. Therefore, data will remain unencrypted.

• Regarding data dissemination, IP connectivity will be supported. The realization of Bluetooth forwarding is out of scope.

With this in mind, it has to be possible to easily exchange modules with versions implementing more functionality. Therefore, the prototype has to be implemented in a modular fashion. Also, it has to enable transparent application communication as introduced in Chapter 4.2.

5.1.1 Porting to Mobile Phones

Even though the first prototype will not be required to work on mobile phones, GEMSTONE has to run on these devices in the future. The implementation should therefore be easily portable to at least one platform. Hence, a choice for a mobile platform and a programming language has to be made.

With the openness of a system and the size as well as the activity of its community as key factors, we chose Android 2. The prototype implementation of GEMSTONE is written in JAVA, which is supported by the Android platform.

5.2 The GEMSTONE Node

This section presents the actual JAVA implementation of the prototype. For that purpose, we begin with an overview of the implementation before describing each module in detail.

5.2.1 Overview

An overview of our implementation is shown in Figure 5.1. The implementation can be divided into three blocks: 1) internal object communication; 2) object exchange with other nodes; 3) communication with applications.

[Figure 5.1 shows the following components:]

• Application A ... Application N, connected to the ApplicationGem by an ASCII protocol via TCP sockets
• ApplicationGem (interface between GEMSTONE and applications)
• GemManager (handles GemstoneObjects issued by modular Gems)
• SocialGraphGem (manages social relations and group structures)
• DataStorageGem (manages decentralized data storage)
• NodeSelectionGem (determines where data is stored)
• SecurityGem (provides encryption mechanisms)
• InterfaceGem (manages addressable interfaces)
• ObjectHandlerGem (interface between GEMSTONE and different connection technologies)
• IPv4ConnectionHandler, IPv6ConnectionHandler, BluetoothConnectionHandler, ...

[Connections shown: object exchange between Gems and GemManager; object exchange with other nodes.]

Figure 5.1: GEMSTONE Implementation Overview

Object Communication

The most important concept in our implementation is Object Communication. This implies that all communication within the GEMSTONE middleware is achieved via the exchange of objects. This is valid for internal communication between modules as well as for external communication with other nodes.

Each module is able to handle the sort of objects it is responsible for. Besides this, each module can generate any object destined for another module or for another node. All objects generated by modules or retrieved from an external node are handled and distributed to their destination module or node by the GemManager class.

The objects exchanged are implemented in the GemstoneObject class. Each object has the following fields:

• objectId - The object ID, pseudo-randomly generated

• timestamp - The object timestamp

• versionId - The GEMSTONE version ID

• isInternal - a flag indicating internal or external processing

• targetId - The social ID of the target

• sourceId - The local node’s social ID

• command - indicates which functionality of the handling module should be called

• targetAddr - The target’s addressable interfaces (may be null)

• sourceAddr - The local node’s addressable interface (may be null)

• appId - the creating application’s ID (may be null)

• payload - the payload to be sent (may be any kind of object) (may be null)

To put additional information into a GemstoneObject, some of the modules implement subclasses of GemstoneObject. For details, see the description of each package in Chapter 5.2.2.
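The field list above can be pictured as a plain serializable JAVA class. The following is a sketch under assumptions and not the prototype's source: the field types, the suffix Sketch in the class names, the example subclass field relationCategory and the helper GemstoneObjectCodec are hypothetical. The sketch also shows that a subclass survives a serialization round trip with its concrete type intact, which is what allows the receiving side to tell object kinds apart.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical sketch of the base object; field types are assumptions. */
class GemstoneObjectSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    long objectId;        // pseudo-randomly generated
    long timestamp;
    int versionId;
    boolean isInternal;   // internal or external processing
    String targetId;      // social ID of the target
    String sourceId;      // social ID of the local node
    String command;
    String[] targetAddr;  // may be null
    String sourceAddr;    // may be null
    String appId;         // may be null
    Object payload;       // may be null; must itself be serializable to be sent
}

/** Hypothetical subclass carrying one additional field. */
class SocialGraphObjectSketch extends GemstoneObjectSketch {
    private static final long serialVersionUID = 1L;
    String relationCategory;
}

/** Helper for the round trip, using standard JAVA object serialization. */
final class GemstoneObjectCodec {
    static byte[] toBytes(GemstoneObjectSketch o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bos.toByteArray();
    }

    static GemstoneObjectSketch fromBytes(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (GemstoneObjectSketch) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```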

Object Exchange with Other Nodes

Since all communication is done via objects, these objects are also exchanged with other nodes. This is done by the ObjectHandlerGem class as depicted in Figure 5.1. As there are two directions of data exchange, these are treated as follows:

(1) Outgoing Objects. The ObjectHandlerGem retrieves an object destined for an external node from the GemManager. If the object’s targetAddr field is null, the target’s addressable identifiers have to be looked up and inserted into the object. Depending on these identifiers, the ObjectHandlerGem chooses an available technology and transmits the data via the appropriate ConnectionHandler. If the target is unavailable, it stores the object as an update at the target’s mirroring nodes.

(2) Incoming Objects. If a node receives an object, this is done by one of the ConnectionHandlers as well. These handlers forward the object to the ObjectHandlerGem, which stores all incoming objects in a queue that is accessed by the GemManager.

Details on both are presented in the description of the Object Handling Package in Chapter 5.2.2.

Communication with Applications

Communication between the middleware and applications in GEMSTONE is implemented via a text-based ASCII protocol. This protocol is executed between applications running on top of our system and the ApplicationGem, which is then responsible for parsing the protocol data into GemstoneObjects and forwarding them to the GemManager. For details on this, the reader may refer to the description of the Application Package in Chapter 5.2.2.

5.2.2 Implementation of Modules

Considering the concepts of our implementation, this section presents the implementation of each module by one or multiple packages:

• Decentralized Data Storage: core.datastorage, core.nodeselection

• Social Graph Maintenance: core.socialgraph

• Security and Privacy: Not implemented in the first prototype.

• Data Dissemination: core.connection, core.interfaces, core.objecthandling

All packages except for the Core Package are organized in the same way: Each package provides a JAVA interface which requires the implementation of methods that are essential to run the system. These interfaces are called Gems. In each package, there is one implementation of the corresponding Gem. To stay in the scope of this thesis, some of these Gems will be wrapper classes, providing the basic functionality required to run the system. However, it is easily possible to exchange the Gems for other implementations in the future. E.g., the BasicNodeSelectionGem that currently manages the node selection process can be exchanged for a version providing a more sophisticated selection process as described in Chapter 6.
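The exchangeability of Gems follows from programming against interfaces. The following sketch illustrates the idea for node selection. It is purely hypothetical: the method selectMirrors, its signature and all class names are invented for the example and are not the interfaces of the prototype.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Gem interface; the method and its signature are assumptions. */
interface NodeSelectionGemSketch {
    List<String> selectMirrors(List<String> candidates, int count);
}

/** A basic implementation that simply takes the first candidates. */
final class FirstCandidatesSelection implements NodeSelectionGemSketch {
    @Override
    public List<String> selectMirrors(List<String> candidates, int count) {
        int n = Math.min(count, candidates.size());
        return new ArrayList<>(candidates.subList(0, n));
    }
}

/** A user of the Gem that depends only on the interface. */
final class DataStorageSketch {
    private NodeSelectionGemSketch selection;

    DataStorageSketch(NodeSelectionGemSketch selection) {
        this.selection = selection;
    }

    /** Swaps in another implementation without touching the calling code. */
    void exchangeSelection(NodeSelectionGemSketch replacement) {
        this.selection = replacement;
    }

    List<String> chooseMirrors(List<String> candidates, int count) {
        return selection.selectMirrors(candidates, count);
    }
}
```

Because DataStorageSketch only knows the interface, a more sophisticated selection strategy can replace the basic one without changes to its callers.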

Moreover, since communication in GEMSTONE is done via objects, some packages provide extensions of the GemstoneObject as introduced in the description of the Core Package below. This may be done to be able to provide information going beyond the capabilities of the GemstoneObject to the corresponding Gem.

This section is organized as follows: Each package and its functionality will be described. This includes a detailed description of each package’s main classes as well as interactions between packages. At the beginning, the modules realizing all sorts of communication, internal as well as external (remote nodes and applications), will be discussed. Afterwards, a detailed view on the process of decentralized data storage will be given, followed by a description of the implementation of social graph maintenance functionality.

Core Package

The UML diagram of the Core Package is depicted in Figure 5.2. This package includes the JAVA executable class Gemstone. When started, this class creates an instance of GemstoneNode and GemManager. GemstoneNode is the GEMSTONE representation of the local node, including its PKI key pair and its Social ID, which is a SHA-1 hash of the public key.
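Deriving a Social ID from a public key only requires the standard JAVA security API. The sketch below is an illustration with assumptions: the key algorithm (RSA, 2048 bit), the use of the encoded form of the public key as hash input and the hexadecimal text representation are not stated in this chapter. A hexadecimal SHA-1 digest is 40 characters long, which would be consistent with the 40-byte buffer used for the target ID in Listing 5.1.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: Social ID as hex-encoded SHA-1 hash of a public key. */
final class SocialIdSketch {
    /** Hashes the encoded public key and returns 40 lowercase hex characters. */
    static String socialId(byte[] encodedPublicKey) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(encodedPublicKey);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is a standard algorithm name
        }
    }

    /** Generates a key pair; RSA with 2048 bit is an assumption for this example. */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A node would then call SocialIdSketch.socialId(keyPair.getPublic().getEncoded()) once after generating its key pair.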

GemManager is, as described in Chapter 5.2.1, the central class in GEMSTONE. This class is responsible for handling all instances of GemstoneObject. The GemstoneObject class implements the Serializable interface of java.io. This is required to enable the transmission of these objects over Transmission Control Protocol (TCP) sockets. On creation (i.e., start of the middleware), the GemManager follows this protocol:

Figure 5.2: Package Core - UML Class Diagram


(1) Create all Gems. In GEMSTONE, each module is realized by one or multiple packages, as described above. Within each package there is one class called Gem that executes the functionality the package has to provide. The creation of an instance of each Gem is the first responsibility of the manager.

(2) Determine all active interfaces. Afterwards, the manager determines all active interfaces via querying the InterfaceGem.

(3) Join the overlay and listen to incoming connections. By executing functionality of the DataStorageGem, the manager joins the node into the overlay. After finishing the join procedure, the manager commands the ConnectionGem to listen for incoming requests on all interfaces.

(4) Retrieve personal data from mirroring nodes. Using the DataStorageGem again, the manager now determines which nodes are mirroring for itself and requests the transmission of the local node’s basic profile and updates.

(5) Check mirroring status. The manager then checks if the local node is still mirroring data for the nodes listed in the mirroring directory. This is done by commanding the DataStorageGem to look up the mirroring nodes of each node the local node currently mirrors. If the local node is not listed as mirroring node anymore, it may delete the profile as well as updates to save disk space. It is of no use to cache such a profile, as it may be outdated on a later request. Afterwards, the node has reached a stable state and is successfully booted into the overlay.

(6) Process objects. On reaching the stable state, the manager enters the phase of processing objects. This processing is again done for outgoing and incoming objects. For the former, the manager looks up the target’s addressable identifiers in the DHT using the DataStorageGem (if required) and forwards these objects to the ObjectHandlerGem. For the latter, the manager accesses the queue of objects provided by the ObjectHandlerGem. This queue is implemented as a blocking java.util.concurrent.ConcurrentLinkedQueue to avoid any busy waiting. Each object flagged as internal - this may be an object generated on the local node by any of the Gems or an externally received object - is processed in the following way:

Firstly, it is checked whether the object is targeted at the local node. If that is not the case, the object must be destined for one of the nodes for which the local node serves as a mirroring node. In that case, the object is stored as an update for the targeted node on the local node’s disk.

If the object is targeted at the local node, the manager determines its type (i.e., the responsible Gem) using the java.lang.Object.getClass() method. As mentioned above, the objects are instances of subclasses of GemstoneObject to provide more fields. Using this, the responsible Gem can be easily determined. The object is then handled by the responsible Gem.

E.g., a remote node might initiate a friend request to the local node. The incoming object is targeted at the local node and flagged as internal by the ObjectHandlerGem. The manager now executes java.lang.Object.getClass() on the object, which identifies the object as SocialGraphObject. The manager then forwards the object to the SocialGraphGem for further processing. An example for end-to-end object communication (request of a social relation) is shown in Figure 5.3.
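The processing of incoming objects can be sketched as a consumer loop over a queue plus a look-up from object class to responsible Gem. The sketch below is an illustration only, with hypothetical names and heavily simplified object classes. Note that ConcurrentLinkedQueue itself never blocks on an empty queue, so the sketch uses java.util.concurrent.LinkedBlockingQueue as a stand-in whose take() method waits without busy waiting; this substitution is an assumption and not necessarily what the prototype does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of the manager's dispatch logic for incoming objects. */
final class DispatchSketch {
    /** Simplified stand-ins for GemstoneObject and two of its subclasses. */
    static class Obj {
        final String targetId;
        Obj(String targetId) { this.targetId = targetId; }
    }
    static final class SocialGraphObj extends Obj {
        SocialGraphObj(String targetId) { super(targetId); }
    }
    static final class ApplicationObj extends Obj {
        ApplicationObj(String targetId) { super(targetId); }
    }

    /** Simplified Gem: anything that can handle an object. */
    interface Gem {
        void handleObject(Obj o);
    }

    private final String localId;
    private final Map<Class<? extends Obj>, Gem> gems = new HashMap<>();
    final Map<String, List<Obj>> storedUpdates = new HashMap<>();
    final BlockingQueue<Obj> incoming = new LinkedBlockingQueue<>();

    DispatchSketch(String localId) {
        this.localId = localId;
    }

    void register(Class<? extends Obj> type, Gem gem) {
        gems.put(type, gem);
    }

    /** Waits (without busy waiting) for the next object and routes it. */
    void processNext() {
        try {
            route(incoming.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    private void route(Obj o) {
        if (!localId.equals(o.targetId)) {
            // Not for the local node: keep it as an update for the mirrored node.
            List<Obj> updates = storedUpdates.get(o.targetId);
            if (updates == null) {
                updates = new ArrayList<>();
                storedUpdates.put(o.targetId, updates);
            }
            updates.add(o);
            return;
        }
        Gem gem = gems.get(o.getClass()); // the concrete class selects the Gem
        if (gem != null) {
            gem.handleObject(o);
        }
    }
}
```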

Application Package

This package, as shown in Figure 5.4, and the BasicApplicationGem class in particular (implementing the ApplicationGem interface) are responsible for communication with applications running on top of the prototype. Since communication with applications is bidirectional, the Gem has to support both incoming as well as outgoing requests. These requests are represented by a set of ASCII commands as listed in Tables 5.1 and 5.2, respectively.

To support all requests, the ApplicationGem manages three request handlers:

(1) Incoming Requests. An instance of the IncomingRequestHandler class handles all incoming requests. After communication with the application is finished, the Gem creates a GemstoneObject containing the request and forwards it to the GemManager.

(2) Outgoing System Messages. An instance of the SystemDeliverer class handles all outgoing messages that are system messages. Since all outgoing messages arrive at the ApplicationGem as instances of a subclass of GemstoneObject, a flagging of messages as system messages is easily possible: An object representing a system message contains the value SYSTEM in its appId field. If, however, no application is running, the object is stored and will be delivered as soon as an application connects to the middleware.

(3) Outgoing Application Messages. All objects containing an appId value other than SYSTEM are targeted at a particular application. The ApplicationGem maintains a list of all applications that registered via the REGISTER_APP command and reacts to quitting applications (EXIT_APP) by removing the affected application from the list. Therefore, a mapping of objects to the correct application is easily possible. If that application is currently not running, the object is stored for later delivery. The application can request any updates by using the TRANSMIT_UPDATES command.

[Figure 5.3 shows the following sequence. Object types shown: ApplicationObject (local node), SocialGraphObject, ApplicationObject (remote node).]

Local node:
• Application issues friend request via ASCII protocol
• BasicApplicationGem creates appropriate GemstoneObject
• Object is handed to GemManager
• GemManager hands object to BasicSocialGraphGem
• BasicSocialGraphGem parses object and updates XML file
• BasicSocialGraphGem creates request object targeted at remote node
• Object is handed to GemManager
• Object is handed to BasicObjectHandlerGem
• Object is handed to BasicIPConnectionGem
• Object is transmitted via TCP sockets

Remote node:
• Object arrives at remote node’s BasicIPConnectionGem
• Object is passed to BasicObjectHandlerGem
• Object is passed to GemManager
• Object is passed to BasicSocialGraphGem
• BasicSocialGraphGem parses object and updates XML file
• BasicSocialGraphGem creates ApplicationObject
• Object is passed to GemManager
• Object is passed to BasicApplicationGem
• BasicApplicationGem parses object
• Friend request is delivered to application via ASCII protocol

Figure 5.3: End-to-End Object Communication Example

Figure 5.4: Package Application - UML Class Diagram

The Gem detects what kind of outgoing message is at hand by parsing the command field of the object in its handleObject() method. It then calls the appropriate deliverer to execute the request.

All communication between the middleware and the applications is conducted via TCP sockets as specified in the java.net package. For each application a new thread is created so that requests are always unambiguously assignable to the correct application. This is done by the BasicAppSocketHandler class. This class implements the java.lang.Runnable interface to create a new thread for each connection with an application.

A sample request indicating a transmission of an instant message to another node (INSTANT protocol command) is shown in Listing 5.1. Firstly, the Gem reads the target’s social ID from the application and acknowledges (ACKs) the receipt. Afterwards, it reads the size of the instant message and ACKs again. Finally, it reads the message itself. This request accounts for only 12 lines of code, an amount that is roughly the same on the application side. Requests in general need 10 to 30 lines of code on each side, depending on the complexity of the request. After completing the request, the Gem creates the appropriate GemstoneObject and hands it to the manager.

1 //read the t a r g e t of the message2 byte [ ] i n s t a n t B u f f e r = new byte [ 4 0 ] ;3 dis . read ( i n s t a n t B u f f e r ) ;4 S t r i n g t i d = new S t r i n g ( i n s t a n t B u f f e r ) ;5 //ACK r e c e i v a l of t a r g e t6 dos . w r i t e I n t (ACK) ;7 dos . f l u s h ( ) ;8

9 //read s i z e of o b j e c t to be t ransmi t ted in bytes10 i n t msgSize = dis . readInt ( ) ;11 //ACK r e c e i v a l of payload s i z e12 dos . w r i t e I n t (ACK) ;13 dos . f l u s h ( ) ;14

52

5 Implementation

Incoming RequestsCommand DescriptionREGISTER_APP A new application registersEXIT_APP An application quitsINSTANT Incoming instant messageDEFAULT Incoming application contentCLAIM_FOCUS An application claims to be the application focused by the userTRANSMIT_PROFILE An application requests the local basic profileUPDATE_PROFILE An application requests to update the basic profileTRANSMIT_UPDATES An application requests updates targeted at itselfREQUEST_PROFILE An application requests the profile of a remote userLOOKUP_USERNAME An application requests a DHT look-up on a user nameADD_RELATION An application requests to add a new social relationEDIT_RELATION An application requests to edit an existing social relationDELETE_RELATION An application requests to delete an existing social relationCONFIRM_RELATION An application has confirmed a social relationDENY_RELATION An application has denied a social relation

Table 5.1: Incoming Application Requests

Outgoing Requests

Command              Description

To a particular application (Application Objects):
INSTANT              Outgoing instant message
DEFAULT              Outgoing application content
REQUEST_PROFILE      Outgoing previously requested remote profile
PULL_PROFILE         Requests the app to request a new version of the local profile
ERROR                Outgoing application error message

To the focused application (System Objects):
RELATION_CONFIRMED   Informs the user of a confirmed social relation
ADD_RELATION         Informs the user of a new relation request
ERROR                Outgoing system error message

Table 5.2: Outgoing Application Requests


// create a large enough buffer according to the object size
instantBuffer = new byte[msgSize];
// read payload (readFully blocks until the buffer is filled)
dis.readFully(instantBuffer);
// ACK receipt of payload
dos.writeInt(ACK);
dos.flush();

// create object to pass to GemManager
ApplicationObject instantObj = new ApplicationObject();
instantObj.setPayload(instantBuffer);
instantObj.setAppId(appId);
instantObj.setTargetId(tid);
instantObj.setInternal(false);
instantObj.setSourceId(localSocialId);
instantObj.setCommand(INSTANT);
// send object to targetId
gemManager.handleObject(instantObj);

Listing 5.1: "Handling of the INSTANT request"
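For comparison, the application side of the same exchange could look roughly as follows. This is a minimal sketch and not the code of the demo application: the class InstantRequestSketch, the methods sendInstant() and demo(), and the ACK value 1 are assumptions made for illustration. The transmission of the INSTANT command code itself is left out, because Listing 5.1 does not show how it is read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class InstantRequestSketch {
    // Assumed ACK code; the actual constant of the prototype is not given in the text.
    static final int ACK = 1;

    /** Sends target ID, payload size and payload, waiting for an ACK after each step. */
    static void sendInstant(DataInputStream dis, DataOutputStream dos,
                            String targetId, byte[] message) throws IOException {
        byte[] tid = targetId.getBytes(StandardCharsets.US_ASCII);
        if (tid.length != 40) {
            throw new IllegalArgumentException("social ID must be 40 characters long");
        }
        dos.write(tid);                // 1. target's social ID (40 bytes)
        dos.flush();
        expectAck(dis);
        dos.writeInt(message.length);  // 2. size of the message in bytes
        dos.flush();
        expectAck(dis);
        dos.write(message);            // 3. the message itself
        dos.flush();
        expectAck(dis);
    }

    private static void expectAck(DataInputStream dis) throws IOException {
        if (dis.readInt() != ACK) {
            throw new IOException("middleware did not acknowledge");
        }
    }

    /** Runs the exchange against three canned ACKs and returns the bytes written. */
    static byte[] demo(String targetId, byte[] message) {
        try {
            ByteArrayOutputStream acks = new ByteArrayOutputStream();
            DataOutputStream ackWriter = new DataOutputStream(acks);
            for (int i = 0; i < 3; i++) {
                ackWriter.writeInt(ACK);
            }
            ByteArrayOutputStream sent = new ByteArrayOutputStream();
            sendInstant(new DataInputStream(new ByteArrayInputStream(acks.toByteArray())),
                        new DataOutputStream(sent), targetId, message);
            return sent.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The demo() helper replaces the socket to the middleware with in-memory streams, so the wire format (40 bytes of ID, a 4-byte size, then the payload) can be inspected without a running GEMSTONE node.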

Object Handling Package

The GemstoneObject of Listing 5.1 is handed to the GemManager, which then forwards the object to the BasicObjectHandlerGem (the implementation of the ObjectHandlerGem interface), since the object is flagged as external in its isInternal field. As mentioned before, this package, depicted in Figure 5.5, and the ObjectHandlerGem in particular are responsible for sending and receiving objects and realize the Data Dissemination Module.

(1) Sending Objects. To send an object to a remote GEMSTONE node, the Gem does the following: First, it tries to map the target’s addressable identifiers to its own, thereby determining the appropriate technology to communicate with the target. It then hands the object to the appropriate ConnectionGem via the sendObject() method. In our basic prototype, this is always an instance of the BasicIPConnectionGem, since other ways of data dissemination are not supported. This handler then sends the object to the target, in our prototype via a TCP socket.

(2) Receiving Objects. Each implementation of the ConnectionGem interface maintains a ConnectionHandler. This handler listens for incoming connections on a predefined port (currently 12001). The handler receives objects and hands them to the Gem. The Gem puts each received object into a ConcurrentLinkedQueue accessible by the GemManager (receiveObject() method). Dequeuing blocks as long as the queue is empty to avoid busy waiting, and new objects are pushed to the manager as soon as they are available (dequeueObject() method).

Figure 5.5: Package ObjectHandling - UML Class Diagram
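The hand-over between the receiving Gem and the manager can be sketched as follows. Note that java.util.concurrent.ConcurrentLinkedQueue itself never blocks (its poll() returns null on an empty queue), so this sketch uses a LinkedBlockingQueue instead; this substitution, the class name ReceiveQueueSketch and the timeout parameter are assumptions and do not reproduce the prototype's code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class ReceiveQueueSketch {
    // Unbounded FIFO queue shared between connection handlers and the manager.
    private final BlockingQueue<Object> received = new LinkedBlockingQueue<>();

    /** Called by a connection handler for every object that arrives. */
    void receiveObject(Object obj) {
        received.add(obj);
    }

    /**
     * Waits, without busy waiting, until an object is available or the
     * timeout expires. Returns null if nothing arrived in time.
     */
    Object dequeueObject(long timeoutMillis) {
        try {
            return received.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        }
    }
}
```

A manager thread would call dequeueObject() in a loop; using take() instead of the timed poll() would block indefinitely, which is closer to the behaviour described above but harder to shut down cleanly.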

Connection Package

In the previous section, ConnectionHandlers were introduced. These handlers are maintained by implementations of the ConnectionGem interface as shown in Figure 5.6. In our basic prototype, we provide connectivity via IP through the BasicIPConnectionGem’s functionality.

The BasicIPConnectionHandler implements the java.lang.Runnable interface and creates a new thread for each incoming connection to avoid any concurrency problems. The Gem itself provides functionality for receiving (receiveObject() method) and sending (sendObject() method) objects. We use an instance of java.io.ObjectInputStream to receive and an instance of java.io.ObjectOutputStream to send the serializable objects over TCP sockets.
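The thread-per-connection pattern described here can be illustrated with a small sketch. It is not the BasicIPConnectionHandler itself: the class ConnectionHandlerSketch, its methods and the use of an ephemeral loopback port instead of port 12001 are assumptions chosen so that the example runs on its own.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class ConnectionHandlerSketch implements Runnable {
    private final ServerSocket serverSocket;
    private final BlockingQueue<Object> received = new LinkedBlockingQueue<>();

    ConnectionHandlerSketch(ServerSocket serverSocket) {
        this.serverSocket = serverSocket;
    }

    @Override
    public void run() {
        // Accept loop: every incoming connection is served by its own thread.
        while (!serverSocket.isClosed()) {
            try {
                Socket client = serverSocket.accept();
                new Thread(() -> readOneObject(client)).start();
            } catch (IOException e) {
                return; // socket closed or broken: stop listening
            }
        }
    }

    private void readOneObject(Socket client) {
        try (Socket s = client;
             ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
            received.add(in.readObject());
        } catch (IOException | ClassNotFoundException e) {
            // A failed transfer is silently dropped in this sketch.
        }
    }

    /** Sends one serializable object to the given port on the local host. */
    static void sendObject(int port, Object obj) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
            out.writeObject(obj);
            out.flush();
        }
    }

    /** Starts a handler on a free port, sends one object and returns what arrived. */
    static Object roundTrip(Object obj) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            ConnectionHandlerSketch handler = new ConnectionHandlerSketch(server);
            Thread listener = new Thread(handler);
            listener.setDaemon(true);
            listener.start();
            sendObject(server.getLocalPort(), obj);
            return handler.received.poll(5, TimeUnit.SECONDS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Because each connection is read in its own thread, a slow sender does not delay other transfers; the shared queue is the only point where the threads meet.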

Data Storage Package

The GemstoneObjects distributed by the packages described above may also be objects related to decentralized data storage. This package, shown in Figure 5.7, implements all functionality needed to store personal data in a completely decentralized fashion as required in Chapter 4.2.

Figure 5.6: Package Connection - UML Class Diagram

The most important class of this package is the BasicDataStorageGem, which implements the DataStorageGem interface. This class provides functionality for all operations on the DHT as well as the local data storage management for mirrored profiles and updates for the corresponding nodes.

(1) DHT Related Functionality. In our implementation, we decided to use FreePastry’s Past as DHT look-up directory, since it provides an easy-to-use Java package, which is important for porting to Android, as well as active maintenance of the project and a growing user community¹ to discuss development problems. Note that other implementations of the DataStorageGem can easily use different DHTs or even completely different data storage techniques. However, for this thesis, when referring to the DHT, we refer to Past on FreePastry.

All DHT related functionality is executed by the GemManager. This may happen while booting the node into the ring (see the description of the Core Package above) or as a reaction to requests by an application, a remote node, or another Gem. The first method called in the boot process is setupDHTParameters(). This method creates all objects that are required by FreePastry to boot the Pastry node into the ring.

¹ http://www.mobile-ent.biz/news/36426/Mobile-Entertainments-Guide-to-Android, retrieved Nov. 8th 2010

Figure 5.7: Package DataStorage - UML Class Diagram

The joinOverlay() method provides the functionality to join the DHT ring. This is done via a bootstrapping node. In our system, there is a list (in the form of a SQLite² database) of bootstrapping nodes from which the local node can choose. These nodes are highly available nodes, e.g., university servers. Although there is no guarantee that at least one of these nodes is online at any given time, this is the most reasonable approach to allow connectivity to the DHT with very high probability.

Further important functionality includes the lookup() and update() methods. As the name suggests, lookup() returns the value of the entry corresponding to the looked-up key in the DHT, while update() either inserts a new (key,value) pair into the DHT or updates an existing pair to a new value. For example, an update is executed when the mirroring nodes of the local node change.

In Past on FreePastry, (key,value) pairs are stored as PastContent objects. To be able to store such objects in the DHT, PastContentObject implements the past.PastContent interface by extending past.ContentHashPastContent.

² http://www.sqlite.org, retrieved Nov. 8th 2010


(2) Data Storage Management. This functionality is called to send, receive and store data related to decentralized data storage. The requestProfile() method requests an arbitrary GEMSTONE profile from a given node’s mirroring nodes, while sendProfileToRemoteNodes() sends the profile specified in the requestId field of a GemstoneObject to the remote node specified in the payload field of that object. The payload of a received object (a profile) is parsed into a new local XML file by executing receiveProfile(), thereby mirroring the data for the node specified in the sourceId field of the object. This method is also used to receive the local node’s own profile from its mirroring nodes. Furthermore, storeProfileAtRemoteNodes() stores a profile at a number of remote nodes specified in the payload field of an object. These nodes are recommended by the NodeSelectionGem as described below. Finally, sendUpdatesToMirroredNode() sends updates that were mirrored for a remote node to that node as a reaction to an object containing a REQUEST_UPDATES command.

Moreover, each node can check whether it is still mirroring for all the nodes corresponding to the profiles it has currently stored by using checkMirrorStatus(). This way, profiles and updates that are no longer required can be deleted to free disk space.

Node Selection Package

This package, depicted in Figure 5.8, is also part of the implementation of the Decentralized Data Storage module. Its task is to select the most suitable nodes from the list of known nodes. This is important because the robustness, scalability and user acceptance of the GEMSTONE system depend on how these nodes are chosen. If data availability is low, users will not switch from centralized OSNs to GEMSTONE. If, however, the number of mirroring nodes required to achieve robustness is too high, the system will not scale properly. To select suitable nodes, the BasicNodeSelectionGem, which implements the NodeSelectionGem interface, provides two methods. The first, selectNodes(), selects the most suitable nodes from a SQLite database stored on the local device. Note that the node table in that database is bound to a device and not to the user. This is important because different devices of a single user may favour different nodes for decentralized data storage. For example, a mobile phone used during work hours may select other nodes than the stationary computer used in the evening, since the two devices are used at different times of the day and the same may hold for the mirroring nodes.

The selection is based on a single recommendation value. In the prototype, this is the remote node’s pure availability. This is a very simplistic approach, since no other criteria like trust or social relations are taken into account. How to improve this metric is discussed in Chapter 6.

Figure 5.8: Package NodeSelection - UML Class Diagram

The updateRemoteNodeInformation() method calculates a new recommendation value for a remote node, based on recently experienced transactions. If there is no entry for a remote node yet, a new one is created, extending the pool of nodes from which selectNodes() chooses the set of mirroring nodes. In the prototype, this value is again based on the availability of a node. However, we will also present an improved value calculation in Chapter 6.

Social Graph Package

This package realizes the Social Graph Maintenance module. Implementing the SocialGraphGem interface, the BasicSocialGraphGem as shown in Figure 5.9 provides functionality for adding (addSocialRelation()), editing (editSocialRelation()) and deleting (endSocialRelation()) social relations. In our prototype, all these actions are simple XML manipulations using the org.w3c.dom package.

Interface Package

Finally, the Interface Package (Figure 5.10) manages the monitoring of the local node’s interfaces. This is done via the determineActiveInterfaces() method. Currently, this method only scans for IPv4 addressable identifiers by looping over all existing java.net.NetworkInterface instances. The results are then updated in the DHT. In the future, this package will provide functionality to identify interfaces addressable by other technologies like Bluetooth.

Figure 5.9: Package SocialGraph - UML Class Diagram

Figure 5.10: Package Interface - UML Class Diagram
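The interface scan of the Interface Package can be approximated with the standard java.net API. The following is a minimal sketch under our own assumptions: the class InterfaceScanSketch and the method determineActiveIpv4Addresses() are illustrative names, loopback interfaces are skipped, and the subsequent DHT update is omitted.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class InterfaceScanSketch {
    /** Returns the IPv4 addresses of all non-loopback interfaces that are up. */
    static List<Inet4Address> determineActiveIpv4Addresses() {
        List<Inet4Address> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
            if (interfaces == null) {
                return result; // no interfaces reported by the platform
            }
            for (NetworkInterface nif : Collections.list(interfaces)) {
                if (!nif.isUp() || nif.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                    if (addr instanceof Inet4Address) {
                        result.add((Inet4Address) addr);
                    }
                }
            }
        } catch (SocketException e) {
            // Stop scanning and return what was found so far.
        }
        return result;
    }
}
```

Support for further technologies would add one such scan per technology and merge the results before they are published in the DHT.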

5.3 Functionality of the GEMSTONE Prototype

With the current implementation of the modules as designed in Chapter 4.4, the first GEMSTONE prototype provides the following functionality:

(1) Decentralized Data Storage. The prototype supports a completely decentralized way of data storage as required in Chapter 4.2. The look-up directory was implemented using FreePastry 2.1. All functionality for decentralized data storage is provided, ranging from look-ups, insertions and updates in the DHT to sending, requesting and storing local or remote profiles. Personal data is mirrored at a set of remote nodes, chosen by a specified selection algorithm, providing high data availability.

(2) Social Graph Maintenance. In the current implementation, it is possible to manipulate the social graph. This includes adding, editing and deleting social relations.

(3) Communication. Applications can easily communicate via an ASCII protocol. Moreover, an application is able to transparently send arbitrary content to another node running the same application. This way, all kinds of applications utilizing social data can already be built on top of the middleware. Applications can also use the built-in instant messaging function of GEMSTONE. Communication with other nodes is currently supported over IPv4.

(4) Extensibility. The prototype can be extended by more sophisticated or entirely new Gems at any time. To this end, Gems need to implement the corresponding interface.

5.4 A Demo Application

In addition to the prototype, a demo application was built for this thesis. The goal of this application was to test and demonstrate all functionality currently supported by the prototype. Moreover, the application should implement the exchange of application content over GEMSTONE in a way that is transparent to the middleware itself. A screenshot of the application is shown in Figure 5.11. The Graphical User Interface (GUI) is based on the Java Swing API.

Figure 5.11: Screenshot of the demo application

The application provides interactions using all functionality of the BasicApplicationGem, including all three ways of manipulating a social graph, confirming and denying newly requested social relations, requesting profiles of other users, looking up other users by their real name, handling the look-up results returned by the middleware, updating the local profile, and registering and de-registering the application.

In addition, users of the demo application may exchange transparent content in the form of private messages with their contacts, as shown in Figure 5.11. The application keeps several additional XML files representing a message inbox and outbox as well as a list of information about users that goes beyond the information contained in the GEMSTONE basic profile.

5.5 Metrics

The code of our prototype is currently very lightweight and contains only around 3400 lines of code (LOC), with the LOC per package ranging from 55 (Interface Package) to 1157 (Application Package). There are 31 classes and 210 methods.

The demo application comprises around 3000 LOC, with most of these lines used for the creation of the GUI. To communicate with GEMSTONE and use all functionality provided by the prototype so far, only 350 LOC are needed. Each command, on the side of the middleware as well as on the side of the application, takes between 8 (EXIT_APP) and 27 (LOOKUP_RESULT) LOC, which keeps the implementation of our protocol easy.

5.6 Deployment Considerations

Creating a middleware that realizes a decentralized OSN always requires an answer to the question of how to overcome the drawing power of currently established OSNs like Facebook. As there are 500 million users on Facebook (“basically everyone is on Facebook”), it is difficult to convince users to start using a new, unpopular platform in which their friends do not participate and for which none of the claimed improvements is guaranteed. It is highly unlikely that a cold start requiring users to switch from Facebook to GEMSTONE abruptly would succeed. Therefore, we propose an approach in which users can stay on Facebook and use its functionality while already using GEMSTONE. This can be realized by creating a Facebook wrapper application as depicted in Figure 5.12. In a first step, such an application would extract personal information from Facebook via the Facebook API and generate the GEMSTONE profile, including friend relations. As soon as this is completed, all other applications running on top of GEMSTONE could use this data. From then on, the wrapper application would synchronize all modifications of the social graph with Facebook, again via the Facebook API.

Figure 5.12: Possible deployment scenario: a Facebook wrapper application (diagram labels: Facebook Wrapper Application; Facebook API; Application; Facebook data available; Facebook data exchange, bidirectional; Facebook data extraction and insertion)

6 Node Selection Process

One of the design choices distinguishing GEMSTONE from other works is to mirror data at multiple nodes and to use the DHT only as a look-up directory. Following this choice for decentralized data storage, one substantial research question arises:

How many mirroring nodes are required to achieve data availability comparable to other ways of decentralized data storage or even centralized systems?

To answer this question we divide it into two separate aspects:

(1) Is it possible to achieve high data availability using GEMSTONE? This question deals with the problem of whether it is possible to achieve data availability comparable to centralized systems in a system with high node churn, limited node resources, no obvious incentive to store data for other nodes and no global knowledge base in which information about well-suited nodes is available. In other words: Is it possible to maximize $av(i, MN_i)$, the availability of a node $i$'s personal data using a set of mirroring nodes $MN_i$ in GEMSTONE?

(2) How many mirroring nodes are required to achieve this? If this is possible, the most important follow-up question is: How many mirrors does each node need to achieve this? Can we minimize $car = |MN_i|$, the cardinality of the set of mirroring nodes for each node $i$? The answer to this question has direct implications for the scalability of the system: the fewer mirrors a node requires, the less data has to be distributed, which lowers the total amount of traffic as well as the disk space needed for mirroring tasks on each node.

This chapter answers both questions by presenting a selection process that chooses the most suitable nodes for each node based only on locally available information. To this end, the following steps are taken: First, an overview of how other decentralized OSNs store their data is given, including a discussion of their weaknesses. Second, we present our own approach for node selection in GEMSTONE, GEMNOSE. Subsequently, we describe the methodology applied, evaluate our approach and present the results. The chapter ends with a discussion of our results and possible improvements.


6.1 Discussion of the State of the Art

As stated above, multiple works proposing decentralized data storage for an OSN have been presented. Each system uses a different approach to achieve high data availability. In this section, the approaches of Safebook [10], Persona [2] and PeerSoN [8] are discussed.

(1) Safebook. In Safebook, data is mirrored at multiple remote nodes as well. However, as already discussed in Chapter 3.1, these mirroring nodes are the innermost nodes of a matryoshka composed of several shells. To guarantee data availability, at least all the nodes of one path through all shells towards the innermost shell have to be available. In [11], the authors aim for 90% data availability, meaning that at any time one in ten profiles is not available to a requester. To achieve this, 13 to 23 nodes are required with only two shells. As the authors propose a system with at least three shells to maintain anonymity, the number of required mirroring nodes increases. Moreover, Safebook assumes that each user is online at any time with a probability of 30%, referencing a study on Skype [14]. We believe that user availability is not equally distributed and that this assumption is therefore not applicable.

(2) Persona. As discussed in Chapter 3.3, Persona uses a different approach, in which each participant can choose a personal storage space to store his data on. Access to this storage space is granted or denied based on an ACL. The storage space is a highly available node, most likely a server. The authors do not conduct any studies on data availability, but it can be assumed that availability will be high if this approach works. Although it seems to be a well-thought-out approach that also avoids a lot of overhead, we think it is unlikely that all users can provide a reliable storage service to maintain the system. This would only suit the technically inclined users of the OSN. As described in Chapter 3.3, we believe that realizing this data storage procedure would most likely require payments by users, since unpersonalized advertisement is unattractive for investors.

(3) PeerSoN. In PeerSoN, nodes store their data at mirroring nodes as in GEMSTONE. In their initial presentation of PeerSoN [8], the authors do not focus on any strategy for the selection of nodes, but acknowledge the need for such a strategy. In a recent publication, the authors propose a selection based on cliques with mutual storage agreements [32]. The selection is based purely on the availability of the nodes. This availability is assumed to be known, and data availability of above 99% is estimated with this approach. However, the simulations are based on an availability distribution in which 70% of the nodes are available with probability p ≥ 0.75, as in [5].

In our opinion, this availability assumption does not hold for OSNs: In [5], availability is distributed based on studies of file sharing systems. However, the use cases for file sharing systems differ from those for social networks: File sharing is usually conducted to transfer big files, e.g., audio or video files. Therefore, file sharing clients usually run in the background for a long period of time (e.g., overnight) and can thereby provide availability for other nodes.

In contrast, social networking is usually conducted more intensely but in a shorter time span. For example, “only” 50% of the users of Facebook log on to the OSN on any given day.¹ Note that this does not imply that the same 50% of the users log in each day. Also, the online periods are shorter, with an average of around seven hours² up to 24 hours³ per user and month, especially when taking mobile phone usage into account.

Also, recent research has shown that online time in social networks is power-law distributed [4] and that users spend less time in an OSN the longer they have already participated in the network [15]. Hence, there are a lot of users who are far less available than 75%.

6.2 GEMSTONE Node Selection - GEMNOSE

In this section, we present a solution to our research question of how to achieve data availability close to 100% with a minimum of mirroring nodes. Our middleware provides statistics that can be used for this purpose: Each node maintains a list of known nodes that includes socially related nodes as well as nodes that are known for other reasons. In particular, each node keeps track of its transactions with any other node. Hence, nodes mirroring a requested profile are known to the requesting node. Furthermore, when two nodes communicate with each other, the local node transmits a locally computed availability value to the corresponding node. When trying to efficiently select the set of mirroring nodes $MN$ from this list, several options may be pursued:

(1) Random Selection of Nodes. A naive approach is to select nodes randomly from the list of mirroring nodes. With availability information about all nodes in the list at hand, the local node can estimate how many nodes are required to achieve a certain target error rate $err_{tar}$. This rate describes the probability that a profile is not available given a set of mirroring nodes. We assume that each node wants to achieve at least 99% data availability and therefore computes its mirroring nodes towards an expected error rate $err_{exp} \leq err_{tar} = 0.01$. The random selection process serves as a lower bound for any other approach, as it should be easy to outperform.

¹ http://www.facebook.com/press/info.php?statistics, retrieved Nov. 8th 2010
² http://mashable.com/2010/02/16/facebook-nielsen-stats/, retrieved Nov. 8th 2010
³ http://blog.escherman.com/2010/08/16/does-the-average-uk-facebook-user-spend-24-hours-per-month-on-the-site/, retrieved Nov. 8th 2010

(2) Selection Based on Availability. Building on (1), another option is to select the most available nodes to mirror the local node’s data. To this end, the list of mirroring nodes is sorted by availability value $av$ and the top entries are chosen until $err_{exp}$ is below $err_{tar}$.

(3) Selection Based on Social Relation. Moreover, it is possible to select nodes based on their social relation $sr$ to $i$. If there is a social relation between $i$ and $j$, the remote node $j$ has an incentive to store the data.

(4) Selection Based on User Experience. Lastly, taking user experience into account could decrease the cardinality $car = |MN_i|$ for each node $i$ and provide a minimized $err_{exp}$. The term user experience $ue$ describes the trust $i$ has in a remote node $j$, independent of that node’s availability. Hence, $ue_j = \frac{t_{succ,j}}{t_{total,j}}$, with $t_{succ,j}$ and $t_{total,j}$ being the successful and total transactions of the local node $i$ with a remote node $j$.
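Option (2) can be illustrated with a short sketch. The text does not state how $err_{exp}$ is computed; the code below assumes that mirroring nodes fail independently of each other, so that $err_{exp}$ is the product of the individual unavailabilities $(1 - av_j)$. This independence assumption, the class AvailabilitySelectionSketch and its method names are ours and are not taken from the prototype.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class AvailabilitySelectionSketch {
    /**
     * Picks availabilities in descending order until the expected error rate
     * (product of the unavailabilities, assuming independent nodes) is at
     * most errTar. If the target cannot be reached, all nodes are returned.
     */
    static List<Double> select(List<Double> availabilities, double errTar) {
        List<Double> sorted = new ArrayList<>(availabilities);
        sorted.sort(Comparator.reverseOrder());
        List<Double> chosen = new ArrayList<>();
        double errExp = 1.0; // with no mirror at all, the profile is never available
        for (double av : sorted) {
            if (errExp <= errTar) {
                break;
            }
            chosen.add(av);
            errExp *= (1.0 - av);
        }
        return chosen;
    }

    /** Expected error rate of a chosen set under the independence assumption. */
    static double expectedError(List<Double> chosen) {
        double errExp = 1.0;
        for (double av : chosen) {
            errExp *= (1.0 - av);
        }
        return errExp;
    }
}
```

Under this assumption, two mirrors with availabilities 0.95 and 0.9 already reach the target of 0.01, whereas 21 mirrors are needed if every known node is available only 20% of the time (0.8^21 ≈ 0.0092). This illustrates why the availability distribution has such a strong effect on $car$.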

It is obvious that a selection based on social relations alone is insufficient, because socially related nodes are not always highly available nodes. On the other hand, it is not feasible to account for availability only, since highly available nodes often do not have an incentive to store data of unrelated nodes, especially if their disk storage fills up quickly. Also, despite being highly available, a node might only provide very limited disk space to store the profiles of remote nodes. Both values are combined when considering user experience: $ue_j$ will be high for $i$ if $j$ is highly available to $i$ in particular (this is important when considering diurnal patterns of node availability, see below) and does not drop $i$'s profile, which is the case if $j$ has an incentive to store the data, i.e., there is a social relation between $i$ and $j$.

However, availability and social relation factors are important in the startup phase of the system, in which a node has not yet established reliable values of $ue_n$ for each node $n$ in its list of nodes. Thus, in GEMNOSE, $MN$ is selected based on a recommendation value $rv$ with

$rv_j = \alpha \cdot av_j + \beta \cdot ue_j + \gamma \cdot sr_j$ (6.1)

with $\alpha$, $\beta$ and $\gamma$ being the weights attached to the parameters. In addition, nodes add a node that is so far unrated in terms of $ue$ to $MN$ with a probability of 10%. This implements zero-trust selection as introduced in [19]. Without zero-trust selection, nodes would constantly select the same nodes and never consider nodes they have no experience with, and would thereby not increase their knowledge about any nodes besides their most trusted ones.
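The recommendation value of Formula 6.1 and the zero-trust step can be sketched as follows. The class GemnoseSketch, the NodeEntry fields and the way the unrated node is added (one random unrated node appended to the top-ranked ones) are assumptions made for illustration; the text only fixes the formula, the definition of $ue_j$ and the probability of 10%.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class GemnoseSketch {
    /** Locally known facts about one remote node j (field names are illustrative). */
    static final class NodeEntry {
        final String socialId;
        final double availability; // av_j in [0, 1]
        final int successful;      // successful transactions with j
        final int total;           // total transactions with j
        final boolean related;     // sr_j = 1 if socially related, 0 otherwise

        NodeEntry(String socialId, double availability, int successful, int total, boolean related) {
            this.socialId = socialId;
            this.availability = availability;
            this.successful = successful;
            this.total = total;
            this.related = related;
        }

        /** ue_j = successful / total; 0 for nodes without any transaction so far. */
        double userExperience() {
            return total == 0 ? 0.0 : (double) successful / total;
        }
    }

    /** rv_j = alpha * av_j + beta * ue_j + gamma * sr_j (Formula 6.1). */
    static double recommendationValue(NodeEntry n, double alpha, double beta, double gamma) {
        return alpha * n.availability + beta * n.userExperience() + gamma * (n.related ? 1.0 : 0.0);
    }

    /** Returns the best-rated nodes; with probability 0.1 one unrated node is added. */
    static List<NodeEntry> selectMirrors(List<NodeEntry> table, int count,
                                         double alpha, double beta, double gamma, Random rnd) {
        List<NodeEntry> ranked = new ArrayList<>(table);
        ranked.sort(Comparator.comparingDouble(
                (NodeEntry n) -> recommendationValue(n, alpha, beta, gamma)).reversed());
        List<NodeEntry> chosen = new ArrayList<>(ranked.subList(0, Math.min(count, ranked.size())));
        if (rnd.nextDouble() < 0.1) { // zero-trust step
            List<NodeEntry> unrated = new ArrayList<>();
            for (NodeEntry n : ranked) {
                if (n.total == 0 && !chosen.contains(n)) {
                    unrated.add(n);
                }
            }
            if (!unrated.isEmpty()) {
                chosen.add(unrated.get(rnd.nextInt(unrated.size())));
            }
        }
        return chosen;
    }
}
```

The weights are passed in as parameters because suitable values of $\alpha$, $\beta$ and $\gamma$ are only determined by the evaluation; any concrete numbers used with this sketch are placeholders.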

6.3 Evaluation

In the following, we estimate the weighting of the parameters $\alpha$, $\beta$ and $\gamma$ in the node selection formula (6.1) that minimizes $err_{exp}$ and $car$ by simulating GEMSTONE using GEMNOSE with different parameters.

6.3.1 Methodology

We base this simulation of GEMNOSE on five different social graphs of 10000 vertices (users). Since it has been shown that scale-free small-world networks like social networks follow a power-law distribution of relations between users [23][1], we use GTGraph [3]. Providing an R-MAT [9] implementation, this suite is able to create power-law distributed social graphs. This implies that there are few users with a lot of social relations, while many users only keep a few of these relations. On average, each user has 130 social relations, which is the current average number of friends of a Facebook user.⁴ There are, however, users without friends in the resulting graph, who represent new users joining the system.

User online time in OSNs is also power-law distributed [15][4] rather than uniformly distributed as assumed in [11]. There are few users that are online almost constantly, while most users are online with low probability, contrary to the assumptions of [32]. Therefore, node availability in our system is power-law distributed as shown in Figure 6.1. Note that around two thirds of the users are available less than 20% of the time in our simulation. Furthermore, there is no relation between the number of friends a user has and his online time: A user with a lot of friends is not automatically more available than a user with fewer friends. In fact, users become less active as their number of friends in an OSN increases [15]. Whether a node is online is determined probabilistically based on each node’s availability; diurnal patterns are not considered in our simulation.

Moreover, each node only has limited storage to mirror profiles for other nodes. In GEMSTONE, a basic profile with 130 social relations currently has a size of less than 8 kB, consisting of 130 social relations · (40 bytes SHA1 hash + 13 bytes XML data) = 6890 bytes plus around 1 kB of profile information. We therefore use a normal distribution for the storage capacity of a node with a mean value of 50 profiles. This is a reasonable amount even for nodes with low resources like mobile devices, taking less than half a megabyte of disk space (50 · 8 kB = 400 kB). If the storage space fills up, nodes drop profiles of remote nodes according to a social drop policy: profiles of socially unrelated nodes are dropped first, in a first in, first out (FIFO) fashion. If all stored profiles belong to nodes socially related to the local node, profiles are dropped FIFO style as well. Again, there is no relation between the availability of a node and its storage space. Therefore, in the worst-case scenario, the most available node might only have very limited storage space.

⁴ http://www.facebook.com/press/info.php?statistics, retrieved Nov. 8th 2010

Figure 6.1: Node availability distribution (x-axis: node availability in percent; y-axis: number of nodes)
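The social drop policy can be sketched as a small bounded store. The class ProfileStoreSketch and its members are hypothetical names, and the sketch assumes that a new profile is always accepted, with exactly one old profile dropped when the store is full; the simulation code itself is not shown in this thesis excerpt.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

final class ProfileStoreSketch {
    /** A mirrored profile, reduced to what the drop policy needs. */
    static final class StoredProfile {
        final String ownerId;
        final boolean sociallyRelated;

        StoredProfile(String ownerId, boolean sociallyRelated) {
            this.ownerId = ownerId;
            this.sociallyRelated = sociallyRelated;
        }
    }

    private final int capacity;
    private final Deque<StoredProfile> profiles = new ArrayDeque<>(); // oldest first

    ProfileStoreSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Stores a profile; returns the owner ID of the dropped profile, or null. */
    String store(StoredProfile profile) {
        String dropped = null;
        if (profiles.size() >= capacity) {
            dropped = dropOne();
        }
        profiles.addLast(profile);
        return dropped;
    }

    private String dropOne() {
        // The oldest profile of a socially unrelated node is dropped first.
        for (Iterator<StoredProfile> it = profiles.iterator(); it.hasNext();) {
            StoredProfile candidate = it.next();
            if (!candidate.sociallyRelated) {
                it.remove();
                return candidate.ownerId;
            }
        }
        // Only profiles of related nodes are stored: plain FIFO.
        return profiles.removeFirst().ownerId;
    }

    int size() {
        return profiles.size();
    }
}
```

With a capacity of two, storing a friend, a stranger and a second friend drops the stranger, and a third friend then drops the first friend.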

In the simulation, our system performs a cold start: Only socially related nodes are stored in each node’s information database, the node table $nt$ (a list of all simulation parameters can be found in Table 6.1). However, a node $i$ learns about other nodes over time with learning rate $lr_i$. This happens on multiple occasions: When a node looks up another node in the DHT, it learns about the mirroring nodes of that node. Also, while hopping through the DHT itself, the node learns the social IDs of the nodes it comes across. Nodes have different learning rates in our system, depending on node activity: The more active a node is in the OSN, the more information about other nodes it gathers. In our simulation, the most active nodes discover four new nodes per iteration ($lr = 4$), while the least active nodes learn about one new node every two iterations ($lr = 1/2$). Node activity is Gaussian distributed.


Parameter    Description
$err_{tar}$  The target error rate. We assume that each node wants to achieve at least 99% data availability, therefore $err_{tar} = 0.01$
$err_{exp}$  The expected error rate. Each node computes this value until $err_{exp} \leq err_{tar}$
$lr_i$       The learning rate with which a node $i$ discovers new nodes in the system
$MN_i$       The set of mirroring nodes of the local node $i$
$car$        The cardinality of the set of mirroring nodes, $car = |MN_i|$
$nt_i$       The node table of a local node $i$. This is the information database each node maintains, storing knowledge about other nodes
$av_j$       The node availability of a remote node $j$
$ue_j$       The user experience value the local node has built for node $j$, $ue_j = \frac{t_{succ,j}}{t_{total,j}}$
$sr_j$       The social relation value of a remote node $j$: 1 if there is a social relation between the local node and $j$, 0 otherwise
$rv_j$       The calculated recommendation value of a remote node $j$

Table 6.1: List of Parameters for Node Selection

To obtain the availability value of each node, nodes exchange this value as they get to know each other. The availability value is calculated locally by each node. The entry $ue_j$ is 0 for each remote node $j$ at the beginning of the simulation and 0 for each newly known node as well. The value of a social relation $sr_j$ is obtained from the local social graph of a node: it is 1 if a social relation exists between $i$ and $j$, and 0 otherwise.

In order to adjust uen for each node n in the node table, the local node keeps track of all transactions with n, including transactions in the storage process and during the retrieval of profiles.
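The bookkeeping behind ue can be sketched as a small per-node counter. This is a minimal sketch that assumes tsucc and ttotal count successful and total transactions with a remote node; the names `ExperienceRecord` and `ExperienceTable` are hypothetical and not part of GEMSTONE.

```python
from dataclasses import dataclass


@dataclass
class ExperienceRecord:
    successes: int = 0  # assumed meaning of t_succ_j
    total: int = 0      # assumed meaning of t_total_j

    @property
    def ue(self) -> float:
        # ue_j is 0 for nodes without any recorded transaction, as stated in the text.
        return self.successes / self.total if self.total else 0.0


class ExperienceTable:
    """Tracks storage and retrieval transactions per remote node."""

    def __init__(self):
        self._records = {}

    def record(self, node_id, success: bool) -> None:
        rec = self._records.setdefault(node_id, ExperienceRecord())
        rec.total += 1
        if success:
            rec.successes += 1

    def ue(self, node_id) -> float:
        rec = self._records.get(node_id)
        return rec.ue if rec else 0.0


table = ExperienceTable()
for outcome in (True, True, False, True):
    table.record("j", outcome)

print(table.ue("j"))        # 0.75
print(table.ue("unknown"))  # 0.0
```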

6.3.2 Finding a Parameter Weighting

As a first step towards a parameter weighting that minimizes errexp and car, we compare the performance of different weightings of av and ue in the calculation of rv. As shown in Figure 6.2, data availability is only around 80% at the beginning for all scenarios.


This is caused by the power-law distribution of social relations in our graphs, where many nodes maintain only very few social relations or even join the system without any relation (cold start). As nt initially consists of socially related nodes only, this distribution results in high data losses. However, as nodes add information to their node tables, data availability improves quickly. Furthermore, our learning model is very conservative, as even the most active nodes in the OSN learn about only four nodes per iteration, which is the number of nodes a node encounters with just one look-up in the DHT (log(n)).

[Plot: data availability (0.8 to 1) over time t (0 to 200); series AV:UE = 0.8:0.2, 0.7:0.3, 0.5:0.5, 0.3:0.7, 0.2:0.8, 0.1:0.9]

Figure 6.2: Data availability by parameter weighting

Another observation is that all parameter weightings achieve above 99% availability after around 100 time steps. One time step equals one learning cycle, in which each node i learns about other nodes according to lri. However, after some time, availability decreases quickly for weightings that favour availability considerably. This is because the most available nodes become well known after some time, and an increasing number of nodes store their profiles on these highly available nodes. This can also be concluded from Figure 6.3: as nodes increase their knowledge of the network and have information about the most available nodes at hand, the number of replicas they distribute decreases, because each node believes that these highly available nodes are sufficient. However, even the most available nodes provide only limited disk space. As a result, more profiles are dropped as knowledge spreads, which lowers data availability.
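Why knowledge of highly available nodes shrinks the replica set can be illustrated with a simple stopping rule. This section does not restate how errexp is computed, so the sketch below assumes that node availabilities are independent and that errexp is the product of (1 − avj) over the chosen mirrors; the function name `mirrors_needed` is hypothetical.

```python
def mirrors_needed(availabilities, err_target=0.01):
    """Add mirrors in the given order until the expected error rate meets the target.

    ASSUMPTION: availabilities are independent, so the expected error rate is the
    probability that all chosen mirrors are offline at the same time.
    Returns (number of mirrors used, expected error rate).
    """
    err_expected = 1.0
    count = 0
    for av in availabilities:
        if err_expected <= err_target:
            break
        err_expected *= (1.0 - av)
        count += 1
    return count, err_expected


# A node that only knows weakly available nodes needs many mirrors ...
print(mirrors_needed([0.33] * 20)[0])  # 12
# ... while a node that knows highly available nodes is satisfied with two.
print(mirrors_needed([0.95] * 20)[0])  # 2
```

From the local point of view two mirrors are enough in the second case. The sketch deliberately ignores storage limits, which is exactly the blind spot described above: if every node makes the same choice, the buffers of the highly available nodes overflow.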

[Plot: average number of mirroring nodes (4 to 24) over time t (20 to 180); series AV:UE = 0.8:0.2, 0.7:0.3, 0.5:0.5, 0.3:0.7, 0.2:0.8, 0.1:0.9]

Figure 6.3: Average number of mirroring nodes by parameter weighting

Therefore, an oversized α (the weight of av) results in low data availability, while a suitably chosen β (the weight of ue) decreases the error rate. However, av has to be taken into account in some way while a node builds its node table, since uej is zero at that point for each node j in i’s node table. Leaving the social relation parameter aside in this step, the values α = 0.1 and β = 0.9 are chosen to calculate

rvj = α · avj + β · uej (6.2)

based on the simulation results.

Addition of a Social Relation Filter

In a next step, we add a social relation filter sr to evaluate whether choosing a socially related node over an unrelated node improves our system, and calculate

rvj = α · avj + β · uej + γ · srj (6.3)

with α = 0.1, β = 0.7 and γ = 0.2. We now compare the results to those obtained with a combination of availability and user experience only.
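The effect of the two weightings on the ranking of candidate mirrors can be sketched as follows. The formulas are those of (6.2) and (6.3); the `Candidate` type, the helper names and the three example nodes with their values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node_id: str
    av: float  # av_j, node availability
    ue: float  # ue_j, user experience
    sr: int    # sr_j, 1 if socially related to the local node, 0 otherwise


def recommendation_value(c, alpha, beta, gamma=0.0):
    """rv_j = alpha * av_j + beta * ue_j + gamma * sr_j (gamma = 0 gives Formula 6.2)."""
    return alpha * c.av + beta * c.ue + gamma * c.sr


def rank(candidates, alpha, beta, gamma=0.0):
    """Return the candidates sorted by descending recommendation value."""
    return sorted(candidates,
                  key=lambda c: recommendation_value(c, alpha, beta, gamma),
                  reverse=True)


candidates = [
    Candidate("friend", av=0.60, ue=0.80, sr=1),
    Candidate("stranger", av=0.90, ue=0.85, sr=0),
    Candidate("newcomer", av=0.95, ue=0.00, sr=0),
]

without_sr = [c.node_id for c in rank(candidates, alpha=0.1, beta=0.9)]
with_sr = [c.node_id for c in rank(candidates, alpha=0.1, beta=0.7, gamma=0.2)]
print(without_sr)  # ['stranger', 'friend', 'newcomer']
print(with_sr)     # ['friend', 'stranger', 'newcomer']
```

With comparable user experience values, the social relation term moves the friend ahead of the stranger, while the highly available newcomer without any transaction history stays last under both weightings.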

Figure 6.4 provides several insights. Firstly, the introduction of sr leads to faster convergence than using av and ue alone. Secondly, we can observe a decrease in data availability for each parameter weighting at some point in the simulation. This is again due to dropped profiles as soon as the buffers of the best-rated nodes overflow. However, the decrease is only temporary, and it is less pronounced the more heavily ue is weighted. Furthermore, with a high weighting of ue, data availability increases again after some time, as nodes adjust their values of uej for each node j in their node table. Lastly, although it converges more slowly, the parameter weighting from Formula 6.2 catches up with the weighting from Formula 6.3.

[Plot: data availability (0.8 to 1) over time t (0 to 1000); series AV: 0.1, UE: 0.9, SR: 0.0; AV: 0.2, UE: 0.8, SR: 0.0; AV: 0.1, UE: 0.7, SR: 0.2]

Figure 6.4: Data availability introducing srj with weight γ = 0.2

The average number of mirroring nodes required to achieve this data availability is shown in Figure 6.5. With the addition of sr to the calculation of rv, car is reduced to 6.5 nodes after 1000 steps, as shown in the close-up in Figure 6.7. This value will converge towards the optimum over time, as uej for each node j in i’s node table becomes more precise. Also, the introduction of sr reduces the number of required replicas in the system while each node builds its node table, compared to the weighting that considers availability and user experience only. Figure 6.6 also shows that, with this approach, we are able to achieve 99.5% data availability, with a standard deviation across all graphs of about 0.42% for data availability and 0.43 nodes for the average number of mirrors.

Therefore, we are able to achieve higher data availability while using fewer replicas in the system than Safebook, in which 90% availability is achieved by using 13-23 replicas. Moreover, our results are almost comparable to those of PeerSoN [32], without assuming unrealistic availability distributions. If we assume the same availability distribution (that is: 10% of nodes are 95% available, 25% are 87% available, 30% are 75% available and 30% are 33% available), GEMSTONE works with almost zero losses and only four mirroring nodes on average, as shown in Figure 6.8 and Figure 6.9, respectively.

[Plot: number of mirroring nodes (5 to 25) over time t (0 to 1000); series AV: 0.1, UE: 0.9, SR: 0.0; AV: 0.2, UE: 0.8, SR: 0.0; AV: 0.1, UE: 0.7, SR: 0.2]

Figure 6.5: Mirroring nodes by parameter weighting

However, adding sr comes at a cost. Figure 6.10 shows that adding sr to the calculation of rv increases the number of dropped profiles. A dropped profile is a profile that was deleted by a mirroring node, according to its drop policy, after a storage buffer overflow. The increase is again due to the power-law distribution of social relations: nodes with the most social relations are required to store a huge number of profiles. This number quickly exceeds their storage capabilities and therefore results in a higher dropping rate.

This dropping rate is also determined by the drop policy each node pursues. As described above, nodes in our system follow a socially related drop policy, in which profiles of unrelated nodes are dropped with priority. FIFO-style dropping is only performed if all stored profiles belong to socially related nodes. Figure 6.11 shows a comparison of this approach with pure FIFO-style dropping. After performing similarly at the beginning, social dropping performs better as soon as the buffers of the best-rated nodes fill up. This is because nodes select socially related nodes to store their data on with higher probability.
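The socially related drop policy can be sketched as a small buffer class. This is an illustration under stated assumptions, not the GEMSTONE implementation: the name `MirrorBuffer` is hypothetical, capacity is counted in profiles rather than bytes, the oldest unrelated profile is chosen when several unrelated profiles are stored, and an incoming profile is always admitted.

```python
from collections import OrderedDict


class MirrorBuffer:
    """Profile store that drops unrelated profiles first and falls back to FIFO."""

    def __init__(self, capacity, friends):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity        # assumption: counted in profiles, not bytes
        self.friends = set(friends)     # socially related nodes of the mirror
        self._profiles = OrderedDict()  # insertion order equals arrival order
        self.dropped = []               # owners whose profiles were dropped

    def store(self, owner_id, profile):
        if owner_id in self._profiles:
            self._profiles[owner_id] = profile  # an update keeps the arrival position
            return
        if len(self._profiles) >= self.capacity:
            self._drop_one()
        self._profiles[owner_id] = profile

    def _drop_one(self):
        # Prefer the oldest profile of a socially unrelated owner ...
        victim = next((o for o in self._profiles if o not in self.friends), None)
        if victim is None:
            # ... and use plain FIFO only if all stored profiles belong to friends.
            victim = next(iter(self._profiles))
        del self._profiles[victim]
        self.dropped.append(victim)

    def owners(self):
        return list(self._profiles)


buf = MirrorBuffer(capacity=2, friends={"alice", "bob"})
buf.store("alice", "profile-a")
buf.store("mallory", "profile-m")  # unrelated owner
buf.store("bob", "profile-b")      # buffer full: mallory is dropped, not alice
buf.store("carol", "profile-c")    # only friends stored: FIFO drops alice
print(buf.owners())   # ['bob', 'carol']
print(buf.dropped)    # ['mallory', 'alice']
```

A pure FIFO policy would have dropped alice in the third step already, which is the difference that Figure 6.11 compares.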

[Plot: data availability (0.96 to 0.995) over time t (600 to 950); series AV: 0.1, UE: 0.9, SR: 0.0; AV: 0.2, UE: 0.8, SR: 0.0; AV: 0.1, UE: 0.7, SR: 0.2]

Figure 6.6: Close-up view of data availability, α = 0.1, β = 0.7, γ = 0.2

[Plot: number of mirroring nodes (5 to 10) over time t (600 to 950); series AV: 0.1, UE: 0.9, SR: 0.0; AV: 0.2, UE: 0.8, SR: 0.0; AV: 0.1, UE: 0.7, SR: 0.2]

Figure 6.7: Close-up view of the average number of mirroring nodes, α = 0.1, β = 0.7, γ = 0.2

[Plot: data availability (0.995 to 1) over time t (20 to 180)]

Figure 6.8: Data availability with a more optimistic availability assumption, α = 0.1, β = 0.7, γ = 0.2

[Plot: number of mirroring nodes (4 to 13) over time t (10 to 90)]

Figure 6.9: Average number of mirroring nodes with a more optimistic availability assumption, α = 0.1, β = 0.7, γ = 0.2

[Plot: percentage of dropped profiles (0 to 0.09) over time t (0 to 1000); series AV: 0.1, UE: 0.9, SR: 0.0; AV: 0.2, UE: 0.8, SR: 0.0; AV: 0.1, UE: 0.7, SR: 0.2]

Figure 6.10: Number of dropped profiles by parameter weighting

[Plot: percentage of dropped profiles (0 to 0.14) over time t (0 to 300); series Social Dropping, FIFO Dropping]

Figure 6.11: Number of dropped profiles by drop policy

6.3.3 GEMNOSE Availability Evaluation

Finally, we conducted four experiments to compare our approach with different strategies. First, we select nodes randomly. In a second scenario, we calculate rv based on node availability alone. Next, we add user experience as a factor in the calculation of rv, using the weighting determined above. Lastly, we add a social relation filter to the calculation of rv, as proposed with GEMNOSE. Each scenario is simulated for the five different social graphs.

Figure 6.12 shows the data availability for each scenario. With random selection, roughly 96% data availability is achieved. When selecting based on node availability alone, data availability is roughly 97% at most, but it drops rapidly after around 80 rounds and eventually reaches a very low 40%, due to overflowing buffers at the most available nodes as knowledge spreads throughout the system. In contrast, selection based on user experience achieves high availability of above 99%, which is maintained throughout the whole simulation. Also, the addition of sr leads to slightly quicker convergence and slightly higher data availability in the early stages of the simulation.

[Plot: data availability (0.4 to 1) over time t (0 to 900); series Random, Availability, User Experience, Socially Filtered]

Figure 6.12: Data availability by strategy

Figure 6.13 shows the average number of mirroring nodes required to reach the availability in each scenario. When selecting randomly, the average number of mirroring nodes increases to 12 nodes after about 40 rounds. All other approaches need fewer mirroring nodes, with a minimum of four nodes for the scenario that selects nodes on availability alone. This is because, as each node increases its knowledge about the system, fewer nodes are required to achieve high availability from each node’s local point of view. From the global point of view, however, this leads to low data availability. Both the user experience scenario and the social relation filter scenario require a large number of mirroring nodes in the beginning. This number decreases steadily as the user experience values become more precise.

[Plot: number of mirroring nodes (0 to 25) over time t (0 to 1000); series Random, Availability, User Experience, Socially Filtered]

Figure 6.13: Average number of mirroring nodes by strategy

Based on these results, we conclude the following: node selection based on availability alone only works if the most available nodes have unlimited storage. Otherwise, data availability will drop far below the values of the random selection scenario. Our approach of relying mainly on user experience, while favouring a socially related node over an unrelated node if their values of ue are comparable, consistently achieves high data availability. Furthermore, the number of replicas in the system is reduced as time progresses and the node tables become more accurate.

Moreover, our system is fair with regard to nodes with only few social relations. In Safebook [10], a node with very few contacts might be unable to build its Matryoshka properly, as it may not be able to provide enough nodes for the innermost shell. Also, the less available a user’s friends are, the less available a path through the shells, and therefore the user’s data, is. In contrast, GEMSTONE only prefers socially related nodes as mirrors if they have a comparable user experience value, thereby neutralizing the dependency on the ’quality’ of a node’s social relations.

Using our system, we can additionally improve scalability by reducing traffic, as nodes store their data on very few nodes only. Traffic is further reduced because updates for an unavailable node have to be sent to very few nodes as well. This also reduces the load on the storage space of the participating nodes, as they need to store fewer profiles and updates.

6.3.4 Possible Improvements

Although our approach already provides good results, there is a lot of room for improvement. Firstly, considering diurnal patterns instead of a probabilistic node availability model in our simulation would decrease the number of mirroring nodes. This is because nodes with the same diurnal patterns would share higher values of ue, as they are available to each other with higher probability than to a node with a different diurnal pattern.

Secondly, our estimation of the learning rate is very conservative. As described above, the most active nodes currently learn of only four nodes per time step. This is less than one look-up for a remote node j in the DHT would achieve (log(10000) with DHT hopping plus |MNj|, the mirroring nodes of j). For example, considering an application like the Facebook news feed, a node would conduct several look-ups in the DHT to get the most recent status messages just by viewing the feed. Also, as nodes are most active directly after creating an account in an OSN [15], it would be reasonable to assume a higher learning rate in the beginning, which would decrease convergence time.
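As a rough worked check of this comparison: the text does not state the base of the logarithm, so the following assumes base 10, which is the reading that matches the learning rate of four nodes per iteration. For |MNj| it uses the value of about 6.5 mirrors reported for 1000 steps, purely as an illustrative figure.

```latex
\underbrace{\log_{10}(10000)}_{\text{DHT hops}} + \underbrace{|MN_j|}_{\text{mirrors of } j}
\;\approx\; 4 + 6.5 \;=\; 10.5 \;>\; 4 \;=\; lr_{\max}
```

Under these assumptions, a single look-up would already reveal more than twice as many nodes as the most active node is allowed to learn per time step.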

Thirdly, our assumptions on the social graph of the OSN are very pessimistic: currently, almost 20% of the nodes have five or fewer friends in the beginning, with over 10% having zero friends.

Moreover, we assumed that each node provides only very limited space (< 500kb) to store profiles for other nodes. If this space is increased, our system is likely to achieve higher data availability as well.

Furthermore, we did not account for any altruistic users who contribute to the system by providing nodes with high availability and ample resources. As knowledge of such nodes propagates through the network, availability will increase while the number of replicas decreases. The main problem right now is low data availability when a node joins the system and has very little knowledge about other nodes. Our selection strategy can be improved to achieve faster convergence for these nodes by, e.g., providing a list of such altruistic nodes when distributing GEMSTONE. If needed, the nodes in that list could enforce a strict policy that allows mirroring only for new nodes, to prevent profiles from being dropped due to a shortage of resources.

Finally, bandwidth is not considered in our simulations. This is important, as nodes with low bandwidth but a high rating will store a lot of profiles with our current approach. This will reduce the overall system performance due to the large number of requests directed at these nodes. Therefore, the impact of such nodes has to be evaluated. To limit this influence, another parameter cap, denoting the capabilities of a node, may be introduced into the node selection formula.


7 Conclusion and Outlook

This work introduced the GEMSTONE project as a realization of a peer-to-peer online social networking system that alleviates the deficiencies of centralized OSNs. A generic middleware that provides decentralized data storage, manipulation of the social graph and communication with other nodes was designed and implemented as a foundation for further research. Moreover, the middleware is able to communicate with any kind of application via a powerful and easily understandable ASCII protocol, offering an interface to application developers. As the prototype is extensible due to its modular design, more sophisticated modules can easily be introduced, enabling analysis of social networks after the deployment of GEMSTONE.

Also, with the port to the Android platform in progress, the middleware may be deployed on mobile devices as well, making mobile social networking possible. In another step, traces of user movements or actions within the network may be collected in an anonymized fashion for further research on user behaviour in social networks.

Parallel to such considerations, additional work has to be conducted. Firstly, the prototype needs to be improved in terms of security and privacy. Initial ideas on how to realize encryption, and thereby fine-grained access to personal user data in GEMSTONE, have been presented, although their implementation was out of the scope of this thesis: by combining attribute-based encryption with traditional public-key cryptography, both goals can be achieved. Secondly, the integration of technologies supporting delay-tolerant networking for mobile devices has to be pursued. Additionally, a more sophisticated demo application offering real social networking and an appealing user interface has to be provided in order to attract users as well as application developers from existing OSNs. Besides this, the prototype has to be evaluated in terms of bandwidth and power consumption, as well as scalability with more than a few nodes in a testbed. This is especially important when considering mobile devices with limited capabilities.

Alongside the prototype implementation, the contribution of this thesis is a node selection process that selects the most suitable nodes for decentralized data storage in GEMSTONE. By applying a simple trust model that computes values based on previous experience with remote nodes, data availability of above 99% is achieved in our system. This improves availability by about 10% compared to Safebook [10] and performs equally well compared to a recent PeerSoN approach [32], without assuming node availability distributions that do not match user behaviour in social networking.

Furthermore, the node selection process of GEMSTONE is able to reduce the number of replicas required in the system by more than half in comparison to Safebook, even with a more pessimistic node availability distribution. When nodes are as available as in PeerSoN, it was shown that GEMSTONE is able to achieve data availability of almost 100% with even fewer replicas of the data in the system.

Moreover, our evaluation of the node selection process is based on very pessimistic assumptions, especially in neglecting diurnal patterns and assuming a very low learning rate of nodes. In future work, the effect of introducing patterns of user availability, in contrast to our current probabilistic model, will be studied. It is likely that taking such patterns into account will increase data availability, since user experience values for nodes with similar online schedules will be more precise.

Also, we will try to find a more appropriate learning rate that fits user activity in social networks. With a higher learning rate, especially immediately after joining the network, nodes increase their knowledge of the network faster, most probably decreasing the convergence time of our approach. However, the impact of a faster learning curve is unclear and needs to be researched.

Finally, another interesting research option is to evaluate our approach on real data sets from existing OSNs. Right now, we base our evaluation on randomly generated social graphs that follow a typical distribution of social relations for small-world scenarios like social networks. Still, real data sets could provide different results due to peculiarities (e.g., addressing only business users) of different existing OSNs.

Overall, this thesis presents an option for taking the long-needed step away from centralized OSNs by designing and implementing GEMSTONE, a decentralized platform for online social networking that can be deployed to mobile devices as well, while maintaining high data availability and thereby mitigating one of the most substantial problems that arise when eliminating the centralized structure of current OSNs.


Bibliography

[1] AIELLO, W., AND CHUNG, F. Random Evolution in Massive Graphs. In FOCS ’01: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (Washington, DC, USA, 2001), IEEE Computer Society, p. 510.

[2] BADEN, R., BENDER, A., SPRING, N., BHATTACHARJEE, B., AND STARIN, D. Persona: An Online Social Network with User-defined Privacy. In SIGCOMM ’09: Proceedings of the ACM SIGCOMM 2009 Conference on Data communication (New York, NY, USA, 2009), ACM, pp. 135–146.

[3] BADER, D. A., AND MADDURI, K. GTGraph: A Synthetic Graph Generator Suite, 2006.

[4] BENEVENUTO, F., RODRIGUES, T., CHA, M., AND ALMEIDA, V. Characterizing User Behavior in Online Social Networks. In Proceedings of the 9th ACM Internet Measurement Conference (IMC) (New York, NY, USA, November 2009), ACM, pp. 49–62.

[5] BERNARD, S., AND LE FESSANT, F. Optimizing Peer-to-Peer Backup Using Lifetime Estimations. In Proceedings of the 2009 EDBT/ICDT Workshops (New York, NY, USA, 2009), ACM, pp. 26–33.

[6] BETHENCOURT, J., SAHAI, A., AND WATERS, B. Ciphertext-Policy Attribute-Based Encryption. In SP ’07: Proceedings of the 2007 IEEE Symposium on Security and Privacy (Washington, DC, USA, 2007), IEEE Computer Society, pp. 321–334.

[7] BUCHEGGER, S., AND DATTA, A. A Case for P2P Infrastructure for Social Networks - Opportunities and Challenges. In Proceedings of the Sixth International Conference on Wireless On-Demand Network Systems and Services WONS 2009 (2009), pp. 161–168.

[8] BUCHEGGER, S., SCHIÖBERG, D., VU, L.-H., AND DATTA, A. PeerSoN: P2P Social Networking: Early Experiences and Insights. In SNS ’09: Proceedings of the Second ACM EuroSys Workshop on Social Network Systems (New York, NY, USA, 2009), ACM, pp. 46–52.

[9] CHAKRABARTI, D., ZHAN, Y., AND FALOUTSOS, C. R-MAT: A Recursive Model for Graph Mining. In Fourth SIAM International Conference on Data Mining (April 2004).


[10] CUTILLO, L., MOLVA, R., AND STRUFE, T. Safebook: A Privacy-preserving Online Social Network Leveraging on Real-life Trust. Communications Magazine, IEEE 47, 12 (2009), 94–101.

[11] CUTILLO, L., MOLVA, R., AND STRUFE, T. Safebook: Feasibility of Transitive Cooperation for Privacy on a Decentralized Social Network. In Proceedings of the IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks & Workshops WoWMoM 2009 (2009), pp. 1–6.

[12] FALL, K. A Delay-tolerant Network Architecture for Challenged Internets. In SIGCOMM ’03: Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (New York, NY, USA, 2003), ACM, pp. 27–34.

[13] FELDMAN, M., PAPADIMITRIOU, C., CHUANG, J., AND STOICA, I. Free-Riding and Whitewashing in Peer-to-Peer Systems. IEEE Journal on Selected Areas in Communications 24, 5 (May 2006), 1010–1019.

[14] GUHA, S., DASWANI, N., AND JAIN, R. An Experimental Study of the Skype Peer-to-Peer VoIP System, February 2006.

[15] GYARMATI, L., AND TRINH, T. A. Measuring User Behavior in Online Social Networks. IEEE Network 24, 5 (September 2010), 26–31.

[16] HSIEH, C.-T., LIANG, C.-M., AND CHOU, S.-C. Personalized Advertising Strategy for Integrated Social Networking Websites. In WI-IAT ’08: Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (Washington, DC, USA, 2008), IEEE Computer Society, pp. 369–372.

[17] HUA CHU, Y. Considering Altruism in Peer-to-Peer Internet Streaming Broadcast. In Proceedings of ACM NOSSDAV (2004), ACM Press, pp. 10–15.

[18] HUI, P., CROWCROFT, J., AND YONEKI, E. BUBBLE Rap: Social-based Forwarding in Delay Tolerant Networks. In Proceedings of ACM MobiHoc (2008).

[19] KAMVAR, S. D., SCHLOSSER, M. T., AND MOLINA, H. G. The Eigentrust Algorithm for Reputation Management in P2P Networks. In WWW ’03: Proceedings of the 12th International Conference on World Wide Web (New York, NY, USA, 2003), ACM, pp. 640–651.

[20] KUROSE, J. F., AND ROSS, K. W. Computer Networking: A Top-Down Approach, 5th ed. Addison-Wesley Publishing Company, USA, 2009.


[21] LINDGREN, A., DORIA, A., AND SCHELÉN, O. Probabilistic Routing in Intermittently Connected Networks. Lecture Notes in Computer Science 3126 (January 2004), 239–254.

[22] MAYMOUNKOV, P., AND MAZIÈRES, D. Kademlia: A Peer-to-Peer Information System Based on the XOR Metric. In IPTPS ’01: Revised Papers from the First International Workshop on Peer-to-Peer Systems (London, UK, 2002), Springer-Verlag, pp. 53–65.

[23] NEWMAN, M. E. J. The Structure and Function of Complex Networks. SIAM Review 45 (2003), 167–256.

[24] NGUYEN, H. A., GIORDANO, S., AND PUIATTI, A. Probabilistic Routing Protocol for Intermittently Connected Mobile Ad hoc Network (PROPICMAN). In Proceedings of the IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks WoWMoM 2007 (2007), pp. 1–6.

[25] NORDSTRÖM, E., GUNNINGBERG, P., AND ROHNER, C. Haggle: A Data-centric Network Architecture for Mobile Devices. In Proceedings of the 2009 MobiHoc S3 Workshop on MobiHoc S3 (2009).

[26] PELUSI, L., PASSARELLA, A., AND CONTI, M. Opportunistic Networking: Data Forwarding in Disconnected Mobile Ad Hoc Networks. IEEE Communications Magazine 44, 11 (November 2006), 134–141.

[27] PIETILÄINEN, A.-K., OLIVER, E., LEBRUN, J., VARGHESE, G., AND DIOT, C. MobiClique: Middleware for Mobile Social Networking. In WOSN ’09: Proceedings of the 2nd ACM Workshop on Online Social Networks (New York, NY, USA, 2009), ACM, pp. 49–54.

[28] PINGDOM. Social Network Downtime in 2008, September 2009.

[29] RATNASAMY, S., FRANCIS, P., HANDLEY, M., KARP, R., AND SHENKER, S. A Scalable Content-Addressable Network. In SIGCOMM ’01: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (New York, NY, USA, 2001), ACM, pp. 161–172.

[30] RHEA, S., GODFREY, B., KARP, B., KUBIATOWICZ, J., RATNASAMY, S., SHENKER, S., STOICA, I., AND YU, H. OpenDHT: A public DHT Service and its Uses. In SIGCOMM ’05: Proceedings of the 2005 conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (New York, NY, USA, 2005), ACM, pp. 73–84.


[31] ROWSTRON, A., AND DRUSCHEL, P. Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems. Lecture Notes in Computer Science 2218 (2001), 329–350.

[32] RZADCA, K., DATTA, A., AND BUCHEGGER, S. Replica Placement in P2P Storage: Complexity and Game Theoretic Analyses. In IEEE 30th International Conference on Distributed Computing Systems (ICDCS) (June 2010), pp. 599–609.

[33] STOICA, I., MORRIS, R., KARGER, D., KAASHOEK, F. M., AND BALAKRISHNAN, H. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. SIGCOMM ’01: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications 31, 4 (October 2001), 149–160.

[34] ZHU, B., JAJODIA, S., AND KANKANHALLI, M. S. Building Trust in Peer-to-Peer systems: A Review. International Journal of Security and Networks 1, 1/2 (2006), 103–112.
