RTP over Datagram TLS

Georg-August-Universität Göttingen, Zentrum für Informatik

ISSN 1612-6793, Number GAUG-ZFI-BM-2007-28

Master's thesis in the degree programme "Applied Computer Science"

RTP over Datagram TLS

John-Patrick Wowra

Computer Networks Group

Bachelor's and Master's theses of the Zentrum für Informatik

at the Georg-August-Universität Göttingen

17 September 2007

Georg-August-Universität Göttingen, Zentrum für Informatik

Lotzestraße 16-18, 37083 Göttingen, Germany

Tel. +49 (5 51) 39-1 44 14

Fax +49 (5 51) 39-1 44 15

Email [email protected]

WWW www.informatik.uni-goettingen.de

I hereby declare that I have written this thesis independently and have used no sources or aids other than those indicated.

Göttingen, 17 September 2007


Supervised by Prof. Dr. Fu, Computer Networks Group

Georg-August-Universität Göttingen

Acknowledgement

I would like to thank my advisor Prof. Dr. Xiaoming Fu for his excellent guidance, motivation and encouragement, my parents and Katerina for their support, and Christian Dickmann for his patience and helpfulness.

Abstract

The popularity of Internet Telephony has been rising continuously in recent years. With a rising number of users, the number of malicious users inevitably rises as well. Hence security is a major concern for Internet Telephony.

RTP is commonly used in Internet Telephony for the transmission and reception of audio and video data. Traditionally, RTP runs over UDP, and in most cases RTP traffic is transmitted without any protection.

Datagram TLS is a modified version of TLS that functions properly over datagram transport. This thesis studies an RTP extension based on DTLS; it includes a prototype implementation and further analysis of the design, with the aim of securing RTP and thus Internet Telephony.

Contents

1 Introduction
    1.1 Motivation
    1.2 Goals
    1.3 Thesis Organisation

2 Background
    2.1 Voice over IP
    2.2 Real Time Transport Protocol
    2.3 SSL/TLS and DTLS
    2.4 Session Initiation Protocol SIP

3 Related Work
    3.1 Security in VoIP
        3.1.1 Internet Protocol Security, IPsec
        3.1.2 Comparison between IPsec and DTLS
    3.2 Secure Real Time Transport Protocol

4 Security Considerations for VoIP
    4.1 Introduction
        4.1.1 Confidentiality in VoIP
        4.1.2 Availability in VoIP
    4.2 Threats and Attacks

5 RTP over DTLS
    5.1 Introduction to RTP over DTLS
        5.1.1 SRTP Compatibility Mode
        5.1.2 Packet Size Comparison
        5.1.3 Security Considerations

6 Implementation Design
    6.1 Analysis of Requirements
    6.2 System Idea/Intent
        6.2.1 DTLS
        6.2.2 RTP
        6.2.3 SIP Softphone
    6.3 RTP over DTLS
    6.4 Choice of Libraries
        6.4.1 OpenSSL
        6.4.2 ccRTP
        6.4.3 Twinkle Softphone

7 Design Details
    7.1 Design Components: RTP - ccRTP, DTLS - OpenSSL and SIP - Twinkle
        7.1.1 OpenSSL
        7.1.2 Socket Initialisation
        7.1.3 Session Initialisation with ccRTP
        7.1.4 Sending Data
        7.1.5 Receiving Data
        7.1.6 Closing Sessions
        7.1.7 Types of Sessions
    7.2 SIP Session Initiation with Twinkle
    7.3 Implementation Process
    7.4 Class Structure
    7.5 Problems and Discussion

8 Testing
    8.1 Testing Methodology
    8.2 Testbed Setup
    8.3 Measurement Methods and Tools
    8.4 Results
    8.5 Standard RTP Packet Delay
    8.6 RTP over DTLS Packet Delay
    8.7 CPU Usage
    8.8 Test Summary

9 Conclusion and Future Work
    9.1 Conclusion
    9.2 Future Work and Open Issues

Bibliography

List of Figures

2.1 Structure of an RTP packet
2.2 Schematic representation of the SSL handshake protocol with two-way authentication with certificates [1]
2.3 DTLS in the TCP/IP stack
2.4 DTLS packet structure
2.5 DTLS state machine [2]
2.6 Initialisation of a SIP session

3.1 IPsec in the TCP/IP stack
3.2 Structure of an IPsec packet with AH
3.3 Structure of an IPsec packet with ESP

5.1 Structure of an RTP packet sent over DTLS

7.1 Implementation status after phase 1
7.2 Implementation status after phase 2
7.3 Implementation status after phase 3
7.4 RTP over DTLS class structure

8.1 Testbed for RTP over DTLS tests
8.2 Delay for normal RTP packets
8.3 Delay for RTP over DTLS packets

1 Introduction

1.1 Motivation

Today, enterprises have to maintain two networks in order to use both Internet and telephone services. However, traditional landline phones are gradually being replaced by Internet phones because of the advantages these offer.

Internet Telephony is the routing of voice information over the Internet (or other IP-based networks). The telephone calls are handled by protocols which are commonly referred to as Voice over Internet Protocol (VoIP). VoIP technology provides a wide range of services to users. As an additional feature, VoIP offers video calls, for example. VoIP calls are also cheaper than traditional phone calls; calls between two VoIP participants are even free. Enterprises with branches in different cities that are connected by a VPN might use VoIP technology for internal communication between the branches and can thereby reduce costs significantly.

Besides reducing call costs, VoIP makes the infrastructure more flexible because it provides open platforms, in contrast to traditional telephony. In traditional telephony networks, standards were known only to a small circle of developers at the network provider. With VoIP, the protocols, software and tools can be improved and adjusted to the needs of the users, and not only by their original developers.

The total number of VoIP users has been rising continuously over the past years. With a rising number of users, the number of malicious users inevitably rises as well. Hence security is a major concern for Internet Telephony. Since VoIP is based on IP [3], it is vulnerable to all of the attacks that can plague traditional IP networks, such as packet snooping, unauthorised access, spoofing and especially denial-of-service attacks. Usually a conversation over a traditional phone is established over the communication provider's network. All companies involved in the connection are known and have to be trusted. With VoIP, data is transmitted through many networks whose providers are not all known. Anyone with access to a machine along the path of communication could access the transmitted data. Therefore VoIP calls are more vulnerable to eavesdropping than landline calls. However, this problem is already known from other applications that transmit confidential data over an insecure network such as the Internet.

Cryptographic protocols can be used to protect data from being eavesdropped on or altered. A well-known security protocol that is considered reliable is Secure Sockets Layer/Transport Layer Security (SSL/TLS) [4]. SSL/TLS resides above the transport layer and commonly runs over the Transmission Control Protocol (TCP) [5] or a similar protocol. TCP is a reliable, connection-oriented protocol with mechanisms for buffering and retransmission. It thereby ensures that the received data is exactly the same as the transmitted data. This is, however, not the primary requirement in VoIP; the problem lies precisely in the buffering and retransmission mechanisms. The data is sent with the unreliable IP protocol, so packets might not arrive in the order in which they were transmitted, or they may be lost on the way. TCP puts the packets back into the right order and waits for lost packets to be retransmitted in order to reassemble the data as it was sent. This is very useful for services like e-mail, where the received data must be the same as the transmitted data. In VoIP, however, a stream of data is played continuously to the receiver, and a delay caused by retransmission results in a pause in the media stream.

A delay in VoIP is defined as the time the voice takes on its way from the mouth of the speaker to the ear of the listener. It is the sum of the time needed to digitise the voice into audio data, to split the stream of audio data into data packets and to transmit the data to the destination. Delays are well known from traditional telephony. For instance, long-distance phone calls used to have quite long delays before the spoken word was received on the other side. Such delays make it impossible to hold a conversation as fluently as face to face. Therefore the highest priority in VoIP traffic is not the exact transmission of the data; instead, the data needs to reach the receiver as fast as possible in order to reduce the delays caused by transmission over the Internet. Thus VoIP protocols such as the Real Time Transport Protocol (RTP) [6] rely on connectionless transmission using the User Datagram Protocol (UDP) [7]. UDP has no mechanism for retransmitting lost data packets. Hence in RTP, lost, damaged or late packets are discarded and the media stream is played continuously. If a packet gets lost, the next received packet is played immediately, and as long as the number of lost packets does not exceed a certain limit, the receiver does not even notice that packets are missing.
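The discard-and-continue behaviour described above can be illustrated with a minimal Python sketch. It is not taken from the prototype; the function name `play_out` and the sample data are hypothetical, and wrap-around of the 16-bit RTP sequence number is ignored for brevity.

```python
def play_out(arrivals):
    """Return the payloads that would be played, in arrival order.

    A packet is played only if its sequence number is newer than the last
    one played; late or duplicate packets are dropped, never waited for.
    """
    played = []
    last_seq = None
    for seq, payload in arrivals:
        if last_seq is not None and seq <= last_seq:
            continue  # late or duplicate packet: discard it
        played.append(payload)
        last_seq = seq
    return played


# Packet 3 arrives after packet 4 and is therefore discarded as late.
arrivals = [(1, "a"), (2, "b"), (4, "d"), (3, "c"), (5, "e")]
print(play_out(arrivals))
```

In contrast to TCP, the receiver in this sketch never stalls: a gap in the sequence numbers simply shortens the played stream.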

With the goal of securing real-time media such as VoIP, TLS was extended to work with UDP datagrams. This extension of TLS is called Datagram Transport Layer Security (DTLS) [8]; it was standardised in spring 2006 by the Internet Engineering Task Force (IETF¹). At the same time, the IETF published an Internet Draft on RTP over DTLS [9]. The core of this thesis is the design, implementation and testing of a prototype of RTP over DTLS.

1.2 Goals

The goal of this thesis is to contribute to the design of a unified media security framework for Internet Telephony using RTP and DTLS. The focus lies on the interaction of the RTP and DTLS components of the framework. A critical aspect for the efficiency of the implementation is packet loss. In media streaming, a packet counts as lost when it does not arrive in time to be inserted into the data stream.

¹ http://www.ietf.org/

The critical factor here is the delay, i.e. the total time it takes to transmit voice data from caller to callee. According to the International Telecommunication Union Telecommunication Standardisation Sector (ITU-T²), the recommended threshold for delays in telephony is 150 milliseconds. Also according to the ITU-T, a packet loss rate of up to 5% is still acceptable for telephony. Therefore the implementation of RTP over DTLS should achieve a packet loss rate below 5%.
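As an illustration of how the two thresholds interact, the following Python sketch counts a packet as lost if it never arrives or arrives too late to be played. Using the 150 ms delay threshold as the playout deadline is an assumption made for this example only; the names `effective_loss_rate`, `MAX_DELAY_MS` and `MAX_LOSS_RATE` and the sample data are hypothetical.

```python
MAX_DELAY_MS = 150.0   # delay threshold quoted above (used here as deadline: an assumption)
MAX_LOSS_RATE = 0.05   # acceptable packet loss rate quoted above


def effective_loss_rate(delays_ms):
    """delays_ms holds one entry per sent packet; None means it never arrived."""
    if not delays_ms:
        return 0.0
    lost = sum(1 for d in delays_ms if d is None or d > MAX_DELAY_MS)
    return lost / len(delays_ms)


# 100 packets: one never arrives, one arrives too late, 98 arrive in time.
sample = [20.0, 35.5, None, 180.0] + [40.0] * 96
rate = effective_loss_rate(sample)
print(rate, rate < MAX_LOSS_RATE)
```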

A prototype of a VoIP application using RTP over DTLS was implemented in order to determine whether RTP traffic can be transmitted effectively over DTLS without compromising the quality of the call. This prototype was developed on the basis of existing implementations of RTP and DTLS. The technical prerequisites and the detailed requirements for this implementation had to be analysed in order to arrive at an adequate approach. The prototype was tested with respect to the critical aspects in order to determine the usability of the approach and to pave the way for further development of the unified media security framework.

1.3 Thesis Organisation

This thesis is organised as follows. Chapter 2 gives an overview of the basic VoIP components as an introduction and to aid understanding of the thesis. Chapter 3 presents related work and discusses alternative concepts for securing Internet Telephony. Chapter 4 provides security considerations and an overview of possible attacks on VoIP traffic. Chapter 5 presents the general concept of RTP over DTLS. Chapter 6 deals with the implementation design and the choice of libraries. Chapter 7 describes the implementation process, the problems encountered and, in detail, how the libraries work together. In Chapter 8 the prototype is tested in order to evaluate the approach. Chapter 9 contains the conclusion and deals with open issues and future work.

² http://www.itu.int/ITU-T/

2 Background

This chapter describes the basic concepts which are necessary to understand this thesis. An introduction to Internet Telephony is given along with a description of the main protocols used in this thesis. Due to space limitations the level of detail is kept moderate, and interested readers are referred to the references of this thesis.

2.1 Voice over IP

VoIP, also called IP telephony, Internet telephony, broadband telephony, broadband phone or voice over broadband, is the routing of voice conversations over the Internet or any other IP-based network. The first transmissions of digitised audio data from one computer to another were achieved in 1973 in the Advanced Research Projects Agency Network (ARPANET¹) with a throughput of 3,490 bit/s [10].

A VoIP call is established in a similar way to a traditional phone call. There are three general phases: connection initiation, transmission of voice and connection termination. Initiation and termination are handled by a signalling protocol. A common signalling protocol today is the Session Initiation Protocol (SIP) [11], which is presented in detail in Section 2.4; others are H.323 [12], IAX (Inter-Asterisk eXchange) [10] and Skype² [13].

To initiate a VoIP call with SIP, the caller invites the callee to a so-called session. In order to establish a connection, the caller needs to know where and whether the callee is available. Subscribers of a SIP provider have a so-called SIP Uniform Resource Identifier (SIP URI). These addresses are in URI format and are similar to e-mail addresses (e.g. sip:[email protected]). Before a user can call another user or receive a call, the terminal device must register with the central server of the SIP provider, thereby informing the provider that the user is online and ready to receive calls. The server now has information about the location of the logged-in user, so the user can be reached by other SIP users through the server.

¹ http://www.darpa.mil/
² http://www.skype.com

For connection initiation the caller sends an invite message to the server, which forwards it to the callee, whose terminal device then rings. When the call is accepted, an accept message is sent back to the caller along with the current IP address of the callee. Once the session has been initiated, the servers are no longer needed for it. The media channel is now established directly between the participants using RTP. A detailed description of VoIP call initiation using SIP is provided in Section 2.4. It is generally possible (e.g. with SIP) to establish a connection directly between caller and callee without servers, but then the caller must know the IP address of the callee. This is impractical, as experience with telephone numbers and with the Internet shows: nobody remembers a website by its IP address, but rather by its domain name. A name is much more easily associated with a person or company and is much easier to remember. Furthermore, IP addresses depend on the user's location (e.g. at work or at home).
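The registration and lookup steps described above can be sketched as a simple mapping from SIP URI to current IP address. This is a toy model for illustration, not a SIP implementation: the class `Registrar`, its methods and the example addresses are hypothetical, and no real SIP messages are exchanged.

```python
class Registrar:
    """Toy stand-in for a SIP provider's central server (not a real SIP stack)."""

    def __init__(self):
        self._locations = {}  # SIP URI -> current IP address

    def register(self, uri, ip):
        # The terminal device announces that the user is online at this address.
        self._locations[uri] = ip

    def unregister(self, uri):
        self._locations.pop(uri, None)

    def invite(self, caller, callee):
        """Return the callee's current IP address, or None if not registered."""
        return self._locations.get(callee)


server = Registrar()
server.register("sip:bob@example.org", "192.0.2.20")
print(server.invite("sip:alice@example.org", "sip:bob@example.org"))
print(server.invite("sip:alice@example.org", "sip:carol@example.org"))
```

The caller only ever needs the stable SIP URI; the address that changes with the callee's location is resolved by the server at call time.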

The audio data is transported with the Real Time Transport Protocol (RTP) [6], which is presented in detail in Section 2.2. RTP divides the audio data stream into small packets, which are then transmitted via IP, usually directly from speaker to listener. There an audio stream is generated from the received data packets and played to the listener.

In enterprises, VoIP is increasingly used to reduce infrastructure costs, since only one network infrastructure is needed instead of two, one for IP and one for telephony. For enterprises and private users a great benefit is the saving on call costs. Calls from VoIP to VoIP are normally free. Enterprises therefore tend to use VoIP for internal communication and traditional telephony for outbound calls. Connections to landline phones are possible through gateway services (provided e.g. by SIP providers), but these connections are usually charged for. So that customers can be reached from a traditional phone through such a gateway, providers offer them a landline phone number in addition to their SIP address. As with e-mail, users can be reached at the same address or telephone number worldwide, regardless of their current location, as long as they are connected to the Internet.

A large variety of devices that can connect to networks can be used as terminal devices (IP phones, mobile phones, PCs, PDAs, analogue phones with special adapters, ...). Another benefit of VoIP is the flexibility provided by the open standards, which allows new services to be added easily. With reduced costs, increased reachability, flexibility and additional services like video calls, VoIP will play a significant role in the future of telephony.

2.2 Real Time Transport Protocol

In VoIP, RTP [6] is the commonly used protocol for the transmission of audio data. RTP provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video or simulation data. These services include payload type identification, sequence numbering, timestamping and delivery monitoring.

Typically RTP runs over UDP in order to achieve timely delivery of the data packets. TCP is not used because of its retransmission mechanism: waiting for retransmitted packets so that data can be delivered in order leads to head-of-line blocking in the media stream, which delays packets that arrived in time. RTP does not provide any mechanism to ensure timely delivery or other quality-of-service guarantees, but relies on lower-layer services to do so (e.g. NSIS³ [14]).

³ http://tools.ietf.org/wg/nsis/

VoIP applications using RTP require at least two participants who communicate by sending multimedia (voice and/or video) data to and receiving it from each other. An association among a set of participants communicating with RTP is called an RTP session or conference. A participant may be involved in multiple RTP sessions at the same time.

The data transport of RTP is augmented by the RTP Control Protocol (RTCP) [6], which allows data delivery to be monitored in a manner scalable to large multicast networks and provides minimal control and identification functionality. RTCP is based on the periodic transmission of control packets to all participants in the session. Its primary function is to provide feedback on the quality of the data distribution. Its second function is to carry a persistent transport-level identifier for an RTP source called the canonical name, or CNAME. While other identifiers, such as the SSRC explained later, may change during a session, the CNAME remains the same. It is used to identify a participant during a session. Because each participant sends its control packets to all other participants of a session, each can independently observe the number of participants. This number is used to calculate the rate at which the control packets are sent: more users in a session result in less frequent transmission of RTCP packets by each participant. This is necessary because otherwise the RTCP traffic could take up bandwidth and CPU cycles that are needed by the RTP data traffic.
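The scaling of the RTCP reporting interval with the number of participants can be sketched as follows. This is a deliberately simplified model: the 5% bandwidth share and the 5-second minimum interval are taken from the RTP specification (RFC 3550), while the function name, the average packet size of 100 bytes and the omission of the sender/receiver split and of randomisation are assumptions made for this example.

```python
RTCP_FRACTION = 0.05    # share of the session bandwidth reserved for RTCP (RFC 3550)
MIN_INTERVAL_S = 5.0    # minimum reporting interval (RFC 3550)


def rtcp_interval(participants, session_bw_bps, avg_rtcp_packet_bytes=100):
    """Simplified reporting interval in seconds for each participant."""
    rtcp_bw_bytes_per_s = session_bw_bps * RTCP_FRACTION / 8
    interval = participants * avg_rtcp_packet_bytes / rtcp_bw_bytes_per_s
    return max(MIN_INTERVAL_S, interval)


# More participants share the same RTCP budget, so each reports less often.
for n in (2, 50, 500):
    print(n, rtcp_interval(n, session_bw_bps=64_000))
```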

To establish an RTP session, a pair of ports is reserved, one for audio data and the other for control (RTCP) packets. Each participant in the RTP session uses an audio conferencing application (the so-called VoIP phone), which sends audio data in small chunks of approximately 20 ms duration. Each chunk of audio is preceded by an RTP header indicating which audio encoding (e.g. PCM, ADPCM or LPC) is contained in the packet, so that senders can change the encoding during a conference. To cope with lost packets and delays, the RTP header contains timing information and a sequence number that allow the receivers to reconstruct the timing produced by the source. Hence the audio stream can be played out continuously. Conferences with both audio and video are realised by transmitting each medium in a separate RTP session.

Figure 2.1: Structure of an RTP packet

If one of the participants of an RTP session has a lower-bandwidth connection to the network than the other participants, an RTP proxy (a so-called mixer) can be used to solve this issue. A mixer is placed near the low-bandwidth area; it resynchronises incoming audio packets from multiple sources and combines them into a single audio stream. The audio data can then be compressed further by using a different codec, which enables the user with the low-bandwidth connection to receive packets from multiple sources. Mixers can also be used to compose the video streams of multiple sources into a single video stream showing a group scene of the participating users.

The source of a stream of RTP packets is identified by a numeric value in the header of the RTP packets. This 32-bit value is called the synchronisation source (SSRC) identifier. It is therefore independent of the network address. Since all packets from an SSRC form part of the same timing and sequence number space, a receiver can group packets by SSRC for playback. The outgoing RTP packets of a mixer are identified by the mixer's own SSRC value.

The structure of an RTP packet is shown in Figure 2.1. The RTP payload consists of the media being transmitted. The RTP header contains information related to the payload, e.g. the source and the encoding. The RTP packet is wrapped in a UDP packet, which in turn is encapsulated in an IP packet to be transferred over an IP-based network.
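To make the header fields mentioned in this section concrete, the following Python sketch builds and parses the 12-byte fixed RTP header as laid out in the RTP specification (RFC 3550). It is an illustration only and not part of the prototype; the function names are hypothetical, and padding, header extensions and CSRC entries are assumed to be absent.

```python
import struct

RTP_VERSION = 2


def pack_rtp(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prepend a fixed RTP header (no padding, no extension, no CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp(packet):
    """Split a packet built by pack_rtp into header fields and payload."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[12:],
    }


# 20 ms of 8 kHz, 8-bit audio is 160 bytes; payload type 0 is PCM u-law.
packet = pack_rtp(payload_type=0, seq=1, timestamp=160, ssrc=0x1234ABCD,
                  payload=b"\x00" * 160)
fields = parse_rtp(packet)
print(len(packet), fields["seq"], hex(fields["ssrc"]))
```

The resulting byte string is what would be handed to UDP, which in turn is carried in an IP packet.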


2.3 SSL/TLS and DTLS

The first versions of SSL were developed by Netscape⁴ as a security protocol for Internet traffic with the Netscape browser. Netscape's competitor Microsoft⁵ developed its own security protocol, Private Communications Technology (PCT), which was derived from the second version of SSL. In May 1996 the IETF chartered the Transport Layer Security (TLS) working group to standardise an SSL-like protocol and harmonise the different approaches, with the result that SSL has since been developed further under the name TLS.

Today TLS is a widely deployed protocol for securing network traffic. It is used for protecting Internet traffic (e.g. Internet banking) with the Hypertext Transfer Protocol Secure (HTTPS) [15] and for e-mail protocols. It provides applications with a secure channel that has three primary security features:

• Authentication of the server

• Confidentiality of the communication channel

• Message integrity of the communication channel

Optionally, TLS can also authenticate the client. Authentication uses public-key-based digital signatures, which are backed by certificates. The server authenticates itself by decrypting a secret encrypted under its public key or by signing a random challenge.

The TLS handshake is a conventional two-round-trip algorithm negotiation and key establishment protocol. The most common variant is the RSA-based handshake [16]. Figure 2.2 presents the handshake, which can be divided into four phases. In phase one the client sends a client_hello to the server, which responds with a server_hello. These messages carry the latest supported protocol version (in order to negotiate the version to be used), a 32-byte random value that later enters into the key derivation, the session identifier (session ID) and the cipher suite to use. Phases two and three are optional. In phase two the server identifies itself to the client with a certificate. In phase three the client identifies itself to the server if it has a certificate. In addition, the client verifies the server certificate, which contains the public key of the server. If the certificate cannot be verified, the connection is closed. The handshake is completed in phase four with the generation of the master secret, a single-use symmetric key that is used during the connection for the encryption and decryption of messages. From then on all messages are transmitted encrypted.

Figure 2.2: Schematic representation of the SSL handshake protocol with two-way authentication with certificates [1].

Figure 2.3: DTLS in the TCP/IP stack

⁴ http://www.netscape.com
⁵ http://www.microsoft.com

With the rising popularity of VoIP and other multimedia services, it became necessary to use TLS with the faster UDP protocol as well. TLS itself could not be used directly, because after a packet loss the subsequent data packets can no longer be decrypted.

Datagram Transport Layer Security (DTLS) [8], which was standardised in April 2006, is a datagram-capable version of TLS and is therefore very similar to it. The DTLS protocol allows client/server applications to communicate in a way that is designed to prevent eavesdropping, tampering and message forgery. DTLS reuses almost all the protocol elements of TLS, with minor but important modifications that allow it to work properly over unreliable transport protocols. Figure 2.3 shows DTLS in the five-layer TCP/IP protocol stack.

DTLS packets are structured as shown in Figure 2.4. In contrast to TLS, the DTLS handshake protocol uses a stateless cookie exchange to prevent denial-of-service attacks. In addition, message fragmentation and reassembly were added. Since transmission takes place over datagram transport, DTLS handshake messages may be lost; DTLS therefore needs a mechanism for retransmission during the handshake. This is achieved by incorporating a timer at each endpoint. Each endpoint keeps retransmitting its last message until a reply is received.

Figure 2.4: DTLS packet structure
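Since the figure is not reproduced here, the following Python sketch shows the DTLS record header as it is laid out in the DTLS 1.0 specification (RFC 4347): content type, protocol version, epoch, a 48-bit explicit sequence number and the fragment length. The explicit sequence number is what allows each record to be processed independently of lost predecessors. The function names are hypothetical, and the fragment is treated as opaque bytes; no encryption is performed.

```python
import struct

DTLS_1_0 = (254, 255)      # version bytes used by DTLS 1.0
APPLICATION_DATA = 23      # record content type for application data


def pack_record(content_type, epoch, seq, fragment, version=DTLS_1_0):
    """Build a 13-byte DTLS record header followed by the fragment."""
    header = struct.pack("!BBBH", content_type, version[0], version[1], epoch)
    seq_bytes = seq.to_bytes(6, "big")   # 48-bit explicit sequence number
    return header + seq_bytes + struct.pack("!H", len(fragment)) + fragment


def parse_record(datagram):
    """Return (content_type, version, epoch, seq, fragment)."""
    content_type, major, minor, epoch = struct.unpack("!BBBH", datagram[:5])
    seq = int.from_bytes(datagram[5:11], "big")
    (length,) = struct.unpack("!H", datagram[11:13])
    return content_type, (major, minor), epoch, seq, datagram[13:13 + length]


record = pack_record(APPLICATION_DATA, epoch=1, seq=5, fragment=b"voice")
print(len(record), parse_record(record))
```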

Furthermore DTLS, unlike TLS, is vulnerable to two types of denial-of-service attack. The first is a standard resource consumption attack. The second is an amplification attack, in which the attacker sends a client_hello message that appears to come from the victim. To counter these attacks, DTLS uses the cookie exchange technique that has been used in protocols such as Photuris [17].

Before the handshake proper begins, the client must replay a cookie provided by the server in order to demonstrate that it is capable of receiving packets at its claimed IP address. The DTLS client_hello message contains a cookie field, which is empty if there is no cached cookie from a prior exchange. The message also contains the DTLS version and a list of algorithms and compression methods that the client will accept. The server responds with three messages: the server_hello contains the server's choice of version and algorithms, the certificate message contains the server's certificate chain, and the server_hello_done informs the client that the server has finished its part of this handshake phase. Because DTLS handshake messages may get lost, DTLS implements retransmission using a single timer at each endpoint. Each endpoint keeps retransmitting its last message until a reply is received.
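The stateless cookie exchange can be sketched in Python as follows. The construction of the cookie as an HMAC over the client's address and hello parameters follows the suggestion in the DTLS specification (RFC 4347), which also names the server's reply carrying the cookie HelloVerifyRequest. Everything else is an assumption for illustration: the function names, the use of SHA-256, the separator byte and the example addresses are hypothetical.

```python
import hashlib
import hmac
import os

SERVER_SECRET = os.urandom(32)   # hypothetical server-side secret


def make_cookie(client_ip, client_params, secret=SERVER_SECRET):
    """Derive the cookie from the claimed address; no per-client state is kept."""
    msg = client_ip.encode() + b"|" + client_params
    return hmac.new(secret, msg, hashlib.sha256).digest()


def handle_client_hello(client_ip, client_params, cookie, secret=SERVER_SECRET):
    """Continue the handshake only if the client replays a valid cookie."""
    expected = make_cookie(client_ip, client_params, secret)
    if cookie and hmac.compare_digest(cookie, expected):
        return ("server_hello", None)
    return ("hello_verify_request", expected)


params = b"version+ciphers"
# First client_hello: the cookie field is empty, so the server only replies with a cookie.
reply, cookie = handle_client_hello("198.51.100.7", params, b"")
print(reply)
# Second client_hello: the cookie is replayed from the same address.
print(handle_client_hello("198.51.100.7", params, cookie)[0])
# A spoofed source address cannot reuse the cookie.
print(handle_client_hello("203.0.113.9", params, cookie)[0])
```

Because the server can recompute the cookie from the incoming message, it commits no memory and sends no large reply until the client has shown that it receives packets at its claimed address.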

A state machine implements the timer and the resulting retransmissions; it is shown in Figure 2.5. Once in the 'Read Message Fragment' state, transitions are triggered by the arrival of data fragments or the expiry of the retransmission timer. If a data fragment is the expected next handshake message, the fragment is passed to the higher layers and the timer is cancelled. Otherwise, the fragment is buffered or discarded as appropriate and the timer is allowed to continue running. When the retransmission timer expires, the implementation retransmits the last messages that it sent.

Figure 2.5: DTLS state machine [2]
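The transitions described above can be modelled with a small event-driven Python sketch. It is a simplification for illustration, not the state machine of the figure or of any library: the class and method names are hypothetical, the timer is represented only by its current timeout value and a flag, and buffered fragments are stored but not delivered later. The initial timeout of 1 second and the doubling up to 60 seconds follow the recommendation in the DTLS specification (RFC 4347).

```python
INITIAL_TIMEOUT_S = 1.0   # initial timer value recommended in RFC 4347
MAX_TIMEOUT_S = 60.0      # upper bound for the doubled timer value


class HandshakeRetransmitter:
    def __init__(self, send):
        self._send = send              # callable that puts a message on the wire
        self._last_flight = None
        self.timeout = INITIAL_TIMEOUT_S
        self.timer_running = False
        self.expected_seq = 0          # next handshake message expected from the peer
        self.buffered = {}             # out-of-order fragments kept for later

    def send_flight(self, flight):
        self._last_flight = flight
        self.timeout = INITIAL_TIMEOUT_S
        self.timer_running = True
        self._send(flight)

    def on_timer_expired(self):
        # Retransmit the last message and back off.
        self._send(self._last_flight)
        self.timeout = min(self.timeout * 2, MAX_TIMEOUT_S)

    def on_fragment(self, msg_seq, fragment):
        if msg_seq == self.expected_seq:
            self.expected_seq += 1
            self.timer_running = False  # expected message arrived: cancel the timer
            return fragment             # hand it to the higher layers
        if msg_seq > self.expected_seq:
            self.buffered[msg_seq] = fragment  # early fragment: buffer it
        return None                     # old fragment is discarded; timer keeps running


sent = []
peer = HandshakeRetransmitter(sent.append)
peer.send_flight("client_hello")
peer.on_timer_expired()
peer.on_timer_expired()
print(sent, peer.timeout)
print(peer.on_fragment(0, "server_hello"), peer.timer_running)
```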

DTLS is well suited to VoIP because it combines the security of TLS with the fast delivery
of UDP, thereby filling a gap left by the existing protocols.

2.4 Session Initiation Protocol (SIP)

The Session Initiation Protocol (SIP) enables multi-user communication sessions regardless
of media content. SIP is specified by the IETF in RFC 3261 [11]. It emerged in the mid-1990s
from the work of researchers, some of whom were also involved in the specification of RTP.
SIP is an application-layer control protocol that can establish, modify and terminate
multimedia sessions such as VoIP calls.

SIP transparently supports name mapping and redirection services, which enables personal
mobility. SIP thereby covers the basic requirements of communication:

• User location

• User availability

• User capabilities

• Session setup

• Session management

User location: SIP determines the location of a user through a registration process. When a
VoIP phone is activated, it sends a registration to the SIP server, announcing its
availability to the communications network.

User availability: SIP determines whether a user is willing to answer a request to
communicate. A user can have several locations registered but might accept incoming
communications on only one device. If the call is not answered there, it is transferred to
another device or to an application such as voice-mail.

User capabilities: There are many methods and standards for multimedia communication. SIP
checks the users’ capabilities, for example whether a camera for video calls is available
or which encryption and decryption methods a user supports.

Session setup: SIP establishes the session parameters for both ends of the communication
and performs the actual session establishment when one user calls and another user answers.

Session management: SIP manages, for example, the transfer of a call from one device to
another (e.g. from a laptop to a mobile phone and vice versa) without a noticeable impact on
the communication partner. Another example is the invitation of a third user to a VoIP
session and thereby the establishment of a conference call (multiuser session).

SIP is not a vertically integrated communications system. Rather, it is a component that can
be combined with other IETF standards, such as RTP, to build a complete multimedia
architecture. An important feature of SIP is that it does not define the type of session that
is being established, only how it should be managed. This flexibility means that SIP can be
used for a huge number of applications and services. To date, the 3G community
(http://www.3gpp.org/) has selected SIP as the session control mechanism for the next
generation of cellular networks. Microsoft has chosen SIP for its real-time communications
strategy and has deployed it in various products.

There are four major components in the SIP architecture:

• SIP User Agents

• SIP Registrar Server

• SIP Proxy Servers

• SIP Redirect Servers

These components exchange messages that embed Session Description Protocol (SDP) [18] data,
which defines the content and characteristics needed to complete a SIP session. The terminal
devices of SIP are called SIP User Agents (UAs). A User Agent can be any kind of device
capable of transmitting voice or other media over a network (e.g. cell phones, PCs, PDAs).
These devices are used to create and manage a SIP session. Every User Agent needs a unique
identifier, which is called a SIP-URI. Like e-mail addresses, SIP addresses use the URI
format: “sip:[email protected]”. Another addressing scheme is the URL for telephone calls
(tel-uri), described in [19], with which a traditional phone number can be mapped to a SIP
address. This is used by the gateway servers that many SIP providers maintain in order to
enable traditional phone users to call VoIP users. Basically, a connection is established
when a User Agent Client (caller) sends an invitation message and the User Agent Server
(callee) responds to it. This initiation can be carried out directly (peer-to-peer) if the
current IP address of the User Agent Server is known. For the user it is more convenient to
initiate the session through the SIP provider using a SIP-URI.

The SIP Registrar Servers are databases that contain the location of all User Agents within

a domain. These servers retrieve and send participants messages and other information to

the SIP Proxy Server.

SIP Proxy Servers accept session requests made by a SIP User Agent and query the SIP
Registrar Server to obtain the recipient User Agent’s addressing information. The SIP Proxy
Server then forwards the session invitation directly to the recipient User Agent if it is
located in the same domain, or to a Proxy Server if it is located in another domain.

Figure 2.6: Initialisation of a SIP session

The SIP Redirect Servers allow SIP Proxy Servers to redirect SIP session invitations to
external domains. The SIP Redirect Server, the SIP Registrar Server and the SIP Proxy Server
may reside on the same hardware. Figure 2.6 illustrates the establishment of a SIP session
between two Internet Service Providers (ISPs).

Before any session can be established through a SIP provider, both users must power on their
devices and register their availability and their IP addresses with the SIP Proxy Server in
their ISP’s network. User A initiates the call by sending the Proxy Server in domain A.com a
request to communicate with User B.

1. Upon receiving the request from User A, the SIP Proxy Server in Domain A recognises that
User B is outside its domain.

2. SIP Proxy Server A then sends a request for User B’s IP address to the SIP Redirect
Server, which may be located in Domain A or B. Note that the lookup at the Redirect Server
is not a SIP query; it is, for instance, a DNS lookup.

3. The SIP Redirect Server returns User B’s Proxy Server address.

4. The SIP Proxy Server in Domain A forwards the session initiation request to the SIP Proxy
Server in Domain B.

5. The SIP Proxy Server in Domain B requests the current IP address of User B from the
Registrar Server in Domain B.

6. The Registrar Server returns User B’s current IP address.

7. The SIP Proxy relays User A’s invitation to communicate with User B to User B. This
request includes information about the media (audio and/or video), which is described using
SDP.

8. User B informs the SIP Proxy that User A’s invitation is accepted and that User B is
ready to receive the message.

9. The response from User B is forwarded to User A. The return path is known because all
servers left their address in a specific field of the invitation.

10. The response from User B is forwarded to User A.

11. User A and User B create a point-to-point RTP connection enabling them to interact.
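The routing of the invitation in the steps above can be modelled with a few lines of code. This is a toy model under stated assumptions: the class `Domain`, the function `route_invite`, the `proxy.<domain>` naming and the example addresses are hypothetical and only mirror the roles of proxy and registrar; no real SIP messages are exchanged.

```python
from dataclasses import dataclass, field

@dataclass
class Domain:
    """One provider: a proxy plus a registrar mapping SIP-URIs to current IPs."""
    name: str
    registrar: dict[str, str] = field(default_factory=dict)

    def register(self, sip_uri: str, ip: str) -> None:
        self.registrar[sip_uri] = ip

def route_invite(caller_domain: str, callee_uri: str,
                 domains: dict[str, Domain]) -> list[str]:
    """Return the hops an invitation takes from the caller's proxy to the callee."""
    callee_domain = callee_uri.split("@", 1)[1]
    hops = [f"proxy.{caller_domain}"]
    if callee_domain != caller_domain:
        # Steps 2 to 4: locate the remote proxy and forward the request to it.
        hops.append(f"proxy.{callee_domain}")
    # Steps 5 and 6: the callee's proxy asks its registrar for the current IP.
    callee_ip = domains[callee_domain].registrar[callee_uri]
    # Step 7: the invitation is relayed to the callee.
    hops.append(callee_ip)
    return hops
```

For a callee registered in another domain the result contains both proxies followed by the callee's address; for a callee in the caller's own domain the second proxy hop is omitted.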

3 Related Work

This chapter presents related work, in particular alternative approaches to securing VoIP
traffic. The IPsec protocol is presented and compared with the chosen DTLS protocol,
together with the reasons for this choice.

3.1 Security in VoIP

Securing a traditional phone is neither easy nor cheap, since additional devices would have
to be installed to secure the communication channel. Both communication partners would need
such a device, which results in high additional costs on both sides. It is much easier to
add a security service to a VoIP phone: both communication partners can download and install
the software and are thereby able to communicate over a secured channel without much effort.
Surprisingly, many consumer VoIP solutions do not support any encryption yet. Hence it is
not a complex task to eavesdrop on VoIP calls and even change their content [10].

There are some open source solutions that facilitate sniffing of VoIP conversations. One
example is the Voice Over Misconfigured Internet Phones (VOMIT) [20] software
(http://vomit.xtdnet.nl/), which enables even non-expert users to eavesdrop on VoIP calls
easily. The software extracts the audio data from a stream of data that is being transmitted
over an insecure network like the Internet. Some vendors use compression to make
eavesdropping more difficult. The existing secure standard SRTP [21] and the new ZRTP [22]
protocol are available on Analogue Telephone Adapters (ATAs) as well as in various
softphones. Although some devices support SRTP and thus enable encrypted VoIP calls, the
problem is that in the standard configuration the keying material is transmitted in clear
text over the network. Eavesdroppers are thereby able to access the keying material, which
makes the encryption (almost) useless. Furthermore, users need to study the manual to find
out how to enable secured key sharing [23].

It is possible to use IPsec to secure peer-to-peer VoIP by using opportunistic encryption,
which is presented in the coming section. Skype, a proprietary peer-to-peer Internet
telephony network, is closed source, which means that the source code is not published; it
has over 200 million users worldwide. Skype does not use SRTP, but uses encryption which is
transparent to the Skype provider. The user cannot turn encryption on or off and has to rely
on the software and the provider. The Voice VPN solution provides secure voice for
enterprise VoIP networks by applying Internet Protocol Security (IPsec) [24] encryption to
the digitised voice stream [10]. IPsec is explained in the upcoming section as an
alternative approach to securing VoIP traffic.

3.1.1 Internet Protocol Security, IPsec

Internet Protocol Security [24] is a suite of protocols for securing Internet Protocol (IP)
[3] communications by authenticating and/or encrypting each IP packet in a data stream. In
addition, IPsec includes protocols for cryptographic key establishment. IPsec was developed
in 1998 to address the security shortcomings of IP. It provides confidentiality,
authenticity and integrity.

The main IPsec document describes the architecture of the protocol suite and references the
following RFCs upon which IPsec relies: Authentication Header (AH) [33], Encapsulating
Security Payload (ESP) [25] and Internet Key Exchange (IKE) [26].

Figure 3.1: IPsec in the TCP/IP stack

IPsec operates on the network layer and is therefore capable of securing TCP- and UDP-based
protocols, which reside on the transport layer, as illustrated in figure 3.1. IPsec operates
in two different modes: transport mode and tunnel mode. In transport mode, only the payload
of the data packet is encrypted and/or authenticated. Routing remains intact, since the IP
header is neither modified nor encrypted. Transport mode is used for peer-to-peer
communication. In tunnel mode, the entire packet is encrypted and/or authenticated; it must
therefore be encapsulated in a new IP packet for routing to work. Tunnel mode is used for
peer-to-peer communication as well as for network-to-network and host-to-network
connections.

The first thing that needs to be done upon connection initiation is the exchange of the
keying material. This is the task of IKE, possibly the most complex component of IPsec. IKE
uses the Diffie-Hellman Key Agreement Method [27] to exchange keys over an insecure network
and is based on the Internet Security Association and Key Management Protocol (ISAKMP) [28],
the IPsec Domain of Interpretation (DOI) [29], the Oakley Key Determination protocol [30]
and SKEME [31]. Both sides of the connection need to authenticate themselves to the other
side and agree on a keying algorithm.

The AH guarantees connectionless integrity and data origin authentication of IP datagrams.
It can optionally protect against replay attacks by using the sliding window technique and
discarding old packets. AH protects the IP payload and all header fields of an IP datagram
except for those that might be changed during transmission. Figure 3.2 shows a TCP packet
before and after the AH is inserted.
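The sliding window technique used for replay protection can be sketched as follows. This is a minimal sketch, assuming a window of 64 sequence numbers tracked in a bitmap; the class name `ReplayWindow` is hypothetical, and a real implementation would only update the window after the packet's integrity check has succeeded.

```python
class ReplayWindow:
    """Sliding-window replay check over packet sequence numbers."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = 0   # highest sequence number accepted so far
        self.bitmap = 0    # bit i set means (highest - i) was already seen

    def accept(self, seq: int) -> bool:
        if seq > self.highest:
            # Newer packet: slide the window forward and mark it as seen.
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False   # older than the window: discard
        if self.bitmap & (1 << offset):
            return False   # already seen: replayed packet
        self.bitmap |= 1 << offset
        return True        # late but new packet inside the window
```

Packets that arrive out of order are still accepted as long as they fall inside the window and have not been seen before, which matters for datagram traffic where reordering is normal.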

Figure 3.2: Structure of an IPsec packet with AH

Figure 3.3: Structure of an IPsec packet with ESP

The ESP protocol provides origin authenticity, integrity and confidentiality for a packet.
Unlike with AH, the IP packet header is not protected by ESP. Figure 3.3 shows a TCP packet
before and after ESP is applied in tunnel mode. IPsec support is usually implemented in the
kernel, while key management is carried out from user space. However, as there is a standard
interface for key management, it is possible to control one kernel IPsec stack using key
management tools from a different implementation.

IPsec is part of IPv6. It was intended to provide either transport mode or tunnel mode,
where packets can be provided to several machines; furthermore it can be used to create
Virtual Private Networks [32]. In comparison to TLS, IPsec is a peer-to-peer protocol,
designed as a generic security mechanism for Internet protocols. There are a number of
problems with using IPsec to secure datagram traffic generated by client-server
applications, which are discussed in the comparison of IPsec and DTLS in the next section.

3.1.2 Comparison between IPsec and DTLS

In contrast to DTLS, IPsec consists of three protocols: Authentication Header (AH) [33],
Encapsulating Security Payload (ESP) [25] and Internet Key Exchange (IKE) [26]. These
technologies work together to provide security for IP traffic. The IETF first standardised
IPsec in [28], the ’Internet Security Association and Key Management Protocol’ (ISAKMP). The
architecture of IPsec is described in [24]. Key exchange and parameter management are
provided by ISAKMP and IKE, while data protection is provided by AH and ESP. All these
separate developments are tied together by security associations (SAs). ISAKMP and IKE are
used to establish SAs, which AH and ESP then use to protect the data. An SA is roughly
comparable to a DTLS session, except that an SA is unidirectional. Each SA has a unique
32-bit identification tag which is carried in each packet. IPsec has two methods to
establish an SA, manual and automatic keying, of which automatic keying is similar to the
DTLS handshake. The IKE key exchange is based on STS [34], Oakley [30] and SKEME [31]. The
two parties exchange Diffie-Hellman public keys and use the shared key to derive traffic
encryption and message authentication keys.
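The Diffie-Hellman exchange and the subsequent key derivation can be illustrated with a toy example. This is a sketch under stated assumptions: the group parameters `P = 23` and `G = 5` are textbook-sized and far too small for real use, and `derive_keys` is a hypothetical stand-in that hashes the shared secret with a label; IKE defines its own key derivation, which is not reproduced here.

```python
import hashlib

P, G = 23, 5  # toy group for illustration only; real groups are thousands of bits

def public_value(private: int) -> int:
    """Value sent to the peer: G^private mod P."""
    return pow(G, private, P)

def shared_secret(private: int, peer_public: int) -> int:
    """Both sides arrive at the same value: peer_public^private mod P."""
    return pow(peer_public, private, P)

def derive_keys(secret: int) -> tuple[bytes, bytes]:
    """Hypothetical derivation of separate encryption and authentication keys."""
    raw = secret.to_bytes(max(1, (secret.bit_length() + 7) // 8), "big")
    enc_key = hashlib.sha256(b"enc" + raw).digest()
    mac_key = hashlib.sha256(b"mac" + raw).digest()
    return enc_key, mac_key
```

With private values 6 and 15, the two parties exchange the public values 8 and 19 and both compute the shared secret 2, from which distinct keys for encryption and message authentication are derived.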

One major disadvantage of IPsec is that it fails as soon as a router on the path performs
Network Address Translation (NAT). This technology allows a large number of users to share a
small number of IP addresses. The machines behind a NAT router obtain private IP addresses,
and the router translates the private address to a public address when a machine connects to
a server on the Internet. Since a user’s private IP address is not known outside the
subnetwork behind the router, and IPsec is used to create connections between machines,
IPsec cannot establish a connection between the private IP address behind the router and an
IP address on the Internet: outside the subnetwork only its public address is known.

Another problem is the lack of standardisation among IPsec APIs, which results in
portability problems when an application wishes to control the keying policy. With DTLS,
portability can be achieved even though DTLS APIs are not standardised either, since an
application can be shipped along with the DTLS toolkit. For IPsec this is not so easily
achievable because it resides in kernel space, in contrast to DTLS, which resides in
application space.

In order to simplify key negotiation, IPsec uses a reliable TCP connection to secure a sepa-

rate datagram channel. This design is smart but has some problems. First, the application

now has to manage two different sockets and synchronise them, where synchronisation is a

significant programming problem. If the TCP connection is left open after key negotiation,

unnecessary system resources are wasted. On the other hand when the TCP connection is

closed after key negotiation, any renegotiation must be done over UDP requiring another

implementation for the keying negotiation over UDP which would make the key nego-

tiation over TCP obsolete. Therefore it is more useful to have key negotiation and data

transfer on the same channel.

DTLS is more suitable for securing RTP traffic: since RTP runs over UDP, any additional
connection (e.g. TCP for key negotiation) is a waste of system resources. VoIP is
time-sensitive, so the added security overhead should consume as few system resources as
possible while still providing enough security to be reliable. Furthermore, because IPsec
resides in the kernel, using it on a system that does not support IPsec requires changes to
the TCP/IP stack. To secure an application with DTLS, only another application needs to be
used.

3.2 Secure Real Time Transport Protocol

The Secure Real Time Transport Protocol (SRTP) [21] defines an already implemented profile
of RTP, which is intended to provide encryption, message authentication and integrity, and
replay protection for RTP data in unicast and multicast applications. Note that SRTP must
not be confused with RTP over DTLS.

SRTP was published as RFC 3711 [21] in March 2004. This tightly coupled encryption mode for
RTP provides a number of benefits. The RTP header is left unencrypted, which enables header
compression (see [35], [36], and [37]) and easy debugging. The packets appear to be RTP
packets, which benefits firewall compatibility. There is zero header overhead. SRTP relies
on an external key management protocol to set up the initial master key. Two protocols
specifically designed to be used with SRTP are ZRTP [22] and MIKEY [38]. There are also
other methods to negotiate the SRTP keys; several vendors offer products that use the SDES
key exchange method.

For encryption and decryption of the data flow, SRTP standardises only a single cipher, the
Advanced Encryption Standard (AES) [39]. AES can be used in two cipher modes, which turn the
original AES block cipher into a stream cipher.

Since SRTP does not provide a keying mechanism and has to rely on other protocols, it cannot
by itself be regarded as a solution for securing VoIP traffic. In combination with ZRTP,
VoIP traffic can be secured. However, SRTP is not widely used, since users cite reduced
audio quality as a reason to turn ZRTP protection off. Furthermore, ZRTP is not as widely
known a security architecture as TLS and is therefore not as trustworthy as RTP over DTLS
can be.

4 Security Considerations for VoIP

As already mentioned, the transport of voice data over insecure networks such as the
Internet is exposed to various threats. This chapter provides security considerations
regarding these threats, pointing out attacks and the security goals that must be achieved
in order to counter them.

4.1 Introduction

In order to classify the threats to VoIP properly, the security goals must first be
formulated. VoIP is IP traffic, and thus the same attacks can be used against it as against
any other IP traffic. This is why VoIP calls are vulnerable to a variety of threats that
traditional telephone calls are not. Any data being transmitted is at some risk of being
eavesdropped, and data packets can be eavesdropped on anywhere along the transmission path.
Alternatively, the intercepted data could be changed and then transmitted to the receiver,
who would not notice that the data has been altered; this is called a man-in-the-middle
attack. By transmitting the same message, e.g. an invitation to a VoIP phone call, many
times, an attacker could keep the receiving machine so busy that no real calls can come
through. This is called a denial-of-service attack.

There are three classical primary security goals in modern communication systems:

• Confidentiality

• Integrity

• Availability

Confidentiality has been defined by the International Organisation for Standardisation
(ISO) as "ensuring that information is accessible only to those authorised to have access"
[40]. Integrity is the protection against unauthorised alteration of the transmitted data.
Like confidentiality, message integrity is a part of DTLS. It assures the user that the
received data has not been changed without his or her notice. Availability means that the
transmitted data will reach its destination and will thereby be available to the receiver.
The integrity of the voice data is an important issue here. Certainly it is easier to
recognise whether someone on the phone is the person he or she claims to be than to
recognise whether an e-mail was really written by the declared sender. This argument,
however, applies mostly to private communication and to communication among people who know
each other well. Moreover, voice messages can be recorded, edited and replayed, so that the
receiver does not notice that the caller is not the person he or she claims to be. Besides
the integrity of the voice data, the integrity of the signalling data needs to be protected
as well.

The identity of the caller and the callee needs to be protected. If an attacker manages to
manipulate his own identity, the callee may be shown a different caller ID upon reception of
a call. This can be used to reach persons who usually do not take calls from just anybody
(e.g. the chief executive of a company). By acquiring a fake identity, the billing of the
VoIP provider can be bypassed, and calls will be charged to the original owner of the
account.

4.1.1 Confidentiality in VoIP

Confidentiality is an important security goal. In the context of VoIP the focus lies on the
confidentiality of the voice data, meaning that calls cannot be eavesdropped on. In VoIP,
confidentiality is threatened more than in traditional telephony. In traditional telephony
the attacker needs to have physical access to the network. Traditional telephony runs mostly
over separate networks, while in VoIP the voice data is transmitted over the Internet, where
all connected machines can potentially be accessed through security holes. Many protocols in
traditional telephony are barely published; therefore the analysis of and attacks on
traditional phone calls require special hardware and software. This reduces the number of
people who are capable of eavesdropping on phone calls, but it does not make eavesdropping
impossible.

4.1.2 Availability in VoIP

Availability in VoIP networks has two primary aspects. The first is the availability of the
telephone service, which in the case of SIP means that the SIP Proxy Servers are available
and able to initiate sessions properly. Availability may also be harmed by unwanted calls, a
problem which is explained in the upcoming section. The second aspect of availability is the
quality of the VoIP call: both communication partners need to be able to understand each
other clearly.

4.2 Threats and Attacks

As described in the preceding section, VoIP calls are threatened by the same attacks as
other applications running over IP networks. Therefore this section gives an overview of
different attacks in order to classify the threats to VoIP calls. However, a new technology
that offers new services to users also offers new possibilities to attackers.

Attacks can be divided into two groups. The first group consists of passive attacks, which
include eavesdropping on calls and sniffing messages that are transmitted over the Internet.
The second group, active attacks, is much more powerful: messages are manipulated during
their transmission, or faked messages are sent. An example of an active attack is the
man-in-the-middle attack, where the attacker gains control of a router between two
communicating systems and redirects transmitted packets. So-called network scans or port
scans are used to plan an attack by searching for weaknesses: the attacker sends various
requests to a network or host in order to acquire information needed for further steps, such
as the operating system or the installed services. A so-called spoofing attack uses messages
or data packets with faked information. For example, the IP address or MAC address of the
sender can be changed so that the receiving machine assumes that the packet was sent from a
trustworthy source. Another example of spoofing is DNS spoofing, in which DNS answers are
changed so that the requesting machine communicates with a machine the attacker has
prepared. Denial-of-service attacks replay request messages to servers in such high volume
that the server’s service is no longer available to regular users, targeting the
availability of a system.

VoIP might also become the target of new attacks which are enabled by VoIP itself. Spam is a
commonly known problem these days. Spamming is the abuse of e-mail to indiscriminately send
unsolicited bulk messages; e-mail spam involves sending nearly identical messages to
numerous recipients. As already mentioned, SIP uses an address format similar to that of
e-mail, so the problem of e-mail spam might become a problem for VoIP in the future. VoIP
spam is not yet an actual problem; nonetheless it receives a great deal of attention from
marketers and the trade press. VoIP spam is also referred to as SPIT (Spam over Internet
Telephony). The malicious users could be telemarketers or prank callers. Currently there are
rules for e-mail systems that block unwanted e-mail; such systems could (and probably will)
also be applied to VoIP systems. SIP has been designed to support presence natively, so
callers know the callee’s availability before even attempting to initiate a call.

The three security services are realised through DTLS and implemented in the OpenSSL

library which makes it a reasonable choice to secure VoIP traffic. Unfortunately no encryp-

tion can prevent the biggest threat, a virus or trojan on the endpoint giving a hacker access

to the machine and thereby to the decrypted data.

5 RTP over DTLS

This chapter describes the basic idea of RTP over DTLS and possibilities for its
realisation. Furthermore, the performance of the approach is considered in comparison to
SRTP.

5.1 Introduction to RTP over DTLS

RTP uses UDP to transmit data over IP-based networks. Implementations typically have
interfaces to UDP socket classes to open and close sockets and to transmit and receive data.
DTLS uses UDP sockets as well for transmission and reception of data. Therefore RTP can also
operate on top of DTLS instead of plain UDP, once functionality for connection initiation is
added. This adds an encryption scheme to RTP that provides key exchange and encryption and
decryption of data. The basic idea for realising this is an interface, used by an
alternative RTP class for the underlying transport protocol, that manages connection
requests and connection acknowledgements. SIP softphones could then simply start an RTP over
DTLS session as an alternative RTP profile instead of a standard RTP session. Since normal
RTP and RTCP payloads are sent in a UDP packet, they can be sent in a DTLS packet as well. An
RTP packet sent over DTLS therefore has the layout shown in figure 5.1.
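The nesting of an RTP packet inside a DTLS record can be sketched at the byte level. This is a layout sketch only, under stated assumptions: the function names `rtp_header` and `dtls_record` are hypothetical, the record header fields follow the DTLS 1.0 record format (content type, version, epoch, sequence number, length), and the fragment is left unprotected here, whereas a real DTLS record carries the encrypted payload together with its MAC.

```python
import struct

def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int) -> bytes:
    """Fixed 12-byte RTP header: version 2, no padding, extension, CSRC or marker."""
    first_byte = 2 << 6
    return struct.pack("!BBHII", first_byte, payload_type & 0x7F,
                       seq, timestamp, ssrc)

def dtls_record(epoch: int, seq: int, fragment: bytes) -> bytes:
    """13-byte DTLS 1.0 record header followed by the (here unprotected) fragment."""
    content_type = 23     # application_data
    version = 0xFEFF      # DTLS 1.0 as encoded on the wire
    header = struct.pack("!BHH", content_type, version, epoch)
    header += seq.to_bytes(6, "big")            # 48-bit record sequence number
    header += struct.pack("!H", len(fragment))
    return header + fragment

# Example: 20 bytes of audio (payload type 18 is G.729) wrapped in one record.
rtp_packet = rtp_header(18, 1, 160, 0x1234ABCD) + bytes(20)
record = dtls_record(epoch=1, seq=0, fragment=rtp_packet)
```

The resulting record would be handed to the UDP socket in place of the bare RTP packet; the record header alone adds 13 bytes before encryption and authentication data are taken into account.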

RTP typically opens two sessions, one for data traffic and one for RTCP traffic. In order to
secure RTP traffic, a DTLS session should be initiated at least for the data session.
Securing the RTCP session would be possible as well, but since the RTCP packets do not
contain confidential data, this is not mandatory.

Figure 5.1: Structure of an RTP packet sent over DTLS

RTP over DTLS is a trustworthy approach to securing VoIP calls. DTLS is well suited to a
VoIP scenario and, because of its well-known predecessor TLS, is likely to gain the trust of
users as well.

5.1.1 SRTP Compatibility Mode

SRTP Compatibility Mode is a profile for RTP over DTLS which is presented in [9]; it depends
on two extensions to DTLS which reduce the per-record bandwidth of the data channel and
allow partial encryption of record bodies.

This profile depends on ’Extensions for DTLS in Low Bandwidth Environments’ [41] and

on ’TLS Partial Encryption Mode’ [42]. In this profile, the RTP header is left unencrypted,

which enables header compression. With unencrypted headers the packets appear as RTP

packets which results in firewall compatibility. Furthermore this profile provides encryp-

tion with a zero header overhead, and thus improved performance in comparison to RTP

over DTLS. For this profile, implementations need to negotiate the TLS partial encryption

extension, the DTLS implicit application data header and the TLS MAC truncation exten-

sion. Thereby the RTP over DTLS packets would look identical to SRTP packets with a

10-byte MAC value. They can only be distinguished with access to the DTLS or SRTP key-

ing material.

Since the RTP header is in the clear, header compression and debugging both work. The
security properties of DTLS are not affected by these extensions. This extension to RTP over
DTLS is not part of the implementation conducted in this thesis but is worth noting for
future development of the profile.

5.1.2 Packet size Comparison

This section provides a comparison of packet sizes in order to estimate the performance of
RTP over DTLS in comparison to RTP, SRTP and the SRTP compatibility mode.

Since most of the RTP infrastructure is reused, the overhead of SRTP is low. A 20 ms RTP
packet encoded with the G.729 codec has a size of 60 bytes. With SRTP this packet would be
just 4 bytes longer, but only as long as SRTP is used without a master key identifier. As
already described in a previous section, this is not desired. With a master key identifier
the SRTP packet has a size of 68 bytes. When DTLS is used, the same packet would be 98 bytes
long, while in SRTP compatibility mode the packet size could be reduced to 70 bytes, which
is an excellent result. Therefore the SRTP compatibility mode should be added to RTP over
DTLS in the future.
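The packet sizes quoted above can be turned into per-packet overhead and per-call bit rate with a few lines of arithmetic. This is a small worked example: the sizes are those stated in the text, the value of 64 bytes for SRTP without master key identifier follows from the stated 4 additional bytes, and the helper names `overhead_bytes` and `bitrate_kbit` are hypothetical.

```python
# Packet sizes in bytes as quoted in the text (20 ms of G.729 audio per packet).
PACKET_SIZES = {
    "RTP": 60,
    "SRTP without MKI": 64,
    "SRTP with MKI": 68,
    "RTP over DTLS": 98,
    "SRTP compatibility mode": 70,
}

PACKET_INTERVAL_MS = 20  # one packet every 20 ms, i.e. 50 packets per second

def overhead_bytes(size: int, baseline: int = PACKET_SIZES["RTP"]) -> int:
    """Additional bytes per packet compared with unprotected RTP."""
    return size - baseline

def bitrate_kbit(size: int, interval_ms: int = PACKET_INTERVAL_MS) -> float:
    """Bit rate in kbit/s of one media direction for the given packet size."""
    packets_per_second = 1000 / interval_ms
    return size * 8 * packets_per_second / 1000

if __name__ == "__main__":
    for name, size in PACKET_SIZES.items():
        print(f"{name:<24} {size:>3} bytes  +{overhead_bytes(size):>2}  "
              f"{bitrate_kbit(size):5.1f} kbit/s")
```

Under these figures, plain RTP over DTLS adds 38 bytes per packet and raises the bit rate of one direction from 24.0 to 39.2 kbit/s, whereas the SRTP compatibility mode adds only 10 bytes (28.0 kbit/s).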

5.1.3 Security Considerations

RTP over DTLS can be considered secure since DTLS is based on TLS, which has seen extensive
security analysis. The handshake algorithm incorporated in DTLS works over an insecure
channel; only the certificates have to be verified. In the standard authentication strategy
of DTLS, a PKIX [43] certificate is exchanged. When the client verifies the certificate, it
checks whether the name in the certificate matches the server’s domain name. This works
because there is a relatively small number of servers with well-defined names, a situation
which does not usually occur in the VoIP context [9]. Alternatively, the certificates could
be self-signed, but then the client must be able to verify the server’s certificate
correctly and vice versa. An approach that addresses this uses SIP [11] and the Session
Description Protocol (SDP) [18]; it is described in [44] and [45].

6 Implementation Design

This chapter provides an analysis of requirements along with a description of the choice

of implementations used in this thesis. Hereby the chosen libraries are presented as well.

The system idea is presented in a more detailed way along with the functionality and

interaction of the single components used for the prototype implementation.

6.1 Analysis of Requirements

The most important phase of a software project is the analysis. Empirical studies of failed
software projects have shown that unclear formulation of goals and requirements is by far
the most common reason for failure. Small mistakes rooted in inaccuracies at an early
development stage can lead to big problems in the final stage of the development process
because of error propagation. Detailed documentation of the requirements at an early stage
of the development process is therefore indispensable as a guideline through the project.

The requirements and goals formulated in this section are based on studies of the protocols and their implementations and on discussions with my supervisor Prof. Dr. Xiaoming Fu. During the development process the requirements can be altered or extended in order to reach the goals and to react flexibly to problems on the way.



6.2 System Idea/Intent

The concept of the system to be implemented is based on H. Tschofenig’s Internet Draft [9] and was discussed in sessions with H. Tschofenig and Prof. Dr. X. Fu.

The basic idea is to secure RTP data traffic using the DTLS protocol. In order to achieve this, a prototype needs to be implemented to prove that the idea works. In the following steps the surrounding software framework needs to be extended to support this option.

In the first step DTLS has to be investigated thoroughly in order to formulate the changes needed on the RTP side of the project. The second step involves an analysis of the RTP implementation and protocol to determine the functionality that the DTLS connection has to provide. In the subsequent step the interfaces of the implementations are used to derive the implementation design of the project.

When audio data is successfully transmitted with RTP over DTLS, the next step is to prove the functionality in a SIP application such as a softphone.

6.2.1 DTLS

The DTLS protocol is designed to secure data exchanged between communicating applications. It is designed to run in application space, without requiring any kernel modifications.

DTLS normally uses one UDP socket per connection and endpoint. Therefore, upon connection initiation a socket is created at each endpoint before the DTLS handshake can be initiated. After successful completion of the handshake the sockets are ready to transmit and receive secured data. Upon termination of the connection both sockets are closed.

6.2.2 RTP

RTP itself has no means of initiating a connection between two hosts. Therefore SIP is additionally used to initiate a session between two computers. Upon connection initiation RTP initialises two sessions on each host, one for data and one for RTCP traffic. Each of these sessions normally consists of two sockets, one for reception and one for transmission. Next the RTP stack is started, and packet transmission and reception take place on each session until the execution of the RTP stack is stopped.

Besides unicast conferences RTP is also capable of multicast conferences. This feature cannot be mapped to a DTLS-secured session, since the key exchange protocol of DTLS is designed only for host-to-host communication and the DTLS key exchange is one of the cornerstones of the benefits DTLS brings to the implementation.

RTP data (and control) packets are usually transmitted via UDP; therefore RTP comes with an underlying transport layer similar to the one DTLS uses. Reuse of these functions will be reviewed in the upcoming design section in order to keep the changes small and simple.

6.2.3 SIP Softphone

The RTP media channel is initiated through SIP as presented in figure 2.6. Therefore the RTP over DTLS session will also be initiated by the SIP softphone client application. A softphone application is the best choice to test the implementation framework. A media channel between two hosts will be established using the RTP stack with DTLS underneath. Options for key generation and administrative functions for the certificate files should be implemented here as well.

6.3 RTP over DTLS

The requirements can now be formulated in more detail, independent of the implementation design. In order to build a unified media security framework, changes need to be made to all interacting components.

Before any connection with RTP over DTLS can take place, the user must select the option that RTP over DTLS should be used if it is available for both communication partners. The SIP component needs to support the mechanisms necessary to cope with essentially four cases. In the first case the connection can be established without errors, because both communication partners have a properly running system which supports RTP over DTLS. In the second case there is an error on the caller side, which might occur when certificates cannot be accessed. The caller should be notified of this as early as when the settings are adjusted to use RTP over DTLS for calls. In the third case the RTP over DTLS feature is not supported by the callee; either the connection will be established without any protection, or the next security mechanism supported by both sides will be used. The caller must of course be notified that the connection is not secured in the intended way. In the last case the security certificates cannot be verified or the DTLS connection cannot be initialised properly for other reasons, so that a secure connection cannot be guaranteed. In this case the users need to be informed immediately about the situation and be advised what this means and what to do.
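The four cases can be summarised as a small decision function. This is a minimal sketch of the logic described above and not code from the prototype or from Twinkle; the enumeration, the function names and the three boolean inputs are hypothetical.

```python
from enum import Enum


class CallSetup(Enum):
    """Hypothetical labels for the four cases described in the text."""
    SECURED = "RTP over DTLS session established"
    LOCAL_ERROR = "caller cannot use RTP over DTLS (e.g. certificates not accessible)"
    FALLBACK = "callee lacks RTP over DTLS: unprotected or next common mechanism"
    HANDSHAKE_FAILED = "certificate verification or DTLS initialisation failed"


def classify_call_setup(caller_ready, callee_supports_dtls, handshake_ok):
    """Map the state of both endpoints to one of the four cases."""
    if not caller_ready:
        return CallSetup.LOCAL_ERROR
    if not callee_supports_dtls:
        return CallSetup.FALLBACK
    if not handshake_ok:
        return CallSetup.HANDSHAKE_FAILED
    return CallSetup.SECURED


def must_notify_user(outcome):
    """Every case except the fully secured one requires a notification."""
    return outcome is not CallSetup.SECURED


print(classify_call_setup(True, False, False).name)  # FALLBACK
```

Checking the caller side first reflects the requirement that a local certificate problem should already be reported when the settings are adjusted, before any call is placed.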

When the call is accepted by the callee and both parties have RTP over DTLS available, this component is started to initialise the DTLS sockets. The RTP session needs to be divided into a server (passive) and a client (active) part, where the client initiates the DTLS connection to the server and the server accepts the client’s connection request. When the connection is established, the RTP data transfer can start. At the end of the session the DTLS connection needs to be shut down properly. DTLS negotiates the ciphers during the handshake (see Background section) and exchanges certificates and keys. These keys must be generated and the certificates provided. This task will be done by the SIP application using functions provided by OpenSSL.



6.4 Choice of Libraries

This section presents the choice of libraries implementing the protocols used, like DTLS,

RTP and SIP. The choice of the DTLS implementation the prototype is based on is straight-

forward, since OpenSSL is the only known implementation supporting this protocol to the

best of our knowledge.

For RTP a choice has to be made since several implementations exist (e.g. oRTP and ccRTP). In contrast to oRTP, ccRTP provides object-oriented C++ code and is therefore better suited for the change to a different underlying transport protocol. The online documentation of the ccRTP library is a great help in understanding the class structure of the library. This makes the choice of the ccRTP library easy.

To complete the prototype, the Twinkle softphone client, which uses the ccRTP library as its RTP stack, seems the most reasonable choice of SIP client.

6.4.1 OpenSSL

OpenSSL¹ [46] is the de facto standard open source TLS/SSL implementation [2]. It has proven to be stable and is used by numerous production-quality servers such as the Apache web server.

OpenSSL implements SSLv2, SSLv3, TLSv1 and DTLSv1. These protocols are implemented by sharing as much code as possible, with virtual functions handling protocol differences. The library is implemented in C, and from the library’s standpoint DTLS appears to be just another version of the TLS protocol.

¹ http://www.openssl.org/


6.4.2 ccRTP

GNU ccRTP² is an implementation of RTP, the Real-time Transport Protocol from the IETF (RFC 3550, RFC 3551 and RFC 3555). The library is implemented in C++ and based on GNU Common C++³. It aims to provide a high-performance, flexible and extensible standards-compliant RTP stack with full RTCP support. RTP is defined as an application layer framework rather than as a typical Internet transport protocol such as TCP or UDP.

The design of ccRTP takes both audio and video data into account. Unicast, multi-unicast and multicast transport models are supported, as well as multiple active synchronization sources, multiple RTP sessions (SSRC spaces), and multiple RTP applications (CNAME spaces). This allows its use for building all forms of Internet-standards-based audio and video conferencing systems [47].

ccRTP uses packet queues for the reception and transmission of data packets. The synchronisation of both (outgoing and incoming) media streams is handled automatically within the packet queues. There is support for RTCP and for other standard and extended features needed by both compatible and advanced streaming applications. The implementation uses templates to isolate threading and socket-related dependencies, so that it can be used to implement real-time streaming with different threading models and underlying transport protocols, which is an essential feature for this work. At its highest level, ccRTP provides classes for the real-time transport of data through RTP sessions, as well as the control functions of RTCP. The main concept in the ccRTP implementation of RTP sessions is the use of packet queues to handle transmission and reception of RTP data packets/application data units. In ccRTP, a data block is transmitted by putting it into the transmission (outgoing packets) queue, and received by getting it from the reception (incoming packets) queue.

² http://www.gnu.org/software/ccrtp/

³ http://www.gnu.org/software/commoncpp/


6.4.3 Twinkle Softphone

Twinkle⁴ [48] is a softphone for VoIP and instant messaging communications using the SIP protocol, and is based on open source and open standards. Twinkle uses the ccRTP stack, which qualifies it as the SIP application in the RTP over DTLS prototype implementation. As a useful feature, the Twinkle softphone also implements direct IP-to-IP phone communication, where a SIP proxy is not needed; the SIP invitation is submitted directly to the IP address of the callee. This is useful for development and testing, since the whole SIP architecture with proxy does not need to be reproduced in the testbed. The current version does not provide video calls, but this feature is planned for future releases. Its absence is not a problem at this stage for the RTP over DTLS prototype, since the focus lies primarily on functionality tests for voice calls.

Securing video calls will be an interesting topic for future work once it is proven that RTP over DTLS works reliably.

⁴ http://www.twinklephone.com/

7 Design Details

This chapter describes the implementation process, its milestones and the problems which were handled along the way. First the protocol operations are presented, and then how the components in the prototype implementation of the unified media security framework work together.

The previous chapter provided an analysis with all the information necessary to design a solution. In this chapter the architecture and interfaces of the component to be developed are designed and its adaptation to the existing structure and interfaces is planned.

7.1 Design Components: RTP - ccRTP, DTLS - OpenSSL and SIP - Twinkle

This section describes the interaction of the components used to design the unified media security framework. Each library used is described together with its interaction with the other libraries.

7.1.1 OpenSSL

The OpenSSL website provides online documentation of the application programming interface (API) to ease the implementation of a secure socket. However, although DTLS has been supported by OpenSSL for more than a year, it is not mentioned at all in the documentation. Only TLS is mentioned as an optional protocol version.


7.1.2 Socket Initialisation

According to the documentation, the library must be initialised first, whereby all available ciphers and digests are registered. Next an SSL_CTX object is created as a framework for establishing SSL-based connections. An SSL_METHOD object is assigned to the context in order to determine the protocol version used. Various options regarding certificates, algorithms etc. can be set in the context. After a network connection has been created, it can be assigned to an SSL object, which is created from the SSL_CTX object created before. Next the handshake is performed with SSL_accept on the server side and SSL_connect on the client side. The SSL_read and SSL_write functions are used to read and write data on the connection, while SSL_shutdown is used to shut down the connection.
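The order of these steps can be illustrated with a short, self-contained sketch. It is not the prototype's code and it does not use DTLS: the Python standard library offers no DTLS support, so the sketch uses TLS through the ssl module, which is built on OpenSSL, and memory BIOs in place of a network socket. The host name peer.example is a placeholder.

```python
import ssl

# 1. Create the context. The protocol constant plays the role of the
#    SSL_METHOD object; library initialisation happens on import.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

# 2. Set options on the context (here: certificate verification policy).
ctx.verify_mode = ssl.CERT_REQUIRED
ctx.check_hostname = True

# 3. Attach the "connection" to a new SSL object. Two memory BIOs stand
#    in for the socket, so nothing is sent over a real network.
incoming = ssl.MemoryBIO()
outgoing = ssl.MemoryBIO()
conn = ctx.wrap_bio(incoming, outgoing, server_hostname="peer.example")

# 4. Start the handshake (client side, analogous to SSL_connect).
try:
    conn.do_handshake()
except ssl.SSLWantReadError:
    # Expected here: the ClientHello has been queued for sending and
    # the handshake now waits for the server's reply.
    pass

client_hello = outgoing.read()
# A TLS handshake record starts with content type 22 (0x16).
print(len(client_hello) > 0, client_hello[0] == 0x16)
```

In a real application the bytes in the outgoing BIO would be written to the socket and the peer's reply fed into the incoming BIO until the handshake completes; only then can application data be read and written.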

Additional hints on how a DTLS connection can be established are provided by the demonstration programs s_server and s_client. These all-purpose examples are able to establish any kind of SSL connection with their roughly 3500 lines of code. The code itself is barely commented and provides only poor information on which functions need to be called to establish a connection. As an example, a comment at line 735 of the s_client.c file starts with "This is an ugly hack that does a lot of assumptions [...]" [46]. However, there is a large mailing list archive which covers a handful of issues concerning DTLS connections.

7.1.3 Session Initialisation with ccRTP

Upon initialisation of an RTP session an object of the class RTPSession is created. There are two kinds of constructors. The first one takes two mandatory arguments, the local network address and the local transport port, which is where incoming packets will be expected. The second constructor is not of interest here, since it takes a multicast address as argument in order to join a multicast group. By calling the startRunning() method, an RTPSession object is signalled to start execution of the stack thread.

After these steps the application can receive data, but will not transmit to any destination. In order to transmit, the method addDestination is called with the Internet address and port of the host to be transmitted to.

7.1.4 Sending Data

Data packets are sent through the method putData, which takes as its first parameter the RTP timestamp for the data specified as the second parameter. By default, the marker bit of the sent packets is not set. Its value for the next packet (the one that will convey the data provided in the next call to putData) can be set through the setMark method, which takes a Boolean as argument.

ccRTP also supports fragmenting data blocks into several RTP packets. The setMaxSendSegmentSize method can be used to request that no RTP packet be transmitted with a payload length greater than the specified value.

When data blocks greater than the maximum segment size are provided through putData, two or more packets will be inserted in the outgoing packet queue. All these packets but the last one will have a length equal to the maximum segment size, whereas the size of the last one will be less than or equal to the maximum segment size.
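The fragmentation rule can be sketched in a few lines. This is only an illustration of the rule stated above, not ccRTP code; the function name and the example sizes (a 400-byte block and a 160-byte limit) are assumptions.

```python
from typing import List


def fragment(data: bytes, max_segment_size: int) -> List[bytes]:
    """Split a data block into payloads no longer than max_segment_size."""
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    return [data[i:i + max_segment_size]
            for i in range(0, len(data), max_segment_size)]


segments = fragment(bytes(400), 160)
segment_lengths = [len(s) for s in segments]
print(segment_lengths)  # [160, 160, 80]
```

All segments except the last have exactly the maximum size, and a block that fits within the limit results in a single packet.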

7.1.5 Receiving Data

To receive data from the incoming packet queue the getData method is used. This method checks, with a defined timeout, whether data can be read from the socket and, if so, returns a pointer to an AppDataUnit object rather than a pointer to a memory block. In ccRTP application data units are represented by objects of the AppDataUnit class, which provides access to the synchronization source of the data and other related properties. The incoming packet queue takes care of functions such as packet reordering and filtering out duplicate packets.
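What reordering and duplicate filtering mean for the application can be shown with a simplified model. The sketch below is not how ccRTP implements its queue; it assumes packets are identified by a plain integer sequence number and ignores the wrap-around of the 16-bit RTP sequence number.

```python
def drain_in_order(arrivals):
    """Return (sequence number, payload) pairs sorted by sequence number,
    keeping only the first copy of each sequence number."""
    first_copy = {}
    for seq, payload in arrivals:
        if seq not in first_copy:  # later duplicates are dropped
            first_copy[seq] = payload
    return [(seq, first_copy[seq]) for seq in sorted(first_copy)]


arrivals = [(2, b"b"), (1, b"a"), (2, b"dup"), (3, b"c")]
ordered = drain_in_order(arrivals)
print(ordered)  # [(1, b'a'), (2, b'b'), (3, b'c')]
```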


7.1.6 Closing Sessions

To close an RTP session the RTPSession objects simply have to be destroyed. When the destructor of a session is called, the stack transmits a BYE packet, indicating the end of the session, to all destinations.

7.1.7 Types of Sessions

Upon creation of an RTPSession object, two DualRTPChannel objects are created with DualUDPIPv4Socket. This defines a communication channel for RTP and/or RTCP streams. In this class a socket is implemented as a pair of UDP IPv4 sockets, allowing both transmission and reception of packets.

The implementation relies on the Common C++ UDP Socket class and provides a flat interface that includes all the services required by the RTP stack. There are two ways to use this class: to instantiate the DualSocket template, which is then used to instantiate the RTP stack template, or to instantiate an RTP stack template directly. This class offers an example of the interface that other classes should provide in order to specialise the ccRTP stack for different underlying protocols.

7.2 SIP Session Initiation with Twinkle

As already mentioned, Twinkle can be operated either in regular SIP mode, using a SIP provider to discover a communication partner by SIP address as seen in figure 2.6, or in direct mode. In both cases the RTP media channel is set up when the callee accepts the call. Twinkle uses a Symmetric RTP session consisting of two Single Thread RTP sessions, one for data traffic and one for control packets. In order to establish a secure RTP over DTLS session, the modified templates in the ccRTP library are used instead of the regular ones.


Figure 7.1: Implementation status after phase 1

7.3 Implementation Process

The implementation process is divided into three consecutive phases. This sequential model allows changes in the implementation and the marking of milestones to confirm the progress achieved.

In the first phase a DTLS client/server application is implemented in C++ as the basis for any further development. As a result of the poor documentation provided by OpenSSL, this step was a much greater challenge than expected. A DTLS client/server example¹ written in C was used as a guideline in this phase, because the examples provided by OpenSSL were not clearly structured and therefore not usable.

The result of the first phase is illustrated in figure 7.1, where a DTLS connection is established between host A and host B. As soon as a secured connection between two hosts is possible, this connection initiation functionality can be used by the ccRTP stack, replacing the regular underlying UDP sockets with DTLS sockets.

At the end of this second phase, first transmissions of audio data with test applications should work and demonstrate the functionality of RTP over DTLS as presented in figure 7.2. Here test.au is an audio file in the au file format, a simple audio file format introduced by Sun Microsystems². Further information can be found at [49]. Upon setting up the RTP connection between the two hosts, the DTLS connection is established during initialisation of the transport channel, at the point where the UDP sockets were previously initialised.

In the last phase all parts of the preceding steps have to work together in order to function as a secured VoIP call. Figure 7.3 illustrates the progress at this stage.

While phase 3 marks the goal of this thesis, it is not the end of the process. Further implementation work is needed to provide a usable application. These steps are presented in the future work section at the end of this thesis.

¹ found at http://linux.softpedia.com/get/Security/DTLS-Client-Server-Example-19026.shtml

² http://www.sun.com


Figure 7.4: RTP over DTLS class structure

7.4 Class Structure

This section describes the changes applied to the libraries and presents the new files inserted into the class hierarchy.

In the ccRTP library the channel.h file defines the RTPSession types and the initialisation of the underlying transport protocols. The regular RTP session inherits the UDP socket from the Common C++ UDP Socket class and implements the functionality for the RTP stack.

In order to implement RTP over DTLS, two template classes were added to the channel.h file, RTPDTLSServer and RTPDTLSClient. Each of these templates is associated with an interface to the OpenSSL library providing the functionality for connection initiation and certificate verification. These interface files are placed in the /src/rd directory of ccRTP. These classes make direct use of the OpenSSL API and socket classes. Figure 7.4 presents the structure of the added components. With this structure any program using RTP is able to initialise RTP over DTLS sessions instead of, or as an alternative to, regular RTP sessions.


7.5 Problems and Discussion

The solution provided is functional but unfortunately not perfect, due to the way RTP works. Upon initiation of an RTP session two sockets are created, one for transmission and one for reception. Since RTP supports multicast, these sockets do not have any information about the destination host upon initialisation. RTP uses a setPeer function, which is called periodically, to set the destination IP address on the socket. This behaviour is not compatible with a DTLS connection, because the destination IP address must be known when a DTLS connection is initialised. Therefore the DTLS connection is initialised upon the first call of the setPeer function and not in the constructor of the session, since the destination IP address could not be handed to the constructor without changes to the RTP stack implementation. Further calls to the addDestination function must therefore not cause any action.
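The deferred initialisation can be modelled as follows. This is a simplified sketch of the behaviour described above, not the prototype's C++ code; the class name, the method names and the handshake placeholder are hypothetical, and the addresses are documentation examples.

```python
class LazyDtlsChannel:
    """Model of a channel whose DTLS connection is set up on the first
    call that provides the destination address."""

    def __init__(self):
        self.peer = None       # destination is unknown at construction
        self.handshakes = 0    # counts how often a handshake was started

    def _start_handshake(self, host, port):
        # Placeholder for the real DTLS handshake with the peer.
        self.handshakes += 1

    def set_peer(self, host, port):
        """Initialise the connection on the first call; ignore later calls."""
        if self.peer is not None:
            return False
        self.peer = (host, port)
        self._start_handshake(host, port)
        return True


channel = LazyDtlsChannel()
first = channel.set_peer("192.0.2.10", 5004)
second = channel.set_peer("192.0.2.10", 5004)
print(first, second, channel.handshakes)  # True False 1
```

The guard on the stored peer is what keeps the periodic calls from triggering a new handshake for an already established connection.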


8 Testing

The prototype implementation of RTP over DTLS is tested in order to confirm the usability of the approach. There is a wide range of aspects that could be tested; however, due to space and time restrictions in this thesis, not all aspects of RTP over DTLS have been analysed so far.

8.1 Testing Methodology

Before the results of the tests are presented, an overview of the testing methodology and testbed is given. As formulated in the goals of this thesis, a packet loss rate of up to 5% is still acceptable for telephony according to the ITU-T. Therefore the implementation of RTP over DTLS shall provide a packet loss rate lower than 5%. Changing the underlying transmission to a DTLS connection does not in itself cause more packets to get lost. However, encryption, decryption and the increased header size, which results in a higher bandwidth being needed to achieve the same throughput, might cause data packets to reach the destination too late to be inserted into the output media stream. The question to be answered is therefore whether RTP over DTLS is capable of delivering the audio data within the strict time limits that allow acceptable voice quality during a call. During the phone call the delays must be kept within the restrictions for VoIP traffic to allow a fluent conversation. According to the ITU-T, delays in telephony should not exceed 150 ms in order to provide satisfactory quality for all users.


8.2 Testbed Setup

The experiments were run on standard PCs running SUSE Linux with kernel 2.6.18-05 and the following hardware:

• Machine A:
  Intel Pentium D processor with 3.06 GHz
  512 MB RAM
  40 GB hard disk
  one 100 Mbit network interface card (NIC)

• Machine B:
  AMD Duron processor with 800 MHz
  612 MB RAM
  60 GB hard disk
  one 100 Mbit NIC

The hosts are connected in a 100 Mbit Ethernet network with the topology presented in figure 8.1.

8.3 Measurement Methods and Tools

Figure 8.1: Testbed for RTP over DTLS tests

To prove the functionality and usability of the RTP over DTLS prototype implementation, modified versions of demonstration programs provided in the ccRTP library were used. Timestamp output was added to the applications in order to determine the delay between transmission and reception of a data packet. OpenOffice¹ Calc, a spreadsheet program, was used to calculate the delay of a data packet as the time difference between the transmission and reception timestamps. Plots and summaries from the tests were generated with Gnuplot² [50] from the report files.

¹ http://www.openoffice.org/
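The delay calculation performed in the spreadsheet can be expressed as a short sketch. The timestamps below are invented sample values in microseconds, not measurements from the experiments; the sketch further assumes that the clocks of sender and receiver are synchronised and that the population standard deviation is meant, neither of which is stated in the text.

```python
from statistics import mean, pstdev


def packet_delays(sent_us, received_us):
    """Delay per packet as reception time minus transmission time."""
    if len(sent_us) != len(received_us):
        raise ValueError("one reception timestamp per sent packet expected")
    return [rx - tx for tx, rx in zip(sent_us, received_us)]


def summarise(delays_us):
    """Average, minimum, maximum and standard deviation in milliseconds."""
    return {
        "avg_ms": mean(delays_us) / 1000.0,
        "min_ms": min(delays_us) / 1000.0,
        "max_ms": max(delays_us) / 1000.0,
        "stddev_ms": pstdev(delays_us) / 1000.0,
    }


# Hypothetical timestamps: one packet every 20 ms.
sent = [0, 20_000, 40_000]
received = [13_000, 31_000, 55_000]
summary = summarise(packet_delays(sent, received))
print(summary["avg_ms"], summary["min_ms"], summary["max_ms"])  # 13.0 11.0 15.0
```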

8.4 Results

This section presents the results of the experiments. The aim of the performance test is to determine the delay caused by DTLS encryption of RTP traffic. Tests were performed with modified versions of the ccRTP demonstration programs audiorx and audiotx. These applications initiate RTP sessions and transmit audio data from audiotx to audiorx, where audiorx plays the audio data over the system’s audio interface. In the original version these applications use the loopback address to simulate RTP traffic on a single machine. By changing the IP addresses used, these programs are capable of transmitting data from one host to another.

² http://gnuplot.info

Audiorx uses a 50 ms jitter buffer to ensure a continuous media stream during reception. Jitter is the variation of the packet interarrival time. While the sender is expected to transmit a packet every 20 ms, these packets can be delayed in the network and may not arrive at the same regular interval at the receiver side. The difference between when a packet is expected and when it is actually received is the jitter. The jitter buffer conceals this interarrival delay variation. Data packets arriving with a delay greater than 50 ms are not played; instead the next packet that arrived is played. In VoIP applications the jitter buffer is flexible in order to adapt to the delay in the current call.

In order to analyse the RTP over DTLS performance, the RTP over DTLS server and client session objects were initialised in these applications instead of a regular RTP session. In order to obtain comparable results, a 62.5 KB audio file with a play time of 7 seconds was used for transmission to simulate the voice data of a call. In total 399 data packets of audio data were transmitted.

Taking account of possible measurement inaccuracy and errors due to the experimental

environment, all tests were done repeatedly to verify the results.
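How late packets translate into a loss rate can be sketched as follows. The 50 ms limit and the total of 399 packets are taken from the text; the function names and the list of delays are illustrative assumptions, not measured data.

```python
JITTER_BUFFER_MS = 50  # size of the jitter buffer used by audiorx


def late_packets(delays_ms, limit_ms=JITTER_BUFFER_MS):
    """Delays of packets that arrive too late to be played."""
    return [d for d in delays_ms if d > limit_ms]


def loss_rate_percent(lost, total):
    """Share of packets that were not played, in percent."""
    return round(100.0 * lost / total, 2)


# Illustrative input: 398 packets at 13 ms and one packet at 51 ms.
delays = [13] * 398 + [51]
late = late_packets(delays)
rate = loss_rate_percent(len(late), len(delays))
print(late, rate)  # [51] 0.25
```

A packet that arrives after the limit counts as lost for playback even though it reached the receiver, which is why a single late packet out of 399 corresponds to a loss rate of 0.25%.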


Figure 8.2: Delay for normal RTP packets (plot title "Transmission of Audio Data", series "RTP Packet Delay"; x-axis: packet number, 0 to 350; y-axis: delay in microseconds, 0 to 100000)

8.5 Standard RTP Packet Delay

The first performance test examines the packet delay of regular RTP packets in order to have a reference value for comparison. Figure 8.2 represents a typical output for this test. The average delay of an RTP packet was measured as 13 ms; however, some packets arrived significantly later. The maximum delay was measured as 51 ms, while the minimum delay was only 4 ms. The standard deviation of the delay during the experiment was calculated as 6.1 ms. During this experiment one data packet did not arrive within the preset time limit; the packet loss rate is therefore 0.25%. In the graph the lost packet is marked by the high peak (51 ms delay) shortly before the 200th packet. As the audio file is played during reception, the subjective impression of the result can be described as well: the sound file was played continuously without any disturbance, as clearly as if it were played locally.


Figure 8.3: Delay for RTP over DTLS packets (plot title "Transmission of Encrypted Audio Data", series "RTP over DTLS Packet Delay"; x-axis: packet number, 0 to 350; y-axis: delay in microseconds, 0 to 100000)

8.6 RTP over DTLS Packet Delay

The second experiment determines the delay of RTP traffic over DTLS. For this experiment the same demo applications were used as in the preceding one, with the change that the programs now initiate RTP over DTLS sessions. Repeated tests showed results similar to those in figure 8.3. The average delay of an encrypted RTP packet was measured as 34 ms; however, some packets arrived significantly later. The maximum delay was measured as 92 ms, while the minimum delay was only 9 ms. The standard deviation of the delay during the experiment was calculated as 7.7 ms. During this experiment one data packet did not arrive within the preset time limit; the packet loss rate is therefore 0.25%. In the graph the lost packet is marked by the high peak (92 ms delay) at the beginning. The sound file was played continuously without any disturbance, as clearly as if it were played locally.

8.7 CPU Usage

In a separate experiment the average CPU load was measured. Since a machine has to handle both transmission and reception when performing a VoIP call, this experiment was carried out on a single machine, the 3.06 GHz machine. Audiotx and audiorx initialised the DTLS connection and transmitted an audio file with a length of 1:25 minutes. Repeated tests showed that RTP has an average CPU load of 1.45%, while RTP over DTLS has an average CPU load of 3.4%.

The significant increase is caused by the handshake and the encryption/decryption during the session. For normal PCs used for VoIP this increase is not a problem, but it could be a problem for today’s generation of handheld devices. Therefore further investigation is necessary to analyse the impact of the increased CPU load on different terminal devices, such as cell phones.

8.8 Test Summary

The test results of the two experiments show that RTP over DTLS works in an acceptable

manner. The delay of encrypted RTP packets was expected to be higher than the delay

of unencrypted packets, due to encryption/decryption operations and extended packet

overhead by DTLS. The question to be answered was whether the delay of encrypted RTP

packets meets the requirements for VoIP traffic.

In the experiments the delay of RTP packets was measured with an average of approximately 13 ms, a maximum of 51 ms, a minimum of 4 ms and a standard deviation of 6.1 ms. The delay of encrypted RTP packets was measured with an average of approximately 34 ms, a maximum of 92 ms, a minimum of 9 ms and a standard deviation of 7.7 ms. The important values in the results of these experiments are the average delay, the packet loss rate and the standard deviation. The average delay is increased by approximately 20 ms when DTLS encryption is used.

According to the ITU-T a delay of 125 ms is noticeable by humans; it therefore recommends that delays should not exceed 150 ms. A delay of 200 to 280 ms still satisfies most users, while delays higher than 300 ms dissatisfy some users, and a delay higher than 400 ms is unacceptable because most users are dissatisfied [51]. Most of the delay in real scenarios is caused by the network infrastructure. For a distance of less than 5000 km VoIP connections are likely to experience a delay smaller than 150 ms. For intercontinental connections delays in the mid-200 ms range can be expected, which is not a problem according to the ITU-T because users expect differences to regional calls.

Compared directly, the average RTP over DTLS delay is more than twice as long as the regular RTP delay, but the delays should be set in relation to the ITU-T restrictions. An average delay increase of 20 ms amounts to about 13% of the recommended 150 ms delay limit. The small increase in the standard deviation (1.6 ms) is a good result as well. It means that the jitter buffer does not need to be enlarged by a relevant amount. Therefore RTP over DTLS is well suited for the encryption of live media as in VoIP.
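The figures in this comparison can be reproduced from the measured values. The sketch below only uses numbers stated in this chapter; the variable names are chosen for illustration, and the rounded increase of 20 ms is the value used in the text.

```python
RTP_AVG_MS, DTLS_AVG_MS = 13, 34          # measured average delays
RTP_STDDEV_MS, DTLS_STDDEV_MS = 6.1, 7.7  # measured standard deviations
ITU_T_LIMIT_MS = 150                      # recommended maximum delay
ROUNDED_INCREASE_MS = 20                  # "approximately 20 ms" in the text

measured_increase_ms = DTLS_AVG_MS - RTP_AVG_MS
delay_ratio = round(DTLS_AVG_MS / RTP_AVG_MS, 2)
share_of_limit = round(100 * ROUNDED_INCREASE_MS / ITU_T_LIMIT_MS, 1)
stddev_increase_ms = round(DTLS_STDDEV_MS - RTP_STDDEV_MS, 1)

print(measured_increase_ms)  # 21
print(delay_ratio)           # 2.62
print(share_of_limit)        # 13.3
print(stddev_increase_ms)    # 1.6
```

Using the exact difference of 21 ms instead of the rounded 20 ms gives 14% of the 150 ms limit, which does not change the conclusion.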


9 Conclusion and Future Work

9.1 Conclusion

The growing acceptance of VoIP telephony among users also brings new challenges in terms of security. VoIP calls are threatened by various known attacks, since the data is transported over insecure networks. The security considerations section pointed out these attacks along with the security goals to achieve. Furthermore, new attacks (e.g. SPIT), which are enabled through the extended capabilities and new services introduced by VoIP, may threaten the widespread use of this technology in the future. Therefore security is a major concern in the further development of VoIP services. So far no solution that secures VoIP calls in an acceptable manner is widely used. The approach of RTP over DTLS has the potential to overcome the shortcomings of other approaches and to take part in future developments of a security framework for VoIP. DTLS provides:

Authentication: allows both participants of the call to verify the identity of the other party.

Confidentiality: ensures that the VoIP call cannot be eavesdropped on or understood by a third party.

Integrity: allows VoIP applications to detect whether data was modified during transmission.


Unfortunately, DTLS cannot solve all issues in securing Internet Telephony. Denial of Service
attacks against the SIP infrastructure cannot be prevented by DTLS, since RTP over DTLS
is initiated only after the SIP interaction that sets up the session has taken place. For the
same reason, the approach does not address the issue of SPIT. Authentication could help
here, since SPIT calls could be traced back to their originators, but this would only be
possible if all users used DTLS authentication. This is not yet feasible, since reachability
via traditional phones is still desired.

In this thesis, a prototype of RTP over DTLS was implemented and tested in order to
demonstrate the usability of the approach.

The upcoming sections summarise and evaluate the test results of the prototype. Furthermore,
an outlook is given on the future work that will be necessary to reach the goal of developing
a unified media security framework for VoIP.

The datagram-capable version of TLS was designed to secure media streaming without
compromising either the quality of the streamed media or the widely accepted security
features of TLS. The test results show that the prototype implementation of RTP over DTLS
performs well in comparison to unencrypted RTP. The increase in delay of approximately
20 ms is within an acceptable range and allows secure communication without a noticeable
impact on the quality of the VoIP call. These results make it possible to plan the future steps
on the way to a unified media security framework for VoIP, which are presented in the
upcoming section.

9.2 Future Work and Open Issues

The prototype implementation of RTP over DTLS is capable of connection establishment
and data transmission. The performance of the DTLS and RTP components was tested
with acceptable results, but not in detail, and there is potential for improving it. Some
optimisation may be possible in the connection establishment, since this part was developed
with almost no documentation from the developers of DTLS in OpenSSL. The next suggested
step in further development is improvement at the DTLS level. H. Tschofenig and E. Rescorla
introduced the SRTP compatibility mode [9]. With the enhancements to RTP over DTLS
presented there, performance can be increased, since the overhead is reduced to a value
comparable to that of ZRTP. Subsequently, the performance of the SRTP compatibility mode
of RTP over DTLS can be compared experimentally with ZRTP in order to evaluate the
approach.

The integration into the Twinkle softphone is also not yet complete. This thesis aims to
contribute to the development of a unified security framework covering all components in
the system. Due to time restrictions, the focus lies on the interaction of the RTP and DTLS
components, which provides a basis for further development. A concept for user interaction
with the encryption scheme needs to be designed and integrated into the softphone and SIP.
The challenge here lies in combining transparency about what is happening with ease of use
in order to achieve acceptance among users. To this end, certificate management needs to be
integrated into the softphone, along with user notification about the security state of the
connection and proper error handling in case of a DTLS handshake failure. Furthermore, on
the SIP (and SDP) side of the framework, RTP over DTLS needs to be integrated into the
session invitation, so that the caller is able to inform the callee of the wish to establish an
RTP over DTLS session when connections are initiated over the SIP network.

Bibliography

[1] Christian Friedrich. Schematic representation of the SSL handshake protocol with two-way authentication with certificates. Wikipedia, the free encyclopedia, 2007. [Online; accessed August 2007].

[2] N. Modadugu and E. Rescorla. The Design and Implementation of Datagram TLS, 2004.

[3] J. Postel. Internet Protocol. RFC 791 (Standard), 1981. Updated by RFC 1349.

[4] T. Dierks and E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.1. RFC 4346 (Proposed Standard), 2006. Updated by RFCs 4366, 4680, 4681.

[5] J. Postel. Transmission Control Protocol. RFC 793 (Standard), 1981. Updated by RFC 3168.

[6] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol for Real-Time Applications. RFC 3550 (Standard), 2003.

[7] J. Postel. User Datagram Protocol. RFC 768 (Standard), 1980.

[8] E. Rescorla and N. Modadugu. Datagram Transport Layer Security. RFC 4347 (Proposed Standard), 2006.

[9] H. Tschofenig and E. Rescorla. Real Time Transport Protocol (RTP) over Datagram Transport Layer Security. Internet Draft, February 2006.

[10] Wikipedia. Voice over IP. Wikipedia, the free encyclopedia, 2007. [Online; accessed 22-April-2007].

[11] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler. SIP: Session Initiation Protocol. RFC 3261 (Proposed Standard), 2002. Updated by RFCs 3265, 3853, 4320, 4916.

[12] H. Schulzrinne and C. Agboh. Session Initiation Protocol (SIP)-H.323 Interworking Requirements. RFC 4123 (Informational), 2005.

[13] T. Berson. Skype Security Evaluation, October 2005.

[14] R. Hancock, G. Karagiannis, J. Loughney, and S. Van den Bosch. Next Steps in Signaling (NSIS): Framework. RFC 4080 (Informational), 2005.

[15] E. Rescorla. HTTP Over TLS. RFC 2818 (Informational), May 2000.

[16] R. Rivest, A. Shamir, and L. M. Adleman. Cryptographic communications system and method. US Patent 4405829, 1977.

[17] P. Karn and W. Simpson. Photuris: Session-Key Management Protocol. RFC 2522 (Experimental), 1999.

[18] D. Brezinski and T. Killalea. Guidelines for Evidence Collection and Archiving. RFC 3227 (Best Current Practice), 2002.

[19] H. Schulzrinne. The tel URI for Telephone Numbers. RFC 3966 (Proposed Standard), 2004.

[20] Voice over misconfigured internet telephones (vomit). http://vomit.xtdnet.nl/.

[21] M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman. The Secure Real-time Transport Protocol (SRTP). RFC 3711 (Proposed Standard), 2004.

[22] P. Zimmermann, A. Johnston (Ed.), and J. Callas. ZRTP: Media Path Key Agreement for Secure RTP. Internet Draft, 2007.

[23] André Adelsbach, Mark Manulis, and Jörg Schwenk. VoIPSEC Studie. Technical report, Bundesamt für Sicherheit in der Informationstechnik.

[24] S. Kent and K. Seo. Security Architecture for the Internet Protocol. RFC 4301 (Proposed Standard), 2005.

[25] S. Kent. IP Encapsulating Security Payload (ESP). RFC 4303 (Proposed Standard), 2005.

[26] C. Kaufman. Internet Key Exchange (IKEv2) Protocol. RFC 4306 (Proposed Standard), 2005.

[27] E. Rescorla. Diffie-Hellman Key Agreement Method. RFC 2631 (Proposed Standard), 1999.

[28] D. Maughan, M. Schertler, M. Schneider, and J. Turner. Internet Security Association and Key Management Protocol (ISAKMP). RFC 2408 (Proposed Standard), 1998. Obsoleted by RFC 4306.

[29] D. Piper. The Internet IP Security Domain of Interpretation for ISAKMP. RFC 2407 (Proposed Standard), 1998. Obsoleted by RFC 4306.

[30] H. Orman. The OAKLEY Key Determination Protocol. RFC 2412 (Informational), 1998.

[31] H. Krawczyk. SKEME: A Versatile Secure Key Exchange Mechanism for Internet. In Proceedings of the 1996 Symposium on Network and Distributed System Security (SNDSS '96), 1996.

[32] Wikipedia. IPsec. Wikipedia, the free encyclopedia, 2007. [Online; accessed June 2007].

[33] S. Kent. IP Authentication Header. RFC 4302 (Proposed Standard), 2005.

[34] Whitfield Diffie, Paul C. van Oorschot, and Michael J. Wiener. Authentication and authenticated key exchanges. Designs, Codes and Cryptography, 2(2):102–125, 1992.

[35] S. Casner and V. Jacobson. Compressing IP/UDP/RTP Headers for Low-Speed Serial Links. RFC 2508 (Proposed Standard), 1999.

[36] C. Bormann, C. Burmeister, M. Degermark, H. Fukushima, H. Hannu, L-E. Jonsson, R. Hakenberg, T. Koren, K. Le, Z. Liu, A. Martensson, A. Miyazaki, K. Svanbro, T. Wiebke, T. Yoshimura, and H. Zheng. RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP, and uncompressed. RFC 3095 (Proposed Standard), 2001. Updated by RFCs 3759, 4815.

[37] T. Koren, S. Casner, J. Geevarghese, B. Thompson, and P. Ruddy. Enhanced Compressed RTP (CRTP) for Links with High Delay, Packet Loss and Reordering. RFC 3545 (Proposed Standard), 2003.

[38] D. Ignjatic, L. Dondeti, F. Audet, and P. Lin. MIKEY-RSA-R: An Additional Mode of Key Distribution in Multimedia Internet KEYing (MIKEY). RFC 4738 (Proposed Standard), November 2006.

[39] Joan Daemen and Vincent Rijmen. The Design of Rijndael: AES - The Advanced Encryption Standard. Springer-Verlag, 2002.

[40] ISO/IEC. Information technology - Security techniques - Code of practice for information security management, June 2005.

[41] N. Modadugu and E. Rescorla. Extensions for DTLS in low bandwidth environments. draft-rescorla-tls-partial-00, October 2005.

[42] E. Rescorla. TLS partial encryption mode. draft-rescorla-tls-partial-00, October 2005.

[43] A. Kapoor, R. Tschalar, and T. Kause. Internet X.509 Public Key Infrastructure: Transport Protocols for CMP. Internet-Draft, February 2004. http://tools.ietf.org/id/draft-ietf-pkix-cmp-transport-protocols-05.txt.

[44] J. Fischl and H. Tschofenig. Session Initiation Protocol (SIP) for Media over Transport Layer Security (TLS), February 2006.

[45] J. Fischl and H. Tschofenig. Session Description Protocol (SDP) Indicators for Datagram Transport Layer Security (DTLS). draft-fischl-mmusic-sdp-dtls-00, February 2006.

[46] The OpenSSL project. http://www.openssl.org.

[47] The GNU ccRTP library. http://www.gnu.org/software/ccrtp/.

[48] The Twinkle softphone project. http://www.twinklephone.com/.

[49] Header file for the au-file format. http://www.opengroup.org/public/pubs/external/auformat.html.

[50] Gnuplot. http://www.gnuplot.info/.

[51] International Telecommunication Union. Recommendation G.114: One-way Transmission Time. Series G: Transmission Systems and Media, Digital Systems and Networks, May 2003.
