Steg Final

  • 8/9/2019 Steg Final

    Randal Calcote

    Ty Welborn

    MUSI 1335: Commercial Music Software

    15 January, 2010

    Models of Steganography for Digital Audio

    Thesis:

    Digital audio provides several robust media for embedding steganographic messages by

    working with a variety of composition and recording techniques. The techniques of information

    hiding are simple by comparison to mathematically complex methods currently in use.

    Introduction

    Steganography is the process of hiding a specific message within a larger body of

    information. Historically, it dates back at least to the Greco-Persian Wars (499-449 B.C.).

    Herodotus (c. 484-425 B.C.) cites examples of transmitting secret tactical information using

    wooden tablets and the human body as media. The Chinese and Europeans have also

    developed systems of hiding information. In modern times, steganography has become

    more important in the fields of privacy and financial transaction security. Increased

    government restriction of cryptography and industry-driven initiatives to protect

    commercial and intellectual property have resulted in an ever-increasing interest in

    steganographic applications.

    The distinction between cryptography and steganography is necessarily soft.

    Cryptography is a process of encoding a secret message based on a predetermined

    algorithm. Instead of a specific algorithm, pure steganography uses random processes for

    distributing information within the context of a larger, less specific message. In practice,

    both methods are used in tandem to provide greater security than either one alone. For

    informational and financial transmissions to be effective, it is necessary to validate

    and maintain the identity and integrity of both the sender and the receiver. Many digital

    transactions use encrypted passwords for this purpose. Intellectual and commercial rights

    are usually established by public, digital signatures known as watermarks. These marks

    vary from complex serial numbers to embedded logos in the software and digital media.

    They may be visible or hidden, and they must withstand attempted removal procedures,

    digitally or otherwise. These marks identify the owner or creator of film, music and

    software. Similar to watermarking, cattle branding has been practiced from the 13th

    century to the present to identify herd ownership at market. Watermarks were also added

    to paper that would be used to print money in an attempt to recognize forgeries. Now,

    watermarks are an integral part of most products transmitted digitally. They are usually

    visible for graphics and invisible for data or sound files (Petitcolas 7).

    Cryptographic and steganographic models evolved from the wax writing tablets of

    the Greeks to the digital watermarks and copyright symbols of today's digital rights

    management. Extensive material has already been published on the relevant mathematics

    of cryptography based on large prime numbers and steganographic computer software for

    distributing information within digital media. This paper will therefore refer only briefly to

    relevant algorithms and texts. The systems will be presented summarily to demonstrate

    some general principles of embedding covert information within several different media

    formats.

    Audio and sound are similar, but they are not exactly the same. Sound is produced

    when a physical action creates an oscillating field of pressure in the air between the

    occurrence of the event and the human ear. The ear converts these vibrations to a set of

    electrical impulses which the human brain interprets as sound. The event is mechanical,

    but the human experience of it is subjective. Audio is a representation of a sonic event

    which is transmitted via electronic channels. It is then recorded for later use or broadcast

    to a mass audience. Audio information, either analog or digital, provides a robust set of

    media for embedding encrypted or steganographic objects.

    1. Practical History

    Steganography is ubiquitous in modern society. Many commercially recorded music

    and video products carry digital watermarks to identify the owner of a digital copyright.

    Music, video and software products also carry encrypted serial numbers to aid in tracking

    unauthorized copies. MP3 is a compressed audio file format developed by the

    Moving Picture Experts Group for portability across the internet. A file can carry ID3v1 and

    ID3v2 tags stored alongside the computer code that becomes the sound you hear when an

    MP3 is played. The ID tags contain information about the music, its owner and point of

    origin. Though they are hidden in plain sight,

    they can be viewed and edited by most MP3 players. However, the majority of users are

    completely unaware of the existence of these tags (Chaum 85).

    These electronic signatures are not only present in music and video files, but are

    also included in electronic devices intended for reproduction and transmission across

    extended computer and entertainment networks. For example, many printers carry a

    unique identifier which they embed in the printed output of every page (Brassil 1278-1287).

    When anyone logs on to any computer or ATM, they do so using encrypted passwords to

    allow access to that particular communications channel. These codes and watermarks

    represent the most common, active elements in steganographic usage. A covert

    channel is any communication path not originally designed to transfer information but

    rather to validate the origin of materials. In computer systems, these channels are used to

    return information to their owner while performing a service for another user or program,

    as trojans, ad-bots and spyware do. They use the same structure as legitimate

    programs made to validate identification and distribute information (Lampson 615).

    Passwords and PINs, although not purely steganographic objects, are mentioned here

    because cryptography is currently an important aspect of transmitting hidden information.

    They are employed in an attempt to prevent malicious abuse of information systems.

    Thus, cryptographic and steganographic systems used in tandem provide theoretically

    more secure channels for transmitting covert or private information (Petitcolas 11).

    These processes have been important for both commerce and war, and have

    evolved throughout history. Herodotus cited several examples of steganography in his

    Histories. Around 499 B.C., Histiaeus was the ruler of Miletus under Darius I, King of

    Persia. While living in Susa at the command of Darius, although loyal to the Persians, he

    was unhappy with his condition and wanted to return to his home. He shaved the head of

    a trusted slave, and tattooed a message onto his skin. After the slave's hair grew back, it

    hid the message. This message was relayed in order to instigate a revolt against the

    Persians, so that conditions at home would require his return to oversee the conflict

    (Herodotus 84-87).

    Another example cited in the Histories is that of Demaratus, the King of Sparta from

    515-491 B.C. Cleomenes, a rival for the throne, bribed the Delphic Oracle to denounce Demaratus

    as an illegitimate king. After being deposed, Demaratus was forced to flee to the Persian court.

    Upon learning of the Persian invasion plans of Xerxes in 480 B.C., he devised a plan to

    warn Sparta of a coming invasion. After removing the wax from a writing tablet, he

    scratched a message on the bare wood and applied a fresh coat of wax to the tablet to

    cover the message. This tactic worked so well that the Spartans thought it was a fresh,

    new writing tablet. In fact, the Spartan king Leonidas almost did not find the message in

    time to contrive an adequate defense at the Battle of Thermopylae (Herodotus 87-89).

    One method of secret writing attributed to Julius Caesar is known as the shift

    cipher. It is simple to construct, and not very secure under scrutiny. In practice, each

    letter in a plaintext message is replaced by a letter some fixed number of positions down

    the alphabet. For example, with a shift of 3, "a" would be replaced by "d", "b" would be

    replaced by "e", and so on, with "x", "y" and "z" being replaced with "a", "b" and "c" to complete

    the cipher (Wikipedia.org).

    Thus a message like "hello world" would be replaced by "khoor zruog".

    However, this message would be easily deciphered, even if the space between words were

    omitted, "khoorzruog". Only a slight improvement of the message security is achieved by

    reversing the spelling of the whole message, "gourzroohk", or even by reversing the

    spelling of each word in place, "roohkgourz" (from the plaintext "ollehdlrow").
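
The shift described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `caesar_shift` is invented for this example and does not come from the paper's sources. The last lines hint at why the cipher is weak, since only 26 shifts exist and all of them can be tried in an instant.

```python
import string


def caesar_shift(text: str, shift: int = 3) -> str:
    """Replace each letter with the one `shift` places further along, wrapping z back to a."""
    k = shift % 26
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[k:] + lower[:k] + upper[k:] + upper[:k],
    )
    # Characters that are not letters (spaces, punctuation) pass through unchanged.
    return text.translate(table)


encoded = caesar_shift("hello world", 3)
decoded = caesar_shift(encoded, -3)  # shifting back by the same amount undoes the cipher

# An eavesdropper does not need the key: trying every possible shift is enough.
candidates = [caesar_shift(encoded, -k) for k in range(26)]
```

One of the 26 entries in `candidates` is the readable plaintext, which is the sense in which the cipher fails "under scrutiny".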

    What this means is that simple substitution or transposition of letters is basically insecure.

    Randomizing letters and ciphers brings an increasing complexity to the job of deciphering

    stegotext. In ancient China, masks were made with holes in them to indicate the position

    of hidden letters embedded within a larger text. The mask was used as an overlay to

    decipher the secret held within the larger text (Katzenbeisser 21).

    This process was reinvented by the 16th-century Italian mathematician Girolamo

    Cardano in 1550. He was known to the French as Jérôme Cardan. Hence, the grille cipher

    is named the Cardan Grille in homage to Cardano. He proposed a method of hiding a

    message in plain sight via use of a parchment cipher.

    Holes were punched in the parchment at random intervals, and a message was

    written inside the holes.

    Then, the parchment was removed and the message was surrounded with letters

    and numbers.

    Cardano's proposal was primarily used as a literary game, quite common among

    European aristocracy of the time. Decryption required the original, or an identical

    parchment cipher. This made for relatively good security provided the original author

    composed a reasonable cover message around the intended stegotext. Although well

    received by European nobles, the grille cipher served little use other than as a source of

    amusement. The two main weaknesses of the process were the necessity of composing a

    suitable cover, and the incriminating nature of possessing a grille cipher if apprehended by

    an enemy (Wikipedia.org).
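
The grille procedure can be modeled as a list of hole positions shared by sender and receiver. The sketch below is a simplification under several assumptions: the page is a single line of characters, the hole positions in `HOLES` are invented, and random letters stand in for the cover text. A real grille message needs a plausible cover text composed around the secret, which is exactly the weakness noted above.

```python
import random
import string


def write_with_grille(secret, holes, page_length, seed=0):
    """Write the secret letters through the holes, then fill every other position."""
    if len(secret) > len(holes):
        raise ValueError("the grille needs at least one hole per secret letter")
    rng = random.Random(seed)
    # Filler letters play the role of the surrounding cover text.
    page = [rng.choice(string.ascii_lowercase) for _ in range(page_length)]
    for position, letter in zip(holes, secret):
        page[position] = letter
    return "".join(page)


def read_with_grille(page, holes, length):
    """Lay the same grille over the page and read only what shows through the holes."""
    return "".join(page[position] for position in holes[:length])


HOLES = [3, 7, 8, 15, 22, 26, 31]  # hypothetical hole positions known to both parties
page = write_with_grille("retreat", HOLES, page_length=40)
recovered = read_with_grille(page, HOLES, len("retreat"))
```

Without `HOLES`, the page is just a run of letters; with it, the reader recovers the secret directly.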

    Playing games with words held a place of esteem among European literati as early

    as the Italian poet Boccaccio (1313-75). Boccaccio wrote the world's longest acrostic in

    the form of a set of sonnets, the Amorosa Visione, which used the first letter of every line

    of the 1,500 word poem to pay homage to a certain noble lady who would be forever

    beyond his means (Wilkins, E.H. 105-106). The acrostic is a literary form that takes the

    first letter of every syllable, word or line to construct a new word. It would be difficult to

    understand the decrypted message which would result from this poem in Italian.

    However, a variant example in English will illustrate the concept. Here is an excerpt of text

    from an email which I recently sent to a friend:

    you should be glad you are not Here.
    my days begin very Early.
    a day in college is very Long.
    my classes always end Late.
    i am always glad when they are Over.
    no one fails if they Work.
    this is not really Obvious.
    it does not seem Right.
    no one gets to do what they Like.
    our work is never Done.

    In this example, the first letter of the last word in each line was chosen as a cipher,

    and a text message was constructed around it. A cipher is a set of rules to follow in order

    to understand a secret communication. The Caesar Shift and Cardan Grille are both

    examples of ciphers. Obviously, the letters were not bold, underlined or upper-case in the

    original message. The message and cover text were both contrived to illustrate a basic

    principle of steganography, that the most secure transmission should not only be hidden

    to all but the intended receiver, but not even be discernible to casual observation.
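
The rule used in the email example (take the first letter of the last word in each line) is simple enough to state as code. The sketch below is an illustration only; the function name `last_word_initials` is invented, and the cover text is the email excerpt with one sentence per entry.

```python
COVER_LINES = [
    "you should be glad you are not here.",
    "my days begin very early.",
    "a day in college is very long.",
    "my classes always end late.",
    "i am always glad when they are over.",
    "no one fails if they work.",
    "this is not really obvious.",
    "it does not seem right.",
    "no one gets to do what they like.",
    "our work is never done.",
]


def last_word_initials(lines):
    """Apply the cipher rule: first letter of the last word of every line."""
    initials = []
    for line in lines:
        words = line.rstrip(".").split()
        initials.append(words[-1][0])
    return "".join(initials).upper()


hidden = last_word_initials(COVER_LINES)
```

The cover text here is all lower case, as in the original email, so nothing in it draws the eye to the significant letters.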

    The word steganography is derived from Greek words meaning "covered" and

    "writing". Various tactics have been used, such as highlighting specific letters with invisible

    ink or changing the stroke length of important letters in a hidden message. Printing in the

    late 16th and early 17th centuries provided a medium for embedding messages by

    concealing, instead of encrypting information (Brassil 88). As seen in the image below, the

    inaccuracies inherent in the printing process left random spaces and spurious information

    characteristics that made casual detection difficult. J. Wilkins published a pamphlet in

    1694, with an excruciatingly long title, wherein he describes how letters could be accented

    by long strokes, errors, fonts and stylistic features at random points within the text in

    order to ". . . send swift and secret communications with a friend in privacy . . ." Wilkins

    would poke small holes above significant letters to distinguish them in the cover text

    (Wilkins, J. 88-96).

    The German scientist Gaspar Schott (1608-1666) made the first drawings of

    universal joints, air pumps and other devices long before they were actually invented

    somewhere else. Like many scholars of the time, he also studied music, and described a

    steganographic model which substituted music notes for letters as a means of encoding

    and decoding messages. The assignments were, and still may be random, or specific to a

    given message. The outcome was rarely musical. However, it did serve the purpose of

    hiding information from non-musicians (Petitcolas 13). The following illustration shows

    both an encoded message, and the cipher which relates letters to specific notes. We will

    return to this concept later when we examine the potential uses of steganography in

    conjunction with MIDI.
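
A note-for-letter substitution of the kind Schott described can be sketched with MIDI note numbers. The mapping below is an assumption made for illustration, not Schott's actual table: letters a through z are assigned to consecutive notes starting at MIDI note 60 (middle C). As the text says, the assignment could just as well be random or specific to one message.

```python
import string

BASE_NOTE = 60  # assumed starting point: MIDI note number 60 is middle C

# Hypothetical cipher table: a -> 60, b -> 61, ... z -> 85
LETTER_TO_NOTE = {
    letter: BASE_NOTE + offset
    for offset, letter in enumerate(string.ascii_lowercase)
}
NOTE_TO_LETTER = {note: letter for letter, note in LETTER_TO_NOTE.items()}


def encode_as_notes(message):
    """Turn each letter into a note number; spaces and punctuation are dropped."""
    return [LETTER_TO_NOTE[ch] for ch in message.lower() if ch in LETTER_TO_NOTE]


def decode_from_notes(notes):
    """Look each note up in the shared table to recover the letters."""
    return "".join(NOTE_TO_LETTER[note] for note in notes)


melody = encode_as_notes("Hello World")
recovered = decode_from_notes(melody)
```

Played back, `melody` is unlikely to sound musical, which matches the observation above about Schott's results.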

    As time passed, cultures and their methods of hiding information became

    increasingly sophisticated. The French photographer René Dagron was granted the first

    microfilm patent in 1859. During the siege of Paris 1870-71 by the Prussian army, Dagron

    sent carrier pigeons with messages on microdots across German lines. This was the first

    military application of microfilm. Once Dagron achieved a photographic reduction of more

    than 40 diameters, the microfilms produced weighed approximately 0.05 grams each, and

    a pigeon could carry up to 20 at a time (Wikipedia.org).

    In 1940, actress Hedy Lamarr met George Antheil, an avant-garde

    composer and author. In 1942, they received a patent for a secure communications

    system that provided radio guidance for torpedoes. Remote control guidance of torpedoes

    was first proposed in 1906 by Wilhelm von Siemens. Lamarr gained her knowledge of

    guidance systems from her first husband, Fritz Mandl, an Austrian arms dealer. What

    Lamarr added to the new system was the idea of frequency hopping, or spread spectrum

    transmission, which is still used extensively in military communications. It involves

    transmitting a signal over a seemingly random set of radio frequencies, switching between

    them at split-second intervals. A radio receiver synchronized to the same switching pattern

    will receive the full transmission, but any radio that is not in sync will not be able to

    decode a complete message. Such radio receivers will only detect small portions of the

    broadcast, and instead, will only be able to intercept what appear to be static blips, thus

    hiding the message from all but the intended recipients.
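
The idea of a synchronized switching pattern can be illustrated with a toy simulation. This is a sketch under loud assumptions, not the mechanism in the patent: a seeded pseudo-random generator stands in for the shared pattern, `CHANNEL_COUNT` is an invented number of frequencies, and each time slot carries one character.

```python
import random

CHANNEL_COUNT = 88  # assumed number of available frequencies, chosen for illustration


def hop_sequence(shared_seed, slots, channels=CHANNEL_COUNT):
    """Derive the channel used in each time slot from a seed both radios know."""
    rng = random.Random(shared_seed)
    return [rng.randrange(channels) for _ in range(slots)]


def transmit(message, shared_seed):
    """Send one symbol per time slot, each on the channel the pattern dictates."""
    hops = hop_sequence(shared_seed, len(message))
    return list(zip(hops, message))


def receive(broadcast, seed):
    """A radio hears a symbol only when it is tuned to the channel carrying it."""
    tuned = hop_sequence(seed, len(broadcast))
    return "".join(
        symbol if channel == listening else "?"
        for (channel, symbol), listening in zip(broadcast, tuned)
    )


broadcast = transmit("attack at dawn", shared_seed=1942)
in_sync = receive(broadcast, seed=1942)      # same pattern: the full message arrives
out_of_sync = receive(broadcast, seed=1926)  # different pattern: mostly missed slots
```

The receiver with the wrong seed sees mostly `?` marks, the simulation's equivalent of the static blips described above.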

    Antheil contributed ideas from his composition and performance experience. He

    based his part of the design on a mechanism similar to the one used in his Ballet

    Mécanique. The Ballet was first performed in Paris in June of 1926. The Ballet, which was

    composed for various mechanical instruments, featured a player piano, electric bells,

    airplane propellers and sirens, with all the devices on stage controlled mechanically by rolls

    of paper tape punched with holes, similar to the ones used to control a player piano. In

    the original patent, Antheil incorporated this concept as a means to control the rapid

    switching of both the transmitters and receivers used to relay messages via what

    became known as spread spectrum technology.

    The design was not received well by the US Navy. The patent described the

    mechanism as ". . . being similar to that of a player piano . . ." The US Navy, in

    considering the patent submission as a practical solution to guiding torpedoes,

    disregarded the entire proposal based on their considered opinion that it would not be

    feasible to ". . . fit a player piano inside a torpedo." Antheil responded, unsuccessfully,

    that the device could be manufactured to be about the size of a watch. Consequently, the

    design was not used until 1962 during the Cuban missile crisis. By this time, the original

    patent had expired, but researchers at Sylvania repeatedly cited the patent as the original

    source for developing the idea to a useful stage. The lesson to be learned from this is that

    times of military crisis are not very good times to introduce newly developing technologies,

    regardless of their potential. This case also demonstrates that new ideas can come from

    unlikely sources that do not rely on established methods (Braun 11-15).

    Mathematicians from Cardan to the present have speculated on developing a

    method for encrypting messages that will be secure against detection, and that will be

    statistically sound under mathematical scrutiny. A model for reasoning about this goal, the

    Prisoners' Problem, was formulated by Gustavus Simmons in 1983. It should not be confused with the

    Prisoner's Dilemma, devised by Merrill Flood and Melvin Dresher at the RAND Corporation in 1950.

    Simmons's model has become a classic textbook model of invisible communications.

    Alice and Bob are prisoners in separate cells and they want to plan an escape. To do

    so, they must communicate in secret. The prison warden, Wendy, arbitrates all aspects of

    their daily lives, including communications. As an opponent of their plan, she may be

    either a passive, active or malicious agent. If passive, she will allow communications to

    pass between them. If active, she may either block or alter their messages. But, if she is

    malicious, she can send fake messages to either or both parties, or put them both in

    solitary confinement, thus preventing any chance of an escape. If they send messages

    with content that is scrambled, they are using encryption. If they use steganographic

    techniques, however, they will be sending messages that attempt to conceal the fact that

    there is any covert communication. Under this system, they can openly send messages

    along unclassified channels, which contain confidential information (Simmons 51-67).

    To achieve this goal, they both select a pair of random numbers, which they will use

    to encrypt and decrypt their information. The reciprocal pair of equations used to derive

    and verify results during the communication process also allows them to verify the identity

    of the sender and the validity of the message content. After choosing their numbers, they

    will each send their first number in the pair to the other one. They will keep their

    respective second numbers secret. These will be used in verifying that the sender has used

    the number that was originally exchanged to encrypt all of their communications.

    Modular arithmetic plays an important role in deriving the random numbers used to

    encrypt (hide) and decrypt (observe) sent messages. The expression used in this

    process is stated as $C = M^{K} \pmod{n}$. M and n, like the secret numbers, are assigned

    specific values, and are placed within the context of the following equations to assure

    secure transmissions. Let $M = 7$, $n = 13$, $a = 5$ and $b = 8$. Here 5 and 8 are Alice's and Bob's

    secret numbers.

    By choosing these values, Alice and Bob can now establish encryption keys with the

    following calculations: Alice computes $A = M^{a} \pmod{n}$, which she sends to Bob. She also receives B from

    Bob, which he must calculate as $B = M^{b} \pmod{n}$, and send. She will compute her

    decryption key by calculating $K = B^{a} \pmod{n}$, and Bob will also obtain his key with the

    equation $K = A^{b} \pmod{n}$. Therefore,

    $$A = M^{a} \bmod n = 7^{5} \bmod 13 = 16{,}807 \bmod 13 = 11$$

    $$B = M^{b} \bmod n = 7^{8} \bmod 13 = 5{,}764{,}801 \bmod 13 = 3$$

    Every natural number X is congruent, modulo n, to the remainder obtained by dividing X

    by n. This remainder is called the residue of X (mod n). The residue of $7^{5}$ is found by

    long division in this manner:

    $$16{,}807 / 13 = 1{,}292.846154, \qquad 1{,}292.846154 - 1{,}292 = 0.846154, \qquad 0.846154 \times 13 \approx 11, \qquad 16{,}807 \equiv 11 \pmod{13}$$

    When Alice wants to send Bob a message, she will create harmless content, known as a

    cover object, that will include A = 11. Bob will use A to interpret the order of the encrypted

    letters: every 11th character is significant to the encoded text of the message.

    Similarly, Alice raises the value B = 3 received from Bob to her secret number a = 5 to

    obtain the key K. Since $3^{5} = 243$,

    $$243 / 13 = 18.692307, \qquad 18.692307 - 18 = 0.692307, \qquad 0.692307 \times 13 \approx 9, \qquad 243 \equiv 9 \pmod{13}$$

    Bob reaches the same key from his side, since $A^{b} = 11^{8} = 214{,}358{,}881 \equiv 9 \pmod{13}$,

    and he will send his message to Alice with the key K = 9. Now, each of them can check the

    results by comparison with their respective, secret values for a and b as follows:

    $$\begin{aligned}
    B^{a} &= (M^{b})^{a} && \text{since } B = M^{b} \\
          &= M^{ba}      && \text{rule of exponents, } (x^{m})^{n} = x^{mn} \\
          &= M^{ab}      && \text{commutative property, } ab = ba \\
          &= (M^{a})^{b} && \text{rule of exponents, } (x^{m})^{n} = x^{mn} \\
          &= A^{b}       && \text{since } A = M^{a}
    \end{aligned}$$

    So, Alice and Bob can now, theoretically, exchange secret information over an insecure

    channel, hoping that Wendy will not notice the message within their cover objects (Miller,

    Herren, Hornsby, et al. 240-257). No third party observer should be able to distinguish

    whether the sender is passively sending an empty cover object or an active message.
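
The arithmetic of the exchange can be checked directly with Python's built-in three-argument `pow`, which computes a power modulo n. The sketch below uses the values defined above (M = 7, n = 13, a = 5, b = 8); the variable and function names are chosen for this illustration only. This pattern of exchanging public values and raising them to a private exponent is commonly known as Diffie-Hellman key exchange. Note that with these values the formula $B = M^{b} \pmod{n}$ yields 3, while 9 is the key K that both parties end up sharing.

```python
M, n = 7, 13   # public values from the text
a, b = 5, 8    # Alice's and Bob's secret numbers

A = pow(M, a, n)  # Alice's public value, sent to Bob
B = pow(M, b, n)  # Bob's public value, sent to Alice

key_alice = pow(B, a, n)  # K = B^a (mod n), computed by Alice
key_bob = pow(A, b, n)    # K = A^b (mod n), computed by Bob


def residue_by_division(x, modulus):
    """Residue found the long way: divide, drop the fraction, subtract the whole part."""
    whole_quotient = x // modulus
    return x - whole_quotient * modulus
```

Both keys agree because $(M^{b})^{a} = (M^{a})^{b}$. Numbers this small are for illustration only; any observer could try every possible secret in moments.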

    The security of invisible communication depends entirely on the inability to distinguish

    between a cover object and a secret transmission. Modifications should not be visible to

    anyone but those involved in the communication process. For security purposes, the same

    cover object should not be used more than once, as this would provide a framework for

    deciphering future communications. Both the sender and receiver should destroy all sent

    and received cover objects. No potential opponent should have access to the cover object

    before the time of transmission. And finally, the cover object must contain a sufficient

    amount of redundant data or space to conceal all of the secret information. Most

    encryption and steganography software exploits the LSB, or least significant bit portions of

    a binary file. This requires a short discussion on the topic of how computers speak

    (Katzenbeisser 32).

    2. Digital Media

    A cover object can be any data, image or sound file. At the fundamental level, any

    digital file consists of a series of 1s and 0s. Each 1 or 0 is called a bit, and a group of eight bits

    placed in a sequence is called a byte. Larger groupings of 16, 32 and 64 bits are also common.

    Computers are actually complex systems of electrical switches that are either turned on

    (1) or off (0). When electricity flows through a switch, current travels to a specific device

    that performs its predefined function.

    Each device in a computer system accesses a stored table of binary values which

    correspond to a list of specific conditions for performing a task. For example, if you press

    the "a" key on a computer keyboard, the keyboard and computer are wired in such a way that

    a signal is sent through an electronic network, where it is then compared to the binary

    value 01100001 from a standard, shared list of 8-bit combinations. 01100001 is code number 97

    in the ASCII standard table of binary communication codes. If there is a positive

    match, then the human symbol "a" is first stored at a location in memory, and then sent

    through a computer program to either be displayed on a monitor screen, printed by a

    printer or stored for later recall in a collection of bytes that will be some type of text file.

    If, however, the upper case "A" is sent by using the shift and "a" keys at the same time, the

    computer system uses 01000001 to transmit instructions between components as the

    human symbol "A". The following chart lists the 52 upper and lower case letters as

    represented in ASCII.
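
The 52 letter codes can be listed with a short sketch. It relies only on Python's built-in `ord` and `format`; the helper name `ascii_bits` is invented for this illustration.

```python
import string


def ascii_bits(character):
    """Return the 8-bit binary string for a character's ASCII code number."""
    return format(ord(character), "08b")


# One row per letter: (symbol, decimal code, binary code)
chart = [
    (letter, ord(letter), ascii_bits(letter))
    for letter in string.ascii_uppercase + string.ascii_lowercase
]

for letter, number, bits in chart:
    print(letter, number, bits)
```

Upper and lower case versions of the same letter differ in a single bit, which is why "a" (01100001) and "A" (01000001) look so similar.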

    This system of substitution, called the ASCII code, was developed by the X3.2

    Committee of the American Standards Association from 1960 to 1986. ASCII stands for

    American Standard Code for Information Interchange, and was developed for Bell Labs as

    a method of transmitting telegraphic code. 128 binary codes for printed characters and

    control characters are used to transmit text across the internet and other electronic

    communications media (Wikipedia.org).

    In a text file, any letter, number, punctuation mark or keyboard command can be

    expressed as a byte, a set of 1s and 0s. To say "cat", typing the ASCII characters C

    (#67 = 01000011), A (#65 = 01000001) and T (#84 = 01010100) will print out the word CAT

    to a computer screen, a file or a printer (Huber and Runstein 216-219). Each byte has

    eight bits. The one at the left of the chain represents the largest numerical value in binary

    and is called the most significant bit, or MSB. The number on the far right represents the

    smallest binary value and is called the least significant bit, or LSB. Most computer

    documents, text files, emails, and even pictures are relatively small, requiring only a few

    thousand bytes for a complete representation of their data. By comparison, audio files are

    large, with sizes going up into millions of bytes, and are therefore more complex simply by

    virtue of their size. The assembled bytes represent information about frequency (musical

    pitch), amplitude (loudness) and elements of time, or the duration of a pitch. All of these

    bytes have an MSB and an LSB. Current steganographic software uses the LSB section of files

    as the medium of space to embed hidden messages. Either bit can be used, but the LSB is

    the most common area for hiding information within digital media. Encryption software is

    used to generate this code and its distribution within the file (Katzenbeisser 37).
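
LSB embedding can be sketched on a list of integer sample values. This is a minimal illustration under stated assumptions, not a description of any particular steganography product: the function names are invented, the message bits are written into consecutive samples starting at the first one, and no encryption or scattering of the bits is attempted.

```python
def embed_lsb(samples, message: bytes):
    """Overwrite the least significant bit of successive samples with the message bits."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("cover object is too small for this message")
    stego = list(samples)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & ~1) | bit  # clear the LSB, then set it to the bit
    return stego


def extract_lsb(samples, byte_count):
    """Read the LSBs back, eight at a time, to rebuild the hidden bytes."""
    recovered = bytearray()
    for k in range(byte_count):
        value = 0
        for sample in samples[8 * k : 8 * k + 8]:
            value = (value << 1) | (sample & 1)
        recovered.append(value)
    return bytes(recovered)


cover = list(range(100, 124))      # 24 made-up sample values, enough for 3 bytes
stego = embed_lsb(cover, b"CAT")
hidden = extract_lsb(stego, 3)
```

Each sample changes by at most 1, which is why the alteration is hard to hear, and it is also why the cover object must be large enough: one byte of message consumes eight samples.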

    In the example of the Prisoners' Problem above, small numbers were

    used to generate the encryption keys which both parties used to send private

    messages. Keys this small are quickly broken, and the resulting coded messages soon

    become insecure with repeated transmissions. This short-term usefulness led to the

    development of the RSA secure transaction standard at MIT in 1977. The significant

    improvement lies in the fact that the variables for the equation $C = M^{K} \pmod{n}$ which

    generate the encryption keys have grown from small numbers to prime numbers with up to

    several hundred digits. This drastically increases the time and magnitude of the calculations

    an attacker must perform. Absolute security is never guaranteed, but robustness

    against attack is greatly improved.

    3. Models of transmission beat coding

    It is convenient at this point to move away from current theories and methods

    which are so harshly burdened with math in order to consider some models of information

    hiding within other disciplines, particularly those of music and audio recording.

    Computers speak binary, or digital, at the fundamental level of machine

    operation. Sophisticated electronic switching networks turn circuits either on (1) or off (0).

    The sequence of 1s and 0s tells the computer how to generate electrical impulses which

    will eventually be heard as sound. Large groupings of bits are assembled to make digital

    audio files.

    Although audio files are digital, their output is ultimately analog sound. When

    sound is recorded using computers and software, it becomes audio, and therefore digital.

    Noise or any other information can be added to a recording at any time. From this point

    on in this discussion, it may safely be assumed that a recording is digital. The models being

    presented were produced on software currently available to anyone. None of the software

    is spyware or malicious code, and none of it is considered to be a steganographic tool. It was

    actually designed for recording music and producing audio in various formats.

    Digital audio has two major format divisions. They are digital audio and MIDI. The

    distinction between digital audio and sampling is slight. It mainly pertains to the length

    of the sound being recorded. In modern culture, a sample refers to a short, recorded

    sound. This can be a short piece of music, a celebrity quote, or a sound effect. They are

    used in hip hop records, broadcast commercials, movies and live theater.

    A digital recording converts electrical impulses from a sound source to a string of

    1s and 0s by splitting an original sound into thousands of extremely short slices of digital

    information, which are called samples in the realms of consumer and professional audio

    recording. They are recorded and played back so fast, generally at 44,100 times per

    second, or 44.1 k samples per second, that the human ear interprets them as continuous

    sound in much the same way that a movie filmed at 24 frames per second projects an

    illusion of unbroken activity on a movie screen. By comparison, analog tape recorders

    capture real time sound events as a constantly changing and continuous stream of

    information that mirrors the experience of sound.
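
The slicing of a sound into samples can be sketched by computing the value of a pure tone 44,100 times per second. The example assumes 16-bit samples and a 440 Hz sine wave, neither of which is specified in the text; the function name `sample_tone` is invented for this illustration.

```python
import math

SAMPLE_RATE = 44_100  # samples per second, as in the text


def sample_tone(frequency_hz, duration_s, bits=16):
    """Measure a sine wave at regular intervals and round each value to an integer."""
    peak = 2 ** (bits - 1) - 1               # largest positive value for signed samples
    count = round(SAMPLE_RATE * duration_s)  # number of slices in the requested duration
    return [
        round(peak * math.sin(2 * math.pi * frequency_hz * t / SAMPLE_RATE))
        for t in range(count)
    ]


tone = sample_tone(440.0, 0.01)  # one hundredth of a second of an assumed 440 Hz tone
```

One hundredth of a second already produces 441 numbers, which gives a sense of why audio files run to millions of bytes and offer so much room for hidden data.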

    Regardless of the format, digital or analog, the end result is the same. We produce

    a recording of an event, which we can store, replay and manipulate with the proper

    software and equipment. In that sense, all recordings are equal, and therefore, all content

    is equal. Content is susceptible to manipulation. A typical recording of a rock

    band would include drums, bass, guitar and vocals. These are all recorded separately, and

    later mixed together to make a sound file that will become a commercial CD.

    Besides the typical group of instruments, an engineer, band, or songwriter might

    want to include any other combination of instruments from kazoos to symphony

    orchestras. These decisions are always arbitrary, and they always contribute to the final

    recorded sound of all the instruments playing together. Many commercial recordings are

    made using Pro Tools, a software package that emulates a traditional recording studio

    inside a computer. A typical recording session brings musicians into a room, where they are

    connected through microphones and cables to a DAW, or Digital Audio Workstation. The

    DAW comprises the computer, Pro Tools or some other recording software, and the

    associated hardware to connect the musicians to their virtual recording studio. As the

    musicians perform their song, an engineer establishes electrical contact and records the

    performance on separate tracks, which will later be manipulated to produce a recording of

    their song for commercial release.

    Consumer demands and industry standards guarantee that almost any recording

    will be in stereo. This means that there will be two separate channels, or sets of sound

    coming out of the speakers used for listening. These two separate channels will be almost

    identical, but will have slight differences in content for the purpose of recreating an

    experience of live performance. Our hypothetical session has gone well, and we have a

    song ready to release when, suddenly, the bass player says that he wants to add a secret

    message for all of his fans. Being an Eagle Scout, and a great fan of steganography, he

    decides to use Morse code to send his message, and he is going to spread it across the left

    and right channels to indicate the dots and dashes, respectively. He could have just used

    two notes played repeatedly on his bass to substitute for the dot and dash. But he decided

    that the left/right distinction between the dots and dashes would add another layer of

    security, since most people would not be looking for a secret coded message in a

    tambourine track. He taps out the message with the tambourine, and it is recorded onto

    a track. The pattern is simple, with no syncopation or ornamentation, and each tap always

    falls exactly on a quarter note in the song. The engineer later separates the designated

    beats to the left and right tracks of the final mix.

    There are ten letters in the phrase "hello world." The Morse code ciphers for the

    letters are no longer than four pulses each, so one letter can be placed in each measure.

    Their distribution across time is less important than their placement in the left or right

    channel. The Morse code for "hello world" is:

    Letter   Morse     Channel
    H        * * * *   L L L L
    E        *         L
    L        * - * *   L R L L
    L        * - * *   L R L L
    O        - - -     R R R
    W        * - -     L R R
    O        - - -     R R R
    R        * - *     L R L
    L        * - * *   L R L L
    D        - * *     R L L

    The L and R entries beside the * and - symbols directly transpose the Morse code elements to a

    stereo mix of the song. The part is not very musical or long, but it is also neither offensive,

    nor out of place within the context of a rock song.
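
    The dot-to-left, dash-to-right assignment described above can be sketched in a few lines of Python. This is only an illustration, not the band's actual workflow: the function names to_channels and from_channels are invented for this example, and the table covers only the letters needed for "hello world."

```python
# Standard International Morse code for the letters in the example message.
MORSE = {
    "H": "....", "E": ".", "L": ".-..", "O": "---",
    "W": ".--", "R": ".-.", "D": "-..",
}

def to_channels(message):
    """Return one list of channel labels per letter: dot -> L, dash -> R."""
    measures = []
    for letter in message.upper():
        if letter == " ":
            continue  # the space is not encoded; each letter gets its own measure
        measures.append(["L" if pulse == "." else "R" for pulse in MORSE[letter]])
    return measures

def from_channels(measures):
    """Recover the letters by reversing the dot/dash assignment."""
    reverse = {code: letter for letter, code in MORSE.items()}
    return "".join(
        reverse["".join("." if ch == "L" else "-" for ch in measure)]
        for measure in measures
    )

measures = to_channels("hello world")
for measure in measures:
    print(" ".join(measure))
print(from_channels(measures))
```

    Each printed row corresponds to one measure of the tambourine part, and the last line shows that a listener who knows the convention can read the letters back out.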

    Everyone in the studio agrees that this is a good method of hiding the code,

    everyone, that is, except the drummer, who feels that the tambourine is directing attention

    away from his drum part. But he does have a valid argument when he says that "not

    many people will easily notice the difference between a single beat being in the left

    channel versus the right." So he proposes that the code be placed using two different

    notes on a piano. The notes will be played on the same beats as the tambourine, still

    audible, but less intrusive. The engineer adds some reverb to the piano track to make it

    sound far away and groovy, which it does.

    While all of this has been going on, the guitar player has been out in the parking lot

    trying to flag down a pizza delivery truck, to no avail; there is no pizza in sight. So, when

    he comes back to the studio and hears the crazy stuff that has been done to his song, he

    is livid. Still, agreeing that a secret message in the song could boost sales, he proposes a way

    to keep it really secret. Instead of using notes or beats that can actually be heard, he

    suggests using short pulses of high frequency noise placed on the same beats. The human

    ear can detect frequencies ranging from 20 hertz (Hz), or cycles per second, on the low end to

    20,000 hertz, or 20,000 cycles per second. The higher the number of vibration cycles that

    occur per second, the higher the pitch will sound. The 20 to 20,000 Hz range is an average range. In

    fact, many people cannot hear all the way up to the top of this frequency range, actually

    losing the ability to hear sounds between 15,000 and 20,000 Hz. Knowing this fact, the band

    agrees to use a pitch of 15,800 Hz, which will be almost indiscernible to the ear, but which

    will nonetheless appear in the left and right channels of the final mixed output. This

    method works so well that it cannot be heard without a little electronic processing. Using

    a high-pass audio filter, which removes lower frequencies, and a compressor to reduce

    the dynamic range, or difference between the loudest and softest sounds heard in the

    song, the message can now be heard as a series of high frequency beeps playing along

    with the rest of the song, which sounds like it is being played through very bad speakers.
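
    As a rough sketch of the guitar player's idea, the following Python adds short 15,800 Hz bursts to the left or right channel on successive beats. Only the 44,100 Hz sampling rate and the 15,800 Hz pitch come from the text; the tempo, burst length and amplitude are assumptions chosen for illustration, and the silent lists stand in for the mixed song.

```python
import math

SAMPLE_RATE = 44_100   # samples per second, as in the text
TONE_HZ = 15_800       # pitch chosen by the band
BEAT_MS = 500          # assumption: 120 beats per minute
BURST_MS = 50          # assumption: each pulse lasts 50 ms
AMPLITUDE = 0.05       # assumption: quiet relative to a full-scale mix of 1.0

def tone_burst():
    """Return one short burst of the high-frequency tone as float samples."""
    n = SAMPLE_RATE * BURST_MS // 1000
    return [AMPLITUDE * math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE)
            for i in range(n)]

def embed(pattern, left, right):
    """Add one burst per beat to the left (L) or right (R) channel, in place."""
    burst = tone_burst()
    beat = SAMPLE_RATE * BEAT_MS // 1000
    for k, channel in enumerate(pattern):
        target = left if channel == "L" else right
        start = k * beat
        for i, sample in enumerate(burst):
            target[start + i] += sample

# The letter L (dot dash dot dot) becomes L R L L, one pulse per beat.
pattern = ["L", "R", "L", "L"]
length = (SAMPLE_RATE * BEAT_MS // 1000) * len(pattern)
left, right = [0.0] * length, [0.0] * length   # silent stand-in for the song
embed(pattern, left, right)
```

    In a real session the bursts would be added to the existing stereo mix rather than to silence, and the amplitude would be set by ear.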

    If we ignore the inferior sound quality that this final processing creates, our secret

    message can now be heard, regardless of the fact that most consumers will not be aware

    of it being in the mix, nor will they have a copy of Pro Tools on their home computer or car

    stereo. However, a real spy could use a piece of equipment known as a spectrum

    analyzer to graphically represent the frequency and duration of the high frequency pulses,

    thus revealing the code to a trained eye. The band has successfully hidden their message

    in the audio of their song.
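
    A software stand-in for the spectrum analyzer might measure how much energy each channel carries at 15,800 Hz on every beat. The sketch below uses the Goertzel algorithm for that measurement; this is a substitute technique chosen for illustration, not something the band used. The tempo, window length, threshold and the 440 Hz test tone standing in for the song are all assumptions.

```python
import math

SAMPLE_RATE = 44_100
TONE_HZ = 15_800

def tone_power(samples, freq=TONE_HZ, rate=SAMPLE_RATE):
    """Goertzel algorithm: normalized power of one frequency in a block."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    return power / max(len(samples), 1) ** 2

def read_pulses(left, right, beat_len, window, threshold):
    """For each beat, report 'L', 'R', or '-' when no pulse is found."""
    found = []
    last_start = min(len(left), len(right)) - window
    for start in range(0, last_start + 1, beat_len):
        p_left = tone_power(left[start:start + window])
        p_right = tone_power(right[start:start + window])
        if max(p_left, p_right) < threshold:
            found.append("-")
        else:
            found.append("L" if p_left > p_right else "R")
    return found

def demo_signal(pattern, beat_len, window):
    """Hypothetical test mix: a loud 440 Hz tone plus quiet pulses per beat."""
    n = beat_len * len(pattern)
    song = [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]
    left, right = song[:], song[:]
    for k, channel in enumerate(pattern):
        target = left if channel == "L" else right
        for i in range(window):
            target[k * beat_len + i] += 0.05 * math.sin(
                2 * math.pi * TONE_HZ * i / SAMPLE_RATE)
    return left, right

left, right = demo_signal("LRLL", beat_len=22_050, window=2_205)
decoded = read_pulses(left, right, 22_050, 2_205, threshold=1e-4)
print(decoded)
```

    Because the measurement looks only at one narrow frequency, the much louder 440 Hz tone does not mask the pulses, which is the same reason a spectrum analyzer display reveals them to a trained eye.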

    But digital audio is only one method of creating music with computers. MIDI is an

    industry standard set of computer codes that allows computers and other electronic

    musical instruments to exchange information and create music. This differs from digital

    audio in that no sound is played outside the computer for recording purposes. Instead, a

    set of binary codes defined by the MIDI standard, introduced in the early 1980s, instructs

    the instrument and computers to generate a note with an electronic music synthesizer

    inside the computer or controlling instrument.

    Digital Performer is a computer-based music notation program which allows

    composers several methods of arranging notes in a song, by either playing them on an

    electronic music keyboard, writing them on a virtual musical staff or by editing the actual

    computer code that represents pitch, volume, duration and special effect processing to

    make the notes sound more natural. The main benefit of MIDI over digital audio is that it

    creates very small files which can easily be transferred over the internet. It is a convenient

    way to write down musical scores which can then be played by musicians in a live setting.

    INSERT BERG AND TRADITION

    For the last set of examples, we will again take the same message, "hello world,"

    and apply the system of note substitution demonstrated by Gaspar Schott. As previously

    mentioned, the method of substitution is arbitrary. The possibilities include using the

    notes in the scale of a piece of music, a chromatic or other type of scale. In both cases, a

    simple letter to note relationship is established, and the message is spelled out using

    notes instead of letters. Here is the cipher for this example:

    Since H is the first letter in the message, C5, or the C one octave above middle C on a

    piano, was chosen, with the rest of the letters being sequenced consecutively on either

    side of C5. Now, by playing these notes on a MIDI keyboard that is hooked up to Digital

    Performer, a melody is created.
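
    The cipher table itself did not survive in this copy, so the sketch below assumes one plausible reading of the description: the alphabet runs in order along the chromatic scale, with H anchored on C5. It also assumes the common convention that middle C is MIDI note 60, which makes C5 note 72. The function names are invented for this illustration, and the real table may have differed.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
ANCHOR_LETTER = "H"   # from the text: H is assigned to C5
ANCHOR_NOTE = 72      # assumption: MIDI number of C5 when middle C (C4) is 60

def letter_to_note(letter):
    """Map a letter to a MIDI note, one semitone per alphabet step (assumed)."""
    return ANCHOR_NOTE + (ord(letter.upper()) - ord(ANCHOR_LETTER))

def note_to_letter(note):
    """Invert letter_to_note."""
    return chr(note - ANCHOR_NOTE + ord(ANCHOR_LETTER))

def note_name(note):
    """Spell a MIDI note number as a pitch name with octave, e.g. 72 -> C5."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"

def encode(message):
    return [letter_to_note(c) for c in message if c.isalpha()]

def decode(notes):
    return "".join(note_to_letter(n) for n in notes)

melody = encode("hello world")
print([note_name(n) for n in melody])
print(decode(melody))
```

    Under these assumptions the ten letters fall on either side of C5, from G#4 for D up to D#6 for W, and the list of note numbers is what a MIDI keyboard would send to the sequencer.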

    In the next examples, our melody, although not very musical, offers a medium to

    transmit a message. By adding harmony parts to this melody, a la Cardon, the message

    can be obscured even more from casual observation.

    In this example, the melody has been placed in the alto part, the second staff of notes

    from the top of our song. Another approach would be to embellish our message with

    ornamental notes to intentionally distract the eye and ear from the message.

    And, finally, a combination of harmony and spread spectrum distribution of notes across

    the staff could be set up to create even more confusion for potential attackers looking for

    our message.

    In this example, the music begins with the first note being placed in the top staff,

    the second note in the next staff down and so on until all ten notes have been played. The

    placement of notes on the staff begins again with the fifth and ninth being placed in the

    top staff, and all other notes cascading across the staves in order to spell out our message.

    While this pattern is being spelled out, the bass line, in the bottom staff, walks steadily

    through the same melody, with a few extra notes at the end to make it last as long as the

    rest of the song.
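
    The cascading placement can be sketched as dealing the notes to the staves in rotation. A cycle of four staves is inferred from the fifth and ninth notes landing back in the top staff; that count, the function names and the MIDI note numbers in the example melody are assumptions made for illustration.

```python
def spread(notes, staves=4):
    """Deal notes to staves in rotation; None marks a rest in that staff."""
    parts = [[None] * len(notes) for _ in range(staves)]
    for i, note in enumerate(notes):
        parts[i % staves][i] = note
    return parts

def gather(parts):
    """Rebuild the melody by taking whichever staff sounds at each step."""
    return [next(n for n in column if n is not None) for column in zip(*parts)]

# Hypothetical ten-note message melody, written as MIDI note numbers.
melody = [72, 69, 76, 76, 79, 87, 79, 82, 76, 68]
parts = spread(melody)
for staff in parts:
    print(staff)
print(gather(parts) == melody)
```

    Read one staff at a time, no part contains the whole message; only a reader who knows the rotation can gather the notes back into order.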

    These three embellishments represent only a small sampling of the possible tactics

    of embedding a secret message within a MIDI file. Other tactics might include an acrostic

    approach, where the first note of each measure or phrase carries part of the message. An

    encryption process could also require a message to be transposed with a shift cipher

    before the note substitution ever began.
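
    Both tactics are easy to sketch. The shift below is an ordinary Caesar cipher with an arbitrarily chosen key of 3, applied to the text before any notes are assigned, and the acrostic reader simply keeps the first note of each measure. The function names and the sample measures are hypothetical.

```python
import string

def shift(text, key):
    """Caesar shift letters by `key` places; other characters pass through."""
    out = []
    for c in text.upper():
        if c in string.ascii_uppercase:
            out.append(chr((ord(c) - ord("A") + key) % 26 + ord("A")))
        else:
            out.append(c)
    return "".join(out)

def acrostic(measures):
    """Keep only the first note of each measure; the rest is decoration."""
    return [measure[0] for measure in measures if measure]

secret = shift("hello world", 3)
print(secret)             # KHOOR ZRUOG
print(shift(secret, -3))  # HELLO WORLD

# Hypothetical measures of MIDI note numbers; only the first of each counts.
print(acrostic([[75, 60, 64], [72, 67], [79, 62, 65]]))
```

    The shifted text would then be passed through the note substitution, so that anyone who recovers the notes still has to undo the shift to read the message.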

    In all of the audio examples presented, there is one important quality that

    separates the examples from most hidden communications. As mentioned earlier, most

    steganography and cryptographic software and practices hide their information in the LSB

    portion of the cover object, since this has little or no effect on the appearance of the cover.

    By placing the coded information squarely on the main beats of the song, either by note

    values, or by adding extra sound, the information now resides in the MSB area of the song.

    It is not randomized noise; rather, it has been made a fundamental aspect of constructing

    the file. This placement does not exempt the code from scrutiny by computer analysis, but

    places it within the context of the composition, and therefore validates its placement,

    making it potentially less suspect than code that was added after the cover object was

    generated.
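
    For contrast, the conventional LSB approach mentioned above can be sketched as follows. The sample values are invented, and treating them as signed 16-bit audio samples is an assumption; the point is only that each sample changes by at most one step, which is why the cover sounds unaltered.

```python
def embed_lsb(samples, bits):
    """Overwrite the lowest bit of each leading sample with one message bit."""
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out

def extract_lsb(samples, count):
    """Read the lowest bit back from the first `count` samples."""
    return [s & 1 for s in samples[:count]]

cover = [1000, -2001, 1502, 37]   # hypothetical signed 16-bit samples
bits = [1, 0, 1, 1]
stego = embed_lsb(cover, bits)
print(stego)
print(extract_lsb(stego, len(bits)))
```

    Here the message lives in the least audible part of the data, whereas the examples in this paper place it in the notes and beats themselves.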

    4. Conclusion

    It is possible to embed covert communications at several different levels of audio

    production that do not adhere to current steganographic conventions. It is uncertain

    whether current algorithms can detect these contextual, arbitrary ciphers. They will

    appear as binary code that is subject to decryption. However, the key and randomization

    variables are derived outside of normal decryption methods. Further study will be

    required to determine exactly how robust they are to detection by current methods. Given

    these questions, it is still fair to assume that the best played scans of 1s and 0s might still

    lead someone else astray from the intended message, and may hint at a new set of tactics

    in the field of audio forensics and steganography.