Meinard Müller, Christof Weiss, Stefan Balke

Further Topics in MIR

International Audio Laboratories Erlangen
{meinard.mueller, christof.weiss, stefan.balke}@audiolabs-erlangen.de

Tutorial: Automated Methods of Music Processing (Automatisierte Methoden der Musikverarbeitung)
47th Annual Conference of the Gesellschaft für Informatik (47. Jahrestagung der Gesellschaft für Informatik)

Why is Music Processing Challenging?

Example: Chopin, Mazurka Op. 63 No. 3

Waveform / Spectrogram

[Figure: waveform (amplitude over time in seconds) and spectrogram (frequency in Hz over time in seconds)]

Performance
– Tempo
– Dynamics
– Note deviations
– Sustain pedal

Polyphony
– Main melody
– Additional melody line
– Accompaniment

Source Separation

Decomposition of audio stream into different sound sources

Central task in digital signal processing

“Cocktail party effect”

Several input signals

Sources are assumed to be statistically independent

Source Separation (Music)

Main melody, accompaniment, drum track

Instrumental voices

Individual note events

Only mono or stereo

Sources are often highly dependent

Harmonic-Percussive Decomposition

Mixture: clearly harmonic sounds and clearly percussive sounds

The mixture is decomposed into three components:

Harmonic component
• Clearly harmonic sounds of singing voice and accompaniment

Percussive component
• Drum hits
• Fricatives & plosives in singing voice

Residual component
• Noise-like sounds
• Vibrato/glissando sounds

Demo: https://www.audiolabs-erlangen.de/resources/2014-ISMIR-ExtHPSep/

Literature: [Driedger/Müller/Disch, ISMIR 2014]
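The slides do not state how the harmonic, percussive, and residual components are computed. The following is a minimal sketch, assuming a median-filtering approach on a magnitude spectrogram: filtering along time emphasizes horizontal (harmonic) structures, filtering along frequency emphasizes vertical (percussive) structures, and bins that are neither clearly harmonic nor clearly percussive go to the residual. The function names, the filter lengths, and the separation factor `beta` are illustrative assumptions, not values taken from the tutorial.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_1d(Y, length, axis):
    """Median-filter a 2-D array along one axis (odd length, edge padding)."""
    pad = length // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    Y_pad = np.pad(Y, pad_width, mode="edge")
    # The window dimension is appended as the last axis.
    windows = sliding_window_view(Y_pad, length, axis=axis)
    return np.median(windows, axis=-1)


def hpr_masks(Y, len_time=11, len_freq=11, beta=2.0, eps=1e-10):
    """Binary masks for harmonic, percussive, and residual parts.

    Y is a magnitude spectrogram with shape (frequency, time).
    beta >= 1 controls how clearly a bin must be harmonic or
    percussive; everything else is assigned to the residual.
    """
    Y_h = median_filter_1d(Y, len_time, axis=1)  # smooth along time
    Y_p = median_filter_1d(Y, len_freq, axis=0)  # smooth along frequency
    M_h = Y_h / (Y_p + eps) > beta
    M_p = Y_p / (Y_h + eps) >= beta
    M_r = ~(M_h | M_p)
    return M_h, M_p, M_r
```

Each mask can be multiplied elementwise with the complex spectrogram of the mixture and inverted to obtain the corresponding component signal. With `beta = 1` the residual becomes (almost) empty and the sketch reduces to a plain harmonic-percussive split.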

Singing Voice Extraction

Original recording
Singing voice
Accompaniment

Singing Voice Extraction

Original recording → HPR → Harmonic component, Percussive component, Residual component

Further stages: MR, TR, SL
F0 annotation

Harmonic component
• Harmonic portion singing voice
• Harmonic portion accompaniment

Percussive component
• Fricatives singing voice
• Instrument onsets accompaniment

Residual component
• Vibrato & formants singing voice
• Diffuse instrument sounds accompaniment

Sum of singing voice portions: Estimate singing voice
Sum of accompaniment portions: Estimate accompaniment

[Figure: spectrograms, frequency over time]

Score-Informed Source Separation

Exploit musical score to support separation process

[Figure: piano-roll representations (pitch over time) and spectrogram (frequency in Hz over time)]

Render

Parametric Model Approach

Estimate parameters

[Figure: two spectrogram panels, frequency (Hz) over time (seconds)]

Rebuild spectrogram information

NMF (Nonnegative Matrix Factorization)

Magnitude spectrogram (N × M, ≥ 0) ≈ Templates (N × K, ≥ 0) · Activations (K × M, ≥ 0)

Templates: Pitch + Timbre (“How does it sound”)

Activations: Onset time + Duration (“When does it sound”)
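The factorization of a magnitude spectrogram into templates and activations can be sketched in a few lines. This is a minimal example, assuming the standard multiplicative update rules for the Euclidean distance; the slides do not say which cost function or update rule is used. The symbols `V` (spectrogram, N × M), `W` (templates, N × K), and `H` (activations, K × M) as well as the function name are assumptions introduced here.

```python
import numpy as np


def nmf(V, K, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative matrix V (N x M) as W (N x K) @ H (K x M).

    Uses multiplicative updates for the Euclidean distance with
    random nonnegative initialization.
    """
    rng = np.random.default_rng(seed)
    N, M = V.shape
    W = rng.random((N, K)) + eps  # templates: "how does it sound"
    H = rng.random((K, M)) + eps  # activations: "when does it sound"
    for _ in range(n_iter):
        # Elementwise updates keep all entries nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because the updates are purely multiplicative, nonnegativity is preserved automatically, and entries that start at zero stay at zero. The second property is what makes constrained initialization effective.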

NMF-Decomposition

Random initialization → No semantic meaning

[Figure: initialized and learnt templates (frequency over note number) and initialized and learnt activations (note number over time)]

NMF-Decomposition

Constrained initialization → NMF as refinement

Template constraint for p=55

Activation constraints for p=55

[Figure: initialized and learnt templates (frequency over note number) and initialized and learnt activations (note number over time); panels labeled Org and Model]
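Constrained initialization can be illustrated as follows. This is a sketch under stated assumptions: templates are allowed to be nonzero only around the harmonics of each pitch, activations only around the note events given by the score, and multiplicative updates (Euclidean distance) then refine the nonzero entries. The helper names, the number of harmonics, the tolerances, and the note-list format `(pitch, onset, offset)` are hypothetical choices, not taken from the slides.

```python
import numpy as np


def midi_to_hz(p):
    """Center frequency of MIDI pitch p (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((p - 69) / 12.0)


def init_templates(pitches, freqs, n_harmonics=8, tol_semitones=0.5):
    """One template per pitch: nonzero only near its harmonics."""
    W = np.zeros((len(freqs), len(pitches)))
    for k, p in enumerate(pitches):
        f0 = midi_to_hz(p)
        for h in range(1, n_harmonics + 1):
            lo = h * f0 * 2.0 ** (-tol_semitones / 12.0)
            hi = h * f0 * 2.0 ** (tol_semitones / 12.0)
            W[(freqs >= lo) & (freqs <= hi), k] = 1.0 / h
    return W


def init_activations(pitches, notes, frame_times, tol_sec=0.1):
    """One row per pitch: nonzero only around the score's note events."""
    H = np.zeros((len(pitches), len(frame_times)))
    index = {p: k for k, p in enumerate(pitches)}
    for p, onset, offset in notes:
        active = (frame_times >= onset - tol_sec) & (frame_times <= offset + tol_sec)
        H[index[p], active] = 1.0
    return H


def refine(V, W_init, H_init, n_iter=50, eps=1e-9):
    """Multiplicative updates; zero entries of the initialization stay zero."""
    W, H = W_init.copy(), H_init.copy()
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

For p=55 (about 196 Hz), for example, the template column is nonzero only in narrow bands around 196 Hz, 392 Hz, and so on, and the activation row is nonzero only around the times at which the score contains that note. The learnt factors therefore keep their musical meaning.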

Score-Informed Audio Decomposition

[Figure: two spectrogram panels, frequency (Hertz) over time (seconds); zoomed frequency axis labeled 500, 523, 580 in the first panel and 500, 554, 580 in the second]

Application: Audio editing
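One common way to turn a learnt decomposition into editable audio material is soft masking: the model spectrogram of the selected components is divided by the full model spectrogram, and the resulting mask is applied to the complex spectrogram of the recording. The following is a hedged sketch of that idea only; the slides do not describe the editing pipeline, and the function names and the symbols `X` (complex spectrogram), `W` (templates), and `H` (activations) are assumptions.

```python
import numpy as np


def component_mask(W, H, idx, eps=1e-12):
    """Soft mask (values in [0, 1]) for the NMF components listed in idx."""
    V_all = W @ H                    # model of the full spectrogram
    V_sel = W[:, idx] @ H[idx, :]    # model of the selected components
    return V_sel / (V_all + eps)


def split_by_components(X, W, H, idx):
    """Split a complex spectrogram X into selected part and remainder."""
    mask = component_mask(W, H, idx)
    return mask * X, (1.0 - mask) * X
```

The two returned spectrograms sum to the original, so the selected notes can be processed separately (for example attenuated or pitch-shifted) and then mixed back with the remainder.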

Informed Drum-Sound Decomposition

Demo: https://www.audiolabs-erlangen.de/resources/MIR/2016-IEEE-TASLP-DrumSeparation

Literature: [Dittmar/Müller, IEEE/ACM-TASLP 2016]

Remix:

Loop Decomposition of EDM

Demo: https://www.audiolabs-erlangen.de/resources/MIR/2016-ISMIR-EMLoop

Literature: [López-Serrano/Dittmar/Müller, ISMIR 2016]

Decomposition Patterns Activations

Audio Mosaicing

Source signal: Bees
Target signal: Beatles – Let it be

Mosaic signal: Let it Bee

Demo: https://www.audiolabs-erlangen.de/resources/MIR/2015-ISMIR-LetItBee

Literature: [Driedger/Müller, ISMIR 2015]

NMF-Inspired Audio Mosaicing

Non-negative matrix factorization (NMF)

Non-negative matrix (fixed) ≈ Components (learned) · Activations (learned)

Proposed audio mosaicing approach

Target’s spectrogram (fixed) ≈ Source’s spectrogram (fixed) · Activations (learned) = Mosaic’s spectrogram

[Figure: target and mosaic spectrograms (frequency over time target), source spectrogram (frequency over time source), activations (time source over time target)]

NMF-Inspired Audio Mosaicing

Spectrogram target ≈ Spectrogram source · Activation matrix = Spectrogram mosaic

[Figure: spectrogram target and spectrogram mosaic (frequency over time target), spectrogram source (frequency over time source), activation matrix (time source over time target)]

Core idea: support the development of sparse diagonal activation structures

Iterative updates

Preserve temporal context
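The mosaicing scheme, with the source spectrogram kept fixed as the template matrix and only the activations learned, can be sketched as below. This is a simplified illustration under stated assumptions, not the procedure from the cited paper: it uses the multiplicative update for the Euclidean distance and a basic diagonal smoothing step as one possible way to encourage diagonal activation structures. The function names and the parameters `diag_len` and `n_iter` are hypothetical.

```python
import numpy as np


def diagonal_smooth(H, length=3):
    """Add to each entry its neighbours on the same diagonal.

    Diagonal runs in H mean that consecutive source frames are used
    for consecutive target frames, which preserves temporal context.
    """
    K, M = H.shape
    out = np.zeros_like(H)
    for d in range(-(length // 2), length // 2 + 1):
        src = H[max(0, -d):K - max(0, d), max(0, -d):M - max(0, d)]
        out[max(0, d):K - max(0, -d), max(0, d):M - max(0, -d)] += src
    return out


def mosaic_activations(V_target, W_source, n_iter=50, diag_len=3, eps=1e-9, seed=0):
    """Learn activations H so that W_source @ H approximates V_target.

    W_source (frequency x source frames) stays fixed; only H
    (source frames x target frames) is updated.
    """
    rng = np.random.default_rng(seed)
    K = W_source.shape[1]
    M = V_target.shape[1]
    H = rng.random((K, M)) + eps
    for _ in range(n_iter):
        H = diagonal_smooth(H, diag_len)  # favour diagonal structures
        H *= (W_source.T @ V_target) / (W_source.T @ W_source @ H + eps)
    return H
```

The mosaic's magnitude spectrogram is then `W_source @ H`. To obtain audio, the same activations can be applied to the source's complex spectrogram before inversion, so that the result keeps the timbre of the source while following the target over time.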

Audio Mosaicing

Source signal: Whales
Target signal: Chic – Good times

Mosaic signal

Audio Mosaicing

Source signal: Race car
Target signal: Adele – Rolling in the Deep

Mosaic signal

Motivic Similarity

B A C H

Teaching

Academic training of students

Fundamental research

Summary

Music information retrieval

Audio decomposition techniques

Machine learning

Music applications & musicology

Multimedia scenarios

Web-based interfaces

Book: Fundamentals of Music Processing

Meinard Müller
Fundamentals of Music Processing
Audio, Analysis, Algorithms, Applications
483 p., 249 illus., hardcover
ISBN: 978-3-319-21944-8
Springer, 2015

Accompanying website: www.music-processing.de