Sergej Zerr, Stefan Siersdorfer, Jonathon Hare, Elena Demidova

35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, USA

• In the modern world people are producing a large amount of visual content

• Photo sharing is one of the most popular activities in social applications

35th SIGIR Conference Portland, USA 12/08/12

Such images can be of a highly sensitive nature, disclosing many details of the users' private sphere, for example photos showing weddings, family holidays and private parties.

• Privacy-directed search and diversification

• Support for the sharing decision

Technical challenge: automatic privacy-directed image detection and search

[Figure: example photos labeled Private and Public, with the labels Work, Sea, Winter, Water]

Outline

• INTRODUCTION: Related Work

• DATA: Selection & Annotation

• FEATURES: Textual & Visual

• EVALUATION: Classification Model

• PRIVACY EXPLORER: Detection & Search

• FUTURE WORK: Ideas & Directions

Overview: Sensitive Information on Web

• Colleges keep track of students' online activities. The posting of personal information by students has consequences [1, 2]

• Only a minimal percentage of users change the highly permeable privacy preferences (study of 4,000 students) [3]: ~90% of profiles contain an image, birthday and real name; 40% contain a phone number

• Even people who did not publish any compromising information can leave discoverable footprints (e.g. mark-a-friend in Facebook)

1. V. Schleswig-Holstein. Statistische Erfassung zum Internetverhalten Jugendlicher und Heranwachsender. In A study of the consumer organization in Schleswig-Holstein, Germany, March 2010.

2. S. B. Barnes. A privacy paradox: Social networking in the United States. First Monday, 11(9), Sept. 2006.

3. R. Gross and A. Acquisti. Information revelation and privacy in online social networks. In WPES '05.

Overview: State of the Art

• Privacy prediction: based on tags and manually defined user privacy profiles (Vyas et al. 2009, Ahern et al. 2007)

• Access control policies: access to parts of the social graph, use of tags and FOAF relations (Felt et al. 2008, Au Yeung et al. 2009)

• Image analysis: textual features in Web 2.0 (Figueiredo et al. 2009, San Pedro et al. 2009); visual features for photo quality (Yeh et al. 2010)

DATA

• Gathering the average community notion of privacy

• We crawled “most recently uploaded” Flickr photos (2 months)

• Started a social annotation game (over the course of 2 weeks)

• 81 users (colleagues, social networks, forum users), 6 teams

“Private are photos which have to do with the private sphere (like self portraits, family, friends, your home) or contain objects that you would not share with the entire world (like a private email). The rest is public. In case no decision can be made, the picture should be marked as undecidable.”

DATA: Inter-Rater Agreement

• 37,535 images were judged, each by at least two persons

• 70% were labeled public or undecidable by all annotators

• 13% were labeled private by all annotators, 28% by at least one person

• 4,701 private and 27,405 public labels were assigned

• Inter-rater agreement for 100 photos and 36 users: Fleiss' kappa = 0.6
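The agreement statistic named above can be computed with a few lines of code. The following is a minimal sketch of the standard Fleiss' kappa formula, not the authors' implementation. The function name and the input layout (one row per photo, one column per label, each cell holding the number of raters who chose that label) are assumptions made for illustration, and the sketch assumes every photo is rated by the same number of raters.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j
    (e.g. columns = private, public, undecidable). Hypothetical layout.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])

    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    # Undefined (division by zero) if all ratings fall into one category.
    return (p_observed - p_expected) / (1.0 - p_expected)
```

Values near 1 indicate almost perfect agreement and values near 0 indicate agreement no better than chance, so the reported 0.6 sits between the two.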

Features

• Frontal face detection: faces are associated with higher privacy

• Edges: long coherent edges correspond to artificial environments

• Colors: fewer dominant colors correspond to professional photos

• SIFT (Scale-Invariant Feature Transform): object/region detection

• Text: tags, image title

• Brightness, sharpness and profile faces did not show strong discriminative properties

Features: Colors

We determined the most discriminative colors for each class using mutual information.

[Figure: example of a public photo with a few dominant colors and a private photo.]
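Selecting discriminative colors by mutual information can be sketched as follows. This is an illustrative sketch only: the slides do not say how a color is turned into a binary event, so the code assumes a color counts as "present" when it is among an image's dominant colors. The function names and the count layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (present & private, present & public, absent & private, absent & public)
Counts = Tuple[int, int, int, int]


def mutual_information(n11: int, n10: int, n01: int, n00: int) -> float:
    """Mutual information (in bits) between a binary feature and a binary class."""
    n = n11 + n10 + n01 + n00
    present, absent = n11 + n10, n01 + n00
    private, public = n11 + n01, n10 + n00
    mi = 0.0
    for joint, feat_total, class_total in (
        (n11, present, private),
        (n10, present, public),
        (n01, absent, private),
        (n00, absent, public),
    ):
        if joint > 0:  # empty cells contribute nothing (0 * log 0 := 0)
            mi += (joint / n) * math.log2(n * joint / (feat_total * class_total))
    return mi


def rank_features(counts: Dict[str, Counts]) -> List[str]:
    """Return feature names sorted from most to least discriminative."""
    return sorted(counts, key=lambda name: mutual_information(*counts[name]),
                  reverse=True)
```

A color whose presence is independent of the class scores 0 bits, while a color that perfectly separates a balanced set of private and public photos scores 1 bit.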

Features: Edges

[Figure: example of a public photo dominated by incoherent edges and a private photo of a workplace with a mix of coherent and incoherent edges.]

Features: SIFT

Features: Text

Family, Emotions, Sentiment Nature, Inanimate

Classification

• We used the SVM classifier from the SVMlight library

• We converted the edge and color histograms to feature vectors

• For the SIFT and text features, each object or term is a dimension

• We normalized the values in each dimension into the range [0,1] using Platt's sigmoid method
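The feature preparation described above might look roughly like the sketch below. It is an assumption-laden illustration rather than the authors' pipeline: the sigmoid parameters `a` and `b` are hypothetical fixed values (Platt's method normally fits them on labeled data, and the slides do not say how they were chosen), and `to_svmlight_line` simply writes the sparse `label index:value` text format that SVMlight reads.

```python
import math
from typing import Dict


def platt_sigmoid(x: float, a: float = -1.0, b: float = 0.0) -> float:
    """Squash a raw value into [0, 1] with 1 / (1 + exp(a * x + b)).

    a and b are placeholder parameters, not values from the slides.
    """
    z = a * x + b
    if z >= 0:  # numerically stable branch for large positive z
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def to_svmlight_line(label: int, features: Dict[int, float]) -> str:
    """Format one example as '<label> <index>:<value> ...', indices ascending."""
    parts = [f"{idx}:{val:.4f}" for idx, val in sorted(features.items())
             if val != 0.0]
    return " ".join([f"{label:+d}"] + parts)


def normalize_example(raw: Dict[int, float]) -> Dict[int, float]:
    """Apply the sigmoid to every dimension that is present in the example."""
    return {idx: platt_sigmoid(val) for idx, val in raw.items()}
```

In this sketch only the dimensions present in an example are normalized, which keeps the vectors sparse for term and object features.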

Classification

• Labeled images: 4,701 private, 27,405 public

• Balanced set of 4,701 private and 4,701 randomly selected public images

• We used 60% as training data and 40% as test data

• We used precision-recall curves and break-even points (BEP) as quality measures

• We tested visual features, textual features and their combinations
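The evaluation protocol above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the authors' code: the function names, the random seed, and the simple non-stratified shuffle are hypothetical. The break-even point is computed as precision at the rank equal to the number of positive test examples, which is the cutoff where precision and recall coincide.

```python
import random
from typing import Any, List, Sequence, Tuple

Example = Tuple[Any, int]  # (item, label) with 1 = private, 0 = public


def balanced_split(private: Sequence[Any], public: Sequence[Any],
                   train_frac: float = 0.6,
                   seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Downsample the public class to the private class size, then split."""
    rng = random.Random(seed)
    private = list(private)
    sampled_public = rng.sample(list(public), len(private))
    data = [(x, 1) for x in private] + [(x, 0) for x in sampled_public]
    rng.shuffle(data)
    cut = int(round(train_frac * len(data)))
    return data[:cut], data[cut:]


def break_even_point(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Precision (= recall) when as many items are retrieved as are positive."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = sum(label for _, label in ranked[:n_pos])
    return true_pos / n_pos
```

Here `scores` would be the classifier's decision values on the test set; a BEP of 1.0 means every private photo is ranked above every public one.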

Textual Features P/R Curve

The pictures we used for the classification experiments contained good-quality textual metadata (e.g. titles and at least three English tags). Thus the text features could provide a short but concise summary of the image content, resulting in a BEP of 0.78.

Visual Features P/R Curves

• The occurrence of faces in photos is an intuitive indicator for privacy, reflected by a BEP of 0.63 for the face feature

• The edge-direction coherence feature achieves a BEP of 0.65

• SIFT features outperform all of the other visual features (BEP = 0.70)

Feature Combinations P/R Curves

The combination of the visual and textual features leads to a BEP of 0.80, showing that textual and visual features can complement each other in the privacy classification task.

However, classification with visual features alone also produces promising results, and can be useful if no or insufficient textual annotations are available, as is the case for many photos on the web.

Privacy Directed Search

PicAlert!

Conclusion and Future Work

• We applied classification using various visual and textual features

• Classification models were trained on a large-scale dataset with privacy assignments obtained through a social annotation game

• Using only visual features yields usable results and can be applied in scenarios where no textual annotation is available (e.g. personal photo collections or mobile phone pictures)

Future Work:

• Using collaborative filtering for personalization

• Using other features such as Color-SIFT; using context (mobile sensors)

• Larger user studies / annotation games / study of temporal developments

• Integration into popular Web 2.0 applications

PicAlert: http://l3s.de/picalert/

Sergej Zerr, Stefan Siersdorfer, Jonathon Hare

zerr@L3S.de

Data, Features, Search & Diversification, Evaluation

Thank you! Special thanks to ACM SIGIR for providing the travel grant!