Folksonomies Indexing Und Retrieval In Bibliotheken

Post on 01-Dec-2014


Presentation given as part of the "Library and Information Studies" degree programme at Karl-Franzens-Universität Graz.

Transcript of Folksonomies Indexing Und Retrieval In Bibliotheken

1

Folksonomies

Indexing and Retrieval in Web 2.0 and in Libraries

Dr. phil. Isabella Peters

Heinrich-Heine-Universität Düsseldorf

Department of Information Science

University of Graz – 17 December 2009

2

Folksonomies: Indexing without Rules

“Anything goes”

“Against Method”, 1975 (Paul K. Feyerabend, Austro-American philosopher)

Tagging

• no rules

• no methods – or even against methods

• indexing a single document

– synonyms – why not? (New York – NY – Big Apple – … )

– homonyms – never heard! (not: Java [Programming Language] – Java [Island], but Java)

– translations – why not? (Singapore – Singapur – …)

– typing errors – nobody is perfect (Syngapur)

– hierarchical relations (hyponymy) – why not? (Düsseldorf – North Rhine-Westphalia – Germany)

– hierarchical relations (meronymy) – why not? (tree – branch – leaf)

3

Indexing – in general

4

Tri-partite System of Folksonomies

Folksonomies always consist of three parts

1) document (resource)

2) prosumer (user)

3) tag
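One way to picture the three parts is as a collection of (user, tag, document) triples. The following is only a minimal sketch under that assumption; the class name `Tagging`, the helper `index_by` and the sample data are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from typing import NamedTuple


class Tagging(NamedTuple):
    """One tagging action: a user assigns a tag to a document."""
    user: str
    tag: str
    document: str


# Hypothetical sample data, for illustration only.
TAGGINGS = [
    Tagging("anna", "london", "doc1"),
    Tagging("anna", "travel", "doc1"),
    Tagging("ben", "london", "doc1"),
    Tagging("ben", "uk", "doc2"),
]


def index_by(taggings, key, value):
    """Group the `value` field of each triple under its `key` field."""
    index = defaultdict(set)
    for tagging in taggings:
        index[getattr(tagging, key)].add(getattr(tagging, value))
    return dict(index)


tags_by_document = index_by(TAGGINGS, "document", "tag")
users_by_tag = index_by(TAGGINGS, "tag", "user")
```

Any pair of the three parts can be indexed this way, for example documents per user or tags per user.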

5

Users – Tags - Documents

thematically linked

shared users thematically linked

shared documents

6

Shared Documents & Thematically

Linked Users

more like this ...

→ similar documents

detection of documents

more like me ...

→ similar users

detection of communities

thematically linked

shared documents
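A "more like this" lookup could be sketched as follows, assuming that "thematically linked" means "sharing at least one tag". That reading, the function name `more_like_this` and the sample data are assumptions made for illustration.

```python
def more_like_this(doc, tags_by_document):
    """Rank other documents by the number of tags they share with `doc`."""
    own_tags = tags_by_document[doc]
    scored = []
    for other, tags in tags_by_document.items():
        shared = len(own_tags & tags)
        if other != doc and shared > 0:
            scored.append((other, shared))
    # Most shared tags first; ties are broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data.
SAMPLE_TAGS = {
    "d1": {"london", "travel", "uk"},
    "d2": {"london", "uk"},
    "d3": {"travel"},
    "d4": {"java"},
}
similar = more_like_this("d1", SAMPLE_TAGS)
```

The same function works for "shared users" if the sets hold user names instead of tags.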

7

More like me! Or: More like This User!

• starting point: single user (ego)

• processing

– (1) tag-specific similarity

• all tags of ego: a(t)

• all tags of another user B: b(t)

• common tags of ego and another user B: g(t)

– (2) document-specific similarity

• all tagged documents of ego: a(d)

• all tagged documents of another user B: b(d)

• common tagged documents of ego and another user B: g(d)

– calculation of similarity

• tag-specific: Jaccard-Sneath: Sim(tag; Ego,B) = g(t) / [a(t) + b(t) – g(t)]

• document-specific: Jaccard-Sneath: Sim(doc; Ego,B) = g(d) / [a(d) + b(d) – g(d)]

• ranking of the users B_i by similarity to ego (say, top 10 tag-specific and top 10 document-specific users)

• merging of both lists (exclusion of duplicates)

• cluster analysis (k-nearest neighbours, single linkage, complete linkage, group average linkage)

– result presentation: social network of ego in the centre
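The similarity, ranking and merging steps listed on this slide can be sketched roughly as below. It is a minimal illustration rather than the author's implementation: the names `jaccard` and `more_like_me` and the sample profiles are hypothetical, and the cluster analysis step is left out.

```python
def jaccard(a, b):
    """Jaccard-Sneath coefficient g / (a + b - g) for two sets."""
    g = len(a & b)
    denominator = len(a) + len(b) - g
    return g / denominator if denominator else 0.0


def more_like_me(ego, tags_by_user, docs_by_user, top_n=10):
    """Merge the top-n tag-specific and top-n document-specific neighbours."""
    others = [user for user in tags_by_user if user != ego]

    def top(profile):
        # Highest similarity first; ties are broken alphabetically.
        ranked = sorted(
            others,
            key=lambda u: (-jaccard(profile[ego], profile.get(u, set())), u),
        )
        return ranked[:top_n]

    merged = []
    for user in top(tags_by_user) + top(docs_by_user):
        if user not in merged:  # exclusion of duplicates
            merged.append(user)
    return merged


# Hypothetical sample profiles.
TAGS_BY_USER = {"ego": {"a", "b", "c"}, "u1": {"a", "b"}, "u2": {"c", "d"}, "u3": {"x"}}
DOCS_BY_USER = {"ego": {"d1", "d2"}, "u1": {"d9"}, "u2": {"d1", "d2"}, "u3": {"d1"}}
neighbours = more_like_me("ego", TAGS_BY_USER, DOCS_BY_USER, top_n=2)
```

With these profiles the tag-specific list is u1, u2 and the document-specific list is u2, u3, so the merged list holds three users.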

8

More like me! Or: More like This User!

single linkage clustering (fictitious example)

[Figure: network of users; edge labels as Sim(tag) / Sim(doc) pairs]

0.21 / 0.25 – 0.65 / 0.55 – 0.33 / 0.29 – 0.17 / 0.23 – 0.08 / 0.11 – 0.15 / 0.17 – 0.45 / 0.36
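Single linkage clustering on such similarity values might look like the sketch below. The transcript does not preserve which pair of users each value belongs to, so the pair assignments, the user names, the threshold and the function name `single_linkage` are all hypothetical.

```python
def single_linkage(users, sim, threshold):
    """Agglomerative clustering: merge the two clusters whose most similar
    pair of members has the highest similarity, until no link reaches
    `threshold`."""
    clusters = [{user} for user in users]

    def link(c1, c2):
        # Single linkage: similarity of the closest pair of members.
        return max(sim.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)

    while len(clusters) > 1:
        candidates = [
            (link(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        ]
        best, i, j = max(candidates)
        if best < threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical pairwise similarities (unlisted pairs count as 0.0).
SIM = {
    frozenset(("ego", "A")): 0.65,
    frozenset(("ego", "B")): 0.45,
    frozenset(("A", "C")): 0.33,
    frozenset(("C", "D")): 0.08,
}
clusters = single_linkage(["ego", "A", "B", "C", "D"], SIM, threshold=0.3)
```

Because only the closest pair counts, user C joins ego's cluster through A even though C has no direct link to ego.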

9

Narrow Folksonomies

• only one tagger (the content creator)

• no multiple tagging

• example: YouTube

Tags

10

Extended Narrow Folksonomies

• more than one tagger

• no multiple tagging

• example: Flickr

Source: Vander Wal (2005)

Tags

Add Tags Option

11

Broad Folksonomies

• more than one tagger

• multiple tagging

• example: Delicious

Source: Vander Wal (2005)

Tags
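The "multiple tagging" criterion can be illustrated with a small data-structure sketch: without multiple tagging each tag is recorded at most once per document, with multiple tagging every assignment is counted. The function name `add_tag` and the sample tags are hypothetical, and the question of who is allowed to tag is not modelled here.

```python
from collections import Counter


def add_tag(store, tag, multiple_tagging):
    """Record one tag assignment for a single document."""
    if multiple_tagging or tag not in store:
        store[tag] += 1
    return store


ASSIGNMENTS = ["london", "travel", "london"]  # hypothetical tagging actions

broad = Counter()   # multiple tagging allowed
narrow = Counter()  # each tag recorded once only
for tag in ASSIGNMENTS:
    add_tag(broad, tag, multiple_tagging=True)
    add_tag(narrow, tag, multiple_tagging=False)
```

Only the store with multiple tagging yields per-document tag frequencies, and with them a tag distribution.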

12

Folksonomies make use of

Collective Intelligence

Collective Intelligence

• “Wisdom of the Crowds” (Surowiecki)

• “Hive Minds” (Kroski) – “Vox populi” (Galton) – “Crowdsourcing”

• no discussions, diversity of opinions, decentralisation

• users tag a document independently from each other

• statistical aggregation of data

Collaborative Intelligence

• discussions and consensus

• prototype service: Wikipedia (but: 90-9-1 rule)

“Madness of the Crowds”

• e.g., soccer fans – hooligans

• no diversity of opinion – no independence – no decentralisation – no (statistical) aggregation

13

Power Tags

• Power Law Distribution

• Inverse-logistic Distribution

[Figure: both distribution curves, each with the Power Tags marked]

14

Power Law Tag Distribution

Source: http://del.icio.us

[Figure: tags for www.visitlondon.com; x-axis: Tags (London, Travel, UK, England, Tourism, Guide, Culture, Information, Entertainment, Holiday, Londres, Londra); y-axis: Users (0–70)]

f(x) = C / x^a

Figure annotations: 80/20-Rule, Power Tags, Long Tail
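A rough sketch of the power law and of one possible way to pick Power Tags from a tag distribution follows. The slide does not state a selection rule, so the cumulative-share cut-off used here, the names `power_law` and `power_tags`, and the tag counts are assumptions.

```python
def power_law(rank, c, a):
    """f(x) = C / x^a for rank x = 1, 2, 3, ..."""
    return c / rank ** a


def power_tags(tag_counts, share=0.8):
    """Return the top-ranked tags that together cover at least `share`
    of all tag assignments (hypothetical selection rule)."""
    total = sum(tag_counts.values())
    ranked = sorted(tag_counts.items(), key=lambda item: (-item[1], item[0]))
    selected, covered = [], 0
    for tag, count in ranked:
        if covered >= share * total:
            break
        selected.append(tag)
        covered += count
    return selected


# Hypothetical counts for one document (100 assignments in total).
COUNTS = {
    "london": 60, "travel": 22, "uk": 6, "england": 4, "tourism": 3,
    "guide": 2, "culture": 1, "holiday": 1, "londres": 1,
}
top_tags = power_tags(COUNTS)
```

Here two of the nine tags already account for 82 of the 100 assignments, and the remaining tags form the long tail.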

15

Inverse-logistic Tag Distribution

Source: http://del.icio.us

[Figure: tags for www.asis.org; x-axis: Tags (Associations, Library, Information, Information science, IA, Technology, Professional, Research, Usability, Science, Libraries, Web, Information architecture, IT, Organizations, Architecture, Organzation, Computers, Conference, Information_architecture, Information_science, Society); y-axis: Users (0–35)]

f(x) = e^{-C'(x-1)^b}

Figure annotations: Long Trunk, Power Tags, Long Tail
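Evaluating the inverse-logistic function for a few ranks shows where the "long trunk" comes from. This is a sketch only: the parameter values `c=0.001` and `b=3` are hypothetical and chosen to make the shape visible, not taken from the slides.

```python
import math


def inverse_logistic(rank, c, b):
    """f(x) = e^(-C' * (x - 1)^b) for rank x = 1, 2, 3, ..."""
    return math.exp(-c * (rank - 1) ** b)


# Hypothetical parameters, for illustration only.
values = {rank: inverse_logistic(rank, c=0.001, b=3) for rank in (1, 5, 20)}
```

Since f(1) = 1, the values can be read as frequencies relative to the top-ranked tag. With these parameters the first few ranks stay close to 1 (the trunk) before the curve drops into the tail.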

16

Use of Power Tags

• Power Tags as factor in relevance ranking → documents tagged with Power Tags appear higher in the ranking

• Power Tags as candidate tags for Tag Gardening → which (semantic) relations do they have with co-occurring tags?
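The first use could be sketched as a simple boost at query time. The scoring scheme (a base score of 1.0 plus a fixed boost), the function name `rank_documents` and the sample data are hypothetical.

```python
def rank_documents(query_tag, tags_by_document, power_tags_by_document, boost=1.0):
    """Rank documents carrying `query_tag`; documents for which the query
    tag is a Power Tag receive an extra `boost`."""
    scored = []
    for doc, tags in tags_by_document.items():
        if query_tag not in tags:
            continue
        is_power_tag = query_tag in power_tags_by_document.get(doc, set())
        scored.append((doc, 1.0 + (boost if is_power_tag else 0.0)))
    # Highest score first; ties are broken alphabetically.
    return [doc for doc, _ in sorted(scored, key=lambda p: (-p[1], p[0]))]


# Hypothetical sample data.
DOC_TAGS = {"d1": {"london", "holiday"}, "d2": {"london", "travel"}, "d3": {"java"}}
DOC_POWER_TAGS = {"d1": {"holiday"}, "d2": {"london"}}
hits = rank_documents("london", DOC_TAGS, DOC_POWER_TAGS)
```

Both d1 and d2 carry the tag "london", but only for d2 is it a Power Tag, so d2 is listed first.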

17

Benefits of Indexing with Folksonomies

• authentic user language – solution of the “vocabulary problem”

• currency (up-to-dateness)

• multiple interpretations – many perspectives – bridging the semantic gap

• increase access to information resources

• follow “desire lines” of users

• cheap indexing method – shared indexing

• the more taggers, the better the system becomes – network effects

• capable of indexing mass information on the Web

• resources for development of knowledge organization systems

• mass quality “control”

• searching – browsing – serendipity

• neologisms

• identify communities and “small worlds”

• collaborative recommender system

• make people sensitive to information indexing

18

Disadvantages of Indexing with

Folksonomies

• absence of controlled vocabulary

• different basic levels (in the sense of Eleanor Rosch)

• different interests – loss of context information

• language merging

• hidden paradigmatic relations

• merging of formal (bibliographical) and aboutness tags

• no specific fields

• evaluative tags (“stupid”)

• spam-tags

• syncategoremata (user-specific tags, “me”)

• performative tags (“to do”, “to read”)

• other misleading keywords

→ solution: Tag Gardening with methods of Information Linguistics, user collaboration in giving meaning to tags and combination with existing knowledge organization systems
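A very small gardening pass might normalise spelling variants, map synonyms and translations to one preferred form, and drop user-specific and performative tags. The synonym table, the stop list and the function name `garden` below are hypothetical; a real system would draw them from linguistic tools or an existing knowledge organization system.

```python
# Hypothetical mapping of synonyms and translations to a preferred tag.
SYNONYMS = {"ny": "new york", "big apple": "new york", "singapur": "singapore"}
# Hypothetical stop list of user-specific and performative tags.
STOP_TAGS = {"me", "to do", "to read"}


def garden(tags):
    """Normalise a list of raw tags and remove duplicates and stop tags."""
    cleaned = []
    for tag in tags:
        # Lower-case, treat underscores as spaces, collapse whitespace.
        normalised = " ".join(tag.lower().replace("_", " ").split())
        normalised = SYNONYMS.get(normalised, normalised)
        if normalised not in STOP_TAGS and normalised not in cleaned:
            cleaned.append(normalised)
    return cleaned


gardened = garden(["NY", "Big Apple", "New_York", "to read", "Singapur"])
```

Typing errors such as "Syngapur" would need fuzzy matching, which this sketch does not attempt.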

19

Goal of Tag Gardening: Emergent Semantics

Source: Peters, I., & Weller, K. (2008). Tag Gardening for Folksonomy Enrichment and Maintenance. Webology, 5(3), Article 58, from http://www.webology.ir/2008/v5n3/a58.html.

20

Maintenance of KOS and Folksonomy

Folksonomy KOS

Tag Gardening

new terms – new relations

Source: Christiaens, S. (2006). Metadata Mechanism: From Ontology to Folksonomy…and Back. Lecture Notes in Computer Science, 4277, 199–207.

21

Feedback Loop in Practice:

Tagging of OPACs

2 possibilities:

• 1) tagging of resources within the library’s website

• 2) tagging of resources outside the library’s firewall

22

Tagging of OPACs: Within Library’s Website: PennTags

http://tags.library.upenn.edu/

23

Tagging of OPACs: Within Library’s Website: Ann Arbor District Library

http://www.aadl.org/catalog

24

Tagging of OPACs: Within Library’s Website: University Library Hildesheim

http://www.uni-hildesheim.de/mybib/all_tags

25

Tagging of OPACs: Within Library’s Website

• advantages:

– user behaviour can be directly observed and exploited for the library’s own applications

– the knowledge organization system (KOS) in use can profit from user behaviour and user language

– users will be “attracted” to the library

– library will appear “trendy”

26

Tagging of OPACs: Within Library’s Website

• disadvantages:

– development and implementation (costs and manpower) of the tagging service have to be borne by the library

– if only users may tag: librarians may lose their work motivation or may have a feeling of uselessness

– “lock-in” effect of users → no “fresh” ideas

27

Tagging of Resources Outside the Library’s Firewall: LibraryThing

http://www.librarything.com/search

28

Tagging of Resources Outside the Library’s Firewall: BibSonomy

http://www.bibsonomy.org/

29

Tagging of Resources Outside the Library’s Firewall

• advantages:

– development and implementation (costs and manpower) of the tagging service do not have to be borne by the library

– the library may profit from the know-how of the provider of the tagging system

– users may profit from tagging activities of hundreds of other users → no lock-in

– library appears “trendy”

30

Tagging of Resources Outside the Library’s Firewall

• disadvantages

– user behaviour cannot be observed or exploited

– your users support another tagging service

– the KOS in use cannot profit from user behaviour

31

Excursus: Sentiment Tags

• negative tags: “awful” – “foolish”, …

• positive tags: “amazing” – “useful”, …

• applicable for sentiment analysis of documents
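A tag-based sentiment score could be sketched as a simple lexicon lookup. The two word lists contain only the examples given on this slide; the scoring rule (positive minus negative tag assignments) and the function name `sentiment_score` are assumptions.

```python
POSITIVE_TAGS = {"amazing", "useful"}   # examples from the slide
NEGATIVE_TAGS = {"awful", "foolish"}    # examples from the slide


def sentiment_score(tags):
    """Count positive tag assignments minus negative ones for a document."""
    score = 0
    for tag in tags:
        normalised = tag.lower()
        if normalised in POSITIVE_TAGS:
            score += 1
        elif normalised in NEGATIVE_TAGS:
            score -= 1
    return score


score = sentiment_score(["Amazing", "useful", "awful", "python"])
```

Tags outside both lists, such as "python" here, do not affect the score.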

Source: Yanbe, Y., Jatowt, A., Nakamura, S., & Tanaka, K. (2007). Can Social Bookmarking Enhance Search in the Web? In Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, Vancouver, Canada (pp. 107–116).

32

Summary

• knowing how folksonomies work is important for their

adequate application in both

– knowledge representation and

– information retrieval

• knowing why folksonomies work is a secret ☺

33

Knowledge Representation and

Information Retrieval

• two sides of the same coin

• Immanuel Kant: Thoughts without content are

empty, intuitions without concepts are blind...

Knowledge Representation without Information Retrieval is empty.

Information Retrieval without Knowledge Representation is blind.

Feedback Loop

34

Folksonomies and

Knowledge Organization Systems

• two sides of the same coin

• no rivals - work best in combination!

Folksonomies: flexible, up-to-date, user-centric

Knowledge Organization Systems: precise, rigid, complete

Feedback Loop

35

Best regards from Düsseldorf.

Contact: isabella.peters@uni-duesseldorf.de

Published in 2009 by Saur, de Gruyter