Dirk Pieper/Friedrich Summann Bielefeld UL
![Page 1: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/1.jpg)
BASE: Institutional Repositories
Bielefeld Academic Search Engine (BASE):
an End-user Oriented Institutional Repository Search Service
Dirk Pieper/Friedrich Summann
Bielefeld UL
![Page 2: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/2.jpg)
Part 1:
Institutional Repository Servers
BASE: concept and content
Creating a special view on institutional repository server collections
Demo: BASE user-interface and further visions
Part 2:
OAI dataflow, BASE dataflow
Repository information in registries
OAI harvesting problems
Further developments of BASE
Overview:
![Page 3: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/3.jpg)
Definition: “A digital collection capturing and preserving the intellectual output of a single or multi-university community.” (Raym Crow, http://www.arl.org.sparc/IR/ir.html)
IR servers also exist, of course, outside the university community
IR servers appear as simple web sites, database systems with OAI interface, …
Institutional Repository Servers:
![Page 4: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/4.jpg)
BASE uses Fast Data Search
BASE contains intellectually selected resources with a focus on OAI servers, but also web-crawled content
BASE displays result lists as bibliographic data and full-text hits
The BASE frontend is written in PHP using the search API from Fast Data Search
BASE offers sorting, search refinement and search history
BASE: concept and content
![Page 5: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/5.jpg)
[Diagram labels: Search API; Query & Result Processing; Document Processing; Pipeline; File Traverser; Web Crawler; Connectors; Filter; Search; Index Files; Tuning, Administration and Debugging]
BASE: concept and content
![Page 6: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/6.jpg)
BASE: concept and content
At present 2.7 million documents in 189 collections, 15 of them web-crawled data
![Page 7: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/7.jpg)
Projekt Gutenberg-DE
Internet Library of Early Journals Oxford
Various Institutional Repositories
Springer Link Metadata
Cornell HistMath Fulltext Crawl
University Michigan Historical Math
CiteSeer
Zentralblatt Mathematik
Bielefeld Univ: Math. Preprints
ArXiv
OPAC UL Bielefeld
Ifo Institute Munich
Zeitschriften der Aufklärung (Bielefeld UL)
BASE: concept and content
![Page 8: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/8.jpg)
Special view on IR server collections
Collections are listed in a configuration file:
[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_de = "The Univ. of Birmingham: Eprints Archive"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_de = "Birmingham Univ."
descdd_en = "Birmingham Univ."
Collections can be clustered for the user interface, e.g. “Institutional Repositories Europe” consists of [ftubarcelona], [ftubath], [ftubristol], [ftuhelsinki], …
Parametric search possible
The frontend is ready for multiple views (independent views with their own configuration and layouts on the same backend)
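
A minimal sketch of how such a collection file and a cluster could be read, assuming the file follows plain INI conventions with quoted values. The second section, the `CLUSTERS` mapping, and the function names are hypothetical and only serve the illustration; the BASE frontend itself is written in PHP and may handle this differently.

```python
import configparser

# Sample in the style of the slide. The [ftubath] values are made up.
SAMPLE = '''
[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_en = "Birmingham Univ."

[ftubath]
url = "http://example.org/bath/"
desc_en = "Hypothetical entry for illustration"
descdd_en = "Bath Univ."
'''

# Hypothetical cluster definition: cluster name -> collection section names.
CLUSTERS = {"Institutional Repositories Europe": ["ftubirmingham", "ftubath"]}


def load_collections(text):
    """Parse the INI text into {section: {key: unquoted value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        section: {key: value.strip('"') for key, value in parser[section].items()}
        for section in parser.sections()
    }


def cluster_labels(collections, clusters, name, lang="en"):
    """Return the short drop-down labels (descdd_*) for one cluster."""
    key = f"descdd_{lang}"
    return [collections[c][key] for c in clusters[name] if c in collections]


if __name__ == "__main__":
    cols = load_collections(SAMPLE)
    print(cluster_labels(cols, CLUSTERS, "Institutional Repositories Europe"))
```

Keeping clusters as a separate mapping means one collection can appear in several views without duplicating its section.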
![Page 9: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/9.jpg)
Try your search on Google Scholar ...
Vision: search in Google Scholar
![Page 10: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/10.jpg)
Check citations (citing articles) in Google Scholar ...
Vision: check citations in Google Scholar
![Page 11: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/11.jpg)
[Diagram labels: OAI data harvesting; BASE internal index (FAST); OPAC; article database; dissertations, monographs (full text); articles (full text); PubMed, Euclid, ArXiv, CiteSeer, Citebase, DOAJ articles; all resources (texts, images, video, references, ...)]
OAI dataflow at Bielefeld UL
![Page 12: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/12.jpg)
OAI-Data, Web Pages, Database Records
Harvesting, Pre-Processing
Processing
Internal Index (FAST)
User interface (PHP)
BASE dataflow
![Page 13: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/13.jpg)
Eprints Registry (607)
Openarchives.org (383)
DSpace Registry (28)
Directory of Open Archive Repositories (324)
Univ. of Illinois Registry (1000)
Repository information in registries
![Page 14: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/14.jpg)
[Map: repository counts per country]
USA 76
Canada 13
South America 2
Africa 2
India 3
Australia 11
New Zealand 1
OAI-compliant univ. repositories in BASE
![Page 15: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/15.jpg)
OAI Registry Watcher (Bielefeld UL, Perl)
Open Source Harvester (FS Consulting, Perl with modifications)
XML Validator and Repairer (Bielefeld UL, based on Perl XML modules)
OAI Harvest Watcher (Bielefeld UL, Perl)
OAI Resource Updater (Bielefeld UL, Perl)
Tools for the Harvesting Environment
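
To make the harvesting step concrete, here is a rough sketch of one OAI-PMH `ListRecords` round: build the request URL, parse the response, and pick up the `resumptionToken` for the next request. It is written in Python for illustration only; the tools named above are Perl programs, and this is not their code. The function names and the sample response are assumptions; the `verb`, `metadataPrefix` and `resumptionToken` arguments and the namespaces come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text or "" for e in rec.iterfind(".//dc:title", NS)],
            "creators": [e.text or "" for e in rec.iterfind(".//dc:creator", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An empty or missing token means the list is complete.
    return records, (token.strip() or None)


# Made-up response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    records, token = parse_list_records(SAMPLE_RESPONSE)
    print(records, token)
    print(list_records_url("http://example.org/oai", token))
```

A full harvester would loop until the token is `None` and would also need retry and error handling, which this sketch leaves out.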
![Page 16: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/16.jpg)
Repositories do not respond or deliver error messages
Data contain only references without any full text
Links to the document do not work
Access to fulltext is restricted
XML file is not well-formed
Field content varies
OAI harvesting challenges
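
Two of these challenges (error responses and XML that is not well-formed) can be detected mechanically. The following is a small sketch in Python, not the validator used at Bielefeld UL; the function name and the three status labels are assumptions. The `<error code="...">` element is how OAI-PMH 2.0 reports protocol errors.

```python
import xml.etree.ElementTree as ET

OAI_ERROR_TAG = "{http://www.openarchives.org/OAI/2.0/}error"


def check_response(xml_text):
    """Classify one harvested response.

    Returns a (status, detail) pair where status is one of
    'ok', 'oai-error' or 'not-well-formed'.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The parser message includes the line and column of the problem.
        return "not-well-formed", str(exc)
    error = root.find(OAI_ERROR_TAG)
    if error is not None:
        return "oai-error", error.get("code", "")
    return "ok", ""


if __name__ == "__main__":
    print(check_response("<a><b></a>"))
    print(check_response(
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        '<error code="badVerb">Illegal verb</error></OAI-PMH>'))
```

Broken links and restricted full text cannot be found this way; they need a separate check against the document URLs.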
![Page 17: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/17.jpg)
<source>http://xxx.xxx.uni-xxxxx.de/publications/ELibD905_diplom_allnoch.pdf</source>
<dc:creator>Barry Wellman,Jeffrey Boase,Kakuko Miyata</dc:creator>
<dc:subject>Barry Wellman,Jeffrey Boase,Kakuko Miyata The Mobile-izing ....</dc:subject>
<dc:title>Talk P. Bruzzone</dc:title>
<dc:creator>Bruzzone </dc:creator>
<dc:creator>Pierluigi</dc:creator>
Reproductive Biology and Endocrinology 2004, 2:52 doi:10.1186/1477-7827-2-52
<dc:date>2004-07-05</dc:date>
<dc:type>Review </dc:type>
<dc:identifier>http://www.rbej.com/content/2/1/52</dc:identifier>
OAI Harvesting: Problems in Practice 1
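
One of the patterns above, several authors packed into a single `<dc:creator>`, can be partly repaired with a heuristic. The sketch below is an assumption about how one might do it, not a description of the BASE pre-processing. It splits on commas only when every part looks like a full name (contains a space), so that the common "Surname, Forename" form is left alone.

```python
def split_creators(value):
    """Split a <dc:creator> value that appears to hold several full names.

    Heuristic only: 'Doe, Jane' is kept as one name, while
    'Jane Doe,John Roe' is split. A value such as 'van Dam, Jan Peter'
    would be split wrongly, so results should be reviewed.
    """
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if len(parts) > 1 and all(" " in p for p in parts):
        return parts
    return [value.strip()]


if __name__ == "__main__":
    print(split_creators("Barry Wellman,Jeffrey Boase,Kakuko Miyata"))
    print(split_creators("Doe, Jane"))
```

The opposite problem on the slide, one name spread over two `<dc:creator>` elements, cannot be detected reliably from the field alone.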
![Page 18: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/18.jpg)
EN: 9910
ENG: 771
En: 566
Eng: 1
English: 24084
English (United States): 63
English and Greek: 1
English and Russian: 1
English/Japanese: 1
English; Russian: 1
English=en: 1
Translation into English: 2
en: 1279115
en-CA: 865
en-US: 3
en-es: 5
en-us: 8
en;: 2
en_UK: 618
en_US: 18456
eng: 186787
eng : 92
eng + dut: 2
eng;: 17
eng; fre; ger;: 141 ....
OAI Harvesting: Problems in Practice 2 - Variations of <dc:language>
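
A sketch of how such variants might be folded into a small set of codes. The `KNOWN` table and the function name are assumptions for illustration, and the table covers only the spellings visible on this slide; it does not describe the normalization BASE actually applies.

```python
import re

# Assumed mapping from observed spellings to ISO 639-1 codes; extend as needed.
KNOWN = {
    "en": "en", "eng": "en", "english": "en",
    "fre": "fr", "ger": "de", "dut": "nl",
    "russian": "ru", "greek": "el", "japanese": "ja",
}


def normalize_language(value):
    """Return a sorted list of ISO 639-1 codes found in a raw <dc:language> value."""
    codes = set()
    # Split on the separators seen above: ';', '/', '+', '=', ',' and the word 'and'.
    for part in re.split(r"[;/+=,]|\band\b", value.lower()):
        # Drop a region suffix such as '-us', '_uk' or '(united states)'.
        # Caveat: this reads 'en-es' as English with a region, which may be wrong.
        token = re.sub(r"[-_(].*$", "", part.strip()).strip()
        if token in KNOWN:
            codes.add(KNOWN[token])
    # Unrecognized values (e.g. free text) yield an empty list for manual review.
    return sorted(codes)


if __name__ == "__main__":
    for raw in ["EN", "en_US", "English (United States)", "eng; fre; ger;"]:
        print(repr(raw), normalize_language(raw))
```

Returning a list rather than a single code keeps multilingual records such as "English and Greek" intact.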
![Page 19: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/19.jpg)
Standard repository software is great - for OAI harvesting as well
Small collections – small problems
Getting the related fulltext is complicated
Libraries produce better metadata
Data aggregation may produce problems
Writing e-mails helps - sometimes
Some Rules from Harvesting Practice
![Page 20: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/20.jpg)
Search form (working)
HTTP calls (working)
Web Service (in development)
Federated Search (Vascoda) (in discussion)
Further Developments: BASE Interfaces
![Page 21: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/21.jpg)
<form action="http://www.base-search.net/index.php" method="post" accept-charset="UTF-8">
  <input maxlength="512" name="q" type="text" size="50" />
  <input value="Search!" type="submit" />
  <input value="all" name="s" type="hidden" />
</form>
Local Integration: Search Form
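
The same request can be prepared from a script. The sketch below only builds the POST request that the form would submit, using the field names `q` and `s` and the action URL from the form; it does not send anything. The function name is an assumption, and whether the endpoint still accepts these parameters has not been verified.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_SEARCH_URL = "http://www.base-search.net/index.php"  # the form's action


def build_search_request(query):
    """Build (but do not send) the POST request the search form would submit."""
    # 'q' carries the query (the form limits it to 512 characters);
    # 's=all' mirrors the hidden input. The submit button has no name,
    # so a browser would not send it either.
    body = urlencode({"q": query[:512], "s": "all"}).encode("utf-8")
    return Request(
        BASE_SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request("open access")
    print(req.get_method(), req.full_url, req.data)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual search.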
![Page 22: Dirk Pieper/Friedrich Summann Bielefeld UL](https://reader036.fdokument.com/reader036/viewer/2022062802/56814495550346895db133d7/html5/thumbnails/22.jpg)
Thank you!