Mass Storage at GridKa

Dr. Doris Ressmann
Forschungszentrum Karlsruhe GmbH, Institute for Scientific Computing
P.O. Box 3640, D-76021 Karlsruhe, Germany
http://www.gridka.de

GridKa, January 2005
Forschungszentrum Karlsruhe in der Helmholtz-Gemeinschaft

Introduction

• Overview

• What is dCache?

• Pool Selection mechanism

• dCache properties

• LCG connection

• Access to dCache – connection to CERN

• Tape Management

• Conclusion

Service Challenge

[Diagram labels: disks, SAN, gridftp, 10 Gbit]

Mass Storage Environment

[Diagram labels: SAN, NFS, xrootd, dCache, gridftp, 10 Gbit, tape library, disks]

What is dCache?

• Developed at DESY and FNAL

• Disk pool management with or without tape backend

• Data may be distributed among a large number of disk servers

• Automatic load balancing by cost metric and inter-pool transfers

• Data removed only if space is needed

• Fine grained configuration of pool attraction scheme

Pool Selection Mechanism

• Pool selection is required for the following transfers:

– Client → dCache

– Tape → dCache

– dCache → dCache

– dCache → Client

• Pool selection is done in 2 steps:

– Query the configuration database:

→ which pools are allowed for the requested operation (internal/external)

– Query the allowed pools for their vital functions:

→ find the pool with the lowest cost for the requested operation
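
The two-step selection can be illustrated with a minimal Python sketch. It is not dCache code: the `Pool` class, the `allowed` set standing in for the configuration database, and the numeric `cost` values are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    allowed: frozenset  # hypothetical config: (operation, origin) pairs this pool may serve
    cost: float         # hypothetical cost reported by the pool (lower is better)


def select_pool(pools, operation, origin):
    """Two-step selection: filter by configuration, then take the cheapest pool."""
    # Step 1: which pools are allowed for the requested operation?
    candidates = [p for p in pools if (operation, origin) in p.allowed]
    if not candidates:
        raise LookupError(f"no pool allowed for {operation}/{origin}")
    # Step 2: among the allowed pools, pick the one with the lowest cost.
    return min(candidates, key=lambda p: p.cost)


# Made-up pool configuration for illustration only.
pools = [
    Pool("pool1", frozenset({("read", "internal"), ("write", "internal")}), 0.7),
    Pool("pool2", frozenset({("read", "internal"), ("read", "external")}), 0.2),
    Pool("pool3", frozenset({("write", "external")}), 0.1),
]

print(select_pool(pools, "read", "internal").name)
```

Note that pool3 has the lowest cost overall but is never chosen for an internal read, because the configuration step excludes it before any cost is compared.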

LCG Storage Element

• DESY dCap library is integrated with the CERN GFAL library

• SRM version ~ 1.1 supported

• gsiFtp supported

Multiple access of one file

[Diagram: Pool 1, Pool 2 and Pool 3; File 1 appears twice]

Multiple access of one file

[Diagram: Pool 1, Pool 2 and Pool 3; File 1 appears four times]

Access to dCache

• Mountpoint

– ls

– mv

– rm

• dCap

– dccp <source> <destination>

– dc_open(...)

– dc_read(...)

• Preload library

• Gridftp

– Problematic when the file needs to be staged first

• SRMCP

[Slide labels: Internal, External]

dCache environment

[Diagram: internal nodes, head node, pools and tape library, connected by file transfers]

dCache environment

[Diagram: internal nodes, head node, gsiftp, srm, pools and tape library, connected by file transfers]

dCache environment

[Diagram: internal nodes, head node, gsiftp, srm, pools and tape library, connected by file transfers, with an additional gsiftp file transfer]

dCache environment

[Diagram: internal nodes, head node, srm, srmcp, pools and tape library, connected by file transfers]

PNFS: Perfectly Normal File System

• gdbm databases

• Experiment specific databases

• Independent access

• Content of metadata:

– User file name

– File name within dCache

– Information about the tape location (storage class…)

– Pool name where the file is located

[Diagram: the pnfs database for file names maps user file names to PNFS IDs (long hexadecimal strings); the metadata records the pool and tape location of the real data]
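
The metadata listed above can be pictured as one record per file. The following Python sketch is only an illustration of that idea, not the PNFS implementation: the `FileMetadata` and `Namespace` classes, the `ID-0001` identifier and the `example:raw` storage class are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileMetadata:
    user_name: str              # file name as the user sees it
    internal_id: str            # file name within dCache (hypothetical ID format)
    storage_class: str          # information about the tape location
    pool: Optional[str] = None  # pool currently holding the file, if any


class Namespace:
    """Toy stand-in for the file name database: holds metadata only, no file data."""

    def __init__(self):
        self._by_name = {}

    def register(self, meta):
        self._by_name[meta.user_name] = meta

    def locate(self, user_name):
        """Return the internal ID and where the real data can be found."""
        meta = self._by_name[user_name]
        where = meta.pool if meta.pool else f"tape ({meta.storage_class})"
        return meta.internal_id, where


ns = Namespace()
ns.register(FileMetadata("/pnfs/gridka.de/data/ressmann/file2",
                         "ID-0001", "example:raw", pool="pool2"))
print(ns.locate("/pnfs/gridka.de/data/ressmann/file2"))
```

The point of the split is that the namespace answers "where is this file" without ever touching the data itself, which stays in a pool or on tape.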

gsiftp

• Only registered dCache users!

grid-proxy-init
globus-url-copy -dbg \
  file:///tmp/file1 \
  gsiftp://srm1.fzk.de/grid/fzk.de/mounts/pnfs/cms/file1

• dCache gridftp client and server in Java

• Copy directly into an available pool node

• Pool: data is precious (can't be deleted)

• Flush to tape

• Data is cached (can be deleted from pool)
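
Reading the last three points as a sequence (which is an interpretation of the slide), the life cycle of a file in a pool can be sketched as a two-state model. The class and method names below are hypothetical and do not correspond to dCache internals.

```python
from enum import Enum


class State(Enum):
    PRECIOUS = "precious"  # only copy is in the pool, must not be deleted
    CACHED = "cached"      # a tape copy exists, the pool copy may be deleted


class PoolFile:
    def __init__(self, name):
        self.name = name
        self.state = State.PRECIOUS  # newly written data starts out precious

    def flush_to_tape(self):
        # After a successful flush the pool copy is merely a cache.
        self.state = State.CACHED

    def can_delete(self):
        return self.state is State.CACHED


f = PoolFile("file1")
print(f.state.value, f.can_delete())
f.flush_to_tape()
print(f.state.value, f.can_delete())
```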

srmcp

• Only registered dCache users!

grid-proxy-init

srmcp -debug=true \
  srm://srm.web.cern.ch:80//castor/cern.ch/grid/dteam/castorfile \
  srm://srm1.fzk.de:8443//pnfs/gridka.de/data/ressmann/file2

srmcp -debug=true \
  srm://srm1.fzk.de:8443//pnfs/gridka.de/data/ressmann/file2 \
  file:////tmp/file2
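
To make the structure of these URLs easier to see, the short Python sketch below splits one of them into endpoint and path using only the standard library. The `describe` helper is a name invented for this example; it does not contact any server.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a transfer URL into scheme, host, port and path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


info = describe("srm://srm1.fzk.de:8443//pnfs/gridka.de/data/ressmann/file2")
for key, value in info.items():
    print(f"{key}: {value}")
```

The double slash after the port is kept as part of the path, so the SRM endpoint (host and port) and the file path on the storage system remain clearly separated.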

Firewall issues

• Connection to head node: ports 8443 (SRM) and 2811 (gridftp)

• Port range to pool nodes: 20000 to 50000
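
As a small illustration, the rules above can be written as a check that a site administrator might use when reviewing firewall settings. This is a sketch only: the function name and the node type labels are hypothetical, and treating both ends of the pool port range as inclusive is an assumption.

```python
HEAD_NODE_PORTS = {8443, 2811}         # ports named on the slide for the head node
POOL_PORT_RANGE = range(20000, 50001)  # assumption: both 20000 and 50000 are included


def is_allowed(node_type, port):
    """Return True if an incoming connection to this node type and port should pass."""
    if node_type == "head":
        return port in HEAD_NODE_PORTS
    if node_type == "pool":
        return port in POOL_PORT_RANGE
    return False


for node_type, port in [("head", 8443), ("head", 22), ("pool", 20000), ("pool", 50001)]:
    print(node_type, port, is_allowed(node_type, port))
```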

SRM Disk Version

• FNAL is currently developing a standalone SRM Disk version.

• The client uses a Java version of gridftp

• The server uses a standard Globus gridftp

• It is far from production ready and needs:

– SQL database

– JDBC driver

• http://www-isd.fnal.gov/srm/unix-fs-srm/

Tape Management

• Tivoli Storage Manager (TSM) library management

• TSM is not developed for archive

– Interruption of TSM archive

– No control over what has been archived

dCache tape access

• Convenient HSM connectivity (done for Enstore, OSM, TSM, bad for HPSS)

• Creates a separate session for every file

• Transparent access

• Allows transparent maintenance at HSM

dCache pool node

[Diagram labels: 20 GB, 1h, 800 GB]

dCache tape management

• Precious data is collected separately per 'storage class'

• Each 'storage class' queue has individual parameters steering the tape flush operation:

– Maximum time a file is allowed to be 'precious' per 'storage class'

– Maximum number of precious bytes per 'storage class'

– Maximum number of precious files per 'storage class'

• The maximum number of simultaneous 'tape flush' operations can be configured
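
The interplay of these parameters can be sketched in Python as follows. This is an illustrative model, not dCache code: the class, the function, the storage class names and all threshold values are made up, and flushing as soon as any one limit is reached is an assumption about how the limits combine.

```python
from dataclasses import dataclass, field


@dataclass
class StorageClassQueue:
    max_age_s: float  # maximum time a file may stay precious
    max_bytes: int    # maximum number of precious bytes
    max_files: int    # maximum number of precious files
    files: list = field(default_factory=list)  # (arrival_time_s, size_bytes) pairs

    def add(self, arrival_time_s, size_bytes):
        self.files.append((arrival_time_s, size_bytes))

    def should_flush(self, now_s):
        """Assumption: a flush is due as soon as any one limit is reached."""
        if not self.files:
            return False
        oldest = min(t for t, _ in self.files)
        total = sum(s for _, s in self.files)
        return (now_s - oldest >= self.max_age_s
                or total >= self.max_bytes
                or len(self.files) >= self.max_files)


def pick_flushes(queues, now_s, max_simultaneous):
    """Return the storage classes to flush now, capped at the configured limit."""
    due = [name for name, q in queues.items() if q.should_flush(now_s)]
    return due[:max_simultaneous]


# Illustrative values only.
queues = {
    "exp_a:raw": StorageClassQueue(max_age_s=3600, max_bytes=20 * 10**9, max_files=100),
    "exp_b:raw": StorageClassQueue(max_age_s=3600, max_bytes=20 * 10**9, max_files=100),
}
queues["exp_a:raw"].add(0, 25 * 10**9)  # exceeds the byte limit immediately
queues["exp_b:raw"].add(0, 1 * 10**9)   # small file, only the age limit can trigger
print(pick_flushes(queues, now_s=10, max_simultaneous=1))
```

Keeping one queue per storage class means files of the same class are flushed together, while the cap on simultaneous flushes limits how many flush operations run at once.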

Conclusion and Future Work

• Low cost read pools

• Reliable write pools

• Write once: a dCache file is never changed

• Single point of failure

• Working SRM connection between CERN and FZK

• Connection to openlab at CERN

• Adding 15 Pool nodes for the 10 Gbit test from SRM to SRM

• Adding tape drives to increase throughput

• More at www.dcache.org
