
Rheinisch-Westfälische Technische Hochschule Aachen
Lehrstuhl für Informatik VI
Prof. Dr.-Ing. Hermann Ney

Seminar Computer Vision, winter semester 2004/2005

Cameras

Denise Költer

Matriculation number 229 208

13.01.2005

Supervisor: Shahram Khadivi


Contents

1 Introduction
  1.1 The Human Eye

2 Cameras
  2.1 Pinhole Cameras
    2.1.1 Projection
  2.2 Cameras with Lenses
    2.2.1 Paraxial Geometric Optics
  2.3 Thin Lenses
  2.4 Real Lenses
  2.5 Sensing
    2.5.1 CCD Cameras
    2.5.2 Sensor Models

3 Geometric Camera Models
  3.1 Elements of Analytical Euclidean Geometry
  3.2 Camera Parameters and the Perspective Projection
    3.2.1 Intrinsic Parameters
    3.2.2 Extrinsic Parameters
  3.3 Affine Cameras and Affine Projection Equations

4 Radiometry - Measuring Light
  4.1 Light in Space
  4.2 Light at Surfaces
    4.2.1 A BRDF Database Employing the Beard-Maxwell Reflection Model
  4.3 Important Special Cases
    4.3.1 Radiosity
    4.3.2 Lambertian surface and albedo
    4.3.3 Generalization of the Lambertian Model and Implications for Machine Vision
    4.3.4 Visual Appearance of matte Surfaces
    4.3.5 Specular Surfaces

5 Conclusion

6 References


Figure 1: The human eye (eyeball, cornea, iris, pupil, crystalline lens, retina, macula lutea, fovea, optical nerve, blind spot; picture from [Rincon])

1 Introduction

In this report, I describe how a camera works and how this knowledge is used in computer vision. Further, I explain how to measure light (radiometry). A camera uses radiometry by measuring the light which arrives at the camera's sensors.
First, I will describe in chapter 1 the main parts of the human eye because of its similarity to a camera, based on [Baumann02] and [Forsyth03]. Then I will explain in chapter 2 different types of cameras, i.e. cameras without lenses and cameras with lenses, and some types of projection that can be used with them, based on [Forsyth03]. In chapter 3 I will first give a brief overview of the elements of analytical Euclidean geometry. Afterwards, I will explain the parameters belonging to a camera, i.e. intrinsic and extrinsic ones, and how they are used for perspective projection. At the end of this chapter, I will shortly describe affine cameras and affine projection equations, based on [Forsyth03].
The last chapter, chapter 4, is about radiometry, i.e. how to measure light. I will start with the behavior of light in space and at surfaces, based on [Forsyth03], with a brief description of the Bidirectional Reflectance Distribution Function (BRDF), based on [Westlund]. Then I will explain some special cases. The first one is radiosity, followed by the Lambertian surface and albedo, based on [Forsyth03], which goes along well with a generalization of the Lambertian model and its implications for machine vision, based on [Oren92], which leads to a look at the visual appearance of matte surfaces, based on [Nayar95]. Last, I will describe how light behaves on specular surfaces, based on [Forsyth03].


1.1 The Human Eye

In this section, based on [Baumann02] and [Forsyth03], I will give a brief look at the human eye, as a camera works similarly to it. The human eye is composed of six main elements, shown in Figure 1: the eyeball, the iris, the pupil, the cornea, the crystalline lens and the retina. It has skin and ligaments for protection, which open and close with the help of muscles.
Human eyes are inverted eyes, which means that the light has to go through all cell layers before it arrives at the receptors. The cell layers are horizontal cells, bipolar cells, amacrine cells and ganglion cells. They are needed to preprocess visual information, i.e. the light reaching the eye. This is important because about 120 million rods, i.e. photoreceptors, share about 1 million axons to the optical nerve, so coding and preprocessing of information has to be done in the retina. Like a camera, the eyeball lying in the orbit covers a field of view; in this case the field covers about 160° (width) × 135° (height). The field of view is the portion of scene space that actually projects onto the retina of the eye or camera. It depends on the focal length and on the effective area of the retina. The iris, the surface of which is opaque and colored, and the pupil, the diameter of which varies from about 1 to 8 mm, control the amount of light permeating the eyeball. The cornea, a highly curved and transparent window, and the crystalline lens refract the incoming light like the lens of a camera, so a picture is formed on the retina. The retina is a thin, layered membrane with about 120 million extremely sensitive photoreceptors, named rods, and about 5 million cones. The rods are receptors for information about brightness only (black-and-white viewing). The cones are partitioned into three receptor classes for color perception. The rods are more light-sensitive than the cones, so they are important for viewing at dusk and dawn. These photoreceptors work in a range from about 350 nm to 700 nm wavelength. In the retina, light impulses are changed into electrical signals.
When going from light into dark, the human eye adapts in three ways. The fastest is the reaction of the pupil, which changes its area only by a factor of about 64. Less fast is the adaptation of the cones, which causes one to see colors in the dark for a short time. The slowest adaptation is the one of the rods; when this adaptation is done, the eye can only see black-and-white.
In the center of the retina is a region where the concentration of cones is eminently high and where images are sharply focused whenever the eye fixes its attention on a special object. This region is called the macula lutea. In the middle of the macula lutea lies a depression called the fovea; this is the region of the eye with the highest concentration of cones, about 1.6 · 10^5 per mm². But there is also a blind spot on the retina, namely at the point where the optical nerve exits the retina. The optical nerve is a bundle of axons and one central blood-vessel system, which consists of an artery and a vein to feed the retina with blood. The major task of the optical nerve is leading the electrical signals from the retina to the back of the occipital lobe of the brain, which interprets these signals as visual images, so the real image processing is done by the brain. The refracting power of the eye is an effect of refraction at the interface between the air and the cornea; deformations of the crystalline lens fine-tune this effect. Cameras with lenses use this effect, too. Refracting power is measured in diopters (dpt), with 1 dpt = 1/m. The overall refracting power of the human eye is 58.6 dpt. The eye and a camera have many similarities, which is based upon the fact that a camera is a human-made replica of the eye. The pinhole used by cameras can be compared to the pupil and the iris, because in both cases the amount of incoming light is controlled. Also, today's cameras and the eye use lenses to refract the light, and the camera's sensors can be compared with the rods and cones of a human eye.

2 Cameras

In this chapter, based upon [Forsyth03], I describe the differences between cameras without lenses and cameras with lenses. Further, I explain some projection models for both types of cameras. I start with the pinhole camera and the pinhole imaging model, which leads to perspective projection and affine projection. Afterwards I describe cameras with lenses. This leads to paraxial geometric optics, which I need to explain the differences between thin and simple thick lenses. At the end of this chapter, I use Charge-Coupled-Device cameras to introduce sensing.

The first camera model was the camera obscura, which was invented in the 16th century. This model used a pinhole instead of lenses to focus light rays. All of today's cameras are based on the camera obscura. The imaging surface of a camera is generally a rectangle, while the human retina is spherical. Imaging sensors record either spatially discrete or continuous pictures. Spatially discrete pictures are recorded by human eyes, which have rods and cones, by 35 mm cameras, which have grain, and by digital cameras, which have pixels. The signal which an imaging sensor records can be discrete or continuous and consists either of a single value, e.g. for a black-and-white camera; of a few values, e.g. RGB intensities for a color camera or the responses of the three types of cones for human eyes; of many values, e.g. the responses of hyperspectral sensors; or even of a continuous function of wavelength, which is essential in the case of spectrometers.

2.1 Pinhole Cameras

At first, I illustrate how a camera without lenses works. Then I describe projection models which can be used for this type of camera.

2.1.1 Projection

In this part of the report, I introduce two types of projection. The first one is the perspective projection and the second one is the affine projection.

Perspective Projection

The pinhole perspective projection model, which was first proposed by Brunelleschi at the beginning of the 15th century, is mathematically convenient. It creates an inverted image, so it is sometimes functional to consider a virtual image associated with a plane lying in front of the pinhole, at the same distance from the pinhole as the actual image plane, as shown in Figure 2.
The apparent size of objects depends on their distance from the pinhole. Lines in Π parallel to the image plane do not have an image at all. It is often convenient to reason in terms of reference frames, coordinates and equations. The optical axis in Figure 2 is the line perpendicular to Π′ and passing through the pinhole. The point where it pierces Π′ is called the image center. The image center can be used as the origin of an image plane coordinate frame, and it plays an important role in camera calibration procedures.


Figure 2: The pinhole imaging model (virtual image, pinhole O, image plane, image center C′, optical axis, focal distance f′, scene point P and image point P′)

Example: A coordinate system (O, i, j, k) is attached to a pinhole camera, with its origin O coinciding with the pinhole; the vectors i and j form a basis for a vector plane parallel to the image plane Π′, which is located at positive distance f′ from the pinhole along the vector k. In this coordinate system I examine a scene point P with coordinates (x, y, z) and the image point P′ of P with coordinates (x′, y′, z′), where z′ = f′ since P′ lies in Π′. I assume that P, P′ and O are collinear, so the vector OP′ = λ · OP for some number λ:

x′ = λx
y′ = λy
f′ = λz
⇔ λ = x′/x = y′/y = f′/z (1)

x′ = f′ · x/z
y′ = f′ · y/z (2)
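As a small illustration of equation (2), the projection can be written in a few lines of Python; this is only a sketch, and the function name and the numbers are made up for the example:

# Pinhole projection (equation 2): a scene point (x, y, z), given in the
# camera frame (O, i, j, k), maps to image coordinates f'*x/z and f'*y/z.

def project_pinhole(point, f_prime):
    """Project a 3D camera-frame point onto the image plane at distance f_prime."""
    x, y, z = point
    if z == 0:
        raise ValueError("point lies in the pinhole plane, no image exists")
    return (f_prime * x / z, f_prime * y / z)

if __name__ == "__main__":
    # hypothetical numbers: focal distance 0.05, in the same unit as the point
    print(project_pinhole((0.2, 0.1, 2.0), 0.05))   # -> (0.005, 0.0025)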

Affine Projection

In this section, I focus on two specific affine models, i.e. the weak-perspective and the orthographic projections. The fronto-parallel plane Π0 in Figure 3 is defined by z = z0. For any point P in Π0, equation 2 can be written as

x′ = −m x
y′ = −m y, where m = −f′/z0, (3)

z0 is negative because the plane must be in front of the pinhole, so the magnification m associated with the plane Π0 is positive. Consider points P and Q in Π0 and their image points P′ and Q′. It is then obvious that the vectors PQ and P′Q′ are parallel and that |P′Q′| = m · |PQ|, which expresses the dependence of image size on object distance. The weak-perspective or scaled orthographic projection model is used when the scene depth is small relative to the average distance from the camera.


Figure 3: Weak-perspective (black) and orthographic (red) projection

This model assumes a constant magnification, so the image coordinates can be normalized so that m = −1. The resulting orthographic projection is defined by

x′ = x
y′ = y (4)

with all light rays parallel to the k-axis and orthogonal to the image plane Π′. It is, however, unrealistic to assume pure orthographic projection for real imaging models.
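To make the difference between equations (3) and (4) concrete, here is a minimal sketch; the magnification and reference depth used below are hypothetical values, not taken from the text:

# Weak-perspective projection (equation 3): one common magnification m = -f'/z0
# for all points near the reference depth z0; orthographic projection (equation 4)
# is the special case m = -1 after normalizing the image coordinates.

def weak_perspective(point, f_prime, z0):
    x, y, _ = point                 # the individual depth of the point is ignored
    m = -f_prime / z0               # z0 < 0 (plane in front of the pinhole), so m > 0
    return (-m * x, -m * y)

def orthographic(point):
    x, y, _ = point
    return (x, y)

if __name__ == "__main__":
    p = (0.2, 0.1, -1.9)            # depth close to the reference plane z0 = -2
    print(weak_perspective(p, 0.05, -2.0))   # scaled copy of (x, y)
    print(orthographic(p))                   # (0.2, 0.1)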

2.2 Cameras with Lenses

There are two main reasons to use cameras with lenses. The first is gathering light, because a large pinhole gives blurry pictures and a small one reduces the amount of light and may cause diffraction effects. Second, a lens keeps the picture in sharp focus while it gathers light from a large area.
The simplified behavior of lenses is dictated by the laws of geometric optics:
(1) light travels in straight lines (light rays) in homogeneous media,
(2) a ray, its reflection from a surface and the surface normal are coplanar, i.e. they lie in the same plane, and
(3) a ray passing from one medium to another is refracted.
According to Snell's law, for a ray r1 incident to the interface between two transparent materials with indexes of refraction n1 and n2 and a refracted ray r2, it holds that r1, r2 and the normal of the interface are coplanar, and the angles α1 and α2 between the rays and the normal are related by

n1 sin α1 = n2 sin α2. (5)

These laws are illustrated in Figure 4.
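A quick numerical check of Snell's law (equation 5) is shown below; the refractive indices (air ≈ 1.0, glass ≈ 1.5) are typical example values, not values from the text:

import math

def refraction_angle(n1, n2, alpha1):
    """Angle of the refracted ray (radians) from Snell's law n1*sin(a1) = n2*sin(a2)."""
    s = n1 * math.sin(alpha1) / n2
    if abs(s) > 1.0:
        raise ValueError("total internal reflection, no refracted ray")
    return math.asin(s)

if __name__ == "__main__":
    # example: ray from air (n1 ~ 1.0) into glass (n2 ~ 1.5) at 30 degrees incidence
    a2 = refraction_angle(1.0, 1.5, math.radians(30.0))
    print(math.degrees(a2))   # roughly 19.5 degrees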

2.2.1 Paraxial Geometric Optics

In paraxial (first-order) geometric optics the angles between all light rays going through a lens and the normals to the refractive surfaces of the lens are small. I also suppose that the lens is rotationally symmetric about the optical axis and that all refractive surfaces are spherical, as shown in Figure 5. A light ray crosses the optical axis in point P1 and is refracted at the point P of the circular interface of radius R with center C, which separates two transparent media with indexes of refraction n1 and n2.


Figure 4: Snell's law

Figure 5: Paraxial refraction


Figure 6: Thin lens (left) vs. simple thick lens (right)

The refracted ray crosses the optical axis in point P2. The angles between the two rays and the chord joining C to P are named α1 and α2, and h is the distance between P and the optical axis. β1 and β2 are the angles between the optical axis and the lines joining P1 and P2 to P, so the angle between the optical axis and the line joining C to P is γ = α1 − β1 = α2 + β2. If all angles are small, they equal their sines and tangents:

α1 = γ + β1 ≈ h (1/R + 1/d1) and α2 = γ − β2 ≈ h (1/R − 1/d2),

where d1 and d2 are the distances between the points P1 and P2 and the interface. Snell's law for small angles yields the paraxial refraction equation:

n1 α1 ≈ n2 α2 ⇔ n1/d1 + n2/d2 = (n2 − n1)/R (6)

So the relationship between d1 and d2 depends only on R, n1 and n2, but not on β1 and β2.
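Solving equation (6) for d2 gives a small sanity check of the paraxial model; the radius, indices and object distance below are arbitrary example values:

# Paraxial refraction (equation 6): n1/d1 + n2/d2 = (n2 - n1)/R.
# Given d1, the image-side distance d2 follows independently of beta1, beta2.

def paraxial_image_distance(n1, n2, R, d1):
    rhs = (n2 - n1) / R - n1 / d1
    if rhs == 0:
        raise ValueError("rays emerge parallel, image at infinity")
    return n2 / rhs

if __name__ == "__main__":
    # example values only: air to glass, interface radius 0.1, object at distance 0.5
    print(paraxial_image_distance(1.0, 1.5, 0.1, 0.5))   # -> 0.5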

2.3 Thin Lenses

Here, I assume a lens with two spherical surfaces of radius R and index of refraction n; this lens is surrounded by vacuum with an index of refraction equal to 1. The index of refraction of a transparent medium depends on the wavelength. When the lens is thin, a ray entering the lens and being refracted at its right boundary is immediately refracted again at the left boundary, as shown in the left side of Figure 6.
The ray (PO) is not refracted, and all other rays passing through the point P are focused by the thin lens on the point P′ with depth z′ along (PO) such that

1/z′ − 1/z = 1/f, where f = R / (2(n − 1))

is the focal length of the lens. The two focal points F and F′ of the lens are located at distance f from the lens on either side.
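The thin-lens relation can be exercised directly; remember that z is negative for a point in front of the lens. The numbers below are illustrative only:

# Thin lens (section 2.3): 1/z' - 1/z = 1/f with f = R / (2*(n - 1)).

def focal_length(R, n):
    return R / (2.0 * (n - 1.0))

def image_depth(z, f):
    """Depth z' of the focused image of a point at depth z (z < 0 in front of the lens)."""
    denom = 1.0 / f + 1.0 / z
    if denom == 0:
        raise ValueError("object in the focal plane, image at infinity")
    return 1.0 / denom

if __name__ == "__main__":
    f = focal_length(R=0.1, n=1.5)          # -> 0.1
    print(f, image_depth(z=-0.5, f=f))      # image forms at z' = 0.125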

2.4 Real Lenses

As thin-lens models are very simplified, more realistic models assume a thick lens, as shown in the right side of Figure 6. The principal points of the lens are H and H′, and the only undeflected ray is the one along the optical axis.


Figure 7: Spherical aberration and the circle of least confusion

Simple lenses suffer from a number of aberrations. The reason for this problem is that rays striking the interface farther from the optical axis are focused closer to the interface, because paraxial refraction is only an approximation. This phenomenon is the source of two different types of spherical aberration, namely the longitudinal spherical aberration and the transverse spherical aberration, shown in Figure 7. The longitudinal spherical aberration is the distance between a point P′, which is the paraxial image of a point P lying on the optical axis, and the intersection of the optical axis with a ray issued from P and refracted by the lens. The transverse spherical aberration is the distance between the optical axis and the point where such a ray intersects an image plane Π′ erected in P′. All rays passing through P that are refracted by the lens form a circle of confusion as they intersect Π′. Its size changes by moving Π′ along the optical axis; the smallest circle is called the circle of least confusion. There are four other types of primary aberrations: coma, astigmatism, field curvature and distortion. All these aberrations degrade the image by blurring the picture. While distortion changes the shape of an image as a whole, the other three blur the picture of every single object point. There also exists a phenomenon of chromatic aberration, because the focal length depends on the wavelength, i.e. refracted rays corresponding to different wavelengths intersect the optical axis at different points; this is called longitudinal chromatic aberration. These rays also form different circles of confusion in the same image plane; this is called transverse chromatic aberration.

Aberrations can be minimized by compound lenses, which can be modeled by thick lens equations. But they suffer from one more defect which is relevant to machine vision, called vignetting. Vignetting causes the brightness to drop in the image periphery, and it may cause problems for automated image analysis programs.

2.5 Sensing

The difference between the camera obscura and a modern camera is that a modern one can record the pictures that form on its backplane. This is done in different ways; the most popular one is used by CCD cameras.


2.5.1 CCD Cameras

Charge-Coupled-Device (CCD) cameras were proposed in 1970 and replaced the vidicon cameras (a glass envelope with an electron gun and a faceplate) in most modern applications. CCD cameras use a rectangular grid of electron collection sites laid over a thin silicon wafer. The photo-conversion works in the following way: when photons strike the silicon, electron-hole pairs are generated, and the electrons are captured by the potential well which is formed by applying a positive electrical potential to the corresponding gate. All electrons that are generated at each site are collected over a fixed time. The charges stored at the individual sites are then moved using charge coupling, which means that charge packets are transferred from one site to the next by manipulating the gate potentials. The packets stay separated the whole time.
The digital output of most CCD cameras is transformed internally into an analog video signal before being passed to a frame grabber that constructs the final digital image. Consumer-grade color CCD cameras use the same chips as black-and-white ones, but the rows and columns of the sensor are made sensitive to red, green or blue light, while higher quality cameras use a beam splitter to send the image to three different CCDs via color filters.

2.5.2 Sensor Models

For simplicity, this section only discusses the sensor model of black-and-white CCD cameras. Assume the number I of electrons recorded at the cell located at row r and column c of the CCD array can be modeled as

I(r, c) = T ∫_λ ∫_{p ∈ S(r,c)} E(p, λ) R(p) q(λ) dp dλ (7)

with electron-collection time T, spatial domain of the cell S(r, c), the wavelength integral running over the range of wavelengths to which the CCD has a nonzero response, E the power per unit area and unit wavelength arriving at the point p, R the spatial response of the site and q the quantum efficiency of the device. I will come back to this in the chapter about radiometry.
There are physical phenomena that alter this ideal camera model, for example blooming, which occurs when the light source illuminating a collection site is so bright that the charge stored at that site overflows into adjacent ones. It can be avoided by illumination control. Other factors like fabrication defects, thermal and quantum effects and quantization noise are inherent. Discretization of the analog voltage by the frame grabber introduces both geometric effects and quantization noise. The occurring geometric effect is mostly line jitter, and it can be corrected via calibration, while the quantization noise can be modeled as a zero-mean random variable. For a digital signal D(r, c) this gives

D(r, c) = γ (N_I(r, c) + N_DC(r, c) + N_B(r, c) + R(r, c)) + Q(r, c), (8)

where N_I(r, c) is the number of electrons generated by the photoconversion process of the sensor. Furthermore, N_DC(r, c) is the number of electrons generated by thermal energy; their contribution is called dark current. The electrons introduced by the CCD electronics (bias) are represented by N_B(r, c), while R(r, c) characterizes the read-out noise caused by the output amplifier. Lastly, Q(r, c) denotes the quantization noise.
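The additive structure of equation (8) can be mimicked with a tiny simulation. All gains and noise levels below are invented, and the Gaussian choices for the individual noise terms are my assumption, not something specified here:

import random

# Rough simulation of the black-and-white CCD signal model of equation (8):
# D = gamma * (N_I + N_DC + N_B + R) + Q.  All parameters are illustrative.

def digital_signal(n_photo, gamma=0.01, dark=5.0, bias=3.0, read_sigma=2.0, levels=256):
    n_dc = random.gauss(dark, dark ** 0.5)       # dark-current electrons (assumed Gaussian)
    n_b = bias                                   # bias electrons from the CCD electronics
    r = random.gauss(0.0, read_sigma)            # read-out noise of the output amplifier
    analog = gamma * (n_photo + n_dc + n_b + r)  # amplified analog value
    d = round(analog)                            # quantization by the frame grabber
    return max(0, min(levels - 1, d))            # clip to the available gray levels

if __name__ == "__main__":
    random.seed(0)
    print([digital_signal(n) for n in (0.0, 5000.0, 20000.0)])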


3 Geometric Camera Models

In this chapter, based upon [Forsyth03], I establish the constraints between image measurements and the position and orientation of geometric figures measured in some arbitrary external coordinate system. In the first part of this chapter, I give a short summary of the most important elements of analytical Euclidean geometry. After this, the physical parameters which are important for the world and camera coordinate frames are introduced. These are used for rigid transformations. At the end of this chapter, I explain the affine model of the imaging process.

3.1 Elements of Analytical Euclidean Geometry

In a three-dimensional Euclidean space, a point O (the origin) and three vectors i, j and k, which are orthogonal to each other, form an orthonormal coordinate frame (F) = (O, i, j, k). A point P lying in a plane Π that contains a point A ∈ Π and has unit normal vector n is characterized by

AP · n = 0, (9)

where AP denotes the vector from A to P. For a point P = (x, y, z)^T and a normal vector n = (a, b, c)^T in a coordinate frame (F) it follows that OP · n − OA · n = 0, or ax + by + cz − d = 0, with the distance d := OA · n between the origin O and the plane Π independent of the choice of the point A in Π.
Sometimes it is useful to use homogeneous coordinate vectors; for a point P the plane equation becomes

(a b c −d) · (x y z 1)^T = 0 (10)

or

Π · P = 0, where Π := (a, b, c, −d)^T and P := (x, y, z, 1)^T. (11)

If more than one coordinate system is considered, the coordinate vector of the point P in the frame (F) is written as ^F P:

^F P = ^F OP = (x, y, z)^T ⇔ OP = x i + y j + z k (12)

When two coordinate systems (A) = (O_A, i_A, j_A, k_A) and (B) = (O_B, i_B, j_B, k_B) are considered and the basis vectors of (A) and (B) are parallel to each other, the two coordinate systems are separated by a pure translation. In this case O_B P = O_B O_A + O_A P (as vectors), thus ^B P = ^A P + ^B O_A.


Figure 8: Rigid transformation between two coordinate frames (A) = (O_A, i_A, j_A, k_A) and (B) = (O_B, i_B, j_B, k_B)

If the two origins of (A) and (B) coincide but the basis vectors of (A) and (B) are not parallel, the two coordinate systems are separated by a pure rotation. The rotation matrix can be defined as

^B_A R := ( i_A · i_B   j_A · i_B   k_A · i_B )
          ( i_A · j_B   j_A · j_B   k_A · j_B )
          ( i_A · k_B   j_A · k_B   k_A · k_B )   (13)

whose columns are ^B i_A, ^B j_A and ^B k_A and whose rows are ^A i_B^T, ^A j_B^T and ^A k_B^T, so that ^A_B R = ^B_A R^T.
Rotation matrices are characterized by ^A_B R^{-1} = ^A_B R^T and det ^A_B R = 1. The set of rotation matrices forms a non-commutative group.
If both the origins and the basis vectors of the two coordinate systems are different, the frames are separated by a generalized rigid transformation, and ^B P = ^B_A R ^A P + ^B O_A. A rigid transformation maps a coordinate system onto another one (see Figure 8); it can also be considered as a mapping between points. For a rotation matrix R, a frame (F) and a vector t ∈ R³:

^F P′ = R ^F P + t   ⇔   ( ^F P′ )     ( R     t ) ( ^F P )
                         (   1   )  =  ( 0^T   1 ) (   1  )

If R is replaced by an arbitrary nonsingular 3 × 3 matrix, the equation still represents a mapping between points, but lengths and angles may not be preserved anymore.
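A rigid transformation written as the 4 × 4 homogeneous matrix above can be sketched as follows; the example rotation (90° about k) and translation are arbitrary test values:

import numpy as np

def rigid_transform(R, t):
    """Build the 4x4 homogeneous matrix (R t; 0^T 1) of a rigid transformation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def apply(T, p):
    """Map a 3D point with the homogeneous matrix T."""
    ph = np.append(p, 1.0)
    return (T @ ph)[:3]

if __name__ == "__main__":
    # example: rotation by 90 degrees about k, followed by a translation
    R = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    t = np.array([1.0, 2.0, 3.0])
    T = rigid_transform(R, t)
    print(apply(T, np.array([1.0, 0.0, 0.0])))   # -> [1. 3. 3.]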

3.2 Camera Parameters and the Perspective Projection

There exist two different kinds of parameters referring to cameras, intrinsic and extrinsic ones. The intrinsic parameters relate the camera's coordinate system, fixed to the image plane, to an idealized coordinate system, while the extrinsic parameters relate it to a fixed world coordinate system and specify its position and orientation in space.

3.2.1 Intrinsic Parameters

From now on I assume that a normalized image plane is associated with the camera. This plane is parallel to the physical retina but located at a unit distance from the pinhole and has its own coordinate system. The perspective projection in this normalized coordinate system is

û = x/z
v̂ = y/z
⇔ p̂ = (1/z) (Id 0) (P, 1)^T, where p̂ := (û, v̂, 1)^T, (14)

where p̂ is the vector of the projection of the point P = (x, y, z) onto the normalized image plane and û and v̂ are its coordinates. The coordinates u and v of the corresponding point on the physical retina, measured in pixels, are then

u = k f x/z,
v = l f y/z. (15)

As pixels are normally rectangular, the camera has two additional scale parameters k and l. So there is a distance f in meters and a pixel with the dimensions 1/k × 1/l, where the unit of k and l is pixel × m⁻¹. The parameters k, l and f are not independent and can be replaced by the magnifications α = kf and β = lf in pixel units.
Two parameters u0 and v0 are added because the origin of the camera usually is at a corner C of the retina and not at its center. The position of the principal point C0 in the retinal coordinate system is defined by u0 and v0. This yields

u = α x/z + u0,
v = β y/z + v0. (16)

If the camera coordinate system is skewed due to some manufacturing error, so that the angle θ between the image axes is not equal to 90°, equation 16 is altered into

u = α x/z − α cot θ · y/z + u0,
v = (β / sin θ) · y/z + v0. (17)

Combining equation 14 and equation 17 yields

p = K p̂, (18)

where p = (u, v, 1)^T and

K := ( α    −α cot θ    u0 )
     ( 0     β / sin θ  v0 )
     ( 0     0           1 )

⇒ p = (1/z) M P, where M := (K 0) and P = (x, y, z, 1)^T is the homogeneous coordinate vector of P.
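A small sketch of equation (18): building K and mapping a camera-frame point to pixel coordinates. The intrinsic values (α, β, u0, v0, θ) are invented for the example:

import numpy as np

def calibration_matrix(alpha, beta, u0, v0, theta=np.pi / 2):
    """Intrinsic matrix K of equation (18); theta = 90 degrees gives (numerically) zero skew."""
    return np.array([[alpha, -alpha / np.tan(theta), u0],
                     [0.0,    beta / np.sin(theta),  v0],
                     [0.0,    0.0,                   1.0]])

def project(K, P):
    """Pixel coordinates of a camera-frame point P = (x, y, z) via p = K applied to (x/z, y/z, 1)."""
    x, y, z = P
    p = K @ np.array([x / z, y / z, 1.0])   # K applied to the normalized coordinates
    return p[:2]

if __name__ == "__main__":
    K = calibration_matrix(alpha=800.0, beta=800.0, u0=320.0, v0=240.0)
    print(project(K, (0.1, -0.05, 2.0)))     # -> approximately [360. 220.]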


3.2.2 Extrinsic Parameters

A camera frame (C) distinct from the world frame (W) yields

( ^C P )     ( ^C_W R   ^C O_W ) ( ^W P )
(  1   )  =  (  0^T        1   ) (  1   )      and      p = (1/z) M P,   M = K (R t). (19)

R = ^C_W R is a rotation matrix, t = ^C O_W is a translation vector and P = (^W x, ^W y, ^W z, 1)^T is the homogeneous coordinate vector of the point P in (W).
This equation can be used to determine the position of the camera's optical center O in the world coordinate system: its homogeneous coordinate vector O satisfies M O = 0. If M = (A b), where A is a nonsingular 3 × 3 matrix and b a vector in R³, then O = −A⁻¹ b. Because z is not independent of M and P, equation 19 can be rewritten as

u = (m1 · P) / (m3 · P),
v = (m2 · P) / (m3 · P). (20)

A projection matrix can now be written as a function of its five intrinsic parameters (α, β, u0, v0 and θ) and its six extrinsic parameters (the three angles defining R and the three coordinates of t) as

M = ( α r1^T − α cot θ r2^T + u0 r3^T     α tx − α cot θ ty + u0 tz )
    ( (β / sin θ) r2^T + v0 r3^T          (β / sin θ) ty + v0 tz    )
    ( r3^T                                tz                        ),   (21)

where r1^T, r2^T and r3^T are the three rows of the matrix R and tx, ty, tz the coordinates of the vector t.
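Putting intrinsics and extrinsics together as in equation (19), M = K (R t); the sketch below uses hypothetical values and checks that the optical center O indeed satisfies M O = 0:

import numpy as np

def projection_matrix(K, R, t):
    """Perspective projection matrix M = K (R t) of equation (19)."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def optical_center(M):
    """Optical center O = -A^{-1} b for M = (A b), so that M applied to (O, 1) is zero."""
    A, b = M[:, :3], M[:, 3]
    return -np.linalg.solve(A, b)

if __name__ == "__main__":
    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0,   0.0,   1.0]])
    R = np.eye(3)                      # camera aligned with the world frame
    t = np.array([0.0, 0.0, 1.0])      # world origin one unit in front of the camera
    M = projection_matrix(K, R, t)
    O = optical_center(M)
    print(O, M @ np.append(O, 1.0))    # O = [0, 0, -1], and M * (O, 1) is (numerically) zero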

3.3 Affine Cameras and Affine Projection Equations

In this section I explain affine cameras and affine projection equations, which are used to approximate the imaging process when the scene relief is small compared with the overall distance separating it from the camera observing it.

Affine projection models are the orthographic projection, the parallel projection, the weak-perspective projection and the paraperspective projection. Under orthographic projection, the imaging process is simply modeled as an orthogonal projection onto the image plane. It is reasonable for distant objects lying at a roughly constant distance from the camera. The parallel projection model subsumes the orthographic one and takes into account that the objects of interest may lie off the optical axis of the camera. Weak-perspective and paraperspective projection models generalize the orthographic and the parallel model to allow for variations in the depth of an object relative to the camera observing it. Let O denote the optical center, R a scene reference point and P a scene point, as in Figure 9. The weak-perspective projection of P is constructed in two steps:

1. P is projected orthogonally onto a point P′ of the plane Πr parallel to the image plane Π′ and passing through R.

2. Perspective projection is used to map the point P ′ onto the image point p.


Figure 9: Affine projection models: weak-perspective (black) and paraperspective (red) projection

The paraperspective model takes into account both the distortions associated with a reference point that is off the optical axis of the camera and possible variations in depth. Let ∆ be the line joining O to R:

1. Parallel projection along the direction of ∆ is used to map P onto a point P′ of Πr.

2. Perspective projection is used to map P ′ to p.

In both models the second step is a scaling of the image coordinates.

4 Radiometry - Measuring Light

In this chapter I explain the basics of radiometry, i.e. measuring light, which is used in cameras to measure the light which arrives at the camera's retina. Radiometry describes how energy is transferred from light sources to surface patches. First I give an account of the behavior of light in space and the occurring effects, based on [Forsyth03]. After this, I describe what happens to light at surfaces and specify the Bidirectional Reflectance Distribution Function together with the Beard-Maxwell reflection model, on the basis of [Forsyth03] and [Westlund]. After these fundamentals I describe some important special cases, starting with radiosity and Lambertian surfaces and albedo, both adapted from [Forsyth03], followed by the generalization of the Lambertian model and its implications for machine vision, based on [Oren92]. At last I describe the visual appearance of matte surfaces, on the basis of [Nayar95], and of specular surfaces, based on [Forsyth03].

4.1 Light in Space

To have a brief look at the behavior of light in space, I define some basics first.
When a source is tilted with respect to the direction in which the illumination is traveling, it 'looks smaller' to a patch of surface viewing the source. Likewise, when a patch is tilted with respect to the direction in which the illumination is traveling, it 'looks smaller' to the source. This effect is called foreshortening. It is an important phenomenon because the effect of a distant source on a surface depends on how the source looks from the point of view of the surface.


Figure 10: Angle subtended by an infinitesimal line segment dl at a point at distance r

If two different sources result in exactly the same amount of radiation arriving along each incoming direction, they must have the same effect on the surface.

The pattern a source generates on an input hemisphere can be described by the solid angle that the source subtends. An angle subtended in the plane by an infinitesimal line segment of length dl at a point p can be obtained by projecting the line segment onto the unit circle whose center is p, as shown in Figure 10. The length of the result is the required angle in radians. The angle depends on the distance to the center of the circle and on the orientation of the line:

dφ = (dl cos θ1) / r. (22)

Similarly, the solid angle subtended by a patch of surface at a point x is obtained by projecting the patch onto the unit sphere whose center is at x. The area of the result is the required solid angle ω; its unit is the steradian. If the area dA of the patch is small, it subtends an infinitesimal solid angle, which is easily computed in terms of the area of the patch and the distance to it as

dω = (dA cos θn) / r². (23)
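Equations (22) and (23) are easy to evaluate numerically; the segment length, patch area, tilt angles and distances below are made-up example values:

import math

def plane_angle(dl, theta1, r):
    """Angle (radians) subtended by a short segment dl, tilted by theta1, at distance r (eq. 22)."""
    return dl * math.cos(theta1) / r

def solid_angle(dA, theta_n, r):
    """Solid angle (steradians) subtended by a small patch dA, tilted by theta_n, at distance r (eq. 23)."""
    return dA * math.cos(theta_n) / (r * r)

if __name__ == "__main__":
    print(plane_angle(dl=0.01, theta1=math.radians(30), r=2.0))     # ~0.00433 rad
    print(solid_angle(dA=1e-4, theta_n=math.radians(45), r=2.0))    # ~1.77e-5 sr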

The distribution of light in space is a function of position and direction.

Definition 1

The appropriate unit for measuring the distribution of light in space is radiance, which is defined as the power traveling at some point in a specified direction, per unit area perpendicular to the direction, per unit solid angle.


The units of radiance are watts per square meter per steradian, W × m⁻² × sr⁻¹. A basic phenomenon in radiometry is that a small surface patch viewing a source frontally collects more energy than the same patch viewing a source along a nearly tangent direction. So the amount of energy a patch collects from a source depends on the size of the patch and on its orientation towards the source; the square meters in the unit of radiance are foreshortened ones.
The radiance at a point P in the direction of a vector v is denoted L(P, v), or, if P lies in a surface, L(P, θ, φ), where θ and φ are spherical coordinates. It is important that radiance is constant along a straight line, assuming that light does not interact with the medium through which it travels, i.e. light in vacuum: the radiance leaving point P1 in the direction of point P2 is the same as the radiance arriving at point P2 from the direction of point P1. [Forsyth03]

4.2 Light at Surfaces

In this section I explain what happens to light at surfaces. If light hits a surface, it may be absorbed, transmitted or scattered; usually a combination of these three effects occurs. The effect of fluorescence, i.e. that some surfaces absorb light at one wavelength and radiate light at a different one, complicates the picture further.
So some simplifying assumptions are needed. I assume a local interaction model, i.e. that all effects are local and can be explained by a model with no fluorescence or emission. In this model the radiance leaving a point on a surface is due only to the radiance arriving at this point, all light leaving a surface at a given wavelength is due to light arriving at this wavelength, and surfaces do not generate light internally (light sources are treated separately).
There exists a function describing the relationship between incoming illumination and reflected light; it is called the Bidirectional Reflectance Distribution Function (BRDF). The BRDF is a function of the direction in which light arrives at a surface and the direction in which it leaves.
First I define irradiance, the unit for representing incoming power, which is the incident power per unit area not foreshortened. This means that a patch of surface of area dA, illuminated by the radiance Li(P, θi, φi) coming in from a differential region of solid angle dω at angles (θi, φi), receives the irradiance

(1/dA) (Li(P, θi, φi)) (cos θi dA) (dω) = Li(P, θi, φi) cos θi dω. (24)

So the radiance is multiplied by the foreshortening factor and by the solid angle to get the irradiance.
The measured pixel intensity is a function of the irradiance integrated over the pixel's area, a range of wavelengths and some time. In the ideal case the measured pixel intensity is proportional to the radiance:

I = ∫_t ∫_λ ∫_x ∫_y E(x, y, λ, t) s(x, y) dy dx dλ dt. (25)

The BRDF is the most general model of local reflection. It is defined as the ratio of the radiance in the outgoing direction to the incident irradiance. If a surface illuminated by the radiance Li(P, θi, φi), coming from a differential region of solid angle dω at angles (θi, φi), were to emit the radiance Lo(P, θo, φo), its BRDF would be

ρbd(θo, φo, θi, φi) = Lo(P, θo, φo) / (Li(P, θi, φi) cos θi dω). (26)


The units of the BRDF are inverse steradians (sr⁻¹). The BRDF can vary from 0 (no light reflected in that direction) to infinity. It is symmetric in the incoming and outgoing direction, which is known as the Helmholtz reciprocity principle.
The radiance leaving a surface due to irradiance from a particular direction is easily obtained from the definition of the BRDF:

Lo(P, θo, φo) = ρbd(θo, φo, θi, φi) Li(P, θi, φi) cos θi dω. (27)

So the radiance leaving a surface due to its irradiance is obtained by summing the contributions from all incoming directions:

Lo(P, θo, φo) = ∫_Ω ρbd(θo, φo, θi, φi) Li(P, θi, φi) cos θi dω, (28)

where Ω is the incoming hemisphere. [Forsyth03]
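Equation (28) can also be approximated numerically; the sketch below integrates over the incoming hemisphere on a simple grid. The grid resolution and the constant BRDF and radiance values are arbitrary, and for a constant BRDF ρ and uniform incoming radiance Li the result should approach ρ · Li · π:

import math

def outgoing_radiance(brdf, incoming, n_theta=200, n_phi=400):
    """Numerically integrate equation (28) over the incoming hemisphere.

    brdf(theta_i, phi_i) and incoming(theta_i, phi_i) are functions of the
    incoming direction; the solid angle element is sin(theta) dtheta dphi.
    """
    d_theta = (math.pi / 2) / n_theta
    d_phi = (2 * math.pi) / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += (brdf(theta, phi) * incoming(theta, phi)
                      * math.cos(theta) * math.sin(theta) * d_theta * d_phi)
    return total

if __name__ == "__main__":
    rho, L_i = 0.2, 3.0                               # arbitrary constants
    L_o = outgoing_radiance(lambda t, p: rho, lambda t, p: L_i)
    print(L_o, rho * L_i * math.pi)                   # the two values should be close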

4.2.1 A BRDF Database Employing the Beard-Maxwell Reflection Model

The Beard-Maxwell reflection model (BM) predates many of the other reflection models and foreshadows several of the features incorporated in them. The BM is the alias method for generating random variates. It captures subtle BRDF characteristics required in realistic image synthesis. The BM is a non-invertible function.
The BM was first used to describe the reflection properties of rough, painted surfaces displaying Fresnel effects; later several other surfaces were added. It is built on the assumption that the material surface is a three-dimensional terrain of microfacets of varying orientation. Light is either reflected by one of the microfacets (first surface reflectance, ρfs) or scattered back out of the surface after having first entered the sub-surface medium (volumetric reflectance, ρvol). So the BM takes the form

ρ(θo, φo, θi, φi) = ρfs(θo, φo, θi, φi) + ρvol(θo, φo, θi, φi). (29)

First surface reflectance causes light to be reflected in the specular direction of each individual microfacet, so it is determined by the distribution of the microfacets' normals, given by the density function Ξ(θ, φ):

ρfs(θo, φo, θi, φi) = (R(β) Ξ(H) / (4 cos θi cos θo)) · SO, (30)

where H is the half-angle vector, R(β) is the Fresnel reflectance at the bistatic angle β and SO is a shadowing and obscuration term, which accounts for the height distribution of the microfacets. Light reflected from the first surface is assumed to maintain its original polarization, while light reflected through volumetric scattering is assumed to be totally depolarized. This can be described by using the four combinations of polarization, which emerge from the fact that incoming light and reflected light are viewed separately. The first orientation describes the polarization of the incoming light with respect to the plane of incidence; the second one describes the polarization of the outgoing light with respect to the reflectance plane. So we have ρ⊥⊥, ρ⊥‖, ρ‖⊥ and ρ‖‖:

ρfs = (ρ⊥⊥ − ρ⊥‖) + (ρ‖‖ − ρ‖⊥),

ρvol = 2ρ⊥‖ + 2ρ‖⊥. (31)


For the BM there exists a large data pool, the Nonconventional Exploitation Factors Data System (NEFDS), which contains over 400 materials, varying from dirt to tree canopies, which fall into 12 categories: asphalt, brick, camouflage, composite, concrete, fabric, water, metal, paint, rubber, soil and wood. All materials in the NEFDS are represented well by the NEF-BM, a modified form of the BM which uses both the Lambertian and the directional diffuse reflectance and not just one of them like the original BM does. Further, all materials are applicable to the area of remote sensing and include objects which would be viewed from a remote sensor (e.g. a satellite). [Westlund]

4.3 Important Special Cases

4.3.1 Radiosity

If the radiance leaving a surface is independent of the exit angle, there is no point in describing it using a unit that explicitly depends on direction. The appropriate unit for this case is radiosity, which is the total power leaving a point on a surface per unit area on the surface, B(P), with units watts per square meter (W × m⁻²). The radiosity of a surface at a point is the sum of the radiance leaving the surface at that point over the whole exit hemisphere. If the point P emits the radiance L(P, θ, φ), the radiosity at that point is

B(P) = ∫_Ω L(P, θ, φ) cos θ dω,

where Ω is the exit hemisphere, dω = sin θ dθ dφ, and cos θ turns foreshortened area into area. If a surface has constant radiance L0(P), its radiosity is B(P) = π L0(P). [Forsyth03]

4.3.2 Lambertian surface and albedo

If the BRDF of a surface is independent of the outgoing direction, the directional hemispheric reflectance, which is simply the fraction of the incident irradiance in a given direction that is reflected by the surface, does not depend on the illumination direction. Such surfaces are known as ideal diffuse or Lambertian surfaces, and the directional hemispheric reflectance is often called the diffuse reflectance or the albedo, ρd. For a Lambertian surface with BRDF ρbd(θo, φo, θi, φi) = ρ this yields

ρd = ∫_Ω ρ cos θo dωo = πρ, (32)

which is often used in the form ρBRDF = ρd / π.

A Lambertian surface will look equally bright from any direction, whatever the direction along which it is illuminated, because the sensation of brightness corresponds to a measurement of radiance. [Forsyth03]

4.3.3 Generalization of the Lambertian Model and Implications for Machine Vision

Non-Lambertian behavior is primarily caused by surface roughness. A surface can be viewed as a collection of planar facets; at high magnification each pixel includes a single facet, but at lower magnification it can include a large number of them. In this case the reflectance is not Lambertian, and the surface shows backscattering, i.e. it radiates more energy back towards the source than in the normal direction or the forward direction. If a surface is isotropic, the radiance and the BRDF do not change if the surface is rotated about its normal vector. Here the surface is assumed to be modeled as a collection of long symmetric V-cavities, with two opposing facets each, which are assumed to be much larger than the wavelength of the incident light (V-cavity roughness model).


The V-cavity roughness model can be used to describe surfaces with isotropic and anisotropic roughness. Further, λ² ≪ da ≪ dA is assumed, where λ is the wavelength of the incident light, da is the area of a facet and dA is the area of the surface patch. I denote the slope and orientation of each facet in the V-cavity model as (θa, φa). The facet number N(θa, φa) and the slope-area distribution P(θa, φa) are related as follows:

P (θa, φa) = N(θa, φa)da cos θa. (33)

Three types of surfaces with different slope-area distributions are known:

a Uni-directional single-slope distribution: non-isotropic surface, all facets have the same slope and all cavities are aligned in the same direction.

b Isotropic single-slope distribution: all facets have the same slope but are uniformly distributed in orientation on the surface plane.

c Gaussian distribution: the most general case, where the slope-area distribution is assumed to be normal with zero mean.

Looking at a single V-cavity with both facets fully illuminated from a source on the right side, it turns out that, if both facets are Lambertian with equal albedo, the left facet appears brighter than the right one because it receives more incident light. For a distant observer viewing from the left, a larger fraction of the foreshortened cavity area is dark and a smaller one is bright. Moving to the right, the fraction of brighter area increases while the darker one decreases, so the total brightness or radiance of the cavity increases as the observer approaches the source direction.
For computing the contribution of a facet to the radiance of the surface patch, the projected area da cos θa is needed instead of the actual facet area da. The radiance contribution thus determined is called the projection radiance of the facet:

Lrp(θa, φa) = dΦo(θa, φa) / ((da cos θa) cos θo dωo) (34)

The total radiance of the surface can be obtained as the average of the projection radiance of all facets on the surface:

Lr(θo, φo, θi, φi) = ∫_{θa=0}^{π/2} ∫_{φa=0}^{2π} P(θa, φa) Lrp(θa, φa) sin θa dφa dθa (35)

Using these equations with the uni-directional single-slope distribution model, the radiance varies with the viewing direction, so a rough Lambertian surface composed of tilted facets is non-Lambertian. The same result occurs if the model for the isotropic single-slope distribution is used. Further, the effects of shadowing, i.e. a facet is only partially illuminated because the adjacent facet casts a shadow on it, and masking, i.e. a facet is only partially visible to the sensor because the adjacent facet occludes it, have to be considered. Because of these effects a geometrical attenuation factor (GAF) is needed, which lies between zero and unity.
In this reflectance model, interreflections occur because light rays bounce between adjacent facets. This is especially significant for rough surfaces with high albedo values: when the surface is illuminated from large angles and viewed from the opposite side, none of the facets visible to the sensor are illuminated directly by the source, but they receive light from the adjacent facets that face the source and are therefore illuminated by it. In the case of a Lambertian surface the energy in an incident light ray diminishes rapidly with each interreflection bounce.
The Lambertian model is a special case of the model for the Gaussian slope-area distribution. What does this mean for machine vision? It is clear that incorrect modeling of reflectance leads to inaccurate results; therefore rough diffuse surfaces cannot be assumed to be Lambertian in reflectance. [Oren92]

4.3.4 Visual Appearance of matte Surfaces

Matte surfaces emerge from body reflection, which consists of the light that finds its way back to the surface after penetrating it. Lambert's law predicts that the radiance Lr of an ideal matte surface point is (p/π) cos θi, where the albedo or reflectivity p represents the fraction of the total incident light reflected by the surface, and θi is the incident angle between the surface normal and the illumination direction. This law says that the brightness of a scene point is independent of the observer's viewpoint. Lambert's law may hold well for a single planar facet, but a collection of such facets with different orientations violates it, so it is only practical at close range and not at a distance.
Because all models motivated by the non-Lambertian reflectance of the moon are severely limited in scope, a new reflectance model was developed that describes the relation between the macroscopic surface and the sensor resolution. The surface patch imaged by each sensor element is modeled as a collection of V-shaped cavities, each with two planar Lambertian facets. This model captures the foreshortening of the individual facets, masking, shadowing and interreflections between adjacent facets:

L(θo, θi, φo − φi, σ) = (p/π) E0 cos θi (A + B Max[0, cos(φo − φi)] sin α tan β), (36)

A = 1.0 − 0.5 σ² / (σ² + 0.33),  B = 0.45 σ² / (σ² + 0.09),

where E0 is the intensity of the source, (θo, φo) and (θi, φi) are the observer and illuminant directions, α = Max(θo, θi) and β = Min(θo, θi). This can be viewed as a generalization of Lambert's law, which is recovered for σ = 0. But experiments have led to a curious observation [Nayar95]:

The model predicts that for very high macroscopic roughness, when the observer and the illuminant are close to one another, all surface normals will generate approximately the same brightness. This implies that a three-dimensional object, irrespective of its shape, will produce nothing more than a silhouette with constant intensity within. In case of polyhedra, edges between adjacent faces will no longer be discrete, and smoothly curved objects will be devoid of shading.
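Equation (36) is straightforward to evaluate; the roughness, albedo and angles below are hypothetical, and σ = 0 reduces the expression to the Lambertian term (p/π) E0 cos θi:

import math

def oren_nayar_radiance(theta_o, theta_i, dphi, sigma, albedo=0.9, E0=1.0):
    """Radiance of a rough diffuse surface after equation (36); sigma = 0 gives Lambert."""
    A = 1.0 - 0.5 * sigma ** 2 / (sigma ** 2 + 0.33)
    B = 0.45 * sigma ** 2 / (sigma ** 2 + 0.09)
    alpha = max(theta_o, theta_i)
    beta = min(theta_o, theta_i)
    return (albedo / math.pi) * E0 * math.cos(theta_i) * (
        A + B * max(0.0, math.cos(dphi)) * math.sin(alpha) * math.tan(beta))

if __name__ == "__main__":
    args = dict(theta_o=math.radians(45), theta_i=math.radians(30), dphi=0.0)
    print(oren_nayar_radiance(sigma=0.0, **args))   # Lambertian value
    print(oren_nayar_radiance(sigma=0.5, **args))   # same geometry with roughness sigma = 0.5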

4.3.5 Specular Surfaces

Another important class of surfaces is formed by the glossy or mirror-like surfaces, also called specular surfaces. An ideal specular reflector behaves like an ideal mirror: radiation arriving along a particular direction can leave only along the specular direction. But only few surfaces can be approximated as ideal specular reflectors. Normally the radiation arriving in one direction leaves in a small lobe of directions around the specular direction, which causes a blurry effect.


Figure 11: The Phong model (specular lobe around the specular direction for light from a point source)

If larger specular lobes occur, the specular image is more heavily disturbed and darker. The Phong model, illustrated in Figure 11, which is used to describe the shape of the specular lobe in terms of the offset angle from the specular direction, assumes that only point light sources are specularly reflected. The radiance leaving a specular surface is proportional to cosⁿ(δθ), which is of course the same as cosⁿ(θo − θs), where θo is the exit angle, θs is the specular direction and n is a parameter. If n is large, the lobe is narrow and there are small, sharp specularities, while a small n leads to a broad lobe and large specularities with fuzzy boundaries. [Forsyth03]
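The cosⁿ lobe of the Phong model can be evaluated in one line; the exponents below merely illustrate a narrow versus a broad lobe:

import math

def phong_lobe(delta_theta, n):
    """Relative radiance of the specular lobe, proportional to cos^n of the offset angle."""
    return max(0.0, math.cos(delta_theta)) ** n

if __name__ == "__main__":
    for offset_deg in (0, 5, 15, 30):
        d = math.radians(offset_deg)
        # large n: narrow lobe with sharp specularities; small n: broad, fuzzy lobe
        print(offset_deg, round(phong_lobe(d, n=100), 3), round(phong_lobe(d, n=5), 3))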

5 Conclusion

In the introduction I described the human eye, which works similarly to a camera: the pupil can be seen as a pinhole, the retina as the image plane, and the crystalline lens works like the lens of a camera.
Then, in Chapter 2 - Cameras, I introduced some projection models, beginning with the oldest one, the pinhole perspective projection, followed by the weak-perspective and the orthographic projection models, which are all models for pinhole cameras without lenses. After this I explained why lenses are important in camera usage and gave the paraxial refraction equation. This was completed by a section about thin and real (thick) lenses and a description of how these lenses work. I finished this chapter with an introduction to sensing and CCD cameras.
In Chapter 3 - Geometric Camera Models I gave an overview of the elements of analytical Euclidean geometry and how they influence the use of cameras and rigid transformations. These are used for computing the intrinsic and extrinsic camera parameters, which describe the camera's coordinate system in relation to the world coordinate system. Lastly, I described how affine projection works, using weak-perspective projection and paraperspective projection.
In Chapter 4 - Radiometry I gave a brief summary of the behaviour of light in space and at surfaces. At last I explained some special cases of radiometry, namely radiosity and Lambertian surfaces and albedo, which lead to a generalization of the Lambertian model and implications for machine vision. This model leads to a look at the visual appearance of matte surfaces and, in contrast to them, at specular surfaces. This report does not cover the whole field of cameras in computer vision; an important area that is left out is camera calibration.


For the proper use of cameras in computer vision one has to know not only about camera types and models but also about camera calibration. The field of radiometry is not yet fully explored, as seen in the example of the model for matte surfaces developed by Oren and Nayar in 1995.

6 References

[Forsyth03] David A. Forsyth, Jean Ponce - Computer Vision, A Modern Approach, Chapters 1, 2 and 4, Prentice Hall, 2003

[Westlund] Harold B. Westlund, Gary W. Meyer - A BRDF Database Employing the Beard-Maxwell Reflection Model

[Nayar95] Shree K. Nayar, Michael Oren - Visual Appearance of Matte Surfaces, Science, Vol. 267, Feb. 1995

[Oren92] Michael Oren, Shree K. Nayar - Generalization of the Lambertian Model and Implications for Machine Vision, International Journal of Computer Vision, Vol. 14:3, Nov. 1992

[Baumann02] Martin Baumann - Einführung in die Medizin für Naturwissenschaftler und Ingenieure, Sinne 1, Sinnesrezeptoren, Auge, WS 2001/2002

[Rincon] IES El Rincón: http://usuarios.lycos.es/biologiacelular/Ojo.htm, retrieved Feb. 4, 2005