DEVELOPMENT OF A MULTI-CRITERIA EVALUATION SYSTEM TO ...

From the Institute of Animal Breeding and Husbandry

of the Faculty of Agricultural and Nutritional Sciences

of the Christian-Albrechts-Universität zu Kiel

___________________________________________________________________

DEVELOPMENT OF A MULTI-CRITERIA EVALUATION

SYSTEM TO ASSESS ANIMAL WELFARE

Dissertation

submitted for the doctoral degree

of the Faculty of Agricultural and Nutritional Sciences

of the Christian-Albrechts-Universität zu Kiel

submitted by Ing. agr. Paula Martín Fernández

from Madrid, Spain

Dean: Prof. Dr. Eberhard Hartung

First examiner: Prof. Dr. Joachim Krieter

Second examiner: Prof. Dr. Eberhard Hartung

Date of oral examination: 23.01.2015

___________________________________________________________________

The dissertation was prepared with the gratefully acknowledged financial support of the

Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung)

within the competence network for agricultural and nutrition research PHÄNOMICS.



To my parents


TABLE OF CONTENTS

GENERAL INTRODUCTION

CHAPTER ONE

Comparison of methods to develop a multi-criteria evaluation system to assess animal welfare

CHAPTER TWO

Development of a multi-criteria evaluation system to assess growing pig welfare

CHAPTER THREE

Validation of a multi-criteria evaluation model for animal welfare

Annex

GENERAL DISCUSSION

GENERAL SUMMARY

ZUSAMMENFASSUNG

ACKNOWLEDGMENTS

CURRICULUM VITAE

GENERAL INTRODUCTION

Concern about livestock living conditions has increased considerably in the last few

years. Consumers are increasingly linking animal welfare indicators with food safety

and quality. These consumer preferences create economic incentives for stakeholders

to meet animal welfare standards, as established by legislation or voluntary certification

schemes (Vapnek and Chapman, 2010). It is generally accepted that animal

welfare is a multi-dimensional concept which comprises several aspects such as the

absence of thirst, hunger, discomfort, disease, pain, injuries and stress, and the presence

of normal behavioural expressions (the classical five freedoms (Farm Animal Welfare

Council (FAWC), 1992)). The EU Welfare Quality® (WQ) project developed several

protocols for the assessment of welfare of cattle, pigs and poultry (Botreau et al., 2009).

The inputs for the WQ protocols are the on-farm welfare measures described in the

protocols. Information at measure level may be useful for farm management purposes;

however, labelling purposes require a certain level of aggregation of the measures into

overall scores. A multi-criteria evaluation model is therefore required for the

evaluation of an animal unit (farm, slaughterhouse). The WQ protocols proposed a

multi-criteria evaluation system to aggregate the information of the welfare measures

into an overall assessment. Different operators (e.g., I-spline functions, decision trees,

weighted sums or Choquet integrals) were used for this purpose (Botreau et al., 2008).

The main drawback of the multi-criteria evaluation system proposed in the WQ

protocols is that it lacks transparency and flexibility with respect to the I-spline

functions and the different aggregation operators used. There are other ways of

approaching the multi-criteria evaluation problem that differ from the ones used by the

WQ multi-criteria evaluation model, e.g., the multi-attribute utility theory (MAUT),

ELECTRE or the Analytic Hierarchy Process (AHP). In the MAUT, uni-dimensional

utility functions, which correspond to each criterion, are aggregated into a single global

utility function combining the whole of the criteria (Keeney and Raiffa, 1976), whereas

by using ELECTRE (outranking procedure) only the preference relations of pairs of

alternatives are aggregated (Roy, 1971); whilst in the Analytic Hierarchy Process

‘children’ nodes of a common ‘parent’ are aggregated using pair-wise comparisons

(Saaty, 1980). This thesis focuses on the MAUT. The application of MAUT consists of

two separate steps: the determination of the utility functions and the determination of

the aggregation function. A large number of methods have been proposed to determine the

utility function in MAUT, for instance the standard sequences method described by Bouyssou

et al. (2000) and the MACBETH method described by Bana e Costa et al. (1999).

Examples of aggregation functions in MAUT are the weighted sum, the ordered

weighted average (Yager, 1988) and the Choquet integral (Choquet, 1953; Murofushi

and Sugeno, 1989; Grabisch, 1997).

Chapter One contains a comparison of different MAUT methods which can be applied

to produce an overall evaluation of animal welfare in the context of certification

schemes. The methods were compared with regard to their potential to

solve the main difficulties that, according to the literature, such a model faces, namely that

criteria may differ in importance and that interactions may exist between them. This is

a key aspect, since the welfare criteria may not fully compensate for each other (Botreau

et al., 2007). Two utility function determination methods (the standard sequences

method and the MACBETH method) and two aggregation functions (the weighted sum

and the Choquet integral (CI)) were compared. In the framework of MAUT, the use of

the MACBETH method together with the CI seemed to be the model which best

solved the difficulties presented.

In order to compare the different methodologies which could be used in the context of

MAUT, a theoretical model of a welfare assessment for growing pigs was used

considering only four criteria: good feeding, good housing, good health and appropriate

behaviour. Therefore, in Chapter Two, the application of the MACBETH method

together with the CI to a real welfare assessment, namely the WQ protocol for

growing pigs (Welfare Quality, 2009), was presented by means of examples.

Throughout this study the different multi-criteria methods used in the WQ protocol

were also compared with the single methodology proposed in this study.

After the development of any multi-criteria evaluation system, a validation of the model

must be carried out in order to prove that it works as intended in practical conditions

(Qureshi et al., 1999). In Chapter Three, the MAUT methodology proposed in

Chapter Two was implemented to aggregate welfare data which were collected on

different growing pig farms in Schleswig-Holstein, Germany. In total, 44 observations

were carried out. The whole WQ assessment protocol for growing pig farms was

implemented in each observation. The results obtained for each observation were

compared with the results obtained by implementing the multi-criteria methodology

proposed in the WQ protocol. Also, the influence of variations in the welfare measure

values was estimated in order to assess the sensitivity of the model.

Overall, the thesis provides a multi-criteria evaluation model for animal welfare, the use

of which has been implemented in the context of the Welfare Quality® protocol for

growing pigs.

References

Bana e Costa CA, de Corte JM and Vansnick JC 1999. The MACBETH approach:

Basic ideas, software, and an application, in: Meskens, N., Roubens, M., (Eds.),

Advances in Decision Analysis. Kluwer Academic Publishers, Book Series:

Mathematical Modelling: Theory and Applications, vol. 4, pp.131-157.

Botreau R, Bracke MBM, Perny P, Butterworth A, Capdeville J, van Reenen CG and

Veissier I 2007. Aggregation of measures to produce an overall assessment of

animal welfare. Part 2: Analysis of constraints. Animal 1, 1188-1197.

Botreau R, Capdeville J, Perny P and Veissier I 2008. Multi-criteria evaluation of

animal welfare at farm level: an application of MCDA methodologies.

Foundations of Computing and Decision Science. 33, 1-18.

Botreau R, Veissier I and Perny P 2009. Overall assessment of animal welfare: Strategy

adopted in Welfare Quality. Animal Welfare. 18, 363-370.

Bouyssou D, Marchant T, Perny P, Pirlot M, Tsoukias A and Vincke P 2000. Evaluation

and decision models: A critical perspective. Kluwer, Dordrecht.

Choquet G 1953. Theory of capacities. Annales de l’Institut Fourier. 5, 131-295.

Farm Animal Welfare Council 1992. FAWC updates the five freedoms. The Veterinary

Record, 17, 357.

Grabisch M 1997. K-Order additive discrete fuzzy measures and their interpretation.

Fuzzy sets and systems. 92, 167-189.

Keeney RL and Raiffa H 1976. Decisions with multiple objectives: Preferences and

value tradeoffs. Wiley, New York.

Murofushi T and Sugeno M 1989. An interpretation of fuzzy measure and the Choquet

integral as an integral with respect to a fuzzy measure. Fuzzy sets and systems.

29, 201-227.

Qureshi ME, Harrison SR and Wegener MK 1999. Validation of multi-criteria analysis

models. Agricultural Systems. 62, 105-116.

Roy B 1971. Problems and methods with multiple objective functions. Mathematical

Programming. 1, 239-266.

Saaty TL 1980. The Analytic Hierarchy Process: Planning, priority setting, resource

allocation. McGraw-Hill, New York.

Vapnek J and Chapman M 2010. Legislative and regulatory options for animal welfare.

FAO Legislative study 104, FAO, Rome.

Welfare Quality 2009. Welfare Quality® Assessment Protocol for Growing Pigs.

Lelystad: Welfare Quality® Consortium.

Yager R 1988. On ordered weighted averaging operators in multi-criteria decision

making. IEEE Transactions on Systems, Man and Cybernetics. 18, 183-190.

CHAPTER ONE

Comparison of methods to develop a multi-criteria

evaluation system to assess animal welfare

P. Martín 1, I. Traulsen 1, C. Buxadé 2 and J. Krieter 1

1 Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Kiel, Germany

2 Animal Production Department, Polytechnic University, Madrid, Spain

Abstract

The aim of this paper was to create a model to review different methodologies which

can be applied to produce an overall evaluation of animal welfare in the context of

certification schemes. The methodologies were reviewed with regard to their

potential to solve the main difficulties that, according to the literature, such a

model faces. Welfare Quality® distinguishes four welfare criteria (good feeding, good

housing, good health and appropriate behaviour). Data for growing pig farms were

generated, with each farm receiving one score for each welfare criterion. Ten farms were

used as learning data and the complete dataset generated was used to exemplify the

differences between the methods. The multi-attribute utility theory (MAUT) was used

to produce an overall value of welfare. The utility functions and the aggregation

function were constructed in two separate steps. First, utility functions for each

criterion were determined in two different ways, using the standard sequences method

(SS) and the MACBETH software. In the second step, the weighted sum (WS) and the

Choquet integral (CI) were used as aggregation functions. The utilities derived from

MACBETH allowed us to model more adequately the preferences of the decision-maker

regarding the different importance of the criteria and the interaction between them than

the SS method. A comparison of the WS and the CI results obtained from each method

was carried out. The results showed that there were interactions between the criteria;

assuming independence among the criteria led to important differences in the

classification of the farms.

Keywords: Animal Welfare, assessment, methods, pigs.

1 Introduction

Concern about livestock living conditions has increased considerably in the last few

years and consumers have also been increasingly linking animal welfare indicators with

food safety and quality. These consumer preferences create economic incentives for

stakeholders to meet animal welfare standards, as established by legislation or voluntary

certification schemes (Vapnek and Chapman, 2010). It is generally accepted that

animal welfare is a multidimensional concept which comprises several aspects such

as the absence of thirst, hunger, discomfort, disease, pain, injuries and stress, and the

presence of normal behavioural expressions (the classical five freedoms (Farm Animal

Welfare Council (FAWC), 1992)). The assessment of animal welfare must therefore

be based on several measures. Information at measure level may be useful for farm

management purposes; however, labelling purposes require a certain level of

aggregation into overall scores (Blokhuis et al., 2010). To determine an overall level of

animal welfare, measures need to be combined. Although it has been argued that

science should not attempt to perform overall welfare assessments because value

judgements are inherently involved (Fraser, 1995), others state that an overall welfare

assessment is not arbitrary and a high level of accuracy can be achieved (Bracke et al.,

1999). In spite of the different viewpoints, various models have been developed to

assess overall levels of animal welfare. More recently, Welfare Quality (WQ) has

developed several protocols for the overall assessment of the welfare of cattle, pigs and

poultry (Welfare Quality, 2009).

A common feature of all the approaches in multi-criteria decision-making is the need

for an aggregation operator. In the multi-attribute utility theory (MAUT), uni-

dimensional utility functions which correspond to each criterion are aggregated into a

single global utility function combining all the criteria (Keeney and Raiffa, 1976),

whereas in ELECTRE (outranking procedure) the preference relations on pairs of

alternatives are aggregated (Roy, 1971) and in the Analytic Hierarchy Process (AHP)

‘children’ nodes of a common ‘parent’ are aggregated using pair-wise comparisons

(Saaty, 1980). Examples of aggregation functions in MAUT are the weighted sum

(WS), the ordered weighted average (Yager, 1988) and the Choquet integral (CI)

(Murofushi and Sugeno, 1989). The most common aggregation tool still used today is

the WS, despite its well-known drawbacks. The WS can be used as an aggregator when

mutual preferential independence among criteria is assumed. However, in practice, this

mutual preferential independence is rarely satisfied. In order to be able to take into

account the interaction between the criteria, Sugeno (1974) proposed replacing the

weight vector involved in the calculation of the WS with a fuzzy measure (also called

a capacity). Fuzzy integrals, such as the CI, are defined with respect to a fuzzy

measure. For the CI, the capacity can be seen as an extension of the weight

vector used in the WS (Grabisch et al., 2008). The distinguishing feature of a CI

is that it is able to represent a certain kind of interaction, ranging from redundancy

(negative interaction) to synergy (positive interaction) (Grabisch, 1996).

The aim of this study was to create a model to compare different methodologies which

can be used in the context of the MAUT. These could then be applied to develop a

multidimensional estimation system in order to produce an overall evaluation of animal

welfare in the context of certification schemes. In the framework of MAUT, a

comparison was undertaken between two methods of utility function determination (the

standard sequences method and the MACBETH method) and two aggregation methods

(the WS and the CI). These methods were compared with the objective of finding

the method which best solves the main difficulties that, according to the literature,

such a model faces: criteria may have different levels of importance, and interactions

may exist between them. This is a key aspect, since the welfare criteria may not fully

compensate for each other (Botreau et al., 2007b).

2 Material and methods

2.1 Data

In order to compare the different methodologies which can be used in the context of

MAUT, a theoretical model of a welfare assessment for growing pigs was used

considering four criteria, good feeding (F), good housing (Ho), good health (He) and

appropriate behaviour (B), corresponding to the four main WQ principles. Each of these

criteria was assessed by a different number of measures. Each measure used to check a

criterion could take the value 0 or 1 (absence or presence). The criterion value was

obtained as a linear combination (sum) of the values of its measures, so that a criterion

assessed by three measures, for example, can take four different values: 0, 1, 2 and 3.

Good feeding was defined by 4 measures and could thus vary between 0 (worst) and 4

(best), good housing by 7 measures, varying between 0 and 7, good health by 13

measures, varying between 0 and 13, and appropriate behaviour by 4 measures, varying

between 0 and 4. These scales were defined in this way, instead of establishing intervals

between 0 and 100, so that they represent raw data which have not been interpreted in

terms of welfare. This makes it possible to study the potential of the different methods

to work, in a future step of the project, with measures collected in different units or scales.
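The construction of a criterion value from its binary measures can be illustrated with a minimal sketch. The function name and the example measure values are hypothetical and serve only to show the summation described above.

```python
def criterion_score(measures):
    """Sum binary measure values (0 = absent, 1 = present) into one criterion score."""
    if any(value not in (0, 1) for value in measures):
        raise ValueError("each measure must be scored 0 or 1")
    return sum(measures)


# Hypothetical example: a farm meeting 5 of the 7 Housing measures.
housing_measures = [1, 1, 0, 1, 1, 0, 1]
housing_score = criterion_score(housing_measures)
print(housing_score)
```

A criterion assessed by n measures can therefore take n + 1 different values, from 0 (worst) to n (best).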

Data from ten farms regarding the four criteria were selected as learning data (Table 1)

from which the decision-maker (DM) had to express his preferences. These consisted of

giving a partial weak order over the set of weights related to each criterion (W in Table

1), the sign of interaction between the 6 pairs of criteria ((F, Ho), (F, He), (F,B), (Ho,

He), (Ho, B), (He, B)) and a partial weak order (R) over the farms (Table 1) taking into

account both the different importance of the criteria and the interactions between them.

Farms a, b, c, d and e were selected to assess how the DM perceived the different

importance of the criteria. For these 5 farms, 3 of the criteria were assigned a good

value and only one of the criteria corresponded to a medium value. Farms g, h, i and j

were selected to assess how the DM perceived the interaction between a bad grade in

one criterion and medium values in the other criteria.

A second dataset consisting of 2,800 farms, formed by combining all the possible

values of the four criteria (5 × 8 × 14 × 5 = 2,800), was generated in order to assess the

influence of the different methods in absolute terms, rather than only through the

relative comparison of a small dataset (learning data).
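As a minimal sketch of how such a complete dataset could be enumerated, the following example combines all possible values of the four criteria. The variable and function names are assumptions made for illustration; only the number of measures per criterion is taken from the text.

```python
from itertools import product

# Number of binary measures per criterion, as given in the text.
MEASURES = {"feeding": 4, "housing": 7, "health": 13, "behaviour": 4}


def all_farms(measures=MEASURES):
    """Return every combination of criterion scores (each from 0 to the number of measures)."""
    ranges = [range(n + 1) for n in measures.values()]
    return list(product(*ranges))


farms = all_farms()
print(len(farms))  # 5 * 8 * 14 * 5 combinations
```

Each element is a tuple (Feeding, Housing, Health, Behaviour), so the first farm scores worst on every criterion and the last farm scores best on every criterion.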

Table 1. Criteria values for each selected farm (learning data) and initial preferences of

the decision-maker.

Farm   Feeding¹   Housing²   Health³   Behaviour⁴   R
a      2          5          10        3            1
b      3          3          10        3            2
c      3          5          7         3            3
d      3          5          10        2            4
e      3          5          6         3            5
f      2          3          6         2            6
g      0          3          6         2            7
h      2          1          6         2            8
i      2          3          4         2            9
j      2          3          6         1            10
W      +          ++         +++       +++

¹ Feeding values can vary between 0 (worst) and 4 (best).

² Housing values can vary between 0 (worst) and 7 (best).

³ Health values can vary between 0 (worst) and 13 (best).

⁴ Behavioural values can vary between 0 (worst) and 4 (best).

R: DM’s ranking over the farms.

W: Initial notions of the DM about the importance of the weights.

Bad grade; medium grade; good grade.

2.2 General methodology

The MAUT was used to produce an overall value of welfare starting from the data

regarding the four main criteria. The utility functions and the aggregation functions

were constructed in two separate steps (Figure 1). A comparison was made between

the two methods of utility function determination, i.e. the standard sequences method

(SS) described by Bouyssou et al. (2000) and the MACBETH method described by

Bana e Costa et al. (1999), and also two aggregation methods, i.e. the WS and the CI.

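Figure 1. General methodology followed in the study. [Diagram: for each criterion (Feeding, Housing, Health, Behaviour), individual utilities are determined with the SS and MACBETH methods; these are then aggregated into an overall utility with the weighted sum and the Choquet integral.]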

The results obtained via the different utility function determination methods and the

aggregation operators were also compared. The rankings of the overall utilities obtained

for the 10 farms selected as learning data were compared. However, in order to obtain

an absolute impression - not limited to the relative comparison of a small dataset - of the

influence of taking the interactions between the criteria into account, four welfare

categories were defined which match the ones proposed by Welfare Quality (2009):

unacceptable (overall utility < 20), acceptable (overall utility > 20 but < 55), enhanced

(overall utility > 55 but < 80) and excellent (overall utility > 80). The MACBETH

overall utilities obtained for the complete dataset (2,800 farms) through the WS and the

CI were each classified into one of the four categories, and the number of farms assigned to

each welfare category was compared between aggregation methods.
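The classification step can be sketched as follows. The thresholds are those stated above; the text does not specify where a utility of exactly 20, 55 or 80 belongs, so assigning boundary values to the higher category is an assumption of this sketch, as are the function names.

```python
from collections import Counter


def welfare_category(overall_utility):
    """Map an overall utility (0-100) to one of the four welfare categories.

    Assumption: a value lying exactly on a threshold is assigned to the
    higher category, because the text leaves the boundaries open.
    """
    if overall_utility < 20:
        return "unacceptable"
    if overall_utility < 55:
        return "acceptable"
    if overall_utility < 80:
        return "enhanced"
    return "excellent"


def category_counts(utilities):
    """Count how many farms fall into each welfare category."""
    return Counter(welfare_category(u) for u in utilities)


# Hypothetical overall utilities for five farms.
counts = category_counts([10, 30, 60, 90, 95])
print(counts)
```

Running `category_counts` once on the WS utilities and once on the CI utilities would give the two distributions that are compared between aggregation methods.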

2.3 MAUT - Utilities determination

For the utility function determination, each criterion was considered separately. The

utility function u_i represents the preferences of the DM over the values of criterion X_i. The utilities

can be seen as providing a numerical representation of the attractiveness of the different

values of the criteria for the DM. A large number of methods have been proposed to

determine the utility functions in an additive multi-attribute utility model; see von

Winterfeldt and Edwards (1986) for an accessible account of such methods. There are

essentially two families of methods, one based on direct numerical estimations and the

other on indifference judgements. We chose two methods from the latter category, the

MACBETH method (Bana e Costa et al., 1999) and the SS method (Krantz et al., 1971;

von Winterfeldt and Edwards, 1986; Wakker, 1989; Bouyssou et al., 2000),

since utilities given spontaneously might not be as reliable as utilities constructed by

following a methodology; see Bouyssou et al. (2006) and Bana e Costa et al.

(2004) for a deeper review. These two methods were chosen for two reasons: first, we

wanted to compare a methodology based on qualitative judgements (MACBETH) with

a method based on quantitative judgements (SS), and second, extensive

literature is available on these two methods.

2.3.1 Standard sequences method

To elicit a utility function (u_i), for example u_Ho corresponding to Housing, the SS

method starts by considering two hypothetical farms which differ only in the Feeding

and Housing criteria; the performance levels of the other criteria are held equal

(ceteris paribus). It is assumed that the two farms differ in Feeding by a noticeable

amount (1 point, for instance). An interval of this amplitude is located in the middle of

the range for Feeding, for example 1-2. A value for Housing is also set in the

middle of the range, say 3. The DM is then asked to assess a value of Housing (X_Ho)

such that he would be indifferent between the two farms (1, 3) and (2, X_Ho), where each

pair denotes (Feeding, Housing). The second question to the DM uses his answer to the

first question: if the answer was X_Ho = 2, he is asked to assess the value X'_Ho of Housing

that would leave him indifferent between the two farms (1, 2) and (2, X'_Ho). Continuing

along the same line would lead, for instance, to the following sequence of indifferences:

(1, 3) ~ (2, 2)

(1, 2) ~ (2, 0)

Then, similar questions are asked for the upper half of the Housing range, which

may lead to the following sequence of indifferences:

(1, 5) ~ (2, 3)

(1, 7) ~ (2, 5)

In other words, the DM considers that a farm with a score of 1 in Feeding and a score of

3 in Housing (considering ceteris paribus in the other two criteria) is equal in terms of

preference to a farm with a score of 2 in Feeding and a score of 2 in Housing. A farm

with a score of 1 in Feeding and a score of 2 in Housing is thus considered equal to a

farm with a score of 2 in Feeding and a score of 0 in Housing. A farm with a score of 1

in Feeding and a score of 5 in Housing is considered equal to a farm with a score of 2 in

Feeding and a score of 3 in Housing, and finally a farm with a score of 1 in Feeding and

a score of 7 in Housing is considered equal to a farm with a score of 2 in Feeding and a

score of 5 in Housing. Such a sequence gives the analyst an approximation of the

single-attribute utility function for Housing, u_Ho: the successive Housing values 0, 2, 3,

5 and 7 are separated by equal steps in utility, since each step compensates for the same

Feeding interval. The final step is to normalise the individual utility function of each

criterion to a 0-100 interval in order to be able to

aggregate the marginal utility functions for the different criteria.
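The step from the sequence of indifferences to a normalised utility function can be sketched as follows. The sketch uses the Housing sequence 0, 2, 3, 5, 7 from the example above and places these values at equal utility steps on a 0-100 scale. The linear interpolation for the Housing values that do not appear in the sequence (1, 4 and 6) is an assumption of this sketch, as is the function name.

```python
def utility_from_standard_sequence(sequence, levels):
    """Build a 0-100 utility scale from a standard sequence.

    sequence: increasing criterion values separated by equal utility steps.
    levels:   criterion values for which a utility is wanted.
    Values between two sequence points are linearly interpolated (assumption);
    values outside the sequence range are left out.
    """
    step = 100.0 / (len(sequence) - 1)
    anchors = {value: index * step for index, value in enumerate(sequence)}
    utilities = {}
    for level in levels:
        if level in anchors:
            utilities[level] = anchors[level]
            continue
        for low, high in zip(sequence, sequence[1:]):
            if low < level < high:
                fraction = (level - low) / (high - low)
                utilities[level] = anchors[low] + fraction * (anchors[high] - anchors[low])
                break
    return utilities


# Housing sequence taken from the indifference statements in the example.
u_housing = utility_from_standard_sequence([0, 2, 3, 5, 7], range(8))
print(u_housing)
```

In this illustration, the step from Housing 2 to Housing 3 is worth as much as the step from 3 to 5, which reflects the unequal spacing of the indifference points stated by the DM.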

To determine u_He (Health) and u_B (Behaviour) in the same way as for Housing, a

successive search was carried out for intervals on the Health and Behaviour scales

which would exactly compensate for the Feeding interval 1-2 in terms of preference.

Finally, the same procedure was followed for Feeding itself (u_F), fixing an interval,

for instance 2-3, on the Housing scale.

2.3.2 MACBETH

MACBETH (Measuring Attractiveness by a Categorical Based Evaluation Technique)

is a methodology described by Bana e Costa et al. (1999), which requires only

qualitative judgements to quantify the relative attractiveness (utilities) of options

(farms). To elicit a marginal utility function (u_i) using the MACBETH software, for

example u_Ho corresponding to Housing, the first step is to fill in a matrix, giving

qualitative judgements regarding the difference of attractiveness between the different

quantitative performance levels of the criterion. For instance, for Housing, the

quantitative performance levels vary between 0 and 7. The qualitative judgements of

difference can be rated as ‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or

‘extreme’ (Figure 2a).

Figure 2a. MACBETH matrix of qualitative judgements. Quantitative performance

levels for Housing.

As each judgement is given, the software automatically verifies the matrix’s consistency

(Figure 2b), and suggests judgement modifications which can be made to fix any

detected inconsistency (Figure 2c).

Figure 2b. MACBETH matrix of qualitative judgements. Example of building a

consistent matrix.

Figure 2c. MACBETH matrix of qualitative judgements. Example of inconsistency.

From the complete and consistent matrix of judgements, MACBETH creates a

numerical scale (Figure 2d). With the numerical scale, MACBETH produces the

marginal utility function (u_i) for each criterion. The range in which the utilities vary was

defined in this study as 0-100 in order to be consistent with the SS method.

Figure 2d. MACBETH matrix of qualitative judgements. Complete matrix of

judgments and numerical scale.
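The idea of a consistency check on a matrix of qualitative judgements can be illustrated with a simplified sketch. It is not the procedure implemented in the MACBETH software; it only tests one necessary condition for a numerical scale to exist: for three levels a, b and c in decreasing order of attractiveness, the judged difference between a and c cannot fall into a weaker category than the difference between a and b or between b and c. The function name and the example judgements are hypothetical.

```python
from itertools import combinations

# Difference-of-attractiveness categories, from weakest to strongest.
CATEGORIES = ["very weak", "weak", "moderate", "strong", "very strong", "extreme"]
RANK = {name: position + 1 for position, name in enumerate(CATEGORIES)}


def find_inconsistencies(levels, judgements):
    """Return pairs of judgements that violate the necessary condition.

    levels:     performance levels ordered from most to least attractive.
    judgements: {(better, worse): category} for every pair of levels.
    """
    problems = []
    for a, b, c in combinations(levels, 3):  # a preferred to b, b preferred to c
        outer = RANK[judgements[(a, c)]]
        for inner_pair in ((a, b), (b, c)):
            if RANK[judgements[inner_pair]] > outer:
                problems.append((inner_pair, (a, c)))
    return problems


# Hypothetical judgements for three Housing levels (7 best, 3 worst).
consistent = {(7, 5): "weak", (5, 3): "moderate", (7, 3): "strong"}
inconsistent = dict(consistent)
inconsistent[(7, 3)] = "very weak"

print(find_inconsistencies([7, 5, 3], consistent))
print(find_inconsistencies([7, 5, 3], inconsistent))
```

A matrix that passes this check is not necessarily consistent in the full MACBETH sense, which also compares differences between non-nested pairs of levels.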

2.4 MAUT - Aggregation methods

In the second step, all the criteria were considered together. Here, the weighted sum

(WS) and the Choquet integral (CI) were used as aggregation functions in order to

evaluate how the output differs when the interactions between criteria are taken into

account (CI) and when the welfare criteria are treated as independent (WS).

2.4.1 Weighted sum

Applying the SS technique and the MACBETH method each yielded 4 utility functions,

in which 0 was the worst performance and 100 was the best performance for each

criterion. Weights are needed to combine these values additively using

the WS. The DM was asked to provide some initial notions on the importance of the

weights (W in Table 1). A test was then performed to determine whether the same

weighting vector was obtained when two different methods were used to elicit

it:

Firstly, the weights were elicited following a method suggested by Bouyssou et al. (2006) and first described by

Keeney and Raiffa (1976). The advantage of this technique is that the weights are not

obtained by asking the DM to give the value of the parameters (direct rating procedure).

Instead, the DM is asked to rank alternatives, and the different importance values of the

criteria are determined from this ranking, following a determined procedure which uses

the utility functions previously determined.

Secondly, the weighting of the criteria was performed within the MACBETH software

following the same procedure as described for the elicitation of the utilities, in other

words, giving qualitative judgements regarding the difference of attractiveness between

criteria.

The same weighting vector was obtained using the Keeney and Raiffa (1976) technique

and the MACBETH methodology. The utilities calculated by the SS method and the

MACBETH methodology were aggregated with the WS using the weighting vector

obtained.
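The aggregation with the WS can be sketched as follows. The weights and utilities below are hypothetical values chosen only to respect the order of importance given by the DM in Table 1 (W); they are not the weighting vector elicited in the study.

```python
def weighted_sum(utilities, weights):
    """Aggregate criterion utilities (0-100) with weights that sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[criterion] * utilities[criterion] for criterion in weights)


# Hypothetical weights (F < Ho < He = B) and hypothetical utilities for one farm.
weights = {"F": 0.10, "Ho": 0.20, "He": 0.35, "B": 0.35}
utilities = {"F": 50.0, "Ho": 75.0, "He": 60.0, "B": 80.0}

overall = weighted_sum(utilities, weights)
print(round(overall, 2))
```

Because the WS is additive, a low utility in one criterion can always be offset by high utilities in the others, which is the full compensation that the CI is meant to limit.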

2.4.2 Choquet integral

In order to combine the 4 utility functions calculated by the SS technique or by the

MACBETH method using the CI, the first step was the capacity identification.

Capacities can be regarded as an extension of the weighting vector involved in the calculation of weighted

sums. Seen as an aggregation operator, the CI with respect to the capacity can be

considered as taking into account the different importance of the criteria and the

interaction between criteria. The overall importance of a criterion can be measured by

its Shapley value and the interaction between criteria can be measured by the interaction

indices. The interaction phenomena among criteria can be very complex and difficult to

identify. Different forms of dependence exist, for instance, correlation,

substitutive/complementary, and preferential dependence (Marichal, 2000). In this

study, the DM regarded the criteria as complementary (positive interaction) or

substitutive (negative interaction). According to the definition of Marichal (2000),
substitutiveness between criteria arises when a decision maker considers
that the satisfaction of only one criterion produces almost the same effect as the
satisfaction of both. Of course, it is better that both criteria are satisfied, but this is
less important. For instance, in this study, two criteria i and j
would be regarded as substitutive when it is important that farms are good at criterion i
or j (in other words, compensation is allowed between them), whereas they would be considered
complementary when, for the DM, the satisfaction of only one criterion produces a very
weak effect compared with the satisfaction of both.
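The notions of capacity, Shapley value and interaction index can be made concrete with a small sketch. It assumes a 2-additive capacity and uses the standard relation in which the Möbius mass of a single criterion is its Shapley value minus half the sum of its interaction indices, while the mass of a pair equals its interaction index. The coefficients are the rounded MV’ values reported for MACBETH in Table 4 and the farm is farm a from Table 3. The function names are illustrative only and are not part of the Kappalab package.

```python
from itertools import combinations

CRITERIA = ("F", "Ho", "He", "B")

# Rounded Shapley values and interaction indices reported for MACBETH, MV' (Table 4).
SHAPLEY = {"F": 0.139, "Ho": 0.241, "He": 0.309, "B": 0.311}
INTERACTION = {frozenset(p): 0.05 for p in combinations(CRITERIA, 2)}


def capacity(subset):
    """Capacity of a coalition of criteria under a 2-additive model."""
    subset = frozenset(subset)
    # Moebius mass of a single criterion: Shapley value minus half of its interactions.
    singles = sum(
        SHAPLEY[i] - 0.5 * sum(INTERACTION[frozenset((i, j))] for j in CRITERIA if j != i)
        for i in subset
    )
    # Moebius mass of a pair: its interaction index.
    pairs = sum(INTERACTION[frozenset(p)] for p in combinations(sorted(subset), 2))
    return singles + pairs


def choquet(utilities):
    """Discrete Choquet integral: sorted increments weighted by coalition capacity."""
    order = sorted(utilities, key=utilities.get)  # criteria from worst to best score
    total, previous = 0.0, 0.0
    for rank, criterion in enumerate(order):
        total += (utilities[criterion] - previous) * capacity(order[rank:])
        previous = utilities[criterion]
    return total


farm_a = {"F": 55, "Ho": 75, "He": 65, "B": 65}  # MACBETH partial utilities, Table 3
print(round(capacity(CRITERIA), 3), round(choquet(farm_a), 2))
```

With positive interaction indices, the capacity of a pair of criteria exceeds the sum of the capacities of its members, which is how complementarity limits compensation: a farm gains little from one good criterion unless the other is also good.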

The number of coefficients that define a capacity, and hence the number of variables involved in the CI, increases exponentially with the
number of criteria. For reasons of simplicity, it may be preferable to
restrict to 2-additive or 3-additive solutions (Grabisch et al., 2008), which in this study
corresponded to the definition of 10 or 14 coefficients respectively. We proposed
restricting the model to the 2nd order, thus assuming that interaction between more than
2 criteria does not exist. Although only 4 criteria were considered in this example, so that
the difference in the number of coefficients to be determined between a 2nd- and a
3rd-order model was small, more criteria may have to be considered in a further step of the project.
If, for instance, 6 health criteria are aggregated, 21 coefficients will be
needed with a 2-additive model and 41 with a 3-additive one.

Let us consider a decision problem involving a set X of n elements, here n = 4
(criteria). Defining a capacity on X requires the definition of $2^n$ coefficients. This
could be too complex to handle if n goes beyond, say, 8 (Grabisch, 1997). As a
consequence, it is frequent to assume that the capacity is additive, which identifies the
Choquet integral with the weighted arithmetic mean (Marichal, 2000) and can be
defined with only n coefficients. This avoids the complexity of using non-additive
capacities, but at the price of a very poor modelling tool that loses their richness
(Kojadinovic, 2007). The fundamental notion of k-additivity proposed by Grabisch
(1997) makes it possible to find an intermediate solution between the complexity of
representation and the richness of the model. k-additive measures with $k < n$ need fewer
than $2^n$ coefficients to be defined. Only n coefficients are needed for $k = 1$ (additive
capacity), $n(n+1)/2$ for $k = 2$, and in general $\sum_{i=1}^{k} \binom{n}{i}$ for k-additive measures.
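The coefficient counts quoted in this section (10 and 14 for 4 criteria, 21 and 41 for 6 criteria) can be reproduced by counting the non-empty subsets of at most k criteria. The following is a minimal sketch; the function name is an illustrative choice.

```python
from math import comb


def n_coefficients(n, k):
    """Number of coefficients of a k-additive capacity on n criteria.

    Counts the non-empty subsets of size at most k; for k = n this gives
    2**n - 1, i.e. every subset except the empty set.
    """
    return sum(comb(n, i) for i in range(1, k + 1))


for n in (4, 6):
    print(n, [n_coefficients(n, k) for k in (1, 2, 3)])
```

For 8 criteria, the count grows from 36 coefficients for a 2-additive model to 255 for an unrestricted capacity, which illustrates why restricting the order of the model matters.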

According to Mayag et al. (2011) given (x1,…, xn) the individual utilities for the

criteria, in this study (xF, xHo, xHe, xB) the individual utilities for Feeding, Housing,

Health and Behaviour respectively, the CI with respect to a 2-additive capacity can be

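written as follows:

$$C_\mu(x_1, \dots, x_n) = \sum_{i=1}^{n} v_i x_i - \frac{1}{2} \sum_{\{i,j\} \subseteq N} I_{ij} \, |x_i - x_j|$$

where $N = \{1, \dots, n\}$ is the set of criteria, $v_i$ represents the importance of criterion i (its Shapley value) and $I_{ij}$ represents the interaction
between criteria i and j.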
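The 2-additive expression of Mayag et al. (2011), i.e. the weighted sum of the utilities with the Shapley values as weights minus half the sum over all pairs of the interaction index multiplied by the absolute difference between the two utilities, can be evaluated directly. The sketch below is illustrative: the function name is hypothetical, and the coefficients are the rounded MV’ values reported for MACBETH in Table 4, so results for some farms may differ from Table 3 in the last digit.

```python
from itertools import combinations


def choquet_2additive(x, shapley, interaction):
    """CI for a 2-additive capacity: sum_i v_i x_i - 0.5 * sum_{i<j} I_ij |x_i - x_j|."""
    linear = sum(shapley[i] * x[i] for i in x)
    penalty = sum(
        interaction[frozenset((i, j))] * abs(x[i] - x[j])
        for i, j in combinations(x, 2)
    )
    return linear - 0.5 * penalty


criteria = ("F", "Ho", "He", "B")
shapley = {"F": 0.139, "Ho": 0.241, "He": 0.309, "B": 0.311}         # Table 4, MACBETH, MV'
interaction = {frozenset(p): 0.05 for p in combinations(criteria, 2)}  # all pairs 0.05

farm_a = {"F": 55, "Ho": 75, "He": 65, "B": 65}  # Table 3
farm_b = {"F": 80, "Ho": 50, "He": 65, "B": 65}  # Table 3

print(round(choquet_2additive(farm_a, shapley, interaction), 2))
print(round(choquet_2additive(farm_b, shapley, interaction), 2))
```

With positive interaction indices, the second term penalises unbalanced profiles: the larger the gap between two criteria, the more the overall utility falls below the plain weighted sum.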

There are different methods for capacity identification proposed in the literature. Most

of them can be stated as optimisation problems. The main differences between them are

the objective function and the preferential information they require as input. The

minimum variance approach was used, which requires only a partial order over the

farms as preference information. Capacity identification was implemented within the

Kappalab R package following the method described by Grabisch et al. (2008). The

utilities calculated using MACBETH and the SS method corresponding to the criteria

data for the 10 farms were used as subsets against which the capacity was to be

identified, in order for the CI to numerically represent the preferences of the DM with

respect to this capacity. The partial weak order over the farms (R in Table 1) given by

the DM was used for the implementation of the minimum variance approach (MV). A


non-negative indifference threshold for the ranking over the farms was defined, so that the
partial weak orders previously mentioned were translated into partial semi-orders with
fixed indifference thresholds (see Grabisch et al., 2008, for a detailed review). The values
of the thresholds had to be chosen carefully, since a very large indifference threshold
could have made the program infeasible (see Marichal and Roubens, 2000). The indifference
threshold for the ranking of the alternatives was set at
0.05.
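The role of the indifference threshold can be illustrated by checking whether a vector of overall utilities respects a ranking, i.e. whether each farm exceeds the next one in the ranking by at least the threshold. The sketch below only checks this condition; it does not perform the capacity identification itself. It assumes that the DM ranked the farms from a (best) to j (worst), as suggested by the results in Table 2, and uses the overall utilities reported there for the SS method. The function name and the numerical tolerance are illustrative choices.

```python
def respects_order(utilities, ranking, threshold=0.05, tol=1e-9):
    """True if every farm beats the next farm in the ranking by at least `threshold`."""
    return all(
        utilities[better] - utilities[worse] >= threshold - tol
        for better, worse in zip(ranking, ranking[1:])
    )


DM_ORDER = "abcdefghij"  # assumed ranking by the DM, best to worst

# Overall utilities for the SS method (Table 2).
WS_SS = dict(zip(DM_ORDER, [67.78, 65, 60.55, 59.44, 57.22, 37.78, 32.22, 29.44, 31.11, 32.22]))
CI_MV_SS = dict(zip(DM_ORDER, [67.70, 66.38, 61.86, 59.02, 58.97, 38.18, 32.50, 32.45, 32.40, 32.35]))

print(respects_order(WS_SS, DM_ORDER))     # weighted sum
print(respects_order(CI_MV_SS, DM_ORDER))  # Choquet integral, minimum variance
```

A larger threshold demands larger gaps between consecutive farms, which is why an overly large value can leave the identification problem without a feasible solution.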

After an initial calculation of the CI with the MV, a progressive interactive approach (MV’)
was developed in order to be in accordance with the DM’s initial preferences regarding
the importance of the criteria (Shapley values) and the interaction indices. Non-negative
indifference thresholds for the Shapley values and for the interaction indices
were defined. The indifference threshold above which two criteria were regarded as differing in importance
was 0.05, and the minimal absolute value of an interaction index to be considered as
significantly different from zero was also set at 0.05.

Additional constraints on the Shapley values were imposed so that the importance of the
criteria followed the order previously determined from the DM’s preferences (W in
Table 1), and additional constraints on the interaction indices were imposed so that the
criteria were regarded as complementary and compensation between them was limited
(positive interaction between the 6 pairs of criteria (F, Ho), (F, He), (F, B), (Ho, He),
(Ho, B) and (He, B)).
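The additional MV’ constraints can be expressed as a simple check on a set of coefficients. The sketch below assumes one plausible reading of the thresholds: two criteria are treated as equally important when their Shapley values differ by no more than 0.05, one criterion is more important than another when its Shapley value is larger by at least 0.05, and a pair is complementary when its interaction index is at least 0.05. The exact formulation used in the optimisation may differ; the function name is hypothetical and the coefficients are the rounded values from Table 4 for MACBETH.

```python
def satisfies_mv_prime(shapley, interaction, d_shapley=0.05, d_inter=0.05, tol=1e-9):
    """Check Health = Behaviour > Housing > Feeding and all pairs complementary."""
    f, ho, he, b = (shapley[c] for c in ("F", "Ho", "He", "B"))
    importance_ok = (
        abs(he - b) <= d_shapley + tol             # Health and Behaviour indifferent
        and min(he, b) - ho >= d_shapley - tol     # both more important than Housing
        and ho - f >= d_shapley - tol              # Housing more important than Feeding
    )
    complementary = all(v >= d_inter - tol for v in interaction.values())
    return importance_ok and complementary


PAIRS = ("F,Ho", "F,He", "F,B", "Ho,He", "Ho,B", "He,B")

mv = (
    {"F": 0.228, "Ho": 0.233, "He": 0.266, "B": 0.273},
    dict(zip(PAIRS, (-0.048, 0.019, 0.019, 0.018, 0.007, 0.022))),
)
mv_prime = (
    {"F": 0.139, "Ho": 0.241, "He": 0.309, "B": 0.311},
    dict(zip(PAIRS, (0.05,) * 6)),
)

print(satisfies_mv_prime(*mv), satisfies_mv_prime(*mv_prime))
```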

2.5 Estimation of the importance of the interactions between criteria

The utility functions determined before with MACBETH were used to produce a utility

value for each criterion for the 2,800 farms. In order to demonstrate the importance of

taking into account the interaction between the criteria to produce an overall assessment

of farm animal welfare, the individual utilities for each criterion were aggregated

additively (with a weighting vector, WS) and non-additively (with the CI). For the CI

aggregation, the coefficients (Shapley values and interaction indices) obtained by the

MV’ approach for the MACBETH method were used and for the WS aggregation only

the Shapley values of the MV’ approach (WSMV’) were used as weights. The objective

was to estimate the number of farms that changed their welfare category due to the

inclusion in the model of the interactions between the criteria and the limitation of the


compensation between them. Each one of the 2,800 farms was assigned to a welfare

category (unacceptable, acceptable, enhanced and excellent). The numbers of farms
assigned to each welfare category were compared between the case in which the criteria were considered
independent and the case in which the interactions between the criteria were taken into
account, limiting the compensation between them.

3 Results

3.1 Utility function determination methods

The differences between the utility functions calculated using the SS method and the

MACBETH method were in general minor, except for the lowest value of Behaviour,

where a difference between the utilities of both methods greater than 10 was found

(Figure 3).

Figure 3. Utility functions calculated using the SS method (−−−) and the MACBETH

method (───) for Feeding, Housing, Health and Behaviour

3.2 Aggregation methods - Weighted sum

The weighting vectors resulting from the Keeney and Raiffa technique and the
MACBETH method matched well and were in accordance with the initial preferences
of the DM (W in Table 1). For both methods, the importance of the criteria conformed
to the following sequence:

Health (0.3333) = Behaviour (0.3333) > Housing (0.2223) > Feeding (0.1111)

These weights were used for the aggregation of the individual utilities calculated using

the SS and the MACBETH methods.
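The weighted-sum aggregation can be sketched as follows. The reported weights are taken here as the fractions 1/9, 2/9, 3/9 and 3/9; this is an assumption based on their rounded values. The partial utilities are those calculated with MACBETH for three of the farms in Table 3, and the function name is illustrative.

```python
# Weights assumed to be exact ninths, consistent with the rounded values reported above.
WEIGHTS = {"F": 1 / 9, "Ho": 2 / 9, "He": 3 / 9, "B": 3 / 9}

# MACBETH partial utilities for farms a, c and d (Table 3).
FARMS = {
    "a": {"F": 55, "Ho": 75, "He": 65, "B": 65},
    "c": {"F": 80, "Ho": 75, "He": 40, "B": 65},
    "d": {"F": 80, "Ho": 75, "He": 65, "B": 40},
}


def weighted_sum(utilities, weights=WEIGHTS):
    """Additive aggregation of partial utilities."""
    return sum(weights[c] * utilities[c] for c in weights)


for name, utilities in FARMS.items():
    print(name, round(weighted_sum(utilities), 2))
```

Because Health and Behaviour carry the same weight, swapping their utilities leaves the weighted sum unchanged, which is why farms c and d cannot be distinguished by this aggregator.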

3.2.1 Standard sequences

The ranking of the 10 farms obtained after aggregating with the WS the
individual utilities calculated with the SS method (Table 2) differed from the
ranking over the farms given by the DM as initial preferences (Table 1). For farms a, b,
c, d, e and f, the ranking of the utilities coincided with the initial DM preferences,
but it differed for farms g, h, i and j.

Table 2. Partial utilities calculated with the standard sequences method and overall

utilities and rankings (R) computed using the weighted sum (WS) and the Choquet

integral (CI) with the different approaches implemented, the minimum variance (MV)

and the minimum variance with Shapley values and interaction indices constraints (MV’).

      Partial utilities          WS                       CI
Farm  F     Ho    He    B        Overall utility     R    Overall utility (MV)     R    Overall utility (MV’)
a     50    75    70    66.66    67.78               1    67.70                    1    NS
b     75    50    70    66.66    65                  2    66.38                    2    NS
c     75    75    40    66.66    60.55               3    61.86                    3    NS
d     75    75    70    33.33    59.44               4    59.02                    4    NS
e     75    75    30    66.66    57.22               5    58.97                    5    NS
f     50    50    30    33.33    37.78               6    38.18                    6    NS
g     0     50    30    33.33    32.22               7=8  32.50                    7    NS
h     50    12.5  30    33.33    29.44               10   32.45                    8    NS
i     50    50    10    33.33    31.11               9    32.40                    9    NS
j     50    50    30    16.66    32.22               7=8  32.35                    10   NS

F: Feeding; Ho: Housing; He: Health; B: Behaviour. NS: No solution.


3.2.2 MACBETH

The ranking obtained after aggregating with the WS the individual utilities calculated
with the MACBETH method (Table 3) and the ranking over the farms provided by the
DM as initial preferences (Table 1) were equal except for farms c and d: MACBETH
did not distinguish between them, whereas the DM preferred farm c to farm d.

Table 3. Partial utilities calculated with MACBETH and overall utilities and rankings

(R) computed using the weighted sum (WS) and the Choquet integral (CI) with the

different approaches implemented, the minimum variance (MV) and the minimum

variance with Shapley values and interaction indices constraints (MV’).

      Partial utilities          WS                       CI
Farm  F     Ho    He    B        Overall utility     R    Overall utility (MV)     R    Overall utility (MV’)    R
a     55    75    65    65       66.11               1    65.21                    1    64.52                    1
b     80    50    65    65       63.33               2    65.16                    2    61.22                    2
c     80    75    40    65       60.56               3=4  63.07                    3    58.51                    3
d     80    75    65    40       60.56               3=4  63.02                    4    58.46                    4
e     80    75    30    65       57.22               5    60.12                    5    54.67                    5
f     55    50    30    40       40.56               6    42.49                    6    39.27                    6
g     0     50    30    40       34.45               7    35.22                    7    29.77                    7
h     55    15    30    40       32.78               8    35.17                    8    29.72                    8
i     55    50    5     40       32.22               9    35.12                    9    29.67                    9
j     55    50    30    5        28.89               10   32.31                    10   26.26                    10

F: Feeding; Ho: Housing; He: Health; B: Behaviour.

3.3 Aggregation methods - Choquet integral

3.3.1 Standard sequences

The overall utilities for the 10 farms computed using the CI with respect to the
2-additive solutions are given in Table 2. For the MV approach, the results follow the
partial weak order provided by the DM at the beginning and comply with the
indifference threshold established by the DM (0.05). Note that the differences between
the overall utilities of consecutive farms among g, h, i and j are exactly equal to 0.05, which is the
indifference threshold. The Shapley values and the interaction indices of the 2-additive
solution obtained by means of the MV approach are given in Table 4.

Table 4. Coefficients of the weighted sum (WS) and of the Choquet integral obtained by the

minimum variance approach (MV) and by the minimum variance approach with

constraints on the Shapley values and on the interaction indices (MV’), to aggregate

individual utilities calculated using the SS method and the MACBETH method.

                    Shapley values                Interaction indices
Method    Approach  F*     Ho*    He*    B*       F,Ho    F,He    F,B     Ho,He   Ho,B    He,B
SS        WS        0.111  0.222  0.333  0.333    -       -       -       -       -       -
SS        MV        0.183  0.226  0.278  0.312    -0.151  0.008   -0.029  0.027   0.054   -0.014
SS        MV’       NS     NS     NS     NS       NS      NS      NS      NS      NS      NS
MACBETH   WS        0.111  0.222  0.333  0.333    -       -       -       -       -       -
MACBETH   MV        0.228  0.233  0.266  0.273    -0.048  0.019   0.019   0.018   0.007   0.022
MACBETH   MV’       0.139  0.241  0.309  0.311    0.05    0.05    0.05    0.05    0.05    0.05

F: Feeding; Ho: Housing; He: Health; B: Behaviour. NS: No solution.

For the MV approach, the importance of the criteria followed the order: Behaviour > Health > Housing >
Feeding. This order over the overall importance of the criteria was not completely in
accordance with the initial preferences of the DM. Among the interaction indices, it should be
noted that there was a strong negative interaction between Feeding and Housing
(-0.151). Feeding also interacted negatively with Behaviour, and Health interacted
negatively with Behaviour. There was no solution for the MV’ approach, because
the model was not compatible with the three sets of constraints imposed: the ranking over the
farms, the order of importance Behaviour = Health > Housing > Feeding, and all criteria regarded as
complementary (with indifference thresholds of 0.05, 0.05 and 0.05 respectively).


3.3.2 MACBETH

The overall utilities computed using the CI with respect to the 2-additive solutions for

the 10 farms are given in Table 3. Note that, as expected, for the MV approach the

results follow the partial weak order provided by the DM as an initial preference. It

should also be noted that the differences between the overall utilities of farms a and b, c

and d, g and h, and between h and i, are exactly equal to 0.05, which is the indifference

threshold. The Shapley values and the interaction indices of the 2-additive solutions for

MACBETH are given in Table 4. For the MV approach, the importance of the criteria

followed the order: Behaviour > Health > Housing > Feeding, which was not

completely in accordance with the initial preferences of the DM. All pairs of criteria

interacted positively except for Feeding and Housing, which interacted negatively.

For the MV’ approach, the constraints imposed by the DM on both the interaction indices (indifference
threshold 0.05) and the Shapley values (indifference threshold 0.05)
were satisfied: all the criteria were complementary (positive interaction) and
the Shapley values followed the order Health = Behaviour > Housing > Feeding
(Table 4). When these utilities (MV’) were compared with the initial ones obtained without any
additional constraint (MV), three main facts could be noticed: first, the ranking over the farms
remained the same; second, all farms had lower overall utilities; and third, this decrease,
which arose when the compensation between the criteria was limited (MV’), was more
marked for farms g, h, i and j, which are the farms that were chosen to evaluate compensation
between good and bad grades.

3.4 General dataset

When the numbers of farms assigned to each welfare category using the MV’ Shapley
values and interaction indices, and using the MV’ Shapley values as if they were the
coefficients of a weighted sum (WSMV’), were compared, it was noted that in the first
case 485 farms were classified as unacceptable, 1,788 as acceptable, 475 as
enhanced and 52 as excellent, whereas in the second case 407 farms were classified as
unacceptable, 1,574 as acceptable, 697 as enhanced and 122 as excellent. The numbers
of farms that changed to a higher or lower classification when the interaction indices
were not used in the aggregation are shown in Table 5.

Table 5. Number of farms changing to a higher or lower classification when the Shapley
values of the MV’ approach were used as the coefficients of a weighted sum (WSMV’) instead of
being used together with the interaction indices in the Choquet integral (MV’).

Original class          Farms changed to class:
                        Unacceptable  Acceptable  Enhanced  Excellent
Unacceptable (n=485)    373           112         0         0
Acceptable (n=1788)     34            1455        299       0
Enhanced (n=475)        0             7           398       70
Excellent (n=52)        0             0           0         52
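The counts in Table 5 can be cross-checked with a few lines of code. In this sketch the rows of the matrix are the classes obtained with the CI (MV’) and the columns are the classes obtained with the weighted sum (WSMV’); the variable names are illustrative.

```python
CLASSES = ["Unacceptable", "Acceptable", "Enhanced", "Excellent"]

# Rows: class with the CI (MV'); columns: class with the weighted sum (WSMV'). Table 5.
TRANSITIONS = [
    [373, 112, 0, 0],
    [34, 1455, 299, 0],
    [0, 7, 398, 70],
    [0, 0, 0, 52],
]

total = sum(map(sum, TRANSITIONS))
ci_counts = [sum(row) for row in TRANSITIONS]        # farms per class with the CI
ws_counts = [sum(col) for col in zip(*TRANSITIONS)]  # farms per class with the WS

n = len(CLASSES)
upgraded = sum(TRANSITIONS[i][j] for i in range(n) for j in range(n) if j > i)
downgraded = sum(TRANSITIONS[i][j] for i in range(n) for j in range(n) if j < i)

for name, ci, ws in zip(CLASSES, ci_counts, ws_counts):
    print(f"{name}: CI {ci} ({100 * ci / total:.1f}%), WS {ws} ({100 * ws / total:.1f}%)")
print(f"upgraded: {upgraded}, downgraded: {downgraded}, total: {total}")
```

The entries above the diagonal are the farms that move to a better class when the interaction indices are ignored, and the entries below it are the farms that move to a worse class.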

4 Discussion

The animal welfare multi-criteria evaluation was constructed in two separate steps.

First, utility functions for each criterion were determined in two different ways, using

the SS method and the MACBETH software. In the second step, the WS and the CI

were used as aggregation functions. For the CI capacity identification, minimum

variance (MV) and minimum variance with constraints (MV’) approaches were used.

The main problem found in the determination of the utility functions with the SS method was
that they are determined on the basis of a linear transformation. For the utility function
of Behaviour (Figure 3), an increase in Behaviour from a score of 2 to a score of 3 had a
utility for the DM of one unit, an increase from 3 to 4 also had a utility of one unit, and
an increase from a score of 0 to a score of 2 corresponded to a utility of one unit. Owing to the
linear transformation on which the model is based, an increase in Behaviour from 0 to 1
automatically corresponds to an increase of half a unit. There is no opportunity to
assign a lower or a higher value, which can lead to overestimating or underestimating
the utility values a DM would like to assign to a given performance level of a criterion.
It must be pointed out that when the number of performance levels of a criterion
decreases, this under- or overestimation can become larger, even making the model
infeasible.
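The effect of the linear transformation can be reproduced with a piecewise-linear interpolation between the elicited levels. The sketch below assumes that Behaviour is scored from 0 to 4 and that the three equal utility steps (0 to 2, 2 to 3 and 3 to 4) are rescaled to the 0 to 100 utility scale, which is consistent with the values 33.33 and 66.66 reported in Table 2. The function and variable names are illustrative.

```python
def interpolate(score, anchors):
    """Piecewise-linear utility between elicited (score, utility) anchor points."""
    points = sorted(anchors.items())
    for (x0, u0), (x1, u1) in zip(points, points[1:]):
        if x0 <= score <= x1:
            return u0 + (u1 - u0) * (score - x0) / (x1 - x0)
    raise ValueError("score outside the elicited range")


# Assumed anchors: three steps of equal utility on a 0-100 scale.
BEHAVIOUR_ANCHORS = {0: 0.0, 2: 100 / 3, 3: 200 / 3, 4: 100.0}

for score in range(5):
    print(score, round(interpolate(score, BEHAVIOUR_ANCHORS), 2))
```

The score of 1 is never judged by the DM; its utility is fixed at the midpoint between the utilities of the scores 0 and 2, whatever the DM would actually have preferred.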


The rankings obtained after aggregating with the WS the individual utilities
calculated using the SS method (Table 2) and the MACBETH method (Table 3) were
very different. Compared with the ranking over the farms given by the DM as initial
preferences, MACBETH was the method that better fitted the DM preferences, with
only a different ranking for farms c and d, whereas the SS method presented several
rank reversals among farms g, h, i and j, which were the farms chosen to
assess how the methods behave when one criterion has a very low value and the other
criteria present medium-high values. In other words, they were chosen to study the
preferences of the DM regarding the compensation between good grades and bad
grades. This difference between the rankings appeared to be related to the problem
presented above, i.e. the SS method did not allow the DM to assign lower values for
Housing and Behaviour, and this led to an inaccurate interpretation of the DM’s
preferences, so that the ranking over the overall utilities differed from the ranking of the DM’s
initial preferences.

The results of the MV approach, both for the SS method (Table 2) and the MACBETH
method (Table 3), followed the partial weak order provided at the beginning. The
Shapley values obtained using both methods conformed to the same sequence, which
was not completely in accordance with the DM preferences, although the differences
were minor. However, the major difference between the methods and the DM
preferences was in the values of the interaction indices. The DM considered all the
criteria as complementary; however, there was a negative interaction between Feeding
and Housing for the MACBETH method and a strong negative interaction between
Feeding and Housing for the SS method. For the SS method, there were also negative interactions between Feeding
and Behaviour, and between Health and Behaviour. In an initial calculation of the capacity with
no additional constraints imposed on the model, it is usual that the results do not
completely fit the preferences of the DM, owing to the small dataset from which the
capacity is determined. This issue can be solved by imposing additional constraints on
the Shapley values and on the interaction indices. However, for the SS method, there
was no solution compatible with the constraints (MV’), whereas for the MACBETH
method there was a compatible solution. In the case of the SS method, both the poor
fitting of the DM preferences in the MV approach and the inconsistency of the MV’
model appear to be related to the problem with the SS utility function determination


method. In the case of the MACBETH method, the fact that the Shapley values and
the interaction indices obtained in the first approach (MV) did not
completely satisfy the preferences of the DM appeared to be related more to the limited learning data than
to a poor interpretation of the DM preferences, since there was a compatible solution
after imposing the constraints.

In summary, the problem in the determination of the utility functions with the SS method lay in
the quantitative performances of the criteria. These performances were a mere
simulation. Real welfare measures, as proposed in Welfare Quality® (2009), may be
used in a further step of the project. The quantitative performances of WQ measures
vary, for instance, between 0 and 100% of animals showing the
measure. In this scenario, it could be assumed that the utility functions determined using
the SS method would fit the DM preferences as well as the MACBETH method would.
However, we prefer the use of MACBETH to the use of the SS method for several
reasons. First, information is available on how to use this method to facilitate a
consensus between stakeholders (Parnell et al., 2013; Bana e Costa et al., 2014), which
may be one of the difficulties when a panel of different DMs is consulted to determine
the utility functions and the aggregation parameters in a further step of the project.
Second, this method makes it easier to judge the different
attractiveness of options with an increasing number of criteria, owing to its interactive
software, the use of qualitative judgements and a scale of
different categories (‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or
‘extreme’) (Bana e Costa et al., 2004). Third, the process of determining the utilities
remained more transparent with the MACBETH method and, owing to its interactive software, is easier to explain to the
stakeholders than the SS method. Fourth, MACBETH
allows for a comparison of not only qualitative performance levels but also quantitative
performances, with no need for a previous conversion of the quantitative scales into
a qualitative scale, thus offering a solution to one of the problems presented by Botreau et
al. (2007b).

The results of the MV and MV’ approaches corroborated that using MAUT
with an aggregation process based on the WS is not a valid method to develop an
overall assessment of animal welfare, because the criteria do not behave as


independent criteria, which is an assumption when using this aggregator (Vincke,
1992). The estimation of the classification of the farms that would be obtained if the DM
decided to use an additive value model (WSMV’), in spite of all its well-known
drawbacks, showed that the main differences occurred in the numbers of farms classified
as unacceptable and enhanced. Of the 485 farms classified as unacceptable with the MV’ approach,
112 were classified as acceptable with the WSMV’, and 299 of the 1,788
farms classified as acceptable with the MV’ approach were classified as enhanced with
the WSMV’. In other words, not taking the interaction between the criteria into account
led to a considerable decrease in the number of farms classified as unacceptable (from
17.3% of the farms to 14.5%) and acceptable (from 63.9% to 56.2%) and a noticeable
increase in the number of farms classified as enhanced and excellent (from 17% to
24.9% and from 1.9% to 4.4% respectively). Note that the percentage of farms in each
welfare category may vary if the thresholds established for each category are modified.
The large difference in the number of farms classified as unacceptable appeared to be
related to the limitation of compensation between bad and good grades. This revealed
the potential impact of not taking into account the interactions between the criteria when
producing an overall assessment of animal welfare in the context of certification schemes,
which might have gone unnoticed had only the differences between the aggregation methods
for a small subset of farms, such as the initial dataset, been considered.

5 Conclusions

In summary, in the aggregation of animal welfare criteria it is of major importance to
choose an aggregation method, such as the CI, which allows the interaction between the criteria to be
taken into account and allows compensation between them to be limited when
the criteria are considered complementary by the DMs. Choosing a simpler aggregation
method, such as the WS, which allows compensation between the criteria, would lead to
an important misclassification of farms in the context of certification schemes, as
demonstrated here. In this study, it was concluded that the MACBETH method better
represented the preferences of the DM than the SS method. The interpretation of the
DM preferences through the utility functions was found to be crucial in the determination of
the CI aggregation coefficients. A utility function which does not reflect the preferences


of the DM adequately would lead to an incompatible solution when additional

constraints are imposed on the capacity determination model.

6 Acknowledgements

The present study is part of the PHENOMICS research project which is funded by the

German Federal Ministry of Education and Research.

7 References

Bana e Costa CA, de Corte JM and Vansnick JC 1999. The MACBETH approach:

Basic ideas, software, and an application. In Advances in Decision Analysis (eds

N Meskens and M Roubens), vol. 4, pp.131-157. Kluwer Academic Publishers,

Dordrecht, Netherlands.

Bana e Costa CA, de Corte JM and Vansnick JC 2004. On the mathematical

foundations of MACBETH. In MCDA, Multiple Criteria Decision Analysis (eds J

Figueira, S Greco and M Ehrgott), pp. 409 - 442. Kluwer Academic Publishers,

Dordrecht, Netherlands.

Bana e Costa CA, Lourenço JC, Oliveira MD and Bana e Costa JC 2014. A socio-

technical approach for group decision support in public strategic planning: The

Pernambuco PPA case. Group decision and negotiation 23, 5-29.

Blokhuis HJ, Veissier I, Miele M and Jones B 2010. The Welfare Quality® project and

beyond: Safeguarding farm animal well-being. Acta Agriculturae Scandinavica,

Section A, Animal Science 60, 129-140.

Botreau R, Bonde M, Butterworth A, Perny P, Bracke MBM, Capdeville J and Veissier

I 2007a. Aggregation of measures to produce an overall assessment of animal

welfare. Part 1: A review of existing methods. Animal 1, 1179-1187.

Botreau R, Bracke MBM, Perny P, Butterworth A, Capdeville J, van Reenen CG and

Veissier I 2007b. Aggregation of measures to produce an overall assessment of

animal welfare. Part 2: Analysis of constraints. Animal 1, 1188-1197.

Botreau R, Butterworth A, Engel B, Forkman B, Jones B, Keeling L, Kjærnes U,

Manteca X, Miele M, Perny P, van Reenen CG and Veissier I 2009. An Overview


of the Development of the Welfare Quality® Assessment Systems. In Welfare

Quality Reports® no. 12 (eds L Keeling). Cardiff University, UK.

Botreau R, Capdeville J, Perny P and Veissier I 2008. Multi-criteria evaluation of

animal welfare at farm level: an application of MCDA methodologies.

Foundations of Computing and Decision Science 33, 1-18.

Bracke MBM, Spruijt BM and Metz JHM 1999. Overall animal welfare assessment

reviewed. Part 1: Is it possible? Journal of Agricultural Science 47, 279-291.

Bouyssou D, Marchant T, Perny P, Pirlot M, Tsoukias A and Vincke P 2000. Evaluation

and decision models: A critical perspective. Kluwer Academic Publishers,

Dordrecht, Netherlands.

Bouyssou D, Marchant T, Perny P, Pirlot M, Tsoukias A and Vincke P 2006. Evaluation

and decision models with multiple criteria: Stepping stones for the analyst.

Springer, New York, USA.

Fraser D 1995. Science, values and animal welfare: Exploring the ‘inextricable

connection’. Animal Welfare 4, 103-117.

Farm Animal Welfare Council 1992. FAWC updates the five freedoms. The Veterinary

Record 17, 357.

Grabisch M 1996. The application of fuzzy integrals in multi-criteria decision making.

European Journal of Operational Research 89, 445-456.

Grabisch M 1997. k-order additive discrete fuzzy measures and their representation.

Fuzzy Sets and Systems 92, 167-189.

Grabisch M, Kojadinovic I and Meyer P 2008. A review of capacity identification

methods for Choquet Integral based multi-attribute utility theory. Applications of

the Kappalab R package. European Journal of Operational Research 186, 766-

785.

Keeney RL and Raiffa H 1976. Decisions with multiple objectives: Preferences and
value tradeoffs. Wiley, New York, USA.

Krantz DH, Luce RD, Suppes P and Tversky A 1971. Foundations of measurement, vol.

1: Additive and polynomial representations. Academic Press, New York, USA.

Kojadinovic I 2007. Minimum variance capacity identification. European Journal of

Operational Research 177, 498-514.

Labreuche C and Grabisch M 2003. The Choquet integral for the aggregation of interval

scales in multi-criteria decision making. Fuzzy sets and Systems 137, 11-16.


Marichal JL 2000. An axiomatic approach of the discrete Choquet integral as a tool to
aggregate interacting criteria. IEEE Transactions on Fuzzy Systems, vol. 8, no 6.

Marichal JL and Roubens M 2000. Determination of weights of interacting criteria from

a reference set. European Journal of Operational Research, vol. 124, no 3, 641-

650.

Mayag B, Grabisch M and Labreuche C 2011. A characterization of the 2-additive

Choquet integral through cardinal information. Fuzzy sets and Systems 184, 84-

105.

Merad M, Dechy N, Serir L, Grabisch M and Marcel F 2013. Using a multi-criteria

decision aid methodology to implement sustainable development principles within

an organization. European Journal of Operational Research 224, 603-613.

Murofushi T and Sugeno M 1989. An interpretation of fuzzy measure and the Choquet

integral as an integral with respect to a fuzzy measure. Fuzzy sets and systems 29,

201-227.

Parnell GS, Bresnick TA, Tani SN and Johnson ER 2013. Handbook of decision

analysis. John Wiley and sons, New York, USA.

Roy B 1971. Problems and methods with multiple objective functions. Mathematical

Programming 1, 239-266.

Saaty TL 1980. The Analytic Hierarchy Process: Planning, priority setting, resource

allocation. McGraw-Hill, New York, USA.

Sugeno M 1974. Theory of fuzzy integrals and its applications. PhD thesis, Tokyo

Institute of Technology. Tokyo, Japan.

Yager R 1988. On ordered weighted averaging operators in multi-criteria decision

making. IEEE Transactions on Systems, Man and Cybernetics 18, 183-190.

Vapnek J and Chapman M 2010. Legislative and regulatory options for animal welfare.

FAO Legislative study 104. FAO, Rome, Italy.

Vincke P 1992. Multi-criteria Decision-aid. Wiley, New York, USA.

von Winterfeldt D and Edwards W 1986. Decision analysis and behavioral research.

Cambridge University Press, Cambridge, UK.

Wakker PP 1989. Additive representations of preferences: A new foundation of

decision analysis. Kluwer Academic Publishers, Dordrecht, Netherlands.

Welfare Quality 2009. Welfare Quality® Assessment Protocol for Growing Pigs. Welfare

Quality® Consortium, Lelystad, Netherlands.

CHAPTER TWO

Development of a multi-criteria evaluation system to assess

growing pig welfare

P. Martín 1, I. Traulsen 1, C. Buxadé 2 and J. Krieter 1

1 Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Kiel, Germany

2 Animal Production Department, Polytechnic University, Madrid, Spain

Abstract

The aim of this paper was to present an alternative multi-criteria evaluation model to

assess animal welfare on farms based on the Welfare Quality® project, using an

example of welfare assessment of growing pigs. The WQ assessment protocol follows a

three-step aggregation process. Measures are aggregated into criteria, criteria into

principles, and principles into an overall assessment. This study focused on the first step

of the aggregation. Multi-attribute utility theory (MAUT) was used to produce a value

of welfare for each criterion. The utility functions and the aggregation function were

constructed in two separate steps. The MACBETH method was used for utility

function determination and the Choquet integral (CI) was used as an aggregation

operator. The WQ decision-makers’ preferences were fitted in order to construct the

utility functions and to determine the CI parameters. The methods were tested with

generated datasets for farms of growing pigs. The MAUT produced results similar to

those obtained with the WQ protocol aggregation methods. It can be

concluded that, owing to the use of an interactive approach such as MACBETH, this

alternative methodology is more transparent for stakeholders and more flexible than the

methodology proposed by WQ, which makes it possible to modify the model

according, for instance, to new scientific knowledge.

Keywords: Growing pigs, Welfare Quality, multi-criteria evaluation.

1 Introduction

Concern about livestock living conditions has increased considerably in the last few

years. Also, consumers have been increasingly linking animal welfare indicators with

food safety and quality. These consumer preferences create economic incentives for

stakeholders to meet animal welfare standards, as established by legislation or voluntary

certification schemes (Vapnek and Chapman, 2010). Due to the lack of a standard

assessment of animal welfare, these standards vary from one certification scheme to

another. This lack of a standard gave rise to the EU Welfare Quality® project (WQ),

which aimed at proposing an overall assessment system to assess the welfare of cattle,

pigs and poultry (Botreau et al., 2008).

Animal welfare is a multi-dimensional concept, and its assessment should be based on a

variety of measures related to several aspects such as the absence of thirst, hunger,

discomfort, disease, pain, injuries and stress, and the presence of normal behavioural

expressions (Farm Animal Welfare Council (FAWC), 1992). For this reason, a multi-criteria

evaluation model is required for the evaluation of an animal unit (farm,

slaughterhouse). These multi-criteria decision-making approaches all share the need for

an aggregation operator. In this case, information at the measures level may be useful

for farm management purposes; however, labelling purposes require a certain level of

aggregation of the measures into overall scores. Considerable efforts continue to be

made in order to develop overall assessment systems for different farm animal species

(e.g. WQ project, Bristol Welfare Assurance Programme and Animal Welfare Indicators

project, AWIN). WQ developed animal welfare multi-criteria evaluation models for

different livestock species (Botreau et al., 2009). The inputs for the WQ animal welfare

multi-criteria evaluation model are on-farm welfare measures described in the WQ

assessment protocol (Welfare Quality, 2009). The WQ multi-criteria evaluation model

uses different aggregation methods (e.g., decision tree, weighted sum or Choquet

integral) to aggregate measures into an overall assessment (Botreau et al., 2008). There

are other ways of approaching the aggregation problem that differ from the ones used by

the WQ multi-criteria evaluation model, e.g., the multi-attribute utility theory (MAUT),

ELECTRE or the Analytic Hierarchy Process (AHP). In the MAUT, uni-dimensional

utility functions, each corresponding to one criterion, are aggregated into a single global

utility function combining the whole of the criteria (Keeney and Raiffa, 1976), whereas

by using ELECTRE (outranking procedure) only the preference relations on pairs of

alternatives are aggregated (Roy, 1971); whilst in the Analytic Hierarchy Process

‘children’ nodes of a common ‘parent’ are aggregated using pair-wise comparisons

(Saaty, 1980).

In the present study, we focused on the MAUT. A large number of methods have been

proposed to determine the utility functions in MAUT, for instance the standard

sequences method described by Bouyssou et al. (2000) and the MACBETH method,

described by Bana e Costa et al. (1999). Examples of aggregation functions in MAUT

are the weighted sum, the ordered weighted average (Yager, 1988) and the Choquet

integral (CI) (Murofushi and Sugeno, 1989). The most common aggregation tool still

used today is the weighted sum, with all its well-known drawbacks. Using this

aggregator, different importance can be attached to the criteria, but no interaction

between the criteria is taken into account. The distinguishing feature of a CI is that it is

able to represent a certain interaction, ranging from redundancy (negative interaction) to

synergy (positive interaction) (Grabisch, 1996). In the framework of the MAUT, the

MACBETH method was used for utility function determination, and the CI as the

aggregation method.

The aim of this paper is to present an alternative multi-criteria evaluation model to

assess animal welfare on farms, within the WQ framework, employing, as an example, a

welfare assessment of growing pigs. We sought a model which solves the

main difficulties that, according to Botreau et al. (2007b), a multi-criteria aggregation

model for animal welfare faces (for instance, that interactions may exist

between measures and that measures may differ in importance for animal welfare),

while remaining more transparent and flexible than the model proposed in the WQ

protocol. In other words, we looked for a model which can be easily understood by the

stakeholders and which would allow the parameters to be changed according to new

scientific knowledge. The paper is organised as follows: Section 2 presents the general

methodology followed in the WQ protocol and the methodology we propose to

construct the multi-criteria evaluation model. Section 3 presents the construction of

criteria from the initial measures by means of examples. Finally, Section 4 discusses the

strengths and weaknesses of the model.

2 General methodology

2.1 Welfare Quality® (WQ)

The WQ assessment protocol for growing pigs consists of 27 welfare measures, which

were aggregated following a three-step aggregation process (Welfare Quality, 2009). The 27

welfare measures were thus combined into 12 criteria, which were aggregated into 4

principles, and these 4 principles were aggregated into an overall assessment. Different

types of operators were used in this aggregation process, such as decision trees,

weighted sums, conversion to ordinal scores, least squares spline fitting, and CI. To

parameterise the operators used for the aggregation of the welfare measures and criteria,

datasets were presented to expert panels of 13 animal scientists, who individually

ranked farms and gave an absolute score on a scale of 0-100 for each of the farms

presented in each of the datasets (Botreau et al., 2008). Partners of the WQ project and

members of the Management Committee and Advisory Committee (i.e. stakeholder

representatives), were consulted to agree upon parameters for the aggregation of

principles into an overall classification (Botreau et al., 2009).

2.1.1 First step of the aggregation process

In the first step, welfare measures were aggregated into the 12 corresponding criteria.

WQ used different types of aggregation of measures into criteria (Figure 1). For some

criteria, the numbers of moderate and severe problems were first combined with a

weighted sum, producing a measure index, on a scale from 0 (worst) to 100 (best).

Afterwards, these index values were converted into measure scores (expressed on the

same 0-100 scale), using spline functions (Ramsay, 1988) that were fitted by least-

square methods. Finally the CI was used to combine the scores for the different

measures into a score for the criterion (a in Figure 1). For some other criteria, the

measures were first transformed into an ordinal scale, which consisted of assigning

warnings or alarms depending on the value of the measures. The numbers of warnings and

alarms were then combined into an index for the criterion, and afterwards this index was

converted into a criterion score using I-spline functions (b in Figure 1). Decision trees

were used to produce the criterion score (c in Figure 1) for other measures. Further

information on the development and employment of these operators can be found in

Botreau et al. (2008, 2009) and Veissier et al. (2011).

Figure 1. Outline of the three different methodologies followed in the Welfare Quality®

project to aggregate the measures into criteria (adapted from Welfare Quality, 2009).

2.1.2 Second step of the aggregation process

In the second step, a CI was used to aggregate the 12 criteria into four principles. This

integral uses weights to combine the different criterion scores into one principle score

(expressed on the 0-100 scale), while limiting the possibility that a poor score of a

criterion is compensated by other excellent scores (Botreau et al., 2007b; Veissier et al.,

2011).

[Figure 1 diagram. (a) Measure 1…n → previous calculations and I-spline curve fitting → Score 1…n → aggregation (Choquet integral) → criterion score. (b) Measure 1…n → previous calculations → ordinal measure 1…n → weighted sum → criterion index → I-spline curve fitting → criterion score. (c) Measure 1…n → decision tree → criterion score.]

2.1.3 Third step of the aggregation process

In the third and final step, the four principles were combined into one overall

assessment. The herds were classified in four different welfare categories:

‘unacceptable’, ‘acceptable’, ‘enhanced’, or ‘excellent’, based on reference profiles for

these four principles (Botreau et al., 2009). To be classified as ‘excellent’, a herd had to

score >55 for each principle and >80 for two principles; to be classified as ‘enhanced’,

each principle had to be >20 and at least two principles had to be >55; to be classified as

‘acceptable’, each principle had to be >10 and at least three principles had to be >20.

Herds which did not comply with the minimum scores were classified as

‘unacceptable’, which means that at least one principle was ≤ 10 or at least two

principles were ≤ 20.
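The classification rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the WQ protocol itself: the function name `classify_herd` and its helper are hypothetical, and the ‘acceptable’ rule is read as "more than 20 on at least three principles", which is the reading consistent with the definition of ‘unacceptable’.

```python
def classify_herd(principle_scores):
    """Assign a WQ welfare category from the four principle scores (0-100)."""
    if len(principle_scores) != 4:
        raise ValueError("expected one score for each of the four principles")

    def meets(minimum_all, threshold, how_many):
        # Every principle must exceed `minimum_all`, and at least
        # `how_many` principles must exceed `threshold`.
        above = sum(score > threshold for score in principle_scores)
        return min(principle_scores) > minimum_all and above >= how_many

    if meets(55, 80, 2):
        return "excellent"
    if meets(20, 55, 2):
        return "enhanced"
    if meets(10, 20, 3):
        return "acceptable"
    return "unacceptable"
```

For instance, a herd scoring 15, 25, 30 and 40 on the four principles would be ‘acceptable’, whereas a herd scoring 15, 15, 30 and 40 would be ‘unacceptable’ because two principles are ≤ 20.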

2.2 Multi-attribute utility theory (MAUT)

As presented before, the WQ assessment protocol follows a three-step aggregation

process. Measures are aggregated into criteria, criteria into principles, and principles

into an overall assessment (Welfare Quality, 2009). This study focused on the first step

of the aggregation to introduce an alternative methodology to the one proposed in the

WQ protocol by means of examples illustrated using growing pigs. MAUT was used to

produce a value of welfare for each criterion. The application of the MAUT consisted of

two separate steps: the determination of the utility functions and the determination of the

aggregation function. The MACBETH method was used for the determination of the utilities

and the CI was used as the aggregator method (Figure 2).

Figure 2. Outline of the alternative methodology proposed in this study to aggregate the

Welfare Quality® measures into criteria.

[Figure 2 diagram. Measure 1…n → previous calculations → utility function determination (MACBETH) → Utility 1…n → aggregation (Choquet integral) → criterion utility.]

2.2.1 Utility function determination (MACBETH)

The utility function gives value to the measure in terms of welfare; it represents the

preferences of the decision-maker (DM) for the measures and their different values. For

example, 5% of lameness on a farm may be interpreted as a worse situation than 5% of

wounds on the body. There are different methods for utility function determination; we

chose MACBETH (Measuring Attractiveness by a Categorical Based Evaluation

Technique) for several reasons:

First, information is available on how to use this method to facilitate a

consensus between stakeholders (Parnell et al., 2013, Bana e Costa et al., 2014); reaching a consensus

may be one of the main difficulties which arise when a panel of different DMs is

consulted to determine the utility functions and the aggregation parameters in a further

stage of the project. Second, this method makes it easier to judge the

different attractiveness of options with an increasing number of criteria, owing to the use

of qualitative judgements and of a scale of different categories (‘very weak’,

‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’) (Bana e Costa et al., 2004).

Third, the process of determining the utilities remains transparent due to the extensive

bibliography on it (Bana e Costa et al., 1999, 2004) and it is easier to explain to the

stakeholders due to the interactive software provided (M-MACBETH). Fourth,

MACBETH allows for a comparison of not only quantitative performance levels but

qualitative performances too, with no need for a previous conversion of the qualitative

scales into a quantitative scale, allowing a solution to one of the problems presented by

Botreau et al. (2007b).

MACBETH is a methodology which requires only qualitative judgements to quantify

the relative attractiveness (utilities) of options (farms). In order to elicit a marginal

utility function with MACBETH, the first step is to define whether the measure

performs as a quantitative measure or as a qualitative one and which are the

quantitative/qualitative performance levels of the measure. The next step is to fill in a

matrix, giving qualitative judgements regarding the difference of attractiveness between

the different quantitative performance levels of the measure. The qualitative judgements

can be rated as ‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’. As

each judgement was given, the matrix’s consistency was automatically verified with an

interactive algorithm based on linear programming (Mayag et al., 2010), and judgment

modifications were suggested which could be made to fix any detected inconsistency.

From the complete and consistent matrix of judgements, MACBETH creates a

numerical scale. With the numerical scale, MACBETH produces the marginal utility

function (u) for each measure. In order to be able to aggregate the different measures

into criteria, this method also allows the user to normalise the raw data expressed in

different scales into an absolute value scale, ranging, for example, from 0 to 100,

where 0 is the worst situation one can find on a farm and 100 the best situation.

After the initial calculation of the MACBETH scale, a check was performed to ensure

that it adequately represented the relative magnitude of the WQ DMs’ judgements; if

not, the scores were adjusted.

2.2.2 Aggregation with the Choquet integral

In a second step, the CI was used to aggregate the different measures into the

corresponding criteria. In order to combine the measures (individual utilities calculated

with MACBETH) into the corresponding criteria using the CI, the first step is the

capacity identification. A capacity can be regarded as a generalisation of the weighting vector involved in the

calculation of weighted sums: it assigns a weight to every subset of measures rather than to each measure alone.

Seen as an aggregation operator, the CI takes into account

the different importance of the measures and the interaction between them. These

interactions can be complementary (positive) or substitutive (negative). The number of

coefficients which define a capacity increases exponentially with the number of measures

(a capacity on n measures is defined on 2^n subsets). To keep the model simple, it may be preferable to restrict the capacity to two-additive

solutions, in which only interactions between pairs of measures are considered.

In this study, capacity identification, based on the least squares (LS) approach, was

implemented using the Kappalab R package following the method described by

Grabisch et al. (2008). In order to use the LS identification method, the utilities

calculated with MACBETH corresponding to the examples’ data, were used as subsets

against which the initial preferences of the WQ DMs are expressed.

The results of the aggregation of the examples’ data following the WQ protocol were

used as initial preferences in order to fit the model to the WQ DMs’ preferences.

With this methodology, a progressive interactive approach can be developed after an

initial calculation of the CI, where additional constraints on the Shapley values, which

measure the overall importance of a measure (criterion), and the interaction indices can

be imposed in order to fit more precisely the WQ DMs’ preferences.

According to Mayag et al. (2011), given (x1, x2, …, xn) the individual utilities for the

different measures, the CI with respect to a two-additive capacity µ can be written as

follows:

$$C_\mu(x_1, \dots, x_n) = \sum_{i=1}^{n} v_i x_i - \frac{1}{2} \sum_{\{i,j\} \subseteq N} I_{ij} \, \lvert x_i - x_j \rvert$$

where vi represents the importance of the measure i and corresponds to the Shapley

value of µ (capacity), Iij represents the interaction between measures i and j, and N = {1, …, n} is the set of measures.
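To illustrate how the interaction index acts, it may help to look at the case of only two measures. The equal importance values used below are chosen purely for illustration and are not taken from this study.

```latex
\begin{aligned}
C_\mu(x_1, x_2) &= v_1 x_1 + v_2 x_2 - \tfrac{1}{2}\, I_{12}\, \lvert x_1 - x_2 \rvert,
  \qquad v_1 + v_2 = 1, \\[4pt]
v_1 = v_2 = \tfrac{1}{2},\ I_{12} = 1 \;&\Rightarrow\; C_\mu = \min(x_1, x_2)
  \quad \text{(complementary measures)}, \\
v_1 = v_2 = \tfrac{1}{2},\ I_{12} = 0 \;&\Rightarrow\; C_\mu = \tfrac{1}{2}(x_1 + x_2)
  \quad \text{(weighted sum)}, \\
v_1 = v_2 = \tfrac{1}{2},\ I_{12} = -1 \;&\Rightarrow\; C_\mu = \max(x_1, x_2)
  \quad \text{(substitutive measures)}.
\end{aligned}
```

A positive interaction index therefore penalises farms whose utilities are unbalanced, which limits the compensation of a poor measure by a good one; a negative index has the opposite effect.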

3 Examples of the aggregation of measures into criteria

In order to illustrate the methodology proposed for the construction of the criteria, three

examples are given: absence of injuries, absence of disease and absence of pain induced

by management procedures. The WQ protocol distinguishes three types of aggregation

of measures into criteria. Each one of these three criteria is calculated in a different

way in the WQ protocol (Figure 1), whereas this study proposes a unique methodology

for all the criteria (Figure 2).

3.1 Example 1: Criterion ‘Absence of injuries’

Absence of injuries is assessed by three measures: lameness, wounds on the body and

tail-biting. The measures which form this criterion have in common that they are

recorded at individual level on an ordinal scale. This scale generally represents the severity of the problem,

and the percentage of animals in each severity category can be easily calculated (e.g. percentage of animals

walking normally, percentage of moderately lame animals, and percentage of severely

lame animals).

3.1.1 Welfare Quality®

Briefly, the WQ protocol first produced an ‘Index’ ($I_n$) by combining the percentage

of animals in each severity category, particularly for lameness and wounds on the body.

It consists of a weighted sum, where n can be substituted by lameness or wounds on the

body. For instance, for lameness (l):

$$I_l = 100 - \frac{2\,(\%\,\text{lameness}_1) + 5\,(\%\,\text{lameness}_2)}{5}$$

For example, a farm with 10% moderately lame animals (lameness1) and 1%

severely lame animals (lameness2) will achieve an Index for lameness ($I_l$) of 95.

Afterwards this ‘Index’ is entered into a non-linear function (I-spline function),

producing a ‘Score’ ($S_l$). For instance, for lameness:

When $I_l$ ≤ 85 then: [spline equation not preserved in the source]

When $I_l$ ≥ 85 then: [spline equation not preserved in the source]

For example, the farm presented before, which was assigned $I_l$ = 95, will achieve

a Score for lameness ($S_l$) of 51.35.

Figure 3 shows an example of the WQ I-spline function for lameness.

Figure 3. Scores for lameness according to the Index calculated for the % of lame pigs.

For tail biting the I-spline function is calculated directly. The mere absence or presence

of it is recorded, and thus there is no need for a weighted sum to combine the scores

regarding the severity of the problem.

To produce the criterion score, the partial scores previously obtained with the I-spline

functions are combined with the CI (Welfare Quality, 2009).

3.1.2 MAUT

Before determining the utility functions of lameness and wounds on the body, we

produced an Index as was carried out in the WQ protocol, in order to combine the

percentage of animals with a moderate problem and the percentage of animals with a

severe problem ($I_n$, where n can be lameness or wounds on the body). We

implemented the same weights as those used in the WQ protocol. For instance, for

lameness:

$$I_l = \frac{2\,(\%\,\text{lameness}_1) + 5\,(\%\,\text{lameness}_2)}{5}$$

For example, a farm with 10% moderately lame animals and 1% severely lame

animals will achieve an Index for lameness ($I_l$) of 5.
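The index calculation can be sketched as follows. The weights (2 for moderately lame animals, 5 for severely lame animals, divided by 5) are an assumption inferred from the worked example and from the lameness data in Table 1; they are not quoted from the protocol, and the function name `lameness_index` is hypothetical.

```python
def lameness_index(pct_moderate, pct_severe, as_wq_score_scale=False):
    """Weighted sum of the percentages of moderately and severely lame pigs.

    The weights 2 and 5 (divided by 5) are inferred from the worked
    example in the text, not taken from the WQ protocol itself.
    """
    weighted = (2 * pct_moderate + 5 * pct_severe) / 5
    # WQ expresses the index as 100 (best) minus the weighted problems;
    # the MAUT variant keeps the weighted percentage itself (0 = best).
    return 100 - weighted if as_wq_score_scale else weighted
```

With 10% moderately lame and 1% severely lame animals, `lameness_index(10, 1)` gives 5.0 and `lameness_index(10, 1, as_wq_score_scale=True)` gives 95.0, in line with the two worked examples.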

The utility function for the percentage of animals with tail biting was calculated

directly.

-Utility function determination (MACBETH)

The measures which form this criterion were defined as quantitative measures in

MACBETH. The quantitative levels of these measures were defined according to the

WQ protocol. Figure 3 shows how the scores assigned by the WQ DMs corresponding

to the percentage of lame animals decreased rapidly for the 100 to 85 range – reflecting

0 to 15 % lame animals respectively – the rate gradually slowing down after this point.

Performance levels spaced one unit apart were established between 0 and 15% of animals with lameness

and, where the slope of the I-spline function became homogeneous,

intervals of 10 units were established, as can be seen in Figure 4 with an example of the

utility function for lameness calculated with MACBETH.

Figure 4. Utility function for lameness calculated with MACBETH

For example, the farm presented before, which was assigned $I_l$ = 5, will achieve

a utility for lameness of 51.35.
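A utility function defined on a set of performance levels can be evaluated between levels by linear interpolation. The sketch below assumes such a piecewise-linear form and, for lack of the actual MACBETH performance levels (shown only in Figure 4), uses the lameness index and utility pairs reported for farms a to j in Table 1 as interpolation nodes. Both choices are assumptions made for illustration, and the names `LAMENESS_NODES` and `utility` are hypothetical.

```python
from bisect import bisect_right

# (lameness index, utility) pairs as reported for farms a-j in Table 1.
# Using them as interpolation nodes is an assumption made for illustration.
LAMENESS_NODES = [
    (0.0, 100.0), (0.72, 90.37), (2.08, 75.04), (2.87, 67.59), (4.17, 57.08),
    (5.21, 50.07), (7.77, 37.69), (17.15, 24.57), (43.3, 10.12), (100.0, 0.0),
]

def utility(index, nodes=LAMENESS_NODES):
    """Linearly interpolate the utility of a lameness index between nodes."""
    xs = [x for x, _ in nodes]
    if not xs[0] <= index <= xs[-1]:
        raise ValueError("index outside the range covered by the nodes")
    # Position of the first node strictly to the right of `index`,
    # capped so that the last node is handled by the final segment.
    hi = min(bisect_right(xs, index), len(xs) - 1)
    (x0, u0), (x1, u1) = nodes[hi - 1], nodes[hi]
    return u0 + (u1 - u0) * (index - x0) / (x1 - x0)
```

For an index of 5 this returns a value between 51 and 52, close to the utility of 51.35 reported above; a small difference is to be expected because the nodes are not the actual MACBETH performance levels.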

-Aggregation with the Choquet integral

Ten farms were used as learning data to determine the CI aggregation parameters (Data

in Table 1). The utilities calculated with MACBETH for these ten farms were used as a

subset to express the WQ DMs’ preferences (Utilities in Table 1). The results of the

aggregation of the ten farms’ data following the WQ protocol were used as the WQ

DMs’ initial preferences in order to identify the capacity using the LS-based approach

(WQ overall scores in Table 1).
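Because the two-additive CI is linear in the Shapley values and interaction indices once the utilities are fixed, the idea behind a least-squares fit can be sketched as an ordinary linear regression. The sketch below is a simplification and not the Kappalab procedure: it omits the monotonicity and normalisation constraints that a proper capacity identification imposes, it uses synthetic learning data, and all names (`features`, `solve`, `fit_two_additive`) are hypothetical.

```python
from itertools import combinations

def features(x):
    """Regressors of the two-additive CI: x_i, then -0.5*|x_i - x_j| per pair."""
    pairs = combinations(range(len(x)), 2)
    return list(x) + [-0.5 * abs(x[i] - x[j]) for i, j in pairs]

def solve(a, b):
    """Solve the square system a.z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]

def fit_two_additive(utilities, targets):
    """Unconstrained least-squares estimate of (v_1..v_n, I_12, I_13, ...)."""
    rows = [features(x) for x in utilities]
    p = len(rows[0])
    # Normal equations: (A^T A) z = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(ata, aty)

# Synthetic learning data (hypothetical): two measures, known parameters.
true_params = [0.6, 0.4, 0.3]  # v_1, v_2, I_12
farms = [(10, 50), (80, 20), (40, 40), (0, 100), (70, 90)]
scores = [sum(f * p for f, p in zip(features(x), true_params)) for x in farms]
estimate = fit_two_additive(farms, scores)
```

On this noise-free synthetic data the estimate recovers the parameters used to generate the scores. With real learning data the constraints matter, since an unconstrained fit may return values that do not define a valid capacity.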

Table 1. Absence of injuries measures data for selected farms. Measures’ values, individual utilities and overall utilities for each selected farm.

        Measures data                           Index                   Utility                 WQ score      Utility
Farm    L¹      L²      W¹      W²      BT²     L       W       BT      L       W       BT      (criterion)   (criterion)
a       0       0       0       0       0       0       0       0       100     100     100     99            100
b       1       0.31    4       1       1       0.72    3.67    1       90.37   90.16   93.84   89.99         90.30
c       5.10    0.03    7.46    2       3       2.08    6.97    3       75.04   81.95   82.46   75.01         75.04
d       4.67    1       12.71   5       5       2.87    13.47   5       67.59   67.64   72.41   67.49         67.59
e       1       3.77    25.37   1       7       4.17    17.91   7       57.08   59.29   63.61   57            57.08
f       5.53    3       5       5.6     10      5.21    8.98    10      50.07   77.21   52.52   50.01         50.07
g       1.92    7       36      10      19      7.77    33.33   19      37.69   37.7    31.58   33.81         33.70
h       40.37   1       37.31   5       33      17.15   29.87   33      24.57   41.52   19.57   21.17         21.31
i       33.25   30      89.55   20      55      43.3    79.70   55      10.12   14.40   10.49   10            10.12
j       0       100     0       100     100     100     100     100     0       0       0       0             0

¹ Percentage of animals affected with lameness (L)/wounds on the body (W) scored 1

² Percentage of animals affected with lameness/wounds on the body/bitten tails (BT) scored 2

Comparing the results for lameness, wounds on the body and tail-biting obtained with

the WQ method and the MAUT (Table 1), the overall utilities from different farms –

calculated with MACBETH – fit the scores (at criteria level) obtained with the WQ I-

spline functions. The Shapley values for each measure are shown in Table 2. As we can

see, lameness was considered more important than tail-biting, which was in turn

considered more important than wounds on the body. Furthermore, Table 2 shows that

none of the interactions between the measures was negative (the index for wounds on the body and tail biting was zero); thus, the measures were defined

as complementary, in accordance with the WQ protocol.

Table 2. Shapley value and interaction indices to aggregate the measures’ utilities into the criteria with the Choquet integral.

                      Shapley value   Interaction indices
                                      Lameness   Wounds on the body   Tail biting
Lameness              0.500           -          0.347                0.652
Wounds on the body    0.174           0.347      -                    0.000
Tail biting           0.326           0.652      0.000                -
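As a check on how these parameters are used, the sketch below applies the standard two-additive form of the Choquet integral (weighted sum of the utilities minus half the interaction-weighted absolute differences) to the individual utilities of farms b and e from Table 1. The function and variable names are hypothetical; the numbers are those printed in Tables 1 and 2.

```python
from itertools import combinations

def choquet_two_additive(utilities, shapley, interaction):
    """Two-additive Choquet integral from Shapley values and pairwise indices.

    `interaction` maps a frozenset of two measure names to their index.
    """
    weighted = sum(shapley[name] * utilities[name] for name in utilities)
    penalty = sum(
        interaction[frozenset(pair)] * abs(utilities[pair[0]] - utilities[pair[1]])
        for pair in combinations(utilities, 2)
    )
    return weighted - 0.5 * penalty

# Parameters printed in Table 2.
SHAPLEY = {"lameness": 0.500, "wounds": 0.174, "tail_biting": 0.326}
INTERACTION = {
    frozenset({"lameness", "wounds"}): 0.347,
    frozenset({"lameness", "tail_biting"}): 0.652,
    frozenset({"wounds", "tail_biting"}): 0.000,
}

# Individual utilities of farms b and e printed in Table 1.
farm_b = {"lameness": 90.37, "wounds": 90.16, "tail_biting": 93.84}
farm_e = {"lameness": 57.08, "wounds": 59.29, "tail_biting": 63.61}

utility_b = choquet_two_additive(farm_b, SHAPLEY, INTERACTION)
utility_e = choquet_two_additive(farm_e, SHAPLEY, INTERACTION)
```

The results agree to within 0.01 with the values 90.30 and 57.08 in the last column of Table 1. For other farms the rounded parameters of Table 2 may reproduce the tabulated value only approximately.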

3.2 Example 2: Criterion ‘Absence of disease’

Absence of disease is assessed by 13 measures. The measures used to check this

criterion lead to data expressed on different scales.

3.2.1 Welfare Quality®

Due to the different nature of the measures (for instance, mortality is recorded as the

percentage of mortality on farm during the last 12 months, whilst coughing and

sneezing are assessed as the average frequency of coughs/sneezes per animal over 5

minutes), WQ decided to compare the data to alarm thresholds which represent the limit

between what is considered abnormal and what is considered normal. When the

incidence observed on a measure reaches approximately half the alarm threshold, a

warning is attributed. The measures are grouped into six areas: mortality, respiratory,

digestive, liver, skin and hernias. The severity of the problem is estimated per area: if

within an area the frequency of one symptom is above the warning threshold and the

other is below, a warning is attributed to the area. On the other hand, if within an area

the frequency of one symptom is above the alarm threshold, the alarm is attributed to

the area; if neither occurs, no problem is recorded. The numbers of alarms and warnings

detected on a farm are counted and used to calculate an ‘Index’ for the absence of

disease criterion ($I_{ad}$) with a weighted sum.

For instance, a farm with a warning in 2 areas and an alarm in another will achieve an

index for absence of disease ($I_{ad}$) of 63.33.

Finally the ‘Index’ is transformed into a score using I-spline functions.

When $I_{ad}$ ≤ 10 then: [spline equation not preserved in the source]

When $I_{ad}$ ≥ 10 then: [spline equation not preserved in the source]

For instance, the farm presented before, which was assigned $I_{ad}$ = 63.33, will

achieve a score of 48.42.

3.2.2 MAUT

The measures employed to check this criterion were first transformed into an

ordinal scale, before determining the utility functions. The data were compared to the

warning and alarm thresholds defined in the WQ protocol. The measures were grouped

into the six areas defined in the WQ protocol. The area was attributed with a warning or

an alarm when one of its measures was above the warning or the alarm threshold.

-Utility function determination (MACBETH)

The utility function was calculated per area. We defined the six disease areas as

qualitative measures where the performance levels could be recorded using the terms

‘no problem’, a ‘warning’ attributed to the area and an ‘alarm’ attributed to the area. In

MACBETH, when the area was attributed a warning, a utility of 40 was assigned to it.

When the area was assigned with an alarm, a utility of 0 was assigned, and when there

was no problem recorded the utility assigned to the area was 100 (Figure 5).

Figure 5. Utilities assigned to the performance levels of the absence of disease areas

For instance, the farm presented before will achieve a utility of 0 in the area which was

assigned an alarm, a utility of 40 for both areas which were assigned a

warning, and utilities of 100 for the rest of the areas.
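The two steps described for this criterion, attributing a status to each area and mapping that status to a utility, can be sketched as follows. The utilities 100, 40 and 0 are those given above; the function names, the strict "greater than" comparison and the choice of which areas are affected in the example are assumptions made for illustration.

```python
# Utilities assigned to the three performance levels of a disease area.
AREA_UTILITY = {"no problem": 100, "warning": 40, "alarm": 0}

def area_status(values, warning_thresholds, alarm_thresholds):
    """Return the status of one area from its measures and their thresholds."""
    # An alarm on any measure takes precedence over warnings.
    if any(v > t for v, t in zip(values, alarm_thresholds)):
        return "alarm"
    if any(v > t for v, t in zip(values, warning_thresholds)):
        return "warning"
    return "no problem"

def area_utilities(statuses):
    """Map the status of each area to its utility."""
    return {area: AREA_UTILITY[status] for area, status in statuses.items()}

# The example farm: one alarm, two warnings, three areas without problems.
# Which areas are affected is a hypothetical choice for illustration.
example = area_utilities({
    "mortality": "warning", "respiratory": "warning", "digestive": "no problem",
    "liver": "no problem", "skin": "alarm", "hernias": "no problem",
})
```

The resulting six utilities are the inputs that are then aggregated with the Choquet integral.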

-Aggregation with the Choquet integral

Again, ten farms were used as learning data to determine the CI aggregation parameters

(Data in Table 3; the data were highlighted in grey or dark grey when they were above

the corresponding WQ warning or alarm thresholds respectively). The utilities obtained

for the ten farms with MACBETH were used as subsets to express the WQ DMs’

preferences (Utilities in Table 3). The results of the aggregation of the ten farms

following the WQ protocol (WQ overall scores) were used as initial preferences in order

to use the least squares-based approach for capacity identification (WQ in Table 3).

Table 3. Absence of disease measures data for selected farms. Measures’ values, individual utilities and overall utilities for each selected farm.

Farm  Mortality  Respiratory condition   Digestive condition   Parasites  Skin condition  Hernias     WQ score      Utility
      M¹         C²    S²    LB³   TS³   RP³        LF⁴        P          SC⁵             H⁵    H⁶    (criterion)   (criterion)
a     0.3        5     2     0.2   0.1   0.1        2          0          0.4             0.5   0.1   99.99         100.00
b     0.7        12    5     0.3   0.2   0.8        3          0          1               1     0.3   83.97         83.80
c     1          14    24    1.4   1     0.6        20         0          3               2.3   0.3   74.13         73.00
d     1.3        16    10    0.5   0.3   0.3        6          0          1.3             1.5   0.5   69.46         69.46
e     1.8        20    16    1     0.7   0.5        10         0          2.4             2     0.8   56.38         58.30
f     2          6     24    1.4   1     0.7        12         0          9               2.4   0.9   48.42         48.42
g     3          30    38    1.8   1.3   1          10         0          3.6             3     1     34.23         41.81
h     2.6        33    42    2     1.6   1.2        16         0          4               3.2   1.1   27.94         31.00
i     3          37    44    6.1   2     1.5        17         0          4.3             7     1.2   16.88         14.00
j     5.3        50    46    3     2.4   1.7        18         0          9.7             3.8   1.7   7.67          3.01

¹ Percentage of mortality (M) on farm during the last 12 months.

² Average frequency of coughs (C)/sneezes (S) per animal during 5 minutes.

³ Percentage of pigs with evidence of laboured breathing (LB)/twisted snouts (TS)/rectal prolapse (RP).

⁴ Percentage of pigs in herd with liquid faeces (LF).

⁵ Percentage of pigs scored as 2 in skin condition (SC)/hernias (H).

⁶ Percentage of pigs scored as 1 in hernias (H).

Grey: data over the warning threshold; dark grey: data over the alarm threshold (shading not reproduced here).

For instance, the farm presented before, which corresponds to Farm f in Table 3, is

assigned an overall utility of 48.42 after the aggregation of the individual utilities for

each area with the CI.

We found that the initial Shapley values resulting from aggregating the utilities with the

CI varied slightly between areas, whereas in the WQ protocol all the areas were considered

equally important. After imposing additional constraints on the Shapley values, the

importance attached to all the areas was the same, and the overall utility

remained unchanged. The interaction indices (Table 4) differed between the initial calculation of

the CI and the second, constrained calculation, but in both cases all the areas performed

as complementary measures.

Table 4. Shapley value and interaction indices to aggregate the measures’ utilities into the criteria with the Choquet integral.

             Shapley value  Interaction indices
                            Mortality    Respiratory  Digestive    Liver        Skin         Hernias
Mortality    0.165          -            0.024        0.046        0.029        0.018        0.024
Respiratory  0.167          0.024        -            0.017        0.055        0.046        0.035
Digestive    0.168          0.046        0.017        -            0.077        0.037        0.025
Liver        0.163          0.029        0.055        0.077        -            0.056        0.049
Skin         0.166          0.018        0.046        0.037        0.056        -            0.021
Hernias      0.168          0.0214       0.035        0.025        0.049        0.021        -

3.3 Example 3: Criterion ‘Absence of pain induced by management procedures’

Absence of pain induced by management procedures is assessed by two qualitative measures: castration and tail docking. These measures are taken at farm level. The farms are classified according to the presence or absence of these mutilation procedures and, where they are performed, according to whether or not anaesthetics are used.

3.3.1 Welfare Quality®

WQ used a lexicographic valuation tree for these types of measures (Figure 6).


Figure 6. Tree created in the MACBETH decision support system for the criterion ‘Absence of pain induced by management procedures’.

For instance, a farm on which pigs were castrated using anaesthetics and tail docking

was performed without anaesthetics will achieve an index of 35 for the absence of pain

induced by management procedures.
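The valuation tree can be read as a lookup over the two procedures. The following minimal sketch illustrates this, assuming that the scores attached to the nine leaves are the WQ overall scores listed in Table 5; the level names and the function name are illustrative and not part of the WQ protocol.

```python
# Performance levels of each procedure (labels are illustrative).
NO = "no"
WITH = "with anaesthetics"
WITHOUT = "without anaesthetics"

# Leaves of the tree: (castration, tail docking) -> WQ score,
# taken from the WQ overall scores reported in Table 5 (farms a to i).
WQ_PAIN_SCORES = {
    (NO, NO): 100, (NO, WITH): 60, (NO, WITHOUT): 38,
    (WITH, NO): 77, (WITH, WITH): 53, (WITH, WITHOUT): 35,
    (WITHOUT, NO): 47, (WITHOUT, WITH): 27, (WITHOUT, WITHOUT): 8,
}


def wq_pain_score(castration: str, tail_docking: str) -> int:
    """Branch first on castration, then on tail docking, and read the leaf."""
    return WQ_PAIN_SCORES[(castration, tail_docking)]


# Castration with anaesthetics, tail docking without anaesthetics.
print(wq_pain_score(WITH, WITHOUT))
```

Because every leaf is scored directly, revising one scenario means re-eliciting that leaf from the decision-makers rather than adjusting a shared parameter.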

3.3.2 MAUT

-Utility function determination (MACBETH)

Castration and tail docking were defined in MACBETH as qualitative measures in this

study. Following the WQ protocol, their performance levels were established as no

castration/no tail docking, castration/tail docking with anaesthetics and castration/tail

docking without anaesthetics. Figure 7 shows the MACBETH scales for each measure.

Figure 7. Utilities assigned to the performance levels of the Absence of pain induced by

management procedures


For instance, the farm we presented before will achieve a utility of 60 for castration and

a utility of 0 for tail docking.
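As an illustration, the utilities of the qualitative performance levels can be stored as one small table per measure. This is only a sketch: the utility values are those reported in Table 5, while the dictionary layout and the names are assumptions made for the example and are not an export from M-MACBETH.

```python
# Utilities per performance level, as reported in Table 5.
# The data structure and names below are illustrative assumptions.
UTILITIES = {
    "castration": {
        "no": 100,
        "with anaesthetics": 60,
        "without anaesthetics": 0,
    },
    "tail docking": {
        "no": 100,
        "with anaesthetics": 45,
        "without anaesthetics": 0,
    },
}


def farm_utilities(castration: str, tail_docking: str) -> tuple:
    """Return the (castration, tail docking) utilities of a farm."""
    return (
        UTILITIES["castration"][castration],
        UTILITIES["tail docking"][tail_docking],
    )


# The farm used as a running example in the text.
print(farm_utilities("with anaesthetics", "without anaesthetics"))
```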

-Aggregation with the Choquet integral

Nine farms were used as learning data to determine the CI aggregation parameters (Data in Table 5). The utilities calculated with MACBETH for these farms were used as the subsets on which the WQ DMs’ preferences were expressed (Utilities in Table 5). To enable the use of the LS-based approach for capacity identification, the results of aggregating the data of the 9 farms following the WQ protocol were used as the WQ DMs’ initial preferences (WQ overall scores in Table 5). Considering that the WQ DMs were satisfied, we decided not to impose any additional constraints when aggregating this criterion. Table 5 shows how the overall utilities obtained from the castration and tail docking utilities of the 9 possible farm situations were fitted as closely as possible to the WQ scores for this criterion. After fitting the utilities to the WQ DMs’ preferences, the CI parameters indicated that tail docking was considered more important than castration, with Shapley values of 0.539 and 0.461, respectively. Both measures also behaved in a complementary way, with an interaction index of 0.109.
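The least-squares fit described above can be sketched for the two-measure case. This is a simplified illustration, not the procedure used in the study: it assumes the Möbius form of the CI for two measures, y = m1·x1 + m2·x2 + m12·min(x1, x2) with m1 + m2 + m12 = 1, and solves an unconstrained least-squares problem without checking monotonicity, whereas a full capacity identification (for example with the Kappalab R package) would typically solve a constrained problem. The data are the utilities and WQ overall scores of Table 5; all function and variable names are hypothetical.

```python
# (castration utility, tail docking utility, WQ overall score), farms a to i.
FARMS = [
    (100, 100, 100), (100, 45, 60), (100, 0, 38),
    (60, 100, 77), (60, 45, 53), (60, 0, 35),
    (0, 100, 47), (0, 45, 27), (0, 0, 8),
]


def fit_two_measure_capacity(farms):
    """Least-squares fit of a two-measure Choquet integral.

    Substituting m12 = 1 - m1 - m2 into the model gives
    y - min = m1 * (x1 - min) + m2 * (x2 - min),
    which is solved through its 2x2 normal equations.
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for x1, x2, y in farms:
        low = min(x1, x2)
        u1, u2, t = x1 - low, x2 - low, y - low
        s11 += u1 * u1
        s12 += u1 * u2
        s22 += u2 * u2
        b1 += u1 * t
        b2 += u2 * t
    det = s11 * s22 - s12 * s12
    m1 = (b1 * s22 - b2 * s12) / det
    m2 = (b2 * s11 - b1 * s12) / det
    m12 = 1.0 - m1 - m2
    return {
        # For two measures, each Shapley value is the singleton Moebius
        # mass plus half of the joint mass; the interaction index is m12.
        "shapley": (m1 + m12 / 2, m2 + m12 / 2),
        "interaction": m12,
    }


result = fit_two_measure_capacity(FARMS)
print(result)
```

On the Table 5 data this sketch yields Shapley values that round to 0.461 for castration and 0.539 for tail docking, and an interaction index of about 0.11, close to the values reported above.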

For instance, we can notice that the farm presented before (Farm F) is assigned an

overall utility of 24.37 after the aggregation of the individual utilities for castration and

tail docking with the CI.
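The overall utility of Farm F can be approximately reproduced from the reported parameters. The sketch below assumes the 2-additive form of the CI written in terms of Shapley values and the interaction index; the function name is hypothetical, and small differences from the published figure are expected because the parameters are rounded to three decimals.

```python
def choquet_two_measures(x1, x2, phi1, phi2, interaction):
    """2-additive Choquet integral for two measures.

    A positive interaction (complementary measures) puts weight on the
    minimum of the two utilities; a negative one puts it on the maximum.
    """
    half = abs(interaction) / 2
    joint = min(x1, x2) if interaction >= 0 else max(x1, x2)
    return (phi1 - half) * x1 + (phi2 - half) * x2 + abs(interaction) * joint


# Farm F: castration utility 60, tail docking utility 0.
# Shapley values 0.461 (castration) and 0.539 (tail docking), interaction 0.109.
farm_f = choquet_two_measures(60, 0, 0.461, 0.539, 0.109)
print(round(farm_f, 2))
```

With these rounded parameters the result is about 24.4, which agrees with the reported 24.37 up to rounding.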


Table 5. Absence of pain induced by management procedures. Measures’ values, individual utilities and overall utilities for each selected farm.

      Measures data                                               Utilities                  WQ overall score   Overall utility
Farm  Castration                   Tail docking                   Castration  Tail docking   (criterion)        (criterion)
a     No                           No                             100         100            100                100
b     No                           Yes (with anaesthetics)        100         45             60                 67.34
c     No                           Yes (without anaesthetics)     100         0              38                 40.62
d     Yes (with anaesthetics)      No                             60          100            77                 79.36
e     Yes (with anaesthetics)      Yes (with anaesthetics)        60          45             53                 51.09
f     Yes (with anaesthetics)      Yes (without anaesthetics)     60          0              35                 24.37
g     Yes (without anaesthetics)   No                             0           100            47                 48.40
h     Yes (without anaesthetics)   Yes (with anaesthetics)        0           45             27                 21.78
i     Yes (without anaesthetics)   Yes (without anaesthetics)     0           0              8                  0


4 Discussion and conclusions

4.1 General methodology

Using MAUT, we showed that the main difficulties facing a multi-criteria aggregation model, as described by Botreau et al. (2007b), can be addressed: the method assigns different importance to the measures, limits the compensation between them and works with data collected on different types of scales. Furthermore, the model’s flexibility allowed us to fit the WQ assessment, obtaining results comparable to those obtained by implementing the WQ protocol.

Compared to the I-spline functions used in the WQ protocol to interpret the measures in terms of welfare, the use of MACBETH presented several advantages. First, with MACBETH the assessment remained more transparent, which could help to explain the results to the stakeholders and to identify the causes of poor welfare, while encouraging them to take efficient remedial measures which would affect the results. Second, the assessment remains more flexible. With this method all the parameters can be changed in response to new scientific knowledge (inclusion or exclusion of measures based on new studies on their influence on animal welfare), to changes in societal expectations (if the welfare of animals improves significantly on all farms, stakeholders may want to be more selective when considering a farm as excellent), etc. The main drawback of using MACBETH was related to the M-MACBETH software implementation: it does not allow the utility function formulae to be exported to other environments, and typing the information into the software can be extremely tedious when working with large amounts of data.

With regard to other methods proposed for the overall evaluation of animal welfare, such as the sum of ranks and the sum of scores (Botreau et al., 2007a), the use of the CI as an aggregator presented an important advantage: it allowed interaction between measures to be taken into account and thus made it possible to limit the compensation between them, solving one of the main problems described by Botreau et al. (2007b). The CI was also used in the WQ protocol for the aggregation of some measures into criteria and for the aggregation of criteria into principles (Welfare Quality, 2009).

The main difficulty in implementing the least squares-based approach for CI capacity identification is that it depends on information which the DM cannot always provide, such as the overall scores for each criterion (Grabisch et al., 2008). Because we fitted our results to the WQ DMs’ preferences, the results obtained from the WQ model were used as initial preferences, which avoided this issue. However, as the study of Merad et al. (2013) suggests, in other circumstances it may be difficult for the DMs to provide overall scores. Nevertheless, easier methods for capacity identification have been proposed in the literature, such as the minimum variance approach, which requires only a partial order over the farms as preference information. See Grabisch et al. (2008) for a review of different methods for capacity identification.

4.2 Examples

In order to apply this methodology to the particular case of an Animal Welfare

assessment we have found some key points to take into account:

4.2.1 Absence of injuries

Defining the performance levels in MACBETH to which the DM will have to react is extremely important for these sorts of measures. Although these measures can theoretically vary between 0 and 100%, in real conditions their values usually vary within a lower interval. For instance, Temple et al. (2011) found values which varied between 0 and 5.8% of animals affected with wounds on the body, between 0 and 8.1% for tail biting and between 0 and 1.8% for severe lameness. Thus, it is in the lower intervals of the measures that the utility functions need to fit the DMs’ preferences most closely. For instance, for lameness (Figure 4), we established performance levels at intervals of one unit between 0 and 15% lame animals. Beyond this point, we established intervals of ten units. In this way, we were able to fit the preferences of the DM more precisely in the lower interval of the measure.

The use of linear combinations (weighted sums) is also a key feature which can be reviewed and modified in further stages of the study. In this study they were employed to combine measures which are defined in two severity categories: lameness and wounds on the body. By using a linear combination we assume that the measures can compensate each other. Thus, with the WQ weights, a farm which has, for example, 0% moderately lame animals and 10% severely lame animals will be regarded, in terms of welfare, as equivalent to a farm with 10% moderately lame animals and 6% severely lame animals. Although it was emphasised throughout the development of the WQ model that welfare scores should not compensate each other (Botreau et al., 2007b; Veissier et al., 2011), compensation occurred in the first stages through the use of linear combinations. Providing an individual utility function for each severity measure and then aggregating them with the CI could prove to be an alternative solution. On the one hand, the model accuracy would increase; on the other hand, so would the complexity of the decision process, since the DMs would have to interpret a higher number of measures in terms of welfare.
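The compensation introduced by the weighted sum for lameness can be made concrete with a short sketch. The weights 0.4 (moderate) and 1.0 (severe) are an assumption: they are not stated in this section and were chosen only because they reproduce the equivalence described above, so they should be checked against the WQ protocol before any reuse.

```python
def weighted_lameness(pct_moderate, pct_severe,
                      w_moderate=0.4, w_severe=1.0):
    """Weighted percentage of lame animals.

    The default weights are assumed for illustration, not quoted
    from the WQ protocol.
    """
    return w_moderate * pct_moderate + w_severe * pct_severe


farm_1 = weighted_lameness(pct_moderate=0, pct_severe=10)
farm_2 = weighted_lameness(pct_moderate=10, pct_severe=6)

# Both farms receive the same weighted value, so the model cannot
# tell them apart once the two severity classes have been combined.
print(farm_1, farm_2)
```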

4.2.2 Absence of disease

In order to simulate the WQ DMs’ preferences, we compared the data for the absence of disease measures with the warning and alarm thresholds established in the protocol. However, in developing the methodology we showed that, once the original quantitative data had been converted into an ordinal scale (3 qualitative levels: no problem recorded, a warning or an alarm), it was impossible for the model to distinguish between herds which slightly or greatly exceeded the thresholds. The conversion into an ordinal scale might therefore be reconsidered, and the measures should be treated as quantitative ones, using the warning and alarm thresholds as references for the DM to build the utility functions.
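The loss of information caused by the ordinal conversion can be illustrated as follows. This is a sketch under stated assumptions: the threshold values used in the example are hypothetical and serve only to show the mechanism, and the function name is not part of the protocol.

```python
def classify(value, warning, alarm):
    """Convert a quantitative measure into the three ordinal levels."""
    if value > alarm:
        return "alarm"
    if value > warning:
        return "warning"
    return "no problem"


# Hypothetical thresholds for a mortality-type measure (in %).
WARNING, ALARM = 2.6, 4.5

slightly_over = classify(4.6, WARNING, ALARM)
greatly_over = classify(9.7, WARNING, ALARM)

# Both herds end up in the same ordinal class.
print(slightly_over, greatly_over)
```

Treating the measure as quantitative would keep the distance to the thresholds available to the utility function.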

To stay in line with the WQ protocol preferences, we decided to create a utility function per area rather than calculate a utility per measure. This method allows a large compensation between the measures within a disease area. For instance, looking in Table 3 at the warnings and alarms attributed to the measures gathered in the respiratory area, a warning is attributed to the respiratory area both on a farm which has only one of the measures classified with a warning (Farm E) and on a farm which has all four measures classified with a warning (Farm G). The compensation between measures within the disease areas is a crucial point which must be further studied.
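The effect described for the respiratory area can be sketched by assuming that an area takes the most severe status found among its measures. This rule is an assumption inferred from the comparison of the two farms above, and the names are illustrative.

```python
# Ordinal levels ranked by severity.
SEVERITY = {"no problem": 0, "warning": 1, "alarm": 2}


def area_status(measure_statuses):
    """Return the most severe status among the measures of an area."""
    return max(measure_statuses, key=SEVERITY.get)


one_warning = ["warning", "no problem", "no problem", "no problem"]
four_warnings = ["warning", "warning", "warning", "warning"]

# One warning and four warnings lead to the same area status.
print(area_status(one_warning), area_status(four_warnings))
```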

4.2.3 Absence of pain induced by management procedures

A decision tree was used for these types of measures in the WQ protocol. With this method, the two measures were considered together, and a score for each of the possible scenarios was given directly by the DMs. This methodology can be considered a direct rating. Although our methodology provided similar results, according to Bouyssou et al. (2006) a direct rating method (for example, one using decision trees) makes the methodology less intuitive than considering each measure separately and using an aggregation method based on an intuitive process, which can be easily revised.

5 Acknowledgements

The present study is part of the PHENOMICS research project, which is funded by the German Federal Ministry of Education and Research.

6 References

Bana e Costa CA, de Corte JM and Vansnick JC 1999. The MACBETH approach:

Basic ideas, software, and an application, in: Meskens, N., Roubens, M., (Eds.),

Advances in Decision Analysis. Kluwer Academic Publishers, Book Series:

Mathematical Modelling: Theory and Applications, vol. 4, pp.131-157.

Bana e Costa CA, de Corte JM and Vansnick JC 2004. On the mathematical

foundations of MACBETH. In MCDA, Multiple Criteria Decision Analysis (eds J

Figueira, S Greco and M Ehrgott), pp. 409 - 442. Kluwer Academic Publishers,

Dordrecht, Netherlands.

Bana e Costa CA, Lourenço JC, Oliveira MD and Bana e Costa JC 2014. A socio-

technical approach for group decision support in public strategic planning: The

Pernambuco PPA case. Group decision and negotiation 23, 5-29.

Botreau R, Bonde M, Butterworth A, Perny P, Bracke MBM, Capdeville J and Veissier

I 2007a. Aggregation of measures to produce an overall assessment of animal

welfare. Part 1: A review of existing methods. Animal 1, 1179-1187.

Botreau R, Bracke MBM, Perny P, Butterworth A, Capdeville J, van Reenen CG and

Veissier I 2007b. Aggregation of measures to produce an overall assessment of

animal welfare. Part 2: Analysis of constraints. Animal 1, 1188-1197.

Botreau R, Capdeville J, Perny P and Veissier I 2008. Multi-criteria evaluation of

animal welfare at farm level: an application of MCDA methodologies.

Foundations of Computing and Decision Science 33, 1-18.

Botreau R, Veissier I and Perny P 2009. Overall assessment of animal welfare: Strategy

adopted in Welfare Quality. Animal Welfare 18, 363-370.


Bouyssou D, Marchant T, Perny P, Pirlot M, Tsoukias A and Vincke P 2000. Evaluation

and decision models: A critical perspective. Kluwer, Dordrecht.

Bouyssou D, Marchant T, Perny P, Pirlot M, Tsoukias A and Vincke P 2006. Evaluation

and decision models with multiple criteria: Stepping stones for the analyst.

Springer, New York, USA.

Farm Animal Welfare Council 1992. FAWC updates the five freedoms. The Veterinary

Record 17, 357.

Grabisch M 1996. The application of fuzzy integrals in multi-criteria decision making.

European Journal of Operational Research 89, 445-456.

Grabisch M, Kojadinovic I and Meyer M 2008. A review of capacity identification

methods for Choquet Integral based multi-attribute utility theory, Applications of

the Kappalab R package. European Journal of Operational Research 186, 766-

785.

Keeney RL and Raiffa H 1976. Decisions with multiple objectives: Preferences and

value tradeoffs. Wiley, New York.

Mayag B, Grabisch M and Labreuche C 2010. An interactive algorithm to deal with

inconsistencies in the representation of cardinal information, in: Hüllermeier E,

Kruse R and Hoffmann F (Eds), Information processing and management of

uncertainty in knowledge-based systems. Theory and Methods. Springer, Book

Series: Communication in computer and information science, vol.80, pp. 148-157.

Mayag B, Grabisch M and Labreuche C 2011. A characterization of the 2-additive

Choquet integral through cardinal information. Fuzzy sets and Systems 184, 84-

105.

Merad M, Dechy N, Serir L, Grabisch M and Marcel F 2013. Using a multi-criteria

decision aid methodology to implement sustainable development principles within

an organization. European Journal of Operational Research 224, 603-613.

Murofushi T and Sugeno M 1989. An interpretation of fuzzy measure and the Choquet

integral as an integral with respect to a fuzzy measure. Fuzzy sets and systems 29,

201-227.

Parnell GS, Bresnick TA, Tani SN and Johnson ER 2013. Handbook of decision

analysis. John Wiley and Sons, New York.

Ramsay JO 1988. Monotone regression splines in action. Statistical Science 3, 425-442.


Roy B 1971. Problems and methods with multiple objective functions. Mathematical

Programming 1, 239-266.

Saaty TL 1980. The Analytic Hierarchy Process: Planning, priority setting, resource

allocation. McGraw-Hill, New York.

Temple D, Dalmau A, Ruiz de la Torre JL, Manteca X, Velarde A 2011. Application of

the Welfare Quality® protocol to assess growing pigs kept under intensive

conditions in Spain. Journal of Veterinary Behaviour 6, 138-149.

Yager R 1988. On ordered weighted averaging operators in multi-criteria decision

making. IEEE Transactions on Systems, Man and Cybernetics 18, 183-190.

Vapnek J and Chapman M 2010. Legislative and regulatory options for animal welfare.

FAO Legislative study 104, FAO, Rome.

Veissier I, Jensen KK, Botreau R and Sandøe P 2011. Highlighting ethical

decisions underlying the scoring of animal welfare in the Welfare Quality scheme.

Animal Welfare 20, 89-101.

Welfare Quality 2009. Welfare Quality® Assessment Protocol for Growing Pigs.

Lelystad: Welfare Quality® Consortium.

Winckler C 2013. Progress in, the present state of, and challenges for on-farm animal

welfare assessments in Europe. UFAW International Animal Welfare Science

Symposium, 4-5 July 2013. Universitat Autónoma de Barcelona, Spain.


CHAPTER THREE

Validation of a multi-criteria evaluation model for animal

welfare

P. Martín 1, I. Czycholl 1, C. Buxadé 2 and J. Krieter 1

1 Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Kiel, Germany

2 Animal Production Department, Polytechnic University, Madrid, Spain


Abstract

The aim of this paper was to validate an alternative multi-criteria evaluation system to assess animal welfare on farms based on the Welfare Quality® (WQ) project, using the welfare assessment of growing pigs as an example. This alternative methodology aimed to be more transparent for stakeholders and more flexible than the methodology proposed by WQ. The WQ assessment protocol for growing pigs was implemented to collect data on different farms in Schleswig-Holstein, Germany. In total, 44 observations were carried out. The aggregation system proposed in the WQ protocol follows a three-step aggregation process: measures are aggregated into criteria, criteria into principles, and principles into an overall assessment. This study focused on the first two steps of the aggregation. Multi-attribute utility theory (MAUT) was used to produce a value of welfare for each criterion and principle. The utility functions and the aggregation function were constructed in two separate steps. The MACBETH method was used for utility function determination and the Choquet integral (CI) was used as an aggregation operator. The WQ decision-makers’ preferences were fitted in order to construct the utility functions and to determine the CI parameters. The validation of the MAUT model was divided into two steps: first, the results of the model were compared with the results of the WQ project at criteria and principle level, and second, a sensitivity analysis of our model was carried out to demonstrate the relative importance of welfare measures in the different steps of the multi-criteria aggregation process. Using MAUT, results similar to those of the WQ protocol aggregation methods were obtained, both at criteria and principle level. Thus, this model could be implemented to produce an overall assessment of animal welfare in the context of the WQ protocol for growing pigs. Furthermore, this methodology could also be used as a framework to produce an overall assessment of welfare for other livestock species. Two main findings were obtained from the sensitivity analysis: first, a limited number of measures had a strong influence on improving or worsening the level of welfare at criteria level, and second, the MAUT model was not very sensitive to an improvement in or a worsening of single welfare measures at principle level. The use of weighted sums and the conversion of disease measures into ordinal scores should be reconsidered.

Keywords: Growing pigs, Welfare Quality, multi-criteria assessment, sensitivity

analysis


1 Introduction

Animal welfare is a multi-dimensional concept, and its assessment should be based on a

variety of measures related to several aspects such as the absence of thirst, hunger,

discomfort, disease, pain, injuries and stress, and the presence of normal behavioural

expressions (Farm Animal Welfare Council (FAWC), 1992). For this reason, a multi-criteria

evaluation model is required for the evaluation of an animal unit (farm,

slaughterhouse). In animal welfare, as well as in other areas, the development of a

multi-criteria evaluation system requires considerable efforts due to its complexity. The

complexity of this kind of model lies in the high number of measures involved, the

varied nature of these measures (qualitative, quantitative, measures recorded in different

scales, precision of the measures, different ranges of variation, etc.), the different

importance of the measures, the interaction between them, and last but not least the

number of stakeholder groups involved, which makes it difficult to arrive at decisions

which accommodate stakeholders’ wants and needs (Botreau et al., 2007).

Welfare Quality® (WQ) developed multi-criteria animal welfare evaluation models for

different livestock species (Botreau et al., 2009). The inputs for the WQ multi-criteria

animal welfare evaluation model are on-farm welfare measures described in the WQ

assessment protocol (Welfare Quality, 2009). The WQ multi-criteria evaluation model

uses different aggregation methods (e.g., decision tree, weighted sum or Choquet

integral) to aggregate measures into an overall assessment (Botreau et al., 2008).

Usually, the greatest efforts are made in developing the model, and less attention is paid to its credibility. However, validation is crucial for building sufficient confidence in the model for it to be used for practical purposes. According to Qureshi et al. (1999) and Harrison (1991), model validation can be divided into three components: verification, validation and sensitivity analysis. Verification refers to building the model correctly (O’Keefe et al., 1991). It ensures that the model has been developed in a formally correct manner in accordance with a specified methodology (Geissman and Schultz, 1991). In the case of a mathematical model implemented as a computer program, verification establishes that the program has been written correctly and that it behaves as intended. Validation refers to building the correct model (O’Keefe et al., 1991). Most attempts at model validation check agreement between the model and real system outputs or between the model and expert opinions (Qureshi et al., 1999). Sensitivity analysis examines the extent of variation in predicted performances when parameters are varied over some range of interest. It provides information on the priority areas for refinement if further versions of the model are to be developed (Qureshi et al., 1999).

The WQ multi-criteria evaluation model was tested on commercial European farms

during the WQ project and partly adjusted according to these results. Also,

classification of some of these farms was compared with the general impression of

observers who carried out audits of the farms (Botreau et al., 2009). Since publication

of the protocols, different studies on the validation of the measures used in the protocol

have been carried out (Temple et al., 2011a, b, 2012a, b, 2013), assessing whether the

measures included in the protocol are sensitive enough to distinguish between different

types of housing systems, and between farms. However, there are few studies which

have assessed whether the model is sensitive at criteria, principle or overall assessment

level, and whether it can distinguish between different farms (de Vries et al., 2013).

The aim of this paper was to validate an alternative multi-criteria evaluation model to assess animal welfare on farms, within the WQ framework, using the welfare assessment of growing pigs as an example. The objective was to compare the results obtained by implementing our approach with the results obtained by using the approach proposed in the WQ protocol, to assess its sensitivity in distinguishing between commercial growing pig farms and to demonstrate the relative importance of welfare measures in the different steps of the multi-criteria aggregation process.

2 Material and methods

2.1 Data

Data collection took place between January 2013 and January 2014 on 8 German growing pig farms in Schleswig-Holstein. All the farms were assessed by the same observer, who was trained to use the WQ assessment protocol for growing pigs (Welfare Quality, 2009) by members of the WQ project group. The pigs on the farms were housed either conventionally or according to the guidelines of the German animal welfare label “Tierwohllabel” of the German animal welfare organisation “Deutscher Tierschutzbund e.V.” (Tierschutzbund, 2013). Each farm was visited six times over two consecutive growing periods. During each of the two growing periods, three assessments took place: the first two weeks after entry into the growing stable, at an average pig weight of 40 kg (Farm Visit 1); the second in the middle of the growing period, at an average weight of 75 kg (Farm Visit 2); and the third two weeks before the beginning of sales to the slaughterhouse, at an average weight of 100 kg (Farm Visit 3). Because of changes in management, one of the farms was assessed only twice. In total, the protocol was run 44 times. The entire WQ protocol for growing pigs was carried out at each farm visit. Data were collected at pig and herd level, depending on the type of measurement, and were then expressed as welfare measures at herd level. These welfare measures could be either quantitative or qualitative and were expressed on different scales depending on the measure (e.g., percentage of lame animals or coughs per animal in 5 minutes) following the WQ protocol (Welfare Quality, 2009).

Table 1. Quantitative animal based measures with scoring scale (Welfare Quality,

2009).

Welfare measure             Scale
Body condition 2            % lean pigs
Bursitis 1                  % pigs affected with moderate bursitis
Bursitis 2                  % pigs affected with severe bursitis
Manure on the body 1        % pigs with 20-50% of body surface soiled with faeces
Manure on the body 2        % pigs with >50% of body surface soiled with faeces
Space allowance             m²/100 kg pig
Lameness 1                  % pigs moderately lame
Lameness 2                  % pigs severely lame
Wounds on the body 1        % pigs with moderate wounds on the body
Wounds on the body 2        % pigs with severe wounds on the body
Tail biting 2               % pigs with evidence of tail biting
Twisted snouts 2            % pigs with evidence of twisted snout
Pumping 2                   % pigs with laboured breathing
Pneumonia                   % slaughter pigs with pneumonia
Pericarditis                % slaughter pigs with pericarditis
Pleuritis                   % slaughter pigs with pleuritis
Coughing                    Number of coughs per animal in 5 minutes
Sneezing                    Number of sneezes per animal in 5 minutes
Scouring                    % pens with liquid faeces
Rectal prolapse 2           % pigs with evidence of rectal prolapse
Skin condition 2            % pigs with ≥ 10% of skin inflamed
Milkspots                   % slaughter pigs with milkspots on liver
Hernia 1                    % pigs with hernia/rupture not bleeding or touching the floor
Hernia 2                    % pigs with hernia/rupture bleeding or touching the floor
Mortality                   % mortality on farm during last year
Negative behaviour          % negative behaviour out of all social behaviour
Exploratory behaviour       % pen investigation out of exploration behaviours
                            % enrichment investigation out of exploration behaviours
Human-animal relationship   % pens showing panic response
QBA descriptors             0-125 mm scale


2.2 Aggregation of welfare measures into criteria and principles

WQ proposes a three-step aggregation process (Welfare Quality, 2009): welfare measures are aggregated into 12 criteria, these criteria are in turn aggregated into four principles, and finally these four principles are combined into an overall assessment. In this study we focused on the first two steps of the aggregation process (Figure 1).

Figure 1. Welfare Quality® bottom-up approach for integrating the data of the different

welfare measures into an overall assessment.

In the present study, two methodologies were used to produce criteria and principle

values from the data of the welfare measures collected in the farms observed: first,

following the WQ assessment protocol for growing pigs (Welfare Quality, 2009) and

second, following an alternative methodology which consisted of the use of MACBETH

and the Choquet integral in the context of the multi-attribute utility theory (MAUT).

[Figure 1 content: Measures → Criteria → Principles → Overall value.
Good feeding: Absence of prolonged hunger; Absence of prolonged thirst.
Good housing: Comfort around resting; Thermal comfort; Ease of movement.
Good health: Absence of injuries; Absence of disease; Absence of pain induced by management procedures.
Appropriate behaviour: Social behaviour; Other behaviours; Good human-animal relationship; Positive emotional state.]


Details of the aggregation of the measures into criteria and principles following both

methodologies are given in the annexed document.

2.2.1 Welfare Quality® (WQ)

-Aggregation of measures into criteria

In the first step, welfare measures were aggregated into the 12 corresponding criteria.

WQ used different types of aggregation of measures into criteria. For some criteria, the

numbers of moderate and severe problems were first combined with a weighted sum,

producing a measure index, on a scale from 0 (worst) to 100 (best). Afterwards, these

index values were converted into measure scores (expressed on the same 0-100 scale),

using spline functions (Ramsay, 1988) fitted by least-square methods. Finally, the

Choquet integral (CI) was used to combine the scores for the different measures into a

score for the criterion. For other criteria, the measures were first transformed into an ordinal scale by assigning warnings or alarms, depending on the value of the measures. The numbers of warnings and alarms were then combined into an index for the criterion, and afterwards this index was converted into a criterion score using I-spline functions. For the remaining criteria, decision trees were used to produce the criterion score. Further information on the development and employment of these operators can be found in Botreau et al. (2008, 2009) and Veissier et al. (2011).

-Aggregation of criteria into principles

In the second step, WQ used the CI to aggregate the 12 criteria into four principles. This

integral uses weights to combine the different criterion scores into one principle score

(expressed on the 0-100 scale), while limiting the possibility that a poor score of a

criterion is compensated by other excellent scores (Botreau et al., 2007; Veissier et al.,

2011).
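How the CI limits compensation can be illustrated with a generic discrete Choquet integral. The sketch below is not the WQ parameterisation: the three criteria and the capacity values are hypothetical and were chosen only so that coalitions of criteria weigh more than the sum of their members, which is the situation in which a poor score is hard to offset.

```python
from itertools import combinations


def choquet_integral(scores, capacity):
    """Discrete Choquet integral of `scores` (dict: criterion -> score)
    with respect to `capacity` (dict: frozenset of criteria -> weight)."""
    ordered = sorted(scores, key=scores.get)  # lowest score first
    total, previous = 0.0, 0.0
    for k, name in enumerate(ordered):
        # Criteria whose score is at least as high as the current one.
        coalition = frozenset(ordered[k:])
        total += (scores[name] - previous) * capacity[coalition]
        previous = scores[name]
    return total


# Hypothetical capacity depending only on the size of the coalition.
CRITERIA = ("A", "B", "C")
WEIGHT_BY_SIZE = {0: 0.0, 1: 0.2, 2: 0.5, 3: 1.0}
CAPACITY = {
    frozenset(subset): WEIGHT_BY_SIZE[size]
    for size in range(len(CRITERIA) + 1)
    for subset in combinations(CRITERIA, size)
}

scores = {"A": 20, "B": 90, "C": 90}
ci_value = choquet_integral(scores, CAPACITY)
mean_value = sum(scores.values()) / len(scores)

# The CI stays well below the arithmetic mean: the two excellent
# scores only partly compensate for the poor one.
print(ci_value, round(mean_value, 2))
```

With this capacity the example yields 55 for the CI against a mean of about 66.7; a capacity equal to one third per criterion and additive over coalitions would reduce the CI to the arithmetic mean.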

2.2.2 Multiattribute Utility Theory (MAUT)

We developed a multi-criteria evaluation system which aimed to produce results comparable to those of the methodology proposed in the WQ protocol while remaining more transparent and flexible.

-Aggregation of measures into criteria

In the first step of the aggregation, MAUT (Keeney and Raiffa, 1976) was used to produce a value of welfare for each criterion. The application of MAUT consisted of two separate steps: the utility function determination and the aggregation function determination.

Utility function determination (MACBETH)

The utility function gives a value to the measure in terms of welfare; it represents the
preferences of the decision-maker (DM) over the measures and their different values. For
example, 5% of lameness on a farm may be interpreted as a worse situation than 5% of
wounds on the body. There are different methods for utility function determination.
MACBETH (Measuring Attractiveness by a Categorical Based Evaluation Technique)
was chosen for several reasons. First, information is available on how to use this
method to facilitate a consensus among stakeholders (Parnell et al., 2013; Bana e
Costa et al., 2014), which is one of the main difficulties that a multi-criteria evaluation
system for animal welfare faces. Second, the method makes it easier to judge the
different attractiveness of options as the number of criteria increases, because it relies
on qualitative judgements expressed on a scale of semantic categories (‘very weak’,
‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’) (Bana e Costa et al., 2004).
Third, MACBETH allows a comparison not only of qualitative performance levels but
also of quantitative ones, with no need to convert the quantitative scales into a
qualitative scale beforehand, which solves one of the problems presented by Botreau et
al. (2007). Fourth, the utility determination process remains transparent thanks to the
extensive literature on it (Bana e Costa et al., 1999, 2004), and it is easier to explain to
the stakeholders thanks to the interactive software provided (M-MACBETH).

MACBETH is a methodology which requires only qualitative judgements to quantify
the relative attractiveness (utilities) of options (farms). In order to elicit a marginal
utility function with MACBETH, the first step is to define whether the measure is
treated as quantitative or qualitative and which quantitative/qualitative performance
levels it takes. The next step is to fill in a matrix with qualitative judgements regarding
the difference in attractiveness between the different performance levels of the
measure. The qualitative judgements can be rated as ‘very weak’, ‘weak’, ‘moderate’,
‘strong’, ‘very strong’ or ‘extreme’. As each judgement was given, the consistency of
the matrix was automatically verified with an interactive algorithm based on linear
programming (Mayag et al., 2010), and modifications to the judgements were suggested
to fix any detected inconsistency. From the complete and consistent matrix of
judgements, MACBETH creates a numerical scale, from which it produces the marginal
utility function (u) for each measure. In order to be able to aggregate the different
measures into criteria, this method also allows normalisation of the raw data, expressed
in different scales, into an absolute value scale ranging, for example, between 0 and
100, where 0 is the worst situation one can find on a farm and 100 the best situation.
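As an illustration of how a marginal utility function on the 0-100 scale can be evaluated once a numerical scale has been attached to the performance levels, the following minimal sketch interpolates linearly between anchor points. The linear interpolation, the function name `marginal_utility` and the anchor values are assumptions made for this example; they are not the utility functions of this study, which are given in the annex.

```python
from bisect import bisect_right

def marginal_utility(value, levels, utilities):
    """Piecewise-linear utility of one measure (illustrative assumption).

    levels: performance levels in increasing order (e.g. % of affected animals).
    utilities: utility attached to each level on the 0-100 scale.
    """
    if value <= levels[0]:
        return float(utilities[0])
    if value >= levels[-1]:
        return float(utilities[-1])
    hi = bisect_right(levels, value)   # first level strictly above value
    lo = hi - 1
    share = (value - levels[lo]) / (levels[hi] - levels[lo])
    return utilities[lo] + share * (utilities[hi] - utilities[lo])

# Hypothetical anchor points for a prevalence measure (not the thesis values)
LEVELS = [0.0, 5.0, 20.0, 100.0]
UTILITIES = [100.0, 60.0, 20.0, 0.0]

print(marginal_utility(2.5, LEVELS, UTILITIES))   # halfway between 100 and 60
```

Because every measure is mapped onto the same 0-100 scale, the resulting utilities can be passed directly to an aggregation operator.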

After the initial calculation of the MACBETH scale, it was checked to ensure that it
adequately represented the relative magnitude of the WQ DMs’ judgements; if not, the
scores were adjusted.

Aggregation with the Choquet integral (CI)

In a second step, the CI (Choquet, 1953; Murofushi and Sugeno, 1989; Grabisch, 1996)
was used to aggregate the different measures into the corresponding criteria. In order to
combine measures (individual utilities calculated with MACBETH) into criteria using
the CI, the first step was the capacity identification. Capacities can be regarded as a
generalisation of the weighting vectors used in the calculation of weighted sums. Seen
as an aggregation operator, the CI takes into account the different importance of the
measures and the interactions between them. These interactions can be complementary
(positive) or substitutive (negative). When the interaction between two measures is
positive, compensation between them is limited, whereas when the interaction is
negative, compensation between them is allowed. The number of coefficients which
define a capacity increases exponentially with the number of measures involved in the
CI. To keep things simple, it may be preferable to restrict the capacity to two-additive
solutions.
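The saving obtained by restricting the capacity can be made concrete with a small count. The sketch below assumes the usual description of a capacity by one value per subset of measures, and of a two-additive capacity by one coefficient per measure and one per pair of measures; normalisation constraints are ignored, and the function name is chosen for this example only.

```python
from math import comb

def capacity_sizes(n):
    """Number of coefficients of a general and of a two-additive capacity on n elements."""
    general = 2 ** n                 # one value per subset
    two_additive = n + comb(n, 2)    # singletons plus pairs
    return general, two_additive

for n in (2, 4, 12):
    print(n, capacity_sizes(n))
```

For the 12 criteria of the protocol taken together, this would mean 4096 subset values against 78 coefficients.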

Capacity identification, based on the least squares (LS) approach, was implemented
with the Kappalab R package following the method described by Grabisch et al.
(2008). In order to use the LS identification method, the utilities calculated with
MACBETH for the examples’ data were used as the subsets against which the initial
preferences of the WQ DMs were expressed.

The results of the aggregation of the examples’ data following the WQ protocol were
used as initial preferences in order to fit the model to the WQ DMs’ preferences.

With this methodology, a progressive interactive approach can be developed after an
initial calculation of the CI, where additional constraints on the Shapley values, which
measure the overall importance of a measure (criterion), and on the interaction indices
can be imposed in order to fit the WQ DMs’ preferences more precisely.

According to Mayag et al. (2011), given (x1, x2, …, xn), the individual utilities for the
different measures, the Choquet integral with respect to a two-additive capacity µ can be
written as follows:

$$C_\mu(x_1, \dots, x_n) = \sum_{i=1}^{n} v_i x_i - \frac{1}{2} \sum_{1 \le i < j \le n} I_{ij} \, |x_i - x_j|$$

where vi represents the importance of measure i and corresponds to the Shapley
value of µ (capacity), and Iij represents the interaction between measures i and j.
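The two-additive form, the sum of the Shapley-weighted utilities minus half the sum over pairs of Iij times |xi - xj|, can be sketched in a few lines. The function name, the dictionary layout for the interaction indices and the numerical values below are assumptions made for illustration; they are not the parameters identified in this study.

```python
from itertools import combinations

def choquet_two_additive(x, shapley, interaction):
    """Choquet integral for a two-additive capacity.

    x: utilities of the measures; shapley: importance v_i of each measure;
    interaction: dict {(i, j): I_ij} with i < j, missing pairs count as 0.
    """
    weighted = sum(v * xi for v, xi in zip(shapley, x))
    penalty = sum(
        interaction.get((i, j), 0.0) * abs(x[i] - x[j])
        for i, j in combinations(range(len(x)), 2)
    )
    return weighted - 0.5 * penalty

# Hypothetical parameters for three measures (Shapley values sum to 1)
V = [0.5, 0.3, 0.2]
I = {(0, 1): 0.2, (0, 2): -0.1}

print(choquet_two_additive([80, 40, 60], V, I))   # weighted sum 64 minus 3
```

A positive Iij lowers the result when the two utilities differ, which is how compensation between complementary measures is limited; with equal utilities the integral returns that common value.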

-Aggregation of criteria into principles

Since criteria are already interpreted in terms of welfare in this step, there is no need
for a utility function determination. Again, for capacity identification in the context of
the Choquet integral, we implemented the least-squares-based approach. In this step, we
used the same aggregation operator as in the WQ protocol; therefore, in order to
determine the CI parameters, we used the subsets used in the WQ protocol as learning
data and the values given by the WQ DMs for these subsets as preferences.

The utility functions for the different welfare measures, as well as the different datasets
used to fit the CI parameters to the WQ DMs’ preferences and the values of the
parameters obtained, can be found in the annex.

2.3 Model validation and sensitivity analysis

According to Harrison (1991), model validation is usually divided into three steps:

verification, validation and sensitivity analysis. Due to the fact that our model was

based on the WQ methodology, the different formulae proposed in the WQ protocol

were verified before determining our model to ensure that the model behaved as

intended. The different calculations of our model (MAUT), whether implemented in

MACBETH or in R, were checked by means of small datasets. Information on the
verification of the WQ and MAUT models can be found in the annex together with the
description of both methodologies. Thus, the model validation is divided into two steps
here: validation and sensitivity analysis.

2.3.1 Validation of the MAUT model

Since the WQ model has already been tested for validity (Botreau et al., 2009), we
compared the results for the 44 observations, at both criterion and principle level,
obtained with our methodology (MAUT) and with the WQ methodology, which can be
considered as a gold standard. The Euclidean distances for the 44 observations between
the WQ and the MAUT were calculated for each criterion and principle.

When the Euclidean distance between both methods for a criterion/principle was greater
than 0, the Wilcoxon Signed-Rank test confidence intervals between pairs of means
were calculated, because the assumption of normality was often not appropriate. A
confidence interval for the difference between two means specifies a range of values
within which the difference between the means of the two models may lie. The
confidence interval for the difference between two means contains all the values of
µ1 - µ2 (the difference between the models’ means) which would not be rejected in the
two-sided test of:

H0: µ1 - µ2 = 0

against:

H1: µ1 - µ2 ≠ 0

If the confidence interval includes 0, we can say that there is no significant difference
between the means of the two models at the given level of confidence. In this study, a
level of confidence of 90% (α = 10%) was established.
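The two comparison tools can be sketched as follows. This is a minimal illustration under stated assumptions: the interval is the standard distribution-free interval built from the ordered Walsh averages of the paired differences (strictly an interval for their pseudomedian), the exact null distribution of the signed-rank statistic is used, and zero differences and ties receive no special treatment. The function names and the five paired scores are invented for the example; the software actually used for these intervals is not stated in the text.

```python
from math import sqrt

def euclidean_distance(a, b):
    """Euclidean distance between two vectors of scores."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def signed_rank_cdf(n):
    """Exact null CDF of the signed-rank statistic for n untied pairs."""
    total = n * (n + 1) // 2
    counts = [1] + [0] * total   # counts[s]: subsets of ranks 1..n summing to s
    for rank in range(1, n + 1):
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / 2 ** n)
    return cdf

def wilcoxon_interval(a, b, confidence=0.90):
    """Confidence interval from the ordered Walsh averages of a - b."""
    d = [p - q for p, q in zip(a, b)]
    n = len(d)
    walsh = sorted((d[i] + d[j]) / 2 for i in range(n) for j in range(i, n))
    alpha = 1 - confidence
    cdf = signed_rank_cdf(n)
    k = max(next(s for s, p in enumerate(cdf) if p >= alpha / 2), 1)
    return walsh[k - 1], walsh[len(walsh) - k]

# Hypothetical paired scores for five observations (not thesis data)
wq = [2, 3, 5, 6, 7]
maut = [3, 5, 8, 10, 12]
print(euclidean_distance(maut, wq))
print(wilcoxon_interval(maut, wq))
```

If the returned interval contains 0, the two sets of scores would not be declared different at the chosen confidence level.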

2.3.2 Sensitivity analysis of the MAUT model

In order to assess the sensitivity of the model on our farms, the values of single
welfare measures were replaced with an improved and a worsened value. These values
corresponded to the first or the third quartile of the data (Table 1). Generally, the first
quartile corresponded to an improved situation, because the incidence of the problem
was reduced. However, for some measures, such as space allowance or exploratory
behaviours, the improved situation corresponded to the third quartile, because an
increase in the value of the measure led to improved welfare. We compared the
criterion and principle values obtained in the original situation with those obtained in
the improved or worsened situation using the Wilcoxon Signed-Rank test confidence
intervals for the difference between means.

-Sensitivity analysis at criteria level

Figure 2 below shows an example of how the sensitivity analysis was carried out for the
comfort around resting criterion. First, the original data for the 44 observations were
aggregated into the corresponding criteria following the MAUT methodology, giving a
total of 44 values for each criterion (a, in Figure 2). Second, the data for the 44
observations of only one measure, for instance Manure 1, were replaced by the
improved value (for Manure 1, the first quartile), and the data were again aggregated
with the MAUT into the corresponding criterion (b, in Figure 2). Third, using the
Wilcoxon Signed-Rank test, the confidence interval of the difference between the means
of the criterion values obtained with the original data and those obtained with the
improved data was calculated with a confidence level of 90% (α = 10%) (c, in Figure 2).
Fourth, the second and third steps were repeated (d and e, in Figure 2), but this time the
original data for the 44 observations of the same measure, Manure 1, were replaced by
the worsened value (the third quartile for Manure 1). These steps were repeated,
modifying one measure at a time, for all the criteria.
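The replacement steps described above can be sketched as follows. The aggregation function here is a toy stand-in and not the MAUT model, the data are invented, and the quartile definition (the default of Python's `statistics.quantiles`) is an assumption, since the text does not state which definition was used.

```python
from statistics import quantiles

def replace_with_quartile(data, measure, which):
    """Copy of data in which all observations of one measure equal its 1Q or 3Q."""
    q1, _, q3 = quantiles(data[measure], n=4)
    value = q1 if which == "1Q" else q3
    modified = dict(data)
    modified[measure] = [value] * len(data[measure])
    return modified

def sensitivity_differences(data, measure, which, aggregate):
    """Paired differences (original - modified) of the aggregated values."""
    original = aggregate(data)
    modified = aggregate(replace_with_quartile(data, measure, which))
    return [o - m for o, m in zip(original, modified)]

def toy_aggregate(data):
    """Toy criterion score; a placeholder for the real aggregation."""
    return [100 - (m1 + m2) / 2
            for m1, m2 in zip(data["manure_1"], data["manure_2"])]

# Hypothetical prevalences (%) for seven observations
farms = {"manure_1": [1, 2, 3, 4, 5, 6, 7], "manure_2": [0, 0, 1, 0, 2, 0, 1]}
improved = sensitivity_differences(farms, "manure_1", "1Q", toy_aggregate)
worsened = sensitivity_differences(farms, "manure_1", "3Q", toy_aggregate)
print(improved)
print(worsened)
```

The two lists of paired differences are what a signed-rank confidence interval would then be computed on, one measure at a time.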

Figure 2. Outline of the methodology followed to perform the sensitivity analysis at

criteria level with the example of comfort around resting following the five steps (a, b,

c, d and e) previously described.

-Sensitivity analysis at principle level

The same methodology was used as for the sensitivity analysis at criteria level, but the

results were compared at principle level. Figure 3 shows an example of how the

sensitivity analysis was carried out for the good housing principle.

[Figure 2 diagram, panels a) to e). Labels: Manure 1, Manure 2, Wounds 1 and Wounds 2 aggregated into comfort around resting. Legend: original values, improved values, worsened values. Panels c) and e): confidence interval of the difference between means (Original-Improved) and (Original-Worsened).]

Figure 3. Outline of the methodology followed to perform the sensitivity analysis of
good housing following the five steps (a, b, c, d and e) previously described.

[Figure 3 diagram, panels a) to e). Labels: Manure 1, Manure 2, Wounds 1 and Wounds 2 (comfort around resting); shivering, panting and huddling (thermal comfort); Sqm/100 kg (space allowance); principle: good housing. Legend: original values, improved values, worsened values. Panels c) and e): confidence interval of the difference between means (Original-Improved) and (Original-Worsened).]

3 Results

Five welfare measures (twisted snouts, rectal prolapse, shivering, panting and huddling)
did not occur in any of the 44 observations. The mean, standard deviation (SD), first
quartile (1Q) and third quartile (3Q) of the welfare measures with prevalence in the 44
observations are listed in Table 2. Some measures were observed with a prevalence at
farm level lower than 1%: lean animals, bursitis 2, lameness 1, lameness 2, scouring,
skin discolouration, hernia 1 and hernia 2. The low prevalence of coughs and sneezes
arose because it was not possible for the assessor to identify the number of animals
coughing or sneezing; therefore, the number of coughs and sneezes was divided by the
total number of animals in the pen.

Table 2. Means, standard deviation (SD), first quartile (1Q) and third quartile (3Q) of
welfare measures for the 44 observations.

Welfare measure                        Unit         Mean    SD      1Q      3Q
Body condition 2                       %            0.05    0.26    0       0
Number of drinkers places sufficient   no.          Yes (35), No (9)
Drinkers clean                         no.          Yes (44), No (0)
2 drinkers/animal                      no.          Yes (44), No (0)
Bursitis 1                             %            50.74   13.75   40.04   58.39
Bursitis 2                             %            0.96    1.32    0.00    1.48
Manure 1                               %            10.52   12.09   1.91    17.95
Manure 2                               %            3.53    6.35    0.00    4.06
Space allowance                        Sq m/100kg   1.56    1.61    0.96    1.77
Lameness 1                             %            0.29    0.44    0.00    0.69
Lameness 2                             %            0.24    0.46    0.00    0.00
Wounds 1                               %            8.71    7.24    2.73    14.18
Wounds 2                               %            1.03    1.98    0.00    1.33
Tail biting 2                          %            2.88    3.09    0.55    4.60
Pumping 2                              %            0.05    0.24    0.00    0.00
Pneumonia                              %            5.71    3.16    3.10    8.10
Pericarditis                           %            1.55    0.92    0.90    1.83
Pleuritis                              %            2.49    2.29    0.00    3.55
Coughing                               no.          0.18    0.24    0.01    0.26
Sneezing                               no.          0.04    0.04    0.01    0.05
Scouring                               %            0.23    1.51    0.00    0.00
Skin condition 2                       %            0.73    4.73    0.00    0.00
Milkspots                              %            9.79    15.19   1.20    9.60
Hernia 1                               %            0.29    0.43    0.00    0.70
Hernia 2                               %            0.01    0.10    0.00    0.00
Mortality                              %            2.5     0.79    2.00    3.00
Castration                             no.          No (44), With (0), Without (0)
Tail docking                           no.          No (38), With (0), Without (6)
Negative behaviour                     %            33.13   21.96   17.68   39.76
Pen investigation                      %            24.57   8.91    19.46   29.8
Enrichment investigation               %            5.78    3.23    3.60    7.01
Panic                                  %            8.3     17.34   0.00    2.5
QBA descriptors (1)                    mm           -       -       -       -

(1) Median, range and quartiles of descriptors (active, relaxed, fearful, etc.) for the
Qualitative Behaviour Assessment not shown.

3.1 Validation of the MAUT model

Means and ranges of variation for the welfare criteria and principles obtained with the

WQ and MAUT methodologies are given in Table 3. The Euclidean distances (ED)

between the WQ and the MAUT methods for each criterion and principle are also

depicted in Table 3 along with the confidence intervals of the difference between the

means for each criterion.

Table 3. Means (range) of welfare criteria and principles obtained with the WQ
methodology and the MAUT, Euclidean distances (ED) between the methods for each
criterion and principle and confidence intervals for the differences of means.

                      WQ                     MAUT                   ED     Confidence interval
Welfare criteria
Absence of hunger     99.71 (90.14-100)      99.71 (90.14-100)      0.0    -
Absence of thirst     90.8 (55-100)          91.55 (59.1-99.9)      12.3   -0.1, -0.09
Resting comfort       60.31 (27.71-85.41)    60.31 (27.71-85.41)    0.0    -
Thermal comfort       100 (100-100)          100 (100-100)          0.0    -
Space allowance       67.95 (21.94-98.51)    67.95 (21.94-98.51)    0.0    -
Absence of pain       46.45 (38-100)         48.74 (40.65-100)      16.3   2.64, 2.65
Absence of injuries   82.18 (55.68-97.06)    82.68 (54.61-97.22)    10.3   0.17, 0.83
Absence of disease    72.36 (24.99-83.97)    72.42 (24.88-84.37)    9.3    0.21, 0.45
Social behaviour      54.70 (14.52-84.76)    54.70 (14.52-84.76)    0.0    -
Exp. behaviour        30.96 (13.82-44.83)    30.96 (13.82-44.83)    0.0    -
HAR                   89.53 (15.75-99.99)    89.53 (15.75-99.99)    0.0    -
QBA                   30.84 (6.91-52.36)     30.84 (6.91-52.36)     0.0    -
Welfare principle
Feeding               91.13 (56.76-100)      91.93 (60.92-99.91)    12.7   -0.09, 0.01
Housing               62.63 (36.94-87.80)    61.81 (35.55-87.57)    18.6   -1.61, -0.29
Health                52.47 (27.63-85.41)    57.55 (27.82-86.06)    38.5   4.44, 6.08
Behaviour             39.57 (16.8-47.90)     34.14 (15.81-48.58)    39.9   -5.84, -4.84

Confidence intervals of the criteria absence of hunger, comfort around resting, thermal
comfort, space allowance, social behaviour, exploratory behaviour, good human-animal
relationship (HAR) and positive emotional state (QBA) are not shown because there
were no differences between the WQ and MAUT methods (Euclidean distance = 0).

There were no differences between the methods for the following criteria: absence of
hunger, comfort around resting, thermal comfort, space allowance, social behaviour,
exploratory behaviour, human-animal relationship and QBA. The differences between
the methods for absence of thirst, absence of pain induced by management procedures,
absence of injuries and absence of disease were small: the Euclidean distances were
12.3, 16.3, 10.3 and 9.3 respectively, and the confidence intervals for the differences
between the means were very narrow and close to 0. Comparing the differences between
the methods for the four welfare principles, it can be seen that good feeding and good
housing had lower differences than good health and appropriate behaviour. The
confidence interval for the difference between the methods’ means for good feeding
included 0, which means that there were no significant differences between the
methods for this principle.

3.2 Sensitivity analysis of the MAUT model

For the sensitivity analysis, only the quantitative measures were considered, because
variations in the qualitative measures were not comparable in terms of sensitivity with
those of the rest of the measures. Thus, the qualitative measures related to the absence
of thirst and absence of pain induced by management procedures criteria were excluded
from the study. Five quantitative welfare measures (twisted snouts, rectal prolapse,
shivering, panting and huddling) were also excluded from the sensitivity analysis
because they showed no variability between the observations. For some measures,
improving or worsening their values had no influence on the results at either criterion
or principle level, and thus the confidence intervals could not be calculated because the
observations were tied. These measures all belong to the disease criterion: pumping,
pleuritis, coughs, sneezes, scouring, skin condition and hernias 1 and 2.

3.2.1 Sensitivity analysis at criteria level

Figure 4 shows the confidence intervals of the difference of means between the original
situation and the improved situation (grey) and between the original situation and the
worsened situation (black) for each criterion with respect to the welfare measure
modified.

Figure 4. Confidence intervals of the difference in means between the original situation
and the improved situation (grey) and between the original situation and the worsened
situation (black) for each criterion with respect to the modified welfare measure.

The most important welfare measure for worsening the level of comfort around resting
in our study was manure 1. An increase in the mean value of manure 1 from 10.52 to
17.95 resulted in a decrease in the mean values of comfort around resting, which varied
between -21.85 and -17.55 with a 90% confidence level. However, manure 1 had a low
influence on improving the level of welfare: although the difference between the
original and the improved value was large (a mean of 10.52 in the original situation and
1.91 in the improved situation), the confidence interval of the differences of means was
1.63 to 3.78.

For absence of injuries there was no single measure which led to an important
difference between the original and the improved or worsened situation. The confidence
intervals of the differences of means between the original and the improved or worsened
situation indicated that improving or worsening the level of each measure never led to
an increase or decrease at criterion level greater than 10 units, which in this study was
taken as the threshold for considering that a measure influenced the results at criterion
level.

For absence of disease, only three measures (pneumonia, milkspots and mortality) had
an influence on improving or worsening the level of welfare. An increase in the mean
value of pneumonia from 5.71 to 8.10 resulted in a decrease in the mean value of
absence of disease, which lay between -15.13 and -10.22 with a 90% confidence level.
A decrease in the mean value of pneumonia from 5.71 to 3.10 resulted in an increase in
the mean value of absence of disease, which varied between -0.9 and 10.22 with a 90%
confidence level. For milkspots, substituting the original values with either the first
quartile or the third quartile resulted in an improved situation of welfare. The decrease
in the mean value of milkspots from 9.70 to 1.20 and from 9.70 to 9.6 resulted in an
increase in the mean value of absence of disease, which lay between 14.78 and 23.68
with a 90% confidence level. An increase in the mean value of mortality from 2.50 to
3.00 resulted in a decrease in the mean value of absence of disease, which varied
between -12.69 and -12.54 with a 90% confidence level. A decrease in the mean value
of mortality from 2.50 to 2.00 resulted in an increase in the mean values of absence of
disease, which varied between 11.28 and 12.69 with a 90% confidence level. For the
rest of the measures the results were tied: they had no influence at all, and thus the
confidence interval of the differences of means could not be calculated.

Pen and enrichment investigation had a low influence on improving or worsening the
values of the exploratory behaviour criterion. The confidence intervals of the
differences in means between the original and the improved or worsened situation
indicated that improving or worsening the level of each measure never led to an
increase or decrease of greater than 10 units at criterion level.

For the criteria formed by a single measure, such as absence of hunger (assessed by the
percentage of lean animals), space allowance (sq m/100 kg pig), social behaviour
(negative behaviour) and human-animal relationship (panic), the measure had a greater
influence on improving or worsening the level of welfare than measures which were
aggregated to form criteria, even though the range of variation for some of these
measures was low, as in the case of lean animals.

3.2.2 Sensitivity analysis at principle level

Figure 5 shows the confidence intervals of the difference in means between the original

situation and the improved situation (grey) and between the original situation and the

worsened situation (black) for each principle with respect to the modified welfare

measure.

Figure 5 Confidence intervals of the difference of means between the original situation

and the improved situation (grey) and between the original situation and the worsened

situation (black) for each principle with respect to the modified welfare measure.

By aggregating the criteria into principles, the sensitivity of the model to an
improvement or worsening of the values of the measures was lower than at criterion
level. We found that only two of the measures which led to important differences in the
confidence intervals of the means at criterion level also led to important differences at
principle level (confidence intervals of the differences in means in which at least one of
the confidence limits reached 10 units). These measures were manure on the body 1
(worsened) and space allowance (improved and worsened). For some other measures, at
least one of the confidence limits reached values higher than 5 units and lower than 10
units: manure 2 (improved), lameness 2 (improved/worsened), pneumonia (worsened),
milkspots (improved/worsened) and mortality (improved/worsened). The rest of the
measures had little influence on improving the welfare at principle level, with
confidence limits lower than 5 units.
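The banding of confidence intervals by the 10-unit and 5-unit limits can be written as a small rule. The function name and the band labels are assumptions made for this sketch. Since the principle-level limits are only shown graphically, the first three example intervals are the criterion-level ones quoted in section 3.2.1, and the fourth is hypothetical.

```python
def influence_band(interval, high=10.0, moderate=5.0):
    """Band a confidence interval by the largest absolute confidence limit."""
    reach = max(abs(interval[0]), abs(interval[1]))
    if reach >= high:
        return "important"
    if reach >= moderate:
        return "moderate"
    return "low"

examples = {
    "manure 1 (worsened)": (-21.85, -17.55),   # quoted in section 3.2.1
    "manure 1 (improved)": (1.63, 3.78),       # quoted in section 3.2.1
    "pneumonia (worsened)": (-15.13, -10.22),  # quoted in section 3.2.1
    "hypothetical measure": (-7.2, -5.1),      # invented for illustration
}
for name, ci in examples.items():
    print(name, influence_band(ci))
```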

4 Discussion

4.1 Data

In the present study, real data instead of simulated data was used in order to perform the

validation and the sensitivity analysis of the model. The main advantage of using real

data was that the actual performance of the measures is known (prevalence, variation,

interactions between measures), whereas the use of simulated data, as carried out by

Vries et al. (2013), would assess the performance of the model in extreme situations

which may not occur in practical conditions. On the other hand, by using real data some

measures may have low variation or non-prevalence on farms, and thus, it may be

difficult to assess the sensitivity of the model for these measures. However, we found
results comparable with those of Temple et al. (2011b), and thus we could assume that
our results may be representative of growing pigs. Running the WQ protocol on a larger
number of farms may be necessary to obtain more information on the actual variation in
the welfare measures, since few studies have been carried out so far.

4.2 Validation

4.2.1 Validation at criteria level

There were no differences between the WQ and the MAUT methods for the criteria
assessed by just one welfare measure, such as absence of hunger, space allowance,
social behaviour and positive emotional state, nor for exploratory behaviour, which was
assessed by two measures that were combined using a weighted sum in both
methodologies before determining the I-spline (WQ) and the utility functions (MAUT).
From this, it can be concluded that the utility functions determined in MACBETH
perfectly fitted the I-spline functions proposed in the WQ protocol.

Slight differences were found for the criteria comfort around resting and absence of
injuries, which are assessed by several measures. These differences appear to be related
to the aggregation step, not to the utility function determination, since the utility
functions determined in MACBETH perfectly fitted the I-spline functions proposed in
the WQ protocol, also for the measures which form these criteria. The differences in the
parameters of the CI between the methods were minor and did not lead to differences
between the methods for the learning datasets; differences did occur, however, when
these parameters were applied to a larger data set. Although the differences were minor,
this highlights the importance of the aggregation parameters, since even slight
variations in them can produce differences in the results.

The differences between the methods for absence of disease are explained by the

different methodologies used in the WQ and MAUT models due to the fact that WQ

uses a weighted sum to combine the number of warnings and alarms found in the

different disease areas before determining the I-spline function. In this study a utility

function was first produced per disease area and the utilities were then aggregated using

the CI.

There were small differences or almost no differences between the methods for the

qualitative criteria (absence of thirst, thermal comfort and absence of pain induced by

management procedures) although the methodologies used in the WQ and the MAUT

were very different. This was due to the fact that the datasets used to determine the

aggregation parameters of the CI covered all the possible scenarios found on a farm, and

thus, once the model was adjusted, there could be no further variations.

4.2.2 Validation at principle level

The differences in the results at principle level were related to two factors: first, the
differences between the methods at criterion level, and second, the parameters used in
the aggregation step. Good feeding, which is assessed by absence of hunger and absence
of thirst, was the principle with the lowest differences between methods, and the two
criteria that form it showed almost no differences between methods. Thus, the
differences between the methods at principle level can be attributed mainly to the
parameters used in the aggregation step. Comparing the parameters of the CI used in the
WQ protocol with those used in this study, only small differences can be seen, yet
differences between the methods were found. The Shapley values (which measure the
importance of the different criteria) used in WQ were 0.39 and 0.61 for absence of
hunger and absence of thirst, and the interaction index between both criteria was 0.66.
In this study, the Shapley values assigned to absence of hunger and absence of thirst
were 0.38 and 0.62 respectively, and the interaction index was 0.64.
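To see how such small parameter differences can carry through to the principle score, the two parameter sets quoted above can be applied to the same pair of criterion scores with the two-additive form of the Choquet integral. This is a sketch under stated assumptions: the second Shapley value of each pair is taken to belong to absence of thirst, the criterion scores are hypothetical, and the function name is chosen for the example.

```python
def choquet_pair(x_hunger, x_thirst, v_hunger, v_thirst, interaction):
    """Two-additive Choquet integral for two criteria."""
    return (v_hunger * x_hunger + v_thirst * x_thirst
            - 0.5 * interaction * abs(x_hunger - x_thirst))

WQ = dict(v_hunger=0.39, v_thirst=0.61, interaction=0.66)
MAUT = dict(v_hunger=0.38, v_thirst=0.62, interaction=0.64)

# Hypothetical criterion scores (absence of hunger, absence of thirst)
for hunger, thirst in [(100, 50), (50, 100)]:
    wq_score = choquet_pair(hunger, thirst, **WQ)
    maut_score = choquet_pair(hunger, thirst, **MAUT)
    print(hunger, thirst, round(wq_score, 2), round(maut_score, 2))
```

With these inputs both parameter sets give the same score when absence of thirst is the weaker criterion, and scores one unit apart when absence of hunger is the weaker one. In both cases the result stays close to the lower of the two criterion scores, which reflects the limited compensation implied by the positive interaction index.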

The differences between methods were also small for good housing. There were no
differences at all between the methods for the criteria that form this principle; thus, it
can be concluded that the aggregation parameters were responsible for the differences at
principle level.

Larger differences between the methods occurred for good health and appropriate
behaviour than for good feeding and good housing. For good health, the effect of the
differences between the aggregation parameters was compounded by the differences
between the results at criterion level. For appropriate behaviour, WQ proposes
three-additive capacities for the aggregation of the criteria which form the principle,
whereas we decided to limit the capacity to two-additive solutions to keep things
simple. The differences between methods appear to be related to the difference between
considering interactions between pairs of criteria (two-additive capacity) and
considering interactions between three criteria (three-additive capacity).

4.3 Sensitivity analysis

For the sensitivity analysis, the original values were modified to improved or worsened values, which corresponded to the first and the third quartiles of the data. One of the problems for the sensitivity analysis of an overall welfare assessment arises when the range of variation of a measure is not known. For measures with low variation, the first and the third quartiles were not representative of an improvement or a worsening in the level of welfare, which could explain the low influence of these measures at both the criteria and principle levels. The means and standard deviations of the measures with low incidence were therefore compared to those of the welfare measures presented in the study of Temple et al. (2011b), where the WQ protocol was run to assess the welfare of growing pigs kept under intensive conditions in Spain. This comparison aimed at estimating whether the low influence of these measures might only have occurred in our study due to the values chosen as an improved or a worsened situation of welfare, or whether it can be generalised due to similar prevalence in other studies.
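The quartile-based scenarios described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `quartile_scenarios`, the `lower_is_better` flag and the prevalence values are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since the text does not state how the quartiles were computed.

```python
from statistics import quantiles


def quartile_scenarios(values, lower_is_better=True):
    """Return the improved and worsened replacement values for one measure."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    if lower_is_better:
        return {"improved": q1, "worsened": q3}
    return {"improved": q3, "worsened": q1}


# Hypothetical farm-level prevalences (% of animals), not data from the study.
prevalence = [0.0, 0.0, 0.5, 1.0, 1.0, 2.0, 4.0]
scenarios = quartile_scenarios(prevalence)
print(scenarios)  # {'improved': 0.25, 'worsened': 1.5}
```

With such a narrow spread, the improved and worsened scenarios differ by only 1.25 percentage points, which illustrates why a measure with low variation can hardly move the criterion score.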

4.3.1 Sensitivity analysis at criteria level

Comfort around resting

Two main conclusions can be drawn for the sensitivity analysis for comfort around

resting. First, the low influence of bursitis in this study could have been caused by its

low variation at farm level. For bursitis 1 and bursitis 2 respectively, mean values and

standard deviations of 50.74 ± 13.75 and 0.96 ± 1.32 were found, whereas Temple et al.

(2011b) presented values with higher variation for these measures, 45.06 ± 21.04 and

4.4 ± 5.6 respectively. Second, for manure on the body, similar values to Temple et al.

(2011b) were found. However, although manure 2 assessed a severe condition of welfare

and manure 1 a moderate condition, the results did not indicate a greater influence of the

severe condition but of the moderate condition. Thus, it can be assumed that due to the

use of a weighted sum to aggregate the moderate and the severe conditions before

determining the utility function, compensation occurred between both levels, and the

model was not sensitive to the severe condition due to the fact that its values were

smaller than the values of the moderate condition. Although it was emphasised

throughout the development of the WQ model that welfare scores should not

compensate for each other (Botreau et al., 2007; Veissier et al., 2011), compensation

occurred in the first stages by using linear combinations, which were used both in the

WQ protocol and in this alternative methodology. Providing an individual utility

function for each severity measure and aggregating them afterwards by using the CI

could prove to be an alternative solution. On the one hand, the model accuracy would

increase, but on the other so would the complexity of the decision process, demanding

from the DMs that they interpret a higher number of measures in terms of welfare.

Absence of injuries

For absence of injuries there was no single measure which led to an important

difference between the original and the improved or worsened situation. There were no

differences in the confidence intervals for the moderate and severe conditions of

lameness and wounds on the body either. Low prevalences were found at farm level for

the measures which form this criterion. Comparing our data with the study of Temple et

al. (2011b) similar values for the absence of injuries measures were found. Temple et

al. (2011b) found means and standard deviations for lameness1, lameness2, wounds on

the body 2 and tail biting of 0.2 ± 0.43, 0.2 ± 0.45, 0.9 ± 1.38 and 0.9 ± 2.02

respectively, whereas the values found in this study for the same measures were 0.29 ±

0.44, 0.24 ± 0.46, 1.03 ± 1.98 and 2.88 ± 3.09. Thus, it can be concluded that due to the

general low variance of these measures on farms (comparable to other studies), these

measures have a low influence on improving or worsening the level of welfare.

Absence of disease

What can be concluded from the sensitivity analysis for absence of disease is that by

converting the original data into an ordinal scale (three qualitative levels: no problem

recorded, a warning or an alarm), the original values at criteria level only changed when

alarm or warning thresholds were reached. Due to this fact, the model was only

sensitive when the number of warnings or alarms was changed by improving or

worsening the measures’ values. Thus, it was impossible for the model to distinguish

between situations where the thresholds were slightly or greatly exceeded. Further,

conversion into an ordinal scale might be reconsidered, and the measures should be

treated as quantitative ones, using the warning and alarm thresholds as references for the

DM to build the utility functions.

Exploratory behaviour

Pen exploration and enrichment exploration had low influences on improving or

worsening the values of the exploratory behaviour criterion. The values obtained for

exploration of enrichment material were lower than values obtained in the study of

Temple et al. (2011b). Thus, it can be concluded that the low influence of this measure

lay in its low variability. However, although the ranges of variation for pen

investigation were wider and similar to the values obtained by Temple et al. (2011b),

the influence of this measure was low. It can be concluded that compensation occurred

in forming the criterion values due to the fact that a weighted sum was used to combine pen

investigation and enrichment investigation, and enrichment investigation is considered

more important than pen investigation. This compensation did not allow the model to be

sensitive to pen investigation.

Absence of hunger, space allowance, social behaviour and human-animal relationship

As can be seen for the criteria formed by a single measure, such as absence of

hunger (assessed by % of lean animals), space allowance (sq m/100kg pig), social

behaviour (negative behaviour) and human-animal relationship (panic), the welfare

measures had a greater influence on improving or worsening the level of welfare than measures

aggregated to form criteria, although the range of variation for some of these measures

was low, as is the case for lean animals. What this suggests is that by aggregating the

measures into criteria the sensitivity of the model for the measures was diluted,

although compensation between measures was always limited.

4.3.2 Sensitivity analysis at principle level

By aggregating the criteria into principles, the sensitivity of the model to an

improvement or worsening of the values of the measures was lower than at criteria

level. Only two measures which led to important differences in the confidence intervals

of the means at criteria level also led to important differences at principle level

(confidence intervals of the differences of means in which at least one of the confidence

limits reached 10 units). These measures were manure on the body 1 (worsened) and

space allowance (improved and worsened). It can be concluded that the three-step

aggregation process reduces the sensitivity of the model, and thus it may be difficult to

distinguish between farms with different levels of welfare at principle level, and

furthermore, this effect can be even more marked by aggregating the four welfare

principles into an overall evaluation.

5 Conclusions

By using the MAUT, it has been proven that the main difficulties described by Botreau

et al. (2007) faced by a multi-criteria aggregation model can be solved by allowing this

method to assign different importance to the measures, by limiting the compensation

between them and by working with data collected on different types of scales.

Furthermore, the model’s flexibility allowed us to fit the WQ assessment, obtaining

slight differences between our results and the ones obtained by implementing the WQ

protocol, both at criteria and principle level. Thus, it can be concluded that this model

could be implemented to produce an overall assessment of animal welfare in the context

of the WQ protocol for growing pigs. Furthermore, this methodology could also be used

as a framework to produce an overall assessment of welfare for other livestock species.

However, from the sensitivity analysis carried out in this study, two main points were observed which may need to be reconsidered. First, the use of weighted sums to aggregate moderate and severe conditions, as well as pen and enrichment investigation, should be reconsidered. Second, the conversion of disease measures into ordinal scores should be reconsidered, since it makes it impossible to distinguish between farms which slightly or largely exceed thresholds. Finally, the suitability of the three-step aggregation process to distinguish between farms may need to be studied further, because aggregating the criteria into principles reduced the sensitivity of the model to an improvement or worsening of the values of the measures. Running the model on a larger number of farms may be needed to know the actual variation in the measures on farms. In the case of no variation between the farms at principle level, as occurred in our observations, or at overall assessment level, the three-step aggregation process should be reconsidered.

6 Acknowledgements

The present study is part of the PHENOMICS research project which is funded by the

German Federal Ministry of Education and Research.

7 References

Bana e Costa CA, de Corte JM and Vansnick JC 1999. The MACBETH approach:

Basic ideas, software, and an application, in: Meskens, N., Roubens, M., (Eds.),

Advances in Decision Analysis. Kluwer Academic Publishers, Book Series:

Mathematical Modelling: Theory and Applications, vol. 4, pp.131-157.

Bana e Costa CA, de Corte JM and Vansnick JC 2004. On the mathematical

foundations of MACBETH. In MCDA, Multiple Criteria Decision Analysis (eds J

Figueira, S Greco and M Ehrgott), pp. 409 - 442. Kluwer Academic Publishers,

Dordrecht, Netherlands.

Bana e Costa CA, Lourenço JC, Oliveira MD and Bana e Costa JC 2014. A socio-

technical approach for group decision support in public strategic planning: The

Pernambuco PPA case. Group decision and negotiation 23, 5-29.

Botreau R, Bracke MBM, Perny P, Butterworth A, Capdeville J, van Reenen CG and

Veissier I 2007. Aggregation of measures to produce an overall assessment of

animal welfare. Part 2: Analysis of constraints. Animal 1, 1188-1197.

Botreau R, Capdeville J, Perny P and Veissier I 2008. Multicriteria evaluation of animal

welfare at farm level: an application of MCDA methodologies. Foundations of

Computing and Decision Science 33, 1-18.

Botreau R, Veissier I and Perny P 2009. Overall assessment of animal welfare: Strategy

adopted in Welfare Quality. Animal Welfare 18, 363-370.

Choquet G 1953. Theory of capacities. Annales de l’Institut Fourier 5, 131-295.

de Vries M, Bokkers EAM, van Schaik G, Botreau R, Engel B, Dijkstra T and de Boer

M 2013. Evaluating results of the Welfare Quality multi-criteria evaluation model

for classification of dairy cattle welfare at herd level. Journal of Dairy Science 96,

1-10.

Farm Animal Welfare Council 1992. FAWC updates the five freedoms. The Veterinary

Record 17, 357.

Geissman JR and Schultz RD 1991. Verification and validation of expert system. In

Validating and Verifying Knowledge-Based Systems (Ed. UG Gupta), pp. 12 - 19.

IEEE Computer Society Press, Washington, USA.

Grabisch M 1996. The application of fuzzy integrals in multicriteria decision making.

European Journal of Operational Research 89, 445-456.

Grabisch M, Kojadinovic I and Meyer P 2008. A review of capacity identification

methods for Choquet Integral based multi-attribute utility theory, Applications of

the Kappalab R package. European Journal of Operational Research 186, 766-

785.

Harrison SR 1991. Validation of agricultural expert systems. Agricultural Systems 35,

265-285.

O’Keefe RM, Osman B and Smith EP 1991. Validating expert system performance. In

Validating and Verifying Knowledge-Based Systems (Ed. UG Gupta), pp. 2 - 11.

IEEE Computer Society Press, Washington, USA.

Keeney RL and Raiffa H 1976. Decisions with multiple objectives: Preferences and

value tradeoffs. Wiley, New York.

Mayag B, Grabisch M and Labreuche C 2010. An interactive algorithm to deal with

inconsistencies in the representation of cardinal information, in: Hüllermeier E,

Kruse R and Hoffmann F (Eds), Information processing and management of

uncertainty in knowledge-based systems. Theory and Methods. Springer, Book

Series: Communication in computer and information science, vol.80, pp. 148-157.

Mayag B, Grabisch M and Labreuche C 2011. A characterization of the 2-additive Choquet

integral through cardinal information. Fuzzy sets and Systems 184, 84-105.

Murofushi T and Sugeno M 1989. An interpretation of fuzzy measure and the Choquet

integral as an integral with respect to a fuzzy measure. Fuzzy sets and systems 29,

201-227.

Parnell GS, Bresnick TA, Tani SN and Johnson ER 2013. Handbook of decision

analysis. John Wiley and Sons, New York.

Qureshi ME, Harrison SR and Wegener MK 1999. Validation of multicriteria analysis

models. Agricultural systems 62, 105-116.

Ramsay JO 1988. Monotone regression splines in action. Statistical Science. 3, 425-

442.

Temple D, Manteca X, Velarde A and Dalmau A 2011a. Assessment of animal welfare

through behavioural parameters in Iberian pigs in intensive and extensive

conditions. Applied Animal Behaviour Science 131, 29-39.

Temple D, Dalmau A, Ruiz de la Torre JL, Manteca X and Velarde A 2011b. Application

of the Welfare Quality protocol to assess growing pigs kept under intensive

conditions in Spain. Journal of Veterinary Behavior: Clinical Applications and

Research 6, 138-149.

Temple D, Courboulay V, Manteca X, Velarde A and Dalmau A 2012a. The welfare of

growing pigs in five different production systems: Assessment of feeding and

housing. Animal 6, 656-667.

Temple D, Courboulay V, Velarde A, Dalmau A and Manteca X 2012b. The welfare of

growing pigs in five different production systems in France and Spain:

Assessment of health. Animal Welfare 21, 257-271.

Temple D, Manteca X, Dalmau A and Velarde A 2013. Assessment of test-retest

reliability of animal-based measures on growing pig farms. Livestock Science

151, 35-45.

Tierschutzbund D 2013. Kriterienkatalog für eine tiergerechte Haltung und Behandlung

von Mastschweinen im Rahmen des Tierschutzlabels "Für mehr Tierschutz".

Deutscher Tierschutzbund e.V., Bonn.

Veissier I, Jensen KK, Botreau R and Sandøe P 2011. Highlighting ethical

decisions underlying the scoring of animal welfare in the Welfare Quality scheme.

Animal Welfare 20, 89-101.

Welfare Quality 2009. Welfare Quality® Assessment Protocol for Pigs. Lelystad:

Welfare Quality® Consortium.

ANNEX 1 Aggregation of growing pigs’ welfare measures into criteria

1.1 Criterion ‘Absence of prolonged hunger’

Absence of prolonged hunger is assessed by one quantitative measure: percentage of

lean animals.

1.1.1 Welfare Quality® (WQ)

In the WQ protocol, an ‘Index’ (I) is first calculated from the % of lean animals.

Afterwards, this ‘Index’ is transformed with a non-linear function (I-spline function),

producing a ‘Score’.

When I ≤ 80 then:

When I ≥ 80 then:

1.1.2 Multi-attribute utility theory (MAUT)

In the present study, we calculated the utility function directly with MACBETH from

the % of lean animals. We established performance levels which vary in steps of one unit

between 0 and 20% lean animals and in steps of 10 units between 20 and 100% lean

animals.

Figure 1. Utility function for lean animals calculated with MACBETH.

1.2 Criterion ‘Absence of prolonged thirst’

Absence of prolonged thirst is assessed by 3 qualitative measures, the number of

drinking places, the functioning of the drinkers and cleanliness of the drinkers. These

measures are taken at group level.

1.2.1 Welfare Quality®

For this type of measure WQ used a lexicographic valuation tree (Figure 2). The

score attributed to the farm is equal to the worst score obtained at group level on the

condition that this represents at least 15% of the animals observed from the whole farm.

Figure 2. Lexicographic valuation tree used in the WQ protocol for Absence of

prolonged thirst criterion.

1.2.2 MAUT

In this study, the number of drinking places, the functioning of the drinkers and

cleanliness of the drinkers were defined in MACBETH as three different qualitative

measures; their performance levels were established as yes/no. In Figure 3 we can see

the MACBETH scales for each measure.

(Figure 2 tree nodes: ‘Is the number of drinker places sufficient?’, ‘Are the drinkers clean?’ and ‘Are there at least 2 drinkers available for an animal?’, each with a Yes/No branch; leaf scores: 100, 80, 60, 45, 55, 40, 35 and 20.)

Figure 3. Utilities assigned to the performance levels of the Absence of prolonged thirst

criterion.

An example of eight farms was used as learning data to determine the CI aggregation

parameters (Data in Table 1). The utilities calculated with MACBETH corresponding to

the examples’ data were used as subsets to express the WQ DMs’ preferences (Utilities

in Table 1). The results of the aggregation of the examples’ data following the WQ

protocol were used as initial preferences in order to use the least squares based approach

for capacity identification (WQ in Table 1). In Table 2 the Shapley values and the

interaction indices for the measures are shown.

Table 1. Absence of prolonged thirst measures’ values, individual utilities and overall utilities for each selected farm.

        Data                         Utilities
Farm    number  clean  2/animal      number  clean  2/animal      WQ     Overall utility
a       Yes     Yes    Yes           100     100    100           100    100
b       Yes     Yes    No            100     100    0             80     84.17
c       Yes     No     Yes           100     0      100           60     64.17
d       Yes     No     No            100     0      0             45     40.83
e       No      Yes    Yes           0       100    100           55     59.17
f       No      Yes    No            0       100    0             40     35.83
g       No      No     Yes           0       0      100           35     30.83
h       No      No     No            0       0      0             20     0

Table 2. Shapley values and interaction indices to aggregate the measures’ utilities into the criterion with the Choquet integral.

                             Interaction indices
Measure     Shapley value    number    clean     2/animal
number      0.408            -         0.75      -0.75
clean       0.358            0.75      -         -0.75
2/animal    0.233            -0.75     -0.75     -
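The scores that the WQ valuation tree assigns to the eight combinations of answers are listed in the WQ column of Table 1, so the tree can be sketched as a simple lookup. The names `WQ_THIRST_SCORE` and `thirst_score` are hypothetical, and the sketch does not cover the farm-level rule (worst group score representing at least 15% of the animals observed).

```python
# Scores of the WQ valuation tree for 'Absence of prolonged thirst', keyed by
# the three yes/no answers: (enough drinking places, drinkers clean,
# at least two drinkers available per animal). Values: WQ column of Table 1.
WQ_THIRST_SCORE = {
    (True, True, True): 100,
    (True, True, False): 80,
    (True, False, True): 60,
    (True, False, False): 45,
    (False, True, True): 55,
    (False, True, False): 40,
    (False, False, True): 35,
    (False, False, False): 20,
}


def thirst_score(enough_places: bool, clean: bool, two_per_animal: bool) -> int:
    """Look up the tree score for one combination of answers."""
    return WQ_THIRST_SCORE[(enough_places, clean, two_per_animal)]


print(thirst_score(True, False, False))  # 45 (farm d in Table 1)
```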

1.3 Criterion ‘Comfort around resting’

Comfort around resting is assessed by 2 measures: bursitis and manure on the body. The

measures that form this criterion have in common that they are recorded at individual

level.

1.3.1 Welfare Quality®

Briefly, in the WQ protocol for this type of measures, they first produced an ‘Index’ by

combining the percentage of animals in each severity category with a weighted sum. For

instance, for bursitis:

Afterwards, this ‘Index’ was transformed with a non-linear function (I-spline function),

producing a ‘Score’.

When Index ≤ 50 then:

When Index ≥ 50 then:

For manure on the body:

Afterwards, this ‘Index’ was transformed with a non-linear function (I-spline function),

producing a ‘Score’.

In the verification step we found that the I-spline functions proposed in the WQ

protocol for manure on the body were not working properly. The formulae proposed for

this measure are the same as the ones proposed for Space allowance, so we assumed that

there was an erratum in the protocol, and thus we replaced these formulae with an

approximation of the I-spline function derived from the Figure proposed in the protocol

for manure on the body.

To produce the criterion score they combine the partial scores obtained with the I-spline

function for the two measures with the CI.

1.3.2 MAUT

In the present study, before determining the utility functions of bursitis and manure on

the body we produced an Index as carried out in the WQ protocol to combine the

percentage of animals with a moderate problem and the percentage of animals with a

severe problem. We implemented the same weights used in the WQ protocol. For

bursitis:

For manure on the body:

In MACBETH, the measures that form this criterion were defined as quantitative measures. For manure on the body, we established performance levels in steps of five units between 0 and 20% affected animals and in steps of 10 units between 20 and 100% affected animals. For bursitis, we established intervals of 10 units between 0 and 100% affected animals.

Figure 4. Utility function for bursitis calculated with MACBETH.

Figure 5. Utility function for manure on the body calculated with MACBETH.

An example of four farms was used as learning data to determine the CI aggregation parameters (Index in Table 3). The utilities calculated with MACBETH corresponding to the examples’ data were used as subsets to express the WQ DMs’ preferences (Utilities in Table 3). The results of the aggregation of the examples’ data following the WQ protocol were used as the WQ DMs’ initial preferences in order to use the least squares based approach for capacity identification (WQ in Table 3). The Shapley values for each measure are shown in Table 4. Manure on the body was considered more important than bursitis. As Table 4 also shows, the interaction between the measures was positive; thus, the measures were defined as complementary.

Table 3. Comfort around resting measures data for selected farms. Measures’ values, individual utilities and overall utilities for each selected farm.

        Index                Utility
Farm    bursitis  manure     bursitis  manure     Overall utility    WQ
a       60        40         40        60         43.2               43.2
b       50        50         50        50         50                 50
c       40        60         60        40         41.4               41.4
d       25        75         75        25         28.5               28.5

Table 4. Shapley values and interaction indices to aggregate the measures’ utilities into the criterion with the Choquet integral.

                                       Interaction indices
Measure               Shapley value    bursitis    manure on the body
bursitis              0.455            -           0.77
manure on the body    0.545            0.77        -
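How the parameters in Table 4 turn individual utilities into a criterion score can be sketched in a few lines. The sketch assumes the usual expression of the two-additive Choquet integral in terms of Shapley values and pairwise interaction indices (the formula itself is not written out in this annex); the function name `choquet_2additive` and the dictionary layout are hypothetical.

```python
from itertools import combinations


def choquet_2additive(utilities, shapley, interaction):
    """Two-additive Choquet integral from Shapley values and interaction indices.

    utilities, shapley: dicts keyed by measure name.
    interaction: dict keyed by frozenset({measure_i, measure_j}).
    """
    names = list(utilities)
    total = 0.0
    # Pairwise terms: a positive interaction weights the lower of the two
    # utilities (min), a negative interaction weights the higher one (max).
    for i, j in combinations(names, 2):
        i_ij = interaction.get(frozenset((i, j)), 0.0)
        if i_ij >= 0:
            total += min(utilities[i], utilities[j]) * i_ij
        else:
            total += max(utilities[i], utilities[j]) * -i_ij
    # Linear terms: Shapley value minus half the sum of absolute interactions.
    for i in names:
        half_abs = 0.5 * sum(
            abs(interaction.get(frozenset((i, j)), 0.0)) for j in names if j != i
        )
        total += utilities[i] * (shapley[i] - half_abs)
    return total


shapley = {"bursitis": 0.455, "manure": 0.545}            # Table 4
interaction = {frozenset(("bursitis", "manure")): 0.77}   # Table 4
farms = {                                                 # utilities, Table 3
    "a": {"bursitis": 40, "manure": 60},
    "b": {"bursitis": 50, "manure": 50},
    "c": {"bursitis": 60, "manure": 40},
    "d": {"bursitis": 75, "manure": 25},
}
overall = {
    farm: round(choquet_2additive(u, shapley, interaction), 1)
    for farm, u in farms.items()
}
print(overall)  # {'a': 43.2, 'b': 50.0, 'c': 41.4, 'd': 28.5}
```

Under this assumption the computed values coincide with the overall utilities reported in Table 3.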

1.4 Criterion ‘Thermal comfort’

Thermal comfort is assessed by 3 qualitative measures, huddling, shivering and panting.

These measures are taken at group level. If no pig is displaying

huddling/shivering/panting a score of 0 is assigned to the group, if up to 20% of the

animals in the group are displaying huddling/shivering/panting a score of 1 is assigned

to the group, and if more than 20% of the animals in the group are displaying

huddling/shivering/panting a score of 2 is assigned to the group.
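The group-level scoring rule can be written as a small function. This is a sketch only: the name `group_score` is hypothetical, and treating exactly 20% as "up to 20%" (score 1) is an assumption about the boundary.

```python
def group_score(n_displaying: int, n_animals: int) -> int:
    """Score a group 0, 1 or 2 for huddling, shivering or panting."""
    if n_animals <= 0 or not 0 <= n_displaying <= n_animals:
        raise ValueError("counts must satisfy 0 <= n_displaying <= n_animals, n_animals > 0")
    if n_displaying == 0:
        return 0  # no pig displays the behaviour
    # Integer comparison avoids floating-point issues at the 20% boundary:
    # n_displaying / n_animals <= 0.20  <=>  5 * n_displaying <= n_animals.
    if 5 * n_displaying <= n_animals:
        return 1  # up to 20% of the group
    return 2      # more than 20% of the group


print(group_score(0, 10), group_score(2, 10), group_score(3, 10))  # 0 1 2
```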

1.4.1 Welfare Quality®

For this type of measure WQ used a lexicographic valuation tree (Figure 6). The

score attributed to the farm is equal to the worst score obtained at group level on the

condition that this represents at least 15% of the animals observed from the whole farm.

Figure 6. Lexicographic valuation tree used in the WQ protocol for the thermal comfort

criterion.

(Figure 6 tree nodes: ‘Huddling?’, ‘Shivering?’ and ‘Panting?’, branching on the group scores 0, 1 and 2; leaf scores: 100, 59, 24, 26, 46, 20, 56, 35, 3, 34 and 18.)

1.4.2 MAUT

In this study, huddling, shivering and panting were defined in MACBETH as qualitative

measures; their performance levels were established as no huddling/shivering/panting,

<20% huddling/shivering/panting and >20% huddling/shivering/panting. Figure 7

shows the MACBETH scales for each measure. An example of 11 farms was used as

learning data to determine the CI aggregation parameters (Data in Table 5). The utilities

calculated with MACBETH corresponding to the examples’ data were used to express

the WQ DMs’ preferences (Utilities in Table 5). The results of the aggregation of the

examples’ data following the WQ protocol were used as initial preferences in order to

use the LS based approach for capacity identification (WQ in Table 5).

Table 6 shows the Shapley values and the interaction indices for the measures.

Figure 7. Utilities assigned to the performance levels of the thermal comfort measures.

Table 5. Thermal comfort. Measures’ values, individual utilities and overall utilities for each selected farm.

        Data                               Utilities
Farm    huddling  shivering  panting       huddling  shivering  panting       WQ     Overall utility
a       No        No         No            100       100        100           100    100
b       No        No         <20%          100       100        45            59     66.72
c       No        No         >20%          100       100        -20           24     27.39
d       No        <20%       No            100       45         100           46     55.35
e       No        >20%       No            100       14         100           26     30.18
f       <20%      No         No            35        100        100           56     62.10
g       <20%      <20%       No            35        45         100           35     39.17
h       <20%      >20%       No            35        14         100           20     17.95
i       >20%      No         No            -5        100        100           34     38.78
j       >20%      <20%       No            -5        45         100           18     15.85
k       >20%      >20%       No            -5        14         100           3      2.92

Table 6. Shapley values and interaction indices to aggregate the measures’ utilities into the criterion ‘Thermal comfort’ with the Choquet integral.

                              Interaction indices
Measure      Shapley value    huddling    shivering    panting
huddling     0.291            -           0.394        0.188
shivering    0.406            0.394       -            0.417
panting      0.303            0.188       0.417        -

1.5 Criterion ‘Ease of movement’

Ease of movement is assessed by one quantitative measure: space allowance. Space

allowance is expressed in m²/100 kg of animal.

1.5.1 Welfare Quality®

In the WQ protocol they first calculate an index from the space allowance.

Afterwards, this ‘Index’ is transformed with a non-linear function (I-spline function),

producing a ‘Score’.

When Index ≤ 20 then:

When Index ≥ 20 then:

1.5.2 MAUT

In this study, we calculated the utility function with MACBETH. The performance

levels of this measure were defined according to the WQ protocol, where 0.3 m²/100

kg is considered the very minimal space allowance and 10 m²/100 kg is considered the

maximum.

Figure 8. Utility function for space allowance.

Since this criterion is assessed by a single measure, there is no need for aggregation.

1.6 Criterion ‘Absence of injuries’

Absence of injuries is assessed by 3 measures: lameness, wounds on the body and tail

biting. The measures that form this criterion have in common that they are recorded at

individual level.

1.6.1 Welfare Quality®

Briefly, in the WQ protocol for this type of measures, particularly for lameness and

wounds on the body, they first produced an ‘Index’ by combining the percentage of

animals in each severity category with a weighted sum. For instance, for lameness:

For wounds on the body,

Afterwards, this ‘Index’ was transformed with a non-linear function (I-spline function),

producing a ‘Score’. For instance, for lameness:

When Index ≤ 85 then:

When Index ≥ 85 then:

For wounds on the body:

When Index ≤ 40 then:

When Index ≥ 40 then:

For tail biting the I-spline function is directly calculated due to the fact that just the

absence or presence of it is scored and thus there is no need for a weighted sum to

combine the scores regarding the severity of the problem.

To produce the criterion score they combine the partial scores obtained with the I-spline

function for the three measures with the CI.

1.6.2 MAUT

In this study, before determining the utility functions of lameness and wounds on the

body we produced an Index as was carried out in the Welfare Quality protocol to

combine the percentage of animals with a moderate problem and the percentage of

animals with a severe problem. We implemented the same weights used in the WQ

protocol. For instance, for lameness:

For wounds on the body:

For tail biting the utility function of the percentage of animals with presence of the

problem assessed by the measure was calculated directly.

The measures that form this criterion were defined as quantitative measures in MACBETH. For lameness, we established performance levels in steps of one unit between 0 and 10% lame animals and in steps of 10 units between 10 and 100% lame animals (Figure 9). For wounds on the body, we established intervals of 5 units between 0 and 100% affected animals (Figure 10). For tail biting, we established intervals of 1 unit between 0 and 20% affected animals and intervals of 10 units between 20 and 100% affected animals (Figure 11).

Figure 9. Utility function for lameness calculated with MACBETH.

Figure 10. Utility function for wounds on the body calculated with MACBETH.

Figure 11. Utility function for tail biting calculated with MACBETH.

An example of 10 farms was used as learning data to determine the CI aggregation

parameters (Data in Table 7). The utilities calculated with MACBETH corresponding to

the examples’ data were used to express the WQ DMs’ preferences (Utilities in Table 7).

The results of the aggregation of the examples’ data following the WQ protocol were

used as the WQ DMs’ initial preferences in order to use the LS based approach for

capacity identification (WQ in Table 7).

Table 7. Absence of injuries measures data for selected farms. Measures’ values, individual utilities and overall utilities for each selected farm.

        Index                                Utility
Farm    Lameness  Wounds  Tail biting        Lameness  Wounds  Tail biting        WQ       Overall utility
a       75        50      25                 25        50      75                 25       24.5
b       75        25      50                 25        75      50                 25       24.25
c       50        75      25                 50        25      75                 32.5     32
d       25        75      50                 75        25      50                 39.5     39.25
e       60        50      40                 40        50      60                 40       39.5
f       60        40      50                 40        60      50                 40       39.4
g       50        60      40                 50        40      60                 42.9     42.5
h       50        50      50                 50        50      50                 50       49.5
i       50        25      75                 50        75      25                 34.25    33.5
j       25        50      75                 75        50      25                 41.5     41
k       50        40      60                 50        60      40                 43.7     43.1
l       40        60      50                 60        40      50                 45.8     45.4
o       40        50      60                 60        50      40                 46.6     46.1

¹Percentage of animals affected with lameness (L) / wounds on the body (W) scored 1
²Percentage of animals affected with lameness / wounds on the body / bitten tails (BT) scored 2

The Shapley values for each measure are shown in Table 8. As we can also see in Table

8, all the interactions between measures were positive; thus, the measures were defined

as complementary.

Table 8. Shapley values and interaction indices to aggregate the measures’ utilities into the criterion with the Choquet integral.

                                Interaction indices
Measure        Shapley value    lameness    wounds    tail biting
lameness       0.54             -           0.395     0.315
wounds         0.24             0.395       -         0.315
tail biting    0.21             0.315       0.315     -
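As a worked check of how Table 8 relates to Table 7, the overall utility of farm a can be recomputed by hand. The expression below is the usual form of the two-additive Choquet integral in terms of Shapley values \phi_i and interaction indices I_{ij}; that this form was used is an assumption, since the formula is not written out in this annex. The utilities are those of farm a in Table 7 (lameness 25, wounds 50, tail biting 75).

```latex
% Two-additive Choquet integral (assumed form), with u_i the utilities,
% \phi_i the Shapley values and I_{ij} the interaction indices:
C(u) = \sum_{I_{ij} > 0} \min(u_i, u_j)\, I_{ij}
     + \sum_{I_{ij} < 0} \max(u_i, u_j)\, \lvert I_{ij} \rvert
     + \sum_{i} u_i \Bigl( \phi_i - \tfrac{1}{2} \sum_{j \neq i} \lvert I_{ij} \rvert \Bigr)

% Farm a: u_L = 25, u_W = 50, u_T = 75, all interactions positive.
C(u) = 25(0.395) + 25(0.315) + 50(0.315)
     + 25(0.54 - 0.355) + 50(0.24 - 0.355) + 75(0.21 - 0.315)
     = 9.875 + 7.875 + 15.75 + 4.625 - 5.75 - 7.875
     = 24.5
```

The result agrees with the overall utility of 24.5 listed for farm a in Table 7.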

1.7 Criterion ‘Absence of disease’

Absence of disease is assessed by 13 measures. The measures used to check this

criterion lead to data expressed on different scales.

1.7.1 Welfare Quality®

Due to the different nature of the measures (for instance, mortality is recorded as the

percentage of mortality on farm during the last 12 months whereas coughing and

sneezing are assessed as the average frequency of coughs/sneezes per animal during 5

minutes) WQ decided to compare the data to alarm thresholds that represent the limit

between what is considered abnormal and what is considered normal. When the

incidence observed on a measure reaches approximately half the alarm threshold, a

warning is attributed (Table 9). The measures are grouped into 6 areas. The severity of

the problem is estimated per area: if in an area, the frequency of one symptom is above

the warning threshold and the others are below, then a warning is attributed to the area; if

in an area, the frequency of one symptom is above the alarm threshold, then the alarm is

attributed to the area; if neither, there is no problem recorded.

Table 9. Warning and alarm thresholds for the absence of disease measures.

Area                   Symptom                                                              Warning threshold   Alarm threshold
Respiratory area       Coughing (frequency per pig and 5 min)                               15                  46
                       Sneezing (frequency per pig and 5 min)                               27                  55
                       % pigs with twisted snout                                            1.1                 3.5
                       % pigs pumping                                                       1.8                 5
                       % slaughter pigs with pleuritis                                      28                  55
                       % slaughter pigs with pericarditis                                   5                   20
                       % slaughter pigs with pneumonia                                      2.7                 6
Digestive area         % pigs in herd with rectal prolapse                                  0.7                 2.5
                       % pens in herd with liquid faeces                                    6                   15
Liver                  % slaughter pigs with white spot on the liver (parasites)            10                  23
Skin                   % with 10% or more skin inflamed                                     3.1                 8
Ruptures and hernias   % pigs with hernias/ruptures not bleeding, not touching the floor    2.4                 5
                       % pigs with hernias/ruptures bleeding or touching the floor          0.6                 1.5
Mortality              % mortality                                                          2.6                 4.5
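The per-area rule can be illustrated with a short Python sketch. It is only an illustration under stated assumptions and not part of the WQ protocol: the function name `area_status`, the dictionary layout and the strict "greater than" comparison are hypothetical choices, and only three respiratory symptoms from Table 9 are included for brevity.

```python
# Warning and alarm thresholds copied from Table 9 (subset of the respiratory area).
RESPIRATORY_THRESHOLDS = {
    "coughing": (15, 46),
    "sneezing": (27, 55),
    "pumping": (1.8, 5),
}


def area_status(observed, thresholds):
    """Return 'alarm', 'warning' or 'none' for one disease area.

    Assumption: a threshold counts as exceeded only when the observed
    value is strictly greater than it.
    """
    status = "none"
    for symptom, value in observed.items():
        warning, alarm = thresholds[symptom]
        if value > alarm:
            return "alarm"      # one symptom above the alarm threshold is enough
        if value > warning:
            status = "warning"  # kept unless another symptom raises an alarm
    return status


# Respiratory values of farms a, e and i in Table 10 (C, Sn, P).
print(area_status({"coughing": 5, "sneezing": 2, "pumping": 0.2}, RESPIRATORY_THRESHOLDS))
print(area_status({"coughing": 20, "sneezing": 16, "pumping": 1}, RESPIRATORY_THRESHOLDS))
print(area_status({"coughing": 37, "sneezing": 44, "pumping": 6.1}, RESPIRATORY_THRESHOLDS))
```

Returning as soon as an alarm is found reflects that an alarm overrides any warning within the same area.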

The numbers of alarms and warnings detected on a farm are counted and used to calculate an ‘Index’ with a weighted sum.


Finally the ‘Index’ is transformed into a score using I-spline functions.

When I≤ 10 then:

When I≥ 10 then:

1.7.2 MAUT

In this study, the measures used to check this criterion were first transformed into an ordinal scale before the utility functions were determined. The data were compared with the warning and alarm thresholds defined in the WQ protocol. The measures were grouped into 6 areas: mortality, respiratory, digestive, liver, skin and hernias. An area was attributed a warning or an alarm when one of its measures was above the warning or the alarm threshold, respectively. The utility function was calculated per area. We defined the 6 disease areas as qualitative measures with three performance levels: no problem recorded, a warning attributed to the area and an alarm attributed to the area. In MACBETH, a utility of 100 was assigned to an area when no problem was recorded, a utility of 40 when the area was attributed a warning, and a utility of 0 when the area was attributed an alarm (Figure 12).

Figure 12. Utilities assigned to the performance levels of the Absence of disease areas.


A set of 10 example farms was used as learning data to determine the CI aggregation parameters (Data in Table 10). The utilities calculated with MACBETH for the examples’ data were used to express the WQ DMs’ preferences (Utilities in Table 10). The results of aggregating the examples’ data following the WQ protocol were used as initial preferences for the LS based approach to capacity identification (WQ in Table 10).

We found that the initial Shapley values resulting from the aggregation of the utilities with the CI varied slightly between areas, whereas in the WQ protocol all the areas are considered equally important. After imposing an additional constraint on the Shapley values, the importance attached to each area was the same and the overall utility remained equal. The interaction indices (Table 11) differed between the initial calculation of the CI and the second, constrained calculation, but in both cases all the areas performed as complementary measures.

Table 10. Absence of disease. Measures’ values, individual utilities and overall utilities for each selected farm.

Farm   Data
       Mortality   Respiratory condition    Digestive condition  Parasites  Skin condition  Ruptures and hernias
       M¹          C²    Sn²   P³    TS³    RP³    Sc⁴           P          Sk⁵             H⁵    H⁶              WQ       Overall utility
a      0.3         5     2     0.2   0.1    0.1    2             0          0.4             0.5   0.1             99.99    100
b      0.7         12    5     0.3   0.2    0.8    3             0          1               1     0.3             83.971   83.8
c      1           14    24    1.4   1      0.6    20            0          3               2.3   0.3             74.126   73
d      1.3         16    10    0.5   0.3    0.3    6             0          1.3             1.5   0.5             69.457   69.457
e      1.8         20    16    1     0.7    0.5    10            0          2.4             2     0.8             56.380   58.297
f      2           6     24    1.4   1      0.7    12            0          9               2.4   0.9             48.418   48.418
g      3           30    38    1.8   1.3    1      10            0          3.6             3     1               34.225   41.806
h      2.6         33    42    2     1.6    1.2    16            0          4               3.2   1.1             27.937   31
i      3           37    44    6.1   2      1.5    17            0          4.3             7     1.2             16.88    14.004
j      5.3         50    46    3     2.4    1.7    18            0          9.7             3.8   1.7             7.675    3.01

¹Percentage of mortality (M) on farm during the last 12 months.
²Average frequency of coughs (C)/sneezes (Sn) per animal during 5 minutes.
³Percentage of pigs with evidence of laboured breathing (P)/twisted snouts (TS)/rectal prolapse (RP).
⁴Percentage of pigs in herd with liquid faeces (Sc).
⁵Percentage of pigs scored as 2 in skin condition (Sk)/hernias (H).
⁶Percentage of pigs scored as 1 in hernias.


Table 11. Shapley value and interaction indices to aggregate the measures’ utilities into the criteria with the Choquet integral.

              Shapley value   Interaction indices
                              Mortality   Respiratory   Digestive   Liver   Skin    Hernias
Mortality     0.165           -           0.024         0.046       0.029   0.018   0.024
Respiratory   0.167           0.024       -             0.017       0.055   0.046   0.035
Digestive     0.168           0.046       0.017         -           0.077   0.037   0.025
Liver         0.163           0.029       0.055         0.077       -       0.056   0.049
Skin          0.166           0.018       0.046         0.037       0.056   -       0.021
Hernias       0.168           0.0214      0.035         0.025       0.049   0.021   -

1.8 Criterion ‘Absence of pain induced by management procedures’

Absence of pain induced by management procedures is assessed by 2 qualitative

measures, Castration and Tail docking. These measures are taken at farm level. The

farms are classified in relation to the presence or absence of these mutilation

procedures, and in case of presence of the procedures, the use or not of anaesthetics.

1.8.1 Welfare Quality®

For this type of measure, WQ used a lexicographic valuation tree (Figure 13).

Figure 13. Tree created in the MACBETH decision support system for the criterion Absence of pain induced by management procedures.


1.8.2 MAUT

In this study, Castration and Tail docking were defined in MACBETH as qualitative measures. In accordance with the WQ protocol, their performance levels were established as no castration/no tail docking, castration/tail docking with anaesthetics and castration/tail docking without anaesthetics. Figure 14 shows the MACBETH scales for each measure.

Figure 14. Utilities assigned to the performance levels of the Absence of pain induced

by management procedures.

A set of 9 example farms was used as learning data to determine the CI aggregation parameters (Data in Table 12). The utilities calculated with MACBETH for the examples’ data were used to express the WQ DMs’ preferences (Utilities in Table 12). The results of aggregating the examples’ data following the WQ protocol were used as initial preferences for the LS based approach to capacity identification (WQ in Table 12).

After an initial calculation of the CI, we decided not to impose any additional constraint on the aggregation of the Absence of pain induced by management procedures measures, since the WQ DMs’ preferences were satisfied. As shown in Table 12, the utilities were adjusted as closely as possible to the scores defined in the WQ protocol for this criterion for the 9 possible situations that can be found on a farm regarding Castration and Tail docking. When the utilities were adjusted to the WQ DMs’ preferences, the CI parameters obtained indicated that Tail docking was considered more important than Castration, and that both measures performed in a complementary way (Table 13).


Table 12. Absence of pain induced by management procedures. Measures’ values, individual utilities and overall utilities for each selected farm.

Farm   Data                        Utilities                    WQ    Overall utility
       Castration   Tail docking   Castration   Tail docking
a      No           No             100          100            100   100
b      No           With¹          100          45             60    67.34
c      No           Without²       100          0              38    40.62
d      With¹        No             60           100            77    79.36
e      With¹        With¹          60           45             53    51.09
f      With¹        Without²       60           0              35    24.37
g      Without²     No             0            100            47    48.40
h      Without²     With¹          0            45             27    21.78
i      Without²     Without²       0            0              8     0

¹Castration/tail docking with anaesthesia
²Castration/tail docking without anaesthesia

Table 13. Shapley value and interaction indices to aggregate the measures’ utilities into the criteria with the Choquet integral.

               Shapley value   Interaction indices
                               Castration   Tail docking
Castration     0.461           -            0.109
Tail docking   0.539           0.000        -
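As a plausibility check, the overall utilities of Table 12 can be recomputed from the parameters of Table 13. The sketch below is a hedged illustration and not the software used in the thesis. It assumes the standard expression of a Choquet integral for two criteria in terms of Shapley values and a single interaction index, and it uses the value 0.109 listed in the Castration row of Table 13. The function name `choquet_pair` is hypothetical.

```python
def choquet_pair(x1, x2, phi1, phi2, interaction):
    """Choquet integral of two utilities from Shapley values and one interaction index.

    A positive interaction (complementary measures) rewards the minimum of the
    two utilities; a negative one (redundant measures) rewards the maximum.
    """
    half = abs(interaction) / 2.0
    joint = min(x1, x2) if interaction >= 0 else max(x1, x2)
    return abs(interaction) * joint + x1 * (phi1 - half) + x2 * (phi2 - half)


PHI_CASTRATION, PHI_TAIL_DOCKING, INTERACTION = 0.461, 0.539, 0.109

# (castration utility, tail docking utility, overall utility reported in Table 12)
FARMS = {
    "a": (100, 100, 100), "b": (100, 45, 67.34), "c": (100, 0, 40.62),
    "d": (60, 100, 79.36), "e": (60, 45, 51.09), "f": (60, 0, 24.37),
    "g": (0, 100, 48.40), "h": (0, 45, 21.78), "i": (0, 0, 0),
}

for farm, (castration, tail_docking, reported) in FARMS.items():
    value = choquet_pair(castration, tail_docking,
                         PHI_CASTRATION, PHI_TAIL_DOCKING, INTERACTION)
    print(farm, round(value, 2), reported)
```

Under these assumptions the recomputed values agree with the reported overall utilities to within rounding of the three-decimal parameters.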

1.9 Criterion ‘Expression of social behaviours’

Expression of social behaviours is assessed by the proportion of negative behaviour out

of all social behaviour.

1.9.1 Welfare Quality®

In the WQ protocol, an index is first calculated:

Afterwards, this ‘Index’ is transformed into a ‘Score’ using I-spline functions:

When Isb≤ 70 then:


When Isb≥ 70 then:

1.9.2 MAUT

Here, we calculated the utility function directly with MACBETH from the proportion of negative social behaviour out of all social behaviours. We established performance levels in steps of one unit between 0 and 10% negative behaviour and in steps of 10 units between 10 and 100% negative behaviour.

Figure 15. Utility function for negative behaviour calculated with MACBETH.

1.10 Criterion ‘Expression of other behaviours’

Expression of other behaviours is assessed by the percentage of active behaviour spent

in exploration of the pen and by the percentage of active behaviour spent in exploration

of the enrichment material.


1.10.1 Welfare Quality®

In the WQ protocol they first calculate an Index:

Afterwards, this ‘Index’ is transformed into a ‘Score’ using I-spline functions:

When Iob≤ 60 then:

When Iob≥ 60 then:

1.10.2 MAUT

Here, before determining the utility function of expression of other behaviours, we produced an Index, as in the Welfare Quality® protocol, to combine the percentage of active behaviours spent exploring the pen and the enrichment material.

We implemented the same weights used in the WQ protocol. For instance, for lameness:

Afterwards, we calculated the utility function with MACBETH. We established performance levels in steps of one unit between 0 and 10% of other behaviours and in steps of 10 units between 10 and 100% of other behaviours.

Figure 16. Utility function for other behaviours calculated with MACBETH.


1.11 Criterion ‘Good human-animal relationship’

Good human-animal relationship is assessed by the percentage of pens showing a panic

response (score 2).

1.11.1 Welfare Quality®

In the WQ protocol they first calculate an Index:

Afterwards, this ‘Index’ is transformed into a ‘Score’ using I-spline functions:

When Iob≤ 10 then:

When Iob≥ 10 then:

1.11.2 MAUT

Here, we calculated the utility function with MACBETH. We established performance levels in steps of ten units between 0 and 100% of pens showing a panic response scored 2.

Figure 17. Utility function for the percentage of pens showing a panic response calculated with MACBETH.


1.12 Criterion ‘Positive emotional state’

Positive emotional state is assessed by the 20 measures of the Qualitative Behaviour

Assessment.

1.12.1 Welfare Quality®

In the WQ protocol, the values (between 0 and 125) are turned into an index with a

weighted sum:

With Nk, the value obtained by a farm for a given term k, and Wk, the weight attributed to a given term k (Table 14). In the verification of the WQ protocol, we found difficulties in the calculation of the Positive emotional state score. These were solved by substituting the weight for the measure fearful with the same value but negative (-0.00475), and by substituting the I-spline function used when the Index for positive emotional state is greater than 0 with the spline function proposed in the dairy cattle protocol for the same criterion:

When I≤ 0 then:

Table 14. Weights used in the calculation of the Positive emotional state Index.

Measures              Weights
Active                0.01228
Relaxed               0.01087
Fearful               -0.00475
Agitated              -0.00711
Calm                  0.01122
Content               0.01184
Tense                 -0.00971
Enjoying              0.01030
Frustrated            -0.01496
Sociable              0.00544
Bored                 -0.01230
Playful               0.00463
Positively occupied   0.01193
Listless              -0.01448
Lively                0.01002
Indifferent           -0.00747
Irritable             -0.00883
Aimless               -0.01193
Happy                 0.01193
Distressed            -0.00175
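The weighted sum over the 20 terms can be sketched as follows. This is an illustration only: the weights are copied from Table 14, whereas the function name `qba_index` and the input layout are hypothetical, and any constant term that the protocol formula may contain is not visible in the text above and is therefore omitted here.

```python
# Weights copied from Table 14 (term -> weight Wk).
QBA_WEIGHTS = {
    "Active": 0.01228, "Relaxed": 0.01087, "Fearful": -0.00475,
    "Agitated": -0.00711, "Calm": 0.01122, "Content": 0.01184,
    "Tense": -0.00971, "Enjoying": 0.01030, "Frustrated": -0.01496,
    "Sociable": 0.00544, "Bored": -0.01230, "Playful": 0.00463,
    "Positively occupied": 0.01193, "Listless": -0.01448, "Lively": 0.01002,
    "Indifferent": -0.00747, "Irritable": -0.00883, "Aimless": -0.01193,
    "Happy": 0.01193, "Distressed": -0.00175,
}


def qba_index(scores):
    """Weighted sum of the term values Nk (each expected between 0 and 125).

    Assumption: no constant term is added, because none is shown in the text.
    """
    missing = set(QBA_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing QBA terms: {sorted(missing)}")
    return sum(QBA_WEIGHTS[term] * scores[term] for term in QBA_WEIGHTS)


# Hypothetical farm: every term scored 0 except 'Active' at the maximum of 125.
example = dict.fromkeys(QBA_WEIGHTS, 0.0)
example["Active"] = 125.0
print(qba_index(example))
```

Raising an error for missing terms is a design choice of this sketch, so that an incomplete assessment is not silently treated as a score of 0.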


Afterwards, this ‘Index’ is transformed into a ‘Score’ using I-spline functions:

When I≤ 0 then:

When I≥ 0 then:

1.12.2 MAUT

In the present study for the criterion positive emotional state the same methodology as

in the WQ protocol was implemented.

2. Aggregation of growing pigs’ welfare measures into criteria

As in the WQ protocol, Choquet integrals were used in this study to combine the criteria into the corresponding principles. In WQ, different data sets combining different criteria values were presented to panels of experts, who were asked to give absolute scores at principle level for each of the combinations. The parameters of the CI were elicited from the mean of the experts’ answers. In this section we present the data sets used in the WQ protocol as well as the CI parameters obtained there. We used the same data sets and the overall scores reflecting the WQ DMs’ preferences to determine the CI parameters with the least squares based approach. Both the parameters used in the WQ protocol and our parameters are presented here.

2.1 Good feeding

Table 15. Examples of scores for ‘Good feeding’ according to combinations of Criterion scores for absence of prolonged hunger and absence of prolonged thirst.

Criteria                                Principle
Absence of hunger   Absence of thirst   Good feeding
40                  60                  46
50                  50                  50
60                  40                  41
75                  25                  28


2.1.1 Welfare Quality

Table 16. Choquet integral capacities, Shapley values and interaction indices for absence of prolonged hunger and absence of prolonged thirst.

                                        Capacity   Shapley values   Interaction indices
Absence of hunger                       0.05       0.39             -
Absence of thirst                       0.28       0.61             -
Absence of hunger & absence of thirst   -          -                0.66
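The way the capacities of Table 16 turn two criterion scores into a principle score can be sketched with the textbook definition of the discrete Choquet integral. The code is an illustration under assumptions: the function name `choquet` and the data layout are hypothetical, and the capacity of the full set of criteria is set to 1 by normalisation (it is shown as ‘-’ in the table).

```python
def choquet(values, capacity):
    """Discrete Choquet integral.

    values:   dict criterion -> score
    capacity: dict frozenset of criteria -> capacity of that coalition
    """
    remaining = set(values)          # criteria scoring at least the current level
    total, previous = 0.0, 0.0
    for name in sorted(values, key=values.get):   # ascending scores
        total += (values[name] - previous) * capacity[frozenset(remaining)]
        previous = values[name]
        remaining.remove(name)
    return total


CAPACITY = {
    frozenset({"hunger"}): 0.05,
    frozenset({"thirst"}): 0.28,
    frozenset({"hunger", "thirst"}): 1.0,   # normalisation, assumed
}

# (absence of hunger, absence of thirst, 'Good feeding' score in Table 15)
for hunger, thirst, expert in [(40, 60, 46), (50, 50, 50), (60, 40, 41), (75, 25, 28)]:
    score = choquet({"hunger": hunger, "thirst": thirst}, CAPACITY)
    print(hunger, thirst, round(score, 1), expert)
```

With the two-decimal capacities of Table 16, the computed scores stay within half a point of the expert scores in Table 15. Because both single-criterion capacities are small, the result stays close to the lower of the two scores, which is how the CI limits compensation.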

2.1.2 MAUT

Table 17. Choquet integral Shapley values and interaction indices for absence of prolonged hunger and absence of prolonged thirst.

                    Shapley values   Interaction indices
                                     Absence of hunger   Absence of thirst
Absence of hunger   0.38             -                   0.64
Absence of thirst   0.62             0.64                -
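For two criteria, a least squares fit of the capacity to example scores can be written in closed form, which gives a feeling for how parameters such as those in Table 17 arise from the examples of Table 15. The sketch below is a simplified stand-in and not the identification procedure used in the thesis: it ignores monotonicity constraints (which happen to be satisfied here), and all function names are hypothetical.

```python
def fit_two_criteria_capacity(examples):
    """Unconstrained least squares fit of mu({1}) and mu({2}).

    examples: list of (x1, x2, target score).
    For x1 > x2 the Choquet integral equals x2 + (x1 - x2) * mu1, and for
    x2 > x1 it equals x1 + (x2 - x1) * mu2, so each parameter is fitted
    separately. At least one example on each side is required.
    """
    num1 = den1 = num2 = den2 = 0.0
    for x1, x2, target in examples:
        gap = abs(x1 - x2)
        if x1 > x2:
            num1 += (target - x2) * gap
            den1 += gap * gap
        elif x2 > x1:
            num2 += (target - x1) * gap
            den2 += gap * gap
    return num1 / den1, num2 / den2


def shapley_and_interaction(mu1, mu2):
    """Shapley values and interaction index of a normalised two-criteria capacity."""
    interaction = 1.0 - mu1 - mu2
    return mu1 + interaction / 2.0, mu2 + interaction / 2.0, interaction


# Table 15: (absence of hunger, absence of thirst, 'Good feeding' score).
EXAMPLES = [(40, 60, 46), (50, 50, 50), (60, 40, 41), (75, 25, 28)]

mu_hunger, mu_thirst = fit_two_criteria_capacity(EXAMPLES)
phi_hunger, phi_thirst, interaction = shapley_and_interaction(mu_hunger, mu_thirst)
print(round(phi_hunger, 2), round(phi_thirst, 2), round(interaction, 2))
```

Rounded to two decimals, this simplified fit gives the same Shapley values and interaction index as Table 17, although the thesis may have obtained them with a different, constrained procedure.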

2.2 Good housing

Table 18. Examples of scores for ‘Good housing’ according to combinations of Criterion scores for comfort around resting, thermal comfort and ease of movement.

Criteria                                                       Principle
Comfort around resting   Thermal comfort   Ease of movement   Good housing
25                       50                75                 35
25                       75                50                 34
50                       25                75                 37
75                       25                50                 38
40                       50                60                 44
40                       60                50                 44
50                       40                60                 45
50                       50                50                 50
50                       75                25                 34
75                       50                25                 37
50                       60                40                 44
60                       40                50                 45
60                       50                40                 45


2.2.1 Welfare Quality

Table 19. Choquet integral capacities, Shapley values and interaction indices for comfort around resting, thermal comfort and ease of movement.

                                                              Capacity   Shapley values   Interaction indices
Comfort around resting                                        0.20       0.37             -
Thermal comfort                                               0.11       0.28             -
Ease of movement                                              0.16       0.35             -
Comfort around resting & thermal comfort                      0.26       -                0.27
Comfort around resting & ease of movement                     0.33       -                0.28
Thermal comfort & ease of movement                            0.25       -                0.29
Comfort around resting & thermal comfort & ease of movement   -          -                0.62
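The Shapley values and pairwise interaction indices in Table 19 follow from the capacities by the standard definitions, which the following sketch evaluates. It is an illustration only: the function names and the short labels R, T and M (comfort around resting, thermal comfort, ease of movement) are hypothetical, and the capacities of the empty set and of the full set are fixed at 0 and 1 by normalisation.

```python
from itertools import combinations
from math import factorial

CRITERIA = ["R", "T", "M"]
CAPACITY = {
    frozenset(): 0.0,                        # normalisation, assumed
    frozenset("R"): 0.20, frozenset("T"): 0.11, frozenset("M"): 0.16,
    frozenset("RT"): 0.26, frozenset("RM"): 0.33, frozenset("TM"): 0.25,
    frozenset("RTM"): 1.0,                   # normalisation, assumed
}


def shapley(capacity, criteria, i):
    """Average marginal contribution of criterion i over all coalitions."""
    n = len(criteria)
    others = [c for c in criteria if c != i]
    total = 0.0
    for size in range(n):
        weight = factorial(n - size - 1) * factorial(size) / factorial(n)
        for subset in combinations(others, size):
            s = frozenset(subset)
            total += weight * (capacity[s | {i}] - capacity[s])
    return total


def interaction(capacity, criteria, i, j):
    """Pairwise interaction index: positive for complementary criteria."""
    n = len(criteria)
    others = [c for c in criteria if c not in (i, j)]
    total = 0.0
    for size in range(n - 1):
        weight = factorial(n - size - 2) * factorial(size) / factorial(n - 1)
        for subset in combinations(others, size):
            s = frozenset(subset)
            total += weight * (capacity[s | {i, j}] - capacity[s | {i}]
                               - capacity[s | {j}] + capacity[s])
    return total


for c in CRITERIA:
    print(c, round(shapley(CAPACITY, CRITERIA, c), 3))
for a, b in combinations(CRITERIA, 2):
    print(a, b, round(interaction(CAPACITY, CRITERIA, a, b), 3))
```

Computed from the two-decimal capacities, the results match the Shapley values and pairwise interaction indices of Table 19 to within 0.005.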

2.2.2 MAUT

Table 20. Choquet integral Shapley values and interaction indices for Comfort around resting, Thermal comfort and Ease of movement.

                         Shapley values   Interaction indices
                                          Comfort around resting   Thermal comfort   Ease of movement
Comfort around resting   0.372            -                        0.271             0.271
Thermal comfort          0.289            0.271                    -                 0.305
Ease of movement         0.339            0.271                    0.305             -


2.3 Good health

Table 21. Examples of scores for ‘Good health’ according to combinations of Criterion scores for absence of injuries, absence of disease and absence of pain induced by management procedures.

Criteria                                                                                          Principle
Absence of injuries   Absence of disease   Absence of pain induced by management procedures   Good health
25                    50                   75                                                 32
25                    75                   50                                                 35
50                    25                   75                                                 30
75                    25                   50                                                 28
40                    50                   60                                                 43
40                    60                   50                                                 44
50                    40                   60                                                 42
50                    50                   50                                                 50
50                    75                   25                                                 38
75                    50                   25                                                 34
50                    60                   40                                                 45
60                    40                   50                                                 41
60                    50                   40                                                 44

2.3.1 Welfare Quality

Table 22. Choquet integral capacities, Shapley values and interaction indices for absence of injuries, absence of disease and absence of pain induced by management procedures.

                                                                                              Capacity   Shapley values   Interaction indices
Absence of injuries                                                                           0.04       0.30             -
Absence of disease                                                                            0.20       0.43             -
Absence of pain induced by management procedure                                               0.09       0.27             -
Absence of injuries & absence of disease                                                      0.31       -                0.43
Absence of injuries & absence of pain induced by management procedures                        0.09       -                0.33
Absence of disease & absence of pain induced by management procedures                         0.20       -                0.28
Absence of injuries & absence of disease & absence of pain induced by management procedures   -          -                0.73


2.3.2 MAUT

Table 23. Choquet integral Shapley values and interaction indices for absence of injuries, absence of disease and absence of pain induced by management procedures.

                                                   Shapley values   Interaction indices
                                                                    Absence of injuries   Absence of disease   Absence of pain induced by management procedures
Absence of injuries                                0.318            -                     0.448                0.188
Absence of disease                                 0.413            0.448                 -                    0.348
Absence of pain induced by management procedures   0.268            0.118                 0.348                -

2.4 Appropriate behaviour

Table 24. Examples of scores for ‘Appropriate behaviour’ according to combinations of Criterion scores for Expression of social behaviours, Expression of other behaviours, Good human-animal relationship and Positive emotional state.

Criteria                                                                                                                     Principle
Expression of social behaviours   Expression of other behaviours   Good human-animal relationship   Positive emotional state   Appropriate behaviour
35                                35                               65                               65                         42
35                                50                               50                               65                         44
35                                50                               65                               50                         42
35                                65                               35                               65                         40
35                                65                               50                               50                         42
35                                65                               65                               35                         39
50                                35                               50                               65                         44
50                                35                               65                               50                         43
50                                50                               35                               65                         46
50                                50                               50                               50                         50
50                                50                               65                               35                         43
50                                65                               35                               50                         45
50                                65                               50                               35                         43
65                                35                               35                               65                         43
65                                35                               50                               50                         45
65                                35                               65                               35                         40
65                                50                               35                               50                         47
65                                50                               50                               35                         46
65                                65                               35                               35                         42


2.4.1 Welfare Quality

Table 25. Choquet integral capacities, Shapley values and interaction indices for Expression of social behaviours, Expression of other behaviours, Good human-animal relationship (HAR) and Positive emotional state.

                                                                       Capacity   Shapley values   Interaction indices
Social behaviours                                                      0.17       0.31             -
Other behaviours                                                       0.01       0.23             -
Human-animal relationship                                              0.01       0.19             -
Positive emotional state                                               0.10       0.27             -
Social behaviours & Other behaviours                                   0.22       -                0.14
Social behaviours & HAR                                                0.17       -                0.06
Social behaviours & Positive emotional state                           0.27       -                0.09
Other behaviours & HAR                                                 0.13       -                0.14
Other behaviours & Positive emotional state                            0.18       -                0.14
HAR & Positive emotional state                                         0.22       -                0.12
Social behaviours & Other behaviours & HAR                             0.53       -                0.07
Social behaviours & Other behaviours & Positive emotional state        0.63       -                0.11
Social behaviours & Positive emotional state & HAR                     0.52       -                0.00
Other behaviours & HAR & Positive emotional state                      0.48       -                -0.05
Social behaviours & Other behaviours & HAR & Positive emotional state  -          -                -0.25

2.4.2 MAUT

Table 26. Choquet integral Shapley values and interaction indices for Expression of social behaviours, Expression of other behaviours, Good human-animal relationship and Positive emotional state.

                                 Shapley values   Interaction indices
                                                  Social behaviours   Other behaviours   Good human-animal relationship   Positive emotional state
Social behaviours                0.325            -                   0.182              0.112                            0.173
Other behaviours                 0.242            0.182               -                  0.154                            0.149
Good human-animal relationship   0.177            0.112               0.154              -                                0.089
Positive emotional state         0.254            0.173               0.149              0.089                            -


GENERAL DISCUSSION

The main aim of the present study was to develop a multi-criteria evaluation system to assess animal welfare. The welfare assessment of growing pigs proposed by Welfare Quality® was used as a framework to develop the multi-criteria methodology. A comparison of different multi-criteria methods indicated that MACBETH and the Choquet integral (CI), in the context of the multi-attribute utility theory (MAUT), were the most suitable methodology to solve the main problems faced by a multi-criteria evaluation system for animal welfare. Therefore, MACBETH and the CI were used throughout this thesis.

General methodology

The main difficulties faced by a multi-criteria evaluation system for animal welfare are that data are collected on different types of scales, that criteria may have different levels of importance, and that interactions may exist between them; a key aspect is that welfare criteria may not fully compensate each other (Botreau et al., 2007b). Accordingly, different multi-criteria methods which could be applied to animal welfare were compared in Chapter One of this thesis. As a result, the use of MACBETH together with the CI in the context of the MAUT was identified as the most suitable methodology to assess animal welfare. Compared with other techniques for utility function determination, such as the standard sequence method or the I-spline functions proposed in the WQ protocol, MACBETH presented several advantages. First, by using MACBETH, the utility function determination process remained more transparent, which can help the stakeholders gain confidence in the model. Second, the use of MACBETH could help to facilitate consensus between stakeholders (Parnell et al., 2013; Bana e Costa et al., 2014), which is one of the difficulties when panels of different DMs are consulted to determine the utility functions and the aggregation parameters. Third, with MACBETH it is easier to judge the difference in attractiveness of options as the number of criteria increases, owing to its interactive software and to the use of qualitative judgements on a scale of different categories (‘very weak’, ‘weak’, ‘moderate’, ‘strong’, ‘very strong’ or ‘extreme’) (Bana e Costa et al., 2004). Fourth, MACBETH allows a comparison not only of quantitative performance levels but also of qualitative ones, with no need for a prior conversion of the qualitative scales into a quantitative scale, which solves one of the problems presented by Botreau et al. (2007b). Finally, the assessment remains more flexible. With this method, all the parameters can be changed according to new scientific knowledge (inclusion or exclusion of measures based on new studies on their influence on animal welfare), changes in societal expectations (if the welfare of animals improves significantly on all farms, stakeholders may want to be more selective when considering a farm as excellent), etc. The main drawback of MACBETH is related to the implementation of the M-MACBETH software: it does not allow the utility function formulae to be exported to other environments, and typing the information into the software can be extremely tedious when working with large amounts of data.

The use of the CI as an aggregator presented an important advantage over other methods proposed for the overall evaluation of animal welfare, such as the sum of ranks and the sum of scores (Botreau et al., 2007a), which only allow the user or investigator to assign different importance to the measures/criteria. The CI allowed the interaction between measures to be taken into account and thus made it possible to limit the compensation between them, solving one of the main problems described by Botreau et al. (2007b). The CI was also used in the WQ protocol for the aggregation of some measures into criteria and for the aggregation of criteria into principles (Welfare Quality, 2009). The main difficulty in implementing the least squares-based approach for CI capacity identification is that it depends on information which the DM cannot always provide, such as the overall scores for each criterion (Grabisch et al., 2008). Because our results were fitted to the WQ DMs’ preferences, the results obtained from the WQ model were used as initial preferences, thus avoiding this issue. However, as the study of Merad et al. (2013) shows, in other circumstances it may be difficult for the DMs to provide overall scores. Nevertheless, easier methods for capacity identification have been proposed in the literature, such as the minimum variance approach, which requires only a partial order over the farms as preference information.
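The difference between the CI and a purely importance-based aggregation can be made concrete with the ‘Good health’ parameters of Table 22. The sketch below is an illustration under assumptions: it compares the Choquet integral with a weighted sum that uses the Shapley values as weights, the function names and short criterion labels are hypothetical, and the capacity of the full set is set to 1 by normalisation.

```python
def choquet(values, capacity):
    """Discrete Choquet integral (scores sorted in ascending order)."""
    remaining = set(values)
    total, previous = 0.0, 0.0
    for name in sorted(values, key=values.get):
        total += (values[name] - previous) * capacity[frozenset(remaining)]
        previous = values[name]
        remaining.remove(name)
    return total


def weighted_sum(values, weights):
    """Fully compensatory aggregation: only importance, no interaction."""
    return sum(weights[name] * values[name] for name in values)


# Capacities and Shapley values copied from Table 22.
CAPACITY = {
    frozenset({"injuries"}): 0.04,
    frozenset({"disease"}): 0.20,
    frozenset({"pain"}): 0.09,
    frozenset({"injuries", "disease"}): 0.31,
    frozenset({"injuries", "pain"}): 0.09,
    frozenset({"disease", "pain"}): 0.20,
    frozenset({"injuries", "disease", "pain"}): 1.0,   # normalisation, assumed
}
SHAPLEY = {"injuries": 0.30, "disease": 0.43, "pain": 0.27}

# Second row of Table 21: the experts' 'Good health' score is 35.
farm = {"injuries": 25, "disease": 75, "pain": 50}
print(choquet(farm, CAPACITY))       # stays close to the expert score
print(weighted_sum(farm, SHAPLEY))   # a high disease score offsets poor injuries
```

For a balanced farm with 50 on every criterion both aggregations return 50, so the two approaches differ only when the criterion scores are unbalanced.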

In order to apply this methodology in the framework of the WQ protocol we found

some key points to be taken into account.


Use of weighted sums

In accordance with the WQ protocol, weighted sums were used for some measures before determining the utility functions. Weighted sums were used to aggregate the moderate and severe conditions for bursitis, manure on the body, lameness and wounds on the body. Pen investigation and enrichment investigation were also aggregated with a weighted sum before determining the utility function for exploratory behaviour. Although it was emphasised throughout the development of the WQ model that welfare scores should not compensate each other (Botreau et al., 2007b; Veissier et al., 2011), Chapter Two showed by means of small examples that compensation occurred in the first stages through the use of linear combinations, which were used both in the WQ protocol and in this alternative methodology. The extent of this problem was estimated in Chapter Three, in which a sensitivity analysis was performed to demonstrate the relative importance of welfare measures in the different steps of the multi-criteria aggregation process. Although the severe conditions were assigned a higher weight in the weighted sum than the moderate conditions, the severe conditions were not found to have a greater influence on the level of welfare at criterion or principle level. This could be explained by two main facts. First, the severe conditions varied little between farms, and thus the first and the third quartiles were not representative of an improvement in or a worsening of the level of welfare. Second, compensation between severity measures occurred and did not allow the model to distinguish between small variations in the severe conditions. Providing an individual utility function for each severity measure and aggregating them afterwards with the CI, instead of aggregating them with a weighted sum, could prove to be an alternative solution. On the one hand, the compensation issue would be avoided, increasing the sensitivity of the model; on the other hand, the complexity of the decision process would also increase, since the DMs would have to interpret a larger number of measures in terms of welfare.

Conversion to ordinal scores

Because of the different nature of the disease measures (for instance, mortality is recorded as the percentage of mortality on the farm during the last 12 months, whilst coughing and sneezing are assessed as the average frequency of coughs/sneezes per animal over 5 minutes), WQ decided to compare the disease data with alarm thresholds which represent the limit between what is considered abnormal and what is considered normal. When the incidence observed for a measure reaches approximately half the alarm threshold, a warning is attributed. The measures are grouped into six areas: mortality, respiratory, digestive, liver, skin and hernias. The severity of the problem is estimated per area: if the frequency of at least one symptom within an area is above the warning threshold and none is above the alarm threshold, a warning is attributed to the area. If the frequency of one symptom within an area is above the alarm threshold, an alarm is attributed to the area; if neither occurs, no problem is recorded. In order to simulate the WQ DMs’ preferences, we compared the data for the absence of disease measures with the warning and alarm thresholds established in the protocol. However, in the development of the methodology in Chapter Two, converting the original quantitative data into an ordinal scale (three qualitative levels: no problem recorded, a warning or an alarm) made it impossible for the model to distinguish between herds which slightly or greatly exceeded the thresholds. Furthermore, to stay in line with the WQ protocol preferences, we decided to create a utility function per area rather than calculate a utility per measure. This methodology allows large compensation between disease measures within an area. For instance, a farm with only one measure of the respiratory area (for example pneumonia) above the warning threshold is assigned a warning in this area, and so is a farm in which all six measures of the area (pneumonia, pleuritis, pericarditis, laboured breathing, coughing and sneezing) are above the warning threshold. What we can conclude from the sensitivity analysis carried out in Chapter Three is that, because the data were compared with warning and alarm thresholds and because of the compensation between measures within the disease areas, the original values at criterion level for absence of disease only changed when alarm or warning thresholds were reached. As a result, the model was only sensitive when the number of warnings or alarms was changed by improving or worsening the measures’ values. Thus, the model was only sensitive to large variations in the measures’ data, which makes it difficult to distinguish between different levels of welfare between farms. The conversion to an ordinal scale and the compensation of measures within each disease area are therefore crucial points which might be reconsidered, and the measures should be treated as quantitative, using the warning and alarm thresholds as references for the DMs to build utility functions per measure instead of per area.
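The loss of information caused by the ordinal conversion can be illustrated with the mortality area, which contains a single measure. The sketch uses the mortality thresholds from Table 9 (2.6 and 4.5) and the utilities 100, 40 and 0 assigned in MACBETH. The function name and the strict "greater than" comparison are assumptions made for this illustration.

```python
UTILITY = {"none": 100, "warning": 40, "alarm": 0}
MORTALITY_WARNING, MORTALITY_ALARM = 2.6, 4.5   # thresholds from Table 9


def mortality_utility(percent_mortality):
    """Utility of the mortality area after conversion to the ordinal scale.

    Assumption: a threshold is exceeded only by strictly greater values.
    """
    if percent_mortality > MORTALITY_ALARM:
        return UTILITY["alarm"]
    if percent_mortality > MORTALITY_WARNING:
        return UTILITY["warning"]
    return UTILITY["none"]


# Two hypothetical herds: one barely above the warning threshold,
# one just below the alarm threshold. Both receive the same utility.
print(mortality_utility(2.7), mortality_utility(4.4))
```

A utility function built per measure on the quantitative scale would separate these two herds, which is the alternative suggested in the paragraph above.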


Learning data

Small datasets were used as learning data to determine the CI aggregation parameters, to aggregate the measures into criteria and the criteria into principles. In Chapter Two and Chapter Three we show that our results perfectly fit the WQ results for the criteria assessed by just one welfare measure, such as absence of hunger, space allowance, social behaviour and positive emotional state. The results were also completely in accordance with the WQ results for exploratory behaviour, which is assessed by two measures that were combined using a weighted sum in both methodologies before determining the utility and the I-spline functions. From this, it can be concluded that the utility functions determined in MACBETH perfectly fitted the I-spline functions proposed in the WQ protocol. For the qualitative criteria (absence of thirst, thermal comfort and absence of pain induced by management procedures), there were only small differences between the methods, although the methodologies used in the WQ and the MAUT were very different. This was because the datasets used to determine the aggregation parameters of the CI covered all the possible scenarios found on a farm, and thus, once the model was adjusted, there could be no further variations.

However, differences were found for the criteria comfort around resting and absence of

injuries, which are assessed by several measures. These differences appear to be related

to the aggregation step rather than to the determination of the utility functions, since the

utility functions determined in MACBETH perfectly fitted the I-spline functions proposed in

the WQ protocol, including for the measures which form these criteria. Two key points were

identified in the aggregation step. First, the selection of the learning data was found to be

the most important step in the determination of the parameters of the CI. The learning data

have to be representative of all possible scenarios found on farms; otherwise, when these

parameters are implemented in a large dataset, the results may not be in accordance with the

DMs’ preferences. In Chapter Three, the learning data used to determine the CI parameters for

absence of injuries was modified since large differences between our method and the

WQ method were found when the parameters determined in Chapter Two were applied

in order to aggregate the absence of injuries data for the 44 observations. By selecting

the learning data more carefully, we could better approximate the WQ DMs’ preferences,

and the differences between the methods were minor. Second, although the differences

between the CI parameters derived from our learning data and the CI parameters used in the

WQ protocol for comfort around resting and absence of injuries were minor, differences

between the methods occurred when these parameters were implemented to aggregate

the data of the 44 observations. Although these differences were also minor, this highlights

the importance of the aggregation parameters, since even slight variations in them can

produce differences in the results.

Further development and prospects

By using the MAUT, it has been shown that the main difficulties faced by a multi-criteria

aggregation model, as described by Botreau et al. (2007b), can be solved: the method allows

different importance to be assigned to the measures, limits the compensation

between them and works with data collected on different types of scales.

Furthermore, the model’s flexibility allowed us to fit the WQ assessment, obtaining

small differences between our results and the ones obtained by implementing the WQ

protocol, both at criteria and principle level. Thus, it can be concluded that this model

could be implemented to produce an overall assessment of animal welfare in the context

of the WQ protocol for growing pigs. Furthermore, this methodology could also be used

as a framework to produce an overall assessment of welfare for other livestock species.

However, from the sensitivity analysis carried out in this study, two main points were

observed which may need to be studied further. First, it was found that the model was

not sensitive to variations in some measures at criteria level. The low variation of some

measures could explain their low influence on improving or worsening the welfare at both

the criteria and principle levels: the first and the third quartiles of these measures

were not representative of an improvement in or a worsening of the level of welfare.

Comparable values for some measures were found in other studies, for instance, the study

of Temple et al. (2011), and thus, we could assume that these measures have a low

influence on improving or worsening the level of welfare due to their generally low

variance on farms. However, it may be necessary to run observations on a larger number of

farms to obtain more information on the distribution of the measures. The second key point

was that the sensitivity of the model to an improvement or worsening of the values of the

measures was lower at principle level, after aggregating the criteria into principles, than

at criteria level. Thus, in order to determine whether the three-step aggregation process is

suitable to distinguish between farms, running the model on a larger number of farms may be

needed to know the actual variation in the measures on the farms. If there is no variation

between the farms at principle level, as occurred in our observations, or at overall

assessment level, the three-step aggregation process should be reconsidered.
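
The link between low variation and low sensitivity can be sketched in Python as follows. This is a simplified illustration rather than the procedure used in this study: the aggregator is a plain weighted sum standing in for the CI, the other measures are held at their medians, and the measure names, utilities and weights are hypothetical.

```python
from statistics import quantiles


def weighted_sum(utilities, weights):
    # Placeholder aggregator (assumption); a Choquet integral could be passed instead.
    return sum(weights[m] * utilities[m] for m in weights)


def quartile_sensitivity(farms, weights, aggregate=weighted_sum):
    """Change in the aggregated score when one measure moves from its first to its
    third quartile while all other measures are held at their medians."""
    cuts = {
        m: quantiles([farm[m] for farm in farms], n=4, method="inclusive")
        for m in weights
    }
    medians = {m: cuts[m][1] for m in weights}
    effect = {}
    for m in weights:
        q1, _, q3 = cuts[m]
        low = aggregate({**medians, m: q1}, weights)
        high = aggregate({**medians, m: q3}, weights)
        effect[m] = high - low
    return effect


# Hypothetical utilities (0-100) of two measures observed on five farms.
farms = [
    {"wounds": 20, "tail_biting": 96},
    {"wounds": 40, "tail_biting": 97},
    {"wounds": 60, "tail_biting": 98},
    {"wounds": 80, "tail_biting": 99},
    {"wounds": 100, "tail_biting": 100},
]
weights = {"wounds": 0.5, "tail_biting": 0.5}
effect = quartile_sensitivity(farms, weights)
```

In this example both measures carry the same weight, yet the measure that barely varies between farms moves the score by 1 point between its quartiles, against 20 points for the measure with a wide spread.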

References

Bana e Costa CA, de Corte JM and Vansnick JC 2004. On the mathematical

foundations of MACBETH. In MCDA, Multiple Criteria Decision Analysis (eds.

J Figueira, S Greco and M Ehrgott), pp. 409-442. Kluwer Academic Publishers,

Dordrecht, Netherlands.

Bana e Costa CA, Lourenço JC, Oliveira MD and Bana e Costa JC 2014. A socio-

technical approach for group decision support in public strategic planning: The

Pernambuco PPA case. Group Decision and Negotiation 23, 5-29.

Botreau R, Bonde M, Butterworth A, Perny P, Bracke MBM, Capdeville J and Veissier

I 2007a. Aggregation of measures to produce an overall assessment of animal

welfare. Part 1: A review of existing methods. Animal 1, 1179-1187.

Botreau R, Bracke MBM, Perny P, Butterworth A, Capdeville J, van Reenen CG and

Veissier I 2007b. Aggregation of measures to produce an overall assessment of

animal welfare. Part 2: Analysis of constraints. Animal 1, 1188-1197.

Grabisch M, Kojadinovic I and Meyer P 2008. A review of methods for capacity

identification in Choquet integral based multi-attribute utility theory: Applications of

the Kappalab R package. European Journal of Operational Research 186, 766-785.

Parnell GS, Bresnick TA, Tani SN and Johnson ER 2013. Handbook of decision

analysis. John Wiley and Sons, New York.

Temple D, Dalmau A, Ruiz de la Torre JL, Manteca X and Velarde A 2011. Application of

the Welfare Quality® protocol to assess growing pigs kept under intensive

conditions in Spain. Journal of Veterinary Behavior 6, 138-149.

Veissier I, Jensen KK, Botreau R and Sandøe P 2011. Highlighting ethical

decisions underlying the scoring of animal welfare in the Welfare Quality scheme.

Animal Welfare 20, 89-101.

Welfare Quality 2009. Welfare Quality® Assessment Protocol for Fattening Pigs.

Welfare Quality® Consortium, Lelystad, Netherlands.

GENERAL SUMMARY

Consumers’ concern about livestock living conditions has increased considerably in the

last few years. These consumer preferences create economic incentives for

stakeholders to meet animal welfare standards, as established by legislation or voluntary

certification schemes. It is generally accepted that animal welfare is a

multi-dimensional concept and, for this reason, a multi-criteria evaluation model is required

for the assessment of an animal unit. Therefore, the current study deals with the

development of a multi-criteria evaluation system to assess animal welfare on farms,

based on the Welfare Quality® (WQ) protocol, with an example of growing pigs’

welfare assessment. Its main objective was to find a more transparent and

flexible methodology than the one proposed in the WQ protocol while solving the main

difficulties that such a model faces, namely that criteria may have different

importance and that interactions may exist between them, a key aspect being that the

welfare criteria may not fully compensate for each other.

The Multi-attribute Utility Theory (MAUT) was applied in this study. A comparison of

different MAUT methods was provided in Chapter One. A theoretical model of a

welfare assessment for growing pigs was used considering only four criteria: good

feeding, good housing, good health and appropriate behaviour. Data for growing pig

farms were generated, with each farm receiving one score for each welfare criterion. Ten

farms were used as learning data and the complete dataset generated was used to

exemplify the differences between the methods. The utility functions and the

aggregation functions were constructed in two separate steps. Two utility function

determination methods (the standard sequences method and the MACBETH method)

and two aggregation functions (the weighted sum and the Choquet integral (CI)) were

compared. The utilities derived from MACBETH allowed us to model more adequately

the preferences of the decision-maker regarding the different importance of the criteria

and the interaction between them. A comparison of the weighted sum and the CI results

obtained from each method was carried out. The results showed that there were

interactions between the criteria; assuming independence among the criteria (weighted

sum) led to important differences in the classification of the farms. The use of the

MACBETH method together with the CI seemed to be the model which best solved

the difficulties described above.
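
For reference, the two aggregation functions compared in Chapter One can be written in their standard textbook form as below. The notation is an assumption introduced here: u_i is the utility of criterion i, w_i its weight, and \mu a capacity defined on subsets of criteria.

```latex
% Weighted sum: the criteria are assumed to be independent
\[
  WS(u_1,\dots,u_n) = \sum_{i=1}^{n} w_i\,u_i ,
  \qquad w_i \ge 0,\quad \sum_{i=1}^{n} w_i = 1 .
\]

% Choquet integral with respect to a capacity \mu, utilities sorted in ascending order
\[
  C_\mu(u_1,\dots,u_n) = \sum_{i=1}^{n} \bigl(u_{(i)} - u_{(i-1)}\bigr)\,\mu\bigl(A_{(i)}\bigr),
  \qquad u_{(0)} = 0,\quad u_{(1)} \le \dots \le u_{(n)},\quad A_{(i)} = \{(i),\dots,(n)\} .
\]

% An additive capacity recovers the weighted sum
\[
  \mu(A) = \sum_{j \in A} w_j
  \;\Longrightarrow\;
  C_\mu(u_1,\dots,u_n) = \sum_{i=1}^{n} w_i\,u_i .
\]
```

The weighted sum is therefore the special case of the CI without interaction; a non-additive \mu is what allows interactions between criteria and limited compensation to be modelled.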

In Chapter Two, the application of the MACBETH method together with the CI based

on a real welfare assessment, such as the WQ protocol for growing pigs, was presented

by means of examples. The WQ decision-makers’ preferences were fitted to construct the

utility functions and to determine the CI parameters. Throughout this study the different

multi-criteria methods used in the WQ protocol were compared with the single

methodology proposed here. The flexibility of the MAUT model allowed us to

fit the WQ assessment, obtaining results that were comparable to the ones obtained by

implementing the WQ protocol. Additionally, this flexibility makes it possible to

modify the model in line with, for instance, new scientific knowledge. Due to the use

of an interactive approach like MACBETH, the model remained more transparent for

stakeholders than the model proposed by WQ.

After the development of any multi-criteria evaluation system, a validation of the model

must be carried out in order to prove that it works as intended in practical conditions. In

Chapter Three, the MAUT methodology proposed above was implemented to

aggregate welfare data which were collected on different growing pig farms in

Schleswig-Holstein, Germany. In total, 44 visits were carried out. The whole WQ

assessment protocol for growing pig farms was implemented at each visit. The results

obtained for each observation were compared with the results obtained by implementing

the multi-criteria methodology proposed in the WQ protocol. Also, the influence of

variations in the welfare measure values was estimated in order to assess the sensitivity

of the model. Using the MAUT, the results obtained were similar to those obtained by

applying the WQ protocol aggregation methods, both at criteria and principle level.

Two main conclusions can be drawn from the sensitivity analysis: first, a limited number

of measures had a strong influence on improving or worsening the level of welfare at

criteria level and, second, the MAUT model was not very sensitive to an improvement or a

worsening of single welfare measures at principle level.

The findings of this study indicate that the MAUT model could be implemented to

produce an overall assessment of animal welfare in the context of the WQ protocol for

growing pigs. Furthermore, this methodology could also be used as a framework to

produce an overall assessment of welfare for other livestock species. However, the use

of weighted sums and the conversion of disease measures into ordinal scores should be

reconsidered. Additionally, it may be necessary to run observations on a larger number of

farms to obtain more information about the distribution of the welfare measures and the

sensitivity of the model.

SUMMARY (ZUSAMMENFASSUNG)

In recent years, the housing conditions of livestock, and with them the question of

animal welfare, have increasingly become a focus of consumers. Societal preferences

thereby create economic incentives for stakeholders to comply with the quality standards

for animal welfare that are prescribed by law or adopted voluntarily. Because of the

multitude of influences that determine animal welfare, a multi-criteria analysis is

required to evaluate a livestock farm. The present study therefore deals with the

development of a multi-criteria evaluation system to assess animal welfare on farms.

The approach is based on the Welfare Quality® (WQ) protocol for growing pigs. The main

objective of this work was to develop a more transparent and more flexible method than

the one underlying the WQ protocol. In doing so, the primary consideration was that

criteria within the protocol can take different (controllable) weights and that

compensation between individual criteria is limited.

The Multi-Attribute Utility Theory (MAUT) was used in this study. The first chapter

contains a comparison of different MAUT methods. For this purpose, a theoretical model

of a welfare assessment for growing pigs was used, including the four criteria feeding,

housing conditions, health and behaviour. Data were generated for growing pig farms,

with each farm receiving a score for each of the four criteria. Ten farms served as the

learning sample, and the complete dataset was used to highlight the differences

between the individual methods.

The utility function and the aggregation function were constructed in two separate

steps. Two methods for determining the utility function, the standard sequences method

and the MACBETH method, and two aggregation functions (the weighted sum and the

Choquet integral (CI)) were compared. The utility function derived from the MACBETH

method makes it possible to represent more adequately the preferences of the

decision-maker regarding the different weighting of the criteria and their possible

interactions. The weighted sum was compared with the results of the CI, and the results

confirmed interactions between the criteria. Assuming independence between the criteria

(weighted sum) led to decisive differences in the evaluation of the farms. As a

consequence of these results, the MACBETH method in combination with the CI was

applied in the following chapters.

In the second chapter, the application of the MACBETH method in combination with the

CI, based on the WQ protocol for growing pigs, was examined by means of examples. The

preferences of the WQ decision-makers were fitted in order to construct the utility

function and to determine the parameters of the CI. The different multi-criteria methods

of the WQ protocol were compared with the method presented in this work. The

flexibility of the MAUT method allowed it to be fitted to the WQ protocol, which led to

comparable results for the two methods. In addition, it allows more flexible adaptation

to changing conditions. Owing to the use of an interactive approach, the MACBETH

method remains more transparent for stakeholders than the model proposed by WQ.

After the development of a multi-dimensional evaluation system, the model must be

validated in order to check its suitability for practical use. In the third chapter, the

MAUT method proposed in the preceding section is used to aggregate animal welfare

data. For this purpose, a total of 44 visits were made to growing pig farms in

Schleswig-Holstein, Germany, during which the entire WQ protocol for growing pigs was

applied. The results obtained were compared with the results of the WQ multi-criteria

analysis. In addition, the influence of variation in the measured values on the welfare

assessment was estimated in order to derive the sensitivity of the model. Using MAUT

yielded results similar to those of the aggregation method of the WQ protocol, both at

the level of the criteria and for the principles. Two main facts can be derived from

the sensitivity analysis. At criteria level, only a few animal welfare indicators had a

clear influence on the assessment of animal welfare, whereas at principle level an

improvement or worsening of individual indicators had hardly any effect on the

assessment of animal welfare.

The results of this study show that the use of weighted sums and the conversion of

disease-related traits into ordinal scales should be reconsidered. In addition, this

study should be carried out with a larger number of farms in order to obtain further

information on the distribution of the animal welfare indicators and on the sensitivity

of the model. Nevertheless, the MAUT model was shown to be suitable for obtaining a

general assessment of the welfare of growing pigs in relation to the WQ protocol.

Furthermore, the method presented can also be applied to assess animal welfare in

other livestock species.

ACKNOWLEDGMENTS

At this point I would like to thank all those who have contributed in various ways

to my research.

First of all I want to thank my supervisor Prof. Joachim Krieter for the

opportunity he gave me, for his support and his belief in me, for his time and

effort, thank you for everything.

I warmly thank Carlos Buxadé for embracing me in Madrid. Thanks to Antonio

Callejo, Martina Pérez and Andrea Luciana do Santos for all the coffees and

touching conversations we shared.

I am also indebted to my co-authors and colleagues from my working group for

the valuable contribution to my papers and all the lively discussions. Above all, I

would like to mention Imke Trauslen, Kathrin Büttner and Irena Czycholl.

My research was made possible through the financial support I received from

the German Federal Ministry of Education and Research (BMBF) within the

PHENOMICS research project.

I also want to thank my fellow PhD students for regular lunch times, for quality

times on courses and conferences, for their companionship and for their

friendship. My special thanks go to Julia Aulrich, Birte Tietgen, Christina Veit,

Anita Ehret and Karo Reckmann.

Finally, I would like to thank my family who always encouraged me to pursue my

aims and supported me wherever they could. Last but not least this PhD would

not have been possible without Julia Kreuer and Gloria Heredia. Thank you for

your deep friendship and your assistance in all situations of life. Finally, I want

to thank Ignacio Santa-Cruz Rubio for being my person, my partner in crime

and for giving me unconditional support during the whole PhD.

CURRICULUM VITAE

GENERAL INFORMATION

Name: Paula Martín Fernández

Date of Birth: 31 July 1986 in Madrid

Nationality: Spanish

EDUCATION

2004-2010: AGRONOMIC ENGINEERING

POLYTECHNIC UNIVERSITY OF MADRID.

RELEVANT WORK EXPERIENCE

Since 2011: PHD STUDENT

INSTITUT FÜR TIERZUCHT UND TIERHALTUNG, CAU