Table of Contents
1 Introduction
	1.1 Modelling the user's linguistic variation
	1.2 Generating linguistic variation to the user
	1.3 Dimensions of linguistic variation
		1.3.1 Definitions of linguistic style
		1.3.2 Factors affecting linguistic style (formality; politeness; dialects and sociolects; personality)
	1.4 Motivation for personality-based dialogue modelling
		1.4.1 Recognising the user's personality
		1.4.2 Controlling the system's personality
	1.5 Research hypotheses
		1.5.1 Boundaries
	1.6 Contributions and organisation of the thesis
2 Background
	2.1 Elements of personality psychology
		2.1.1 The main dimensions of personality
		2.1.2 Biological causes
	2.2 Language and personality
		2.2.1 Markers of extraversion
		2.2.2 Markers of other Big Five traits
	2.3 User modelling in dialogue
		2.3.1 Individual preferences
		2.3.2 Expertise
		2.3.3 Personality
	2.4 Modelling individual differences in natural language generation
		2.4.1 Early work: ELIZA and PARRY
		2.4.2 The standard NLG architecture
		2.4.3 Template and rule-based stylistic generation (pragmatic effects; linguistic style; politeness; personality and embodied conversational agents)
		2.4.4 Data-driven stylistic generation (overgenerate-and-select methods; direct control of the generation process)
	2.5 Summary
I Recognising the User's Personality in Dialogue
	3 Personality Recognition from Linguistic Cues
		3.1 Adapting to the user's personality
		3.2 Experimental method
			3.2.1 Sources of language and personality
			3.2.2 Features (content and syntax; utterance type; prosody)
			3.2.3 Correlational analysis
			3.2.4 Statistical models
		3.3 Classification results
			3.3.1 Essays corpus
			3.3.2 EAR corpus
			3.3.3 Qualitative analysis
		3.4 Regression results
			3.4.1 Essays corpus
			3.4.2 EAR corpus
			3.4.3 Qualitative analysis
		3.5 Ranking results
			3.5.1 Essays corpus
			3.5.2 EAR corpus
			3.5.3 Qualitative analysis
		3.6 Discrete personality modelling in related work
		3.7 Discussion and summary
II Generating a Recognisable System Personality
	4 From Personality Markers to Generation Decisions
		4.1 Personality marker studies
			4.1.1 Sources of language
			4.1.2 Personality assessment methods
		4.2 NLG parameter mapping
		4.3 Extraversion
		4.4 Emotional stability
		4.5 Agreeableness
		4.6 Conscientiousness
		4.7 Openness to experience
		4.8 Summary
	5 Implementing Personality Markers in a Generator
		5.1 Framework overview
		5.2 Projecting personality in a specific domain
		5.3 Input structure
		5.4 Personage's architecture
		5.5 Implementation of generation decisions
			5.5.1 Content planning
			5.5.2 Syntactic template selection
			5.5.3 Aggregation
			5.5.4 Pragmatic marker insertion
			5.5.5 Lexical choice
			5.5.6 Surface realisation
		5.6 Summary
	6 Psychologically Informed Rule-based Generation
		6.1 Methodology
		6.2 Human evaluation
		6.3 Results
		6.4 Summary
	7 Stochastic Generation Capabilities
		7.1 Generation coverage and quality
			7.1.1 Ratings distribution (comparison with the rule-based approach)
			7.1.2 Inter-rater agreement
			7.1.3 Naturalness
		7.2 Feature analysis
			7.2.1 Generation decisions
			7.2.2 Content-analysis features
			7.2.3 N-gram features
		7.3 Discussion and summary
	8 Generation of Personality through Overgeneration
		8.1 Methodology
		8.2 Statistical models
		8.3 Results with in-domain models
			8.3.1 Modelling error (discussion; modelling error distribution)
			8.3.2 Sampling error
			8.3.3 Psychologically informed selection models
		8.4 Results with out-of-domain models
			8.4.1 Out-of-domain model accuracy
			8.4.2 Domain adaptation
		8.5 Summary
	9 Generation of Personality through Parameter Estimation
		9.1 Methodology
			9.1.1 Pre-processing steps
			9.1.2 Statistical learning algorithms
			9.1.3 Qualitative model analysis
			9.1.4 Model selection
			9.1.5 Generation phase
		9.2 Large-scale evaluation
			9.2.1 Evaluation method
			9.2.2 Evaluation results
			9.2.3 Comparison with rule-based generation
			9.2.4 Perception of fine-grained variation
			9.2.5 Inter-rater agreement
			9.2.6 Naturalness evaluation
			9.2.7 Socio-cultural analysis
		9.3 Discussion and summary
	10 Discussion and Conclusion
		10.1 Contributions of this thesis
		10.2 Generalisation to other domains
		10.3 Future research
		10.4 Conclusion
	A Utterances Generated using Personage-RB
	B Utterances Generated using Random Parameters
	C Utterances Generated using Personage-PE

Learning to Adapt in Dialogue Systems:
Data-driven Models for Personality Recognition and Generation

François Mairesse

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Computer Science
University of Sheffield, United Kingdom

February 2008


6.3. Results 134

Openness to experience is the hardest trait to convey in our domain, with a rating difference of 1.32 between the utterance sets. This difference, however, is still highly significant (p < .001) despite the small number of ratings.

Personality trait Low High
Extraversion 2.96 5.98
Emotional stability 3.29 5.96
Agreeableness 3.41 5.66
Conscientiousness 3.71 5.53
Openness to experience 2.89 4.21

Table 6.3: Average personality ratings for the utterances generated with the low
and high parameter settings for each trait on a scale from 1 to 7. The ratings of the
two extreme utterance sets differ significantly for all traits (p < .001, two-tailed).
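The significance test behind Table 6.3 compares the ratings of each trait's low and high utterance sets. A minimal sketch of such a two-sample comparison, using Welch's t statistic on hypothetical ratings (the values and the exact test the thesis uses are assumptions, not its data):

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))

# hypothetical per-utterance ratings on the 1-7 scale
low  = [3.0, 2.5, 3.5, 2.0, 3.5, 3.0, 3.5, 2.5]
high = [6.0, 5.5, 6.5, 6.0, 5.5, 6.0, 6.5, 5.5]

t = welch_t(high, low)  # large positive t -> the sets differ reliably
```

A p-value would then be read off the t distribution (e.g. via `scipy.stats`); with a gap this large relative to the spread, it falls well below .001.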

Inter-rater agreement

Table 6.4 shows that the judges agree significantly for all Big Five traits, although they agree more for some traits than others. The highest agreement is observed for extraversion and emotional stability (r = .73 and r = .67), and the lowest for conscientiousness and openness to experience (r = .42 and r = .44). Unsurprisingly, traits that are recognised more accurately produce a higher agreement, suggesting that it is easier to agree on utterances expressing an extreme personality. This level of agreement is only slightly lower than the one observed for conversation extracts in the personality recognition task studied in Chapter 3 (r = .84), which is encouraging considering that the judgements presented here are based on a single utterance rather than audio conversation extracts collected over 48 hours.

Personality trait r
Extraversion .73
Emotional stability .67
Agreeableness .54
Conscientiousness .42
Openness to experience .44

Table 6.4: Average inter-rater correlation r over ratings of the utterances gener-
ated with the low and high parameter settings for each trait. All correlations are
significant at the p < .05 level (two-tailed).
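The inter-rater r of Table 6.4 can be obtained as the average pairwise Pearson correlation over all judge pairs. A sketch with three hypothetical judges rating the same six utterances (the data and averaging details are illustrative assumptions):

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def inter_rater_r(ratings_by_judge):
    """Average correlation over every pair of judges."""
    return mean(pearson(a, b) for a, b in combinations(ratings_by_judge, 2))

# hypothetical: three judges, six utterances, 1-7 scale
judges = [
    [1, 2, 6, 7, 3, 5],
    [2, 1, 7, 6, 4, 5],
    [1, 3, 6, 6, 3, 6],
]
r = inter_rater_r(judges)  # high positive r -> judges broadly agree
```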


Generation accuracy

While predefined parameters can generate recognisable personality on average, the distributions of ratings over the two utterance sets shown in Figures 6.4 and 6.5 give additional insight into the variation produced by PERSONAGE-RB. Reported ratings are averaged over all judges, thus they are less extreme than individual judgements, e.g. an extraversion rating of 1.0 implies that all three judges agreed on that score. As both extremes of each personality dimension are generated, generation accuracy is evaluated by splitting ratings into two bins around the neutral rating (4 out of 7), and computing the percentage of utterances with an average rating falling in the bin predicted by their generation parameters. As the rule-based approach presented here aims at producing extreme personality, neutral ratings are considered as misrecognitions.

Personality trait Low High Overall
Extraversion 82.5 100.0 91.3
Emotional stability 80.0 100.0 90.0
Agreeableness 70.0 100.0 85.0
Conscientiousness 60.0 100.0 80.0
Openness to experience 90.0 55.0 72.5
All utterances 85.0

Table 6.5: Generation accuracy (in %) for the utterance sets generated with the
low and high parameter settings for each trait. An utterance is correctly recognised
if its average rating falls in the half of the scale predicted by its parameter setting.
Neutral ratings (4 out of 7) are counted as misrecognitions.
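The accuracy computation described above can be sketched as follows; the targets and judge ratings are hypothetical, not the thesis's data:

```python
from statistics import mean

NEUTRAL = 4.0  # midpoint of the 1-7 rating scale

def generation_accuracy(utterances):
    """utterances: list of (target, ratings) pairs, target in {'low', 'high'}.
    An utterance is correct if its average rating falls strictly on the side
    of the neutral point predicted by its parameter setting; averages exactly
    at the neutral point count as misrecognitions."""
    correct = 0
    for target, ratings in utterances:
        avg = mean(ratings)
        if target == 'high' and avg > NEUTRAL:
            correct += 1
        elif target == 'low' and avg < NEUTRAL:
            correct += 1
    return 100.0 * correct / len(utterances)

# hypothetical ratings from three judges per utterance
utts = [
    ('high', [6, 7, 5]),   # avg 6.0 -> correct
    ('high', [4, 4, 4]),   # avg 4.0, neutral -> misrecognition
    ('low',  [2, 3, 2]),   # avg ~2.3 -> correct
    ('low',  [5, 4, 6]),   # avg 5.0 -> misrecognition
]
acc = generation_accuracy(utts)  # -> 50.0
```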

Figure 6.4(a) shows that extravert utterances were all recognised as such, with approximately normally distributed ratings, whereas 17.5% of the introvert utterances were rated as neutral or extravert. Extraversion is the easiest trait to project in our domain, with ratings covering the full range of the scale and an overall accuracy of 91.3% over both utterance sets. Figure 6.4(b) shows that PERSONAGE-RB did not generate utterances perceived as extremely neurotic by all judges, as no utterances were rated below 2 out of 7 on that scale. Also, while all emotionally stable utterances were perceived correctly, 20% of the neurotic utterances were rated as neutral or moderately stable: the ratings' distribution of neurotic utterances is slightly biased towards the positive end of the scale. The parameter settings for agreeableness produce utterances covering the largest range of ratings after extraversion (from 1.5 to 6.5), although 30% of the disagreeable utterances were rated
