Keywords: machine learning, genetic algorithms, learning classifer systems, personalization, context aware systems.
We adhere to Dey's definition of context in our research. He defines context as any information which we can use to identify the situation of an entity; an entity can be a person, a place or an object [6]. Along with this working definition of context, we model a desktop PC as a stationary robot with sensors and actuators that gather context from a user's environment. In this paper, we use a Genetics-Based Machine Learning (GBML) technique, XCS, for learning a mapping from sensor values to application actions. We hypothesize that we can learn to design better, more context-sensitive, user-interfaces (applications) using our stationary robot approach of a desktop PC to sense and respond to a user and increase the level of user-context awareness for applications.
For testing our hypothesis, we built Sycophant, our context-sensitive calendaring application which learns to predict the type of reminder preferred by a user for scheduled appointments and tasks [15]. Sycophant generates the following four types of reminders for an appointment 1) a visual reminder which displays the appointment text in a pop-up window 2) a voice reminder which speaks out the appointment text using the Festival Speech Synthesis system [3] 3) both visual and voice reminders 4) None (we suppress reminder generation until a more opportune moment).
In order to learn a model of useful user-behavior patterns and generate the correct user-preferred reminder, Sycophant continuously monitors and records information from a computer's external and internal environments using sensors. We store this data in a database and mine this user-context data to predict user preferences (behavior) and enhance user interaction. Sycophant uses XCS to mine user context data and constructs a mapping from user-related contextual features to user-preferred reminder types.
More generally, we want to investigate machine learning approaches for building a suite of improved human computer interfaces. In our earlier investigations, we limited the classifier population size of XCS to an estimate of the number of rules used by a decision-tree learner and were successfully able to learn the mapping for a single user (to appear). For the results presented here, we do not limit the classifier population, we operate XCS with and without condensation enabled and we evaluate our approach from data collected from three users. Condensation is a process of extracting a minimal subset of classifiers by reducing the size of the classifier population. This is done by running the genetic algorithm within XCS and setting cross-over and mutation probabilities to zero [20]. We validate the effectiveness of our approach by evaluating Sycophant's performance in learning this mapping task for all three users. We also compare our GBML approach for user-context learning with a well-established decision tree learner.
We break down our results presentation into two sets of supervised learning tasks; the first task is decide whether or not to generate a reminder and the second task is to choose a reminder from amongst four reminder types.
Our results on a data set that has user-context information from three users shows that XCS
achieves a test-set accuracy of
percent for the task of
predicting the type of reminder to use from among four reminder types.
This test-set performance is significantly better than that of a decision-tree which has an accuracy of
percent on the same task.
On the same data set, for the task of learning to decide whether or not to generate a reminder, we get an accuracy of
percent on the test-set using XCS and this accuracy is statistically the same as that of a decision-tree learner.
We obtain similar results with a reduced classifier population size after running XCS in the condensation mode.
XCS's test set performance across different data sets validates our user-context data collection methodology for learning
user-preferred reminder types.
Section 2 presents related work in the area of context-aware applications and genetics-based machine learning. We introduce Sycophant's architecture in section 3. Section 4 gives a brief introduction to XCS. Details about our experimental design and data pre-processing are given in section 5 . Section 6 presents results comparing our GBML technique with a decision-tree learner on various reminder generation tasks. We discuss the lessons learned from our experiments in section 7. The final section contains directions for future work.
John Holland introduced a domain-independent rule-based machine learning complex adaptive scheme, a Learning Classifiers System (LCS) and laid a comprehensive foundation for genetics-based machine learning techniques [8,9]. Simplifying Holland's classifier system, Wilson introduced XCS, a learning classifier system derived from his own ZCS and Animat programs [18,19,20]. XCS has been successfully used for learning Boolean functions by Kovacs and Wilson in the area of data mining [20,21]. Barry and Saxon further tested XCS on the Monk's problems [17]. Wilson evaluated XCS's performance on the Wisconsin Breast Cancer data set and Holmes did the same on epidemiological data in the domain of medicine [22,10]. Decision-tree algorithms for example, Weka's J48, an implementation of Ross Quinlan's C4.5 are also widely used in data-mining applications [16,23].
Our approach leverages ideas from machine learning and context-aware systems. We use XCS to learn the mapping from user-context features to reminder types for constructing a user model and evaluate the feasibility of using XCS for this context learning task by directly comparing XCS's performance to that of J48. For a user, Sycophant learns whether or not to generate a reminder (two-class reminder problem) and also learns to predict the type of reminder preferred by her from a set of four reminder types (four-class reminder problem). This prediction is based on the XCS-generated user model built by using context-data for that particular user.
At the scheduled time for an appointment, Sycophant generates a reminder for that particular task. We consider generating a reminder as an actuator action. With simple sensors providing richer user-context for Sycophant, even without knowing who is there or what is being said, user interaction can be improved. For example, Sycophant can learn the answers to the following questions if it was serving Jane:
Each sensor therefore provides six features.
The nine sensors as described in section 3.2 result in a total of
features.
We also consider the appointment time and user-identifier as attributes.
Therefore, we end up with a total of
features for our user-context data.
Three users, user-1, user-2 and user-3 used Sycophant over a period of six to eight weeks.
Our user-context data sets collected from these users had
exemplars from user-1,
exemplars from user-2 and
exemplars from user-3. We therefore had a total of
exemplars from all these three users.
We label the data set from user-1 as ContextData-1, user-2's data as ContextData-2, user-3's data as ContextData-3and the combined data from both users as ContextData-all-users.
Here is an exemplar from one of the data sets:
user-3, 13.00, 0, 0, 0, 0, 0, 0, 7, 0, 1, 0, 0, 0, 20, 1, 1, 1, 1, 1, 20, 1, 1, 1, 1, 1, 20, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 20, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3User-identifier is the first feature and the time of appointment is the second feature, The remaining set of features in groups of six represent the Sensor-Count, Sensor-All5, Sensor-Any5, Sensor-All1, Sensor-Any1 and Sensor-Immed. We order these sets of sensor features in the following sequence: motion, speech, process, keyboard and mouse. The last feature is type of reminder preferred by a user. We obtain this value through user feedback whenever Sycophant generates a reminder.
XCS consists of population of classifiers. Each classifier has the following important attributes:
During system operation, XCS samples a user-context data exemplar as a message string from the external environment.
This string is defined over
where
is the number of bits in each situation (classifier condition).
Classifiers whose conditions match the environmental message become members of a set called the match-set (
).
If none of the classifiers match the environmental message, the classifier system creates new classifiers with each of the possible actions and assigns these classifiers to
.
This procedure is called covering.
We initialize XCS with an empty population therefore covering occurs only at the beginning of a run in our system.
The system computes a fitness-weighted average of the predictions of each classifier in
for each of the actions present in
.
During our preliminary investigation, we experimented with choosing the action non-deterministically. This method of action selection seemed to degrade the system performance.
To remedy this problem we use the best-action selection mechanism for choosing an action from
and send it to the external environment as XCS's action for the externally sensed input
string.
A reward of
is provided if the action taken by XCS is correct,
otherwise.
Action-set (
) is created from the classifiers in
which proposed this particular action.
XCS updates the different attributes of every classifier in
based on the actual reward returned by the environment.
Subsequently the fitness is also updated.
The system has the option to execute a GA within
based on
, a threshold parameter .
The populations of classifiers are thus evolved by a process of trial and error.
Our GA parameter values are as follows:
the probability of mutating one allele in an offspring classifier is
,
the probability of applying crossover in an offspring classifier is
,
the probability of using a don't care symbol in an allele (
) is
and
is set to
.
When the system senses an input, it creates a match-set and chooses a particular action to execute.
After an appropriate reward is received from the environment, the attributes of the classifiers which resulted in this action being taken are updated.
This procedure ensures that classifiers capable of accurately predicting the correct action to be taken tend to be reproduced more (due
to their increased fitness for choosing the correct action) while inaccurate classifiers eventually get deleted.
The cycle of sensing the input, choosing a particular action, running a GA after a certain threshold is done until some termination condition is met.
We use the metric of randomly sampling
problems from the context data training set or a training system performance of 1.0 as the termination condition in our experiment.
The value of
, the learning rate is set to
.
Constrasted with the traditional model of a classifier system, in XCS, the GA is run on the action sets (instead of the entire population),
and the classifier system has no message-list. [9,8].
Details about the intricate process of formation of match-sets and action-sets,
the execution of a GA within the system,
widely used values for XCS parameters,
and the methodology for updating classifier fitness and is found in Wilson [20].
As a base-rate accuracy for evaluating the performance of a learning scheme, we use the performance of Zero-R, a learning algorithm which predicts the majority class in the training set as the default class. We use a decision-tree learner on all our user-context data (ContextData-1, ContextData-2, ContextData-3, ContextData-all-users) for checking the feasibility of our user-context data collection methodology.
Next, we extend an implementation of XCS in Java to create a user-context related environment for XCS [4].
Section 3.3 describes the user-context data gathered.
For each user-context data set, we transform all the contextual features except the reminder-type into an appropriate binary representation.
After performing this transformation, we end up with a string like the example shown below:
0100010101010101010011100101100001001000010011011110100110111101
00111111100000000000010011111110000000000000000000000:3
The value shown after the ":" is the correct action for the particular classifier condition.
In the above example, the correct action is to choose a reminder of type
.
With user-context data thus transformed, we create ten stratified folds of all our user-context data sets for evaluating the training and testing performance of XCS.
We also compare the performance of XCS with J48, a decision-tree learning algorithm.
You should note that we evaluate both XCS and J48 on the same cross-validation training and test set pairs. By this process, we ensure that both the learning schemes are exposed to the same data samples.
This design allows us to directly compare the performance of the two learning schemes.
We make XCS evolve a set of classifiers for a training fold and store these classifiers after our stopping criterion of
repeatedly sampling
random problems (exemplars) or when the training performance reaches
.
Thus XCS learns a classification model of the target concept (the type of reminder to generate) from the training data.
Next, we test these evolved classifiers on the testing fold and record the performance of the system.
Using the testing fold, we thus evaluate the predictive accuracy of the classification model learnt by XCS on unseen cases.
Performance is defined as the percentage of correctly solved problems in the last
problems sampled.
A problem is correctly solved if the classifier correctly predicts the type of reminder to be used for its condition component.
To get a robust statistical estimate, we conduct ten runs for each pair of training-testing folds and record the performance.
We repeat this procedure for all the ten folds of the cross-validation set to evaluate the system performance.
The same procedure is repeated for the decision-tree algorithm.
We merge the data from all the three users in our study and evaluate XCS's performance on this data set. This metric helps us understand how XCS generalizes learning from user-context data from all the three users.
Earlier, we had investigated an approach of restricting the number of classifiers in XCS's population to an estimate of number of the rules (the number of leaves in the tree) used by a decision tree learner.
A GA in XCS provides a niching facility for allowing cooperative rule-sets to coexist within a
population while allowing competing rule-sets to converge on optimum rule attributes within a niche.
Limiting the size of the population in XCS takes away the opportunity for the GA to experiment with recombination.
In this work, we therefore do not limit the classifier population size but use a maximum classifier population size of
(
) for XCS.
Further, we use condensation to extract minimal subset of classifiers which represent the final solution [20].
Condensation consists of running the system with crossover and mutation rates set to zero.
This process suspends genetic search as no new classifier conditions can now be generated.
However, the classifier selection and deletion processes still continue to operate whenever the GA
is triggered. This results in a tendency for less fit and less general classifiers to be weeded out of the population.
We enable condensation after our stopping criterion during the training fold evaluation. Condensation stops
after an additional
problems have been sampled from the training fold.
Currently, we avoid varying too many experimental parameters (XCS settings, data preprocessing options, learning schemes etcetera) to thoroughly explore the capabilities of our approach by simplifying the way we organize the presentation of our results.
Sycophant has to learn the type of reminder preferred a user.
We cast this problem into two sets of supervised learning tasks.
In the first learning task set, we consider the task of reminder generation as question of whether or not to interrupt a user.
We do this modification by assigning reminder type
(do not generate a reminder) to the first class and grouping
reminder types
,
and
to the second class (generate a reminder).
For the second learning task,
Sycophant has to learn to discriminate between the four types of reminder.
We employ this procedure across all our user-context data sets and evaluate the training and testing performance of
learning schemes.
In both the tables, the first column represents the user-context data set on which the reminder generation task was learnt, the second column represents the performance of the Zero-R learning algorithm which we use as the base-rate for evaluating the performance of a learning scheme, J48's performance is shown in column three, XCS's performance is shown in column four, XCS's performance with condensation is shown in column five along with the number of classifiers present after condensation, the sixth column is the inference from a two-sample t-test comparing XCS's performance with that of J48 on the test-set and the seventh column is a similar inference comparing the performance of XCS with condensation to that of J48 on the test-set.
From table 1, we see that both J48 and XCS significantly outperform Zero-R except in case of ContextData-3.
Closer analysis of user-3's data revealed that this user preferred to be reminded with a single type of reminder most of the time.
This explains the high accuracy achieved by Zero-R on this data set.
Table 1 shows that J48 and XCS are capable of achieving a minimum accuracy of
percent across all the user-context data sets.
This clearly validates our methodology for gathering user-context and creating context-data sets to learn user-preferred reminder types.
Statistical tests provide the inference that XCS equals the performance of J48 across all the user-context data sets and also XCS does better than J48 on Context-Data-2.
We notice that on ContextData-all-users, both the learning schemes achieve a base accuracy of
percent thereby showing that it is possible to learn preferences for reminder types across three different users.
We plan to gather more context data from diverse population of users to strengthen our claim for generalization of our learning techniques
across different users.
Table 2 shows that XCS with and without condensation enabled outperforms J48 across three user-context data sets, however the performance degrades on the ContextData-user-3. After condensation, Context-Data-1 also seems to have more classifiers than what we expected. We strongly suspect that condensation phase could have been triggered prematurely for both these cases, thereby leaving a non-optimal subset in the classifier population. We intend to investigate this issue further.
| Learning | II | III | IV | V | ||
| Algorithm |
Zero-R | J48 | XCS | XCS |
Inference | Inference |
| Data | ||||||
| Set |
||||||
| ContextData-1 | 0.5677 | 0.7428 | 0.7460 | 0.7406, 5059 | Equal | Equal |
|---|---|---|---|---|---|---|
| ContextData-2 | 0.5988 | 0.7885 | 0.9800 | 0.9300, 9678 | Better | Better |
| ContextData-3 | 1.0000 | 1.0000 | 1.0000 | 1.0000, 4722 | Equal | Equal |
| ContextData-all-users | 0.6115 | 0.8685 | 0.8791 | 0.8788, 7699 | Equal | Equal |
| Learning | II | III | IV | V | ||
| Algorithm |
Zero-R | J48 | XCS | XCS |
Inference | Inference |
| Data | ||||||
| Set |
||||||
| ContextData-1 | 0.5677 | 0.7000 | 1.0000 | 1.0000, 16289 | Better | Better |
|---|---|---|---|---|---|---|
| ContextData-2 | 0.5988 | 0.7267 | 1.0000 | 1.0000, 9433 | Better | Better |
| ContextData-3 | 0.9913 | 1.0000 | 0.9118 | 0.9118, 5071 | Worse | Worse |
| ContextData-all-users | 0.3884 | 0.8236 | 0.8791 | 0.8236, 8493 | Better | Better |
We provide all the experiment related files at
http://www.cse.unr.edu/anilk/IICAI_2005/experiment.html.
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.71)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 1 iicai05.tex
The translation was initiated by Sushil Louis on 2005-08-18