This post is not primarily about practical training but about learning theory. Understanding it may, however, help with practical matters such as consistency and timing.
In behavioral analysis, the -P quadrant is traditionally described as follows: When a rewarding stimulus is removed (that's the minus sign) as the outcome of a behavior, the behavior tends to decline (that's the P, for punish).
Meanwhile, -R is described as: When an aversive stimulus is removed (again, the minus sign) as the outcome of a behavior, the behavior tends to increase (that's the R, for reinforce).
Neither of those textbook forms is seen much in practice, either in the lab or in practical training.
Instead, the most common form of -P is this: When an object that represents an opportunity for reinforcement is removed as the outcome for a behavior, the behavior tends to decline.
And the most common form of -R is this: When an aversive stimulus is avoided as the outcome for a behavior, the behavior tends to increase.
As a 2Q trainer, I choose not to use -R in my training, just as I choose not to use +P. Those are the two quadrants that use aversive stimuli.
I try to use +R as much as possible, saving -P for tune-ups. Even though -P does not use an aversive stimulus the way -R and +P do, it tends to be demotivating. In other words, the use of -P is itself aversive to the subject, even though no aversive stimulus is involved.
What I've described above is a fairly traditional view of the behavioral quadrants. But now I'd like to describe something new, something that I, at least, have never seen described before: the use of -P as -R.
The idea is that, since removing the opportunity for reinforcement is unpleasant for the dog, you can strengthen an alternative, desired behavior by letting the dog learn that she can avoid losing the opportunity for reinforcement by performing that desired behavior. Thus -P, a procedure for punishing an undesirable behavior, becomes avoidance conditioning as a way of reinforcing the desired behavior.
Here's a practical example. I've begun using a procedure I call the Assisted Walk Out (AWO) as a way of training Lumi and Laddie not to dawdle on their pick-ups and re-entries into water on land-water-land retrieves. Because the AWO involves removing the opportunity for reinforcement (the assistant picks up the duck, preventing the dog from being able to derive pleasure from completing the retrieve), the AWO functions as -P.
But if the dog learns that a particular cue will result in an AWO if the dog does not perform the desired response, and as a result the dog's response improves as a way of avoiding the AWO, then what is really occurring is avoidance conditioning, just as in -R.
In fact, although the AWO is -P, no specific behavior is being punished. That is, whether the dog performs undesirable behavior A, B, or C, the outcome is the same: the AWO. Only if the dog performs the cued behavior is the AWO avoided. Causing a single behavior to increase while causing a variety of alternative behaviors to decline is also typical of reinforcement, both -R and +R.
So is the AWO actually -R after all? The answer is no. The procedure involves no aversive stimulus. It is removal of opportunity for reinforcement, one of the classical forms of -P. But in this case it functions as -R, because the dog experiences avoidance conditioning.
Again, -P is typically used to cause a specific behavior, such as barking, to decline. But the AWO uses -P to cause a specific alternative behavior to increase.
As a final note, one of the reasons that -R is the most powerful of the four operant conditioning quadrants is that once the dog learns a behavior that enables her to avoid an aversive stimulus, she is reinforced every time she performs that behavior and successfully avoids the aversive. That is, -R continues to work even when the dog does not experience the aversive. In fact, in practice, the desired behavior can continue to get stronger over a period of time during which the aversive stimulus never occurs.
Hopefully, the AWO will function in a similar way with Lumi and Laddie, reinforcing the desired behavior not only when the AWO occurs but also every time the dog performs the desired behavior and thereby avoids the AWO.
Saturday, October 31, 2009