Angels Tailwaggin Training LLC
Explaining Positive Reinforcement Theory

Reinforcement Theory

Reinforcement theory is the process of shaping behavior by controlling the consequences of the behavior. In reinforcement theory a combination of rewards and/or punishments is used to reinforce desired behavior or extinguish unwanted behavior. Any behavior that elicits a consequence is called operant behavior, because the individual operates on his or her environment. Reinforcement theory concentrates on the relationship between the operant behavior and the associated consequences, and is sometimes referred to as operant conditioning.

BACKGROUND AND DEVELOPMENT OF REINFORCEMENT THEORY

Behavioral theories of learning and motivation focus on the effect that the consequences of past behavior have on future behavior. This is in contrast to classical conditioning, which focuses on responses that are triggered by stimuli in an almost automatic fashion. Reinforcement theory suggests that individuals can choose from several responses to a given stimulus, and that individuals will generally select the response that has been associated with positive outcomes in the past. E.L. Thorndike articulated this idea in 1911, in what has come to be known as the law of effect. The law of effect basically states that, all other things being equal, responses to stimuli that are followed by satisfaction will be strengthened, but responses that are followed by discomfort will be weakened.

B.F. Skinner was a key contributor to the development of modern ideas about reinforcement theory. Skinner argued that the internal needs and drives of individuals can be ignored because people learn to exhibit certain behaviors based on what happens to them as a result of their behavior. This school of thought has been termed the behaviorist, or radical behaviorist, school.

REINFORCEMENT, PUNISHMENT, AND EXTINCTION

The most important principle of reinforcement theory is, of course, reinforcement. Generally speaking, there are two types of reinforcement: positive and negative. Positive reinforcement results when the occurrence of a valued behavioral consequence has the effect of strengthening the probability of the behavior being repeated. The specific behavioral consequence is called a reinforcer. An example of positive reinforcement might be a salesperson who exerts extra effort to meet a sales quota (behavior) and is then rewarded with a bonus (positive reinforcer). The administration of the positive reinforcer should make it more likely that the salesperson will continue to exert the necessary effort in the future.

Negative reinforcement results when an undesirable behavioral consequence is withheld, with the effect of strengthening the probability of the behavior being repeated. Negative reinforcement is often confused with punishment, but they are not the same. Punishment attempts to decrease the probability of specific behaviors; negative reinforcement attempts to increase desired behavior. Thus, both positive and negative reinforcement have the effect of increasing the probability that a particular behavior will be learned and repeated. An example of negative reinforcement might be a salesperson who exerts effort to increase sales in his or her sales territory (behavior), which is followed by a decision not to reassign the salesperson to an undesirable sales route (negative reinforcer). The administration of the negative reinforcer should make it more likely that the salesperson will continue to exert the necessary effort in the future.

As mentioned above, punishment attempts to decrease the probability of specific behaviors being exhibited. Punishment is the administration of an undesirable behavioral consequence in order to reduce the occurrence of the unwanted behavior. Punishment is one of the more commonly used reinforcement-theory strategies, but many learning experts suggest that it should be used only if positive and negative reinforcement cannot be used or have previously failed, because of the potentially negative side effects of punishment. An example of punishment might be demoting an employee who does not meet performance goals or suspending an employee without pay for violating work rules.

Extinction is similar to punishment in that its purpose is to reduce unwanted behavior. The process of extinction begins when a valued behavioral consequence is withheld in order to decrease the probability that a learned behavior will continue. Over time, this is likely to result in the ceasing of that behavior. Extinction may alternatively serve to reduce a wanted behavior, such as when a positive reinforcer is no longer offered when a desirable behavior occurs. For example, if an employee is continually praised for several months for the promptness with which he completes his work, but receives no praise for such behavior in subsequent months, his desirable behaviors may diminish. Thus, to avoid unwanted extinction, managers may have to continue to offer positive behavioral consequences.

Why should animal trainers be bothered with learning the theory behind how their animals learn? Many excellent trainers have no formal schooling or organized understanding of how their training is effective or how their charges work. But training is both an art and a science. More and more trainers - pet owners, show competitors, horseback riders, show-business trainers, zookeepers, aquarium trainers and more - are finding that an understanding of learning theory helps them understand their animals' behaviors better, and plan their training accordingly. So trainers are learning the theory of learning theory!

Classical or "Pavlovian" Conditioning

Theory

Classical Conditioning is the type of learning made famous by Pavlov's experiments with dogs. The gist of the experiment is this: Pavlov presented dogs with food, and measured their salivary response (how much they drooled). Then he began ringing a bell just before presenting the food. At first, the dogs did not begin salivating until the food was presented. After a while, however, the dogs began to salivate when the sound of the bell was presented. They learned to associate the sound of the bell with the presentation of the food. As far as their immediate physiological responses were concerned, the sound of the bell became equivalent to the presentation of the food.

Classical conditioning is used by trainers for two purposes: to condition (train) autonomic responses, such as drooling, producing adrenaline, or reducing adrenaline (calming), without using the stimuli that would naturally create such a response; and to create an association between a stimulus that normally would not have any effect on the animal and a stimulus that would.

Stimuli that animals react to without training are called primary or unconditioned stimuli (US). They include food, pain, and other "hardwired" or "instinctive" stimuli. Animals do not have to learn to react to an electric shock, for example. Pavlov's dogs did not need to learn about food.

Stimuli that animals react to only after learning about them are called secondary or conditioned stimuli (CS). These are stimuli that have been associated with a primary stimulus. In Pavlov's experiment, the sound of the bell meant nothing to the dogs at first. After its sound was associated with the presentation of food, it became a conditioned stimulus. If a warning buzzer is associated with the shock, the animals will learn to fear it.

Secondary stimuli are things that the trainee has to learn to like or dislike. Examples include school grades and money. A slip of paper with an "A" or an "F" written on it has no meaning to a person who has never learned the meaning of the grade. Yet students work hard to gain "A's" and avoid "F's". A coin or piece of paper money has no meaning to a person who doesn't use that sort of system. Yet people have been known to work hard to gain this secondary reinforcer.

Application

Classical conditioning is very important to animal trainers, because it is difficult to supply an animal with one of the things it naturally likes (or dislikes) in time for it to be an important consequence of the behavior. In other words, it's hard to toss a fish to a dolphin while it's in the middle of a jump or finding a piece of equipment on the ocean floor a hundred meters below. So trainers will associate something that's easier to "deliver" with something the animal wants through classical conditioning. Some trainers call this a bridge (because it bridges the time between when the animal performs a desired behavior and when it gets its reward). Marine mammal trainers use a whistle. Many other trainers use a clicker, a cricket-like box with a metal tongue that makes a click-click sound when you press it.

You can classically condition a clicker by clicking it and delivering some desirable treat, many times in a row. Simply click the clicker, pause a moment, and give the dog (or other animal) the treat. After you've done this a few times, you may see the animal visibly startle, look towards the treat, or look to you. This indicates that she's starting to form the association. Some clicker trainers call this "charging up the clicker". It's also called "creating a conditioned reinforcer". The click sound becomes a signal for an upcoming reinforcement. As a shorthand, some clicker trainers will say that the click = the treat.

Operant Conditioning

Classical conditioning forms an association between two stimuli. Operant conditioning forms an association between a behavior and a consequence. (It is also called response-stimulus or RS conditioning because it forms an association between the animal's response [behavior] and the stimulus that follows [consequence].)

Four Possible Consequences

There are four possible consequences to any behavior. They are:

Something Good can start or be presented;
Something Good can end or be taken away;
Something Bad can start or be presented;
Something Bad can end or be taken away.

Consequences have to be immediate, or clearly linked to the behavior. With verbal humans, we can explain the connection between the consequence and the behavior, even if they are separated in time. For example, you might tell a friend that you'll buy dinner for them since they helped you move, or a parent might explain that the child can't go to summer camp because of her bad grades. With very young children, humans who don't have verbal skills, and animals, you can't explain the connection between the consequence and the behavior. For the animal, the consequence has to be immediate. The way to work around this is to use a bridge (see above).

Technical Terms

The technical term for "an event started" or "an item presented" is positive, since it's something that's added to the animal's environment.

The technical term for "an event ended" or "an item taken away" is negative, since it's something that's subtracted from the animal's environment.

Anything that increases a behavior - makes it occur more frequently, makes it stronger, or makes it more likely to occur - is termed a reinforcer. Often, an animal (or person) will perceive "starting Something Good" or "ending Something Bad" as something worth pursuing, and they will repeat the behaviors that seem to cause these consequences. These consequences will increase the behaviors that lead to them, so they are reinforcers. These are consequences the animal will work to attain, so they strengthen the behavior.

Anything that decreases a behavior - makes it occur less frequently, makes it weaker, or makes it less likely to occur - is termed a punisher. Often, an animal (or person) will perceive "ending Something Good" or "starting Something Bad" as something worth avoiding, and they will not repeat the behaviors that seem to cause these consequences. These consequences will decrease the behaviors that lead to them, so they are punishers.

Applying these terms to the Four Possible Consequences, you get:

Something Good can start or be presented, so behavior increases = Positive Reinforcement (R+)

Something Good can end or be taken away, so behavior decreases = Negative Punishment (P-)

Something Bad can start or be presented, so behavior decreases = Positive Punishment (P+)

Something Bad can end or be taken away, so behavior increases = Negative Reinforcement (R-)

or:
 

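|                              | Reinforcement (behavior increases)                          | Punishment (behavior decreases)                           |
|------------------------------|-------------------------------------------------------------|-----------------------------------------------------------|
| Positive (something added)   | Positive Reinforcement: something added increases behavior  | Positive Punishment: something added decreases behavior   |
| Negative (something removed) | Negative Reinforcement: something removed increases behavior | Negative Punishment: something removed decreases behavior |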
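
For readers who find it easier to think in code, the four consequences can be sketched as a tiny lookup. This is only an illustration, not part of any training tool: the names `Change`, `Effect`, and `classify` are made up for this example. The idea is that "positive/negative" answers only whether something was added or removed, and "reinforcement/punishment" answers only whether the behavior goes up or down.

```python
from enum import Enum


class Change(Enum):
    """What happened in the animal's environment (hypothetical names)."""
    ADDED = "positive"    # something started or was presented
    REMOVED = "negative"  # something ended or was taken away


class Effect(Enum):
    """What happened to the behavior afterwards (hypothetical names)."""
    INCREASES = "reinforcement"
    DECREASES = "punishment"


def classify(change: Change, effect: Effect) -> str:
    """Return the label for one of the four consequences, e.g. 'Positive Reinforcement (R+)'."""
    sign = "+" if change is Change.ADDED else "-"
    letter = "R" if effect is Effect.INCREASES else "P"
    return f"{change.value.title()} {effect.value.title()} ({letter}{sign})"


if __name__ == "__main__":
    # Something Good presented, behavior increases
    print(classify(Change.ADDED, Effect.INCREASES))    # Positive Reinforcement (R+)
    # Something Good taken away, behavior decreases
    print(classify(Change.REMOVED, Effect.DECREASES))  # Negative Punishment (P-)
    # Something Bad presented, behavior decreases
    print(classify(Change.ADDED, Effect.DECREASES))    # Positive Punishment (P+)
    # Something Bad taken away, behavior increases
    print(classify(Change.REMOVED, Effect.INCREASES))  # Negative Reinforcement (R-)
```

Notice that the function never asks whether the thing added or removed was "good" or "bad". The label depends only on what changed and what the behavior did next, which is why negative reinforcement is not the same thing as punishment.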

Stacy Braslau-Schneck, MA