By the end of this section, you will be able to:
- Define operant conditioning
- Explain the difference between reinforcement and punishment
- Distinguish between reinforcement schedules
The previous section of this chapter focused on the type of associative learning known as classical conditioning. Remember that in classical conditioning, something in the environment triggers a reflex automatically, and researchers train the organism to react to a different stimulus. Now we turn to the second type of associative learning, operant conditioning. In operant conditioning, organisms learn to associate a behavior and its consequence ([link]). A pleasant consequence makes that behavior more likely to be repeated in the future. For example, Spirit, a dolphin at the National Aquarium in Baltimore, does a flip in the air when her trainer blows a whistle. The consequence is that she gets a fish.
|Classical Conditioning||Operant Conditioning|
|Conditioning approach||An unconditioned stimulus (such as food) is paired with a neutral stimulus (such as a bell). The neutral stimulus eventually becomes the conditioned stimulus, which brings about the conditioned response (salivation).||The target behavior is followed by reinforcement or punishment to either strengthen or weaken it, so that the learner is more likely to exhibit the desired behavior in the future.|
|Stimulus timing||The stimulus occurs immediately before the response.||The stimulus (either reinforcement or punishment) occurs soon after the response.|
Psychologist B. F. Skinner saw that classical conditioning is limited to existing behaviors that are reflexively elicited, and it doesn’t account for new behaviors such as riding a bike. He proposed a theory about how such behaviors come about. Skinner believed that behavior is motivated by the consequences we receive for the behavior: the reinforcements and punishments. His idea that learning is the result of consequences is based on the law of effect, which was first proposed by psychologist Edward Thorndike. According to the law of effect, behaviors that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviors that are followed by unpleasant consequences are less likely to be repeated (Thorndike, 1911). Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again. If an organism does something that does not bring about a desired result, the organism is less likely to do it again. An example of the law of effect is in employment. One of the reasons (and often the main reason) we show up for work is because we get paid to do so. If we stop getting paid, we will likely stop showing up—even if we love our job.
Working with Thorndike’s law of effect as his foundation, Skinner began conducting scientific experiments on animals (mainly rats and pigeons) to determine how organisms learn through operant conditioning (Skinner, 1938). He placed these animals inside an operant conditioning chamber, which has come to be known as a “Skinner box” ([link]). A Skinner box contains a lever (for rats) or disk (for pigeons) that the animal can press or peck for a food reward via the dispenser. Speakers and lights can be associated with certain behaviors. A recorder counts the number of responses made by the animal.
In discussing operant conditioning, we use several everyday words—positive, negative, reinforcement, and punishment—in a specialized manner. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something, and negative means you are taking something away. Reinforcement means you are increasing a behavior, and punishment means you are decreasing a behavior. Reinforcement can be positive or negative, and punishment can also be positive or negative. All reinforcers (positive or negative) increase the likelihood of a behavioral response. All punishers (positive or negative) decrease the likelihood of a behavioral response. Now let’s combine these four terms: positive reinforcement, negative reinforcement, positive punishment, and negative punishment ([link]).
|Positive||Something is added to increase the likelihood of a behavior.||Something is added to decrease the likelihood of a behavior.|
|Negative||Something is removed to increase the likelihood of a behavior.||Something is removed to decrease the likelihood of a behavior.|
The most effective way to teach a person or animal a new behavior is with positive reinforcement. In positive reinforcement, a desirable stimulus is added to increase a behavior.
For example, you tell your five-year-old son, Jerome, that if he cleans his room, he will get a toy. Jerome quickly cleans his room because he wants a new art set. Let’s pause for a moment. Some people might say, “Why should I reward my child for doing what is expected?” But in fact we are constantly and consistently rewarded in our lives. Our paychecks are rewards, as are high grades and acceptance into our preferred school. Being praised for doing a good job and for passing a driver’s test is also a reward. Positive reinforcement as a learning tool is extremely effective. It has been found that one of the most effective ways to increase achievement in school districts with below-average reading scores was to pay the children to read. Specifically, second-grade students in Dallas were paid 20 bonus. Manuel never knows when the quality control person will show up, so he always tries to keep the restaurant clean and ensures that his employees provide prompt and courteous service. His productivity regarding prompt service and keeping a clean restaurant are steady because he wants his crew to earn the bonus.
With a fixed ratio reinforcement schedule, there are a set number of responses that must occur before the behavior is rewarded. Carla sells glasses at an eyeglass store, and she earns a commission every time she sells a pair of glasses. She always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so she can increase her commission. She does not care if the person really needs the prescription sunglasses, Carla just wants her bonus. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation. Fixed ratios are better suited to optimize the quantity of output, whereas a fixed interval, in which the reward is not quantity based, can lead to a higher quality of output.
In a variable ratio reinforcement schedule, the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Sarah—generally a smart, thrifty woman—visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later she has used up all her gains and is 50, or $100, or even more. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.
In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. For example, if June presses the button for the pain relief medication before the allotted time her doctor has approved, no medication is administered. She is on a fixed interval reinforcement schedule (dosed hourly), so extinction occurs quickly when reinforcement doesn’t come at the expected time. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish ([link]).
Skinner (1953) stated, “If the gambling establishment cannot persuade a patron to turn over money with no return, it may achieve the same effect by returning part of the patron’s money on a variable-ratio schedule” (p. 397).
Skinner uses gambling as an example of the power and effectiveness of conditioning behavior based on a variable ratio reinforcement schedule. In fact, Skinner was so confident in his knowledge of gambling addiction that he even claimed he could turn a pigeon into a pathological gambler (“Skinner’s Utopia,” 1971). Beyond the power of variable ratio reinforcement, gambling seems to work on the brain in the same way as some addictive drugs. The Illinois Institute for Addiction Recovery (n.d.) reports evidence suggesting that pathological gambling is an addiction similar to a chemical addiction ([link]). Specifically, gambling may activate the reward centers of the brain, much like cocaine does. Research has shown that some pathological gamblers have lower levels of the neurotransmitter (brain chemical) known as norepinephrine than do normal gamblers (Roy, et al., 1988). According to a study conducted by Alec Roy and colleagues, norepinephrine is secreted when a person feels stress, arousal, or thrill; pathological gamblers use gambling to increase their levels of this neurotransmitter. Another researcher, neuroscientist Hans Breiter, has done extensive research on gambling and its effects on the brain. Breiter (as cited in Franzen, 2001) reports that “Monetary reward in a gambling-like experiment produces brain activation very similar to that observed in a cocaine addict receiving an infusion of cocaine” (para. 1). Deficiencies in serotonin (another neurotransmitter) might also contribute to compulsive behavior, including a gambling addiction.
It may be that pathological gamblers’ brains are different than those of other people, and perhaps this difference may somehow have led to their gambling addiction, as these studies seem to suggest. However, it is very difficult to ascertain the cause because it is impossible to conduct a true experiment (it would be unethical to try to turn randomly assigned participants into problem gamblers). Therefore, it may be that causation actually moves in the opposite direction—perhaps the act of gambling somehow changes neurotransmitter levels in some gamblers’ brains. It also is possible that some overlooked factor, or confounding variable, played a role in both the gambling addiction and the differences in brain chemistry.
COGNITION AND LATENT LEARNING
Although strict behaviorists such as Skinner and Watson refused to believe that cognition (such as thoughts and expectations) plays a role in learning, another behaviorist, Edward C. Tolman, had a different opinion. Tolman’s experiments with rats demonstrated that organisms can learn even if they do not receive immediate reinforcement (Tolman & Honzik, 1930; Tolman, Ritchie, & Kalish, 1946). This finding was in conflict with the prevailing idea at the time that reinforcement must be immediate in order for learning to occur, thus suggesting a cognitive aspect to learning.
In the experiments, Tolman placed hungry rats in a maze with no reward for finding their way through it. He also studied a comparison group that was rewarded with food at the end of the maze. As the unreinforced rats explored the maze, they developed a cognitive map: a mental picture of the layout of the maze ([link]). After 10 sessions in the maze without reinforcement, food was placed in a goal box at the end of the maze. As soon as the rats became aware of the food, they were able to find their way through the maze quickly, just as quickly as the comparison group, which had been rewarded with food all along. This is known as latent learning: learning that occurs but is not observable in behavior until there is a reason to demonstrate it.
Latent learning also occurs in humans. Children may learn by watching the actions of their parents but only demonstrate it at a later date, when the learned material is needed. For example, suppose that Ravi’s dad drives him to school every day. In this way, Ravi learns the route from his house to his school, but he’s never driven there himself, so he has not had a chance to demonstrate that he’s learned the way. One morning Ravi’s dad has to leave early for a meeting, so he can’t drive Ravi to school. Instead, Ravi follows the same route on his bike that his dad would have taken in the car. This demonstrates latent learning. Ravi had learned the route to school, but had no need to demonstrate this knowledge earlier.
Have you ever gotten lost in a building and couldn’t find your way back out? While that can be frustrating, you’re not alone. At one time or another we’ve all gotten lost in places like a museum, hospital, or university library. Whenever we go someplace new, we build a mental representation—or cognitive map—of the location, as Tolman’s rats built a cognitive map of their maze. However, some buildings are confusing because they include many areas that look alike or have short lines of sight. Because of this, it’s often difficult to predict what’s around a corner or decide whether to turn left or right to get out of a building. Psychologist Laura Carlson (2010) suggests that what we place in our cognitive map can impact our success in navigating through the environment. She suggests that paying attention to specific features upon entering a building, such as a picture on the wall, a fountain, a statue, or an escalator, adds information to our cognitive map that can be used later to help find our way out of the building.
Operant conditioning is based on the work of B. F. Skinner. Operant conditioning is a form of learning in which the motivation for a behavior happens after the behavior is demonstrated. An animal or a human receives a consequence after performing a specific behavior. The consequence is either a reinforcer or a punisher. All reinforcement (positive or negative) increases the likelihood of a behavioral response. All punishment (positive or negative) decreases the likelihood of a behavioral response. Several types of reinforcement schedules are used to reward behavior depending on either a set or variable period of time.
________ is when you take away a pleasant stimulus to stop a behavior.
- positive reinforcement
- negative reinforcement
- positive punishment
- negative punishment
Which of the following is not an example of a primary reinforcer?
Rewarding successive approximations toward a target behavior is ________.
- positive reinforcement
- negative reinforcement
Slot machines reward gamblers with money according to which reinforcement schedule?
- fixed ratio
- variable ratio
- fixed interval
- variable interval
Critical Thinking Questions
What is a Skinner box and what is its purpose?
A Skinner box is an operant conditioning chamber used to train animals such as rats and pigeons to perform certain behaviors, like pressing a lever. When the animals perform the desired behavior, they receive a reward: food or water.
What is the difference between negative reinforcement and punishment?
In negative reinforcement you are taking away an undesirable stimulus in order to increase the frequency of a certain behavior (e.g., buckling your seat belt stops the annoying beeping sound in your car and increases the likelihood that you will wear your seatbelt). Punishment is designed to reduce a behavior (e.g., you scold your child for running into the street in order to decrease the unsafe behavior.)
What is shaping and how would you use shaping to teach a dog to roll over?
Shaping is an operant conditioning method in which you reward closer and closer approximations of the desired behavior. If you want to teach your dog to roll over, you might reward him first when he sits, then when he lies down, and then when he lies down and rolls onto his back. Finally, you would reward him only when he completes the entire sequence: lying down, rolling onto his back, and then continuing to roll over to his other side.
Personal Application Questions
Explain the difference between negative reinforcement and punishment, and provide several examples of each based on your own experiences.
Think of a behavior that you have that you would like to change. How could you use behavior modification, specifically positive reinforcement, to change your behavior? What is your positive reinforcer?
- cognitive map
- mental picture of the layout of the environment
- continuous reinforcement
- rewarding a behavior every time it occurs
- fixed interval reinforcement schedule
- behavior is rewarded after a set amount of time
- fixed ratio reinforcement schedule
- set number of responses must occur before a behavior is rewarded
- latent learning
- learning that occurs, but it may not be evident until there is a reason to demonstrate it
- law of effect
- behavior that is followed by consequences satisfying to the organism will be repeated and behaviors that are followed by unpleasant consequences will be discouraged
- negative punishment
- taking away a pleasant stimulus to decrease or stop a behavior
- negative reinforcement
- taking away an undesirable stimulus to increase a behavior
- operant conditioning
- form of learning in which the stimulus/experience happens after the behavior is demonstrated
- partial reinforcement
- rewarding behavior only some of the time
- positive punishment
- adding an undesirable stimulus to stop or decrease a behavior
- positive reinforcement
- adding a desirable stimulus to increase a behavior
- primary reinforcer
- has innate reinforcing qualities (e.g., food, water, shelter, sex)
- implementation of a consequence in order to decrease a behavior
- implementation of a consequence in order to increase a behavior
- secondary reinforcer
- has no inherent value unto itself and only has reinforcing qualities when linked with something else (e.g., money, gold stars, poker chips)
- rewarding successive approximations toward a target behavior
- variable interval reinforcement schedule
- behavior is rewarded after unpredictable amounts of time have passed
- variable ratio reinforcement schedule
- number of responses differ before a behavior is rewarded