Variable Ratio Reinforcement vs. Continuous Reinforcement?

Parrot Youtube Channel · by **Michael** » Fri Nov 13, 2009 3:03 pm

Let's discuss continuous reinforcement vs. fixed ratio and variable ratio reinforcement.

I don't have time right now to explain these, so if someone would care to define the terms for anyone who doesn't know them, please do. If not, I will get back to this later.

by **MandyG** » Fri Nov 13, 2009 4:30 pm

I don't have time for a discussion right now, but I do have time to define what they are. Someone can explain them in greater detail (or correct me if I'm wrong) if they'd like; I'm only going to write about the basic meanings.

Continuous Reinforcement - Reinforcement is given after every response.
Example - Every time your bird waves, you reinforce it.

Fixed Ratio Reinforcement - reinforcement is given after every nth response.
Example - Every third time your bird waves you reinforce it.

Variable Ratio Reinforcement - Reinforcement is given after a random number of responses.
Example - Your bird waves twice and you reinforce it. Next, your bird has to wave 5 times before you reinforce it. You continue reinforcing after a random number of responses.
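
For anyone who thinks better in code, the three definitions above can be sketched roughly as follows. This is only an illustration in Python: the function names and the `max_gap` parameter are made up for the example and are not standard terminology. Each function returns a list where True means "reinforce this response".

```python
import random

def continuous(num_responses):
    """Continuous reinforcement: every response is reinforced."""
    return [True] * num_responses

def fixed_ratio(num_responses, n):
    """Fixed ratio: every nth response is reinforced."""
    return [i % n == 0 for i in range(1, num_responses + 1)]

def variable_ratio(num_responses, max_gap, rng):
    """Variable ratio: reinforce after a random number of responses.

    Assumption for this sketch: each gap is drawn uniformly from 1..max_gap.
    """
    rewards = []
    while len(rewards) < num_responses:
        gap = rng.randint(1, max_gap)
        # (gap - 1) unreinforced responses, then one reinforced response
        rewards.extend([False] * (gap - 1) + [True])
    return rewards[:num_responses]

print(continuous(3))
print(fixed_ratio(6, 3))
print(variable_ratio(10, 5, random.Random()))
```

With `fixed_ratio(6, 3)` the third and sixth waves are reinforced, matching the "every third time" example. The variable ratio output changes from run to run, which is the whole point.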

Parrot Youtube Channel · by **Michael** » Sat Nov 14, 2009 1:35 am

Mandy, your definitions are pretty much right on. The only thing I would add is that a variable ratio of reinforcement is usually based around some average rate of reinforcement. When moving to a variable ratio of reinforcement, you cannot go from rewarding every occurrence of the behavior straight to a high ratio, or the behavior will go extinct.

Instead, you might start by skipping the reward on occasion and eventually move to a 2:1, 3:1, or other average ratio. It is not completely random, because rewards continue to be given, just not at an absolutely predictable rate. A variable ratio of reinforcement maintains a behavior more strongly than a fixed ratio.

With a fixed ratio of 2:1 versus a variable ratio of 2:1, the difference would look like this:

Fixed Ratio: reward, don't reward, reward, don't reward, etc.
Variable Ratio: reward, reward, don't reward, don't reward, reward, don't reward, etc.

The fixed ratio could become predictable, and the animal would not try as hard on the instances where no reward can be expected. With variable ratio reinforcement, however, the schedule is random (yet based on a workable average ratio), so the animal cannot predict reinforcement and has to attempt the behavior every time it is cued, because it has no way of knowing when reinforcement will be given.
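
To make "random but based on an average" a bit more concrete, here is a rough Python sketch. It assumes one simple way of building the schedule, which is my own choice for illustration and not the only way to do it: the number of responses required for each reward is drawn uniformly from 1 up to (2 x average - 1), so for an average of 2 each reward comes after 1, 2, or 3 responses. The name `vr_gaps` is hypothetical.

```python
import random

def vr_gaps(mean_ratio, num_rewards, rng):
    """Draw how many responses are required before each reward.

    Assumption: gaps are uniform on 1..(2 * mean_ratio - 1),
    which averages out to mean_ratio responses per reward.
    """
    return [rng.randint(1, 2 * mean_ratio - 1) for _ in range(num_rewards)]

rng = random.Random(42)
gaps = vr_gaps(2, 10_000, rng)
average = sum(gaps) / len(gaps)

print(gaps[:10])          # individual gaps are unpredictable
print(round(average, 2))  # but the long-run average stays close to 2
```

Any single gap cannot be predicted, yet over many rewards the payout settles near the chosen ratio, which is the difference from the strictly alternating fixed ratio pattern.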

A variable ratio of reinforcement can be very powerful because it:

A) Is more resistant to extinction than continuous reinforcement
B) Allows more repetitions to practice a trick before the parrot is too full
C) Keeps the parrot interested/motivated because it's a bit of a game

Believe it or not, a variable ratio of reinforcement is less prone to extinction than continuous reinforcement. With continuous reinforcement, the bird learns to expect positive reinforcement for every instance of the behavior. So if for any reason reinforcement stops being provided, the parrot will stop offering the behavior, because it will assume that reinforcement for the behavior has run out, and move on to something else. You see, in the wild certain behaviors are reinforced in a fixed quantity. Say a sunflower has only so many seeds. Regardless of what behavior the parrot may have learned to extract the seeds, once they are consumed the foraging behavior is no longer reinforced and goes extinct if there are no more seed-filled sunflowers. A variable ratio of reinforcement, however, sets the animal up to believe that if it keeps trying, reinforcement will eventually resume. Thus the parrot will continue performing the behavior for a much longer time before completely giving up on it.

A psychologist told me a great real-life example of this. Say there is a soda vending machine that you have always known to be reliable, that in 100 uses never ate your money, and one time it does. You would naturally assume the machine is broken and stop trying, because you know it never fails. On the other hand, if the machine is known to fail intermittently, you might opt to put another dollar in because it usually works the next time. The owner of the machine could sucker extra dollars out of failed purchases this way. Not that I am saying they do it on purpose. :lol:

Continuous reinforcement has its place for initially teaching a parrot a behavior. However, once the parrot knows the behavior, the best way to maintain it and to keep the response spontaneous is to continue with a variable ratio of reinforcement. In this video (from 0:25-0:33) you can see me cuing Kili for 5 tricks for a single treat. I mix it up. Sometimes I will reward her right off the bat, but usually I'll have her do 3, 4, or 5 tricks to earn the single treat. By clicking each trick, though, I can signal to her that she got it right and we're just moving on to the next trick, rather than lingering on the previous trick and having her doubt whether or not she got it right.

Now I have started to introduce a variable ratio of reinforcement to flight recall training. I used to reward Kili every time for recalling because I was afraid of causing extinction by not rewarding, but now she knows the recall so well that I can afford to skip the reward some of the time. After all the work we did, it will take more than a few unrewarded instances to lead to extinction. The more I get her used to a variable ratio of reinforcement for the flight recall, the more reliable I hope to get the recall, with or without a treat.

Today, Kathleen and I vertical recall trained Kili on my staircase using a variable ratio of reinforcement. At first Kili seemed to respond poorly when she wasn't being rewarded, so we had to work hard to get her over the bump of the first few non-reinforced recalls and get her to realize that she would be randomly rewarded again later. Once she got through a couple of rewards/non-rewards she realized the name of the game: fly the recall, find out if you get the treat or not. Today we progressed to a variable ratio of reinforcement of VR2, but I hope to get to VR3-5 within a few weeks, where I would only have to reward an average of once out of 3-5 recalls. Hopefully this will help me increase the rate of response as well as the number of recalls I can practice in a session.
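
As a back-of-the-envelope illustration of why a higher ratio allows more recalls per session, here is a tiny Python sketch. The treat budget of 10 is a made-up number for the example, and `expected_recalls` is a hypothetical helper; the only thing it encodes is that at VRn a treat is given on average once per n recalls.

```python
def expected_recalls(treat_budget, mean_ratio):
    """Expected recalls supported by a treat budget at an average of
    mean_ratio recalls per treat (VR1 is continuous reinforcement)."""
    return treat_budget * mean_ratio

# Hypothetical session with 10 treats available
for ratio in (1, 2, 3, 5):
    print(f"VR{ratio}: about {expected_recalls(10, ratio)} recalls for 10 treats")
```

These are averages only. On a truly variable schedule the actual count in any one session will wander above or below the expected figure.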

Some tips about using variable ratio of reinforcement:

- Do not give any clues about whether or not you will be rewarding each instance of the behavior. Hold the treat as you would if you planned to reward it. You don't want to tip the parrot off that you are not planning to reward, because it will learn not to come when no reward is on offer
- Plan and be prepared to reward the parrot every time you cue it, and decide whether or not to reward at the moment the reward is due, so that it can be truly random
- Use a clicker and click every time the parrot performs the correct behavior, so that even without a treat it knows the behavior was done correctly and was worth rewarding, in contrast to incorrect behavior that is not meant to be rewarded
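
The last two tips can be sketched as a simple decision rule. This is a hedged Python illustration and not a training protocol: it assumes the treat decision is made as a coin flip with probability 1/ratio at the moment the reward is due (sometimes called a random ratio schedule), and the name `respond_to_cue` is hypothetical.

```python
import random

def respond_to_cue(correct, mean_ratio, rng):
    """Return (click, treat) for one cued behavior.

    Assumptions for this sketch:
    - every correct response is clicked, treat or not
    - the treat is decided on the spot with probability 1 / mean_ratio
    - incorrect responses get neither click nor treat
    """
    if not correct:
        return (False, False)
    treat = rng.random() < 1.0 / mean_ratio
    return (True, treat)

rng = random.Random()
for _ in range(5):
    click, treat = respond_to_cue(True, 3, rng)
    print("click" if click else "no click", "+ treat" if treat else "")
```

Because the decision is only made once the behavior is complete, there is nothing in the trainer's posture or hand position beforehand that could give the outcome away.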

by **Mona** » Mon Nov 16, 2009 12:23 pm

Hi Michael:

Everything has advantages and disadvantages. You did a good job laying out the advantages of variable reinforcement.

Any thoughts on disadvantages?

Thanks

Mona

Parrot Youtube Channel · by **Michael** » Mon Nov 16, 2009 1:32 pm

Stress and aggression.

by **Mona** » Tue Nov 17, 2009 12:14 pm

Whoa...Those are big disadvantages! Care to elaborate?

Parrot Youtube Channel · by **Michael** » Tue Nov 17, 2009 2:12 pm

Well, there have been studies showing that excessively high variable ratios of reinforcement frustrate animals (across species) and lead to an aggressive response. A psychologist told me about a study done by someone he knew, where the guy took pigeons off of free feed and forced them to work for all their food. He used operant conditioning to teach the pigeons to press a lever to get food. This worked great at first, so he put the pigeons on a variable ratio of reinforcement and started reducing the payout ratio to see how far he could push the pigeons to work for less and less food per operation.

He found that he could push the ratio into the hundreds and the pigeons would still try to push the lever, because that was the ONLY way they could get even those little bits of food. So a variable ratio of reinforcement is very powerful for maintaining a behavior with very little reinforcement. However, as he was hitting the peak of the variable ratio, where the pigeons were taking in less food than the effort it takes to press the lever 100 times to get it, they would go crazy. They would become very aggressive and start attacking the lever rather than pressing it. You did not want to get your hands near them when they were like that. He had to discontinue the experiment and return them to free feed.

The great thing about variable ratio reinforcement is that you do not have to use reinforcement for every instance of the behavior and the behavior is more likely to be repeated. When the parrot learns that the reinforcement schedule is variable, it learns to keep trying rather than to give up. So by using variable ratio, you can get the parrot to do more repetitions for less reinforcement. This way you can have the behavior demonstrated and improved more times because the parrot has not gotten too full of food and is still eager to keep trying till it gets it.

The drawback is that it stresses the animal to some extent. Who is happy to do work and not get anything for it? I have yet to do more experimentation with variable ratio reinforcement for recall flight with Kili but thus far it seems really effective. I'm getting her to recall more times using the same amount of food than before. Each treat goes a longer way. However, I also noticed her giving off a shriek when I don't give her a treat and send her back to perch. If you own a Senegal I'm sure you know what their "I'm mad" shriek sounds like.

Variable ratio reinforcement is not recommended for at least the following two scenarios:

A) Teaching a new trick
B) Introducing an unfamiliar person or getting a new bird

Variable ratio would not be effective while teaching a new trick because it would make the learning process much longer: the parrot would not always be rewarded, and so would not always be shown that it had performed the behavior correctly.

For introducing a new person or parrot I wouldn't recommend variable ratio, because it would make the parrot angry or stressed at the new person for not rewarding it when a reward is due. I am careful about this and always have Kathleen or guests reward Kili more than I do so that she will like them. As I say, if her beak is full she can't bite. So I'd prefer a guest put a treat in her beak immediately after a recall so that she wouldn't even have the thought or chance to bite. However, Kili is bonded to me, and for my own training efforts I am willing to experiment with the effectiveness of variable ratio reinforcement. I'm not seeking to make the ratios so extreme that it could be harmful. All I want is to be able to call Kili with or without a treat and have her come to me. If I can establish enough of a variable ratio, I could just decide that I want her, call her, and she would come.
