Mandy, your explanation of the definitions is pretty much right on. The only thing I might add is that a variable ratio of reinforcement is usually based around some average rate of reinforcement. When using a variable ratio of reinforcement, you cannot go from rewarding every occurrence of the behavior straight to a high ratio, or the behavior will go extinct.
Instead, you might start by occasionally skipping the reward and eventually move to a 2:1, 3:1, or other average ratio. It is not completely random, because rewards continue to be given, just not at an absolutely predictable rate. A variable ratio of reinforcement is stronger than a fixed ratio.
In a fixed ratio of 2:1 vs a variable ratio of reinforcement of 2:1, the difference would look like this:
Fixed Ratio: reward, don't reward, reward, don't reward, etc.
Variable Ratio: reward, reward, don't reward, don't reward, reward, don't reward, etc.
The fixed ratio could become predictable, and the animal would not try as hard on the instances where a reward cannot be expected. With variable ratio reinforcement, however, the schedule of reinforcement is random (yet based on a viable average rate of reinforcement), so the animal cannot predict reinforcement and has to attempt the behavior every time it is cued because it has no way of knowing when reinforcement will be given.
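For anyone who likes to see it spelled out, here is a rough sketch of the two schedules in Python. It is only an illustration: the function names are made up, and rewarding each cue independently with probability 1/ratio is my own simplifying assumption about how to get a given average, not the only way to build a variable schedule.

```python
import random

def fixed_ratio(n_cues, ratio):
    """Reward the first cue, then every `ratio`-th cue after it (predictable)."""
    return [i % ratio == 0 for i in range(n_cues)]

def variable_ratio(n_cues, ratio, rng):
    """Reward each cue independently with probability 1/ratio.

    Assumption: this gives an average of one reward per `ratio` cues over
    many cues, but any short run can contain streaks either way.
    """
    return [rng.random() < 1.0 / ratio for _ in range(n_cues)]

def show(schedule):
    """Render a schedule in the same words used in the post."""
    return ", ".join("reward" if r else "don't reward" for r in schedule)

rng = random.Random()  # pass a seed here if you want a repeatable pattern
print("Fixed Ratio:   ", show(fixed_ratio(6, 2)))
print("Variable Ratio:", show(variable_ratio(6, 2, rng)))
```

The fixed line always alternates, so the animal could learn the pattern. The variable line changes from run to run, and only settles toward one reward in two over a long stretch of cues.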
A variable ratio of reinforcement can be very powerful because it:
A) Is more resistant to extinction than continuous reinforcement
B) Allows more repetitions to practice a trick before the parrot is full
C) Keeps the parrot interested/motivated because it's a bit of a game
Believe it or not, a variable ratio of reinforcement is less prone to extinction than continuous reinforcement. With continuous reinforcement, the bird learns to expect positive reinforcement for every performance of the behavior. So if for any reason reinforcement stops being provided, the parrot will stop offering the behavior because it will assume that reinforcement for the behavior has expired and move on to something else. You see, in the wild certain behaviors are reinforced in a fixed quantity. Say a sunflower has only so many sunflower seeds. Regardless of what behavior the parrot may have learned to extract the seeds, once they are consumed, the foraging behavior will no longer be reinforced and will go extinct if there are no more seed-filled sunflowers. A variable ratio of reinforcement, however, sets the animal up to believe that it has to keep trying and eventually reinforcement will resume. Thus the parrot will continue performing the behavior for a much longer time before completely giving up on it.
A psychologist told me a great real-life example of this. Say there is a soda vending machine that you have always known to be reliable: in 100 uses it has never eaten your money, and then one time it does. You would naturally assume the machine is broken and stop trying, because you know it never fails. On the other hand, if the machine is known to fail intermittently, you might opt to put another dollar in because it usually works the next time. The owner of the machine could sucker extra dollars out of failed purchases this way. Not that I am saying they do it on purpose.
Continuous reinforcement has its place for initially teaching a behavior to a parrot. However, once the parrot knows the behavior, the best way to maintain the behavior and to maintain spontaneity of response is to continue with a variable ratio of reinforcement. In this video (from 0:25-0:33) you can see me cuing Kili for 5 tricks for a single treat. I mix it up. Sometimes I will reward her right off the bat, but usually I'll have her do 3, 4, or 5 tricks to earn the single treat. By clicking each trick, though, I can signal to her that she got it right and we're just moving on to the next trick, rather than lingering on the previous trick and having her doubt whether or not she got it right.
Now I have started to introduce a variable ratio of reinforcement to flight recall training. I used to reward Kili every time for recalling because I was afraid of causing extinction by not rewarding, but now she knows the recall so well that I can afford to skip rewarding it every time. After all the work we did, it will take more than a few unrewarded instances to lead to extinction, and the more I get her used to a variable ratio of reinforcement for the flight recall behavior, the more reliable I hope to get the recall with or without a treat.
Today, Kathleen and I vertical recall trained Kili on my staircase using a variable ratio of reinforcement. At first Kili seemed to respond poorly to the variable ratio when she wasn't being rewarded, so we had to try hard to get her over the bump of the first few non-reinforced recalls and get her to realize that she would be randomly rewarded again later. Once she got through a couple of rewards/non-rewards, she realized the name of the game: fly the recall, find out if you get the treat or not. Today we progressed to VR2, but I hope to get to VR3-5 within a few weeks, where I would only have to reward an average of once out of 3-5 recalls. Hopefully this will help me increase the rate of response as well as the number of recalls I can practice in a session.
Some tips about using variable ratio of reinforcement:
- Do not give any clues as to whether or not you will be rewarding each instance of the behavior. Hold the treat as you would if you planned to reward it. You don't want to tip off the parrot that you are not planning to reward it, because it will learn not to come when you are not going to reward.
- Plan and be prepared to reward the parrot every time you cue it, and make the decision whether or not to reward at the moment the reward is due so that it can be truly random.
- Use a clicker and click every time the parrot has done the correct behavior, so that even without a treat it knows that the behavior was done correctly and was worth rewarding, in contrast to incorrect behavior that is not meant to be rewarded.
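To put rough numbers on the "decide at the moment the reward is due" tip, here is a small Python sketch. Treat it as a toy model under my own assumptions: `should_reward`, `recalls_until_full`, and the idea of a fixed `treat_budget` (how many treats the parrot takes before it is full) are all hypothetical names for illustration, and each decision is an independent coin flip weighted to the average ratio.

```python
import random

def should_reward(average_ratio, rng=random):
    """Decide at the moment the reward is due, independent of past cues.

    Assumption: a weighted coin flip with probability 1/average_ratio.
    """
    return rng.random() < 1.0 / average_ratio

def recalls_until_full(treat_budget, average_ratio, rng):
    """Count how many cues fit in a session before the treats run out.

    `treat_budget` is a hypothetical stand-in for how many treats the
    parrot will take before it loses interest in food.
    """
    cues = treats = 0
    while treats < treat_budget:
        cues += 1  # the behavior is cued (and clicked) every time
        if should_reward(average_ratio, rng):
            treats += 1
    return cues

rng = random.Random()
for ratio in (1, 2, 3, 5):
    print(f"VR{ratio}: {recalls_until_full(10, ratio, rng)} recalls for 10 treats")
```

With a ratio of 1 every cue is rewarded, so 10 treats buys exactly 10 recalls. At higher ratios the count varies from session to session but averages around 10 times the ratio, which is where the extra practice repetitions come from.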