REINFORCEMENT SCHEDULES

Laboratory study has revealed a variety of reinforcement schedules. Puppy training has revealed that most of these are notoriously ineffective, or impossible to administer in practice, with the notable exceptions of variable ratio and, especially, differential reinforcement. Yet educators and trainers persist in using these relatively ineffective schedules of reinforcement when trying to teach children and employees and when attempting to train husbands and dogs. Wake up! Puppy training has taught us that most of this stuff doesn’t work too well.

Continuous Reinforcement (CR) — the dog is rewarded after every correct response, for example, the dog is rewarded after every sit

Ironically, continuous reinforcement is the biggest problem in reward-based education and training today. The dog receives far too many rewards, usually food. Certainly, CR temporarily increases the frequency of behavior, but CR is hopeless for maintaining the frequency and quality of behavior, and CR is absolutely no good for improving the quality of behavior. If you reward a dog for every correct response, approximately 50% of the time you will reward the dog for above-average responses and 50% of the time you will reward the dog for below-average responses. Consequently, the quality of the behavior will not improve. It is simply too silly to reward a dog for below-average responses.

To make matters worse, consistent reinforcement often causes behavior to decrease in frequency and quality. Since the dog knows that he will always be rewarded … eventually, there’s no need to hurry, and so the dog eventually does it in his own way, in his own good time, and his slow, sloppy behavior gets rewarded. Spoiled dog syndrome.

And it gets even worse. Rewarding the dog for every correct response makes it very difficult to phase out food rewards in training and usually, response-reliability becomes dependent on the owner having food in their hand or on their person. The dog may deign to work for you if he feels like it and if you have food, but the first time you don’t come up with a food reward, he’ll go on strike.

Basically, think food vending machine. You only use it when you want to (when you’re hungry), and if it fails to deliver food on just a single occasion, you get mad at the machine and then never use it again.

NEVER use a continuous reinforcement schedule.

However, please do not confuse continuous reinforcement with the classical relationship between a secondary and primary reinforcer, e.g., between a click and a treat. If you click, you must always treat. However, you should progressively refine your criteria so that you click no more than 50% of previously correct responses. Also, you can never give a dog too much praise, too many hugs, or too many pieces of food when classically conditioning the dog to like people (especially children, men, and strangers) and other dogs. But you must kick this food habit if you would like to train dogs to respond reliably to verbal cues.

 

Fixed Duration Reinforcement (FD) — the dog is rewarded after a specific time, for example, after every five seconds of sit-stay (FD5)

FD is no good for improving the quality of performance. In fact, FD produces marked inconsistencies in the quality of behavior: performance-quality tends to drop off immediately after each reward. Performance-quality progressively improves as each expected reward-time comes closer, but immediately after the dog is rewarded, attention and quality of behavior decrease (because the dog knows that the next reward is sometime in the future).

 

Fixed Ratio Reinforcement (FR) — the dog is rewarded after a specific number of responses, for example after every five sits (FR5)

Initially FR is very good at increasing the frequency of behavior. However, performance-quality often takes a nose-dive and the dog rushes through repetitions to get another reward. Also, if the ratio is stretched too much and too many responses are required for a single reward, the dog may slow down after being rewarded. If the ratio is stretched even more, the dog may give up altogether.

 

Fixed schedules are pretty hopeless in dog training. Fixed schedules are pretty useless for reliably increasing the frequency or duration of behavior and they do nothing to increase quality. Fixed schedules do nothing to specifically instruct the dog how to do better and they do not reinforce the dog for improving the quality of behavior. Performance-quality becomes inconsistent and usually decreases over time.

I would never use any fixed schedule of reinforcement to train a puppy. However, amazingly, the entire world’s work force is maintained on fixed schedules: FD Pay Day and FR Piece Rate. You simply cannot motivate people on an FD schedule. The reward (pay day or year-end bonus) is now expected. Quality of performance may improve as pay day or the year-end bonus approaches, but you’ll still get Monday-morning mourning and January-blahs. Similarly, FR Piece Rate may increase speed of production, but usually quality control takes a beating as workers rush to meet their quota. And of course, the workers will strike if ever the quota is too much to ask for limited pay. Fixed schedules are no way to motivate and reinforce puppies, or the world’s work force.

 

Variable Duration Reinforcement (VD) — the dog is rewarded after unpredictable length durations, for example, for a VD5, the dog is rewarded after varying durations of sit-stay that average out to be five seconds.

Variable duration reinforcement is really good at getting dogs to perform for increasing lengths of time and preparing them to work without the prospect of reinforcement. VD makes it much easier to phase out training rewards. Since the reward-time is unpredictable, the dog’s behavior does not drop off immediately after each reward because the next one could be just one second later.

However, few people could calculate the schedule and train the dog at the same time. For example, to reinforce a dog’s sit-stay using a VD5, we would have to reward the dog after, say, 5, 1, 7, 2, 6, 5, 9, 3, 4, and 8 seconds. I can do this in my head because I am good at mental arithmetic. But what’s the point? Dog training shouldn’t be a mathematics test. Dog training should be relaxing and enjoyable. Much easier would be a Random Duration Reinforcement schedule. Just reward your dog after random lengths of sit-stay and progressively increase the average duration over time. Ahh! Now we’re getting there. We are going to rapidly increase the duration of stays and gradually phase out food rewards at the same time. Also, the dog will give you more attention. But … variable duration reinforcement does not improve the quality of performance.
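For anyone who would rather let a computer do the mental arithmetic, here is a minimal Python sketch. It checks that the ten durations listed above really do average five seconds, and it draws a random-duration schedule. The helper names and the choice of whole-second durations between 1 and twice the target minus one are assumptions made for illustration, not part of any formal training protocol.

```python
import random

# The ten sit-stay durations (in seconds) given in the text for a VD5.
vd5_durations = [5, 1, 7, 2, 6, 5, 9, 3, 4, 8]

def average_duration(durations):
    """Return the mean duration in seconds (the 5 in VD5)."""
    return sum(durations) / len(durations)

def random_durations(target_average, count, rng):
    """Hypothetical helper: draw `count` whole-second durations,
    each between 1 and 2 * target_average - 1 seconds, so that
    they average out close to `target_average` over many draws."""
    return [rng.randint(1, 2 * target_average - 1) for _ in range(count)]

print(average_duration(vd5_durations))

# A random-duration session aiming at roughly five seconds per reward.
print(random_durations(5, 10, random.Random()))
```

To lengthen stays over time, raise `target_average` from one session to the next.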

 

Variable Ratio Reinforcement (VR) — the dog is rewarded after an unpredictable number of responses, for example, for a VR10, the dog is rewarded after varying numbers of sits that average out to be ten sits per reward.

VR reinforcement is wonderful for maintaining high frequencies of behavior for longer and longer durations and for fewer and fewer rewards. VR makes it much easier to phase out food rewards because the dog gets used to working for an increasing number of repetitions without reward. Think slot machine. What do you do when it hasn’t paid out on your last seven dollars? You take your eighth dollar and rub it and kiss it because you’re absolutely certain that this is the one. And then after three more dollars without a payout, you get five dollars back and the machine has you hooked.

Of course, we have the same problem as with all variable schedules: few human brains could calculate the schedule and train the dog at the same time. But you know what? A Random Ratio schedule is just as good. Just reward recalls and sits at random and your dog is going to keep coming and sitting forever.
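A random ratio schedule is easy to sketch in a few lines of Python. In this illustration, each response has a one-in-N chance of being rewarded, so the rewards average out to about one per N responses without anyone counting. The function name and the coin-flip approach are assumptions chosen for the sketch.

```python
import random

def random_ratio_rewards(num_responses, average_ratio, rng):
    """Hypothetical helper: decide response by response whether to reward.
    Each response has a 1-in-`average_ratio` chance of earning a reward,
    so over many responses about one in `average_ratio` is rewarded."""
    return [rng.random() < 1 / average_ratio for _ in range(num_responses)]

# Simulate 1000 sits on a schedule that averages ten sits per reward.
rewards = random_ratio_rewards(1000, 10, random.Random())
print(sum(rewards), "rewards for", len(rewards), "sits")
```

The number of rewards varies from run to run, which is the whole point: the dog cannot predict which response will pay off.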

I just love the concept of random reinforcement: the notion that we can be utterly random, consistently inconsistent, a total ditz even, yet still maintain motivated levels of high-frequency responding in our dogs. Love it. However, VR does not improve the quality of performance because you are still reinforcing as many below-average responses as above-average responses.

 

Differential Reinforcement (DR) — the dog is given different valued rewards that reflect the quality of the performance, for example, only reward the dog for above-average responses, give better rewards for better responses and give the best rewards for the best responses.

Years ago, I picked up my son from Montessori school and he showed me his previous night’s homework with glee: a gold star. I was furious. I explained to the teacher that the homework was rubbish and that it didn’t deserve a gold star, or a silver star, or a bronze star, or an oblong, or a triangle, or any geometric shape of any color. The homework deserved a massive red “F”. I wanted the grade to reflect the quality of the work. I wanted Jamie to realize that stellar homework was worth a gold star, but rubbish homework was barely worth the ink in the “F”.

Right from the outset, from the puppy’s very first lesson, differential reinforcement is the only way to go to continually and progressively increase the reliability, frequency, panache and pizzazz of performance. Basically, the value of the reward varies according to the quantitative and qualitative aspects of performance. As a guideline, never reward a dog for more than 50% of correct responses. Approximately 50% of responses will be below average and there is absolutely no point in rewarding the dog for those unless you want his behavior to worsen.

For example, time a dog doing ten recalls and then work out his average recall time. Then only reward your dog for faster-than-average recalls. Recalculate his average after every ten recalls and you will find it is steadily improving as training proceeds. For every ten recalls, you will find that five or six are faster than average. (This is because of the long tail of misbehavior: a single lengthy recall considerably biases the average.)
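The recall arithmetic can be sketched in Python. The recall times below are invented purely for illustration, and the function names are hypothetical. The sketch shows how one slow recall pulls the average up, so that more than half of the recalls beat it.

```python
def mean_time(times):
    """Average recall time in seconds for one batch of recalls."""
    return sum(times) / len(times)

def reward_flags(times, baseline):
    """True for each recall faster than the baseline average,
    i.e. the recalls that would earn a reward."""
    return [t < baseline for t in times]

# Hypothetical batch of ten recall times (seconds); the final
# 12.5-second recall is the "long tail" that biases the average.
batch = [4.0, 3.5, 5.5, 4.5, 3.0, 4.0, 6.0, 3.5, 5.5, 12.5]

baseline = mean_time(batch)
flags = reward_flags(batch, baseline)

# 6 of these 10 hypothetical recalls beat the 5.2-second average.
print(baseline, sum(flags))
```

After the next ten recalls, compute a new `baseline` from that batch and repeat, so the bar rises as the dog gets faster.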

To hear more of Dr. Dunbar's views and insights see him live at one of his upcoming seminars.


Comments

I'm a fan of Dr. Patricia McConnell as well and she's reading a book that sparked a blog post about training schedules...

http://www.theotherendoftheleash.com/training-schedules/

which may be interesting to note here as well.

 

 

happy-houndz.blogspot.com cheers, kate

In reading this article I couldn’t help but think how it strayed from the doctrine of positive training methods and it made me chuckle because I do this all the time.

I start out by writing the “negatives”, the things you shouldn’t do, the problems, and so on...totally forgetting that

A) online readers don’t read, they skim, and

B) every time they read a paragraph they are punished until they get to the very last paragraph which gives them the info they were looking for and method they ought to use.

So for this blog post, let’s call this Negative Escalation Reinforcement lol

I tease cuz I love and devour all your books, DVDs and such…keep ‘em coming.

This is great - you just don't hear enough people talking about it. It's very important to help the dog distinguish the different areas of play and discipline. People end up confusing their dogs by using the same voice for everything and giving them way too many commands.

Interesting post.

Why do you think Bob Bailey says a dog should be reinforced every single time he does something you want him to?

LCK

Hi there Charles. Ask Bob. However, I would imagine he means that "every time a dog meets criterion you would reward him." However, the criteria are progressively refined (tightened) as the dog's behavior improves. For example, if training a dog to pay attention,  first you would reward if the dog glances at you, then a look for one second, then two seconds, then three, five, eight, ten, fifteen, twenty and so on, until when the dog is cued he is expected to look at you for 30 seconds. So, when the dog looks at you, he is NOT rewarded every time. However, if he meets criteria, he IS rewarded every time. If the dog looks at you for less than 30 seconds he is not rewarded but if he looks at you for more than 30 seconds he is rewarded, i.e., we only reward above-average responses. Carefully and progressively refining criteria in clicker training is exactly the same principle as test-train-test and rewarding the dog on a differential reinforcement.

Dr. Ian Dunbar,

I do want to thank you for posting this. My dog suffered from a LDS (lazy dog syndrome) although he is very smart and this article made it very clear why. I will admit I was a CR. OH the shame. So I put your word to the test and averaged his recall. If I can, I want to share the results with you. 

1st 10 recalls: 9 sec average.....then with a few DR training

2nd 10 recalls : 3.5 second average

Unbelievable! I had to laugh when it happened. Thanks. 

I read and agree with much of your work, but where is the reinforcement (random or otherwise) in the 'furious' response to your child's teacher? Perhaps you left an impression, but how long-lived will that be once you're out of the room??!

Absolutely! I think the world would be a better place if these puppy training principles were applied to human relationships. I was furious that Jamie was given a gold star for rubbish homework. However, I was not angry or furious when I explained this to his marvelous teacher. I calmly explained that I was upset that the grading did not reflect the poor quality of the work. She gave him a "C" a few weeks later instead of the usual gold star for merely completing and turning in homework. Jamie has a wonderful work ethic and is one third of the Dog Star Daily workforce.

Thank you for responding, that makes perfect sense. So would you then agree with the use of a No Reward Mark (NRM) when training dogs? For example, the 'C' grade communicates to your son that he needs to try harder to hit the mark; his efforts are not simply ignored (as when increasing criteria). Karen Pryor coined the term 'poisoned cue', suggesting NRMs retard experimentation. Do you have any thoughts on this? I'm talking more in terms of modification than straight training. Would you consider it necessary to punctuate unruly/dangerous behaviours with a verbal NRM, or do you think extinction procedures based on withheld rewards suffice?
Again, thank you! I appreciate that you are an extremely busy guy and await any UK seminars!!!
Happy New Year to all!

And Happy New Year to you. Yes, I absolutely agree with an NRM, especially for autoshaping and when technical trainers try to emulate the autoshaping done in laboratories. I try to acknowledge every response, occasionally with an NRM but mostly with a wide range of differential reinforcements (grades) that reflect the quality of the behaviour. However, with the example of unruly/dangerous behaviour, or any undesirable behavior, I would do it differently. Yes, we have to teach people how to quickly stop undesirable behavior, but punishment, even non-aversive punishment, is not sufficient. Punishment only decreases the frequency of the immediately preceding behaviour such that it is less likely to occur in the future. However, we want much more than that in education. We need to communicate three pieces of information: 1. The present behaviour is undesirable, 2. This is what you need to do to get back on track and 3. The potential danger of non-compliance. This may be accomplished with a single word, so I always give verbal feedback because it's instructive.

Thank you .... For your time, your thoughts/guidance and your honesty! Rare these days; but very much appreciated! www.taketheleadtraining.co.uk
