Test example: Rollover testing of timers and counters

10/09/2021 | Test quality

There are also timer and counter rollover bugs that threaten the stable operation of software.

Stable, continuous operation is an important requirement for embedded software. In the previous article, we introduced dynamic memory and dynamic resource leak tests as a way to find, before release, the time bomb bugs that undermine stable operation. Resource leaks are not the only time bomb bugs, however. Rollover bugs hidden in the processing of the timers and counters used inside the software can also become time bombs that cause malfunctions long after a device has gone into operation. In this article, I'll introduce rollover bugs and the tests that find them in advance.

What is a timer or counter rollover?

To understand timer and counter rollover bugs, let's start with a brief description of rollover itself. Timers and counters used in a program's internal processing usually hold their current value in a variable declared as an unsigned integer type. When the target system is 32-bit, such a variable is a 32-bit unsigned binary number, so in binary it can represent values from 0000 0000 0000 0000 0000 0000 0000 0000 to 1111 1111 1111 1111 1111 1111 1111 1111. That is a little hard to read, so in hexadecimal the range is 00000000 to FFFFFFFF. In decimal, it is 0 to 4294967295.
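
As a small illustration, the sketch below writes the limit of a 32-bit unsigned value in the binary and hexadecimal forms used above. It assumes a C99 compiler and uses the fixed-width type uint32_t (whose width is guaranteed, unlike plain unsigned int); the helper names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write v as 32 binary digits in groups of four, as in the text.
   out needs 40 chars: 32 digits + 7 spaces + terminating NUL. */
void u32_to_grouped_binary(uint32_t v, char out[40])
{
    int pos = 0;
    for (int bit = 31; bit >= 0; bit--) {
        out[pos++] = ((v >> bit) & 1u) ? '1' : '0';
        if (bit % 4 == 0 && bit != 0) {
            out[pos++] = ' ';   /* separator between 4-bit groups */
        }
    }
    out[pos] = '\0';
}

/* Write v as 8 zero-padded uppercase hex digits. out needs 9 chars. */
void u32_to_hex(uint32_t v, char out[9])
{
    snprintf(out, 9, "%08" PRIX32, v);
}
```

Calling u32_to_hex(UINT32_MAX, buf) fills buf with "FFFFFFFF", and UINT32_MAX itself equals 4294967295.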

Suppose a variable declared as an unsigned integer, able to hold values from 0 to 4294967295, is used as an increment counter. If its initial value is 0, the counter counts up from there: 1, 2, 3, 4, and so on. What happens when the counter reaches the largest value it can represent, 4294967295, and is incremented once more? For a 32-bit unsigned integer the sequence is 4294967294 -> 4294967295 -> 0 -> 1, so 4294967295 is followed by 0. This looks strange in decimal, but it is easy to see in the underlying 32-bit binary: adding 1 to 1111 1111 1111 1111 1111 1111 1111 1111 gives 1 0000 0000 0000 0000 0000 0000 0000 0000. A carry occurs, but the variable holds only 32 bits, so the carried 1 is lost and the remaining 32 bits are all zero. This is the rollover of a 32-bit counter. In decimal terms, 4294967295 + 1 = 0.
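
The increment rollover can be reproduced directly in C, where arithmetic on uint32_t is defined to wrap modulo 2^32. This is a minimal sketch; the function names are hypothetical. The 64-bit variant keeps the carry out of bit 31 so it can be seen before truncation.

```c
#include <stdint.h>

/* Increment a 32-bit counter. uint32_t arithmetic wraps modulo 2^32,
   so UINT32_MAX + 1 becomes 0: this is the rollover. */
uint32_t counter_increment(uint32_t count)
{
    return (uint32_t)(count + 1u);
}

/* The same step in 64 bits keeps the carry:
   0xFFFFFFFF + 1 = 0x100000000. Casting back to uint32_t keeps only the
   low 32 bits, which are all zero. */
uint64_t counter_increment_wide(uint32_t count)
{
    return (uint64_t)count + 1u;
}
```

For example, counter_increment(4294967294u) returns 4294967295u, and counter_increment(4294967295u) returns 0.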

For an increment counter, the rollover goes from 4294967295 to 0. For a decrement counter or a count-down timer, it goes the other way: 2 -> 1 -> 0 -> 4294967295 -> 4294967294, so decrementing 0 by 1 gives 4294967295 instead of -1.
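
The decrement case, sketched the same way. The saturating version is one common guard for count-down timers, shown only as an illustration and not as the article's prescribed fix; both function names are hypothetical.

```c
#include <stdint.h>

/* Decrement a 32-bit count-down timer. uint32_t arithmetic wraps
   modulo 2^32, so 0 - 1 becomes UINT32_MAX, not -1. */
uint32_t timer_decrement(uint32_t remaining)
{
    return (uint32_t)(remaining - 1u);
}

/* Illustrative guard: stop at 0 instead of wrapping to UINT32_MAX. */
uint32_t timer_decrement_saturating(uint32_t remaining)
{
    return (remaining == 0u) ? 0u : (uint32_t)(remaining - 1u);
}
```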

What bugs are hidden in the rollover process?

Now that we know how timers and counters roll over, where is the bug? In fact, there is no bug in the rollover itself. The trouble comes when the processing that uses a timer or counter does not properly account for rollover: once the rollover occurs, that processing starts to behave strangely.

For example, suppose a counter named count-1 is used as an increment counter, and process A should execute process B whenever count-1 has increased by 5 or more. Suppose the software decides this by computing (current value of count-1) - (previous value of count-1) and executing process B when the result is 5 or more. If the current value of count-1 is 1005 and the previous value is 1003, the difference is 2, so process B is not executed yet; if the current value is 1009 and the previous value is 1003, the difference is 6, and process B is correctly executed.

So what happens when count-1 keeps increasing and rolls over? Suppose that, shortly after a rollover, the current value of count-1 is 6 and the previous value is 4294967294. Computing (current value of count-1) - (previous value of count-1) gives 6 - 4294967294 = -4294967288. If the subtraction and comparison are evaluated as signed arithmetic, this value is less than 5, so process B is not executed. Yet going from 4294967294 to 6 means the counter actually advanced by 8, so process B should have been executed. This is the kind of bug that hides in rollover processing.
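
A minimal sketch of the naive check described above, assuming the values are widened to signed 64-bit integers before subtracting. Since count-1 is not a valid C identifier, its current and previous values become the parameters of a hypothetical function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive "has count-1 advanced by 5 or more?" check. Both values are
   widened to signed 64-bit before subtracting, so a rollover between
   the two samples produces a large negative difference. */
bool should_run_process_b_naive(uint32_t current, uint32_t previous)
{
    int64_t diff = (int64_t)current - (int64_t)previous;
    return diff >= 5;
}
```

It behaves correctly for (1009, 1003) and (1005, 1003), but for (6, 4294967294) the difference is -4294967288, so it returns false even though the counter advanced by 8.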

One coding countermeasure for subtracting 32-bit unsigned timer or counter values is to copy them into 64-bit signed integers (long long or int64_t in C; plain long is only 32 bits on many embedded targets) before calculating, and to add 2^32 (4294967296) to the difference when it comes out negative, since a negative difference means a rollover occurred in between. There are various other countermeasures as well; whichever is used, implementing rollover handling like this prevents the problem.
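
Two countermeasures, sketched in C. The first follows the 64-bit signed approach above, with the correction for a negative result. The second is an alternative not described in the text: it keeps the subtraction in uint32_t, which C defines to wrap modulo 2^32, so the difference is correct across a rollover. Function names are hypothetical, and both assume the counter advanced by less than 2^32 between the two samples.

```c
#include <stdbool.h>
#include <stdint.h>

/* Countermeasure from the text: subtract in signed 64 bits, then add
   2^32 if the result is negative (a rollover happened in between). */
uint32_t elapsed_via_int64(uint32_t current, uint32_t previous)
{
    int64_t diff = (int64_t)current - (int64_t)previous;
    if (diff < 0) {
        diff += (int64_t)1 << 32;   /* 4294967296 */
    }
    return (uint32_t)diff;
}

/* Alternative: subtract in uint32_t; the wrap modulo 2^32 yields the
   true advance even across a rollover. */
uint32_t elapsed_via_uint32(uint32_t current, uint32_t previous)
{
    return (uint32_t)(current - previous);
}

/* Rollover-safe version of the process B check. */
bool should_run_process_b(uint32_t current, uint32_t previous)
{
    return elapsed_via_uint32(current, previous) >= 5u;
}
```

With these, (6, 4294967294) yields an elapsed count of 8, so process B runs as intended.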

Why Rollover Bugs Become Time Bombs

Processing that makes judgments based on timers and counters breaks when they roll over, unless countermeasures are in place. Why does this make it a time bomb bug? It has to do with the initial values of timers and counters. An increment counter is usually initialized to its minimum value, 0, so it must be incremented 4294967296 times before the first rollover occurs. That rarely happens during in-house testing, which makes latent bugs in rollover processing hard to find in pre-release testing.

Once it is on the market, however, embedded software often runs 24 hours a day, 365 days a year, so a counter that started at 0 at power-on eventually reaches the rollover point after a long period of operation. As a result, software functions fail after the device has been running in the field for a long time. The most troublesome part is that a rollover bug affects every device running that software, not just a few units. This is what makes time bomb bugs so frightening.

How do you test for rollover bugs, and what makes it difficult?

Testing for rollover bugs is actually easy. All you have to do is set the timer and counter values to just before rollover and run the software. You can use a debugger to change a timer or counter to a value just before rollover, or modify the software slightly so that its initial value is just before rollover. In short, a rollover test creates a state in which the timers and counters roll over and confirms that the software still operates correctly in that state.
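
A host-side sketch of such a test, as an alternative to using a debugger: the counter is seeded a few ticks before UINT32_MAX and stepped across the wrap, and the test counts how often process B fires. The unit under test uses the unsigned-subtraction check; process_a_step and rollover_test are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical unit under test: returns true when count has advanced
   by 5 or more since *last, then makes count the new reference. */
static bool process_a_step(uint32_t count, uint32_t *last)
{
    if ((uint32_t)(count - *last) >= 5u) {
        *last = count;
        return true;   /* process B would run here */
    }
    return false;
}

/* Rollover test: start the counter at `start`, advance it `ticks`
   times (wrapping from UINT32_MAX to 0 if start is close enough), and
   return how many times process B fired. */
unsigned rollover_test(uint32_t start, unsigned ticks)
{
    uint32_t count = start;
    uint32_t last = start;
    unsigned fired = 0;
    for (unsigned i = 0; i < ticks; i++) {
        count++;
        if (process_a_step(count, &last)) {
            fired++;
        }
    }
    return fired;
}
```

A run seeded at UINT32_MAX - 2 should fire as often as one seeded at 1000 (4 times in 20 ticks). If process_a_step used the naive signed 64-bit check instead, the run seeded near UINT32_MAX would fire 0 times, which is exactly the discrepancy a rollover test is meant to expose.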

The test itself is not difficult, but timers and counters are used in many places in the software. The hard part of rollover testing is keeping it complete: identifying every timer and counter and running a rollover test on each one.
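
One possible way to keep the test complete is to list every timer and counter found in the code in a single table and seed them all from it. This is a sketch under that assumption; the counter variables and the table are hypothetical stand-ins for a project's own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters identified while reviewing the code. */
uint32_t uptime_ticks;
uint32_t retry_counter;
uint32_t watchdog_timer;

/* One row per counter under test; adding a counter to the code means
   adding a row here, so none is silently left out of the test. */
struct counter_entry {
    const char *name;
    uint32_t   *value;
};

static struct counter_entry counters[] = {
    { "uptime_ticks",   &uptime_ticks   },
    { "retry_counter",  &retry_counter  },
    { "watchdog_timer", &watchdog_timer },
};

#define COUNTER_COUNT (sizeof counters / sizeof counters[0])

/* Set every listed up-counter to UINT32_MAX - margin, so each one wraps
   to 0 after margin + 1 increments. (A count-down timer would instead
   be seeded with a small value so that it passes 0.) */
void seed_all_counters_near_rollover(uint32_t margin)
{
    for (size_t i = 0; i < COUNTER_COUNT; i++) {
        *counters[i].value = UINT32_MAX - margin;
    }
}
```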

In addition, much recent software guards against latent rollover bugs by initializing timers and counters to values just before rollover, so that any rollover bug surfaces soon after startup rather than after long operation in the field.
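
A sketch of that initialization for a tick counter, assuming a hypothetical 1 ms tick (TICK_HZ = 1000) and a hypothetical tick_isr called from the periodic timer interrupt. The counter starts 300 seconds' worth of ticks before the wrap, so the first rollover happens about 5 minutes after boot; the Linux kernel initializes its jiffies counter in a similar way (INITIAL_JIFFIES).

```c
#include <stdint.h>

/* Hypothetical tick rate for this sketch: 1000 ticks per second. */
#define TICK_HZ 1000u

/* 300 seconds of ticks before the wrap: 2^32 - 300000 = 4294667296. */
#define TICK_INITIAL ((uint32_t)(0u - 300u * TICK_HZ))

/* volatile because an interrupt handler updates it in a real system. */
volatile uint32_t tick_count = TICK_INITIAL;

/* Called on every timer interrupt in a real system. */
void tick_isr(void)
{
    tick_count++;
}
```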

The famous 497-day and 49-day rollover bugs

Many people have probably heard of the 497-day problem and the 49-day problem. Both are rollover bugs hidden in the timer used by the core of the OS, called the kernel. The kernel records the elapsed time since startup, and for that purpose it uses a counter variable declared as a 32-bit unsigned type. This counter is incremented on every timer interrupt from the device hardware, and on many devices the hardware timer interrupt is designed with a period of 10 ms or 1 ms.

With a 10 ms timer interrupt, the time it takes for the counter variable to roll over is 10 ms × 4294967296 = 42949672960 ms ≈ 715827 minutes ≈ 11930 hours ≈ 497 days. In other words, the kernel's counter variable rolls over 497 days after boot, and if a latent rollover bug exists, the software malfunctions at that point: this is the 497-day problem. With a 1 ms timer interrupt, the rollover occurs 10 times sooner, about 49 days after startup, which is why it is called the 49-day problem.
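
The same arithmetic as a small helper (the function name is hypothetical): a 32-bit tick counter starting at 0 wraps after 2^32 ticks, so the time to rollover is the tick period multiplied by 2^32.

```c
/* Days until a 32-bit tick counter starting at 0 rolls over, for a
   tick period in milliseconds: period * 2^32 ticks, converted to days. */
double rollover_days(double tick_period_ms)
{
    const double ticks_until_wrap = 4294967296.0;   /* 2^32 */
    const double ms_per_day = 24.0 * 60.0 * 60.0 * 1000.0;
    return tick_period_ms * ticks_until_wrap / ms_per_day;
}
```

rollover_days(10.0) is about 497.1 and rollover_days(1.0) is about 49.7, matching the two well-known problems.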

After time bomb bugs: simultaneous operation tests of multiple functions

We have introduced in-house tests that find dynamic resource leaks and timer/counter rollovers, the time bomb bugs that hinder the stable operation of embedded software. There are various other bugs that hinder stable operation, and various tests to find them. Next, we will introduce simultaneous operation tests of multiple functions.