Test types: RAS functional testing

24/09/2020 Test quality

Check the operation and quality of important RAS functions with dedicated tests

There are many types of tests, but two of them cover features that have become more important in recent years: RAS functional tests and security tests. Neither concerns functions directly related to the services the software provides to end users. Both concern behind-the-scenes functions that correspond to non-functional requirements, which allow the software to provide stable service at all times. For that reason, these functions may not be clearly written in the software's requirements specification, so care must be taken when designing the tests. In embedded systems in particular, some parts of test design require ingenuity, so this article introduces the actual test contents and the points to note when designing the tests.

What are RAS functions in the first place?

RAS is an abbreviation that stands for Reliability, Availability, and Serviceability. Reliability is the property that system outages and malfunctions due to failures and the like are unlikely to occur. In short, it is the ability to always operate stably. Availability is similar to reliability, but it refers to a high system operating rate and short outages due to failures or maintenance. Simply put, it is the ability to be used at any time. Serviceability (also called maintainability) is the property that indicates how easily and quickly the system can recover from a failure or similar event. In other words, it is the ability to be fixed immediately even if a failure occurs. RAS functional tests confirm the operation of these RAS functions.

Reliability of embedded systems

From what viewpoint should reliability be tested in an embedded system? Reliability is the property that system stoppages and malfunctions due to failures are unlikely to occur, so it can be checked by causing various failures and confirming that the system continues to operate without stopping or malfunctioning. A similar test is the failure recovery test.

A failure recovery test confirms that the failure recovery functions built into the system in advance operate normally, so it can be regarded as the quasi-normal case of reliability testing. The abnormal case of reliability testing, then, is a test to confirm that the system does not stop or malfunction when an unexpected failure or error occurs.

A reliability test may be designed independently, but its content is often almost the same as that of a quasi-normal or abnormal case test. For example, consider a reliability test for a device hardware failure. Suppose the hardware has deteriorated over time and sometimes returns an error response to a command issued by the software. Assume the software is designed to retry up to three times if the hardware returns an error.

In this case, the reliability test is designed with two patterns. In the first, the hardware failure occurs once, the software recovers by retrying, and the test confirms that the function operates normally. In the second, the hardware failure occurs three times in a row, the software judges that recovery by retry has failed, and the test confirms that error handling is performed.
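The two patterns above could be sketched roughly as follows. This is only an illustration, not code from a real project: `FlakyDevice`, `send_with_retry`, and the `"READ"` command are hypothetical names, and the sketch assumes that "up to three times" means three attempts in total, so that three consecutive errors count as a failed recovery.

```python
class FlakyDevice:
    """Hypothetical stand-in for hardware that returns errors a set number of times."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0  # number of commands the software actually issued

    def execute(self, command):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            return "ERROR"
        return "OK"


MAX_ATTEMPTS = 3  # assumption: three consecutive errors mean recovery has failed


def send_with_retry(device, command, max_attempts=MAX_ATTEMPTS):
    """Return True if the command succeeds within max_attempts, otherwise False."""
    for _ in range(max_attempts):
        if device.execute(command) == "OK":
            return True
    return False  # the caller is expected to perform error handling


def test_recovers_after_single_failure():
    # Pattern 1: one failure, then the retry succeeds.
    device = FlakyDevice(failures_before_success=1)
    assert send_with_retry(device, "READ") is True
    assert device.calls == 2


def test_gives_up_after_three_failures():
    # Pattern 2: three failures in a row, so recovery is judged to have failed.
    device = FlakyDevice(failures_before_success=3)
    assert send_with_retry(device, "READ") is False
    assert device.calls == 3


test_recovers_after_single_failure()
test_gives_up_after_three_failures()
```

Counting the calls made to the fake device checks not only the final result but also that the software neither gives up too early nor retries more often than designed.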

Incidentally, both of these tests confirm how the software behaves when the hardware fails and returns an error, so from the viewpoint of the hardware control function they can also be seen as quasi-normal case tests that confirm the handling of a predefined error.

As this example shows, a reliability test can be regarded as part of the quasi-normal and abnormal case tests performed in ordinary functional testing, rather than as something to be considered on its own. Even when the test content and the pass/fail criteria are the same, the test can be viewed in two ways: as a check of the quasi-normal case of a particular function, or as a reliability test. Care must be taken to avoid unnecessary test duplication.

Availability of embedded systems

Thinking about availability in embedded systems is a bit tricky. Availability is a property that indicates the proportion of time the system can be used, and it is sometimes expressed as the operating rate of the system. For example, suppose a system whose specification says it cannot be used on one day out of the 365 days of the year, December 31st, because maintenance work is required. The availability of this system, expressed as an operating rate, is (365 days - 1 day) / 365 days, or about 99.7%. What would an availability test of this system be? It is a test to confirm that the system can be used at all other times, that is, that the stoppage for maintenance fits within one day as specified. Therefore, if the maintenance work is actually carried out and completes within one day, the test passes. If, for example, the amount of data to be backed up during maintenance is so large that the work cannot be completed within one day, the availability test fails.
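The arithmetic and the pass/fail judgment in this example can be written down as a small sketch. The function names and the sample maintenance durations (20 and 30 hours) are assumptions made for illustration; only the 365-day year and the one-day maintenance window come from the example above.

```python
def availability(total_days, downtime_days):
    """Operating rate: the share of the period in which the system is usable."""
    return (total_days - downtime_days) / total_days


def maintenance_meets_spec(measured_hours, allowed_hours=24.0):
    """Pass if the measured maintenance stoppage fits in the specified window."""
    return measured_hours <= allowed_hours


rate = availability(total_days=365, downtime_days=1)
print(f"{rate:.1%}")  # prints 99.7%

# Hypothetical measurements: 20 hours fits in one day, 30 hours does not.
print(maintenance_meets_spec(20.0))  # True  -> availability test passes
print(maintenance_meets_spec(30.0))  # False -> availability test fails
```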

In this way, if the system has a clear availability specification, availability can be tested against it. However, embedded systems are often assumed to operate continuously, 24 hours a day, 365 days a year. A test that tries to confirm that the system keeps working 24 hours a day, 365 days a year can never reach a pass or fail verdict. For this reason, it is difficult to design an availability test in the true sense for an embedded system that is supposed to operate 24 hours a day, 365 days a year.

In such a case, you can approach availability testing from a slightly different angle and check how long it takes to recover when the system becomes unusable. For example, suppose that when the system hangs, the watchdog function resets it and a recovery process restarts it. If one system takes 1 minute from reset to restart and another takes 60 minutes, the system that restarts in 1 minute is unusable for less time, so its availability is considered higher. With this in mind, if the system recovery time after a watchdog reset is stated in the specification, the test that checks whether the reboot finishes within that time is an availability test. Even if the recovery time is not stated in the specification, availability can be evaluated by comparing whether the recovery time is longer or shorter than that of similar devices.
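A recovery-time check of this kind might look like the sketch below. Everything here is illustrative: `SimulatedTarget` merely imitates a device that becomes ready again a fixed time after a hang, and the time limits are shortened to fractions of a second so the example runs quickly. On a real device, the `hang` and `is_ready` callbacks would be replaced by whatever mechanism the test environment offers for forcing a hang and detecting that the restart has finished.

```python
import time


class SimulatedTarget:
    """Hypothetical device that is ready again reboot_seconds after a hang."""

    def __init__(self, reboot_seconds):
        self.reboot_seconds = reboot_seconds
        self.ready_at = 0.0

    def hang(self):
        # Imitates a hang followed by a watchdog reset and restart.
        self.ready_at = time.monotonic() + self.reboot_seconds

    def is_ready(self):
        return time.monotonic() >= self.ready_at


def measure_recovery_seconds(trigger_hang, is_ready, timeout, poll_interval=0.01):
    """Return the seconds from hang to ready, or None if the timeout expires."""
    trigger_hang()
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if is_ready():
            return time.monotonic() - start
        time.sleep(poll_interval)
    return None


def recovery_within_spec(target, limit_seconds):
    """Availability check: the restart must finish within the specified time."""
    elapsed = measure_recovery_seconds(
        target.hang, target.is_ready, timeout=limit_seconds * 2
    )
    return elapsed is not None and elapsed <= limit_seconds


fast = SimulatedTarget(reboot_seconds=0.05)
slow = SimulatedTarget(reboot_seconds=0.3)
print(recovery_within_spec(fast, limit_seconds=0.5))  # True
print(recovery_within_spec(slow, limit_seconds=0.1))  # False
```

When no limit is specified, the same measurement function can be used to collect recovery times from several devices and compare them.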

In this way, when considering availability testing for embedded systems, the test items become apparent once you focus on the time during which the system is unavailable.