Evaluating automated white-box software testing tools
In our previous post, ‘Performance and Functionality Evaluation of White-box Software Testing Tools, Part 1’, we emphasized the importance of software safety through a number of software incidents, defined white-box software testing for software verification, and introduced the available tools. Software testing methods for discovering errors or bugs that may arise during software development are categorized into black-box testing, white-box testing, and gray-box testing, which combines the advantages of the two. In this post, we conduct source-code-level white-box testing, a software testing method based on the internal structure and operating principles of a program. White-box testing tools are used in a wide variety of industries, such as testing software for automobiles, aircraft, and ships. In particular, with the recent increase in electronic equipment in vehicles, vehicle manufacturers are focusing on securing both software and hardware quality. Reflecting this trend, the International Organization for Standardization (ISO) included standards for testing software components in ISO 26262, the international standard for the functional safety of electronic devices in vehicles, established back in 2012.
Standardized testing of software components is becoming increasingly important across all industries. Given this trend, what is an effective way to increase software quality, stability, and reliability? One answer is to conduct white-box software testing to verify that software is free of defects, or to identify and fix any defects that do exist. To do so, we run the software against a well-defined suite of test cases. A test case is a set of executable code snippets or instructions created to check that the software performs as expected; in other words, it is a criterion for checking whether the correct output is produced for each input value derived from the requirements specification.
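To make this concrete, here is a minimal sketch of a hand-written test case for a small C function. The function clamp() and the chosen values are hypothetical and are not taken from any tool or project discussed below; the point is simply that a test case pairs a specific input with the output expected from the requirements.

```c
#include <assert.h>

/* Hypothetical unit under test: clamps a value into the range [lo, hi]. */
static int clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* A single test case: a fixed input paired with the expected output. */
static void test_clamp_returns_upper_bound_for_large_values(void)
{
    int result = clamp(15, 0, 10);  /* input value taken from the specification */
    assert(result == 10);           /* expected output for that input           */
}

int main(void)
{
    test_clamp_returns_upper_bound_for_large_values();
    return 0;   /* reaching this point means the test case passed */
}
```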
Traditionally, the user has to write test cases manually based on the test requirements. However, as the size and complexity of software systems grow, this manual approach quickly becomes unscalable. Driven by the need to analyze and design modern software systems more efficiently, active research has focused on automating test case generation, and these efforts have produced a number of automated white-box software testing tools.
So far, we have introduced the background and the need for automated white-box software testing and the corresponding tools. However, a software developer who wishes to purchase and use such a tool has practical considerations in mind. For instance, is a tool that generates as many test cases as possible more suitable? Or are there situations where a tool focused on generating a minimal set of test cases is preferable? There is no definitive answer. Because each tool uses different standards and methodologies to generate test cases, users (customers) tend to choose a testing tool based only on informal measures, such as familiarity or the vendor's market share. In this post, we therefore conduct a systematic qualitative and quantitative evaluation of the performance and functionality of automated white-box testing tools, in an attempt to provide measurable criteria for selection.
List of automated white-box software testing tools
As mentioned above, in the traditional white-box testing method, users (humans) manually write test cases by specifying the appropriate preconditions, execution steps, and input and output values, and then check that the software behaves accordingly. To ease this process, an automated white-box testing tool builds the project (source code) with its built-in compiler and explores the code to generate test cases without user intervention.
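As a rough illustration of what such a tool produces (a sketch only: the function under test, the stub, and the generated input values are hypothetical and do not reflect any particular vendor's output format), an automatically generated test typically consists of a driver that calls the function under test with tool-chosen inputs, plus stubs substituted for dependencies that have no definition in the project:

```c
#include <stdio.h>

/* Function under test (imagined to be part of the analyzed project); it
 * depends on read_adc(), an external call with no definition in the project. */
int read_adc(int channel);   /* hypothetical undefined library dependency */

int sensor_status(int channel)
{
    return (read_adc(channel) > 512) ? 1 : 0;   /* 1 = active, 0 = off */
}

/* --- Generated stub: a controllable stand-in for the missing dependency,
 *     so the code can be built and executed without the real library. --- */
static int stub_read_adc_return_value = 0;

int read_adc(int channel)
{
    (void)channel;
    return stub_read_adc_return_value;
}

/* --- Generated test driver: input values chosen to reach both branches. --- */
int main(void)
{
    stub_read_adc_return_value = 0;      /* drives the "sensor off" branch    */
    printf("TC1: sensor_status(1) = %d\n", sensor_status(1));

    stub_read_adc_return_value = 1023;   /* drives the "sensor active" branch */
    printf("TC2: sensor_status(1) = %d\n", sensor_status(1));
    return 0;
}
```

The difference from the hand-written example above is that the stub and the input values are chosen by the tool rather than by the user, typically with the goal of exercising as many lines and branches as possible.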
The following is a summary of the key features of a number of automated tools that can generate test cases for projects (source code) written in C/C++. All listed information is taken from the product catalogue of the corresponding tool.
Product Name | Key Features |
Controller Tester | – Developed by SureSoft Technologies, a Korean company – An automated testing solution for unit and integration testing – Automatically generates tests and test data, and supports both simulation and real-target environment testing – Automatically generates stubs for functions without definitions (such as library functions) – Provides control-flow graphs and checks coverage results in conjunction with the source code – Can automatically insert fault-injection code without modifying the source code |
Coyote | – Developed by Code Mind Corporation, a Korean company – A fully automated white-box testing tool that combines symbolic testing and machine learning technology – Its symbolic testing technology keeps testing efficient by avoiding, as much as possible, re-exploring already covered branches or paths – Fully automates unit testing, from one-click creation through execution to results analysis, without user intervention – Achieves over 80~90% code coverage with full automation |
Resort | – Developed by Soft4Soft, a Korean company – Static analysis enables source code analysis without compiling the code – Offers top-of-the-line code verification, including inter-procedural path analysis, coding standards, runtime errors, security vulnerabilities, and code quality checks – Automates path testing based on code requirements using dynamic analysis – Automatically extracts execution paths as test cases and automatically generates test data (test domains, input values) for each execution path |
Cantata++ | – Developed by QA Systems, a German company; Hancom Nflux Co., Ltd. is the domestic (Korean) reseller – An automated unit and integration testing tool for C/C++ that combines unit testing and scalable integration testing in a single integrated testing framework – Supports functional testing and integrated analysis of code inspection results – Reports can be generated in RTF (text), XML, and HTML formats, along with ASCII and HTML text – Built on the Eclipse IDE, providing a familiar UI for Eclipse users – Presents code coverage results visually to the user – Automatically creates a unit test suite that exercises all required code paths |
VectorCAST/C++ | – Developed by Vector, a German company with a branch in Korea – Specialized in verifying embedded systems where software safety is a priority – Automatically configures the test environment for unit and integration tests, providing a very convenient environment in which users do not need to write test code – Supports testing via GUI and scripts, code coverage, regression testing, and code complexity calculation – Has a built-in compiler, so there is no need to set up a separate compilation environment – Supports user-defined tests for requirements-based testing |
ParaSoft C/C++ | – Developed by ParaSoft, an American company, and distributed in Korea by several companies, including VWAY – Automatically generates verification reports by automating software testing – Integrates into C/C++ IDEs, CI/CD pipelines, and containerized deployments to detect defects early and automatically enforce compliance with industry standards – Builds an automated, scalable CI pipeline using Docker Hub images for seamless access to the latest automated code analysis – A VS Code plug-in for GitLab enables reviewing results within the IDE |
LDRA | – Developed by LDRA, a British company, and distributed in Korea by Moasoft – Has led the software analysis and automated testing tools market for over 45 years – One of the leading software testing companies – Widely used for software testing in aerospace and defense, automotive, industrial and energy, healthcare, and rail transport, both in Korea and abroad – Provides test case management and automated execution for unit and integration testing – Provides requirements traceability and can automatically generate test cases that achieve 50-80% code coverage |
Figure 1. International Standard ISO 25010 Evaluation Criteria (Source: splex)
What features and elements are essential for automated white-box testing tools?
On what criteria should automated white-box testing tools be evaluated? To establish qualitative and quantitative grounds for the evaluation, we referred to two standards: (i) the “Commercial software evaluation criteria and score assignment” section of the “Guidelines on Technical Assessment of Software” (Article 49, Paragraph 2 of the Software Promotion Act), and (ii) ISO 25010, the international standard for software quality evaluation. Based on these standards, we defined 34 evaluation criteria, grouping similar criteria into a three-level hierarchy of evaluation property, evaluation item, and individual criterion. In line with the purpose of referencing the ISO standard, criteria that were overly broad, ambiguous, or unnecessary were reduced or revised to fit the purpose of this study.
For instance, in Figure 1, which presents the evaluation criteria of ISO 25010, the property “Functional Suitability” was renamed “Functionality” in our criteria (Figure 2), and likewise “Functional Maturity”, “Functional Accuracy”, and “Functional Validity” were merged into “Accuracy”. Since there was no practical way to evaluate the full “Security” property in our study, its “Confidentiality” and “Non-Repudiation” criteria were moved under the “Functionality” property so that a minimal level of security could still be evaluated. The properties “Usability” and “Reliability” are kept as-is, but “Maturity” and “Fault Tolerance” under “Reliability” are excluded from the evaluation criteria because they are less significant here. Additionally, “Performance Efficiency”, “Portability”, and “Compatibility” were folded into “Usability” and “Reliability”, and “Stability” was renamed “Operational Stability” and placed under “Reliability”. The overall evaluation items and criteria are shown in Figure 2.
Selection of open source projects for evaluation
White-box testing tools are widely used in fields that require highly reliable and safe software, such as aerospace, defense, automotive, and other safety-critical industries. Most software in these fields is written in C/C++, and as a result, vendors of automated white-box testing tools mostly target C/C++ projects. For our study, we likewise selected C/C++ open-source software commonly used in performance and functionality evaluations, including libraries for system UI, encryption, signal processing, communication protocols, and more. Specifically, we referred to the paper “CITRUS: Automated Unit Testing Tool for Real-world C++ Programs”. In addition, we kept only open-source projects that can be built natively without the aid of any external software.
Figure 2. Evaluation Criteria
No. | Project Name | Release Date | Language | Number of Files | LOC** | Size |
1 | nuklear | 2019.12.02 | C | 192 | 131828 | 10 MB |
2 | libsodium | 2019.5.31 | C | 690 | 51132 | 44.8 MB |
3 | mathc | 2019.5.31 | C | 4 | 5886 | 223 KB |
4 | aubio | 2022.1.26 | C | 347 | 17381 | 1.6 MB |
5 | s2n-tls | 2022.10.25 | C | 7817 | 92793 | 17.9 MB |
6 | qnite | 2022.04.14 | C++ | 138 | 2372 | 168 KB |
7 | QPULib | 2020.12.09 | C++ | 82 | 5611 | 978 KB |
8 | yaml-cpp | 2021.07.10 | C++ | 399 | 54922 | 4.9 MB |
9 | jsoncpp | 2021.08.12 | C++ | 250 | 8271 | 828 KB |
10 | json-voorhees | 2021.07.12 | C++ | 227 | 8421 | 3 MB |
** LOC : Lines of Code
Table 1. Projects for Evaluation
Selection of automated white-box testing tools and test results on open-source projects
For our study, we selected four of these seven automated white-box software testing tools based on market share, available product information, and technical support, and refer to them as A, B, C, and D. The selected tools were used to analyze the open-source projects listed in Table 1. We ran the tests after setting up each test environment ourselves using the manual provided with the product, but none of the selected tools could proceed entirely without user intervention. In particular, with tools A, B, and D, initially only the ‘mathc’ project (number 3 in Table 1) could be tested normally, and with tool C it was impossible to test not only ‘mathc’ but all ten selected projects. Please refer to “Evaluation of Whitebox Automated Testing Tools” for the specific reasons why testing was impossible.
Except for the ‘mathc’ project, the remaining nine open-source projects produced unknown errors, including compilation failures, and software testing could not be completed. Considering that the failures might have been caused by our inexperience with the tools, we requested technical support from each vendor. In the case of tool ‘B’, a simple configuration file was provided to support the white-box testing, and this was enough to complete automated testing of all ten projects within a few days. For the other three tools (A, C, D), it took close to two months to fully resolve the issues, even with technical help in configuring the test environment and compilers and in handling a series of related processes. We had to invest considerable time and effort, and the procedure for getting technical help was so complicated that separate phone calls or remote meetings were required for each project. Multiple interventions by human experts were necessary, to the point where we doubted whether the process could still be called fully automated. Because of these difficulties, our understanding of ‘automated software testing’ has changed since we started this study. The results are shown in Table 2. Of the four selected tools, tool ‘B’ satisfied more than 90% of our criteria for ‘automated testing’, which demonstrates the excellence of its automation capabilities.
No. | Project Name | Release Date | Language | Files | LOC | Analyzed Files | Detected Functions | Line Coverage | Branch Coverage | Number of Test Cases | Build Time | Test Time |
1 | nuklear | 2019.12.02 | C | 192 | 131828 | 67 | 609 | 87.21% | 79.88% | 5294 | 00:03:15 | 02:04:02 |
2 | libsodium | 2019.5.31 | C | 690 | 51132 | 195 | 887 | 94.94% | 84.93% | 3223 | 00:04:48 | 00:09:15 |
3 | mathc | 2019.5.31 | C | 4 | 5886 | 1 | 843 | 99.43% | 100% | 1479 | 00:00:24 | 00:10:21 |
4 | aubio | 2022.1.26 | C | 347 | 17381 | 139 | 520 | 93.98% | 89.73% | 4906 | 00:02:45 | 02:29:15 |
5 | s2n-tls | 2022.10.25 | C | 7817 | 92793 | 688 | 1621 | 86.44% | 80.53% | 19464 | 00:19:39 | 09:12:36 |
6 | qnite | 2022.04.14 | C++ | 138 | 2372 | 95 | 645 | 95.64% | 89.0% | 3471 | 00:16:42 | 02:38:28 |
7 | QPULib | 2020.12.09 | C++ | 82 | 5611 | 40 | 278 | 86.66% | 81.97% | 3801 | 00:01:25 | 00:34:57 |
8 | yaml-cpp | 2021.07.10 | C++ | 399 | 54922 | 155 | 367 | 95.52% | 93.85% | 3985 | 00:05:11 | 02:15:39 |
9 | jsoncpp | 2021.08.12 | C++ | 250 | 8271 | 14 | 309 | 91.21% | 87.2% | 5645 | 00:00:49 | 02:50:49 |
10 | json-voorhees | 2021.07.12 | C++ | 227 | 8421 | 61 | 451 | 90.39% | 84.56% | 3246 | 00:03:40 | 00:40:14 |
Table 2. Test Results on the Selected Open-Source Projects
Except for tool ‘C’, all tools successfully completed testing on the ‘mathc’ project. We therefore decided to evaluate the performance and functionality of the automated tools using ‘mathc’. Although we acknowledge that evaluating on ‘mathc’ alone is insufficient, we believe the results can still provide an initial basis for comparing different automated testing tools.
Evaluating automated white-box testing tools
For each qualitative criterion, we assign one of three scores depending on whether the functionality is supported: ‘O: supported, provision given / △: supported, but insufficient or lacking information / X: not supported, no provision given’. For each quantitative criterion, if the value is not available or is not disclosed by the software, we simply state ‘Unsupported’ rather than ‘X’ (which belongs to the qualitative scores) to prevent misunderstanding.
Property 1. Functionality
Functionality is the property covering the most essential (functional) aspects of an automated white-box testing tool: we check information on the analyzed (detected) code, the number of generated test cases, code coverage, and more. As described above, these evaluation items should ideally be evaluated on all ten open-source projects, but for the reasons detailed earlier, some of the most essential items were evaluated and compared only on the ‘mathc’ project. In the case of tool ‘C’, even the test on ‘mathc’ failed; since manually writing test code (test cases) takes considerable time and defeats our purpose of assessing automated white-box testing, the corresponding items for tool ‘C’ are excluded.
- Build time: the time to build the ‘mathc’ project, whose source files total 159 KB and 5,586 lines of code. Tool ‘B’ achieved the fastest build time of 24 seconds, while tool ‘A’ took 2 minutes and 27 seconds, a difference of roughly six times or more.
- Test completion time: the time taken by the white-box testing tool to completely finish testing. Tool ‘D’ finished fastest at 2 minutes and 50 seconds, and tool ‘A’ took the longest at 25 minutes. However, test time depends heavily on the number of generated test cases (indicator 13), and generating many test cases is not necessarily a good thing; see indicator 13 for details.
- Pre-build files: whether the tool checks in advance, through a pre-build, that the test will proceed normally. Tool ‘B’ reported the source to be built as ‘1 file’, whereas tool ‘D’ reported ‘2 files’ because it counts header files as well by default. The other two tools could not be evaluated because they could not be tested or did not show the relevant information.
- Number of detected files: the number of source files analyzed (detected) after the test completes. Only tool ‘B’ reported this, as ‘1 file’ (excluding the header file, as in the previous criterion); the remaining three tools were rated ‘Unsupported’ because they could not be tested or did not show the relevant information.
- Number of detected functions: the number of functions analyzed (detected) after the test completes. For tools ‘B’ and ‘D’, the detected count matched the actual number of functions (843) exactly, while tools ‘A’ and ‘C’ either did not report it or could not be tested.
- Number of actually built files: the build phase for the actual test; errors in the optimization and pre-build phases can make this number differ from the pre-build file count. Tool ‘B’ reported the source actually built as ‘1 file’ (excluding header files), while tool ‘D’ reported ‘2 files’ because it includes header files. The other two tools could not be tested or did not show the relevant information.
- Number of detected code lines: the number of lines of code the tool actually executes; the tool counts the optimized, testable lines, excluding non-executable lines and lines containing only braces (‘{ }’). Tool ‘A’ detected 2,877 lines, while tool ‘B’ analyzed 4,192 lines, roughly 1,300 more than ‘A’, which is the best result. Tools ‘C’ and ‘D’ did not report this value or could not be tested.
- Number of detected branches: the number of branches (conditional statements) detected in the code by the tool. As a reference, we counted 138 branches considering only if/else statements; depending on its detection criteria, a tool may also count for, call, return, and switch statements as branches. Tool ‘A’ detected 1,045 branches and tool ‘D’ 978, while tool ‘B’ detected 190 branches, the closest to the reference value of 138. Because this value depends on each tool’s counting rules, there is no single correct answer. (Line and branch counting, and the coverage figures derived from them, are illustrated in the sketch following this list.)
- Line coverage value: the percentage of the lines from indicator 7 that were executed at least once during the test. For example, tool ‘A’ detected 2,877 lines and reported 98% line coverage, meaning that 2,819 of the 2,877 lines were executed at least once. Tools ‘A’ and ‘D’ reported the same percentage, but it was impossible to tell how many lines tool ‘D’ had detected, which makes its result ambiguous. Tool ‘B’ covered 99.43% of its 4,192 detected lines, an outstanding result.
- Branch coverage value: the percentage of the branches from indicator 8 that were exercised during the test. For example, tool ‘B’ detected 190 branches and covered all of them, i.e., 100%. Tools ‘B’ and ‘D’ have the same branch coverage, while tool ‘A’ scored lowest at 95% (tool ‘C’ could not be evaluated).
- Support for per-file line coverage: whether the tool can report line coverage per code file in its results (report) rather than only overall line coverage. All tools except ‘C’ support per-file line coverage.
- Support for per-file branch coverage: whether the tool can report branch coverage per code file in its results (report) rather than only overall branch coverage. As with indicator 11, all tools except ‘C’ support it.
- Total number of generated test cases: the number of test cases created and executed by the tool. Tool ‘A’ produced the most at 2,461 test cases and tool ‘D’ the fewest at 978, with tool ‘B’ second at 1,479. Since each tool uses different algorithms and methods, the number of test cases by itself cannot be judged as good or bad.
– We note that the average code coverage obtained for the ‘mathc’ project was higher than for the other projects, as seen in Table 2. In particular, tools ‘A’ and ‘D’ reached high coverage; we suspect this is because ‘mathc’ consists of functions performing simple arithmetic with no dependencies between source files, so most lines of code are easy to analyze.
– Conversely, for projects with many dependencies between source files, testing often becomes impossible because the automated white-box testing tool struggles to configure the build environment.
– The most important capability of an automated white-box testing tool is achieving high coverage with few test cases; testing is efficient when the number of test cases is small and the coverage is high.
– A large number of test cases increases the likelihood of duplicate (redundant) test cases that do not contribute to coverage.
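The sketch below ties indicators 7 through 13 together on a small, hypothetical C function; the counting rules in the comments paraphrase the indicators above rather than any vendor's documentation. It illustrates why the 'detected line' and 'detected branch' totals differ between tools, how the coverage percentages are derived from those totals, and how a redundant test case adds nothing to coverage.

```c
#include <stdio.h>

/* Hypothetical function illustrating the counting rules described in
 * indicators 7 and 8 (not any specific vendor's definition).             */
int grade(int score, int attempts)
{                                        /* brace-only line: typically NOT counted  */
    int bonus = 0;                       /* executable line                         */

    for (int i = 0; i < attempts; i++)   /* counted as a branch only by tools that
                                            include 'for' statements                */
    {
        bonus++;                         /* executable line                         */
    }

    if (score + bonus >= 60)             /* if/else: counted as a branch (2 outcomes)
                                            by every tool                           */
    {
        return 1;                        /* executable line                         */
    }
    else
    {
        return 0;                        /* executable line                         */
    }
}                                        /* brace-only line: typically NOT counted  */

int main(void)
{
    /* Two inputs already reach both outcomes of the if/else; the loop body
     * (bonus++) stays uncovered, so line coverage is below 100%.            */
    printf("%d %d\n", grade(70, 0), grade(10, 0));

    /* A third, redundant case: it repeats the path of grade(70, 0) and adds
     * no new coverage, illustrating how a large number of test cases can
     * include duplicates that do not raise coverage.                        */
    printf("%d\n", grade(80, 0));

    /* Coverage, as used in indicators 9 and 10, is simply
     *   covered items / detected items * 100.
     * With the figures reported for tool 'A': 2,819 / 2,877 = about 98%
     * line coverage. Because the denominators (detected lines, detected
     * branches) differ from tool to tool, the percentages are not directly
     * comparable across tools.                                              */
    return 0;
}
```

Because each tool uses its own denominators, raw coverage percentages are only meaningful alongside the detected-line and detected-branch counts reported in the table below.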
Evaluation Items | Evaluation Criterion | A | B | C | D |
Accuracy | Build time for project (Code File size: 159 kb, Code line: 5586) | 02:27 | 24s | Unable to test | 52s |
Time for white box testing tools to complete tests during project analysis | 25:04 | 10:21 | Unable to test | 02:50 | |
How many pre-built files appeared on the report or completion screen when analyzing the project (1 Header, 1 Source Code)? ※ Basically, it is good to have the same number as the actual source code files, and depending on the analysis tool, header files may be excluded from the number of builds. | Unsupported | 1 | Unable to test | 2 | |
How many detected files (source code) appeared on the report or completion screen when analyzing a project (1 Header, 1 Source Code)? | Unsupported | 1 | Unable to test | Unsupported | |
How many functions were detected in this project with 843 functions? | Unsupported | 843 | Unable to test | 843 | |
How many actually built files appeared on the report or completion screen when analyzing the project (1 Header, 1 Source Code)? ※ Basically, it is good to have the same number of actual source code files, and depending on the analysis tool, header files may be excluded from the number of builds. | Unsupported | 1 | Unable to test | 2 | |
How many lines of code were detected in the report or completion screen when analyzing a project with 5586 Code Lines excluding spaces and comments? | 2877 | 4192 | Unable to test | Unsupported | |
Number of branches detected in a project with 138 branches, counting conditional statements (if/else) only and excluding for, call, return, and switch? ※ Depending on the analysis tool, the reported branch count may differ from the actual number because some tools also count all conditions such as for, call, return, switch, etc. | 1045 | 190 | Unable to test | 978 | |
What is the value of the tested line coverage? ※ Based on the results of evaluation indicator 7, the ‘%’ of detected lines executed during the test. Example) Tool ‘A’ detects 2877 lines and covers 2819 of them, i.e., 98% | 98% | 99.43% | Unable to test | 98% | |
What is the value of the tested branch coverage? ※ Based on the results of evaluation indicator 8, the ‘%’ of detected branches exercised during the test. Example) Tool ‘B’ detects 190 branches and covers all 190, i.e., 100% | 95% | 100% | Unable to test | 100% | |
Possible to support or verify line coverage per code (file)? | O | O | X | O | |
Possible to support or verify branch coverage per code (file)? | O | O | X | O | |
Total number of test cases generated in reports and results screens after test completion? | 2461 | 1479 | Unable to test | 978 | |
Security | Is there a function that is accessible only to authorized users in the test program or system? | X | O | X | O |
Is it possible to analyze the source code without external leakage? | O | O | △ | O |
Property 2. Usability
Usability is an overall assessment of user-centered learning and use. For criterion no. 17, the manuals for tools ‘B’ and ‘C’ are organized in the order of actual use, making the workflow easy to follow. The manuals provided with tools ‘A’ and ‘D’, however, were of poor quality, so users had difficulty setting up the test environment themselves. Criterion no. 18 concerns descriptions of the detailed features; for most tools, configuration was difficult because the detailed functions were not described. Tool ‘B’, in contrast, provided detailed explanations of all functions, including which elements are required for a test and how changing them affects performance; in particular, tooltips were provided for each feature, making detailed functionality easy to understand and use. For criterion no. 22, tool ‘B’ was very convenient compared to the others because it could run tests without installing an external compiler; the other three tools also have internal compilers, but these often did not work properly or required installing a separate compiler for each project. For criterion no. 25, tools ‘B’ and ‘D’ can select specific modules and required functions and generate the desired information as a report, but both were rated ‘△’ because the selectable areas are limited.
Evaluation Items | Evaluation Criterion | A | B | C | D |
User learning accessibility | In what languages are help and manuals available? | English | English, Korean | English, Korean | English |
Are the help and manuals produced in the same order as the order of use? | △ | O | O | △ | |
Do the help and manuals provide descriptions of the detailed features of the test tool? | △ | O | △ | △ | |
Input data understanding | How many formats and methods are supported for input data? | 2 formats (File & directory) | 2 formats (Local or remote directory) | 3 formats (Single & multi file, and project file) | 2 formats (File & directory) |
Understanding progress accessibility | Does it provide UI/UX so that you can easily understand the progress of tests? | O | O | X | O |
Suitability for installation environment | What operating systems can be supported? | Windows, Linux | Windows, Linux, Mac | Windows, Linux, Mac | Windows, Linux |
Can the tool be used without separately installing a compiler suited to the build environment? | △ | O | △ | △ | |
Uninstallation accessibility | Is the product installed and removed normally? | O | O | O | O |
Possibility of generating a report | In what formats can the report be generated? | html, text | html, csv, excel | html | xml, html, text |
Can the user select a report configuration when generating a report? | X | △ | X | △ |
Property 3. Reliability
The reliability indicators evaluate the white-box testing tool’s own stability and its resilience when a problem occurs. For criterion no. 26, only tool ‘A’ ran reliably. The remaining three tools either stopped during testing or terminated abnormally with an error message, and in some cases froze and had to be forcibly terminated, so they were rated ‘X’. For criterion no. 27, tool ‘A’ was rated ‘O’ because it operated stably and no failure occurred, and tool ‘B’ was rated ‘O’ because, after a failure, it skipped the parts already completed and resumed testing from the point of failure. The other two tools were rated ‘X’ because they had to start a new test rather than resume from where they stopped.
Evaluation Items | Evaluation Criterion | A | B | C | D |
Operational Stability | Were there any errors during the continuous testing? (3 times tested) | O | X | X | X |
Recoverability | If a failure occurs during the test, is the test conducted from the point of failure? | O | O | X | X |
Property 4. Maintainability
Maintainability assesses how well problems can be identified and resolved through technical support from the supplier or a maintenance company, and how well the target project can be managed from the user’s point of view. For criterion no. 28, the three tools rated ‘△’ did not provide information on how to respond when an error message appeared or an error of unknown cause occurred, or no related documentation existed. Tool ‘A’ provided solutions for some simple errors, for example suggesting a fix when a *stub was missing during a test. For criteria no. 32 and no. 33, tool ‘B’ can separate workspaces by organization or team by issuing separate accounts, while tools ‘A’ and ‘D’ can do so by installing separate extensions.
Evaluation Items | Evaluation Criterion | A | B | C | D |
Problem Diagnosis and Support | If an error occurs, does it provide an error code and solution? | O | △ | △ | △ |
Does it support Q&A or FAQ? | O | O | X | X | |
Can the problem be quickly resolved through a maintenance company in Korea? | O | O | O | O | |
Are there periodic product updates and feature additions? | O | O | X | O | |
Organization and project management | Is it possible to separate workspace by organization or team? | O | O | O | O |
Is it possible to set permissions and policies per organization or team? | O | X | X | X | |
Project backup and recovery accessibility | Is it possible to back up working project settings and restore them if needed? | O | O | O | O |
Conclusion
In this post, we systematically set up evaluation criteria for comparing the performance and functionality of automated white-box software testing tools. We qualitatively and quantitatively tested four automated white-box testing tools against the 34 evaluation criteria we developed, using the ten selected open-source projects. A summary of the evaluation results is shown in Table 3. For each evaluation property, the product with the best result is marked in blue.
Evaluation Property | Rating | A | B | C | D |
Functionality | O | 3 | 4 | 0 | 4 |
△ | 0 | 0 | 3 | 0 | |
X | 1 | 0 | 1 | 0 | |
Usability | O | 2 | 5 | 2 | 2 |
△ | 3 | 1 | 2 | 4 | |
X | 1 | 0 | 2 | 0 | |
Reliability | O | 2 | 1 | 0 | 0 |
△ | 0 | 0 | 0 | 0 | |
X | 0 | 1 | 2 | 2 | |
Maintainability | O | 7 | 5 | 2 | 3 |
△ | 0 | 1 | 1 | 1 | |
X | 0 | 1 | 4 | 3
Table 3. Evaluation Results by Property
Rating | A | B | C | D |
O | 13 | 15 | 5 | 9 |
△ | 3 | 2 | 6 | 5 |
X | 3 | 2 | 8 | 5
Table 4. Comprehensive Evaluation Results
Note that in the comprehensive evaluation (Table 4), more ‘O’ marks do not necessarily mean that one tool is superior to the others. In most practical use cases, quantitative aspects such as faster build time, broader code coverage, or faster test case generation matter more when evaluating testing tools. Also note that, unlike the other tools, tool ‘B’ was able to finish testing the complex projects (i.e., the nine projects other than ‘mathc’) without much difficulty in configuring the compilation environment. Through this series of posts, it was very interesting to define evaluation indicators for automated white-box testing tools, to examine their importance, and to observe test cases being created and executed automatically without user intervention. We believe that if one properly understands the strengths and weaknesses of the different automated software testing tools and uses them according to purpose, one can improve not only the safety of the system but also the overall quality of the software development process.