"BioSearch" is creating an AI model for predicting cancer occurrence by examining X-ray images. The model's accuracy in isolation was found to be good. However, when the model was put into practice in the diagnosis lab, users began complaining about the poor quality of its results, in particular its inability to detect real cancer cases, and use of the model was stopped.
A testing expert was called in to identify the deficiencies in the test planning that led to the above scenario.
Which ONE of the following options is the MOST likely reason that the test expert would discover?
SELECT ONE OPTION
Answer: A
The question asks which deficiency is most likely to be discovered by the test expert given the scenario of poor real-world performance despite good isolated accuracy.
A lack of similarity between the training and testing data (A): This is a common issue in ML: the model performs well on the data it was trained and tested on, but poorly on real-world data because that data was not representative of operational conditions. The result is poor generalization to new, unseen data (a small code sketch of this gap follows the option analysis below).
The input data has not been tested for quality prior to use for testing (B): While data quality is important, this option is less likely to be the primary reason for the described issue compared to the representativeness of training data.
A lack of focus on choosing the right functional-performance metrics (C): Proper metrics are crucial, but the issue described seems more related to the data mismatch rather than metric selection.
A lack of focus on non-functional requirements testing (D): Non-functional requirements are important, but the scenario specifically mentions issues with detecting real cancer cases, pointing more towards data issues.
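As a minimal illustration of this generalization gap (a sketch only, not part of the syllabus), the following Python snippet assumes a trained model object with a predict method, a held-out test set X_test/y_test, and a labelled sample of operational lab images X_ops/y_ops, all hypothetical placeholders. A large drop from test recall to operational recall is the kind of evidence the test expert would look for:

def recall(y_true, y_pred, positive=1):
    # Recall = TP / (TP + FN) for the positive (cancer) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

def generalization_gap(model, X_test, y_test, X_ops, y_ops):
    # A large drop from test recall to operational recall suggests the
    # training/testing data were not representative of real lab data.
    test_recall = recall(y_test, model.predict(X_test))
    ops_recall = recall(y_ops, model.predict(X_ops))
    return test_recall, ops_recall, test_recall - ops_recall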
ISTQB CT-AI Syllabus Section 4.2 on Training, Validation, and Test Datasets emphasizes the importance of using representative datasets to ensure the model generalizes well to real-world data.
Sample Exam Questions document, Question #40 addresses issues related to data representativeness and model generalization.
A system was developed for screening patients' X-rays for potential malignancy (skin cancer) detection. A workflow system has been developed to screen for multiple cancers by chaining several individually trained ML models together in the workflow.
Testing the pipeline could involve multiple kinds of tests (I-III):
I. Pairwise testing of combinations
II. Testing each individual model for accuracy
III. A/B testing of different sequences of models
Which ONE of the following options contains the kinds of tests that would be MOST APPROPRIATE to include in the strategy for optimal detection?
SELECT ONE OPTION
Answer: B
The question asks which combination of tests would be most appropriate to include in the strategy for optimal detection in a workflow system using multiple ML models.
Pairwise testing of combinations (I): This method is useful for testing interactions between different components in the workflow to ensure they work well together, identifying potential issues in the integration.
Testing each individual model for accuracy (II): Ensuring that each model in the workflow performs accurately on its own is crucial before integrating them into a combined workflow.
A/B testing of different sequences of models (III): This involves comparing different sequences to determine which configuration yields the best results. While useful, it might not be as fundamental as pairwise and individual accuracy testing in the initial stages.
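As a rough, hypothetical sketch of what kinds II and III could look like in practice, the Python snippet below assumes a dict of individually trained models, a labelled evaluation set X/y, and placeholder helpers evaluate_model and evaluate_pipeline that return a detection metric such as recall; pairwise testing of combinations (I) would typically be driven by a combinatorial test-design tool and is not shown:

from itertools import permutations

def test_individual_models(models, X, y, evaluate_model, threshold=0.90):
    # Kind II: check each model's accuracy in isolation before chaining them.
    return {name: evaluate_model(m, X, y) >= threshold for name, m in models.items()}

def ab_test_sequences(models, X, y, evaluate_pipeline):
    # Kind III: compare detection quality of different orderings of the chain.
    scores = {}
    for order in permutations(models):                  # each candidate sequence of model names
        chained = [models[name] for name in order]
        scores[order] = evaluate_pipeline(chained, X, y)
    best = max(scores, key=scores.get)
    return best, scores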
ISTQB CT-AI Syllabus Section 9.2 on Pairwise Testing and Section 9.3 on Testing ML Models emphasize the importance of testing interactions and individual model accuracy in complex ML workflows.
Which ONE of the following characteristics is LEAST likely to cause safety-related issues for an AI system?
SELECT ONE OPTION
Answer: B
The question asks which characteristic is least likely to cause safety-related issues for an AI system. Let's evaluate each option:
Non-determinism (A): Non-deterministic systems can produce different outcomes even with the same inputs, which can lead to unpredictable behavior and potential safety issues.
Robustness (B): Robustness refers to the ability of the system to handle errors, anomalies, and unexpected inputs gracefully. A robust system is less likely to cause safety issues because it can maintain functionality under varied conditions.
High complexity (C): High complexity in AI systems can lead to difficulties in understanding, predicting, and managing the system's behavior, which can cause safety-related issues.
Self-learning (D): Self-learning systems adapt based on new data, which can lead to unexpected changes in behavior. If not properly monitored and controlled, this can result in safety issues.
ISTQB CT-AI Syllabus Section 2.8 on Safety and AI discusses various factors affecting the safety of AI systems, emphasizing the importance of robustness in maintaining safe operation.
Which ONE of the following tests is LEAST likely to be performed during the ML model testing phase?
SELECT ONE OPTION
Answer: C
The question asks which test is least likely to be performed during the ML model testing phase. Let's consider each option:
Testing the accuracy of the classification model (A): Accuracy testing is a fundamental part of the ML model testing phase. It ensures that the model correctly classifies the data as intended and meets the required performance metrics.
Testing the API of the service powered by the ML model (B): Testing the API is crucial, especially if the ML model is deployed as part of a service. This ensures that the service integrates well with other systems and that the API performs as expected.
Testing the speed of the training of the model (C): This is least likely to be part of the ML model testing phase. The speed of training is more relevant during the development phase when optimizing and tuning the model. During testing, the focus is more on the model's performance and behavior rather than how quickly it was trained.
Testing the speed of the prediction by the model (D): Testing the speed of prediction is important to ensure that the model meets performance requirements in a production environment, especially for real-time applications.
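As a minimal sketch of the kind of check option (D) refers to, the Python snippet below assumes a trained model with a predict method and a batch of representative samples (both hypothetical placeholders) and measures the average prediction latency:

import time

def prediction_latency(model, samples, repeats=100):
    # Option (D): rough check of average prediction time per input,
    # using a trained model and a batch of representative samples.
    start = time.perf_counter()
    for _ in range(repeats):
        model.predict(samples)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(samples))  # seconds per single prediction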
ISTQB CT-AI Syllabus Section 3.2 on ML Workflow and Section 5 on ML Functional Performance Metrics discuss the focus of testing during the model testing phase, which includes accuracy and prediction speed but not the training speed.
The activation value output by a neuron in a neural network is obtained by applying a computation to the neuron's inputs.
Which ONE of the following options BEST describes the inputs used to compute the activation value?
SELECT ONE OPTION
Answer: A
In a neural network, the activation value of a neuron is determined by a combination of inputs from the previous layer, the weights of the connections, and the bias at the neuron level. Here's a detailed breakdown:
Inputs for Activation Value:
Activation Values of Neurons in the Previous Layer: These are the outputs from neurons in the preceding layer that serve as inputs to the current neuron.
Weights Assigned to the Connections: Each connection between neurons has an associated weight, which determines the strength and direction of the input signal.
Individual Bias at the Neuron Level: Each neuron has a bias value that adjusts the input sum, allowing the activation function to be shifted.
Calculation:
The activation value is computed by summing the weighted inputs from the previous layer and adding the bias.
Formula: $z = \sum_i (w_i \cdot a_i) + b$, where $w_i$ are the weights, $a_i$ are the activation values from the previous layer, and $b$ is the bias.
The activation function (e.g., sigmoid, ReLU) is then applied to this sum to get the final activation value.
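A minimal Python/NumPy sketch of this computation, with illustrative input values chosen here purely for demonstration:

import numpy as np

def neuron_activation(prev_activations, weights, bias):
    # z = sum(w_i * a_i) + b, then apply an activation function (sigmoid here).
    z = np.dot(weights, prev_activations) + bias
    return 1.0 / (1.0 + np.exp(-z))

a_prev = np.array([0.5, 0.1, 0.9])   # activation values from the previous layer
w = np.array([0.4, -0.6, 0.2])       # weights of the connections
b = 0.05                             # individual bias at the neuron level
print(neuron_activation(a_prev, w, b))   # ~0.59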
Why Option A is Correct:
Option A correctly identifies all components involved in computing the activation value: the individual bias, the activation values of the previous layer, and the weights of the connections.
Eliminating Other Options:
B. Activation values of neurons in the previous layer, and weights assigned to the connections between the neurons: This option misses the bias, which is crucial.
C. Individual bias at the neuron level, and weights assigned to the connections between the neurons: This option misses the activation values from the previous layer.
D. Individual bias at the neuron level, and activation values of neurons in the previous layer: This option misses the weights, which are essential.
ISTQB CT-AI Syllabus, Section 6.1, Neural Networks, discusses the components and functioning of neurons in a neural network.
'Neural Network Activation Functions' (ISTQB CT-AI Syllabus, Section 6.1.1).
Upon testing a model used to detect rotten tomatoes, the following data was observed by the test engineer, based on a certain number of tomato images.
For this confusion matrix, which combination of values of accuracy, recall, and specificity respectively is CORRECT?
SELECT ONE OPTION
Answer: A
To calculate the accuracy, recall, and specificity from the confusion matrix provided, we use the following formulas:
Confusion Matrix:
Actually Rotten: 45 predicted rotten (True Positives, TP), 5 predicted fresh (False Negatives, FN)
Actually Fresh: 8 predicted rotten (False Positives, FP), 42 predicted fresh (True Negatives, TN)
Accuracy:
Accuracy is the proportion of true results (both true positives and true negatives) in the total population.
Formula: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
Calculation: $\text{Accuracy} = \frac{45 + 42}{45 + 42 + 8 + 5} = \frac{87}{100} = 0.87$
Recall (Sensitivity):
Recall is the proportion of true positive results in the total actual positives.
Formula: $\text{Recall} = \frac{TP}{TP + FN}$
Calculation: $\text{Recall} = \frac{45}{45 + 5} = \frac{45}{50} = 0.9$
Specificity:
Specificity is the proportion of true negative results in the total actual negatives.
Formula: $\text{Specificity} = \frac{TN}{TN + FP}$
Calculation: $\text{Specificity} = \frac{42}{42 + 8} = \frac{42}{50} = 0.84$
Therefore, the correct combination of accuracy, recall, and specificity is 0.87, 0.9, and 0.84 respectively.
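The same three values can be reproduced with a few lines of Python, using the counts from the confusion matrix above:

TP, FN = 45, 5    # actually rotten: predicted rotten / predicted fresh
FP, TN = 8, 42    # actually fresh:  predicted rotten / predicted fresh

accuracy = (TP + TN) / (TP + TN + FP + FN)   # 87 / 100 = 0.87
recall = TP / (TP + FN)                      # 45 / 50  = 0.90
specificity = TN / (TN + FP)                 # 42 / 50  = 0.84
print(accuracy, recall, specificity)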
ISTQB CT-AI Syllabus, Section 5.1, Confusion Matrix, provides detailed formulas and explanations for calculating various metrics including accuracy, recall, and specificity.
'ML Functional Performance Metrics' (ISTQB CT-AI Syllabus, Section 5).
Which ONE of the following options describes a scenario of A/B testing the LEAST?
SELECT ONE OPTION
Answer: C
A/B testing, also known as split testing, is a method used to compare two versions of a product or system to determine which one performs better. It is widely used in web development, marketing, and machine learning to optimize user experiences and model performance. Here's why option C is the least descriptive of an A/B testing scenario:
Understanding A/B Testing:
In A/B testing, two versions (A and B) of a system or feature are tested against each other. The objective is to measure which version performs better based on predefined metrics such as user engagement, conversion rates, or other performance indicators.
Application in Machine Learning:
In ML systems, A/B testing might involve comparing two different models, algorithms, or system configurations on the same set of data to observe which yields better results.
Why Option C is the Least Descriptive:
Option C describes comparing the performance of an ML system on two different input datasets. This scenario focuses on the input data variation rather than the comparison of system versions or features, which is the essence of A/B testing. A/B testing typically involves a controlled experiment with two versions being tested under the same conditions, not different datasets.
Clarifying the Other Options:
A. A comparison of two different websites for the same company to observe from a user acceptance perspective: This is a classic example of A/B testing where two versions of a website are compared.
B. A comparison of two different offers in a recommendation system to decide on the more effective offer for the same users: This is another example of A/B testing in a recommendation system.
D. A comparison of the performance of two different ML implementations on the same input data: This fits the A/B testing model where two implementations are compared under the same conditions.
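As a minimal, hypothetical sketch of the controlled-comparison idea behind these options, the Python snippet below assumes two implementations model_a and model_b, a labelled dataset X/y, and a metric callable (all placeholders); the same population is randomly split between the two variants and a single predefined metric is compared:

import random

def ab_test(model_a, model_b, X, y, metric, seed=42):
    # Randomly split the same population between the two variants and
    # compare one predefined metric under otherwise identical conditions.
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    half = len(idx) // 2
    group_a, group_b = idx[:half], idx[half:]
    score_a = metric([y[i] for i in group_a], [model_a.predict(X[i]) for i in group_a])
    score_b = metric([y[i] for i in group_b], [model_b.predict(X[i]) for i in group_b])
    return {"A": score_a, "B": score_b}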
ISTQB CT-AI Syllabus, Section 9.4, A/B Testing, explains the methodology and application of A/B testing in various contexts.
'Understanding A/B Testing' (ISTQB CT-AI Syllabus).