Amazon MLA-C01 Practice Exams
- Exam code: MLA-C01
- Exam name: AWS Certified Machine Learning Engineer - Associate
- Certification provider: Amazon
- Last updated: 27.04.2025
You are a data scientist working on a regression model to predict housing prices in a large metropolitan area. The dataset contains many features, including location, square footage, number of bedrooms, and amenities. After initial testing, you notice that some features have very high variance, leading to overfitting. To address this, you are considering applying regularization to your model. You need to choose between L1 (Lasso) and L2 (Ridge) regularization.
Given the goal of reducing overfitting while also simplifying the model by eliminating less important features, which regularization method should you choose and why?
- A . L2 regularization, because it evenly reduces all feature coefficients, leading to a more stable model
- B . L2 regularization, because it eliminates less important features by setting their coefficients to zero, simplifying the model
- C . L1 regularization, because it can shrink some feature coefficients to zero, effectively performing feature selection
- D . L1 regularization, because it penalizes large coefficients more heavily, making the model less sensitive to high-variance features
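The behavioral difference between the two penalties can be seen directly. The following is a minimal sketch (not part of the exam) on synthetic housing-style data where only two of ten features matter; the data, penalty strengths, and feature counts are illustrative assumptions.

```python
# Sketch: contrasting L1 (Lasso) and L2 (Ridge) penalties on synthetic
# data where only features 0 and 1 actually drive the target.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # 10 features, 8 irrelevant
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)       # L1 penalty
ridge = Ridge(alpha=0.5).fit(X, y)       # L2 penalty

# L1 drives irrelevant coefficients exactly to zero (implicit feature
# selection); L2 only shrinks all coefficients toward zero.
print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```

Printing the coefficient vectors themselves shows Lasso keeping only the informative features while Ridge retains small nonzero weights everywhere.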
You are responsible for deploying a machine learning model on AWS SageMaker for a real-time prediction application. The application requires low latency and high throughput. During deployment, you notice that the model’s response time is slower than expected, and the throughput is not meeting the required levels. You have already optimized the model itself, so the next step is to optimize the deployment environment. You are currently using a single instance of the ml.m5.large instance type with the default endpoint configuration.
Which of the following changes is MOST LIKELY to improve the model’s response time and throughput?
- A . Change the instance type to ml.p2.xlarge and add multi-model support
- B . Enable Auto Scaling with a target metric for the instance utilization
- C . Switch to an ml.m5.2xlarge instance type and use multi-AZ deployment
- D . Increase the instance count to two and enable asynchronous inference
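For reference, the instance count and type of a real-time endpoint are set in its endpoint configuration. The sketch below assumes AWS credentials and an existing SageMaker model; the model and configuration names are hypothetical placeholders.

```python
# Sketch: an endpoint configuration whose production variant spreads
# real-time traffic across two instances (names are placeholders).
import boto3

sm = boto3.client("sagemaker")
sm.create_endpoint_config(
    EndpointConfigName="realtime-config-v2",   # hypothetical name
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",               # hypothetical existing model
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 2,             # two instances share the load
        "InitialVariantWeight": 1.0,
    }],
)
```

Updating the endpoint to this configuration changes capacity without redeploying the model artifact.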
Your data science team is working on developing a machine learning model to predict customer churn. The dataset that you are using contains hundreds of features, but you suspect that not all of these features are equally important for the model’s accuracy. To improve the model’s performance and reduce its complexity, the team wants to focus on selecting only the most relevant features that contribute significantly to minimizing the model’s error rate.
Which feature engineering process should your team apply to select a subset of features that are the most relevant towards minimizing the error rate of the trained model?
- A . Feature extraction
- B . Feature creation
- C . Feature transformation
- D . Feature selection
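As a concrete illustration of selecting a relevant subset of features, the sketch below uses recursive feature elimination on synthetic churn-style data; the dataset shape, estimator, and subset size are assumptions for the example.

```python
# Sketch: wrapper-style feature selection with recursive feature
# elimination (RFE), keeping the subset most useful to the model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic churn-style data: 20 features, only 5 informative.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)
print("Selected feature indices:", np.where(selector.support_)[0])
```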
You are a data scientist working on a predictive maintenance model for an industrial manufacturing company. The model is designed to predict equipment failures based on sensor data collected over time. During the development process, you notice that the model performs exceptionally well on the training data but struggles to generalize to new, unseen data. Additionally, there are some indications that the model might not be fully capturing the complexity of the problem. To ensure the model performs well in production, you need to identify whether it is overfitting, underfitting, or both.
Which of the following strategies is the MOST EFFECTIVE for identifying overfitting and underfitting in your model?
- A . Perform cross-validation with different subsets of the data; if the model’s performance varies significantly across folds, the model is underfitting
- B . Compare the training and validation loss curves over time; if the validation loss is much higher than the training loss, the model is likely overfitting
- C . Reduce the number of features in the model; if performance improves, the model was previously overfitting
- D . Analyze the model’s performance on a separate test set; if the model performs well on both the training and test sets, it is neither overfitting nor underfitting
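The train-versus-validation comparison behind these options can be demonstrated locally. This is a minimal sketch with synthetic data: a low-degree model that cannot capture the signal versus a high-degree model that memorizes the training set; all numbers are illustrative.

```python
# Sketch: diagnosing fit by comparing training and validation error.
# A large gap (validation >> training) suggests overfitting; high error
# on both sets suggests underfitting.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=60)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 15):                       # underfit vs. overfit candidates
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr))
    val = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree={degree:2d}  train MSE={tr:.3f}  val MSE={val:.3f}")
```

With loss curves from an actual training run, the same comparison is made per epoch rather than per model.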
You are a machine learning engineer tasked with building a deep learning model to classify images for an autonomous vehicle project. The dataset is massive, consisting of millions of labeled images. Initial training runs on a single GPU instance in Amazon SageMaker are taking too long, and the training costs are rising. You need to reduce the model training time without compromising performance significantly.
Which of the following approaches is the MOST LIKELY to effectively reduce the training time while maintaining model performance?
- A . Implement distributed training using multiple GPU instances to parallelize the training process, reducing the overall time
- B . Reduce the size of the training dataset to speed up training, even if it means using fewer examples per class
- C . Switch to a smaller instance type to reduce computational costs, accepting a longer training time as a trade-off
- D . Enable early stopping to halt training when the model’s performance on the validation set stops improving, thereby avoiding overfitting
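For context on the distributed-training option, SageMaker estimators accept a `distribution` argument. The sketch below uses the SageMaker Python SDK; the training script, role ARN, and S3 path are hypothetical placeholders.

```python
# Sketch: data-parallel training across multiple GPU instances to cut
# wall-clock training time (script, role, and paths are placeholders).
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                 # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=4,                       # parallelize across 4 nodes
    instance_type="ml.p3.16xlarge",
    framework_version="1.13",
    py_version="py39",
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
# estimator.fit({"train": "s3://bucket/train"})  # placeholder S3 channel
```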
In what scenario would traditional programming techniques be preferable over ML?
- A . When complex logic and scalability are required.
- B . When the problem can be solved with simple rules.
- C . When personalized recommendations are needed.
- D . When quick adaptation to new data is necessary.
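When the behavior is fully captured by a few explicit rules, plain code is simpler, cheaper, and more transparent than training a model. A toy example (the pricing rules are invented for illustration):

```python
# Sketch: deterministic business rules expressed directly in code --
# no training data, model, or retraining loop required.
def shipping_cost(weight_kg: float, express: bool) -> float:
    """Flat rate up to 1 kg, then 2.0 per extra kg; express doubles it."""
    base = 5.0 if weight_kg <= 1.0 else 5.0 + 2.0 * (weight_kg - 1.0)
    return base * 2.0 if express else base

print(shipping_cost(0.5, express=False))   # -> 5.0
print(shipping_cost(3.0, express=True))    # -> 18.0
```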
You are tasked with building a predictive model for customer lifetime value (CLV) using Amazon SageMaker. Given the complexity of the model, it’s crucial to optimize hyperparameters to achieve the best possible performance. You decide to use SageMaker’s automatic model tuning (hyperparameter optimization) with Random Search strategy to fine-tune the model. You have a large dataset, and the tuning job involves several hyperparameters, including the learning rate, batch size, and dropout rate. During the tuning process, you observe that some of the trials are not converging effectively, and the results are not as expected. You suspect that the hyperparameter ranges or the strategy you are using may need adjustment.
Which of the following approaches is MOST LIKELY to improve the effectiveness of the hyperparameter tuning process?
- A . Decrease the number of total trials but increase the number of parallel jobs to speed up the tuning process
- B . Switch from the Random Search strategy to the Bayesian Optimization strategy and narrow the range of critical hyperparameters
- C . Use the Grid Search strategy with a wide range for all hyperparameters and increase the number of total trials
- D . Increase the number of hyperparameters being tuned and widen the range for all hyperparameters
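The tuning strategy and hyperparameter ranges in these options map directly onto the SageMaker Python SDK's tuner. The sketch below is illustrative only: the image URI, role, metric name, and ranges are placeholders, and the estimator shown is a stand-in for a real training job definition.

```python
# Sketch: Bayesian search over narrowed, log-scaled ranges for the
# hyperparameters that most affect convergence (all names placeholders).
from sagemaker.estimator import Estimator
from sagemaker.tuner import (ContinuousParameter, HyperparameterTuner,
                             IntegerParameter)

estimator = Estimator(
    image_uri="...",                                      # placeholder image
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:rmse",
    objective_type="Minimize",
    hyperparameter_ranges={
        # A narrow, log-scaled range keeps trials in the region that converges.
        "learning_rate": ContinuousParameter(1e-4, 1e-2,
                                             scaling_type="Logarithmic"),
        "batch_size": IntegerParameter(32, 256),
        "dropout_rate": ContinuousParameter(0.1, 0.5),
    },
    strategy="Bayesian",              # uses earlier trials to pick the next
    max_jobs=30,
    max_parallel_jobs=3,
)
# tuner.fit({"train": "s3://bucket/train"})  # placeholder S3 channel
```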
You are an ML engineer at a startup that is developing a recommendation engine for an e-commerce platform. The workload involves training models on large datasets and deploying them to serve real-time recommendations to customers. The training jobs are sporadic but require significant computational power, while the inference workloads must handle varying traffic throughout the day. The company is cost-conscious and aims to balance cost efficiency with the need for scalability and performance.
Given these requirements, which approach to resource allocation is the MOST SUITABLE for training and inference, and why?
- A . Use on-demand instances for both training and inference to ensure that the company only pays for the compute resources it uses when it needs them, avoiding any upfront commitments
- B . Use on-demand instances for training, allowing the flexibility to scale resources as needed, and use provisioned resources with auto-scaling for inference to handle varying traffic while controlling costs
- C . Use provisioned resources with spot instances for both training and inference to take advantage of the lowest possible costs, accepting the potential for interruptions during workload execution
- D . Use provisioned resources with reserved instances for both training and inference to lock in lower costs and guarantee resource availability, ensuring predictability in budgeting
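For reference, the provisioned-with-auto-scaling pattern mentioned in the options is configured through Application Auto Scaling. The sketch below assumes AWS credentials and an existing endpoint; the endpoint name, capacity limits, and target value are hypothetical.

```python
# Sketch: scaling a SageMaker endpoint's instance count with traffic
# (endpoint name and numbers are placeholders).
import boto3

aas = boto3.client("application-autoscaling")
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/recs-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,                    # floor during quiet periods
    MaxCapacity=4,                    # ceiling during traffic spikes
)
aas.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/recs-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,          # invocations per instance to hold
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)
```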
Which of the following summarizes the differences between a token and an embedding in the context of generative AI?
- A . An embedding is a sequence of characters that a model can interpret or predict as a single unit of meaning, whereas a token is a vector of numerical values that represents condensed information obtained by transforming input into that vector
- B . Both token and embedding refer to a sequence of characters that a model can interpret or predict as a single unit of meaning
- C . A token is a sequence of characters that a model can interpret or predict as a single unit of meaning, whereas an embedding is a vector of numerical values that represents condensed information obtained by transforming input into that vector
- D . Both token and embedding refer to a vector of numerical values that represents condensed information obtained by transforming input into that vector
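The token/embedding distinction is easy to see in code. This is a toy sketch: the vocabulary is three words and the embedding table is random, purely for illustration.

```python
# Sketch: tokens are discrete text units; embeddings are the numeric
# vectors those units are mapped to (toy vocabulary, random vectors).
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2}
tokens = "the cat sat".split()             # tokens: units of text
ids = [vocab[t] for t in tokens]           # token ids index the table

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))   # 4-dim vectors
embeddings = embedding_table[ids]          # one vector per token

print(tokens)                              # ['the', 'cat', 'sat']
print(embeddings.shape)                    # (3, 4)
```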
A machine learning specialist is developing a proof of concept for government users whose primary concern is security. The specialist is using Amazon SageMaker to train a convolutional neural network (CNN) model for a photo classifier application. The specialist wants to protect the data so that it cannot be accessed and transferred to a remote host by malicious code accidentally installed on the training container.
Which action will provide the MOST secure protection?
- A . Encrypt the weights of the CNN model.
- B . Enable network isolation for training jobs.
- C . Remove Amazon S3 access permissions from the SageMaker execution role.
- D . Encrypt the training and validation dataset.
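For context, network isolation is a per-job setting in the SageMaker Python SDK; when enabled, the training container can make no outbound network calls. The image URI, role, and S3 path below are placeholders.

```python
# Sketch: a training job with network isolation enabled, so code inside
# the container cannot transfer data to a remote host (placeholders used).
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="...",                        # placeholder training image
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    enable_network_isolation=True,          # no outbound network access
)
# estimator.fit({"train": "s3://bucket/train"})  # placeholder S3 channel
```

SageMaker itself still stages input data and uploads model artifacts; only the container's own network access is removed.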