Free Amazon MLA-C01 Practice Exams - Page 4 of 6

Question #31

You are a data scientist working for a media company that processes large volumes of video and image data to generate personalized content recommendations. The dataset, which is stored in Amazon S3, contains tens of millions of small image files and several terabytes of high-resolution large video files. The training jobs you run on Amazon SageMaker require low-latency access to this data and need to be completed quickly to keep up with the dynamic content pipeline.

Given the characteristics of your data and the requirements for low-latency, high-throughput access, which approach is the MOST APPROPRIATE for this scenario?

A . Use Fast File mode with Amazon S3 for the large video files, enabling on-demand streaming of data, and store the small image files locally on the training instances to reduce I/O latency
B . Use Amazon FSx for Lustre to mount the entire dataset as a high-performance file system, providing consistently low-latency access to both the small image files and the large video files
C . Create an FSx for Lustre file system linked with the Amazon S3 bucket folder having the training data for the small image files and apply Fast File mode for the video files in the relevant Amazon S3 bucket folder, thereby combining the strengths of both approaches
D . Use Fast File mode with Amazon S3 to stream the small image files directly to the training instances on-demand, minimizing the time required to start training

Correct answer: C

Explanation:

Correct option:

Create an FSx for Lustre file system linked with the Amazon S3 bucket folder having the training data for the small image files and apply Fast File mode for the video files in the relevant Amazon S3 bucket folder, thereby combining the strengths of both approaches

via – https://docs.aws.amazon.com/sagemaker/latest/dg/model-access-training-data.html

FSx for Lustre can scale to hundreds of gigabytes of throughput and millions of IOPS with low-latency file retrieval. When starting a training job, SageMaker mounts the FSx for Lustre file system to the training instance file system, then starts your training script. Mounting itself is a relatively fast operation that doesn’t depend on the size of the dataset stored in FSx for Lustre.

If your dataset is too large for file mode, has many small files that you can’t serialize easily, or uses a random read access pattern, FSx for Lustre is a good option to consider. Its file system scales to hundreds of gigabytes per second (GB/s) of throughput and millions of IOPS, which is ideal when you have many small files.

via – https://docs.aws.amazon.com/sagemaker/latest/dg/model-access-training-data.html

For the given use case, you can create an FSx for Lustre file system linked with the Amazon S3 bucket folder that holds the training data for the small image files. You can then apply Fast File mode for the video files in the relevant Amazon S3 bucket folder.

via – https://docs.aws.amazon.com/sagemaker/latest/dg/model-access-training-data.html
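
The two-channel setup described above could be expressed roughly as follows. This is a minimal sketch of the `InputDataConfig` section of a SageMaker `CreateTrainingJob` request, built as plain Python data so it runs without AWS access. The file system ID, directory path, bucket name, and channel names are hypothetical placeholders, and a real job that reads from FSx for Lustre would also need a VPC configuration that can reach the file system.

```python
# Sketch of the InputDataConfig section of a SageMaker CreateTrainingJob request.
# All IDs, paths, bucket names, and channel names are hypothetical placeholders.

def build_input_data_config(file_system_id, fsx_directory_path, video_s3_uri):
    """Return two channels: images via FSx for Lustre, videos via Fast File mode."""
    images_channel = {
        "ChannelName": "images",
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": file_system_id,
                "FileSystemType": "FSxLustre",
                "FileSystemAccessMode": "ro",  # read-only is enough for training
                "DirectoryPath": fsx_directory_path,  # starts with the Lustre mount name
            }
        },
    }
    videos_channel = {
        "ChannelName": "videos",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": video_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "InputMode": "FastFile",  # stream objects from S3 on demand
    }
    return [images_channel, videos_channel]


input_data_config = build_input_data_config(
    file_system_id="fs-0123456789abcdef0",
    fsx_directory_path="/examplemount/images",
    video_s3_uri="s3://example-bucket/videos/",
)
```

The resulting list would be passed as the `InputDataConfig` argument when creating the training job; each channel then appears to the training script as its own directory.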

Incorrect options:

Use Amazon FSx for Lustre to mount the entire dataset as a high-performance file system, providing consistently low-latency access to both the small image files and the large video files – Amazon FSx for Lustre is designed for high-performance workloads with large datasets, especially when you need low-latency access to many small files that you can’t serialize easily or that are read with a random access pattern.

FSx for Lustre is not the optimal solution for providing low-latency access to many large video files; you should instead use Fast File mode for the video files in the relevant Amazon S3 bucket folder. So, this option is incorrect.

Use Fast File mode with Amazon S3 to stream the small image files directly to the training instances on-demand, minimizing the time required to start training – While Fast File mode is effective for large files, it does not provide the low-latency, high-throughput access needed for a large number of small files. So, this option is incorrect.

Use Fast File mode with Amazon S3 for the large video files, enabling on-demand streaming of data, and store the small image files locally on the training instances to reduce I/O latency – Splitting data management between Fast File mode for large files and local storage for small files adds unnecessary complexity and additional costs without providing a proportional performance improvement, so this option is incorrect.

References:

https://docs.aws.amazon.com/sagemaker/latest/dg/model-access-training-data.html

https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-sagemaker-fast-file-mode/

https://aws.amazon.com/fsx/lustre/faqs/

Question #32

Which of the following highlights the differences between model parameters and hyperparameters in the context of generative AI?

A . Both Hyperparameters and model parameters are values that define a model and its behavior in interpreting input and generating responses
B . Model parameters are values that define a model and its behavior in interpreting input and generating responses. Hyperparameters are values that can be adjusted for model customization to control the training process
C . Hyperparameters are values that define a model and its behavior in interpreting input and generating responses. Model parameters are values that can be adjusted for model customization to control the training process
D . Both Hyperparameters and model parameters are values that can be adjusted for model customization to control the training process

Correct answer: B

Explanation:

Correct option:

Model parameters are values that define a model and its behavior in interpreting input and generating responses. Hyperparameters are values that can be adjusted for model customization to control the training process

Hyperparameters are values that can be adjusted for model customization to control the training process and, consequently, the output custom model. In other words, hyperparameters are external configurations set before the training process begins. They control the training process and the structure of the model but are not adjusted by the training algorithm itself. Examples include the learning rate, the number of layers in a neural network, etc.

Model parameters are values that define a model and its behavior in interpreting input and generating responses. Model parameters are controlled and updated by providers. You can also update model parameters to create a new model through the process of model customization. In other words, model parameters are the internal variables of the model that are learned and adjusted during the training process. These parameters directly influence the output of the model for a given input. Examples include the weights and biases in a neural network.

via – https://docs.aws.amazon.com/bedrock/latest/userguide/key-definitions.html
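
To make the distinction concrete, here is a small illustrative sketch that is not tied to any AWS service. It fits a one-variable linear model with batch gradient descent: the learning rate and number of epochs are hyperparameters fixed before training, while `weight` and `bias` are model parameters that the training loop updates. The data and constant names are made up for the example.

```python
# Hyperparameters: chosen before training and never changed by the training loop.
LEARNING_RATE = 0.05
NUM_EPOCHS = 2000


def train_linear_model(xs, ys, learning_rate, num_epochs):
    """Fit y = weight * x + bias by batch gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0  # model parameters: learned from the data
    n = len(xs)
    for _ in range(num_epochs):
        # Gradients of the mean squared error with respect to each parameter.
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in zip(xs, ys)) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
weight, bias = train_linear_model(xs, ys, LEARNING_RATE, NUM_EPOCHS)
```

Changing `LEARNING_RATE` or `NUM_EPOCHS` changes how training proceeds, but only `weight` and `bias` are stored as the trained model.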

Incorrect options:

Both Hyperparameters and model parameters are values that can be adjusted for model customization to control the training process

Both Hyperparameters and model parameters are values that define a model and its behavior in interpreting input and generating responses

Hyperparameters are values that define a model and its behavior in interpreting input and generating responses. Model parameters are values that can be adjusted for model customization to control the training process

These three options contradict the explanation provided above, so these options are incorrect.

Reference: https://docs.aws.amazon.com/bedrock/latest/userguide/key-definitions.html

Question #33

You are a machine learning engineer at a fintech company responsible for maintaining the ML infrastructure that powers real-time credit scoring for loan applications. The system must handle high volumes of requests with low latency and be resilient to any failures. To ensure the infrastructure meets the company’s performance and reliability requirements, you need to monitor key performance metrics related to scalability, availability, utilization, throughput and fault tolerance.

Which combination of metrics and monitoring strategies is the MOST EFFECTIVE for ensuring the ML infrastructure meets these requirements?

A . Monitor CPU and memory utilization to ensure that compute resources are not overburdened, track request throughput to measure the number of predictions per second, and use auto-scaling policies to maintain high availability and scalability during traffic spikes
B . Track model accuracy and precision to ensure that predictions are correct, monitor disk space utilization to prevent storage overflows, and manually scale the infrastructure during peak usage periods
C . Monitor training loss and validation accuracy to track model performance, measure network bandwidth to ensure efficient data transfer, and use Amazon CloudWatch alarms to automatically restart failed instances
D . Measure latency to ensure predictions are delivered quickly, monitor the number of failed requests to assess fault tolerance, and use scheduled scaling to adjust resources based on anticipated demand

Correct answer: A

Explanation:

Correct option:

Monitor CPU and memory utilization to ensure that compute resources are not overburdened, track request throughput to measure the number of predictions per second, and use auto-scaling policies to maintain high availability and scalability during traffic spikes

This option correctly identifies the key performance metrics that directly impact the reliability and efficiency of ML infrastructure. Monitoring CPU and memory utilization ensures that resources are being used effectively and are not overburdened, which is critical for maintaining system performance. Tracking request throughput helps measure the system’s capacity to handle large volumes of predictions. Auto-scaling policies ensure that the system can scale up or down in response to varying demand, maintaining availability and minimizing costs.
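
As a rough illustration of the auto-scaling part, the sketch below builds the arguments for an Application Auto Scaling target-tracking policy. It assumes, beyond what the question states, that the model is hosted on a SageMaker real-time endpoint; the endpoint name, variant name, target value, and cooldowns are hypothetical. The code only constructs the request as a dictionary and does not call AWS.

```python
def build_target_tracking_policy(endpoint_name, variant_name, invocations_per_instance):
    """Return keyword arguments for Application Auto Scaling's PutScalingPolicy call."""
    return {
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Throughput target: average invocations per instance per minute.
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # seconds; react quickly to traffic spikes
            "ScaleInCooldown": 300,   # seconds; scale in more conservatively
        },
    }


policy = build_target_tracking_policy("credit-scoring", "AllTraffic", 1000)
```

In practice the variant would first be registered as a scalable target with minimum and maximum instance counts, and CPU and memory utilization would be watched through CloudWatch alongside this throughput-based policy.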

Incorrect options:

Track model accuracy and precision to ensure that predictions are correct, monitor disk space utilization to prevent storage overflows, and manually scale the infrastructure during peak usage periods – While monitoring model accuracy and precision is important for evaluating model performance, these metrics do not directly relate to the infrastructure’s ability to handle requests efficiently. Manual scaling is less effective and responsive than auto-scaling policies.

Measure latency to ensure predictions are delivered quickly, monitor the number of failed requests to assess fault tolerance, and use scheduled scaling to adjust resources based on anticipated demand – Latency and fault tolerance are important metrics, but relying solely on scheduled scaling is less adaptive to real-time traffic fluctuations than auto-scaling. Latency should be monitored alongside other metrics, such as utilization and throughput, for a comprehensive view.

Monitor training loss and validation accuracy to track model performance, measure network bandwidth to ensure efficient data transfer, and use Amazon CloudWatch alarms to automatically restart failed instances – Training loss and validation accuracy are key during the model development phase but are not relevant for monitoring deployed ML infrastructure. While network bandwidth and CloudWatch alarms are useful, they should be part of a broader strategy that includes resource utilization, throughput, and scalability.

References:

https://aws.amazon.com/cloudwatch/

https://docs.aws.amazon.com/machine-learning/latest/dg/cw-doc.html

Question #34

You are a machine learning engineer at a financial services company tasked with building a real-time fraud detection system. The model needs to be highly accurate to minimize false positives and false negatives. However, the company has a limited budget for cloud resources, and the model needs to be retrained frequently to adapt to new fraud patterns. You must carefully balance model performance, training time, and cost to meet these requirements.

Which of the following strategies is the MOST LIKELY to achieve an optimal balance between model performance, training time, and cost?

A . Deploy a simpler model like logistic regression to reduce training time and cost, while accepting a slight reduction in model accuracy
B . Implement a tree-based model like XGBoost with early stopping and hyperparameter tuning, balancing accuracy with reduced training time and computational cost
C . Use a deep neural network with multiple layers and complex architecture to maximize performance, even if it requires significant computational resources and longer training times
D . Choose a support vector machine (SVM) with a nonlinear kernel to enhance accuracy, regardless of the increased training time and cost associated with large datasets

Question #35

How is deep learning different from general machine learning?

A . It requires no domain knowledge.
B . It uses a layered architecture mimicking the human brain.
C . It relies solely on historical data.
D . It is unrelated to pattern recognition.

Question #36

Which AWS services are specifically designed to aid in monitoring machine learning models and incorporating human review processes? (Select two)

A . Amazon SageMaker Feature Store
B . Amazon SageMaker Ground Truth
C . Amazon Augmented AI (Amazon A2I)
D . Amazon SageMaker Data Wrangler
E . Amazon SageMaker Model Monitor

Question #38

You are a data scientist at a healthcare company working on deploying a machine learning model that predicts patient outcomes based on real-time data from wearable devices. The model needs to be containerized for easy deployment and scaling across different environments, including development, testing, and production. The company wants to ensure that container images are managed efficiently, securely, and consistently across all environments.

Given these requirements, which combination of AWS services is the MOST SUITABLE for building, storing, deploying, and maintaining the containerized ML solution?

A . Use Docker Hub to store the container images, Amazon EKS for orchestrating the containers, and AWS Lambda to trigger updates to the containers when new images are pushed
B . Use Amazon ECR to store the container images, Amazon EKS for orchestrating the containers, and AWS CodePipeline for automating the CI/CD pipeline, ensuring that updates to the model are seamlessly deployed
C . Use Amazon ECR to store container images, manually deploy containers on Amazon EC2 instances, and use AWS CloudFormation to manage the infrastructure configuration
D . Use Amazon ECS to manage and deploy the containerized model, Amazon S3 to store container images, and manually push updates to the containers using the AWS CLI

Correct answer: B

Explanation:

Correct option:

Use Amazon ECR to store the container images, Amazon EKS for orchestrating the containers, and AWS CodePipeline for automating the CI/CD pipeline, ensuring that updates to the model are seamlessly deployed

Amazon ECR is a fully managed container registry that integrates seamlessly with Amazon EKS, allowing you to securely store and manage container images. Amazon EKS provides a scalable and managed Kubernetes environment for orchestrating containers. AWS CodePipeline can be used to automate the continuous integration and delivery (CI/CD) process, ensuring that new versions of the model are automatically built, tested, and deployed without manual intervention. This combination ensures consistency, security, and scalability across the ML solution.

via – https://docs.aws.amazon.com/whitepapers/latest/build-secure-enterprise-ml-platform/automation-pipelines.html
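
One way to picture the consistency requirement is that a single image, stored once in Amazon ECR, is referenced by the same URI in every environment. The sketch below only formats an ECR image URI and maps it to the three environments from the question; the account ID, region, repository name, and tag are hypothetical, and the promotion between environments would be handled by the CI/CD pipeline rather than by this code.

```python
def ecr_image_uri(account_id, region, repository, tag):
    """Format the registry URI that a Kubernetes manifest would reference."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


ENVIRONMENTS = ["development", "testing", "production"]

# Build once, then point every environment at the identical image.
image = ecr_image_uri("123456789012", "us-east-1", "patient-outcome-model", "1.4.2")
deployments = {env: image for env in ENVIRONMENTS}
```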

Incorrect options:

Use Amazon ECS to manage and deploy the containerized model, Amazon S3 to store container images, and manually push updates to the containers using the AWS CLI – While Amazon ECS is a powerful container management service, storing container images in Amazon S3 is not recommended since S3 is not optimized for container image storage. Additionally, manually pushing updates via the AWS CLI lacks automation and can introduce errors, making it less suitable for a production environment.

Use Amazon ECR to store container images, manually deploy containers on Amazon EC2 instances, and use AWS CloudFormation to manage the infrastructure configuration – Manually deploying containers on Amazon EC2 instances and managing infrastructure with AWS CloudFormation adds unnecessary complexity and management overhead. Using Amazon EKS for orchestration is more efficient and scalable for containerized workloads.

Use Docker Hub to store the container images, Amazon EKS for orchestrating the containers, and AWS Lambda to trigger updates to the containers when new images are pushed – Docker Hub is a widely used container registry, but for enterprise solutions on AWS, Amazon ECR is more secure and integrates better with other AWS services. Using AWS Lambda to trigger updates is unconventional and less efficient compared to using a CI/CD pipeline with AWS CodePipeline.

References:

https://docs.aws.amazon.com/whitepapers/latest/build-secure-enterprise-ml-platform/automation-pipelines.html

https://aws.amazon.com/blogs/machine-learning/build-a-ci-cd-pipeline-for-deploying-custom-machine-learning-models-using-aws-services/

Question #39

You are a machine learning engineer at a fintech company tasked with developing and deploying an end-to-end machine learning workflow for fraud detection. The workflow involves multiple steps, including data extraction, preprocessing, feature engineering, model training, hyperparameter tuning, and deployment. The company requires the solution to be scalable, support complex dependencies between tasks, and provide robust monitoring and versioning capabilities. Additionally, the workflow needs to integrate seamlessly with existing AWS services.

Which deployment orchestrator is the MOST SUITABLE for managing and automating your ML workflow?

A . Use AWS Step Functions to build a serverless workflow that integrates with SageMaker for model training and deployment, ensuring scalability and fault tolerance
B . Use AWS Lambda functions to manually trigger each step of the ML workflow, enabling flexible execution without needing a predefined orchestration tool
C . Use Amazon SageMaker Pipelines to orchestrate the entire ML workflow, leveraging its built-in integration with SageMaker features like training, tuning, and deployment
D . Use Apache Airflow to define and manage the workflow with custom DAGs (Directed Acyclic Graphs), integrating with AWS services through operators and hooks

Correct answer: C

Explanation:

Correct option:

Use Amazon SageMaker Pipelines to orchestrate the entire ML workflow, leveraging its built-in integration with SageMaker features like training, tuning, and deployment

Amazon SageMaker Pipelines is a purpose-built workflow orchestration service to automate machine learning (ML) development.

SageMaker Pipelines is specifically designed for orchestrating ML workflows. It provides native integration with SageMaker features like model training, tuning, and deployment. It also supports versioning, lineage tracking, and automatic execution of workflows, making it the ideal choice for managing end-to-end ML workflows in AWS.

via – https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html
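
The idea of dependencies between workflow steps can be sketched with Python's standard library alone. This is a conceptual illustration, not the SageMaker Pipelines API: the step names mirror the stages listed in the question, and the ordering between tuning and training is an assumption made for the example.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps that must finish first.
workflow = {
    "extract": set(),
    "preprocess": {"extract"},
    "feature_engineering": {"preprocess"},
    "tune": {"feature_engineering"},
    "train": {"tune"},
    "deploy": {"train"},
}

# An orchestrator runs steps in an order that respects every dependency.
execution_order = list(TopologicalSorter(workflow).static_order())
```

An orchestration service adds what this sketch leaves out, such as retries, monitoring, versioning, and lineage tracking for each step.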

Incorrect options:

Use Apache Airflow to define and manage the workflow with custom DAGs (Directed Acyclic Graphs), integrating with AWS services through operators and hooks – Apache Airflow is a powerful orchestration tool that allows you to define complex workflows using custom DAGs. However, it requires significant setup and maintenance, and while it can integrate with AWS services, it does not provide the seamless, built-in integration with SageMaker that SageMaker Pipelines offers.

Amazon Managed Workflows for Apache Airflow (Amazon MWAA):

via – https://aws.amazon.com/managed-workflows-for-apache-airflow/

Use AWS Step Functions to build a serverless workflow that integrates with SageMaker for model training and deployment, ensuring scalability and fault tolerance – AWS Step Functions is a serverless orchestration service that can integrate with SageMaker and other AWS services. However, it is more general-purpose and lacks some of the ML-specific features, such as model lineage tracking and hyperparameter tuning, that are built into SageMaker Pipelines.

Use AWS Lambda functions to manually trigger each step of the ML workflow, enabling flexible execution without needing a predefined orchestration tool – AWS Lambda is useful for triggering specific tasks, but manually managing each step of a complex ML workflow without a comprehensive orchestration tool is not scalable or maintainable. It does not provide the task dependency management, monitoring, and versioning required for an end-to-end ML workflow.

References:

https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html

https://aws.amazon.com/managed-workflows-for-apache-airflow/

Question #40

What is Feature Engineering in the context of machine learning?

A . Feature Engineering involves selecting, modifying, or creating features from raw data to improve the performance of machine learning models, and it is important because it can significantly enhance model accuracy and efficiency
B . Feature Engineering refers to the visualization of data to understand patterns, and it is important because it helps in identifying trends in the dataset
C . Feature Engineering is the process of tuning hyperparameters in a machine learning model, and it is important because it optimizes the model’s performance
D . Feature Engineering is the process of collecting raw data, and it is important because it ensures the availability of data for model training
