Google Associate Data Practitioner Practice Exams
- Exam code: Associate Data Practitioner
- Exam name: Google Cloud Associate Data Practitioner (ADP Exam)
- Certification provider: Google
- Last updated: 27.04.2025
You have a Dataproc cluster that performs batch processing on data stored in Cloud Storage. You need to schedule a daily Spark job to generate a report that will be emailed to stakeholders. You need a fully-managed solution that is easy to implement and minimizes complexity.
What should you do?
- A . Use Cloud Composer to orchestrate the Spark job and email the report.
- B . Use Dataproc workflow templates to define and schedule the Spark job, and to email the report.
- C . Use Cloud Run functions to trigger the Spark job and email the report.
- D . Use Cloud Scheduler to trigger the Spark job, and use Cloud Run functions to email the report.
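For background on the mechanics behind options B and D, here is a minimal sketch of instantiating a Dataproc workflow template from Python, for example from a Cloud Run function invoked by Cloud Scheduler. The project, region, and template names are placeholders, the template wrapping the Spark job is assumed to already exist, and emailing the report is out of scope here.

```python
from google.cloud import dataproc_v1

# Placeholder identifiers; a Cloud Scheduler job could invoke this code via a
# Cloud Run function to start the daily Spark job.
PROJECT = "my-project"
REGION = "us-central1"
TEMPLATE = "daily-report"

# The Dataproc client must point at the regional endpoint.
client = dataproc_v1.WorkflowTemplateServiceClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

# Instantiate the workflow template that wraps the Spark job and wait for it to finish.
operation = client.instantiate_workflow_template(
    name=f"projects/{PROJECT}/regions/{REGION}/workflowTemplates/{TEMPLATE}"
)
operation.result()
print("Spark job finished; the report can now be emailed to stakeholders.")
```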
You work for an online retail company. Your company collects customer purchase data in CSV files and pushes them to Cloud Storage every 10 minutes. The data needs to be transformed and loaded into BigQuery for analysis. The transformation involves cleaning the data, removing duplicates, and enriching it with product information from a separate table in BigQuery. You need to implement a low-overhead solution that initiates data processing as soon as the files are loaded into Cloud Storage.
What should you do?
- A . Use Cloud Composer sensors to detect files loading in Cloud Storage. Create a Dataproc cluster, and use a Composer task to execute a job on the cluster to process and load the data into BigQuery.
- B . Schedule a directed acyclic graph (DAG) in Cloud Composer to run hourly to batch load the data from Cloud Storage to BigQuery, and process the data in BigQuery using SQL.
- C . Use Dataflow to implement a streaming pipeline using an OBJECT_FINALIZE notification from Pub/Sub to read the data from Cloud Storage, perform the transformations, and write the data to BigQuery.
- D . Create a Cloud Data Fusion job to process and load the data from Cloud Storage into BigQuery. Create an OBJECT_FINALIZE notification in Pub/Sub, and trigger a Cloud Run function to start the Cloud Data Fusion job as soon as new files are loaded.
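As a rough illustration of the event-driven pattern that options C and D describe, the following is a minimal Apache Beam (Dataflow) sketch that consumes OBJECT_FINALIZE notifications from a Pub/Sub subscription and records the finalized objects in BigQuery. The subscription and table names are placeholders, and the actual cleaning, deduplication, and enrichment steps are only hinted at in comments.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder resources for illustration only.
SUBSCRIPTION = "projects/my-project/subscriptions/gcs-object-finalize"
TABLE = "my-project:retail.file_events"

def parse_notification(message: bytes):
    # Each OBJECT_FINALIZE notification carries the metadata of the object that
    # finished uploading. A full pipeline would read that CSV, clean it,
    # deduplicate rows, and enrich them from the product table before loading.
    event = json.loads(message.decode("utf-8"))
    yield {"bucket": event["bucket"], "object_name": event["name"]}

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadNotifications" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
        | "ParseEvents" >> beam.FlatMap(parse_notification)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            TABLE,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```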
Your organization has several datasets in BigQuery. The datasets need to be shared with your external partners so that they can run SQL queries without needing to copy the data to their own projects. You have organized each partner’s data in its own BigQuery dataset. Each partner should be able to access only their data. You want to share the data while following Google-recommended practices.
What should you do?
- A . Use Analytics Hub to create a listing on a private data exchange for each partner dataset. Allow each partner to subscribe to their respective listings.
- B . Create a Dataflow job that reads from each BigQuery dataset and pushes the data into a dedicated Pub/Sub topic for each partner. Grant each partner the pubsub.subscriber IAM role.
- C . Export the BigQuery data to a Cloud Storage bucket. Grant the partners the storage.objectUser IAM role on the bucket.
- D . Grant the partners the bigquery.user IAM role on the BigQuery project.
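To make option A more concrete, here is a sketch assuming the google-cloud-bigquery-analyticshub Python client (bigquery_analyticshub_v1): it creates a private data exchange and publishes one partner's dataset as a listing that the partner can subscribe to. All resource IDs and names are placeholders, and the exact method signatures should be verified against the installed library version.

```python
from google.cloud import bigquery_analyticshub_v1

# Placeholder project, location, and dataset names.
PARENT = "projects/my-project/locations/us"
DATASET = "projects/my-project/datasets/partner_a_sales"

client = bigquery_analyticshub_v1.AnalyticsHubServiceClient()

# Create a private data exchange to hold the partner-facing listings.
exchange = bigquery_analyticshub_v1.DataExchange(display_name="Partner data exchange")
exchange = client.create_data_exchange(
    parent=PARENT,
    data_exchange_id="partner_exchange",
    data_exchange=exchange,
)

# Publish one partner's dataset as a listing; the partner is then granted
# subscriber access on this listing only, so they can query just their own data.
listing = bigquery_analyticshub_v1.Listing(
    display_name="Partner A sales data",
    bigquery_dataset=bigquery_analyticshub_v1.Listing.BigQueryDatasetSource(dataset=DATASET),
)
listing = client.create_listing(
    parent=exchange.name,
    listing_id="partner_a_listing",
    listing=listing,
)
print(f"Created listing: {listing.name}")
```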
You are working on a data pipeline that will validate and clean incoming data before loading it into BigQuery for real-time analysis. You want to ensure that the data validation and cleaning is performed efficiently and can handle high volumes of data.
What should you do?
- A . Write custom scripts in Python to validate and clean the data outside of Google Cloud. Load the cleaned data into BigQuery.
- B . Use Cloud Run functions to trigger data validation and cleaning routines when new data arrives in Cloud Storage.
- C . Use Dataflow to create a streaming pipeline that includes validation and transformation steps.
- D . Load the raw data into BigQuery using Cloud Storage as a staging area, and use SQL queries in BigQuery to validate and clean the data.
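For a flavor of what the validation and transformation steps inside a Dataflow pipeline (option C) might look like, here is a tiny runnable Beam sketch using the local runner. The field names and rules are invented for illustration; a production pipeline would read from a streaming source and write to BigQuery instead of printing.

```python
import apache_beam as beam

def is_valid(record):
    # Reject rows that are missing required keys or have a non-positive quantity.
    required = ("order_id", "sku", "quantity")
    return all(k in record for k in required) and int(record["quantity"]) > 0

def clean(record):
    # Trim whitespace, normalize casing, and cast types before loading.
    return {
        "order_id": record["order_id"].strip(),
        "sku": record["sku"].strip().upper(),
        "quantity": int(record["quantity"]),
    }

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "SampleInput" >> beam.Create([{"order_id": " 42 ", "sku": "ab-1", "quantity": "3"}])
        | "Validate" >> beam.Filter(is_valid)
        | "Clean" >> beam.Map(clean)
        | "Print" >> beam.Map(print)
    )
```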
You are constructing a data pipeline to process sensitive customer data stored in a Cloud Storage bucket. You need to ensure that this data remains accessible, even in the event of a single-zone outage.
What should you do?
- A . Set up a Cloud CDN in front of the bucket.
- B . Enable Object Versioning on the bucket.
- C . Store the data in a multi-region bucket.
- D . Store the data in Nearline storage.
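As a small illustration of option C, this sketch creates a multi-region bucket with the Cloud Storage Python client; the project and bucket names are placeholders. Objects in a multi-region location such as "US" are stored geo-redundantly, so they stay accessible even if a single zone becomes unavailable.

```python
from google.cloud import storage

# Placeholder project and bucket names.
client = storage.Client(project="my-project")
bucket = storage.Bucket(client, name="sensitive-customer-data")
bucket.storage_class = "STANDARD"

# "US" is a multi-region location, so the data is replicated across regions
# and remains readable during a single-zone outage.
bucket = client.create_bucket(bucket, location="US")
print(f"Created bucket {bucket.name} in {bucket.location}")
```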
Your organization uses scheduled queries to perform transformations on data stored in BigQuery. You discover that one of your scheduled queries has failed. You need to troubleshoot the issue as quickly as possible.
What should you do?
- A . Navigate to the Logs Explorer page in Cloud Logging. Use filters to find the failed job, and analyze the error details.
- B . Set up a log sink using the gcloud CLI to export BigQuery audit logs to BigQuery. Query those logs to identify the error associated with the failed job ID.
- C . Request access from your admin to the BigQuery information_schema. Query the jobs view with the failed job ID, and analyze the error details.
- D . Navigate to the Scheduled queries page in the Google Cloud console. Select the failed job, and analyze the error details.
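For context on option C, this is a rough sketch of looking up a failed job's error details through the BigQuery INFORMATION_SCHEMA.JOBS view with the Python client. The project, region qualifier, and job ID are placeholders taken from the failed scheduled query's run history.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Placeholder job ID copied from the failed scheduled query's run details.
failed_job_id = "scheduled_query_abc123"

query = """
SELECT
  job_id,
  state,
  error_result.reason AS error_reason,
  error_result.message AS error_message
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE job_id = @job_id
"""

job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("job_id", "STRING", failed_job_id)]
)

# Print the error reason and message recorded for the failed job.
for row in client.query(query, job_config=job_config).result():
    print(row.job_id, row.state, row.error_reason, row.error_message)
```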