Google Associate Data Practitioner Practice Exams

- Exam code: Associate Data Practitioner
- Exam name: Google Cloud Associate Data Practitioner (ADP Exam)
- Certification provider: Google
- Last updated: 27.04.2025
You need to design a data pipeline that ingests data from CSV, Avro, and Parquet files into Cloud Storage. The data includes raw user input. You need to remove all malicious SQL injections before storing the data in BigQuery.
Which data manipulation methodology should you choose?
- A . EL
- B . ELT
- C . ETL
- D . ETLT
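As background on the transform-before-load pattern this question describes, here is a minimal Apache Beam (Python) sketch. The bucket, table, and column names and the sanitization regex are hypothetical placeholders; the point is only that raw user input is cleansed before it reaches BigQuery, not that this is the expected answer.

```python
import re

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def sanitize(row: dict) -> dict:
    # Naive illustration only: strip characters commonly used in SQL
    # injection payloads from the free-text field before loading.
    row["user_input"] = re.sub(r"(--|;|'|\")", "", row.get("user_input", ""))
    return row


options = PipelineOptions(project="my-project", temp_location="gs://my-bucket/tmp")

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadCsv" >> beam.io.ReadFromText("gs://my-bucket/raw/users.csv", skip_header_lines=1)
        | "Parse" >> beam.Map(lambda line: dict(zip(["id", "user_input"], line.split(","))))
        | "Sanitize" >> beam.Map(sanitize)
        | "LoadToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:staging.users_clean",
            schema="id:STRING,user_input:STRING",
        )
    )
```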
You work for a financial services company that handles highly sensitive data. Due to regulatory requirements, your company is required to have complete and manual control of data encryption.
Which type of keys should you recommend to use for data storage?
- A . Use customer-supplied encryption keys (CSEK).
- B . Use a dedicated third-party key management system (KMS) chosen by the company.
- C . Use Google-managed encryption keys (GMEK).
- D . Use customer-managed encryption keys (CMEK).
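For context on two of the key types listed above, the sketch below (hypothetical bucket, file, and Cloud KMS key names) shows how a customer-supplied key and a customer-managed key are attached to Cloud Storage data with the Python client. It illustrates the mechanics only and does not indicate which option the exam expects.

```python
import os

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("sensitive-financial-data")  # hypothetical bucket

# Customer-supplied encryption key (CSEK): a raw AES-256 key that only the
# customer holds and supplies with each request; Google does not store it.
csek = os.urandom(32)
blob = bucket.blob("ledger-2025-04.csv", encryption_key=csek)
blob.upload_from_filename("ledger-2025-04.csv")

# Customer-managed encryption key (CMEK): a Cloud KMS key the customer
# controls, set here as the bucket's default encryption key.
bucket.default_kms_key_name = (
    "projects/my-project/locations/us/keyRings/finance/cryptoKeys/storage-key"
)
bucket.patch()
```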
Your organization’s business analysts require near real-time access to streaming data. However, they are reporting that their dashboard queries are loading slowly. After investigating BigQuery query performance, you discover the slow dashboard queries perform several joins and aggregations.
You need to improve the dashboard loading time and ensure that the dashboard data is as up-to-date as possible.
What should you do?
- A . Disable BigQuery query result caching.
- B . Modify the schema to use parameterized data types.
- C . Create a scheduled query to calculate and store intermediate results.
- D . Create materialized views.
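To illustrate how precomputation can speed up dashboard queries over streaming data, here is a minimal BigQuery DDL sketch run through the Python client. The project, dataset, table, and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Precompute the join and aggregation once; BigQuery keeps the result
# refreshed as new rows arrive, so dashboard queries read a small table.
ddl = """
CREATE MATERIALIZED VIEW `my-project.dashboards.orders_by_region` AS
SELECT
  c.region,
  COUNT(*) AS order_count,
  SUM(o.amount) AS total_amount
FROM `my-project.sales.orders` AS o
JOIN `my-project.sales.customers` AS c
  ON o.customer_id = c.customer_id
GROUP BY c.region
"""
client.query(ddl).result()  # Wait for the DDL statement to complete.
```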
You are a data analyst working with sensitive customer data in BigQuery. You need to ensure that only authorized personnel within your organization can query this data, while following the principle of least privilege.
What should you do?
- A . Enable access control by using IAM roles.
- B . Encrypt the data by using customer-managed encryption keys (CMEK).
- C . Update dataset privileges by using the SQL GRANT statement.
- D . Export the data to Cloud Storage, and use signed URLs to authorize access.
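For reference, one common way dataset-level read access is scoped to a specific group with the BigQuery Python client looks like the sketch below. The dataset name and group address are hypothetical, and the example only shows the mechanism, not the intended answer.

```python
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.customer_data")  # hypothetical dataset

# Grant read-only access to a single authorized group, nothing broader.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="groupByEmail",
        entity_id="customer-analysts@example.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```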
Your team wants to create a monthly report to analyze inventory data that is updated daily. You need to aggregate the inventory counts by using only the most recent month of data, and save the results to be used in a Looker Studio dashboard.
What should you do?
- A . Create a materialized view in BigQuery that uses the SUM() function and the DATE_SUB() function.
- B . Create a saved query in the BigQuery console that uses the SUM() function and the DATE_SUB() function. Re-run the saved query every month, and save the results to a BigQuery table.
- C . Create a BigQuery table that uses the SUM() function and the _PARTITIONDATE filter.
- D . Create a BigQuery table that uses the SUM() function and the DATE_DIFF() function.
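The aggregation pattern the options refer to looks roughly like the following: SUM() over only the most recent month of data, selected with DATE_SUB(). The table and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Aggregate inventory counts for the trailing month only.
sql = """
SELECT
  product_id,
  SUM(inventory_count) AS monthly_inventory
FROM `my-project.inventory.daily_counts`
WHERE snapshot_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)
GROUP BY product_id
"""
for row in client.query(sql).result():
    print(row.product_id, row.monthly_inventory)
```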
You are designing a pipeline to process data files that arrive in Cloud Storage by 3:00 am each day. Data processing is performed in stages, where the output of one stage becomes the input of the next. Each stage takes a long time to run. Occasionally a stage fails, and you have to address the problem. You need to ensure that the final output is generated as quickly as possible.
What should you do?
- A . Design a Spark program that runs under Dataproc. Code the program to wait for user input when an error is detected. Rerun the last action after correcting any stage output data errors.
- B . Design the pipeline as a set of PTransforms in Dataflow. Restart the pipeline after correcting any stage output data errors.
- C . Design the workflow as a Cloud Workflow instance. Code the workflow to jump to a given stage based on an input parameter. Rerun the workflow after correcting any stage output data errors.
- D . Design the processing as a directed acyclic graph (DAG) in Cloud Composer. Clear the state of the failed task after correcting any stage output data errors.
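For orientation, a multi-stage pipeline of this shape is typically expressed as a DAG of dependent tasks. The sketch below uses hypothetical Airflow task names and commands; it relies on the fact that clearing a failed task re-runs the pipeline from that task onward rather than from the beginning.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_file_processing",
    start_date=datetime(2025, 1, 1),
    schedule_interval="0 3 * * *",  # files land in Cloud Storage by 3:00 am
    catchup=False,
) as dag:
    # Hypothetical stage commands; each stage consumes the previous stage's output.
    stage_1 = BashOperator(task_id="stage_1", bash_command="python stage_1.py")
    stage_2 = BashOperator(task_id="stage_2", bash_command="python stage_2.py")
    stage_3 = BashOperator(task_id="stage_3", bash_command="python stage_3.py")

    stage_1 >> stage_2 >> stage_3
```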
Your organization sends IoT event data to a Pub/Sub topic. Subscriber applications read and perform transformations on the messages before storing them in the data warehouse. During particularly busy times when more data is being written to the topic, you notice that the subscriber applications are not acknowledging messages within the deadline. You need to modify your pipeline to handle these activity spikes and continue to process the messages.
What should you do?
- A . Retry messages until they are acknowledged.
- B . Implement flow control on the subscribers.
- C . Forward unacknowledged messages to a dead-letter topic.
- D . Seek back to the last acknowledged message.
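One of the mechanisms referenced above, subscriber flow control, looks like this in the Pub/Sub Python client. The project, subscription, and limits are hypothetical; the settings cap how many unprocessed messages a subscriber holds at once during traffic spikes.

```python
from concurrent import futures

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "telemetry-sub")

# Limit outstanding messages and bytes so the subscriber is not overwhelmed
# during activity spikes on the topic.
flow_control = pubsub_v1.types.FlowControl(max_messages=500, max_bytes=50 * 1024 * 1024)


def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # ... transform the message and write it to the data warehouse ...
    message.ack()


streaming_pull_future = subscriber.subscribe(
    subscription_path, callback=callback, flow_control=flow_control
)
with subscriber:
    try:
        streaming_pull_future.result(timeout=60)
    except futures.TimeoutError:
        streaming_pull_future.cancel()
        streaming_pull_future.result()
```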
Your organization has a BigQuery dataset that contains sensitive employee information such as salaries and performance reviews. The payroll specialist in the HR department needs to have continuous access to aggregated performance data, but they do not need continuous access to other sensitive data. You need to grant the payroll specialist access to the performance data without granting them access to the entire dataset, using the simplest and most secure approach.
What should you do?
- A . Use authorized views to share query results with the payroll specialist.
- B . Create row-level and column-level permissions and policies on the table that contains performance data in the dataset. Provide the payroll specialist with the appropriate permission set.
- C . Create a table with the aggregated performance data. Use table-level permissions to grant access to the payroll specialist.
- D . Create a SQL query with the aggregated performance data. Export the results to an Avro file in a Cloud Storage bucket. Share the bucket with the payroll specialist.
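As background on the sharing mechanisms mentioned above, this sketch shows an authorized view pattern with the BigQuery Python client: a view exposing only aggregated data is created in a separate dataset and then authorized to read the private dataset. All project, dataset, table, and column names are hypothetical, and both datasets are assumed to already exist.

```python
from google.cloud import bigquery

client = bigquery.Client()

# 1. Create a view over only the aggregated performance data.
client.query("""
CREATE OR REPLACE VIEW `my-project.hr_shared.performance_summary` AS
SELECT employee_id, AVG(review_score) AS avg_review_score
FROM `my-project.hr_private.performance_reviews`
GROUP BY employee_id
""").result()

# 2. Authorize the view to read the private dataset. The consumer is granted
#    access to the view's dataset only, never to the underlying tables.
source_dataset = client.get_dataset("my-project.hr_private")
entries = list(source_dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role=None,
        entity_type="view",
        entity_id={
            "projectId": "my-project",
            "datasetId": "hr_shared",
            "tableId": "performance_summary",
        },
    )
)
source_dataset.access_entries = entries
client.update_dataset(source_dataset, ["access_entries"])
```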
You are predicting customer churn for a subscription-based service. You have a 50 PB historical customer dataset in BigQuery that includes demographics, subscription information, and engagement metrics. You want to build a churn prediction model with minimal overhead. You want to follow the Google-recommended approach.
What should you do?
- A . Export the data from BigQuery to a local machine. Use scikit-learn in a Jupyter notebook to build the churn prediction model.
- B . Use Dataproc to create a Spark cluster. Use the Spark MLlib within the cluster to build the churn prediction model.
- C . Create a Looker dashboard that is connected to BigQuery. Use LookML to predict churn.
- D . Use the BigQuery Python client library in a Jupyter notebook to query and preprocess the data in BigQuery. Use the CREATE MODEL statement in BigQueryML to train the churn prediction model.
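For reference, training a model entirely inside BigQuery with a CREATE MODEL statement looks roughly like this. The dataset, table, feature columns, and model type are hypothetical placeholders; the data never leaves BigQuery.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a churn classifier in place; `churned` is the hypothetical label column.
query = """
CREATE OR REPLACE MODEL `my-project.analytics.churn_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['churned']
) AS
SELECT
  age,
  subscription_plan,
  months_active,
  avg_weekly_sessions,
  churned
FROM `my-project.analytics.customer_features`
"""
client.query(query).result()  # Training runs entirely inside BigQuery.
```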
Your company is building a near real-time streaming pipeline to process JSON telemetry data from small appliances. You need to process messages arriving at a Pub/Sub topic, capitalize letters in the serial number field, and write results to BigQuery. You want to use a managed service and write a minimal amount of code for underlying transformations.
What should you do?
- A . Use a Pub/Sub to BigQuery subscription, write results directly to BigQuery, and schedule a transformation query to run every five minutes.
- B . Use a Pub/Sub to Cloud Storage subscription, write a Cloud Run service that is triggered when objects arrive in the bucket, performs the transformations, and writes the results to BigQuery.
- C . Use the “Pub/Sub to BigQuery” Dataflow template with a UDF, and write the results to BigQuery.
- D . Use a Pub/Sub push subscription, write a Cloud Run service that accepts the messages, performs the transformations, and writes the results to BigQuery.
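For context, launching a Google-provided Pub/Sub-to-BigQuery Dataflow template with a JavaScript UDF can be done through the Dataflow templates API, as in the sketch below. The project, subscription, output table, and UDF paths are hypothetical, and the UDF file itself (not shown) would uppercase the serial number field of each JSON message.

```python
from googleapiclient.discovery import build

dataflow = build("dataflow", "v1b3")

request = dataflow.projects().locations().templates().launch(
    projectId="my-project",
    location="us-central1",
    gcsPath="gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery",
    body={
        "jobName": "telemetry-to-bigquery",
        "parameters": {
            "inputSubscription": "projects/my-project/subscriptions/telemetry-sub",
            "outputTableSpec": "my-project:telemetry.appliance_events",
            # Hypothetical JavaScript UDF stored in Cloud Storage that
            # uppercases the serial number field before the write to BigQuery.
            "javascriptTextTransformGcsPath": "gs://my-bucket/udf/uppercase_serial.js",
            "javascriptTextTransformFunctionName": "transform",
        },
    },
)
response = request.execute()
print(response["job"]["id"])
```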