Let's imagine you have a BigQuery table that tracks real-time sensor data from a manufacturing plant. You want to trigger a downstream process whenever a sensor reading exceeds a certain threshold. This downstream process might be an alert system, an automated adjustment to the machinery, or a data pipeline for further analysis. Instead of constantly polling the BigQuery table, you can use the export_to_pubsub
function to push relevant data to a Pub/Sub topic in real-time.
Here's how it would work:
-
BigQuery Streaming Inserts: Sensor data is streamed into a BigQuery table.
-
BigQuery Scheduled Query: A scheduled query runs periodically (e.g., every minute) to check for sensor readings exceeding the threshold. The query might look something like this:
SELECT sensor_id, timestamp, reading FROM `your-project.your_dataset.sensor_data` WHERE reading > 1000 AND _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE)
-
Call
export_to_pubsub
within the query: If the query returns any results, you integrate theexport_to_pubsub
function within the query itself:SELECT bigfunctions.your_region.export_to_pubsub( 'your-project', 'sensor_alerts_topic', TO_JSON_STRING(t), -- Send the entire row as the message data '{"sensor_type": "temperature"}' -- Add attributes for context ) FROM ( SELECT sensor_id, timestamp, reading FROM `your-project.your_dataset.sensor_data` WHERE reading > 1000 AND _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE) ) AS t;
-
Pub/Sub triggers downstream process: The
sensor_alerts_topic
is subscribed to by a service that handles the alerts or automated actions. When a message arrives on the topic, the subscriber is triggered, and the downstream process is initiated.
Benefits:
- Real-time responsiveness: Alerts and actions are triggered as soon as the threshold is crossed, without needing to constantly poll BigQuery.
- Scalability: Pub/Sub handles the message distribution, ensuring that the downstream systems can scale to handle a large volume of alerts.
- Decoupling: BigQuery is decoupled from the downstream processes, making the system more flexible and maintainable.
- Reduced cost: Avoids the cost and latency of repeatedly querying BigQuery for new data.
This is just one example. The export_to_pubsub
function can be used in various scenarios where you need to push data from BigQuery to other systems in a real-time or near real-time fashion. Other examples include:
- New user registration: Push new user data to a Pub/Sub topic for welcome email processing.
- Order fulfillment: Notify downstream systems about new orders placed.
- Fraud detection: Push suspicious transactions to a Pub/Sub topic for real-time analysis.
- Data synchronization: Keep other systems in sync with changes happening in BigQuery.