Skip to content

Latest commit

 

History

History
53 lines (34 loc) · 2.59 KB

percentile_value.md

File metadata and controls

53 lines (34 loc) · 2.59 KB

Let's illustrate a use case for the percentile_value BigQuery function.

Scenario: You have a table storing website session durations (in seconds) for different users. You want to analyze user engagement and identify the 95th percentile of session durations. This will help you understand how long highly engaged users typically spend on your site.

Table Schema:

CREATE OR REPLACE TABLE `your_project.your_dataset.session_durations` (
  user_id INT64,
  session_duration INT64
);

INSERT INTO `your_project.your_dataset.session_durations` (user_id, session_duration) VALUES
(1, 120), (2, 300), (3, 60), (4, 1800), (5, 45), (6, 900), (7, 240), (8, 30), (9, 600), (10, 150);

Query using percentile_value:

SELECT
    bigfunctions.us.percentile_value(ARRAY_AGG(session_duration), 0.95) AS p95_session_duration
  FROM
    `your_project.your_dataset.session_durations`;

Explanation:

  1. ARRAY_AGG(session_duration): This aggregates all session durations into an array.
  2. bigfunctions.us.percentile_value(..., 0.95): This calculates the 95th percentile value from the array of session durations. Remember to replace us with your BigQuery region if different.
  3. AS p95_session_duration: This aliases the result for clarity.

Result:

The query will return a single value representing the 95th percentile of session durations. This value indicates that 95% of sessions are shorter than or equal to this duration. Let's say the result is 1500 seconds. This tells you that highly engaged users tend to have sessions lasting around 1500 seconds or less.

Benefits of using percentile_value:

  • Simplified calculation: Instead of manually implementing percentile logic, you can use this function directly.
  • Efficiency: BigQuery functions are generally optimized for performance.
  • Flexibility: You can easily change the percentile value (e.g., to calculate the median (50th percentile) or other percentiles) by adjusting the second argument.

This is a simple example. You can apply this function to any scenario where you need to calculate percentiles from an array of values within BigQuery, such as:

  • E-commerce: Analyzing product prices, order values, or customer spending.
  • Gaming: Analyzing player scores, playtime, or in-game purchases.
  • Finance: Analyzing stock prices, transaction amounts, or customer balances.
  • Healthcare: Analyzing patient wait times, treatment costs, or lengths of stay.

By using percentile_value, you can gain valuable insights into the distribution of your data and identify important thresholds or outliers.