Skip to content

Latest commit

 

History

History
28 lines (18 loc) · 1.71 KB

frequent_values.md

File metadata and controls

28 lines (18 loc) · 1.71 KB

Let's say you have a BigQuery table storing customer product reviews. Each row represents a review and includes a column named keywords which is an array of strings representing keywords extracted from the review text.

You want to identify the most frequently occurring keywords across all reviews to understand trending topics or product features that customers frequently mention.

Here's how you can use the frequent_values function:

SELECT bigfunctions.us.frequent_values(ARRAY_AGG(keywords), 0.05) AS frequent_keywords
FROM `your_project.your_dataset.your_table`

This query does the following:

  1. ARRAY_AGG(keywords): Aggregates all the keywords arrays from each review into a single array of all keywords.
  2. bigfunctions.us.frequent_values(..., 0.05): Applies the frequent_values function to this aggregated array with a frequency_threshold of 0.05. This means that only keywords that appear in at least 5% of the reviews will be returned.
  3. AS frequent_keywords: Aliases the resulting array of frequent keywords as frequent_keywords.

This will give you an array of strings containing the keywords that occur most frequently in your customer reviews, allowing you to identify important themes and trends.

Other Use Cases:

  • Log analysis: Identify frequent error messages or user actions in log data.
  • E-commerce: Find frequently purchased items together (market basket analysis).
  • Social media analysis: Detect trending hashtags or topics.
  • Genomics: Identify frequently occurring gene mutations in a population.

Essentially, any time you need to find frequently occurring elements within a large dataset of arrays, the frequent_values function can be a useful tool.