title | description |
---|---|
Tables |
Catalog of open-source Tables |
!!! note ""
**✅ The following public tables** are
- deployed in `bigfunctions` GCP project in 39 datasets for all of the 39 BigQuery regions.
- They are public, so they can be read by anyone under the eventual license given in the description of the table.
- For any question or difficulties, please read [Getting Started](../).
- If you prefer to create the tables in your own project, read [Getting Started](../).
- Found a bug? Please raise an issue [here](https://github.com/unytics/bigfunctions/issues/new/choose)
??? info "All BigFunctions Datasets >"
| Region | Dataset |
|--------|---------|
| `EU` | `bigfunctions.eu` |
| `US` | `bigfunctions.us` |
| `asia-east1` | `bigfunctions.asia_east1` |
| `asia-east2` | `bigfunctions.asia_east2` |
| `asia-northeast1` | `bigfunctions.asia_northeast1` |
| `asia-northeast2` | `bigfunctions.asia_northeast2` |
| `asia-northeast3` | `bigfunctions.asia_northeast3` |
| `asia-south1` | `bigfunctions.asia_south1` |
| `asia-south2` | `bigfunctions.asia_south2` |
| `asia-southeast1` | `bigfunctions.asia_southeast1` |
| `asia-southeast2` | `bigfunctions.asia_southeast2` |
| `australia-southeast1` | `bigfunctions.australia_southeast1` |
| `australia-southeast2` | `bigfunctions.australia_southeast2` |
| `europe-central2` | `bigfunctions.europe_central2` |
| `europe-north1` | `bigfunctions.europe_north1` |
| `europe-southwest1` | `bigfunctions.europe_southwest1` |
| `europe-west1` | `bigfunctions.europe_west1` |
| `europe-west2` | `bigfunctions.europe_west2` |
| `europe-west3` | `bigfunctions.europe_west3` |
| `europe-west4` | `bigfunctions.europe_west4` |
| `europe-west6` | `bigfunctions.europe_west6` |
| `europe-west8` | `bigfunctions.europe_west8` |
| `europe-west9` | `bigfunctions.europe_west9` |
| `europe-west12` | `bigfunctions.europe_west12` |
| `me-central1` | `bigfunctions.me_central1` |
| `me-west1` | `bigfunctions.me_west1` |
| `northamerica-northeast1` | `bigfunctions.northamerica_northeast1` |
| `northamerica-northeast2` | `bigfunctions.northamerica_northeast2` |
| `southamerica-east1` | `bigfunctions.southamerica_east1` |
| `southamerica-west1` | `bigfunctions.southamerica_west1` |
| `us-central1` | `bigfunctions.us_central1` |
| `us-east1` | `bigfunctions.us_east1` |
| `us-east4` | `bigfunctions.us_east4` |
| `us-east5` | `bigfunctions.us_east5` |
| `us-south1` | `bigfunctions.us_south1` |
| `us-west1` | `bigfunctions.us_west1` |
| `us-west2` | `bigfunctions.us_west2` |
| `us-west3` | `bigfunctions.us_west3` |
| `us-west4` | `bigfunctions.us_west4` |
Description
Contains national public holidays for every country.
Dates were issued from holidays python package with:
import holidays
countries = holidays.utils.list_supported_countries()
for country in countries:
dates = holidays.country_holidays(country, years=range(1975, 2077))
dates = sorted([str(date) for date in dates.keys()])
print('\\n'.join(f'{country},{date}' for date in dates))
Schema
Column | Type | Description |
---|---|---|
country | string | country code |
date | date | public holiday local date |
Description
Copy of bigquery-public-data.samples.natality public table.
The table was created with:
CREATE TABLE `bigfunctions.us.natality`
(
source_year INT64 NOT NULL OPTIONS(description="Four-digit year of the birth. Example: 1975."),
year INT64 OPTIONS(description="Four-digit year of the birth. Example: 1975."),
month INT64 OPTIONS(description="Month index of the date of birth, where 1=January."),
day INT64 OPTIONS(description="Day of birth, starting from 1."),
wday INT64 OPTIONS(description="Day of the week, where 1 is Sunday and 7 is Saturday."),
state STRING OPTIONS(description="The two character postal code for the state. Entries after 2004 do not include this value."),
is_male BOOL NOT NULL OPTIONS(description="TRUE if the child is male, FALSE if female."),
child_race INT64 OPTIONS(description="The race of the child. One of the following numbers:\n\n1 - White\n2 - Black\n3 - American Indian\n4 - Chinese\n5 - Japanese\n6 - Hawaiian\n7 - Filipino\n9 - Unknown/Other\n18 - Asian Indian\n28 - Korean\n39 - Samoan\n48 - Vietnamese"),
weight_pounds FLOAT64 OPTIONS(description="Weight of the child, in pounds."),
plurality INT64 OPTIONS(description="How many children were born as a result of this pregnancy. twins=2, triplets=3, and so on."),
apgar_1min INT64 OPTIONS(description="Apgar scores measure the health of a newborn child on a scale from 0-10. Value after 1 minute. Available from 1978-2002."),
apgar_5min INT64 OPTIONS(description="Apgar scores measure the health of a newborn child on a scale from 0-10. Value after 5 minutes. Available from 1978-2002."),
mother_residence_state STRING OPTIONS(description="The two-letter postal code of the mother's state of residence when the child was born."),
mother_race INT64 OPTIONS(description="Race of the mother. Same values as child_race."),
mother_age INT64 OPTIONS(description="Reported age of the mother when giving birth."),
gestation_weeks INT64 OPTIONS(description="The number of weeks of the pregnancy."),
lmp STRING OPTIONS(description="Date of the last menstrual period in the format MMDDYYYY. Unknown values are recorded as \"99\" or \"9999\"."),
mother_married BOOL OPTIONS(description="True if the mother was married when she gave birth."),
mother_birth_state STRING OPTIONS(description="The two-letter postal code of the mother's birth state."),
cigarette_use BOOL OPTIONS(description="True if the mother smoked cigarettes. Available starting 2003."),
cigarettes_per_day INT64 OPTIONS(description="Number of cigarettes smoked by the mother per day. Available starting 2003."),
alcohol_use BOOL OPTIONS(description="True if the mother used alcohol. Available starting 1989."),
drinks_per_week INT64 OPTIONS(description="Number of drinks per week consumed by the mother. Available starting 1989."),
weight_gain_pounds INT64 OPTIONS(description="Number of pounds gained by the mother during pregnancy."),
born_alive_alive INT64 OPTIONS(description="Number of children previously born to the mother who are now living."),
born_alive_dead INT64 OPTIONS(description="Number of children previously born to the mother who are now dead."),
born_dead INT64 OPTIONS(description="Number of children who were born dead (i.e. miscarriages)"),
ever_born INT64 OPTIONS(description="Total number of children to whom the woman has ever given birth (includes the current birth)."),
father_race INT64 OPTIONS(description="Race of the father. Same values as child_race."),
father_age INT64 OPTIONS(description="Age of the father when the child was born."),
record_weight INT64 OPTIONS(description="1 or 2, where 1 is a row from a full-reporting area, and 2 is a row from a 50% sample area.")
)
as
select * from `bigquery-public-data.samples.natality` tablesample system (10 percent)
Schema
Column | Type | Description |
---|---|---|
source_year | INT64 | Four-digit year of the birth. Example: 1975. |
year | INT64 | Four-digit year of the birth. Example: 1975. |
month | INT64 | Month index of the date of birth, where 1=January. |
day | INT64 | Day of birth, starting from 1. |
wday | INT64 | Day of the week, where 1 is Sunday and 7 is Saturday. |
state | STRING | The two character postal code for the state. Entries after 2004 do not include this value. |
is_male | BOOL | TRUE if the child is male, FALSE if female. |
child_race | INT64 | The race of the child. One of the following numbers: 1 - White 2 - Black 3 - American Indian 4 - Chinese 5 - Japanese 6 - Hawaiian 7 - Filipino 9 - Unknown/Other 18 - Asian Indian 28 - Korean 39 - Samoan 48 - Vietnamese |
weight_pounds | FLOAT64 | Weight of the child, in pounds. |
plurality | INT64 | How many children were born as a result of this pregnancy. twins=2, triplets=3, and so on. |
apgar_1min | INT64 | Apgar scores measure the health of a newborn child on a scale from 0-10. Value after 1 minute. Available from 1978-2002. |
apgar_5min | INT64 | Apgar scores measure the health of a newborn child on a scale from 0-10. Value after 5 minutes. Available from 1978-2002. |
mother_residence_state | STRING | The two-letter postal code of the mother's state of residence when the child was born. |
mother_race | INT64 | Race of the mother. Same values as child_race. |
mother_age | INT64 | Reported age of the mother when giving birth. |
gestation_weeks | INT64 | The number of weeks of the pregnancy. |
lmp | STRING | Date of the last menstrual period in the format MMDDYYYY. Unknown values are recorded as "99" or "9999". |
mother_married | BOOL | True if the mother was married when she gave birth. |
mother_birth_state | STRING | The two-letter postal code of the mother's birth state. |
cigarette_use | BOOL | True if the mother smoked cigarettes. Available starting 2003. |
cigarettes_per_day | INT64 | Number of cigarettes smoked by the mother per day. Available starting 2003. |
alcohol_use | BOOL | True if the mother used alcohol. Available starting 1989. |
drinks_per_week | INT64 | Number of drinks per week consumed by the mother. Available starting 1989. |
weight_gain_pounds | INT64 | Number of pounds gained by the mother during pregnancy. |
born_alive_alive | INT64 | Number of children previously born to the mother who are now living. |
born_alive_dead | INT64 | Number of children previously born to the mother who are now dead. |
born_dead | INT64 | Number of children who were born dead (i.e. miscarriages) |
ever_born | INT64 | Total number of children to whom the woman has ever given birth (includes the current birth). |
father_race | INT64 | Race of the father. Same values as child_race. |
father_age | INT64 | Age of the father when the child was born. |
record_weight | INT64 | 1 or 2, where 1 is a row from a full-reporting area, and 2 is a row from a 50% sample area. |
Description
Sales
Schema
Column | Type | Description |
---|---|---|
product_id | int64 | Product Identifier |
price | float64 | Unit Price of product |
quantity | int64 | Quantity of this product sold |
sale_date | date | Sale date |
Description
Example of a graph represented as a list of edges between pair of nodes
Schema
Column | Type | Description |
---|---|---|
node1 | int64 | node id connected to node2 |
node2 | int64 | node id connected to node1 |
Description
Days translations for each locale
This csv was generated by:
import datetime
import locale
languages = [locale for locale in locale.locale_alias if '.' not in locale and '@' not in locale]
dates = [datetime.datetime(2023, 1, day) for day in range(1, 8)]
with open('data/translated_days.csv', 'w', encoding='utf-8') as out:
for language in languages:
try:
locale.setlocale(locale.LC_TIME, language)
for k, date in enumerate(dates):
name = date.strftime("%A")
out.write(f'{language},{k + 1},{name}\n')
except:
pass
Schema
Column | Type | Description |
---|---|---|
locale | string | Locale such as 'en' or 'en_en' |
day_nb | int64 | Day as integer from 1 to 7 with Sunday as 1 |
translated_day | string | Translated day name for locale |
Description
Months translations for each locale
This csv was generated by:
import datetime
import locale
languages = [locale for locale in locale.locale_alias if '.' not in locale and '@' not in locale]
with open('translated_months.csv', 'w', encoding='utf-8') as out:
for language in languages:
try:
locale.setlocale(locale.LC_TIME, language)
for month in range(0, 12):
month = month + 1
date = datetime.datetime(2023, month, 1)
name = date.strftime("%B")
out.write(f'{language},{month},{name}\n')
except:
pass
Schema
Column | Type | Description |
---|---|---|
locale | string | Locale such as 'en' or 'en_en' |
month_nb | int64 | Month as integer from 1 to 12 |
translated_month | string | Translated month name for locale |