Merged
15 changes: 15 additions & 0 deletions packages/gcp/_dev/build/docs/dataproc.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# Dataproc

## Metrics

The `dataproc` dataset fetches metrics from [Dataproc](https://cloud.google.com/dataproc/) in Google Cloud Platform. It contains all metrics exported from the [GCP Dataproc Monitoring API](https://cloud.google.com/monitoring/api/metrics_gcp#gcp-dataproc).

You can specify a single region, such as `us-central1`, to fetch metrics from. Be aware that GCP Dataproc is a regional service. If no region is specified, metrics are returned from all regions.
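
As a rough illustration of the region setting, the sketch below shows what the rendered stream configuration might look like when a region is set. The keys come from the `stream.yml.hbs` template in this change; every value (`60s`, `my-gcp-project`, the credentials path, `false`) is a hypothetical placeholder, not a documented default.

```yaml
# Hypothetical rendered output of stream.yml.hbs; all values are placeholders.
metricsets: ["dataproc"]
period: 60s                       # assumed collection interval
project_id: my-gcp-project        # assumed project ID
credentials_file_path: /path/to/credentials.json  # emitted only when credentials_file is set
region: us-central1               # emitted only when region is set
exclude_labels: false             # assumed value
```

Because the template wraps `region` in an `{{#if region}}` block, leaving the setting empty simply omits the `region:` line from the rendered configuration.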

## Sample Event

{{event "dataproc"}}

## Exported fields

{{fields "dataproc"}}
5 changes: 5 additions & 0 deletions packages/gcp/changelog.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "2.9.0"
changes:
- description: Add GCP Dataproc Data stream
type: enhancement
link: https://github.com/elastic/integrations/pull/3789
- version: "2.8.0"
changes:
- description: Add GCP GKE Data Stream
13 changes: 13 additions & 0 deletions packages/gcp/data_stream/dataproc/agent/stream/stream.yml.hbs
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
metricsets: ["dataproc"]
period: {{period}}
project_id: {{project_id}}
{{#if credentials_file}}
credentials_file_path: {{credentials_file}}
{{/if}}
{{#if credentials_json}}
credentials_json: '{{credentials_json}}'
{{/if}}
{{#if region}}
region: {{region}}
{{/if}}
exclude_labels: {{exclude_labels}}
198 changes: 198 additions & 0 deletions packages/gcp/data_stream/dataproc/fields/agent.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,198 @@
- name: cloud
title: Cloud
group: 2
description: Fields related to the cloud or infrastructure the events are coming from.
footnote: 'Examples: If Metricbeat is running on a GCP Compute VM and fetches data from its host, the cloud info contains the data about this machine. If Metricbeat runs on a remote machine outside the cloud and fetches data from a service running in the cloud, the field contains cloud data from the machine the service is running on.'
type: group
fields:
- name: account.id
level: extended
type: keyword
ignore_above: 1024
description: 'The cloud account or organization id used to identify different entities in a multi-tenant environment.

Examples: AWS account id, Google Cloud ORG Id, or other unique identifier.'
example: 666777888999
- name: availability_zone
level: extended
type: keyword
ignore_above: 1024
description: Availability zone in which this host is running.
example: us-east-1c
- name: instance.id
level: extended
type: keyword
ignore_above: 1024
description: Instance ID of the host machine.
example: i-1234567890abcdef0
- name: instance.name
level: extended
type: keyword
ignore_above: 1024
description: Instance name of the host machine.
- name: machine.type
level: extended
type: keyword
ignore_above: 1024
description: Machine type of the host machine.
example: t2.medium
- name: provider
level: extended
type: keyword
ignore_above: 1024
description: Name of the cloud provider. Example values are aws, azure, gcp, or digitalocean.
example: aws
- name: region
level: extended
type: keyword
ignore_above: 1024
description: Region in which this host is running.
example: us-east-1
- name: project.id
type: keyword
description: Name of the project in Google Cloud.
- name: image.id
type: keyword
description: Image ID for the cloud instance.
- name: container
title: Container
group: 2
description: 'Container fields are used for meta information about the specific container that is the source of information.

These fields help correlate data based on containers from any runtime.'
type: group
fields:
- name: id
level: core
type: keyword
ignore_above: 1024
description: Unique container id.
- name: image.name
level: extended
type: keyword
ignore_above: 1024
description: Name of the image the container was built on.
- name: labels
level: extended
type: object
object_type: keyword
description: Image labels.
- name: name
level: extended
type: keyword
ignore_above: 1024
description: Container name.
- name: host
title: Host
group: 2
description: 'A host is defined as a general computing instance.

ECS host.* fields should be populated with details about the host on which the event happened, or from which the measurement was taken. Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes.'
type: group
fields:
- name: architecture
level: core
type: keyword
ignore_above: 1024
description: Operating system architecture.
example: x86_64
- name: domain
level: extended
type: keyword
ignore_above: 1024
description: 'Name of the domain of which the host is a member.

For example, on Windows this could be the host''s Active Directory domain or NetBIOS domain name. For Linux this could be the domain of the host''s LDAP provider.'
example: CONTOSO
default_field: false
- name: hostname
level: core
type: keyword
ignore_above: 1024
description: 'Hostname of the host.

It normally contains what the `hostname` command returns on the host machine.'
- name: id
level: core
type: keyword
ignore_above: 1024
description: 'Unique host id.

As hostname is not always unique, use values that are meaningful in your environment.

Example: The current usage of `beat.name`.'
- name: ip
level: core
type: ip
description: Host ip addresses.
- name: mac
level: core
type: keyword
ignore_above: 1024
description: Host mac addresses.
- name: name
level: core
type: keyword
ignore_above: 1024
description: 'Name of the host.

It can contain what `hostname` returns on Unix systems, the fully qualified domain name, or a name specified by the user. The sender decides which value to use.'
- name: os.family
level: extended
type: keyword
ignore_above: 1024
description: OS family (such as redhat, debian, freebsd, windows).
example: debian
- name: os.kernel
level: extended
type: keyword
ignore_above: 1024
description: Operating system kernel version as a raw string.
example: 4.4.0-112-generic
- name: os.name
level: extended
type: keyword
ignore_above: 1024
multi_fields:
- name: text
type: text
norms: false
default_field: false
description: Operating system name, without the version.
example: Mac OS X
- name: os.platform
level: extended
type: keyword
ignore_above: 1024
description: Operating system platform (such as centos, ubuntu, windows).
example: darwin
- name: os.version
level: extended
type: keyword
ignore_above: 1024
description: Operating system version as a raw string.
example: 10.14.1
- name: type
level: core
type: keyword
ignore_above: 1024
description: 'Type of host.

For Cloud providers this can be the machine type like `t2.medium`. If vm, this could be the container, for example, or other information meaningful in your environment.'
- name: containerized
type: boolean
description: >
If the host is a container.

- name: os.build
type: keyword
example: "18D109"
description: >
OS build information.

- name: os.codename
type: keyword
example: "stretch"
description: >
OS codename, if any.

20 changes: 20 additions & 0 deletions packages/gcp/data_stream/dataproc/fields/base-fields.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
- name: data_stream.type
type: constant_keyword
description: Data stream type.
- name: data_stream.dataset
type: constant_keyword
description: Data stream dataset.
- name: data_stream.namespace
type: constant_keyword
description: Data stream namespace.
- name: '@timestamp'
type: date
description: Event timestamp.
- name: event.module
type: constant_keyword
description: Event module
value: gcp
- name: event.dataset
type: constant_keyword
description: Event dataset
value: gcp.dataproc
24 changes: 24 additions & 0 deletions packages/gcp/data_stream/dataproc/fields/ecs.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
- external: ecs
name: cloud
- external: ecs
name: cloud.account.id
- external: ecs
name: cloud.account.name
- external: ecs
name: cloud.availability_zone
- external: ecs
name: cloud.instance.id
- external: ecs
name: cloud.machine.type
- external: ecs
name: cloud.provider
- external: ecs
name: cloud.region
- external: ecs
name: ecs.version
- external: ecs
name: error
- external: ecs
name: error.message
- external: ecs
name: service.type
74 changes: 74 additions & 0 deletions packages/gcp/data_stream/dataproc/fields/fields.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
- name: gcp.dataproc
description: Google Cloud Dataproc metrics
type: group
fields:
- name: batch.spark.executors.count
type: long
description: Indicates the number of Batch Spark executors.
- name: cluster.hdfs.datanodes.count
type: long
description: Indicates the number of HDFS DataNodes that are running inside a cluster.
- name: cluster.hdfs.storage_capacity.value
type: double
description: Indicates the capacity, in GB, of the HDFS system running on the cluster.
- name: cluster.hdfs.storage_utilization.value
type: double
description: The percentage of HDFS storage currently used.
- name: cluster.hdfs.unhealthy_blocks.count
type: long
description: Indicates the number of unhealthy blocks inside the cluster.
- name: cluster.job.failed.count
type: long
description: Indicates the number of jobs that have failed on a cluster.
- name: cluster.job.running.count
type: long
description: Indicates the number of jobs that are running on a cluster.
- name: cluster.job.submitted.count
type: long
description: Indicates the number of jobs that have been submitted to a cluster.
- name: cluster.operation.failed.count
type: long
description: Indicates the number of operations that have failed on a cluster.
- name: cluster.operation.running.count
type: long
description: Indicates the number of operations that are running on a cluster.
- name: cluster.operation.submitted.count
type: long
description: Indicates the number of operations that have been submitted to a cluster.
- name: cluster.yarn.allocated_memory_percentage.value
type: double
description: The percentage of YARN memory that is allocated.
- name: cluster.yarn.apps.count
type: long
description: Indicates the number of active YARN applications.
- name: cluster.yarn.containers.count
type: long
description: Indicates the number of YARN containers.
- name: cluster.yarn.memory_size.value
type: double
description: Indicates the YARN memory size in GB.
- name: cluster.yarn.nodemanagers.count
type: long
description: Indicates the number of YARN NodeManagers running inside the cluster.
- name: cluster.yarn.pending_memory_size.value
type: double
description: The current memory request, in GB, that is pending to be fulfilled by the scheduler.
- name: cluster.yarn.virtual_cores.count
type: long
description: Indicates the number of virtual cores in YARN.
- name: cluster.job.completion_time.value
type: object
object_type: histogram
description: The time jobs took to complete from the time the user submits a job to the time Dataproc reports it is completed.
- name: cluster.job.duration.value
type: object
object_type: histogram
description: The time jobs have spent in a given state.
- name: cluster.operation.completion_time.value
type: object
object_type: histogram
description: The time operations took to complete from the time the user submits an operation to the time Dataproc reports it is completed.
- name: cluster.operation.duration.value
type: object
object_type: histogram
description: The time operations have spent in a given state.
31 changes: 31 additions & 0 deletions packages/gcp/data_stream/dataproc/fields/package-fields.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
- name: gcp
description: >-
GCP module
fields:
- name: labels
type: object
description: >-
GCP monitoring metrics labels
fields:
- name: user.*
type: object
object_type: keyword
- name: metadata.*
type: object
object_type: keyword
- name: metrics.*
type: object
object_type: keyword
- name: system.*
type: object
object_type: keyword
- name: resource.*
type: object
object_type: keyword
- name: "metrics.*.*.*.*"
type: object
object_type: double
object_type_mapping_type: "*"
description: >
Metrics returned from the Google Cloud API query.
