Get
Gets the resource representation for a cluster in a project.
Authorization
To use this building block, you must grant access to at least one of the following scopes:
- View and manage your data across Google Cloud Platform services
Input
This building block consumes 3 input parameters:
| Name | Format | Description |
|---|---|---|
| projectId | STRING | Required. The ID of the Google Cloud Platform project that the cluster belongs to. |
| region | STRING | Required. The Cloud Dataproc region in which to handle the request. |
| clusterName | STRING | Required. The cluster name. |
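The three inputs map onto the path of the public Dataproc v1 REST call `GET https://dataproc.googleapis.com/v1/projects/{projectId}/regions/{region}/clusters/{clusterName}`. The scope listed under Authorization corresponds to the OAuth 2.0 scope `https://www.googleapis.com/auth/cloud-platform`. The sketch below shows how such a request could be prepared with the Python standard library. It assumes you already hold an access token with that scope. The function names are hypothetical, and the building block makes this call for you.

```python
from urllib.parse import quote
from urllib.request import Request

# OAuth 2.0 scope behind "View and manage your data across Google Cloud Platform services".
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"

# Base URL of the Dataproc v1 REST API.
DATAPROC_V1 = "https://dataproc.googleapis.com/v1"


def cluster_get_url(project_id: str, region: str, cluster_name: str) -> str:
    """Build the clusters.get URL from the three required input parameters."""
    inputs = (("projectId", project_id), ("region", region), ("clusterName", cluster_name))
    for name, value in inputs:
        if not value:
            raise ValueError(f"{name} is required")
    # Percent-encode each path segment so unexpected characters cannot alter the path.
    return (
        f"{DATAPROC_V1}/projects/{quote(project_id, safe='')}"
        f"/regions/{quote(region, safe='')}"
        f"/clusters/{quote(cluster_name, safe='')}"
    )


def build_get_request(project_id: str, region: str, cluster_name: str, access_token: str) -> Request:
    """Prepare (but do not send) an authenticated GET request for the cluster."""
    url = cluster_get_url(project_id, region, cluster_name)
    return Request(url, headers={"Authorization": f"Bearer {access_token}"}, method="GET")
```

To send the request, pass it to `urllib.request.urlopen`. How you obtain the token (for example, from the gcloud CLI or a Google client library) is outside the scope of this sketch.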
Output
This building block provides 106 output parameters:
| Name | Format | Description |
|---|---|---|
| labels | OBJECT | Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty but, if present, must contain 1 to 63 characters and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a cluster. |
| labels.customKey.value | STRING | Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty but, if present, must contain 1 to 63 characters and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a cluster. |
| metrics | OBJECT | Contains cluster daemon metrics, such as HDFS and YARN stats. Beta Feature: This report is available for testing purposes only. It may be changed before final release. |
| metrics.hdfsMetrics | OBJECT | The HDFS metrics. |
| metrics.hdfsMetrics.customKey.value | INTEGER | The HDFS metrics. |
| metrics.yarnMetrics | OBJECT | The YARN metrics. |
| metrics.yarnMetrics.customKey.value | INTEGER | The YARN metrics. |
| status | OBJECT | The status of a cluster and its instances. |
| status.detail | STRING | Output only. Optional details of the cluster's state. |
| status.state | ENUMERATION | Output only. The cluster's state. |
| status.stateStartTime | ANY | Output only. Time when this state was entered. |
| status.substate | ENUMERATION | Output only. Additional state information that includes status reported by the agent. |
| statusHistory[] | OBJECT | The status of a cluster and its instances. |
| statusHistory[].detail | STRING | Output only. Optional details of the cluster's state. |
| statusHistory[].state | ENUMERATION | Output only. The cluster's state. |
| statusHistory[].stateStartTime | ANY | Output only. Time when this state was entered. |
| statusHistory[].substate | ENUMERATION | Output only. Additional state information that includes status reported by the agent. |
| config | OBJECT | The cluster config. |
| config.workerConfig | OBJECT | Optional. The config settings for Compute Engine resources in an instance group, such as a master or worker group. |
| config.workerConfig.instanceNames[] | STRING | |
| config.workerConfig.accelerators[] | OBJECT | Specifies the type and number of accelerator cards attached to the instances of an instance group. See GPUs on Compute Engine. |
| config.workerConfig.accelerators[].acceleratorCount | INTEGER | The number of accelerator cards of this type exposed to this instance. |
| config.workerConfig.accelerators[].acceleratorTypeUri | STRING | Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes. Examples: https://www.googleapis.com/compute/beta/projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80, projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80, and nvidia-tesla-k80. Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example, nvidia-tesla-k80. |
| config.workerConfig.numInstances | INTEGER | Optional. The number of VM instances in the instance group. For master instance groups, must be set to 1. |
| config.workerConfig.diskConfig | OBJECT | Specifies the config of disk options for a group of VM instances. |
| config.workerConfig.diskConfig.bootDiskType | STRING | Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-ssd" (Persistent Disk Solid State Drive) or "pd-standard" (Persistent Disk Hard Disk Drive). |
| config.workerConfig.diskConfig.numLocalSsds | INTEGER | Optional. Number of attached SSDs, from 0 to 4 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS (https://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html) data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries. |
| config.workerConfig.diskConfig.bootDiskSizeGb | INTEGER | Optional. Size in GB of the boot disk (default is 500 GB). |
| config.workerConfig.managedGroupConfig | OBJECT | Specifies the resources used to actively manage an instance group. |
| config.workerConfig.managedGroupConfig.instanceGroupManagerName | STRING | Output only. The name of the Instance Group Manager for this group. |
| config.workerConfig.managedGroupConfig.instanceTemplateName | STRING | Output only. The name of the Instance Template used for the Managed Instance Group. |
| config.workerConfig.isPreemptible | BOOLEAN | Optional. Specifies that this instance group contains preemptible instances. |
| config.workerConfig.imageUri | STRING | Optional. The Compute Engine image resource used for cluster instances. It can be specified or may be inferred from SoftwareConfig.image_version. |
| config.workerConfig.machineTypeUri | STRING | Optional. The Compute Engine machine type used for cluster instances. A full URL, a partial URI, or a short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2, projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2, and n1-standard-2. Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example, n1-standard-2. |
| config.gceClusterConfig | OBJECT | Common config settings for resources of Compute Engine cluster instances, applicable to all instances in the cluster. |
| config.gceClusterConfig.tags[] | STRING | |
| config.gceClusterConfig.serviceAccount | STRING | Optional. The service account of the instances. Defaults to the default Compute Engine service account. Custom service accounts need permissions equivalent to the following IAM roles: roles/logging.logWriter and roles/storage.objectAdmin (see https://cloud.google.com/compute/docs/access/service-accounts#custom_service_accounts for more information). Example: [account_id]@[project_id].iam.gserviceaccount.com. |
| config.gceClusterConfig.subnetworkUri | STRING | Optional. The Compute Engine subnetwork to be used for machine communications. Cannot be specified with network_uri. A full URL, a partial URI, or a short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/regions/us-east1/subnetworks/sub0, projects/[project_id]/regions/us-east1/subnetworks/sub0, and sub0. |
| config.gceClusterConfig.networkUri | STRING | Optional. The Compute Engine network to be used for machine communications. Cannot be specified with subnetwork_uri. If neither network_uri nor subnetwork_uri is specified, the "default" network of the project is used, if it exists. Cannot be a "Custom Subnet Network" (see Using Subnetworks for more information). A full URL, a partial URI, or a short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/regions/global/default, projects/[project_id]/regions/global/default, and default. |
| config.gceClusterConfig.zoneUri | STRING | Optional. The zone where the Compute Engine cluster will be located. On a create request, it is required in the "global" region. If omitted in a non-global Cloud Dataproc region, the service will pick a zone in the corresponding Compute Engine region. On a get request, the zone is always present. A full URL, a partial URI, or a short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone], projects/[project_id]/zones/[zone], and us-central1-f. |
| config.gceClusterConfig.internalIpOnly | BOOLEAN | Optional. If true, all instances in the cluster will only have internal IP addresses. By default, clusters are not restricted to internal IP addresses and will have ephemeral external IP addresses assigned to each instance. This internal_ip_only restriction can only be enabled for subnetwork-enabled networks, and all off-cluster dependencies must be configured to be accessible without external IP addresses. |
| config.gceClusterConfig.metadata | OBJECT | The Compute Engine metadata entries to add to all instances (see Project and instance metadata (https://cloud.google.com/compute/docs/storing-retrieving-metadata#project_and_instance_metadata)). |
| config.gceClusterConfig.metadata.customKey.value | STRING | The Compute Engine metadata entries to add to all instances (see Project and instance metadata (https://cloud.google.com/compute/docs/storing-retrieving-metadata#project_and_instance_metadata)). |
| config.gceClusterConfig.serviceAccountScopes[] | STRING | |
| config.softwareConfig | OBJECT | Specifies the selection and config of software inside the cluster. |
| config.softwareConfig.imageVersion | STRING | Optional. The version of software inside the cluster. It must be one of the supported Cloud Dataproc Versions, such as "1.2" (including a subminor version, such as "1.2.29"), or the "preview" version. If unspecified, it defaults to the latest Debian version. |
| config.softwareConfig.properties | OBJECT | Optional. The properties to set on daemon config files. Property keys are specified in prefix:property format, for example core:hadoop.tmp.dir. The supported prefixes and their mappings are: capacity-scheduler: capacity-scheduler.xml; core: core-site.xml; distcp: distcp-default.xml; hdfs: hdfs-site.xml; hive: hive-site.xml; mapred: mapred-site.xml; pig: pig.properties; spark: spark-defaults.conf; yarn: yarn-site.xml. For more information, see Cluster properties. |
| config.softwareConfig.properties.customKey.value | STRING | Optional. The properties to set on daemon config files. Property keys are specified in prefix:property format, for example core:hadoop.tmp.dir. The supported prefixes and their mappings are: capacity-scheduler: capacity-scheduler.xml; core: core-site.xml; distcp: distcp-default.xml; hdfs: hdfs-site.xml; hive: hive-site.xml; mapred: mapred-site.xml; pig: pig.properties; spark: spark-defaults.conf; yarn: yarn-site.xml. For more information, see Cluster properties. |
| config.softwareConfig.optionalComponents[] | ENUMERATION | |
| config.masterConfig | OBJECT | Optional. The config settings for Compute Engine resources in an instance group, such as a master or worker group. |
| config.masterConfig.instanceNames[] | STRING | |
| config.masterConfig.accelerators[] | OBJECT | Specifies the type and number of accelerator cards attached to the instances of an instance group. See GPUs on Compute Engine. |
| config.masterConfig.accelerators[].acceleratorCount | INTEGER | The number of accelerator cards of this type exposed to this instance. |
| config.masterConfig.accelerators[].acceleratorTypeUri | STRING | Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes. Examples: https://www.googleapis.com/compute/beta/projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80, projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80, and nvidia-tesla-k80. Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example, nvidia-tesla-k80. |
| config.masterConfig.numInstances | INTEGER | Optional. The number of VM instances in the instance group. For master instance groups, must be set to 1. |
| config.masterConfig.diskConfig | OBJECT | Specifies the config of disk options for a group of VM instances. |
| config.masterConfig.diskConfig.bootDiskType | STRING | Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-ssd" (Persistent Disk Solid State Drive) or "pd-standard" (Persistent Disk Hard Disk Drive). |
| config.masterConfig.diskConfig.numLocalSsds | INTEGER | Optional. Number of attached SSDs, from 0 to 4 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS (https://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html) data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries. |
| config.masterConfig.diskConfig.bootDiskSizeGb | INTEGER | Optional. Size in GB of the boot disk (default is 500 GB). |
| config.masterConfig.managedGroupConfig | OBJECT | Specifies the resources used to actively manage an instance group. |
| config.masterConfig.managedGroupConfig.instanceGroupManagerName | STRING | Output only. The name of the Instance Group Manager for this group. |
| config.masterConfig.managedGroupConfig.instanceTemplateName | STRING | Output only. The name of the Instance Template used for the Managed Instance Group. |
| config.masterConfig.isPreemptible | BOOLEAN | Optional. Specifies that this instance group contains preemptible instances. |
| config.masterConfig.imageUri | STRING | Optional. The Compute Engine image resource used for cluster instances. It can be specified or may be inferred from SoftwareConfig.image_version. |
| config.masterConfig.machineTypeUri | STRING | Optional. The Compute Engine machine type used for cluster instances. A full URL, a partial URI, or a short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2, projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2, and n1-standard-2. Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example, n1-standard-2. |
| config.secondaryWorkerConfig | OBJECT | Optional. The config settings for Compute Engine resources in an instance group, such as a master or worker group. |
| config.secondaryWorkerConfig.instanceNames[] | STRING | |
| config.secondaryWorkerConfig.accelerators[] | OBJECT | Specifies the type and number of accelerator cards attached to the instances of an instance group. See GPUs on Compute Engine. |
| config.secondaryWorkerConfig.accelerators[].acceleratorCount | INTEGER | The number of accelerator cards of this type exposed to this instance. |
| config.secondaryWorkerConfig.accelerators[].acceleratorTypeUri | STRING | Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes. Examples: https://www.googleapis.com/compute/beta/projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80, projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80, and nvidia-tesla-k80. Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example, nvidia-tesla-k80. |
| config.secondaryWorkerConfig.numInstances | INTEGER | Optional. The number of VM instances in the instance group. For master instance groups, must be set to 1. |
| config.secondaryWorkerConfig.diskConfig | OBJECT | Specifies the config of disk options for a group of VM instances. |
| config.secondaryWorkerConfig.diskConfig.bootDiskType | STRING | Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-ssd" (Persistent Disk Solid State Drive) or "pd-standard" (Persistent Disk Hard Disk Drive). |
| config.secondaryWorkerConfig.diskConfig.numLocalSsds | INTEGER | Optional. Number of attached SSDs, from 0 to 4 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS (https://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html) data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries. |
| config.secondaryWorkerConfig.diskConfig.bootDiskSizeGb | INTEGER | Optional. Size in GB of the boot disk (default is 500 GB). |
| config.secondaryWorkerConfig.managedGroupConfig | OBJECT | Specifies the resources used to actively manage an instance group. |
| config.secondaryWorkerConfig.managedGroupConfig.instanceGroupManagerName | STRING | Output only. The name of the Instance Group Manager for this group. |
| config.secondaryWorkerConfig.managedGroupConfig.instanceTemplateName | STRING | Output only. The name of the Instance Template used for the Managed Instance Group. |
| config.secondaryWorkerConfig.isPreemptible | BOOLEAN | Optional. Specifies that this instance group contains preemptible instances. |
| config.secondaryWorkerConfig.imageUri | STRING | Optional. The Compute Engine image resource used for cluster instances. It can be specified or may be inferred from SoftwareConfig.image_version. |
| config.secondaryWorkerConfig.machineTypeUri | STRING | Optional. The Compute Engine machine type used for cluster instances. A full URL, a partial URI, or a short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2, projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2, and n1-standard-2. Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example, n1-standard-2. |
| config.encryptionConfig | OBJECT | Encryption settings for the cluster. |
| config.encryptionConfig.gcePdKmsKeyName | STRING | Optional. The Cloud KMS key name to use for PD disk encryption for all instances in the cluster. |
| config.securityConfig | OBJECT | Security-related configuration, including Kerberos. |
| config.securityConfig.kerberosConfig | OBJECT | Specifies Kerberos-related configuration. |
| config.securityConfig.kerberosConfig.keystoreUri | STRING | Optional. The Cloud Storage URI of the keystore file used for SSL encryption. If not provided, Dataproc will provide a self-signed certificate. |
| config.securityConfig.kerberosConfig.keyPasswordUri | STRING | Optional. The Cloud Storage URI of a KMS-encrypted file containing the password to the user-provided key. For the self-signed certificate, this password is generated by Dataproc. |
| config.securityConfig.kerberosConfig.keystorePasswordUri | STRING | Optional. The Cloud Storage URI of a KMS-encrypted file containing the password to the user-provided keystore. For the self-signed certificate, this password is generated by Dataproc. |
| config.securityConfig.kerberosConfig.crossRealmTrustAdminServer | STRING | Optional. The admin server (IP or hostname) for the remote trusted realm in a cross-realm trust relationship. |
| config.securityConfig.kerberosConfig.kdcDbKeyUri | STRING | Optional. The Cloud Storage URI of a KMS-encrypted file containing the master key of the KDC database. |
| config.securityConfig.kerberosConfig.truststorePasswordUri | STRING | Optional. The Cloud Storage URI of a KMS-encrypted file containing the password to the user-provided truststore. For the self-signed certificate, this password is generated by Dataproc. |
| config.securityConfig.kerberosConfig.enableKerberos | BOOLEAN | Optional. Flag to indicate whether to Kerberize the cluster. |
| config.securityConfig.kerberosConfig.truststoreUri | STRING | Optional. The Cloud Storage URI of the truststore file used for SSL encryption. If not provided, Dataproc will provide a self-signed certificate. |
| config.securityConfig.kerberosConfig.crossRealmTrustRealm | STRING | Optional. The remote realm the Dataproc on-cluster KDC will trust, should the user enable cross-realm trust. |
| config.securityConfig.kerberosConfig.rootPrincipalPasswordUri | STRING | Required. The Cloud Storage URI of a KMS-encrypted file containing the root principal password. |
| config.securityConfig.kerberosConfig.kmsKeyUri | STRING | Required. The URI of the KMS key used to encrypt various sensitive files. |
| config.securityConfig.kerberosConfig.crossRealmTrustKdc | STRING | Optional. The KDC (IP or hostname) for the remote trusted realm in a cross-realm trust relationship. |
| config.securityConfig.kerberosConfig.crossRealmTrustSharedPasswordUri | STRING | Optional. The Cloud Storage URI of a KMS-encrypted file containing the shared password between the on-cluster Kerberos realm and the remote trusted realm, in a cross-realm trust relationship. |
| config.securityConfig.kerberosConfig.tgtLifetimeHours | INTEGER | Optional. The lifetime of the ticket granting ticket, in hours. If not specified or set to 0, the default value of 10 hours is used. |
| config.initializationActions[] | OBJECT | Specifies an executable to run on a fully configured node and a timeout period for executable completion. |
| config.initializationActions[].executableFile | STRING | Required. Cloud Storage URI of the executable file. |
| config.initializationActions[].executionTimeout | ANY | Optional. Amount of time the executable has to complete. Default is 10 minutes. If the executable has not completed by the end of the timeout period, cluster creation fails with an explanatory error message that names the executable that caused the error and the exceeded timeout period. |
| config.configBucket | STRING | Optional. A Google Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Google Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Cloud Dataproc staging bucket). |
| clusterName | STRING | Required. The cluster name. Cluster names within a project must be unique. Names of deleted clusters can be reused. |
| clusterUuid | STRING | Output only. A cluster UUID (Universally Unique Identifier). Cloud Dataproc generates this value when it creates the cluster. |
| projectId | STRING | Required. The Google Cloud Platform project ID that the cluster belongs to. |
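The output is a nested JSON object, and each dotted name in the table above is a path into it. The sketch below assumes the response body has already been parsed into a Python dict (for example with `json.loads`). It also assumes that fields the API leaves out should fall back to the defaults shown. The helpers `summarize_cluster`, `short_name`, and `split_property_key` are hypothetical. `short_name` reduces a `machineTypeUri` or `acceleratorTypeUri` to the short form required by the Auto Zone exception. `split_property_key` applies the prefix-to-file mapping listed for `config.softwareConfig.properties`.

```python
from typing import Any, Dict, List, Tuple

# Instance groups documented under config in the output table.
GROUPS = ("masterConfig", "workerConfig", "secondaryWorkerConfig")

# Prefix-to-file mapping copied from the config.softwareConfig.properties description.
PROPERTY_FILES = {
    "capacity-scheduler": "capacity-scheduler.xml",
    "core": "core-site.xml",
    "distcp": "distcp-default.xml",
    "hdfs": "hdfs-site.xml",
    "hive": "hive-site.xml",
    "mapred": "mapred-site.xml",
    "pig": "pig.properties",
    "spark": "spark-defaults.conf",
    "yarn": "yarn-site.xml",
}


def short_name(uri: str) -> str:
    """Reduce a full URL or partial URI to its last path segment (the short name)."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def split_property_key(key: str) -> Tuple[str, str]:
    """Map a 'prefix:property' key to (config file, property name)."""
    prefix, sep, prop = key.partition(":")
    if not sep or not prop or prefix not in PROPERTY_FILES:
        raise ValueError(f"unsupported property key: {key!r}")
    return PROPERTY_FILES[prefix], prop


def summarize_cluster(cluster: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a few commonly used fields from a parsed clusters.get response."""
    config = cluster.get("config", {})
    groups: Dict[str, Dict[str, Any]] = {}
    for group in GROUPS:
        cfg = config.get(group)
        if cfg is None:
            continue  # for example, a cluster without secondary workers
        groups[group] = {
            "numInstances": cfg.get("numInstances", 0),
            "machineType": short_name(cfg.get("machineTypeUri", "")),
            "isPreemptible": cfg.get("isPreemptible", False),
        }
    history: List[Dict[str, Any]] = cluster.get("statusHistory", [])
    return {
        "name": cluster.get("clusterName"),
        "state": cluster.get("status", {}).get("state"),
        # Earlier states, in the order they appear in the response.
        "previousStates": [entry.get("state") for entry in history],
        "groups": groups,
        "totalInstances": sum(g["numInstances"] for g in groups.values()),
    }
```

The state values in any sample data you test with (such as `RUNNING`) are illustrative. Check the `status.state` enumeration in the Dataproc API reference for the full list.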
= Parameter name
= Format
|
labels OBJECT Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a cluster |
|
labels.customKey.value STRING Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a cluster |
|
metrics OBJECT Contains cluster daemon metrics, such as HDFS and YARN stats.Beta Feature: This report is available for testing purposes only. It may be changed before final release |
|
metrics.hdfsMetrics OBJECT The HDFS metrics |
|
metrics.hdfsMetrics.customKey.value INTEGER The HDFS metrics |
|
metrics.yarnMetrics OBJECT The YARN metrics |
|
metrics.yarnMetrics.customKey.value INTEGER The YARN metrics |
|
status OBJECT The status of a cluster and its instances |
|
status.detail STRING Output only. Optional details of cluster's state |
|
status.state ENUMERATION Output only. The cluster's state |
|
status.stateStartTime ANY Output only. Time when this state was entered |
|
status.substate ENUMERATION Output only. Additional state information that includes status reported by the agent |
|
statusHistory[] OBJECT The status of a cluster and its instances |
|
statusHistory[].detail STRING Output only. Optional details of cluster's state |
|
statusHistory[].state ENUMERATION Output only. The cluster's state |
|
statusHistory[].stateStartTime ANY Output only. Time when this state was entered |
|
statusHistory[].substate ENUMERATION Output only. Additional state information that includes status reported by the agent |
|
config OBJECT The cluster config |
|
config.workerConfig OBJECT Optional. The config settings for Compute Engine resources in an instance group, such as a master or worker group |
|
config.workerConfig.instanceNames[] STRING |
|
config.workerConfig.accelerators[] OBJECT Specifies the type and number of accelerator cards attached to the instances of an instance. See GPUs on Compute Engine |
|
config.workerConfig.accelerators[].acceleratorCount INTEGER The number of the accelerator cards of this type exposed to this instance |
|
config.workerConfig.accelerators[].acceleratorTypeUri STRING Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes.Examples: https://www.googleapis.com/compute/beta/projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80 projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80 nvidia-tesla-k80Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example, nvidia-tesla-k80 |
|
config.workerConfig.numInstances INTEGER Optional. The number of VM instances in the instance group. For master instance groups, must be set to 1 |
|
config.workerConfig.diskConfig OBJECT Specifies the config of disk options for a group of VM instances |
|
config.workerConfig.diskConfig.bootDiskType STRING Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-ssd" (Persistent Disk Solid State Drive) or "pd-standard" (Persistent Disk Hard Disk Drive) |
|
config.workerConfig.diskConfig.numLocalSsds INTEGER Optional. Number of attached SSDs, from 0 to 4 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS (https://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html) data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries |
|
config.workerConfig.diskConfig.bootDiskSizeGb INTEGER Optional. Size in GB of the boot disk (default is 500GB) |
|
config.workerConfig.managedGroupConfig OBJECT Specifies the resources used to actively manage an instance group |
|
config.workerConfig.managedGroupConfig.instanceGroupManagerName STRING Output only. The name of the Instance Group Manager for this group |
|
config.workerConfig.managedGroupConfig.instanceTemplateName STRING Output only. The name of the Instance Template used for the Managed Instance Group |
|
config.workerConfig.isPreemptible BOOLEAN Optional. Specifies that this instance group contains preemptible instances |
|
config.workerConfig.imageUri STRING Optional. The Compute Engine image resource used for cluster instances. It can be specified or may be inferred from SoftwareConfig.image_version |
|
config.workerConfig.machineTypeUri STRING Optional. The Compute Engine machine type used for cluster instances.A full URL, partial URI, or short name are valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2 projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2 n1-standard-2Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example, n1-standard-2 |
|
config.gceClusterConfig OBJECT Common config settings for resources of Compute Engine cluster instances, applicable to all instances in the cluster |
|
config.gceClusterConfig.tags[] STRING |
|
config.gceClusterConfig.serviceAccount STRING Optional. The service account of the instances. Defaults to the default Compute Engine service account. Custom service accounts need permissions equivalent to the following IAM roles: roles/logging.logWriter roles/storage.objectAdmin(see https://cloud.google.com/compute/docs/access/service-accounts#custom_service_accounts for more information). Example: [account_id]@[project_id].iam.gserviceaccount.com |
|
config.gceClusterConfig.subnetworkUri STRING Optional. The Compute Engine subnetwork to be used for machine communications. Cannot be specified with network_uri.A full URL, partial URI, or short name are valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/regions/us-east1/subnetworks/sub0 projects/[project_id]/regions/us-east1/subnetworks/sub0 sub0 |
|
config.gceClusterConfig.networkUri STRING Optional. The Compute Engine network to be used for machine communications. Cannot be specified with subnetwork_uri. If neither network_uri nor subnetwork_uri is specified, the "default" network of the project is used, if it exists. Cannot be a "Custom Subnet Network" (see Using Subnetworks for more information).A full URL, partial URI, or short name are valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/regions/global/default projects/[project_id]/regions/global/default default |
|
config.gceClusterConfig.zoneUri STRING Optional. The zone where the Compute Engine cluster will be located. On a create request, it is required in the "global" region. If omitted in a non-global Cloud Dataproc region, the service will pick a zone in the corresponding Compute Engine region. On a get request, zone will always be present.A full URL, partial URI, or short name are valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone] projects/[project_id]/zones/[zone] us-central1-f |
|
config.gceClusterConfig.internalIpOnly BOOLEAN Optional. If true, all instances in the cluster will only have internal IP addresses. By default, clusters are not restricted to internal IP addresses, and will have ephemeral external IP addresses assigned to each instance. This internal_ip_only restriction can only be enabled for subnetwork enabled networks, and all off-cluster dependencies must be configured to be accessible without external IP addresses |
|
config.gceClusterConfig.metadata OBJECT The Compute Engine metadata entries to add to all instances (see Project and instance metadata (https://cloud.google.com/compute/docs/storing-retrieving-metadata#project_and_instance_metadata)) |
|
config.gceClusterConfig.metadata.customKey.value STRING The Compute Engine metadata entries to add to all instances (see Project and instance metadata (https://cloud.google.com/compute/docs/storing-retrieving-metadata#project_and_instance_metadata)) |
|
config.gceClusterConfig.serviceAccountScopes[] STRING |
|
config.softwareConfig OBJECT Specifies the selection and config of software inside the cluster |
|
config.softwareConfig.imageVersion STRING Optional. The version of software inside the cluster. It must be one of the supported Cloud Dataproc Versions, such as "1.2" (including a subminor version, such as "1.2.29"), or the "preview" version. If unspecified, it defaults to the latest Debian version |
|
config.softwareConfig.properties OBJECT Optional. The properties to set on daemon config files.Property keys are specified in prefix:property format, for example core:hadoop.tmp.dir. The following are supported prefixes and their mappings: capacity-scheduler: capacity-scheduler.xml core: core-site.xml distcp: distcp-default.xml hdfs: hdfs-site.xml hive: hive-site.xml mapred: mapred-site.xml pig: pig.properties spark: spark-defaults.conf yarn: yarn-site.xmlFor more information, see Cluster properties |
|
config.softwareConfig.properties.customKey.value STRING Optional. The properties to set on daemon config files. Property keys are specified in prefix:property format, for example core:hadoop.tmp.dir. The following prefixes are supported, mapped to these files: capacity-scheduler: capacity-scheduler.xml; core: core-site.xml; distcp: distcp-default.xml; hdfs: hdfs-site.xml; hive: hive-site.xml; mapred: mapred-site.xml; pig: pig.properties; spark: spark-defaults.conf; yarn: yarn-site.xml. For more information, see Cluster properties |
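The prefix:property convention can be illustrated by grouping a returned properties object by the daemon config file each key targets. This is a minimal sketch: the mapping covers only the prefixes listed in the description above (the service may accept others), and `group_properties_by_file` is a hypothetical helper name.

```python
from __future__ import annotations

# Prefix-to-file mapping, limited to the prefixes listed in this field's description.
PREFIX_TO_FILE = {
    "capacity-scheduler": "capacity-scheduler.xml",
    "core": "core-site.xml",
    "distcp": "distcp-default.xml",
    "hdfs": "hdfs-site.xml",
    "hive": "hive-site.xml",
    "mapred": "mapred-site.xml",
    "pig": "pig.properties",
    "spark": "spark-defaults.conf",
    "yarn": "yarn-site.xml",
}


def group_properties_by_file(properties: dict[str, str]) -> dict[str, dict[str, str]]:
    """Split "prefix:property" keys and group the values by target config file."""
    grouped: dict[str, dict[str, str]] = {}
    for key, value in properties.items():
        # Split on the first colon only; property names themselves may contain dots.
        prefix, sep, prop = key.partition(":")
        if not sep or not prop:
            raise ValueError(f"expected prefix:property, got {key!r}")
        if prefix not in PREFIX_TO_FILE:
            raise ValueError(f"prefix {prefix!r} is not in the documented list")
        grouped.setdefault(PREFIX_TO_FILE[prefix], {})[prop] = value
    return grouped
```

With `{"core:hadoop.tmp.dir": "/tmp/hadoop"}` as input, the result is `{"core-site.xml": {"hadoop.tmp.dir": "/tmp/hadoop"}}`.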
|
config.softwareConfig.optionalComponents[] ENUMERATION |
|
config.masterConfig OBJECT Optional. The config settings for Compute Engine resources in an instance group, such as a master or worker group |
|
config.masterConfig.instanceNames[] STRING |
|
config.masterConfig.accelerators[] OBJECT Specifies the type and number of accelerator cards attached to the instances of an instance group. See GPUs on Compute Engine |
|
config.masterConfig.accelerators[].acceleratorCount INTEGER The number of the accelerator cards of this type exposed to this instance |
|
config.masterConfig.accelerators[].acceleratorTypeUri STRING Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes. Examples: https://www.googleapis.com/compute/beta/projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80, projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80, or nvidia-tesla-k80. Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example, nvidia-tesla-k80 |
|
config.masterConfig.numInstances INTEGER Optional. The number of VM instances in the instance group. For master instance groups, must be set to 1 |
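Since acceleratorCount is a per-instance figure, the total number of cards in an instance group is acceleratorCount multiplied by numInstances, summed per accelerator type. The sketch below computes that from a group object in the get response (masterConfig, or any other group with the same shape). `total_accelerators` is a hypothetical helper, and treating a missing numInstances as 0 is an assumption of this sketch.

```python
from __future__ import annotations


def total_accelerators(group: dict) -> dict[str, int]:
    """Count accelerator cards per type across every VM in an instance group."""
    num_instances = int(group.get("numInstances", 0))
    totals: dict[str, int] = {}
    for acc in group.get("accelerators", []):
        # Any accepted form of acceleratorTypeUri ends in the short type name.
        type_name = acc["acceleratorTypeUri"].rstrip("/").rsplit("/", 1)[-1]
        per_instance = int(acc.get("acceleratorCount", 0))
        totals[type_name] = totals.get(type_name, 0) + per_instance * num_instances
    return totals
```

A master group (numInstances 1) with two K80 cards per instance reports `{"nvidia-tesla-k80": 2}`; a three-instance group with four cards each reports 12.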
|
config.masterConfig.diskConfig OBJECT Specifies the config of disk options for a group of VM instances |
|
config.masterConfig.diskConfig.bootDiskType STRING Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-ssd" (Persistent Disk Solid State Drive) or "pd-standard" (Persistent Disk Hard Disk Drive) |
|
config.masterConfig.diskConfig.numLocalSsds INTEGER Optional. Number of attached SSDs, from 0 to 4 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS (https://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html) data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries |
|
config.masterConfig.diskConfig.bootDiskSizeGb INTEGER Optional. Size in GB of the boot disk (default is 500GB) |
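The diskConfig defaults and limits described above can be applied in one place when a field is absent from a response. This sketch mirrors only the values documented here (pd-standard or pd-ssd, 0 to 4 local SSDs, 500 GB default boot disk); the current service may accept other values. `DiskConfig`, `effective_disk_config`, and `hdfs_data_location` are hypothetical names.

```python
from dataclasses import dataclass

VALID_BOOT_DISK_TYPES = {"pd-standard", "pd-ssd"}


@dataclass(frozen=True)
class DiskConfig:
    boot_disk_type: str
    boot_disk_size_gb: int
    num_local_ssds: int


def effective_disk_config(raw: dict) -> DiskConfig:
    """Fill documented defaults and check the documented limits."""
    cfg = DiskConfig(
        boot_disk_type=raw.get("bootDiskType", "pd-standard"),
        boot_disk_size_gb=int(raw.get("bootDiskSizeGb", 500)),
        num_local_ssds=int(raw.get("numLocalSsds", 0)),
    )
    if cfg.boot_disk_type not in VALID_BOOT_DISK_TYPES:
        raise ValueError(f"unexpected bootDiskType: {cfg.boot_disk_type!r}")
    if not 0 <= cfg.num_local_ssds <= 4:
        raise ValueError(f"numLocalSsds out of range: {cfg.num_local_ssds}")
    return cfg


def hdfs_data_location(cfg: DiskConfig) -> str:
    """Runtime logs and HDFS data go to local SSDs when any are attached."""
    return "local SSDs" if cfg.num_local_ssds > 0 else "boot disk"
```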
|
config.masterConfig.managedGroupConfig OBJECT Specifies the resources used to actively manage an instance group |
|
config.masterConfig.managedGroupConfig.instanceGroupManagerName STRING Output only. The name of the Instance Group Manager for this group |
|
config.masterConfig.managedGroupConfig.instanceTemplateName STRING Output only. The name of the Instance Template used for the Managed Instance Group |
|
config.masterConfig.isPreemptible BOOLEAN Optional. Specifies that this instance group contains preemptible instances |
|
config.masterConfig.imageUri STRING Optional. The Compute Engine image resource used for cluster instances. It can be specified or may be inferred from SoftwareConfig.image_version |
|
config.masterConfig.machineTypeUri STRING Optional. The Compute Engine machine type used for cluster instances. A full URL, partial URI, or short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2, projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2, or n1-standard-2. Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example, n1-standard-2 |
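The Auto Zone Exception matters when a config returned by this Get call is reused as the basis for a new cluster in a non-global region with zone selection left to the service. The sketch below, written under that hypothetical workflow, drops zoneUri and reduces machineTypeUri and acceleratorTypeUri to short names in each instance group. It does not strip other output-only fields, and `prepare_for_auto_zone` is a hypothetical helper.

```python
import copy

# Instance-group keys in the Dataproc cluster config.
GROUP_KEYS = ("masterConfig", "workerConfig", "secondaryWorkerConfig")


def short_name(uri: str) -> str:
    """Last path segment of a full URL, a partial URI, or an existing short name."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def prepare_for_auto_zone(config: dict) -> dict:
    """Return a copy of config with zoneUri removed and resource URIs shortened."""
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched
    new_config.get("gceClusterConfig", {}).pop("zoneUri", None)
    for key in GROUP_KEYS:
        group = new_config.get(key)
        if not group:
            continue
        if "machineTypeUri" in group:
            group["machineTypeUri"] = short_name(group["machineTypeUri"])
        for acc in group.get("accelerators", []):
            acc["acceleratorTypeUri"] = short_name(acc["acceleratorTypeUri"])
    return new_config
```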
|
config.secondaryWorkerConfig OBJECT Optional. The config settings for Compute Engine resources in an instance group, such as a master or worker group |
|
config.secondaryWorkerConfig.instanceNames[] STRING |
|
config.secondaryWorkerConfig.accelerators[] OBJECT Specifies the type and number of accelerator cards attached to the instances of an instance group. See GPUs on Compute Engine |
|
config.secondaryWorkerConfig.accelerators[].acceleratorCount INTEGER The number of the accelerator cards of this type exposed to this instance |
|
config.secondaryWorkerConfig.accelerators[].acceleratorTypeUri STRING Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes. Examples: https://www.googleapis.com/compute/beta/projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80, projects/[project_id]/zones/us-east1-a/acceleratorTypes/nvidia-tesla-k80, or nvidia-tesla-k80. Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example, nvidia-tesla-k80 |
|
config.secondaryWorkerConfig.numInstances INTEGER Optional. The number of VM instances in the instance group. For master instance groups, must be set to 1 |
|
config.secondaryWorkerConfig.diskConfig OBJECT Specifies the config of disk options for a group of VM instances |
|
config.secondaryWorkerConfig.diskConfig.bootDiskType STRING Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-ssd" (Persistent Disk Solid State Drive) or "pd-standard" (Persistent Disk Hard Disk Drive) |
|
config.secondaryWorkerConfig.diskConfig.numLocalSsds INTEGER Optional. Number of attached SSDs, from 0 to 4 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS (https://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html) data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries |
|
config.secondaryWorkerConfig.diskConfig.bootDiskSizeGb INTEGER Optional. Size in GB of the boot disk (default is 500GB) |
|
config.secondaryWorkerConfig.managedGroupConfig OBJECT Specifies the resources used to actively manage an instance group |
|
config.secondaryWorkerConfig.managedGroupConfig.instanceGroupManagerName STRING Output only. The name of the Instance Group Manager for this group |
|
config.secondaryWorkerConfig.managedGroupConfig.instanceTemplateName STRING Output only. The name of the Instance Template used for the Managed Instance Group |
|
config.secondaryWorkerConfig.isPreemptible BOOLEAN Optional. Specifies that this instance group contains preemptible instances |
|
config.secondaryWorkerConfig.imageUri STRING Optional. The Compute Engine image resource used for cluster instances. It can be specified or may be inferred from SoftwareConfig.image_version |
|
config.secondaryWorkerConfig.machineTypeUri STRING Optional. The Compute Engine machine type used for cluster instances. A full URL, partial URI, or short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2, projects/[project_id]/zones/us-east1-a/machineTypes/n1-standard-2, or n1-standard-2. Auto Zone Exception: If you are using the Cloud Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example, n1-standard-2 |
|
config.encryptionConfig OBJECT Encryption settings for the cluster |
|
config.encryptionConfig.gcePdKmsKeyName STRING Optional. The Cloud KMS key name to use for persistent disk (PD) encryption for all instances in the cluster |
|
config.securityConfig OBJECT Security-related configuration, including Kerberos |
|
config.securityConfig.kerberosConfig OBJECT Specifies Kerberos-related configuration |
|
config.securityConfig.kerberosConfig.keystoreUri STRING Optional. The Cloud Storage URI of the keystore file used for SSL encryption. If not provided, Dataproc will provide a self-signed certificate |
|
config.securityConfig.kerberosConfig.keyPasswordUri STRING Optional. The Cloud Storage URI of a KMS-encrypted file containing the password to the user-provided key. For the self-signed certificate, this password is generated by Dataproc |
|
config.securityConfig.kerberosConfig.keystorePasswordUri STRING Optional. The Cloud Storage URI of a KMS-encrypted file containing the password to the user-provided keystore. For the self-signed certificate, this password is generated by Dataproc |
|
config.securityConfig.kerberosConfig.crossRealmTrustAdminServer STRING Optional. The admin server (IP or hostname) for the remote trusted realm in a cross-realm trust relationship |
|
config.securityConfig.kerberosConfig.kdcDbKeyUri STRING Optional. The Cloud Storage URI of a KMS-encrypted file containing the master key of the KDC database |
|
config.securityConfig.kerberosConfig.truststorePasswordUri STRING Optional. The Cloud Storage URI of a KMS-encrypted file containing the password to the user-provided truststore. For the self-signed certificate, this password is generated by Dataproc |
|
config.securityConfig.kerberosConfig.enableKerberos BOOLEAN Optional. Flag to indicate whether to Kerberize the cluster |
|
config.securityConfig.kerberosConfig.truststoreUri STRING Optional. The Cloud Storage URI of the truststore file used for SSL encryption. If not provided, Dataproc will provide a self-signed certificate |
|
config.securityConfig.kerberosConfig.crossRealmTrustRealm STRING Optional. The remote realm the Dataproc on-cluster KDC will trust, if the user enables cross-realm trust |
|
config.securityConfig.kerberosConfig.rootPrincipalPasswordUri STRING Required. The Cloud Storage URI of a KMS-encrypted file containing the root principal password |
|
config.securityConfig.kerberosConfig.kmsKeyUri STRING Required. The URI of the KMS key used to encrypt various sensitive files |
|
config.securityConfig.kerberosConfig.crossRealmTrustKdc STRING Optional. The KDC (IP or hostname) for the remote trusted realm in a cross-realm trust relationship |
|
config.securityConfig.kerberosConfig.crossRealmTrustSharedPasswordUri STRING Optional. The Cloud Storage URI of a KMS-encrypted file containing the shared password between the on-cluster Kerberos realm and the remote trusted realm, in a cross-realm trust relationship |
|
config.securityConfig.kerberosConfig.tgtLifetimeHours INTEGER Optional. The lifetime of the ticket-granting ticket, in hours. If not specified, or if 0 is specified, the default value of 10 is used |
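Two rules from the kerberosConfig fields above lend themselves to a quick check on a get response: the fields marked Required, and the TGT lifetime default. The sketch assumes the Required fields matter only when enableKerberos is true, and it also treats negative lifetimes as "use the default", which is a choice of this sketch rather than documented behavior. Both helper names are hypothetical.

```python
from __future__ import annotations

# Fields marked "Required" in kerberosConfig (assumed to apply when Kerberos is enabled).
REQUIRED_WHEN_KERBERIZED = ("rootPrincipalPasswordUri", "kmsKeyUri")
DEFAULT_TGT_LIFETIME_HOURS = 10


def missing_kerberos_fields(kerberos: dict) -> list[str]:
    """List Required fields that are absent or empty on a Kerberized cluster."""
    if not kerberos.get("enableKerberos", False):
        return []
    return [name for name in REQUIRED_WHEN_KERBERIZED if not kerberos.get(name)]


def effective_tgt_lifetime_hours(kerberos: dict) -> int:
    """Unset or 0 means the documented default of 10 hours."""
    hours = int(kerberos.get("tgtLifetimeHours", 0))
    return hours if hours > 0 else DEFAULT_TGT_LIFETIME_HOURS
```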
|
config.initializationActions[] OBJECT Specifies an executable to run on a fully configured node and a timeout period for executable completion |
|
config.initializationActions[].executableFile STRING Required. Cloud Storage URI of executable file |
|
config.initializationActions[].executionTimeout ANY Optional. Amount of time the executable has to complete. The default is 10 minutes. If the executable has not completed by the end of the timeout period, cluster creation fails with an explanatory error message that names the executable that caused the error and the exceeded timeout period |
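The format of executionTimeout is listed only as ANY. Assuming it is serialized like a protobuf Duration in JSON (a decimal number of seconds with an "s" suffix, such as "600s"), the sketch below converts it to a `timedelta` and applies the documented 10-minute default when the field is absent. `parse_execution_timeout` is a hypothetical helper.

```python
from datetime import timedelta
from typing import Optional

DEFAULT_EXECUTION_TIMEOUT = timedelta(minutes=10)


def parse_execution_timeout(value: Optional[str]) -> timedelta:
    """Convert an assumed Duration string such as "600s" to a timedelta."""
    if not value:
        # Field omitted: the documented default applies.
        return DEFAULT_EXECUTION_TIMEOUT
    if not value.endswith("s"):
        raise ValueError(f"expected a duration like '600s', got {value!r}")
    return timedelta(seconds=float(value[:-1]))
```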
|
config.configBucket STRING Optional. A Google Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Google Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Cloud Dataproc staging bucket) |
|
clusterName STRING Required. The cluster name. Cluster names within a project must be unique. Names of deleted clusters can be reused |
|
clusterUuid STRING Output only. A cluster UUID (Universally Unique Identifier). Cloud Dataproc generates this value when it creates the cluster |
|
projectId STRING Required. The Google Cloud Platform project ID that the cluster belongs to |
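The three input parameters map onto the Dataproc v1 REST path for getting a cluster, and because names of deleted clusters can be reused, clusterUuid is the field that tells two responses for the same clusterName apart. The sketch builds the request URL and compares responses; authentication and the HTTP call itself are left out, and both helper names are hypothetical.

```python
from urllib.parse import quote

DATAPROC_V1 = "https://dataproc.googleapis.com/v1"


def cluster_get_url(project_id: str, region: str, cluster_name: str) -> str:
    """Build the REST URL for getting one cluster from the three input parameters."""
    # Percent-encode each path segment so unexpected characters cannot alter the path.
    p, r, c = (quote(part, safe="") for part in (project_id, region, cluster_name))
    return f"{DATAPROC_V1}/projects/{p}/regions/{r}/clusters/{c}"


def is_same_cluster(a: dict, b: dict) -> bool:
    """True only if two get responses describe the same cluster instance, not just the same name."""
    return (
        a["projectId"] == b["projectId"]
        and a["clusterName"] == b["clusterName"]
        and a.get("clusterUuid") is not None
        and a.get("clusterUuid") == b.get("clusterUuid")
    )
```

If `is_same_cluster` returns False for two responses with the same projectId and clusterName, the cluster was most likely deleted and recreated between the two calls.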