Resource: awsEmrCluster

Provides an Elastic MapReduce Cluster, a web service that makes it easy to process large amounts of data efficiently. See Amazon Elastic MapReduce Documentation for more information.

To configure Instance Groups for task nodes, see the awsEmrInstanceGroup resource.

Example Usage

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as aws from "./.gen/providers/aws";
new aws.emrCluster.EmrCluster(this, "cluster", {
  additionalInfo:
    '{\n  "instanceAwsClientConfiguration": {\n    "proxyPort": 8099,\n    "proxyHost": "myproxy.example.com"\n  }\n}\n',
  applications: ["Spark"],
  bootstrapAction: [
    {
      args: ["instance.isMaster=true", "echo running on master node"],
      name: "runif",
      path: "s3://elasticmapreduce/bootstrap-actions/run-if",
    },
  ],
  configurationsJson:
    '  [\n    {\n      "Classification": "hadoop-env",\n      "Configurations": [\n        {\n          "Classification": "export",\n          "Properties": {\n            "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"\n          }\n        }\n      ],\n      "Properties": {}\n    },\n    {\n      "Classification": "spark-env",\n      "Configurations": [\n        {\n          "Classification": "export",\n          "Properties": {\n            "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"\n          }\n        }\n      ],\n      "Properties": {}\n    }\n  ]\n',
  coreInstanceGroup: {
    autoscalingPolicy:
      '{\n"Constraints": {\n  "MinCapacity": 1,\n  "MaxCapacity": 2\n},\n"Rules": [\n  {\n    "Name": "ScaleOutMemoryPercentage",\n    "Description": "Scale out if YARNMemoryAvailablePercentage is less than 15",\n    "Action": {\n      "SimpleScalingPolicyConfiguration": {\n        "AdjustmentType": "CHANGE_IN_CAPACITY",\n        "ScalingAdjustment": 1,\n        "CoolDown": 300\n      }\n    },\n    "Trigger": {\n      "CloudWatchAlarmDefinition": {\n        "ComparisonOperator": "LESS_THAN",\n        "EvaluationPeriods": 1,\n        "MetricName": "YARNMemoryAvailablePercentage",\n        "Namespace": "AWS/ElasticMapReduce",\n        "Period": 300,\n        "Statistic": "AVERAGE",\n        "Threshold": 15.0,\n        "Unit": "PERCENT"\n      }\n    }\n  }\n]\n}\n',
    bidPrice: "0.30",
    ebsConfig: [
      {
        size: 40,
        type: "gp2",
        volumesPerInstance: 1,
      },
    ],
    instanceCount: 1,
    instanceType: "c4.large",
  },
  ebsRootVolumeSize: 100,
  ec2Attributes: {
    emrManagedMasterSecurityGroup: "${aws_security_group.sg.id}",
    emrManagedSlaveSecurityGroup: "${aws_security_group.sg.id}",
    instanceProfile: "${aws_iam_instance_profile.emr_profile.arn}",
    subnetId: "${aws_subnet.main.id}",
  },
  keepJobFlowAliveWhenNoSteps: true,
  masterInstanceGroup: {
    instanceType: "m4.large",
  },
  name: "emr-test-arn",
  releaseLabel: "emr-4.6.0",
  serviceRole: "${aws_iam_role.iam_emr_service_role.arn}",
  tags: {
    env: "env",
    role: "rolename",
  },
  terminationProtection: false,
});

The awsEmrCluster resource typically requires two IAM roles: one for the EMR Cluster to use as a service, and another to place on your Cluster Instances so they can interact with AWS. The suggested role policy template for the EMR service is AmazonElasticMapReduceRole, and AmazonElasticMapReduceforEC2Role for the EC2 profile. See the Getting Started guide for more information on these IAM roles. There is also a fully-bootable example Terraform configuration at the bottom of this page.

Instance Fleet

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as aws from "./.gen/providers/aws";
const awsEmrClusterExample = new aws.emrCluster.EmrCluster(this, "example", {
  coreInstanceFleet: {
    instanceTypeConfigs: [
      {
        bidPriceAsPercentageOfOnDemandPrice: 80,
        ebsConfig: [
          {
            size: 100,
            type: "gp2",
            volumesPerInstance: 1,
          },
        ],
        instanceType: "m3.xlarge",
        weightedCapacity: 1,
      },
      {
        bidPriceAsPercentageOfOnDemandPrice: 100,
        ebsConfig: [
          {
            size: 100,
            type: "gp2",
            volumesPerInstance: 1,
          },
        ],
        instanceType: "m4.xlarge",
        weightedCapacity: 1,
      },
      {
        bidPriceAsPercentageOfOnDemandPrice: 100,
        ebsConfig: [
          {
            size: 100,
            type: "gp2",
            volumesPerInstance: 1,
          },
        ],
        instanceType: "m4.2xlarge",
        weightedCapacity: 2,
      },
    ],
    launchSpecifications: {
      spotSpecification: [
        {
          allocationStrategy: "capacity-optimized",
          blockDurationMinutes: 0,
          timeoutAction: "SWITCH_TO_ON_DEMAND",
          timeoutDurationMinutes: 10,
        },
      ],
    },
    name: "core fleet",
    targetOnDemandCapacity: 2,
    targetSpotCapacity: 2,
  },
  masterInstanceFleet: {
    instanceTypeConfigs: [
      {
        instanceType: "m4.xlarge",
      },
    ],
    targetOnDemandCapacity: 1,
  },
});
new aws.emrInstanceFleet.EmrInstanceFleet(this, "task", {
  clusterId: awsEmrClusterExample.id,
  instanceTypeConfigs: [
    {
      bidPriceAsPercentageOfOnDemandPrice: 100,
      ebsConfig: [
        {
          size: 100,
          type: "gp2",
          volumesPerInstance: 1,
        },
      ],
      instanceType: "m4.xlarge",
      weightedCapacity: 1,
    },
    {
      bidPriceAsPercentageOfOnDemandPrice: 100,
      ebsConfig: [
        {
          size: 100,
          type: "gp2",
          volumesPerInstance: 1,
        },
      ],
      instanceType: "m4.2xlarge",
      weightedCapacity: 2,
    },
  ],
  launchSpecifications: {
    spotSpecification: [
      {
        allocationStrategy: "capacity-optimized",
        blockDurationMinutes: 0,
        timeoutAction: "TERMINATE_CLUSTER",
        timeoutDurationMinutes: 10,
      },
    ],
  },
  name: "task fleet",
  targetOnDemandCapacity: 1,
  targetSpotCapacity: 1,
});

Enable Debug Logging

Debug logging in EMR is implemented as a step. It is highly recommended that you utilize the lifecycle configuration block with ignoreChanges if other steps are being managed outside of Terraform.

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as aws from "./.gen/providers/aws";
const awsEmrClusterExample = new aws.emrCluster.EmrCluster(this, "example", {
  step: [
    {
      actionOnFailure: "TERMINATE_CLUSTER",
      hadoopJarStep: [
        {
          args: ["state-pusher-script"],
          jar: "command-runner.jar",
        },
      ],
      name: "Setup Hadoop Debugging",
    },
  ],
});
awsEmrClusterExample.addOverride("lifecycle", [
  {
    ignore_changes: ["step"],
  },
]);

Multiple Node Master Instance Group

Available in EMR version 5.23.0 and later, an EMR Cluster can be launched with three master nodes for high availability. Additional information about this functionality and its requirements can be found in the EMR Management Guide.

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as aws from "./.gen/providers/aws";
const awsSubnetExample = new aws.subnet.Subnet(this, "example", {
  mapPublicIpOnLaunch: true,
});
const awsEmrClusterExample = new aws.emrCluster.EmrCluster(this, "example_1", {
  coreInstanceGroup: {},
  ec2Attributes: {
    subnetId: awsSubnetExample.id,
  },
  masterInstanceGroup: {
    instanceCount: 3,
  },
  releaseLabel: "emr-5.24.1",
  terminationProtection: true,
});
/*This allows the Terraform resource name to match the original name. You can remove the call if you don't need them to match.*/
awsEmrClusterExample.overrideLogicalId("example");
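
The version and count constraints for multiple master nodes can be checked before synthesis. The following is a minimal sketch, not part of the provider bindings: the helper names (parseReleaseLabel, releaseAtLeast, checkMasterInstanceCount) are hypothetical, and it assumes release labels follow the emr-X.Y.Z pattern used in the examples on this page.

```typescript
// Hypothetical helpers for illustration; they are not part of the AWS provider bindings.

// Parses a release label such as "emr-5.24.1" into [major, minor, patch].
// Assumes the "emr-X.Y.Z" pattern; any other shape is rejected.
function parseReleaseLabel(label: string): [number, number, number] {
  const match = /^emr-(\d+)\.(\d+)\.(\d+)$/.exec(label);
  if (match === null) {
    throw new Error(`Unrecognized release label: ${label}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Returns true when `label` is at or above the given minimum version.
function releaseAtLeast(
  label: string,
  minimum: [number, number, number]
): boolean {
  const [major, minor, patch] = parseReleaseLabel(label);
  const [minMajor, minMinor, minPatch] = minimum;
  if (major !== minMajor) {
    return major > minMajor;
  }
  if (minor !== minMinor) {
    return minor > minMinor;
  }
  return patch >= minPatch;
}

// Checks the documented constraints on masterInstanceGroup.instanceCount:
// the value must be 1 or 3, and 3 requires EMR 5.23.0 or later.
function checkMasterInstanceCount(label: string, instanceCount: number): void {
  if (instanceCount !== 1 && instanceCount !== 3) {
    throw new Error("masterInstanceGroup.instanceCount must be 1 or 3");
  }
  if (instanceCount === 3 && !releaseAtLeast(label, [5, 23, 0])) {
    throw new Error("Three master nodes require EMR 5.23.0 or later");
  }
}

// Passes for the example above: release emr-5.24.1 with three master nodes.
checkMasterInstanceCount("emr-5.24.1", 3);
```

Calling checkMasterInstanceCount before constructing the cluster surfaces a mismatch as a local error rather than a failed apply. It does not check the other requirements, such as a configured coreInstanceGroup.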

Bootable Cluster

NOTE: This configuration demonstrates a minimal configuration needed to boot an example EMR Cluster. It is not meant to display best practices. As with all examples, use at your own risk.

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as aws from "./.gen/providers/aws";
const awsVpcMain = new aws.vpc.Vpc(this, "main", {
  cidrBlock: "168.31.0.0/16",
  enableDnsHostnames: true,
  tags: {
    name: "emr_test",
  },
});
const dataAwsIamPolicyDocumentEc2AssumeRole =
  new aws.dataAwsIamPolicyDocument.DataAwsIamPolicyDocument(
    this,
    "ec2_assume_role",
    {
      statement: [
        {
          actions: ["sts:AssumeRole"],
          effect: "Allow",
          principals: [
            {
              identifiers: ["ec2.amazonaws.com"],
              type: "Service",
            },
          ],
        },
      ],
    }
  );
const dataAwsIamPolicyDocumentEmrAssumeRole =
  new aws.dataAwsIamPolicyDocument.DataAwsIamPolicyDocument(
    this,
    "emr_assume_role",
    {
      statement: [
        {
          actions: ["sts:AssumeRole"],
          effect: "Allow",
          principals: [
            {
              identifiers: ["elasticmapreduce.amazonaws.com"],
              type: "Service",
            },
          ],
        },
      ],
    }
  );
const dataAwsIamPolicyDocumentIamEmrProfilePolicy =
  new aws.dataAwsIamPolicyDocument.DataAwsIamPolicyDocument(
    this,
    "iam_emr_profile_policy",
    {
      statement: [
        {
          actions: [
            "cloudwatch:*",
            "dynamodb:*",
            "ec2:Describe*",
            "elasticmapreduce:Describe*",
            "elasticmapreduce:ListBootstrapActions",
            "elasticmapreduce:ListClusters",
            "elasticmapreduce:ListInstanceGroups",
            "elasticmapreduce:ListInstances",
            "elasticmapreduce:ListSteps",
            "kinesis:CreateStream",
            "kinesis:DeleteStream",
            "kinesis:DescribeStream",
            "kinesis:GetRecords",
            "kinesis:GetShardIterator",
            "kinesis:MergeShards",
            "kinesis:PutRecord",
            "kinesis:SplitShard",
            "rds:Describe*",
            "s3:*",
            "sdb:*",
            "sns:*",
            "sqs:*",
          ],
          effect: "Allow",
          resources: ["*"],
        },
      ],
    }
  );
const dataAwsIamPolicyDocumentIamEmrServicePolicy =
  new aws.dataAwsIamPolicyDocument.DataAwsIamPolicyDocument(
    this,
    "iam_emr_service_policy",
    {
      statement: [
        {
          actions: [
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CancelSpotInstanceRequests",
            "ec2:CreateNetworkInterface",
            "ec2:CreateSecurityGroup",
            "ec2:CreateTags",
            "ec2:DeleteNetworkInterface",
            "ec2:DeleteSecurityGroup",
            "ec2:DeleteTags",
            "ec2:DescribeAvailabilityZones",
            "ec2:DescribeAccountAttributes",
            "ec2:DescribeDhcpOptions",
            "ec2:DescribeInstanceStatus",
            "ec2:DescribeInstances",
            "ec2:DescribeKeyPairs",
            "ec2:DescribeNetworkAcls",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribePrefixLists",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSpotInstanceRequests",
            "ec2:DescribeSpotPriceHistory",
            "ec2:DescribeSubnets",
            "ec2:DescribeVpcAttribute",
            "ec2:DescribeVpcEndpoints",
            "ec2:DescribeVpcEndpointServices",
            "ec2:DescribeVpcs",
            "ec2:DetachNetworkInterface",
            "ec2:ModifyImageAttribute",
            "ec2:ModifyInstanceAttribute",
            "ec2:RequestSpotInstances",
            "ec2:RevokeSecurityGroupEgress",
            "ec2:RunInstances",
            "ec2:TerminateInstances",
            "ec2:DeleteVolume",
            "ec2:DescribeVolumeStatus",
            "ec2:DescribeVolumes",
            "ec2:DetachVolume",
            "iam:GetRole",
            "iam:GetRolePolicy",
            "iam:ListInstanceProfiles",
            "iam:ListRolePolicies",
            "iam:PassRole",
            "s3:CreateBucket",
            "s3:Get*",
            "s3:List*",
            "sdb:BatchPutAttributes",
            "sdb:Select",
            "sqs:CreateQueue",
            "sqs:Delete*",
            "sqs:GetQueue*",
            "sqs:PurgeQueue",
            "sqs:ReceiveMessage",
          ],
          effect: "Allow",
          resources: ["*"],
        },
      ],
    }
  );
const awsIamRoleIamEmrProfileRole = new aws.iamRole.IamRole(
  this,
  "iam_emr_profile_role",
  {
    assumeRolePolicy: dataAwsIamPolicyDocumentEc2AssumeRole.json,
    name: "iam_emr_profile_role",
  }
);
const awsIamRoleIamEmrServiceRole = new aws.iamRole.IamRole(
  this,
  "iam_emr_service_role",
  {
    assumeRolePolicy: dataAwsIamPolicyDocumentEmrAssumeRole.json,
    name: "iam_emr_service_role",
  }
);
const awsIamRolePolicyIamEmrProfilePolicy = new aws.iamRolePolicy.IamRolePolicy(
  this,
  "iam_emr_profile_policy_7",
  {
    name: "iam_emr_profile_policy",
    policy: dataAwsIamPolicyDocumentIamEmrProfilePolicy.json,
    role: awsIamRoleIamEmrProfileRole.id,
  }
);
/*This allows the Terraform resource name to match the original name. You can remove the call if you don't need them to match.*/
awsIamRolePolicyIamEmrProfilePolicy.overrideLogicalId("iam_emr_profile_policy");
const awsIamRolePolicyIamEmrServicePolicy = new aws.iamRolePolicy.IamRolePolicy(
  this,
  "iam_emr_service_policy_8",
  {
    name: "iam_emr_service_policy",
    policy: dataAwsIamPolicyDocumentIamEmrServicePolicy.json,
    role: awsIamRoleIamEmrServiceRole.id,
  }
);
/*This allows the Terraform resource name to match the original name. You can remove the call if you don't need them to match.*/
awsIamRolePolicyIamEmrServicePolicy.overrideLogicalId("iam_emr_service_policy");
const awsInternetGatewayGw = new aws.internetGateway.InternetGateway(
  this,
  "gw",
  {
    vpcId: awsVpcMain.id,
  }
);
const awsRouteTableR = new aws.routeTable.RouteTable(this, "r", {
  route: [
    {
      cidrBlock: "0.0.0.0/0",
      gatewayId: awsInternetGatewayGw.id,
    },
  ],
  vpcId: awsVpcMain.id,
});
const awsSubnetMain = new aws.subnet.Subnet(this, "main_11", {
  cidrBlock: "168.31.0.0/20",
  tags: {
    name: "emr_test",
  },
  vpcId: awsVpcMain.id,
});
/*This allows the Terraform resource name to match the original name. You can remove the call if you don't need them to match.*/
awsSubnetMain.overrideLogicalId("main");
const awsIamInstanceProfileEmrProfile =
  new aws.iamInstanceProfile.IamInstanceProfile(this, "emr_profile", {
    name: "emr_profile",
    role: awsIamRoleIamEmrProfileRole.name,
  });
new aws.mainRouteTableAssociation.MainRouteTableAssociation(this, "a", {
  routeTableId: awsRouteTableR.id,
  vpcId: awsVpcMain.id,
});
const awsSecurityGroupAllowAccess = new aws.securityGroup.SecurityGroup(
  this,
  "allow_access",
  {
    dependsOn: [awsSubnetMain],
    description: "Allow inbound traffic",
    egress: [
      {
        cidrBlocks: ["0.0.0.0/0"],
        fromPort: 0,
        protocol: "-1",
        toPort: 0,
      },
    ],
    ingress: [
      {
        cidrBlocks: [awsVpcMain.cidrBlock],
        fromPort: 0,
        protocol: "-1",
        toPort: 0,
      },
    ],
    name: "allow_access",
    tags: {
      name: "emr_test",
    },
    vpcId: awsVpcMain.id,
  }
);
awsSecurityGroupAllowAccess.addOverride("lifecycle", [
  {
    ignore_changes: ["ingress", "egress"],
  },
]);
new aws.emrCluster.EmrCluster(this, "cluster", {
  applications: ["Spark"],
  bootstrapAction: [
    {
      args: ["instance.isMaster=true", "echo running on master node"],
      name: "runif",
      path: "s3://elasticmapreduce/bootstrap-actions/run-if",
    },
  ],
  configurationsJson:
    '  [\n    {\n      "Classification": "hadoop-env",\n      "Configurations": [\n        {\n          "Classification": "export",\n          "Properties": {\n            "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"\n          }\n        }\n      ],\n      "Properties": {}\n    },\n    {\n      "Classification": "spark-env",\n      "Configurations": [\n        {\n          "Classification": "export",\n          "Properties": {\n            "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"\n          }\n        }\n      ],\n      "Properties": {}\n    }\n  ]\n',
  coreInstanceGroup: {
    instanceCount: 1,
    instanceType: "m5.xlarge",
  },
  ec2Attributes: {
    emrManagedMasterSecurityGroup: awsSecurityGroupAllowAccess.id,
    emrManagedSlaveSecurityGroup: awsSecurityGroupAllowAccess.id,
    instanceProfile: awsIamInstanceProfileEmrProfile.arn,
    subnetId: awsSubnetMain.id,
  },
  masterInstanceGroup: {
    instanceType: "m5.xlarge",
  },
  name: "emr-test-arn",
  releaseLabel: "emr-4.6.0",
  serviceRole: awsIamRoleIamEmrServiceRole.arn,
  tags: {
    dns_zone: "env_zone",
    env: "env",
    name: "name-env",
    role: "rolename",
  },
});

Argument Reference

The following arguments are required:

  • name - (Required) Name of the job flow.
  • releaseLabel - (Required) Release label for the Amazon EMR release.
  • serviceRole - (Required) IAM role that will be assumed by the Amazon EMR service to access AWS resources.

The following arguments are optional:

  • additionalInfo - (Optional) JSON string for selecting additional features such as adding proxy information. Note: There is currently no API to retrieve the value of this argument after the EMR cluster is created, so Terraform cannot detect drift from the actual EMR cluster if the value is changed outside Terraform.
  • applications - (Optional) A case-insensitive list of applications for Amazon EMR to install and configure when launching the cluster. For a list of applications available for each Amazon EMR release version, see the Amazon EMR Release Guide.
  • autoscalingRole - (Optional) IAM role for automatic scaling policies. The IAM role provides permissions that the automatic scaling feature requires to launch and terminate EC2 instances in an instance group.
  • autoTerminationPolicy - (Optional) An auto-termination policy for an Amazon EMR cluster. An auto-termination policy defines the amount of idle time in seconds after which a cluster automatically terminates. See autoTerminationPolicy below.
  • bootstrapAction - (Optional) Ordered list of bootstrap actions that will be run before Hadoop is started on the cluster nodes. See below.
  • configurations - (Optional) List of configurations supplied for the EMR cluster you are creating. Supply a configuration object for applications to override their default configuration. See AWS Documentation for more information.
  • configurationsJson - (Optional) JSON string for supplying list of configurations for the EMR cluster.

NOTE on configurationsJson: If a nested Configurations value is empty, omit the Configurations field instead of providing an empty list as its value ("Configurations": []).

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as aws from "./.gen/providers/aws";
new aws.emrCluster.EmrCluster(this, "cluster", {
  configurationsJson:
    '  [\n    {\n      "Classification": "hadoop-env",\n      "Configurations": [\n        {\n          "Classification": "export",\n          "Properties": {\n            "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"\n          }\n        }\n      ],\n      "Properties": {}\n    }\n  ]\n',
});

  • coreInstanceFleet - (Optional) Configuration block to use an Instance Fleet for the core node type. Cannot be specified if any coreInstanceGroup configuration blocks are set. Detailed below.
  • coreInstanceGroup - (Optional) Configuration block to use an Instance Group for the core node type.
  • customAmiId - (Optional) Custom Amazon Linux AMI for the cluster (instead of an EMR-owned AMI). Available in Amazon EMR version 5.7.0 and later.
  • ebsRootVolumeSize - (Optional) Size in GiB of the EBS root device volume of the Linux AMI that is used for each EC2 instance. Available in Amazon EMR version 4.x and later.
  • ec2Attributes - (Optional) Attributes for the EC2 instances running the job flow. See below.
  • keepJobFlowAliveWhenNoSteps - (Optional) Whether the cluster keeps running when it has no steps or when all steps are complete (default is true).
  • kerberosAttributes - (Optional) Kerberos configuration for the cluster. See below.
  • listStepsStates - (Optional) List of step states used to filter returned steps.
  • logEncryptionKmsKeyId - (Optional) AWS KMS customer master key (CMK) key ID or ARN used for encrypting log files. This attribute is only available with EMR version 5.30.0 and later, excluding EMR 6.0.0.
  • logUri - (Optional) S3 bucket to write the log files of the job flow. If a value is not provided, logs are not created.
  • masterInstanceFleet - (Optional) Configuration block to use an Instance Fleet for the master node type. Cannot be specified if any masterInstanceGroup configuration blocks are set. Detailed below.
  • masterInstanceGroup - (Optional) Configuration block to use an Instance Group for the master node type.
  • scaleDownBehavior - (Optional) Way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instanceGroup is resized.
  • securityConfiguration - (Optional) Security configuration name to attach to the EMR cluster. Only valid for EMR clusters with releaseLabel 4.8.0 or greater.
  • step - (Optional) List of steps to run when creating the cluster. See below. It is highly recommended to utilize the lifecycle configuration block with ignoreChanges if other steps are being managed outside of Terraform. This argument is processed in attribute-as-blocks mode.
  • stepConcurrencyLevel - (Optional) Number of steps that can be executed concurrently. You can specify a maximum of 256 steps. Only valid for EMR clusters with releaseLabel 5.28.0 or greater (default is 1).
  • tags - (Optional) Map of tags to apply to the EMR Cluster. If configured with a provider defaultTags configuration block present, tags with matching keys will overwrite those defined at the provider-level.
  • terminationProtection - (Optional) Switch on/off termination protection (default is false, except when using multiple master nodes). Before attempting to destroy the resource when termination protection is enabled, this configuration must be applied with its value set to false.
  • visibleToAllUsers - (Optional) Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. Default value is true.
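
The configurationsJson string can be generated from typed objects instead of being written by hand. The sketch below is one possible approach, not the provider's own mechanism: the EmrConfiguration interface and toConfigurationsJson helper are hypothetical names, and the field names (Classification, Properties, Configurations) follow the JSON shown in the examples on this page. It drops empty nested Configurations lists, as the note on configurationsJson advises.

```typescript
// Hypothetical type describing one entry of the configurationsJson list.
interface EmrConfiguration {
  Classification: string;
  Properties: Record<string, string>;
  Configurations?: EmrConfiguration[];
}

// Serializes entries for configurationsJson. Empty nested "Configurations"
// lists are omitted rather than emitted as [].
function toConfigurationsJson(entries: EmrConfiguration[]): string {
  const strip = (entry: EmrConfiguration): EmrConfiguration => {
    const nested = (entry.Configurations ?? []).map(strip);
    const result: EmrConfiguration = {
      Classification: entry.Classification,
      Properties: entry.Properties,
    };
    if (nested.length > 0) {
      result.Configurations = nested;
    }
    return result;
  };
  return JSON.stringify(entries.map(strip), null, 2);
}

// Usage: the resulting string can be passed as the configurationsJson argument.
const configurationsJson = toConfigurationsJson([
  {
    Classification: "hadoop-env",
    Properties: {},
    Configurations: [
      {
        Classification: "export",
        Properties: { JAVA_HOME: "/usr/lib/jvm/java-1.8.0" },
        Configurations: [], // omitted from the serialized output
      },
    ],
  },
]);
```

Building the string with JSON.stringify avoids the escaped newlines seen in the hand-written examples and keeps the structure checkable by the compiler.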

bootstrapAction

  • args - (Optional) List of command line arguments to pass to the bootstrap action script.
  • name - (Required) Name of the bootstrap action.
  • path - (Required) Location of the script to run during a bootstrap action. Can be either a location in Amazon S3 or on a local file system.

autoTerminationPolicy

  • idleTimeout - (Optional) Specifies the amount of idle time in seconds after which the cluster automatically terminates. You can specify a minimum of 60 seconds and a maximum of 604800 seconds (seven days).
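
Because idleTimeout is expressed in seconds, a small conversion helper can make the intent easier to read. This is a minimal sketch: idleTimeoutFromHours is a hypothetical helper, not a provider function, and it only enforces the 60 to 604800 second range stated above.

```typescript
const MIN_IDLE_TIMEOUT_SECONDS = 60;
const MAX_IDLE_TIMEOUT_SECONDS = 604800; // seven days

// Hypothetical helper: converts hours to seconds and checks the documented range.
function idleTimeoutFromHours(hours: number): number {
  const seconds = Math.round(hours * 3600);
  if (
    seconds < MIN_IDLE_TIMEOUT_SECONDS ||
    seconds > MAX_IDLE_TIMEOUT_SECONDS
  ) {
    throw new RangeError(
      `idleTimeout must be between ${MIN_IDLE_TIMEOUT_SECONDS} and ` +
        `${MAX_IDLE_TIMEOUT_SECONDS} seconds, got ${seconds}`
    );
  }
  return seconds;
}

// Usage: an autoTerminationPolicy value that terminates after two idle hours.
const autoTerminationPolicy = {
  idleTimeout: idleTimeoutFromHours(2),
};
```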

configurations

A configuration classification that applies when provisioning cluster instances, which can include configurations for applications and software that run on the cluster. See Configuring Applications.

  • classification - (Optional) Classification within a configuration.
  • properties - (Optional) Map of properties specified within a configuration classification.

coreInstanceFleet

  • instanceTypeConfigs - (Optional) Configuration block for instance fleet.
  • launchSpecifications - (Optional) Configuration block for launch specification.
  • name - (Optional) Friendly name given to the instance fleet.
  • targetOnDemandCapacity - (Optional) The target capacity of On-Demand units for the instance fleet, which determines how many On-Demand instances to provision.
  • targetSpotCapacity - (Optional) Target capacity of Spot units for the instance fleet, which determines how many Spot instances to provision.

instanceTypeConfigs

  • bidPrice - (Optional) Bid price for each EC2 Spot instance type as defined by instanceType. Expressed in USD. If neither bidPrice nor bidPriceAsPercentageOfOnDemandPrice is provided, bidPriceAsPercentageOfOnDemandPrice defaults to 100%.
  • bidPriceAsPercentageOfOnDemandPrice - (Optional) Bid price, as a percentage of On-Demand price, for each EC2 Spot instance as defined by instanceType. Expressed as a number (for example, 20 specifies 20%). If neither bidPrice nor bidPriceAsPercentageOfOnDemandPrice is provided, bidPriceAsPercentageOfOnDemandPrice defaults to 100%.
  • configurations - (Optional) Configuration classification that applies when provisioning cluster instances, which can include configurations for applications and software that run on the cluster. List of configuration blocks.
  • ebsConfig - (Optional) Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.
  • instanceType - (Required) EC2 instance type, such as m4.xlarge.
  • weightedCapacity - (Optional) Number of units that a provisioned instance of this type provides toward fulfilling the target capacities defined in awsEmrInstanceFleet.
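
The bid default and the weighted-capacity arithmetic described above can be illustrated as follows. This is a rough sketch under stated assumptions: the InstanceTypeBid interface and both helpers are hypothetical, and instancesForTarget considers a single instance type only, whereas a real fleet may mix several types to reach its target.

```typescript
// Hypothetical subset of an instanceTypeConfigs entry, for illustration only.
interface InstanceTypeBid {
  instanceType: string;
  bidPrice?: string; // USD, for example "0.30"
  bidPriceAsPercentageOfOnDemandPrice?: number;
}

// Applies the documented default: when neither bid field is provided,
// bidPriceAsPercentageOfOnDemandPrice is treated as 100.
function withDefaultBid(config: InstanceTypeBid): InstanceTypeBid {
  if (
    config.bidPrice === undefined &&
    config.bidPriceAsPercentageOfOnDemandPrice === undefined
  ) {
    return { ...config, bidPriceAsPercentageOfOnDemandPrice: 100 };
  }
  return config;
}

// Estimates how many instances of one type are needed to reach a target
// capacity when each instance contributes `weightedCapacity` units.
function instancesForTarget(
  targetCapacity: number,
  weightedCapacity: number
): number {
  if (weightedCapacity <= 0) {
    throw new RangeError("weightedCapacity must be positive");
  }
  return Math.ceil(targetCapacity / weightedCapacity);
}

// Usage: with a target of 2 units, one m4.2xlarge (weight 2) is enough,
// while two m4.xlarge instances (weight 1) would be needed.
const largeInstances = instancesForTarget(2, 2);
const smallInstances = instancesForTarget(2, 1);
```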

launchSpecifications

  • onDemandSpecification - (Optional) Configuration block for on demand instances launch specifications.
  • spotSpecification - (Optional) Configuration block for spot instances launch specifications.

onDemandSpecification

The launch specification for On-Demand instances in the instance fleet, which determines the allocation strategy. The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions. On-Demand instances allocation strategy is available in Amazon EMR version 5.12.1 and later.

  • allocationStrategy - (Required) Specifies the strategy to use in launching On-Demand instance fleets. Currently, the only option is lowest-price (the default), which launches the lowest price first.

spotSpecification

The launch specification for Spot instances in the fleet, which determines the defined duration, provisioning timeout behavior, and allocation strategy.

  • allocationStrategy - (Required) Specifies the strategy to use in launching Spot instance fleets. Currently, the only option is capacity-optimized (the default), which launches instances from Spot instance pools with optimal capacity for the number of instances that are launching.
  • blockDurationMinutes - (Optional) Defined duration for Spot instances (also known as Spot blocks) in minutes. When specified, the Spot instance does not terminate before the defined duration expires, and defined duration pricing for Spot instances applies. Valid values are 60, 120, 180, 240, 300, or 360. The duration period starts as soon as a Spot instance receives its instance ID. At the end of the duration, Amazon EC2 marks the Spot instance for termination and provides a Spot instance termination notice, which gives the instance a two-minute warning before it terminates.
  • timeoutAction - (Required) Action to take when targetSpotCapacity has not been fulfilled by the time timeoutDurationMinutes has expired; that is, when all Spot instances could not be provisioned within the Spot provisioning timeout. Valid values are TERMINATE_CLUSTER and SWITCH_TO_ON_DEMAND. SWITCH_TO_ON_DEMAND specifies that if no Spot instances are available, On-Demand Instances should be provisioned to fulfill any remaining Spot capacity.
  • timeoutDurationMinutes - (Required) Spot provisioning timeout period in minutes. If Spot instances are not provisioned within this time period, the timeoutAction is taken. Minimum value is 5 and maximum value is 1440. The timeout applies only during initial provisioning, when the cluster is first created.
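
The value constraints listed for spotSpecification can be checked locally before synthesis. The sketch below is illustrative only: the SpotSpecification interface and validateSpotSpecification helper are hypothetical, and treating a blockDurationMinutes of 0 (or an omitted value) as "no defined duration" is an assumption based on the Instance Fleet example on this page, which passes 0.

```typescript
// Hypothetical shape mirroring the spotSpecification arguments listed above.
interface SpotSpecification {
  allocationStrategy: string;
  blockDurationMinutes?: number;
  timeoutAction: string;
  timeoutDurationMinutes: number;
}

const VALID_BLOCK_DURATIONS = [60, 120, 180, 240, 300, 360];
const VALID_TIMEOUT_ACTIONS = ["TERMINATE_CLUSTER", "SWITCH_TO_ON_DEMAND"];

// Returns a list of problems; an empty list means the values pass these checks.
function validateSpotSpecification(spec: SpotSpecification): string[] {
  const problems: string[] = [];
  if (VALID_TIMEOUT_ACTIONS.indexOf(spec.timeoutAction) === -1) {
    problems.push(`timeoutAction "${spec.timeoutAction}" is not a valid value`);
  }
  if (spec.timeoutDurationMinutes < 5 || spec.timeoutDurationMinutes > 1440) {
    problems.push("timeoutDurationMinutes must be between 5 and 1440");
  }
  // Assumption: 0 or undefined means no defined duration was requested.
  const blockDuration = spec.blockDurationMinutes ?? 0;
  if (
    blockDuration !== 0 &&
    VALID_BLOCK_DURATIONS.indexOf(blockDuration) === -1
  ) {
    problems.push("blockDurationMinutes must be 60, 120, 180, 240, 300, or 360");
  }
  return problems;
}

// Usage: the core fleet specification from the Instance Fleet example.
const coreFleetProblems = validateSpotSpecification({
  allocationStrategy: "capacity-optimized",
  blockDurationMinutes: 0,
  timeoutAction: "SWITCH_TO_ON_DEMAND",
  timeoutDurationMinutes: 10,
});
```

Returning a list of messages rather than throwing on the first problem lets a caller report every invalid field at once.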

coreInstanceGroup

  • autoscalingPolicy - (Optional) String containing the EMR Auto Scaling Policy JSON.
  • bidPrice - (Optional) Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.
  • ebsConfig - (Optional) Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.
  • instanceCount - (Optional) Target number of instances for the instance group. Must be at least 1. Defaults to 1.
  • instanceType - (Required) EC2 instance type for all instances in the instance group.
  • name - (Optional) Friendly name given to the instance group.

ebsConfig

  • iops - (Optional) Number of I/O operations per second (IOPS) that the volume supports.
  • size - (Required) Volume size, in gibibytes (GiB).
  • type - (Required) Volume type. Valid options are gp3, gp2, io1, standard, st1 and sc1. See EBS Volume Types.
  • throughput - (Optional) The throughput, in mebibyte per second (MiB/s).
  • volumesPerInstance - (Optional) Number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1).

ec2Attributes

Attributes for the Amazon EC2 instances running the job flow:

  • additionalMasterSecurityGroups - (Optional) String containing a comma separated list of additional Amazon EC2 security group IDs for the master node.
  • additionalSlaveSecurityGroups - (Optional) String containing a comma separated list of additional Amazon EC2 security group IDs for the slave nodes.
  • emrManagedMasterSecurityGroup - (Optional) Identifier of the Amazon EC2 EMR-Managed security group for the master node.
  • emrManagedSlaveSecurityGroup - (Optional) Identifier of the Amazon EC2 EMR-Managed security group for the slave nodes.
  • instanceProfile - (Required) Instance profile for the EC2 instances of the cluster; the instances assume this role.
  • keyName - (Optional) Amazon EC2 key pair that can be used to ssh to the master node as the user called hadoop.
  • serviceAccessSecurityGroup - (Optional) Identifier of the Amazon EC2 service-access security group - required when the cluster runs on a private subnet.
  • subnetId - (Optional) VPC subnet ID where you want the job flow to launch. Cannot specify the cc1.4xlarge instance type for nodes of a job flow launched in an Amazon VPC.
  • subnetIds - (Optional) List of VPC subnet IDs where you want the job flow to launch. Amazon EMR identifies the best Availability Zone to launch instances according to your fleet specifications.

NOTE on EMR-Managed security groups: These security groups will have any missing inbound or outbound access rules added and maintained by AWS, to ensure proper communication between instances in a cluster. The EMR service will maintain these rules for groups provided in emrManagedMasterSecurityGroup and emrManagedSlaveSecurityGroup; attempts to remove the required rules may succeed, only for the EMR service to re-add them in a matter of minutes. This may cause Terraform to fail to destroy an environment that contains an EMR cluster, because the EMR service does not revoke rules added on deletion, leaving a cyclic dependency between the security groups that prevents their deletion. To avoid this, use the revokeRulesOnDelete optional attribute for any Security Group used in emrManagedMasterSecurityGroup and emrManagedSlaveSecurityGroup. See Amazon EMR-Managed Security Groups for more information about the EMR-managed security group rules.
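
Since additionalMasterSecurityGroups and additionalSlaveSecurityGroups take a single comma separated string rather than a list, a small helper can build that string from an array of IDs. This is a minimal sketch: joinSecurityGroupIds is a hypothetical name, the security group IDs are placeholders, and trimming and de-duplicating the input is a design choice of this example rather than a provider requirement.

```typescript
// Hypothetical helper: joins security group IDs into the comma separated
// string form used by the additional*SecurityGroups arguments.
// Returns undefined for an empty list so the argument can be left unset.
function joinSecurityGroupIds(ids: string[]): string | undefined {
  const cleaned = ids
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  // Keep only the first occurrence of each ID.
  const unique = cleaned.filter((id, index) => cleaned.indexOf(id) === index);
  return unique.length > 0 ? unique.join(",") : undefined;
}

// Usage: a fragment that could be spread into ec2Attributes.
// The IDs below are placeholders, not real security groups.
const ec2AttributesFragment = {
  additionalMasterSecurityGroups: joinSecurityGroupIds([
    "sg-0123abcd",
    "sg-4567efab",
  ]),
};
```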

kerberosAttributes

  • adDomainJoinPassword - (Optional) Active Directory password for adDomainJoinUser. Terraform cannot perform drift detection of this configuration.
  • adDomainJoinUser - (Optional) Required only when establishing a cross-realm trust with an Active Directory domain. A user with sufficient privileges to join resources to the domain. Terraform cannot perform drift detection of this configuration.
  • crossRealmTrustPrincipalPassword - (Optional) Required only when establishing a cross-realm trust with a KDC in a different realm. The cross-realm principal password, which must be identical across realms. Terraform cannot perform drift detection of this configuration.
  • kdcAdminPassword - (Required) Password used within the cluster for the kadmin service on the cluster-dedicated KDC, which maintains Kerberos principals, password policies, and keytabs for the cluster. Terraform cannot perform drift detection of this configuration.
  • realm - (Required) Name of the Kerberos realm to which all nodes in a cluster belong. For example, EC2.INTERNAL.

masterInstanceFleet

  • instanceTypeConfigs - (Optional) Configuration block for instance fleet.
  • launchSpecifications - (Optional) Configuration block for launch specification.
  • name - (Optional) Friendly name given to the instance fleet.
  • targetOnDemandCapacity - (Optional) Target capacity of On-Demand units for the instance fleet, which determines how many On-Demand instances to provision.
  • targetSpotCapacity - (Optional) Target capacity of Spot units for the instance fleet, which determines how many Spot instances to provision.

instanceTypeConfigs

See instanceTypeConfigs above, under coreInstanceFleet.

launchSpecifications

See launchSpecifications above, under coreInstanceFleet.

masterInstanceGroup

Supported nested arguments for the masterInstanceGroup configuration block:

  • bidPrice - (Optional) Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.
  • ebsConfig - (Optional) Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.
  • instanceCount - (Optional) Target number of instances for the instance group. Must be 1 or 3. Defaults to 1. Launching with multiple master nodes is only supported in EMR version 5.23.0+, and requires this resource's coreInstanceGroup to be configured. Public (Internet accessible) instances must be created in VPC subnets that have map public IP on launch enabled. Termination protection is automatically enabled when launched with multiple master nodes, and Terraform must have the terminationProtection = false configuration applied before destroying this resource.
  • instanceType - (Required) EC2 instance type for all instances in the instance group.
  • name - (Optional) Friendly name given to the instance group.
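
The instanceCount constraints above can be sketched as a small pre-flight check. The interfaces and the checkMasterInstanceGroup helper below are hypothetical and use plain TypeScript types that mirror the documented argument names rather than the generated provider bindings. The sketch also assumes release labels of the form emr-X.Y.Z; labels in any other shape are conservatively reported as unsupported for multiple master nodes.

```typescript
// Hypothetical shapes mirroring the documented argument names.
// These are NOT the generated cdktf provider types.
interface MasterInstanceGroupSketch {
  instanceType: string;
  instanceCount?: number; // documented default is 1
  bidPrice?: string;
  name?: string;
}

interface ClusterSketch {
  releaseLabel: string; // assumed format: "emr-X.Y.Z"
  masterInstanceGroup: MasterInstanceGroupSketch;
  coreInstanceGroup?: { instanceType: string };
}

// Parse "emr-X.Y.Z" into numbers; returns null for any other shape.
function parseReleaseLabel(label: string): [number, number, number] | null {
  const match = /^emr-(\d+)\.(\d+)\.(\d+)$/.exec(label);
  if (match === null) {
    return null;
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Collect violations of the documented instanceCount rules.
function checkMasterInstanceGroup(cluster: ClusterSketch): string[] {
  const problems: string[] = [];
  const count = cluster.masterInstanceGroup.instanceCount ?? 1;

  if (count !== 1 && count !== 3) {
    problems.push("instanceCount must be 1 or 3");
  }

  if (count === 3) {
    const version = parseReleaseLabel(cluster.releaseLabel);
    const supported =
      version !== null &&
      (version[0] > 5 || (version[0] === 5 && version[1] >= 23));
    if (!supported) {
      problems.push("multiple master nodes require EMR 5.23.0 or later");
    }
    if (cluster.coreInstanceGroup === undefined) {
      problems.push("multiple master nodes require coreInstanceGroup");
    }
  }
  return problems;
}

const multiMasterProblems = checkMasterInstanceGroup({
  releaseLabel: "emr-5.20.0",
  masterInstanceGroup: { instanceType: "m5.xlarge", instanceCount: 3 },
});
console.log(multiMasterProblems);
```

The check returns a list instead of throwing so that every violation is reported in a single pass.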

ebsConfig

See ebsConfig under coreInstanceGroup above.

step

This argument is processed in attribute-as-blocks mode.

  • actionOnFailure - (Required) Action to take if the step fails. Valid values: TERMINATE_JOB_FLOW, TERMINATE_CLUSTER, CANCEL_AND_WAIT, and CONTINUE.
  • hadoopJarStep - (Required) JAR file used for the step. See below.
  • name - (Required) Name of the step.

hadoopJarStep

This argument is processed in attribute-as-blocks mode.

  • args - (Optional) List of command line arguments passed to the JAR file's main function when executed.
  • jar - (Required) Path to a JAR file run during the step.
  • mainClass - (Optional) Name of the main class in the specified Java file. If not specified, the JAR file should specify a Main-Class in its manifest file.
  • properties - (Optional) Key-Value map of Java properties that are set when the step runs. You can use these properties to pass key value pairs to your main function.
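
To make the shape of a step concrete, here is a minimal sketch that builds one step value from the arguments listed above. The types and the buildStep helper are hypothetical plain-TypeScript stand-ins, not the generated provider bindings, and this page does not say whether those bindings expect hadoopJarStep as a single object or a one-element list. The JAR path, class name, and property values are placeholders.

```typescript
// The four documented values for actionOnFailure.
type ActionOnFailure =
  | "TERMINATE_JOB_FLOW"
  | "TERMINATE_CLUSTER"
  | "CANCEL_AND_WAIT"
  | "CONTINUE";

// Hypothetical shapes mirroring the documented argument names.
// These are NOT the generated cdktf provider types.
interface HadoopJarStepSketch {
  jar: string;
  args?: string[];
  mainClass?: string;
  properties?: Record<string, string>;
}

interface StepSketch {
  name: string;
  actionOnFailure: ActionOnFailure;
  hadoopJarStep: HadoopJarStepSketch;
}

// Build a step; the CONTINUE default is a choice made for this sketch,
// not a documented default of the resource.
function buildStep(
  name: string,
  hadoopJarStep: HadoopJarStepSketch,
  actionOnFailure: ActionOnFailure = "CONTINUE",
): StepSketch {
  if (hadoopJarStep.jar.length === 0) {
    throw new Error("jar is required");
  }
  return { name, actionOnFailure, hadoopJarStep };
}

// Placeholder values for illustration only.
const exampleStep = buildStep(
  "example-step",
  {
    jar: "s3://example-bucket/jobs/example.jar",
    mainClass: "com.example.Main",
    args: ["--input", "s3://example-bucket/input"],
    properties: { "example.key": "example-value" },
  },
  "CANCEL_AND_WAIT",
);
console.log(JSON.stringify(exampleStep, null, 2));
```

Typing actionOnFailure as a string union lets the compiler reject values outside the four documented ones.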

Attributes Reference

In addition to all arguments above, the following attributes are exported:

  • applications - Applications installed on this cluster.
  • arn - ARN of the cluster.
  • bootstrapAction - List of bootstrap actions that will be run before Hadoop is started on the cluster nodes.
  • configurations - List of Configurations supplied to the EMR cluster.
  • coreInstanceGroup.0.id - Core node type Instance Group ID, if using Instance Group for this node type.
  • ec2Attributes - Provides information about the EC2 instances in a cluster grouped by category: key name, subnet ID, IAM instance profile, and so on.
  • id - ID of the cluster.
  • logUri - Path to the Amazon S3 location where logs for this cluster are stored.
  • masterInstanceGroup.0.id - Master node type Instance Group ID, if using Instance Group for this node type.
  • masterPublicDns - The DNS name of the master node. If the cluster is on a private subnet, this is the private DNS name. On a public subnet, this is the public DNS name.
  • name - Name of the cluster.
  • releaseLabel - Release label for the Amazon EMR release.
  • serviceRole - IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf.
  • tagsAll - Map of tags assigned to the resource, including those inherited from the provider defaultTags configuration block.
  • visibleToAllUsers - Indicates whether the job flow is visible to all IAM users of the AWS account associated with the job flow.

Import

EMR clusters can be imported using the id, e.g.,

$ terraform import aws_emr_cluster.cluster j-123456ABCDEF

Since the API does not return the actual values for Kerberos configurations, environments that set them will need to use the ignoreChanges argument of the lifecycle configuration block, available to all Terraform resources, to prevent perpetual differences, e.g.,

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as aws from "./.gen/providers/aws";
const awsEmrClusterExample = new aws.emrCluster.EmrCluster(this, "example", {});
awsEmrClusterExample.addOverride("lifecycle", [
  {
    ignore_changes: ["kerberos_attributes"],
  },
]);