Skip to content

Resource: awsGlueMlTransform

Provides a Glue ML Transform resource.

Example Usage

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as aws from "./.gen/providers/aws";
const awsGlueCatalogDatabaseTest =
  new aws.glueCatalogDatabase.GlueCatalogDatabase(this, "test", {
    name: "example",
  });
const awsGlueCatalogTableTest = new aws.glueCatalogTable.GlueCatalogTable(
  this,
  "test_1",
  {
    databaseName: awsGlueCatalogDatabaseTest.name,
    name: "example",
    owner: "my_owner",
    parameters: {
      param1: "param1_val",
    },
    partitionKeys: [
      {
        comment: "my_column_1_comment",
        name: "my_column_1",
        type: "int",
      },
      {
        comment: "my_column_2_comment",
        name: "my_column_2",
        type: "string",
      },
    ],
    retention: 1,
    storageDescriptor: {
      bucketColumns: ["bucket_column_1"],
      columns: [
        {
          comment: "my_column1_comment",
          name: "my_column_1",
          type: "int",
        },
        {
          comment: "my_column2_comment",
          name: "my_column_2",
          type: "string",
        },
      ],
      compressed: false,
      inputFormat: "SequenceFileInputFormat",
      location: "my_location",
      numberOfBuckets: 1,
      outputFormat: "SequenceFileInputFormat",
      parameters: {
        param1: "param1_val",
      },
      serDeInfo: {
        name: "ser_de_name",
        parameters: {
          param1: "param_val_1",
        },
        serializationLibrary:
          "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe",
      },
      skewedInfo: {
        skewedColumnNames: ["my_column_1"],
        skewedColumnValueLocationMaps: {
          myColumn1: "my_column_1_val_loc_map",
        },
        skewedColumnValues: ["skewed_val_1"],
      },
      sortColumns: [
        {
          column: "my_column_1",
          sortOrder: 1,
        },
      ],
      storedAsSubDirectories: false,
    },
    tableType: "VIRTUAL_VIEW",
    viewExpandedText: "view_expanded_text_1",
    viewOriginalText: "view_original_text_1",
  }
);
/*This allows the Terraform resource name to match the original name. You can remove the call if you don't need them to match.*/
awsGlueCatalogTableTest.overrideLogicalId("test");
const awsGlueMlTransformTest = new aws.glueMlTransform.GlueMlTransform(
  this,
  "test_2",
  {
    depends_on: ["${aws_iam_role_policy_attachment.test}"],
    inputRecordTables: [
      {
        databaseName: awsGlueCatalogTableTest.databaseName,
        tableName: awsGlueCatalogTableTest.name,
      },
    ],
    name: "example",
    parameters: {
      findMatchesParameters: {
        primaryKeyColumnName: "my_column_1",
      },
      transformType: "FIND_MATCHES",
    },
    roleArn: "${aws_iam_role.test.arn}",
  }
);
/*This allows the Terraform resource name to match the original name. You can remove the call if you don't need them to match.*/
awsGlueMlTransformTest.overrideLogicalId("test");

Argument Reference

The following arguments are supported:

  • name – (Required) The name you assign to this ML Transform. It must be unique in your account.
  • inputRecordTables - (Required) A list of AWS Glue table definitions used by the transform. see Input Record Tables.
  • parameters - (Required) The algorithmic parameters that are specific to the transform type used. Conditionally dependent on the transform type. see Parameters.
  • roleArn – (Required) The ARN of the IAM role associated with this ML Transform.
  • description – (Optional) Description of the ML Transform.
  • glueVersion - (Optional) The version of glue to use, for example "1.0". For information about available versions, see the AWS Glue Release Notes.
  • maxCapacity – (Optional) The number of AWS Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. maxCapacity is a mutually exclusive option with numberOfWorkers and workerType.
  • maxRetries – (Optional) The maximum number of times to retry this ML Transform if it fails.
  • tags - (Optional) Key-value map of resource tags. If configured with a provider defaultTags configuration block present, tags with matching keys will overwrite those defined at the provider-level.
  • timeout – (Optional) The ML Transform timeout in minutes. The default is 2880 minutes (48 hours).
  • workerType - (Optional) The type of predefined worker that is allocated when an ML Transform runs. Accepts a value of standard, g1X, or g2X. Required with numberOfWorkers.
  • numberOfWorkers - (Optional) The number of workers of a defined workerType that are allocated when an ML Transform runs. Required with workerType.

inputRecordTables

  • databaseName - (Required) A database name in the AWS Glue Data Catalog.
  • tableName - (Required) A table name in the AWS Glue Data Catalog.
  • catalogId - (Optional) A unique identifier for the AWS Glue Data Catalog.
  • connectionName- (Optional) The name of the connection to the AWS Glue Data Catalog.

parameters

findMatchesParameters

  • accuracyCostTradeOff - (Optional) The value that is selected when tuning your transform for a balance between accuracy and cost.
  • enforceProvidedLabels - (Optional) The value to switch on or off to force the output to match the provided labels from users.
  • precisionRecallTradeOff - (Optional) The value selected when tuning your transform for a balance between precision and recall.
  • primaryKeyColumnName - (Optional) The name of a column that uniquely identifies rows in the source table.

Attributes Reference

In addition to all arguments above, the following attributes are exported:

  • arn - Amazon Resource Name (ARN) of Glue ML Transform.
  • id - Glue ML Transform ID.
  • labelCount - The number of labels available for this transform.
  • schema - The object that represents the schema that this transform accepts. see Schema.
  • tagsAll - A map of tags assigned to the resource, including those inherited from the provider defaultTags configuration block.

schema

  • name - The name of the column.
  • dataType - The type of data in the column.

Import

Glue ML Transforms can be imported using id, e.g.,

$ terraform import aws_glue_ml_transform.example tfm-c2cafbe83b1c575f49eaca9939220e2fcd58e2d5