Resource: awsGlueMlTransform
Provides a Glue ML Transform resource.
Example Usage
/* Provider bindings are generated by running cdktf get.
   See https://cdk.tf/provider-generation for more details. */
import * as aws from "./.gen/providers/aws";

const awsGlueCatalogDatabaseTest =
  new aws.glueCatalogDatabase.GlueCatalogDatabase(this, "test", {
    name: "example",
  });

const awsGlueCatalogTableTest = new aws.glueCatalogTable.GlueCatalogTable(
  this,
  "test_1",
  {
    databaseName: awsGlueCatalogDatabaseTest.name,
    name: "example",
    owner: "my_owner",
    parameters: {
      param1: "param1_val",
    },
    partitionKeys: [
      {
        comment: "my_column_1_comment",
        name: "my_column_1",
        type: "int",
      },
      {
        comment: "my_column_2_comment",
        name: "my_column_2",
        type: "string",
      },
    ],
    retention: 1,
    storageDescriptor: {
      bucketColumns: ["bucket_column_1"],
      columns: [
        {
          comment: "my_column1_comment",
          name: "my_column_1",
          type: "int",
        },
        {
          comment: "my_column2_comment",
          name: "my_column_2",
          type: "string",
        },
      ],
      compressed: false,
      inputFormat: "SequenceFileInputFormat",
      location: "my_location",
      numberOfBuckets: 1,
      outputFormat: "SequenceFileInputFormat",
      parameters: {
        param1: "param1_val",
      },
      serDeInfo: {
        name: "ser_de_name",
        parameters: {
          param1: "param_val_1",
        },
        serializationLibrary:
          "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe",
      },
      skewedInfo: {
        skewedColumnNames: ["my_column_1"],
        skewedColumnValueLocationMaps: {
          myColumn1: "my_column_1_val_loc_map",
        },
        skewedColumnValues: ["skewed_val_1"],
      },
      sortColumns: [
        {
          column: "my_column_1",
          sortOrder: 1,
        },
      ],
      storedAsSubDirectories: false,
    },
    tableType: "VIRTUAL_VIEW",
    viewExpandedText: "view_expanded_text_1",
    viewOriginalText: "view_original_text_1",
  }
);
/* This allows the Terraform resource name to match the original name. You can remove the call if you don't need them to match. */
awsGlueCatalogTableTest.overrideLogicalId("test");

const awsGlueMlTransformTest = new aws.glueMlTransform.GlueMlTransform(
  this,
  "test_2",
  {
    /* aws_iam_role.test and aws_iam_role_policy_attachment.test are not
       defined in this example; they are assumed to exist elsewhere in the stack. */
    depends_on: ["${aws_iam_role_policy_attachment.test}"],
    inputRecordTables: [
      {
        databaseName: awsGlueCatalogTableTest.databaseName,
        tableName: awsGlueCatalogTableTest.name,
      },
    ],
    name: "example",
    parameters: {
      findMatchesParameters: {
        primaryKeyColumnName: "my_column_1",
      },
      transformType: "FIND_MATCHES",
    },
    roleArn: "${aws_iam_role.test.arn}",
  }
);
/* This allows the Terraform resource name to match the original name. You can remove the call if you don't need them to match. */
awsGlueMlTransformTest.overrideLogicalId("test");
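The example refers to an IAM role (aws_iam_role.test) and a policy attachment that are not shown. As a rough sketch only, the two values such a role would typically need could be prepared as below. The constant names are hypothetical, and using the AWS-managed AWSGlueServiceRole policy is an assumption, not something this page prescribes. The strings would then be passed to the role's assumeRolePolicy and the attachment's policyArn.

```typescript
// Trust policy that lets the AWS Glue service assume the role.
// (Hypothetical constant name; the policy shape is standard IAM JSON.)
const glueAssumeRolePolicy: string = JSON.stringify({
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: "sts:AssumeRole",
      Principal: { Service: "glue.amazonaws.com" },
    },
  ],
});

// Assumption: the AWS-managed Glue service policy is sufficient for this role.
// Access to the underlying data location usually needs additional permissions.
const glueServicePolicyArn: string =
  "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole";
```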
Argument Reference
The following arguments are supported:
name - (Required) The name you assign to this ML Transform. It must be unique in your account.
inputRecordTables - (Required) A list of AWS Glue table definitions used by the transform. See Input Record Tables.
parameters - (Required) The algorithmic parameters that are specific to the transform type used. Conditionally dependent on the transform type. See Parameters.
roleArn - (Required) The ARN of the IAM role associated with this ML Transform.
description - (Optional) Description of the ML Transform.
glueVersion - (Optional) The version of AWS Glue to use, for example "1.0". For information about available versions, see the AWS Glue Release Notes.
maxCapacity - (Optional) The number of AWS Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. maxCapacity is mutually exclusive with numberOfWorkers and workerType.
maxRetries - (Optional) The maximum number of times to retry this ML Transform if it fails.
tags - (Optional) Key-value map of resource tags. If configured with a provider defaultTags configuration block present, tags with matching keys will overwrite those defined at the provider-level.
timeout - (Optional) The ML Transform timeout in minutes. The default is 2880 minutes (48 hours).
workerType - (Optional) The type of predefined worker that is allocated when an ML Transform runs. Accepts a value of Standard, G.1X, or G.2X. Required with numberOfWorkers.
numberOfWorkers - (Optional) The number of workers of a defined workerType that are allocated when an ML Transform runs. Required with workerType.
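The capacity arguments interact: maxCapacity excludes the worker settings, and workerType and numberOfWorkers must appear together. A minimal sketch of a pre-flight check for those rules follows. CapacityOptions and checkCapacity are hypothetical names, not part of the provider bindings, and the worker type strings are assumed to be the AWS Glue values Standard, G.1X, and G.2X.

```typescript
// Hypothetical types; field names mirror the arguments documented above.
type WorkerType = "Standard" | "G.1X" | "G.2X";

interface CapacityOptions {
  maxCapacity?: number;
  workerType?: WorkerType;
  numberOfWorkers?: number;
}

// Returns a list of problems; an empty list means the combination looks valid.
function checkCapacity(opts: CapacityOptions): string[] {
  const problems: string[] = [];
  const hasWorkerSetting =
    opts.workerType !== undefined || opts.numberOfWorkers !== undefined;

  if (opts.maxCapacity !== undefined && hasWorkerSetting) {
    problems.push(
      "maxCapacity cannot be combined with workerType or numberOfWorkers"
    );
  }
  // Exactly one of the two worker settings present is an error.
  if ((opts.workerType === undefined) !== (opts.numberOfWorkers === undefined)) {
    problems.push("workerType and numberOfWorkers must be set together");
  }
  if (
    opts.maxCapacity !== undefined &&
    (opts.maxCapacity < 2 || opts.maxCapacity > 100)
  ) {
    problems.push("maxCapacity must be between 2 and 100 DPUs");
  }
  return problems;
}
```

Calling checkCapacity on the options before constructing the resource surfaces these conflicts at synth time rather than during apply.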
inputRecordTables
databaseName - (Required) A database name in the AWS Glue Data Catalog.
tableName - (Required) A table name in the AWS Glue Data Catalog.
catalogId - (Optional) A unique identifier for the AWS Glue Data Catalog.
connectionName - (Optional) The name of the connection to the AWS Glue Data Catalog.
parameters
transformType - (Required) The type of machine learning transform. For information about the types of machine learning transforms, see Creating Machine Learning Transforms.
findMatchesParameters - (Required) The parameters for the find matches algorithm. See Find Matches Parameters.
findMatchesParameters
accuracyCostTradeOff - (Optional) The value that is selected when tuning your transform for a balance between accuracy and cost.
enforceProvidedLabels - (Optional) The value to switch on or off to force the output to match the provided labels from users.
precisionRecallTradeOff - (Optional) The value selected when tuning your transform for a balance between precision and recall.
primaryKeyColumnName - (Optional) The name of a column that uniquely identifies rows in the source table.
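To illustrate how the parameters block and its nested findMatchesParameters fit together, here is a small sketch that builds the block as a plain object. The findMatches helper and its types are hypothetical. The 0 to 1 range for the two trade-off values is an assumption taken from the AWS Glue API rather than from this page.

```typescript
// Hypothetical types mirroring the arguments documented above.
interface FindMatchesParameters {
  accuracyCostTradeOff?: number;
  enforceProvidedLabels?: boolean;
  precisionRecallTradeOff?: number;
  primaryKeyColumnName?: string;
}

interface TransformParameters {
  transformType: "FIND_MATCHES";
  findMatchesParameters: FindMatchesParameters;
}

// Builds a parameters block, rejecting trade-off values outside the assumed 0..1 range.
function findMatches(params: FindMatchesParameters): TransformParameters {
  for (const key of ["accuracyCostTradeOff", "precisionRecallTradeOff"] as const) {
    const value = params[key];
    if (value !== undefined && (value < 0 || value > 1)) {
      throw new RangeError(`${key} must be between 0 and 1, got ${value}`);
    }
  }
  return { transformType: "FIND_MATCHES", findMatchesParameters: { ...params } };
}

// Example: favor precision over recall and keep the primary key from the usage example.
const exampleParameters = findMatches({
  primaryKeyColumnName: "my_column_1",
  precisionRecallTradeOff: 0.9,
  enforceProvidedLabels: false,
});
```

The resulting object has the same shape as the parameters argument in the usage example, so it could be passed to the resource directly.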
Attributes Reference
In addition to all arguments above, the following attributes are exported:
arn - Amazon Resource Name (ARN) of Glue ML Transform.
id - Glue ML Transform ID.
labelCount - The number of labels available for this transform.
schema - The object that represents the schema that this transform accepts. See Schema.
tagsAll - A map of tags assigned to the resource, including those inherited from the provider defaultTags configuration block.
schema
name - The name of the column.
dataType - The type of data in the column.
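Because schema is exported as a list of name and dataType pairs, it can be used to sanity-check other settings, for instance that primaryKeyColumnName refers to a column the transform actually accepts. The sketch below assumes the schema has already been read into a plain array. SchemaColumn, hasColumn, and the sample values are hypothetical.

```typescript
// Hypothetical shape mirroring the exported schema attribute.
interface SchemaColumn {
  name: string;
  dataType: string;
}

// True when the schema contains a column with the given name.
function hasColumn(schema: SchemaColumn[], columnName: string): boolean {
  return schema.some((column) => column.name === columnName);
}

// Sample data for illustration only; real values come from the transform.
const sampleSchema: SchemaColumn[] = [
  { name: "my_column_1", dataType: "int" },
  { name: "my_column_2", dataType: "string" },
];
```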
Import
Glue ML Transforms can be imported using id, e.g.,