
Resource: awsComprehendDocumentClassifier

Terraform resource for managing an AWS Comprehend Document Classifier.

Example Usage

Basic Usage

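/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as aws from "./.gen/providers/aws";
// Training data objects; their configuration is elided in this example.
const awsS3ObjectDocuments = new aws.s3Object.S3Object(this, "documents", {});
new aws.s3Object.S3Object(this, "entities", {});
const example =
  new aws.comprehendDocumentClassifier.ComprehendDocumentClassifier(
    this,
    "example",
    {
      // aws_iam_role.example and aws_s3_bucket.test are not defined in this
      // snippet, so they are referenced through escaped Terraform expressions.
      dataAccessRoleArn: "${aws_iam_role.example.arn}",
      inputDataConfig: {
        s3Uri: `s3://\${aws_s3_bucket.test.bucket}/${awsS3ObjectDocuments.id}`,
      },
      languageCode: "en",
      name: "example",
    }
  );
// depends_on is a Terraform meta-argument. The policy is not defined in this
// snippet, so the dependency is added with an override.
example.addOverride("depends_on", ["aws_iam_role_policy.example"]);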

Argument Reference

The following arguments are required:

  • dataAccessRoleArn - (Required) The ARN for an IAM Role which allows Comprehend to read the training and testing data.
  • inputDataConfig - (Required) Configuration for the training and testing data. See the inputDataConfig Configuration Block section below.
  • languageCode - (Required) Two-letter language code for the language. One of en, es, fr, it, de, or pt.
  • name - (Required) Name for the Document Classifier. Has a maximum length of 63 characters. Can contain upper- and lower-case letters, numbers, and hyphens (-).

The following arguments are optional:

  • mode - (Optional, Default: MULTI_CLASS) The document classification mode. One of MULTI_CLASS or MULTI_LABEL. MULTI_CLASS is also known as "Single Label" in the AWS Console.
  • modelKmsKeyId - (Optional) KMS Key used to encrypt trained Document Classifiers. Can be a KMS Key ID or a KMS Key ARN.
  • outputDataConfig - (Optional) Configuration for the output results of training. See the outputDataConfig Configuration Block section below.
  • tags - (Optional) A map of tags to assign to the resource. If a provider defaultTags Configuration Block is present, tags with matching keys overwrite those defined at the provider level.
  • versionName - (Optional) Name for the version of the Document Classifier. Each version must have a unique name within the Document Classifier. If omitted, Terraform will assign a random, unique version name. If explicitly set to "", no version name will be set. Has a maximum length of 63 characters. Can contain upper- and lower-case letters, numbers, and hyphens (-). Conflicts with versionNamePrefix.
  • versionNamePrefix - (Optional) Creates a unique version name beginning with the specified prefix. Has a maximum length of 37 characters. Can contain upper- and lower-case letters, numbers, and hyphens (-). Conflicts with versionName.
  • volumeKmsKeyId - (Optional) KMS Key used to encrypt storage volumes during job processing. Can be a KMS Key ID or a KMS Key ARN.
  • vpcConfig - (Optional) Configuration parameters for VPC to contain Document Classifier resources. See the vpcConfig Configuration Block section below.
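
The naming rules above (length limits, allowed characters, and the conflict between versionName and versionNamePrefix) can be checked before synthesis. The following is a minimal sketch, not part of the provider or of CDKTF: validateNaming and VersionNaming are hypothetical names, and requiring name and versionNamePrefix to be non-empty is an assumption that the argument list does not state.

```typescript
// Hypothetical pre-flight check for the documented naming constraints.
// Allowed characters: upper- and lower-case letters, numbers, and hyphens.
const NAME_PATTERN = /^[A-Za-z0-9-]+$/;

interface VersionNaming {
  versionName?: string;
  versionNamePrefix?: string;
}

function validateNaming(name: string, version: VersionNaming = {}): string[] {
  const errors: string[] = [];

  // Assumption: an empty name is rejected (the pattern requires one character).
  if (name.length > 63 || !NAME_PATTERN.test(name)) {
    errors.push("name: at most 63 letters, numbers, or hyphens");
  }

  const { versionName, versionNamePrefix } = version;

  if (versionName !== undefined && versionNamePrefix !== undefined) {
    errors.push("versionName conflicts with versionNamePrefix");
  }

  // An explicit "" is valid: it means no version name will be set.
  if (
    versionName !== undefined &&
    versionName !== "" &&
    (versionName.length > 63 || !NAME_PATTERN.test(versionName))
  ) {
    errors.push("versionName: at most 63 letters, numbers, or hyphens");
  }

  if (
    versionNamePrefix !== undefined &&
    (versionNamePrefix.length > 37 || !NAME_PATTERN.test(versionNamePrefix))
  ) {
    errors.push("versionNamePrefix: at most 37 letters, numbers, or hyphens");
  }

  return errors;
}
```

An empty array means the values satisfy the documented constraints. For example, validateNaming("example", { versionNamePrefix: "nightly-" }) returns no errors, while passing both versionName and versionNamePrefix reports the conflict.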

inputDataConfig Configuration Block

  • augmentedManifests - (Optional) List of training datasets produced by Amazon SageMaker Ground Truth. Used if dataFormat is AUGMENTED_MANIFEST. See the augmentedManifests Configuration Block section below.
  • dataFormat - (Optional, Default: COMPREHEND_CSV) The format for the training data. One of COMPREHEND_CSV or AUGMENTED_MANIFEST.
  • labelDelimiter - (Optional) Delimiter between labels when training a multi-label classifier. Valid values are |, ~, !, @, #, $, %, ^, *, -, _, +, =, \, :, ;, >, ?, /, <space>, and <tab>. Default is |.
  • s3Uri - (Optional) Location of training documents. Used if dataFormat is COMPREHEND_CSV.
  • testS3Uri - (Optional) Location of test documents.
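
The relationship between dataFormat, s3Uri, augmentedManifests, and labelDelimiter can be sketched as a small check on a plain object shaped like the block above. This is an illustration only: InputDataConfigSketch and checkInputDataConfig are hypothetical names rather than generated bindings, the bucket paths are placeholders, and treating "used if" as "expected when" is an assumption.

```typescript
type DataFormat = "COMPREHEND_CSV" | "AUGMENTED_MANIFEST";

// Hypothetical local type mirroring the documented arguments.
interface InputDataConfigSketch {
  augmentedManifests?: object[];
  dataFormat?: DataFormat;
  labelDelimiter?: string;
  s3Uri?: string;
  testS3Uri?: string;
}

// Delimiters listed in the argument reference; <space> and <tab> are literal.
const LABEL_DELIMITERS = [
  "|", "~", "!", "@", "#", "$", "%", "^", "*", "-", "_",
  "+", "=", "\\", ":", ";", ">", "?", "/", " ", "\t",
];

function checkInputDataConfig(cfg: InputDataConfigSketch): string[] {
  const errors: string[] = [];
  const format = cfg.dataFormat ?? "COMPREHEND_CSV"; // documented default

  if (format === "COMPREHEND_CSV" && !cfg.s3Uri) {
    errors.push("s3Uri is expected when dataFormat is COMPREHEND_CSV");
  }
  if (format === "AUGMENTED_MANIFEST" && !cfg.augmentedManifests?.length) {
    errors.push("augmentedManifests is expected when dataFormat is AUGMENTED_MANIFEST");
  }

  const delimiter = cfg.labelDelimiter ?? "|"; // documented default
  if (LABEL_DELIMITERS.indexOf(delimiter) === -1) {
    errors.push("labelDelimiter is not one of the documented values");
  }
  return errors;
}

// Example: CSV training data for a multi-label classifier with a held-out test set.
const multiLabelInput: InputDataConfigSketch = {
  dataFormat: "COMPREHEND_CSV",
  labelDelimiter: ";",
  s3Uri: "s3://example-bucket/train.csv", // placeholder location
  testS3Uri: "s3://example-bucket/test.csv", // placeholder location
};
```

An object that passes the check has the same shape as the inputDataConfig argument, so it could be passed to the resource as-is.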

augmentedManifests Configuration Block

  • annotationDataS3Uri - (Optional) Location of annotation files.
  • attributeNames - (Required) The JSON attribute that contains the annotations for the training documents.
  • documentType - (Optional, Default: PLAIN_TEXT_DOCUMENT) Type of augmented manifest. One of PLAIN_TEXT_DOCUMENT or SEMI_STRUCTURED_DOCUMENT.
  • s3Uri - (Required) Location of augmented manifest file.
  • sourceDocumentsS3Uri - (Optional) Location of source PDF files.
  • split - (Optional, Default: train) Purpose of data in augmented manifest. One of train or test.
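
As an illustration of the defaults above, the sketch below fills in documentType and split for a pair of manifests, one for training and one for testing. The type and function names are hypothetical, the S3 locations are placeholders, and modelling attributeNames as a list of strings is an assumption about the underlying schema.

```typescript
// Hypothetical local type mirroring the documented arguments.
interface AugmentedManifestSketch {
  annotationDataS3Uri?: string;
  attributeNames: string[]; // assumption: a list of attribute names
  documentType?: "PLAIN_TEXT_DOCUMENT" | "SEMI_STRUCTURED_DOCUMENT";
  s3Uri: string;
  sourceDocumentsS3Uri?: string;
  split?: "train" | "test";
}

// Returns a copy with the documented defaults made explicit.
function withManifestDefaults(
  manifest: AugmentedManifestSketch
): AugmentedManifestSketch {
  return {
    ...manifest,
    documentType: manifest.documentType ?? "PLAIN_TEXT_DOCUMENT",
    split: manifest.split ?? "train",
  };
}

const manifests: AugmentedManifestSketch[] = [
  // split is omitted here, so this manifest is used for training.
  { attributeNames: ["label"], s3Uri: "s3://example-bucket/train.manifest" },
  // Test data must be marked explicitly.
  {
    attributeNames: ["label"],
    s3Uri: "s3://example-bucket/test.manifest",
    split: "test",
  },
].map(withManifestDefaults);
```

Writing the defaults out this way does not change behavior; it only makes the effective configuration visible when reviewing which manifest feeds training and which feeds testing.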

outputDataConfig Configuration Block

  • kmsKeyId - (Optional) KMS Key used to encrypt the output documents. Can be a KMS Key ID, a KMS Key ARN, a KMS Alias name, or a KMS Alias ARN.
  • outputS3Uri - (Computed) Full path for the output documents.
  • s3Uri - (Required) Destination path for the output documents. The full path to the output file will be returned in outputS3Uri.

vpcConfig Configuration Block

  • securityGroupIds - (Required) List of security group IDs.
  • subnets - (Required) List of VPC subnets.

Attributes Reference

In addition to all arguments above, the following attributes are exported:

  • arn - ARN of the Document Classifier version.
  • tagsAll - A map of tags assigned to the resource, including those inherited from the provider defaultTags configuration block.

Timeouts

awsComprehendDocumentClassifier provides the following Timeouts configuration options:

  • create - (Optional, Default: 60m)
  • update - (Optional, Default: 60m)
  • delete - (Optional, Default: 30m)
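
Timeout values are duration strings with lowercase unit suffixes. The sketch below shows a timeouts object with longer limits than the defaults, together with a small helper for sanity-checking such strings. durationToSeconds is a hypothetical helper, not a CDKTF API, and it assumes only whole-number h, m, and s components.

```typescript
// Seconds per supported unit (assumption: only h, m, and s are handled).
const UNIT_SECONDS: { [unit: string]: number } = { h: 3600, m: 60, s: 1 };

// Converts strings such as "60m" or "1h30m" to seconds.
// Returns undefined for anything that does not match the expected form.
function durationToSeconds(value: string): number | undefined {
  if (!/^(\d+[hms])+$/.test(value)) {
    return undefined;
  }
  let total = 0;
  const component = /(\d+)([hms])/g;
  let match: RegExpExecArray | null;
  while ((match = component.exec(value)) !== null) {
    total += parseInt(match[1], 10) * UNIT_SECONDS[match[2]];
  }
  return total;
}

// Example values for the timeouts argument; training large models can
// take longer than the defaults allow.
const timeouts = {
  create: "90m",
  update: "90m",
  delete: "45m",
};
```

Here durationToSeconds(timeouts.create) evaluates to 5400, and a string with an uppercase unit such as "60M" is rejected by the helper.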

Import

A Comprehend Document Classifier can be imported using its ARN, for example:

$ terraform import aws_comprehend_document_classifier.example arn:aws:comprehend:us-west-2:123456789012:document-classifier/example