
create_training_dataset

CleanRoomsML.Client.create_training_dataset(**kwargs)

Defines the information necessary to create a training dataset. In Clean Rooms ML, a TrainingDataset is metadata that points to a Glue table. The table's data is read only when an AudienceModel is created.
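Because the Glue table is read only when an AudienceModel is created, it can help to confirm up front that the columns you plan to map in schema exist in the table. The sketch below compares those column names against the table definition returned by the AWS Glue get_table operation. It assumes glue_client is a boto3 Glue client (for example boto3.client("glue")); missing_schema_columns is a hypothetical helper, and the case-insensitive comparison is an assumption you may need to adjust for your catalog.

```python
from typing import Any, Dict, Iterable, List, Optional


def missing_schema_columns(
    glue_client: Any,
    database_name: str,
    table_name: str,
    column_names: Iterable[str],
    catalog_id: Optional[str] = None,
) -> List[str]:
    """Return the requested column names that the Glue table does not define (hypothetical helper)."""
    params: Dict[str, str] = {"DatabaseName": database_name, "Name": table_name}
    if catalog_id is not None:
        # Only send CatalogId when the table lives in a specific catalog.
        params["CatalogId"] = catalog_id
    table = glue_client.get_table(**params)["Table"]

    # Collect regular columns and partition keys; compare case-insensitively
    # (an assumption, since catalogs commonly store lowercase column names).
    defined = {
        col["Name"].lower()
        for col in table.get("StorageDescriptor", {}).get("Columns", [])
        + table.get("PartitionKeys", [])
    }
    return [name for name in column_names if name.lower() not in defined]
```

An empty list means every column you intend to reference was found in the table definition.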

See also: AWS API Documentation

Request Syntax

response = client.create_training_dataset(
    description='string',
    name='string',
    roleArn='string',
    tags={
        'string': 'string'
    },
    trainingData=[
        {
            'inputConfig': {
                'dataSource': {
                    'glueDataSource': {
                        'catalogId': 'string',
                        'databaseName': 'string',
                        'tableName': 'string'
                    }
                },
                'schema': [
                    {
                        'columnName': 'string',
                        'columnTypes': [
                            'USER_ID'|'ITEM_ID'|'TIMESTAMP'|'CATEGORICAL_FEATURE'|'NUMERICAL_FEATURE',
                        ]
                    },
                ]
            },
            'type': 'INTERACTIONS'
        },
    ]
)
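The request syntax above can also be assembled programmatically. The sketch below builds the keyword arguments for a single INTERACTIONS dataset backed by one Glue table and passes them to a client you supply. It assumes client is a CleanRoomsML client (for example boto3.client("cleanroomsml")); build_create_training_dataset_kwargs and submit_training_dataset are hypothetical helpers, and any names you pass in are placeholders.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def build_create_training_dataset_kwargs(
    name: str,
    role_arn: str,
    database_name: str,
    table_name: str,
    columns: Sequence[Tuple[str, Sequence[str]]],
    description: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
    catalog_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build kwargs for create_training_dataset (hypothetical helper)."""
    glue_source: Dict[str, str] = {
        "databaseName": database_name,
        "tableName": table_name,
    }
    if catalog_id is not None:
        # catalogId is optional in the request syntax.
        glue_source["catalogId"] = catalog_id

    kwargs: Dict[str, Any] = {
        "name": name,
        "roleArn": role_arn,
        "trainingData": [
            {
                "inputConfig": {
                    "dataSource": {"glueDataSource": glue_source},
                    # One schema entry per (columnName, columnTypes) pair.
                    "schema": [
                        {"columnName": col, "columnTypes": list(types)}
                        for col, types in columns
                    ],
                },
                "type": "INTERACTIONS",
            }
        ],
    }
    # Include optional top-level parameters only when they are set.
    if description is not None:
        kwargs["description"] = description
    if tags:
        kwargs["tags"] = dict(tags)
    return kwargs


def submit_training_dataset(client: Any, **params: Any) -> str:
    """Call create_training_dataset with the built kwargs and return the new ARN."""
    response = client.create_training_dataset(
        **build_create_training_dataset_kwargs(**params)
    )
    return response["trainingDatasetArn"]
```

Optional keys are omitted rather than sent as None because botocore's parameter validation rejects None for string-typed parameters.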
Parameters:
  • description (string) – The description of the training dataset.

  • name (string) –

    [REQUIRED]

    The name of the training dataset. This name must be unique in your account and region.

  • roleArn (string) –

    [REQUIRED]

    The ARN of the IAM role that Clean Rooms ML can assume to read the data referred to in the dataSource field of each dataset.

    Passing a role across AWS accounts is not allowed. If you pass a role that isn’t in your account, you get an AccessDeniedException error.

  • tags (dict) –

    The optional metadata that you apply to the resource to help you categorize and organize it. Each tag consists of a key and an optional value, both of which you define.

    The following basic restrictions apply to tags:

    • Maximum number of tags per resource: 50.

    • For each resource, each tag key must be unique, and each tag key can have only one value.

    • Maximum key length: 128 Unicode characters in UTF-8.

    • Maximum value length: 256 Unicode characters in UTF-8.

    • If your tagging schema is used across multiple services and resources, remember that other services may have restrictions on allowed characters. Generally allowed characters are letters, numbers, and spaces representable in UTF-8, and the following characters: + - = . _ : / @.

    • Tag keys and values are case sensitive.

    • Do not use aws:, AWS:, or any upper- or lowercase combination of these as a prefix for keys, because this prefix is reserved for AWS use. You cannot edit or delete tag keys with this prefix. Values can have this prefix. If a tag value has aws as its prefix but the key does not, Clean Rooms ML considers it a user tag, and it counts against the limit of 50 tags. Tags with only the key prefix of aws do not count against your tags-per-resource limit.

    • (string) –

      • (string) –

  • trainingData (list) –

    [REQUIRED]

    An array of Dataset objects, each of which specifies the dataset type and details about its location and schema. The role passed in roleArn must have read access to these tables.

    • (dict) –

      Defines where the training dataset is located, what type of data it contains, and how to access the data.

      • inputConfig (dict) – [REQUIRED]

        A DatasetInputConfig object that defines the data source and schema mapping.

        • dataSource (dict) – [REQUIRED]

          A DataSource object that specifies the Glue data source for the training data.

          • glueDataSource (dict) – [REQUIRED]

            A GlueDataSource object that defines the catalog ID, database name, and table name for the training data.

            • catalogId (string) –

              The Glue catalog that contains the training data.

            • databaseName (string) – [REQUIRED]

              The Glue database that contains the training data.

            • tableName (string) – [REQUIRED]

              The Glue table that contains the training data.

        • schema (list) – [REQUIRED]

          The schema information for the training data.

          • (dict) –

            Metadata for a column.

            • columnName (string) – [REQUIRED]

              The name of a column.

            • columnTypes (list) – [REQUIRED]

              The data types of the column. Each entry is one of USER_ID, ITEM_ID, TIMESTAMP, CATEGORICAL_FEATURE, or NUMERICAL_FEATURE.

              • (string) –

      • type (string) – [REQUIRED]

        The type of information that the dataset contains. Valid value: INTERACTIONS.
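The constraints described for roleArn, tags, and trainingData can be checked client-side before calling the API. This is a minimal sketch, not the service's own validation: the service remains authoritative, the character check is a conservative reading of the "generally allowed" tag characters, and validate_request and role_account_id are hypothetical helpers. It assumes you obtain your own account ID separately, for example from boto3.client("sts").get_caller_identity()["Account"].

```python
import re
from typing import Any, Dict, List

# Enumerations copied from the request syntax of create_training_dataset.
ALLOWED_COLUMN_TYPES = {
    "USER_ID", "ITEM_ID", "TIMESTAMP", "CATEGORICAL_FEATURE", "NUMERICAL_FEATURE",
}
ALLOWED_DATASET_TYPES = {"INTERACTIONS"}

# Conservative reading of the "generally allowed" tag characters:
# letters, numbers, spaces, and + - = . _ : / @ (\w covers letters, digits, and _).
_TAG_CHARS = re.compile(r"[\w +\-=.:/@]*")


def role_account_id(role_arn: str) -> str:
    """Return the account field of an IAM ARN such as arn:aws:iam::111122223333:role/Name."""
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    return parts[4]


def validate_request(kwargs: Dict[str, Any], caller_account_id: str) -> List[str]:
    """Return human-readable problems found in create_training_dataset kwargs."""
    problems: List[str] = []

    # roleArn: a role from another account is rejected with AccessDeniedException.
    try:
        if role_account_id(kwargs.get("roleArn", "")) != caller_account_id:
            problems.append("roleArn belongs to a different AWS account")
    except ValueError as exc:
        problems.append(str(exc))

    # tags: limits from the tag restrictions list.
    tags: Dict[str, str] = kwargs.get("tags", {})
    user_tag_count = sum(1 for key in tags if not key.lower().startswith("aws:"))
    if user_tag_count > 50:
        problems.append(f"{user_tag_count} tags exceeds the limit of 50")
    for key, value in tags.items():
        if key.lower().startswith("aws:"):
            problems.append(f"tag key {key!r} uses the reserved aws: prefix")
        if len(key) > 128:
            problems.append(f"tag key {key[:16]!r}... is longer than 128 characters")
        if len(value) > 256:
            problems.append(f"value for tag {key!r} is longer than 256 characters")
        if not (_TAG_CHARS.fullmatch(key) and _TAG_CHARS.fullmatch(value)):
            problems.append(f"tag {key!r} may contain disallowed characters")

    # trainingData: dataset type and column types must use the documented values.
    for i, dataset in enumerate(kwargs.get("trainingData", [])):
        if dataset.get("type") not in ALLOWED_DATASET_TYPES:
            problems.append(f"trainingData[{i}] has unsupported type {dataset.get('type')!r}")
        for column in dataset.get("inputConfig", {}).get("schema", []):
            unknown = set(column.get("columnTypes", [])) - ALLOWED_COLUMN_TYPES
            if unknown:
                problems.append(
                    f"trainingData[{i}] column {column.get('columnName')!r} "
                    f"has unsupported columnTypes {sorted(unknown)}"
                )
    return problems
```

Returning a list of messages instead of raising on the first problem lets you report every issue in one pass.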

Return type:

dict

Returns:

Response Syntax

{
    'trainingDatasetArn': 'string'
}

Response Structure

  • (dict) –

    • trainingDatasetArn (string) –

      The Amazon Resource Name (ARN) of the training dataset resource.

Exceptions