describe_data_ingestion_job

LookoutEquipment.Client.describe_data_ingestion_job(**kwargs)

Provides information about a specific data ingestion job, such as its creation time, dataset ARN, and status.

See also: AWS API Documentation

Request Syntax

response = client.describe_data_ingestion_job(
    JobId='string'
)
Parameters:

JobId (string) –

[REQUIRED]

The job ID of the data ingestion job.
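A minimal sketch of the call, assuming boto3 is installed and that credentials and a region are configured in the environment. The helper names `make_client` and `describe_job` are hypothetical and exist only for this illustration.

```python
def make_client(region_name=None):
    """Create a LookoutEquipment client (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so describe_job can be exercised offline

    return boto3.client("lookoutequipment", region_name=region_name)


def describe_job(client, job_id):
    """Return the raw response dict for one data ingestion job."""
    # JobId is the only parameter, and it is required.
    return client.describe_data_ingestion_job(JobId=job_id)
```

With a real client this would be used as `response = describe_job(make_client(), job_id)`, where `job_id` is the ID of an existing ingestion job.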

Return type:

dict

Returns:

Response Syntax

{
    'JobId': 'string',
    'DatasetArn': 'string',
    'IngestionInputConfiguration': {
        'S3InputConfiguration': {
            'Bucket': 'string',
            'Prefix': 'string',
            'KeyPattern': 'string'
        }
    },
    'RoleArn': 'string',
    'CreatedAt': datetime(2015, 1, 1),
    'Status': 'IN_PROGRESS'|'SUCCESS'|'FAILED'|'IMPORT_IN_PROGRESS',
    'FailedReason': 'string',
    'DataQualitySummary': {
        'InsufficientSensorData': {
            'MissingCompleteSensorData': {
                'AffectedSensorCount': 123
            },
            'SensorsWithShortDateRange': {
                'AffectedSensorCount': 123
            }
        },
        'MissingSensorData': {
            'AffectedSensorCount': 123,
            'TotalNumberOfMissingValues': 123
        },
        'InvalidSensorData': {
            'AffectedSensorCount': 123,
            'TotalNumberOfInvalidValues': 123
        },
        'UnsupportedTimestamps': {
            'TotalNumberOfUnsupportedTimestamps': 123
        },
        'DuplicateTimestamps': {
            'TotalNumberOfDuplicateTimestamps': 123
        }
    },
    'IngestedFilesSummary': {
        'TotalNumberOfFiles': 123,
        'IngestedNumberOfFiles': 123,
        'DiscardedFiles': [
            {
                'Bucket': 'string',
                'Key': 'string'
            },
        ]
    },
    'StatusDetail': 'string',
    'IngestedDataSize': 123,
    'DataStartTime': datetime(2015, 1, 1),
    'DataEndTime': datetime(2015, 1, 1),
    'SourceDatasetArn': 'string'
}
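Because Status can report IN_PROGRESS or IMPORT_IN_PROGRESS, a caller may want to poll until the job finishes. The following is a rough sketch, not an official waiter: it assumes that SUCCESS and FAILED are the only terminal values, and the function name, poll interval, and attempt limit are arbitrary choices made for illustration.

```python
import time

# Assumption: these two values from the response syntax are terminal.
TERMINAL_STATUSES = {"SUCCESS", "FAILED"}


def wait_for_ingestion(client, job_id, poll_seconds=30.0, max_attempts=60,
                       sleep=time.sleep):
    """Poll describe_data_ingestion_job until the job reaches a terminal status."""
    for attempt in range(max_attempts):
        response = client.describe_data_ingestion_job(JobId=job_id)
        if response["Status"] in TERMINAL_STATUSES:
            return response
        # IN_PROGRESS or IMPORT_IN_PROGRESS: wait before asking again,
        # but do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_attempts} polls")
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. When the returned Status is FAILED, FailedReason in the same response may explain why.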

Response Structure

  • (dict) –

    • JobId (string) –

      Indicates the job ID of the data ingestion job.

    • DatasetArn (string) –

      The Amazon Resource Name (ARN) of the dataset being used in the data ingestion job.

    • IngestionInputConfiguration (dict) –

      Specifies the S3 location configuration for the data input for the data ingestion job.

      • S3InputConfiguration (dict) –

        The location information for the S3 bucket used for input data for the data ingestion.

        • Bucket (string) –

          The name of the S3 bucket used for the input data for the data ingestion.

        • Prefix (string) –

          The prefix for the S3 location being used for the input data for the data ingestion.

        • KeyPattern (string) –

          The pattern for matching the Amazon S3 files that will be used for ingestion. If the schema was created previously without any KeyPattern, then the default KeyPattern {prefix}/{component_name}/* is used to download files from Amazon S3 according to the schema. This field is required when ingestion is being done for the first time.

          Valid Values: {prefix}/{component_name}_* | {prefix}/{component_name}/* | {prefix}/{component_name}[DELIMITER]* (Allowed delimiters: space, dot, underscore, hyphen)

    • RoleArn (string) –

      The Amazon Resource Name (ARN) of an IAM role with permission to access the data source being ingested.

    • CreatedAt (datetime) –

      The time at which the data ingestion job was created.

    • Status (string) –

      Indicates the status of the DataIngestionJob operation.

    • FailedReason (string) –

      Specifies the reason for failure when a data ingestion job has failed.

    • DataQualitySummary (dict) –

      Gives statistics about a completed ingestion job. These statistics primarily quantify problems in the input data, reported under InsufficientSensorData, MissingSensorData, InvalidSensorData, UnsupportedTimestamps, and DuplicateTimestamps.

      • InsufficientSensorData (dict) –

        Parameter that gives information about insufficient data for sensors in the dataset. This includes information about those sensors that have complete data missing and those with a short date range.

        • MissingCompleteSensorData (dict) –

          Parameter that describes the total number of sensors whose data is completely missing.

          • AffectedSensorCount (integer) –

            Indicates the number of sensors that have data missing completely.

        • SensorsWithShortDateRange (dict) –

          Parameter that describes the total number of sensors that have a short date range of less than 14 days of data overall.

          • AffectedSensorCount (integer) –

            Indicates the number of sensors that have less than 14 days of data.

      • MissingSensorData (dict) –

        Parameter that gives information about data that is missing over all the sensors in the input data.

        • AffectedSensorCount (integer) –

          Indicates the number of sensors that have at least some data missing.

        • TotalNumberOfMissingValues (integer) –

          Indicates the total number of missing values across all the sensors.

      • InvalidSensorData (dict) –

        Parameter that gives information about data that is invalid over all the sensors in the input data.

        • AffectedSensorCount (integer) –

          Indicates the number of sensors that have at least some invalid values.

        • TotalNumberOfInvalidValues (integer) –

          Indicates the total number of invalid values across all the sensors.

      • UnsupportedTimestamps (dict) –

        Parameter that gives information about unsupported timestamps in the input data.

        • TotalNumberOfUnsupportedTimestamps (integer) –

          Indicates the total number of unsupported timestamps across the ingested data.

      • DuplicateTimestamps (dict) –

        Parameter that gives information about duplicate timestamps in the input data.

        • TotalNumberOfDuplicateTimestamps (integer) –

          Indicates the total number of duplicate timestamps.

    • IngestedFilesSummary (dict) –

      Gives statistics about how many files have been ingested, and which files have not been ingested, for a particular ingestion job.

      • TotalNumberOfFiles (integer) –

        Indicates the total number of files that were submitted for ingestion.

      • IngestedNumberOfFiles (integer) –

        Indicates the number of files that were successfully ingested.

      • DiscardedFiles (list) –

        Lists the files that were discarded. A file could be discarded because its format is invalid (for example, a jpg or pdf) or because it is not readable.

        • (dict) –

          Contains information about an S3 bucket.

          • Bucket (string) –

            The name of the specific S3 bucket.

          • Key (string) –

            The Amazon Web Services Key Management Service (KMS) key being used to encrypt the S3 object. Without this key, data in the bucket is not accessible.

    • StatusDetail (string) –

      Provides details about the status of an ingestion job that is currently in progress.

    • IngestedDataSize (integer) –

      Indicates the size of the ingested dataset.

    • DataStartTime (datetime) –

      Indicates the earliest timestamp corresponding to data that was successfully ingested during this specific ingestion job.

    • DataEndTime (datetime) –

      Indicates the latest timestamp corresponding to data that was successfully ingested during this specific ingestion job.

    • SourceDatasetArn (string) –

      The Amazon Resource Name (ARN) of the source dataset from which the data used for the data ingestion job was imported.
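The nested response structure above can be flattened into a short report. This is an illustrative sketch only: the function name and output keys are hypothetical, and it assumes that any section may be absent from a given response (for example, while a job is still running), treating a missing count as zero.

```python
def summarize_ingestion(response):
    """Flatten the counts from one describe_data_ingestion_job response."""
    quality = response.get("DataQualitySummary", {})
    insufficient = quality.get("InsufficientSensorData", {})
    files = response.get("IngestedFilesSummary", {})

    def count(parent, section, field):
        # Assumption of this sketch: a missing section or field counts as zero.
        return parent.get(section, {}).get(field, 0)

    return {
        "status": response.get("Status"),
        "failed_reason": response.get("FailedReason"),
        "sensors_without_data": count(
            insufficient, "MissingCompleteSensorData", "AffectedSensorCount"),
        "sensors_with_short_range": count(
            insufficient, "SensorsWithShortDateRange", "AffectedSensorCount"),
        "missing_values": count(
            quality, "MissingSensorData", "TotalNumberOfMissingValues"),
        "invalid_values": count(
            quality, "InvalidSensorData", "TotalNumberOfInvalidValues"),
        "unsupported_timestamps": count(
            quality, "UnsupportedTimestamps", "TotalNumberOfUnsupportedTimestamps"),
        "duplicate_timestamps": count(
            quality, "DuplicateTimestamps", "TotalNumberOfDuplicateTimestamps"),
        "files_total": files.get("TotalNumberOfFiles", 0),
        "files_ingested": files.get("IngestedNumberOfFiles", 0),
        # Each discarded file is kept as a (Bucket, Key) pair, uninterpreted.
        "discarded_files": [
            (item["Bucket"], item["Key"])
            for item in files.get("DiscardedFiles", [])
        ],
    }
```

Passing the dict returned by `describe_data_ingestion_job` yields a single-level dict that is easier to log or compare across jobs.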

Exceptions