retrieve

kendra.Client.retrieve(**kwargs)

Retrieves relevant passages or text excerpts given an input query.

This API is similar to the Query API. However, by default, the Query API only returns excerpt passages of up to 100 token words. With the Retrieve API, you can retrieve longer passages of up to 200 token words and up to 100 semantically relevant passages. This doesn’t include question-answer or FAQ type responses from your index. The passages are text excerpts that can be semantically extracted from multiple documents and multiple parts of the same document. If in extreme cases your documents produce zero passages using the Retrieve API, you can alternatively use the Query API and its types of responses.

You can also do the following:

  • Override boosting at the index level

  • Filter based on document fields or attributes

  • Filter based on the user or their group access to documents

  • View the confidence score bucket for a retrieved passage result. The confidence bucket provides a relative ranking that indicates how confident Amazon Kendra is that the response is relevant to the query.

Note

Confidence score buckets are currently available only for English.

You can also include certain fields in the response that might provide useful additional information.

The Retrieve API shares the number of query capacity units that you set for your index. For more information on what’s included in a single capacity unit and the default base capacity for an index, see Adjusting capacity.

See also: AWS API Documentation
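
As a minimal sketch of the call described above, the helper below sends one query and collects the passage text from each result. The name `retrieve_passages` is hypothetical, and the sketch assumes that `client` was created elsewhere with `boto3.client("kendra")`; any object with a compatible `retrieve()` method works.

```python
from typing import Any, Dict, List


def retrieve_passages(client: Any, index_id: str, query: str) -> List[Dict[str, Any]]:
    """Call Retrieve once and return a simplified list of passages.

    `client` is assumed to be a Kendra client, for example the object
    returned by boto3.client("kendra").
    """
    response = client.retrieve(IndexId=index_id, QueryText=query)
    passages = []
    for item in response.get("ResultItems", []):
        passages.append(
            {
                "title": item.get("DocumentTitle"),
                "uri": item.get("DocumentURI"),
                "text": item.get("Content"),
            }
        )
    return passages
```

A call might look like `retrieve_passages(client, "your-index-id", "How much time off do new hires get?")`, where the index ID is a placeholder for your own.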

Request Syntax

response = client.retrieve(
    IndexId='string',
    QueryText='string',
    AttributeFilter={
        'AndAllFilters': [
            {'... recursive ...'},
        ],
        'OrAllFilters': [
            {'... recursive ...'},
        ],
        'NotFilter': {'... recursive ...'},
        'EqualsTo': {
            'Key': 'string',
            'Value': {
                'StringValue': 'string',
                'StringListValue': [
                    'string',
                ],
                'LongValue': 123,
                'DateValue': datetime(2015, 1, 1)
            }
        },
        'ContainsAll': {
            'Key': 'string',
            'Value': {
                'StringValue': 'string',
                'StringListValue': [
                    'string',
                ],
                'LongValue': 123,
                'DateValue': datetime(2015, 1, 1)
            }
        },
        'ContainsAny': {
            'Key': 'string',
            'Value': {
                'StringValue': 'string',
                'StringListValue': [
                    'string',
                ],
                'LongValue': 123,
                'DateValue': datetime(2015, 1, 1)
            }
        },
        'GreaterThan': {
            'Key': 'string',
            'Value': {
                'StringValue': 'string',
                'StringListValue': [
                    'string',
                ],
                'LongValue': 123,
                'DateValue': datetime(2015, 1, 1)
            }
        },
        'GreaterThanOrEquals': {
            'Key': 'string',
            'Value': {
                'StringValue': 'string',
                'StringListValue': [
                    'string',
                ],
                'LongValue': 123,
                'DateValue': datetime(2015, 1, 1)
            }
        },
        'LessThan': {
            'Key': 'string',
            'Value': {
                'StringValue': 'string',
                'StringListValue': [
                    'string',
                ],
                'LongValue': 123,
                'DateValue': datetime(2015, 1, 1)
            }
        },
        'LessThanOrEquals': {
            'Key': 'string',
            'Value': {
                'StringValue': 'string',
                'StringListValue': [
                    'string',
                ],
                'LongValue': 123,
                'DateValue': datetime(2015, 1, 1)
            }
        }
    },
    RequestedDocumentAttributes=[
        'string',
    ],
    DocumentRelevanceOverrideConfigurations=[
        {
            'Name': 'string',
            'Relevance': {
                'Freshness': True|False,
                'Importance': 123,
                'Duration': 'string',
                'RankOrder': 'ASCENDING'|'DESCENDING',
                'ValueImportanceMap': {
                    'string': 123
                }
            }
        },
    ],
    PageNumber=123,
    PageSize=123,
    UserContext={
        'Token': 'string',
        'UserId': 'string',
        'Groups': [
            'string',
        ],
        'DataSourceGroups': [
            {
                'GroupId': 'string',
                'DataSourceId': 'string'
            },
        ]
    }
)
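
The syntax above shows `DateValue` as a naive `datetime(2015, 1, 1)`, while the parameter description stresses that the time zone should be included. The following sketch builds a date-range filter that rejects naive datetimes. The helper name `date_range_filter` is hypothetical, and the `_created_at` key in the usage note is an assumption; substitute a Date attribute that exists in your index.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def date_range_filter(key: str, start: datetime, end: datetime) -> Dict[str, Any]:
    """Build an AttributeFilter matching start <= attribute < end.

    Raises ValueError for naive datetimes so the time zone is always explicit.
    """
    for value in (start, end):
        if value.tzinfo is None or value.utcoffset() is None:
            raise ValueError("DateValue should be timezone-aware")
    return {
        "AndAllFilters": [
            {"GreaterThanOrEquals": {"Key": key, "Value": {"DateValue": start}}},
            {"LessThan": {"Key": key, "Value": {"DateValue": end}}},
        ]
    }
```

For example, `date_range_filter("_created_at", datetime(2023, 1, 1, tzinfo=timezone.utc), datetime(2024, 1, 1, tzinfo=timezone.utc))` produces a value that can be passed as `AttributeFilter`.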
Parameters:
  • IndexId (string) –

    [REQUIRED]

    The identifier of the index to retrieve relevant passages for the search.

  • QueryText (string) –

    [REQUIRED]

    The input query text to retrieve relevant passages for the search. Amazon Kendra truncates queries at 30 token words, which excludes punctuation and stop words. Truncation still applies if you use Boolean or more advanced, complex queries. For example, Timeoff AND October AND Category:HR is counted as 3 tokens: timeoff, october, hr. For more information, see Searching with advanced query syntax in the Amazon Kendra Developer Guide.

  • AttributeFilter (dict) –

    Filters search results by document fields/attributes. You can provide only one attribute filter; however, the AndAllFilters and OrAllFilters parameters each contain a list of other filters, and NotFilter contains a single nested filter.

    The AttributeFilter parameter lets you create a set of filtering rules that a document must satisfy to be included in the query results.

    • AndAllFilters (list) –

      Performs a logical AND operation on all filters that you specify.

      • (dict) –

        Filters the search results based on document attributes or fields.

        You can filter results using attributes for your particular documents. The attributes must exist in your index. For example, if your documents include the custom attribute “Department”, you can filter documents that belong to the “HR” department. You would use the EqualsTo operation to filter results or documents where “Department” equals “HR”.

        You can use AndAllFilters and OrAllFilters in combination with each other or with other operations such as EqualsTo. For example:

        AndAllFilters

        • EqualsTo: “Department”, “HR”

        • OrAllFilters

          • ContainsAny: “Project Name”, [“new hires”, “new hiring”]

        This example filters results or documents that belong to the HR department and belong to projects that contain “new hires” or “new hiring” in the project name (you must use ContainsAny with StringListValue). This example filters with a depth of 2.

        You cannot filter to a depth of more than 2; otherwise you receive a ValidationException with the message “AttributeFilter cannot have a depth of more than 2.” Also, if you use more than 10 attribute filters in a given list for AndAllFilters or OrAllFilters, you receive a ValidationException with the message “AttributeFilter cannot have a length of more than 10”.

        For examples of using AttributeFilter, see Using document attributes to filter search results.

    • OrAllFilters (list) –

      Performs a logical OR operation on all filters that you specify.

      • (dict) –

        Filters the search results based on document attributes or fields.

        You can filter results using attributes for your particular documents. The attributes must exist in your index. For example, if your documents include the custom attribute “Department”, you can filter documents that belong to the “HR” department. You would use the EqualsTo operation to filter results or documents where “Department” equals “HR”.

        You can use AndAllFilters and OrAllFilters in combination with each other or with other operations such as EqualsTo. For example:

        AndAllFilters

        • EqualsTo: “Department”, “HR”

        • OrAllFilters

          • ContainsAny: “Project Name”, [“new hires”, “new hiring”]

        This example filters results or documents that belong to the HR department and belong to projects that contain “new hires” or “new hiring” in the project name (you must use ContainsAny with StringListValue). This example filters with a depth of 2.

        You cannot filter to a depth of more than 2; otherwise you receive a ValidationException with the message “AttributeFilter cannot have a depth of more than 2.” Also, if you use more than 10 attribute filters in a given list for AndAllFilters or OrAllFilters, you receive a ValidationException with the message “AttributeFilter cannot have a length of more than 10”.

        For examples of using AttributeFilter, see Using document attributes to filter search results.

    • NotFilter (dict) –

      Performs a logical NOT operation on the filter that you specify.

    • EqualsTo (dict) –

      Performs an equals operation on document attributes/fields and their values.

      • Key (string) – [REQUIRED]

        The identifier for the attribute.

      • Value (dict) – [REQUIRED]

        The value of the attribute.

        • StringValue (string) –

          A string, such as “department”.

        • StringListValue (list) –

          A list of strings. The default maximum length or number of strings is 10.

          • (string) –

        • LongValue (integer) –

          A long integer value.

        • DateValue (datetime) –

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • ContainsAll (dict) –

      Returns true when a document contains all of the specified document attributes/fields. This filter is only applicable to StringListValue.

      • Key (string) – [REQUIRED]

        The identifier for the attribute.

      • Value (dict) – [REQUIRED]

        The value of the attribute.

        • StringValue (string) –

          A string, such as “department”.

        • StringListValue (list) –

          A list of strings. The default maximum length or number of strings is 10.

          • (string) –

        • LongValue (integer) –

          A long integer value.

        • DateValue (datetime) –

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • ContainsAny (dict) –

      Returns true when a document contains any of the specified document attributes/fields. This filter is only applicable to StringListValue.

      • Key (string) – [REQUIRED]

        The identifier for the attribute.

      • Value (dict) – [REQUIRED]

        The value of the attribute.

        • StringValue (string) –

          A string, such as “department”.

        • StringListValue (list) –

          A list of strings. The default maximum length or number of strings is 10.

          • (string) –

        • LongValue (integer) –

          A long integer value.

        • DateValue (datetime) –

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • GreaterThan (dict) –

      Performs a greater than operation on document attributes/fields and their values. Use with the document attribute type Date or Long.

      • Key (string) – [REQUIRED]

        The identifier for the attribute.

      • Value (dict) – [REQUIRED]

        The value of the attribute.

        • StringValue (string) –

          A string, such as “department”.

        • StringListValue (list) –

          A list of strings. The default maximum length or number of strings is 10.

          • (string) –

        • LongValue (integer) –

          A long integer value.

        • DateValue (datetime) –

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • GreaterThanOrEquals (dict) –

      Performs a greater than or equals operation on document attributes/fields and their values. Use with the document attribute type Date or Long.

      • Key (string) – [REQUIRED]

        The identifier for the attribute.

      • Value (dict) – [REQUIRED]

        The value of the attribute.

        • StringValue (string) –

          A string, such as “department”.

        • StringListValue (list) –

          A list of strings. The default maximum length or number of strings is 10.

          • (string) –

        • LongValue (integer) –

          A long integer value.

        • DateValue (datetime) –

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • LessThan (dict) –

      Performs a less than operation on document attributes/fields and their values. Use with the document attribute type Date or Long.

      • Key (string) – [REQUIRED]

        The identifier for the attribute.

      • Value (dict) – [REQUIRED]

        The value of the attribute.

        • StringValue (string) –

          A string, such as “department”.

        • StringListValue (list) –

          A list of strings. The default maximum length or number of strings is 10.

          • (string) –

        • LongValue (integer) –

          A long integer value.

        • DateValue (datetime) –

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • LessThanOrEquals (dict) –

      Performs a less than or equals operation on document attributes/fields and their values. Use with the document attribute type Date or Long.

      • Key (string) – [REQUIRED]

        The identifier for the attribute.

      • Value (dict) – [REQUIRED]

        The value of the attribute.

        • StringValue (string) –

          A string, such as “department”.

        • StringListValue (list) –

          A list of strings. The default maximum length or number of strings is 10.

          • (string) –

        • LongValue (integer) –

          A long integer value.

        • DateValue (datetime) –

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
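
The depth-2 example and the two limits described for AttributeFilter can be sketched as a plain dictionary plus a client-side check that runs before the request is sent. The helper names `filter_depth` and `check_filter` are hypothetical. The sketch assumes that depth counts nested AndAllFilters and OrAllFilters levels, as in the documented example; counting NotFilter as a level too is a conservative assumption, since the service's exact rule for it is not stated here.

```python
from typing import Any, Dict, List

MAX_DEPTH = 2         # limit quoted in the ValidationException message
MAX_LIST_LENGTH = 10  # limit quoted for AndAllFilters / OrAllFilters lists

# The documented example: HR department AND project name contains either phrase.
hr_new_hire_filter: Dict[str, Any] = {
    "AndAllFilters": [
        {"EqualsTo": {"Key": "Department", "Value": {"StringValue": "HR"}}},
        {
            "OrAllFilters": [
                {
                    "ContainsAny": {
                        "Key": "Project Name",
                        "Value": {"StringListValue": ["new hires", "new hiring"]},
                    }
                }
            ]
        },
    ]
}


def filter_depth(node: Dict[str, Any]) -> int:
    """Return how many logical-operator levels are nested in a filter."""
    depth = 0
    for key in ("AndAllFilters", "OrAllFilters"):
        for child in node.get(key, []):
            depth = max(depth, 1 + filter_depth(child))
    if "NotFilter" in node:  # assumption: NotFilter counts as one level
        depth = max(depth, 1 + filter_depth(node["NotFilter"]))
    return depth


def check_filter(node: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means no limit is exceeded."""
    problems = []
    if filter_depth(node) > MAX_DEPTH:
        problems.append(f"filter depth exceeds {MAX_DEPTH}")
    stack = [node]
    while stack:
        current = stack.pop()
        for key in ("AndAllFilters", "OrAllFilters"):
            children = current.get(key, [])
            if len(children) > MAX_LIST_LENGTH:
                problems.append(
                    f"{key} has {len(children)} filters; the limit is {MAX_LIST_LENGTH}"
                )
            stack.extend(children)
        if "NotFilter" in current:
            stack.append(current["NotFilter"])
    return problems
```

Running `check_filter(hr_new_hire_filter)` before calling `retrieve` gives a chance to catch an oversized filter locally instead of waiting for a ValidationException.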

  • RequestedDocumentAttributes (list) –

    A list of document fields/attributes to include in the response. You can limit the response to include certain document fields. By default, all document fields are included in the response.

    • (string) –

  • DocumentRelevanceOverrideConfigurations (list) –

    Overrides relevance tuning configurations of fields/attributes set at the index level.

    If you use this API to override the relevance tuning configured at the index level, but there is no relevance tuning configured at the index level, then Amazon Kendra does not apply any relevance tuning.

    If there is relevance tuning configured for fields at the index level, and you use this API to override only some of these fields, then for the fields you did not override, the importance is set to 1.

    • (dict) –

      Overrides the document relevance properties of a custom index field.

      • Name (string) – [REQUIRED]

        The name of the index field.

      • Relevance (dict) – [REQUIRED]

        Provides information for tuning the relevance of a field in a search. When a query includes terms that match the field, the results are given a boost in the response based on these tuning parameters.

        • Freshness (boolean) –

          Indicates that this field determines how “fresh” a document is. For example, if document 1 was created on November 5, and document 2 was created on October 31, document 1 is “fresher” than document 2. Only applies to DATE fields.

        • Importance (integer) –

          The relative importance of the field in the search. Larger numbers provide more of a boost than smaller numbers.

        • Duration (string) –

          Specifies the time period that the boost applies to. For example, to make the boost apply to documents with the field value within the last month, you would use “2628000s”. Once the field value is beyond the specified range, the effect of the boost drops off. The higher the importance, the faster the effect drops off. If you don’t specify a value, the default is 3 months. The value of the field is a numeric string followed by the character “s”, for example “86400s” for one day, or “604800s” for one week.

          Only applies to DATE fields.

        • RankOrder (string) –

          Determines how values should be interpreted.

          When the RankOrder field is ASCENDING, higher numbers are better. For example, a document with a rating score of 10 is higher ranking than a document with a rating score of 1.

          When the RankOrder field is DESCENDING, lower numbers are better. For example, in a task tracking application, a priority 1 task is more important than a priority 5 task.

          Only applies to LONG fields.

        • ValueImportanceMap (dict) –

          A list of values that should be given a different boost when they appear in the result list. For example, if you are boosting a field called “department”, query terms that match the department field are boosted in the result. However, you can add entries from the department field to boost documents with those values higher.

          For example, you can add entries to the map with names of departments. If you add “HR”,5 and “Legal”,3 those departments are given special attention when they appear in the metadata of a document. When those terms appear they are given the specified importance instead of the regular importance for the boost.

          • (string) –

            • (integer) –
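
As a sketch of the override shapes described for DocumentRelevanceOverrideConfigurations, the helpers below format a Duration string from a `timedelta` and build one freshness override and one value-based override. All helper names are hypothetical, and the field names used with them (such as a date field or a “department” field) are assumed to exist in your index.

```python
from datetime import timedelta
from typing import Any, Dict


def duration_string(period: timedelta) -> str:
    """Format a period as whole seconds followed by 's', the form Duration expects."""
    seconds = int(period.total_seconds())
    if seconds <= 0:
        raise ValueError("period must be positive")
    return f"{seconds}s"


def freshness_override(field_name: str, importance: int, period: timedelta) -> Dict[str, Any]:
    """Override for a DATE field: boost documents whose value falls within `period`."""
    return {
        "Name": field_name,
        "Relevance": {
            "Freshness": True,
            "Importance": importance,
            "Duration": duration_string(period),
        },
    }


def value_boost_override(field_name: str, importance: int,
                         value_importance: Dict[str, int]) -> Dict[str, Any]:
    """Override that gives specific field values their own importance."""
    return {
        "Name": field_name,
        "Relevance": {
            "Importance": importance,
            "ValueImportanceMap": dict(value_importance),
        },
    }
```

For instance, `value_boost_override("department", 2, {"HR": 5, "Legal": 3})` mirrors the “HR”,5 and “Legal”,3 example, and the resulting dictionaries go into the `DocumentRelevanceOverrideConfigurations` list.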

  • PageNumber (integer) – Retrieved relevant passages are returned in pages the size of the PageSize parameter. By default, Amazon Kendra returns the first page of results. Use this parameter to get result pages after the first one.

  • PageSize (integer) – Sets the number of retrieved relevant passages that are returned in each page of results. The default page size is 10. The maximum number of results returned is 100. If you ask for more than 100 results, only 100 are returned.

  • UserContext (dict) –

    The user context token or user and group information.

    • Token (string) –

      The user context token for filtering search results for a user. It must be a JWT or a JSON token.

    • UserId (string) –

      The identifier of the user whose access to documents is used to filter search results.

    • Groups (list) –

      The list of groups whose access to documents is used to filter search results.

      • (string) –

    • DataSourceGroups (list) –

      The list of data source groups whose access to documents in that data source is used to filter search results.

      • (dict) –

        Data source information for user context filtering.

        • GroupId (string) – [REQUIRED]

          The identifier of the group you want to add to your list of groups. This is for filtering search results based on the groups’ access to documents.

        • DataSourceId (string) – [REQUIRED]

          The identifier of the data source group you want to add to your list of data source groups. This is for filtering search results based on the groups’ access to documents in that data source.
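
The PageNumber and PageSize parameters described above can be combined into a simple paging loop. This is a sketch under stated assumptions: the generator name `iter_passages` is hypothetical, page numbering is assumed to start at 1, and `client` is assumed to be a Kendra client created with `boto3.client("kendra")`. The loop stops when a page comes back short or when 100 results have been yielded, matching the documented maximum.

```python
from typing import Any, Dict, Iterator

MAX_RESULTS = 100  # documented maximum number of results returned


def iter_passages(client: Any, index_id: str, query: str,
                  page_size: int = 10, **extra: Any) -> Iterator[Dict[str, Any]]:
    """Yield result items page by page.

    Extra keyword arguments (for example AttributeFilter or UserContext)
    are passed through to retrieve() unchanged.
    """
    if not 1 <= page_size <= MAX_RESULTS:
        raise ValueError(f"page_size must be between 1 and {MAX_RESULTS}")
    yielded = 0
    page_number = 1  # assumption: pages are numbered from 1
    while yielded < MAX_RESULTS:
        response = client.retrieve(
            IndexId=index_id,
            QueryText=query,
            PageNumber=page_number,
            PageSize=page_size,
            **extra,
        )
        items = response.get("ResultItems", [])
        for item in items:
            yield item
            yielded += 1
            if yielded >= MAX_RESULTS:
                return
        if len(items) < page_size:
            return  # a short or empty page means there is nothing more to fetch
        page_number += 1
```

Because it is a generator, callers can stop early, for example once enough passages have been collected for a prompt, without requesting further pages.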

Return type:

dict

Returns:

Response Syntax

{
    'QueryId': 'string',
    'ResultItems': [
        {
            'Id': 'string',
            'DocumentId': 'string',
            'DocumentTitle': 'string',
            'Content': 'string',
            'DocumentURI': 'string',
            'DocumentAttributes': [
                {
                    'Key': 'string',
                    'Value': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
            ],
            'ScoreAttributes': {
                'ScoreConfidence': 'VERY_HIGH'|'HIGH'|'MEDIUM'|'LOW'|'NOT_AVAILABLE'
            }
        },
    ]
}
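
The `DocumentAttributes` list in the syntax above is often easier to work with as a flat dictionary. The sketch below does that conversion; the helper names are hypothetical, and it assumes that each `Value` carries only one populated field, returning the first one it finds.

```python
from typing import Any, Dict

VALUE_FIELDS = ("StringValue", "StringListValue", "LongValue", "DateValue")


def attribute_value(value: Dict[str, Any]) -> Any:
    """Return the first populated field of a DocumentAttribute Value, or None."""
    for field in VALUE_FIELDS:
        if field in value:
            return value[field]
    return None


def flatten_attributes(result_item: Dict[str, Any]) -> Dict[str, Any]:
    """Map each attribute key of one result item to its plain Python value."""
    return {
        attr["Key"]: attribute_value(attr.get("Value", {}))
        for attr in result_item.get("DocumentAttributes", [])
    }
```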

Response Structure

  • (dict) –

    • QueryId (string) –

      The identifier of the query used for the search. You also use QueryId to identify the search when using the SubmitFeedback API.

    • ResultItems (list) –

      The results of the retrieved relevant passages for the search.

      • (dict) –

        A single retrieved relevant passage result.

        • Id (string) –

          The identifier of the relevant passage result.

        • DocumentId (string) –

          The identifier of the document.

        • DocumentTitle (string) –

          The title of the document.

        • Content (string) –

          The contents of the relevant passage.

        • DocumentURI (string) –

          The URI of the original location of the document.

        • DocumentAttributes (list) –

          An array of document fields/attributes assigned to a document in the search results. For example, the document author (_author) or the source URI (_source_uri) of the document.

          • (dict) –

            A document attribute or metadata field. To create custom document attributes, see Custom attributes.

            • Key (string) –

              The identifier for the attribute.

            • Value (dict) –

              The value of the attribute.

              • StringValue (string) –

                A string, such as “department”.

              • StringListValue (list) –

                A list of strings. The default maximum length or number of strings is 10.

                • (string) –

              • LongValue (integer) –

                A long integer value.

              • DateValue (datetime) –

                A date expressed as an ISO 8601 string.

                It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

        • ScoreAttributes (dict) –

          The confidence score bucket for a retrieved passage result. The confidence bucket provides a relative ranking that indicates how confident Amazon Kendra is that the response is relevant to the query.

          • ScoreConfidence (string) –

            A relative ranking for how relevant the response is to the query.
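
One way to use the confidence bucket is to drop passages below a chosen level before handing them to downstream code. The following is a sketch only: the helper name `at_least` is hypothetical, and ranking NOT_AVAILABLE (or a missing or unrecognized value) below LOW is an assumption of this example, not documented behavior.

```python
from typing import Any, Dict, List

# Buckets from the response syntax, ordered from most to least confident.
# Placing NOT_AVAILABLE last is an assumption of this sketch.
CONFIDENCE_RANK = {
    "VERY_HIGH": 0,
    "HIGH": 1,
    "MEDIUM": 2,
    "LOW": 3,
    "NOT_AVAILABLE": 4,
}


def at_least(result_items: List[Dict[str, Any]], minimum: str = "MEDIUM") -> List[Dict[str, Any]]:
    """Keep result items whose ScoreConfidence is `minimum` or better, in original order."""
    threshold = CONFIDENCE_RANK[minimum]
    kept = []
    for item in result_items:
        bucket = item.get("ScoreAttributes", {}).get("ScoreConfidence", "NOT_AVAILABLE")
        rank = CONFIDENCE_RANK.get(bucket, CONFIDENCE_RANK["NOT_AVAILABLE"])
        if rank <= threshold:
            kept.append(item)
    return kept
```

Since confidence buckets are currently available only for English, a filter like this may remove every passage for other languages; passing `minimum="NOT_AVAILABLE"` keeps everything.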

Exceptions