Annotate

Run image detection and annotation for a batch of images

Authorization

To use this building block you will have to grant access to at least one of the following scopes:

  • View and manage your data across Google Cloud Platform services
  • Apply machine learning models to understand and label images

Input

This building block consumes 21 input parameters

requests[] OBJECT

Request for performing Google Cloud Vision API tasks over a user-provided image, with user-requested features, and with context information

requests[].image OBJECT

Client image to perform Google Cloud Vision API tasks over

requests[].image.content BINARY

Image content, represented as a stream of bytes. Note: as with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64 encoding
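
For the JSON representation, raw image bytes must therefore be base64-encoded before being placed in `content`. A minimal standard-library sketch (the placeholder bytes stand in for a real image file):

```python
import base64

def encode_image_content(image_bytes: bytes) -> str:
    # The JSON representation of a bytes field carries base64 text,
    # not raw binary.
    return base64.b64encode(image_bytes).decode("ascii")

# Placeholder bytes standing in for a real image file:
image_payload = {"content": encode_image_content(b"\x89PNG\r\n")}
```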

requests[].image.source OBJECT

External image source (Google Cloud Storage or web URL image location)

requests[].image.source.gcsImageUri STRING

Deprecated: use image_uri instead.

The Google Cloud Storage URI of the form gs://bucket_name/object_name. Object versioning is not supported. See Google Cloud Storage Request URIs for more info

requests[].image.source.imageUri STRING

The URI of the source image. Can be either:

  1. A Google Cloud Storage URI of the form gs://bucket_name/object_name. Object versioning is not supported. See Google Cloud Storage Request URIs for more info.

  2. A publicly-accessible image HTTP/HTTPS URL. When fetching images from HTTP/HTTPS URLs, Google cannot guarantee that the request will be completed. Your request may fail if the specified host denies the request (e.g. due to request throttling or DOS prevention), or if Google throttles requests to the site for abuse prevention. You should not depend on externally-hosted images for production applications.

When both gcs_image_uri and image_uri are specified, image_uri takes precedence
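
The precedence rule can be mirrored with a small illustrative helper (this is a sketch of the documented behavior, not part of the API itself):

```python
def resolve_image_source(source: dict) -> str:
    # Per the rule above, imageUri wins when both gcsImageUri and
    # imageUri are present.
    return source.get("imageUri") or source.get("gcsImageUri") or ""

resolve_image_source({
    "gcsImageUri": "gs://bucket_name/object_name",
    "imageUri": "https://example.com/photo.jpg",
})  # -> "https://example.com/photo.jpg"
```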

requests[].features[] OBJECT

The type of Google Cloud Vision API detection to perform, and the maximum number of results to return for that type. Multiple Feature objects can be specified in the features list

requests[].features[].type ENUMERATION

The feature type

requests[].features[].maxResults INTEGER

Maximum number of results of this type. Does not apply to TEXT_DETECTION, DOCUMENT_TEXT_DETECTION, or CROP_HINTS

requests[].features[].model STRING

Model to use for the feature. Supported values: "builtin/stable" (the default if unset) and "builtin/latest"
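
The three feature fields above can be assembled like this; `make_feature` is a hypothetical helper that also drops `maxResults` for the types that ignore it:

```python
from typing import Optional

# Types for which maxResults does not apply, per the field description.
IGNORES_MAX_RESULTS = {"TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION", "CROP_HINTS"}

def make_feature(feature_type: str, max_results: Optional[int] = None,
                 model: str = "builtin/stable") -> dict:
    feature = {"type": feature_type, "model": model}
    if max_results is not None and feature_type not in IGNORES_MAX_RESULTS:
        feature["maxResults"] = max_results
    return feature

features = [make_feature("LABEL_DETECTION", max_results=10),
            make_feature("TEXT_DETECTION", max_results=10)]
```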

requests[].imageContext OBJECT

Image context and/or feature-specific parameters

requests[].imageContext.cropHintsParams OBJECT

Parameters for crop hints annotation request

requests[].imageContext.cropHintsParams.aspectRatios[] FLOAT

Aspect ratios in floats, representing the ratio of the width to the height of the image. If not specified, the best possible crop is returned

requests[].imageContext.productSearchParams OBJECT

Parameters for a product search request

requests[].imageContext.productSearchParams.productCategories[] STRING

The list of product categories to search in

requests[].imageContext.productSearchParams.filter STRING

The filtering expression. This can be used to restrict search results based on Product labels. We currently support an AND of OR of key-value expressions, where each expression within an OR must have the same key. An '=' should be used to connect the key and value.

For example, "(color = red OR color = blue) AND brand = Google" is acceptable, but "(color = red OR brand = Google)" is not acceptable. "color: red" is not acceptable because it uses a ':' instead of an '='

requests[].imageContext.productSearchParams.productSet STRING

The resource name of a ProductSet to be searched for similar images.

Format is: projects/PROJECT_ID/locations/LOC_ID/productSets/PRODUCT_SET_ID
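
The OR-group constraint in the filter grammar can be illustrated with a sketch like the following (this approximates the documented rule; it is not the service's actual parser):

```python
def same_key_or_group(group: str) -> bool:
    # An OR group is acceptable only when every term uses '='
    # and all terms share one key.
    terms = group.strip("() ").split(" OR ")
    if not all("=" in t for t in terms):
        return False  # e.g. "color: red" uses ':' and is rejected
    keys = {t.split("=")[0].strip() for t in terms}
    return len(keys) == 1

same_key_or_group("(color = red OR color = blue)")   # True
same_key_or_group("(color = red OR brand = Google)") # False
same_key_or_group("color: red")                      # False
```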

requests[].imageContext.languageHints[] STRING

List of languages to use for TEXT_DETECTION. In most cases, an empty value yields the best results since it enables automatic language detection

requests[].imageContext.webDetectionParams OBJECT

Parameters for web detection request

requests[].imageContext.webDetectionParams.includeGeoResults BOOLEAN

Whether to include results derived from the geo information in the image

requests[].imageContext.latLongRect OBJECT

Rectangle determined by min and max LatLng pairs
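
Putting the input fields together, one request in the batch might look like the following sketch (bucket and object names are hypothetical; 1.33333 encodes a 4:3 aspect ratio):

```python
import json

annotate_body = {
    "requests": [{
        "image": {"source": {"imageUri": "gs://bucket_name/object_name"}},
        "features": [{"type": "CROP_HINTS"},
                     {"type": "TEXT_DETECTION"}],
        "imageContext": {
            "cropHintsParams": {"aspectRatios": [1.33333]},
            "languageHints": ["en"],
        },
    }]
}

# Serializes cleanly for an HTTP POST to the annotate endpoint.
encoded = json.dumps(annotate_body)
```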

Output

This building block provides 86 output parameters

responses[] OBJECT

Response to an image annotation request

responses[].localizedObjectAnnotations[] OBJECT

Set of detected objects with bounding boxes

responses[].localizedObjectAnnotations[].languageCode STRING

The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier

responses[].localizedObjectAnnotations[].mid STRING

Object ID that should align with EntityAnnotation mid

responses[].localizedObjectAnnotations[].name STRING

Object name, expressed in its language_code language

responses[].localizedObjectAnnotations[].boundingPoly OBJECT

A bounding polygon for the detected image annotation

responses[].localizedObjectAnnotations[].score FLOAT

Score of the result. Range [0, 1]
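
Since each detection carries a score in [0, 1], a common pattern is to keep only confident detections; a minimal sketch using the field names above (the threshold is an arbitrary choice):

```python
def confident_objects(annotations, min_score=0.5):
    # Keep names of detections whose score clears the threshold.
    return [a["name"] for a in annotations if a["score"] >= min_score]

sample = [{"name": "Dog", "score": 0.92}, {"name": "Ball", "score": 0.31}]
confident_objects(sample)  # -> ['Dog']
```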

responses[].productSearchResults OBJECT

Results for a product search request

responses[].productSearchResults.productGroupedResults[] OBJECT

Information about the products similar to a single product in a query image

responses[].productSearchResults.results[] OBJECT

Information about a product

responses[].productSearchResults.indexTime ANY

Timestamp of the index which provided these results. Products added to the product set and products removed from the product set after this time are not reflected in the current results

responses[].error OBJECT

The Status type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by gRPC. Each Status message contains three pieces of data: error code, error message, and error details.

You can find out more about this error model and how to work with it in the API Design Guide

responses[].error.code INTEGER

The status code, which should be an enum value of google.rpc.Code

responses[].error.message STRING

A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client

responses[].error.details[] OBJECT

A list of messages that carry the error details

responses[].error.details[].customKey.value ANY
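
Because each entry in responses[] corresponds to one input image, a failed image appears as an error Status without failing the whole batch. A sketch of splitting the two cases (response shapes follow the fields listed above):

```python
def partition_responses(responses):
    # A response carries either annotations or an `error` Status.
    ok = [r for r in responses if "error" not in r]
    failed = [(r["error"]["code"], r["error"]["message"])
              for r in responses if "error" in r]
    return ok, failed

ok, failed = partition_responses([
    {"labelAnnotations": []},
    # 7 is PERMISSION_DENIED in google.rpc.Code.
    {"error": {"code": 7, "message": "PERMISSION_DENIED"}},
])
```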

responses[].fullTextAnnotation OBJECT

TextAnnotation contains a structured representation of OCR-extracted text. The hierarchy of an OCR-extracted text structure is: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component, starting from Page, may have its own properties. Properties describe detected languages, breaks, etc. Please refer to the TextAnnotation.TextProperty message definition for more detail

responses[].fullTextAnnotation.text STRING

UTF-8 text detected on the pages

responses[].fullTextAnnotation.pages[] OBJECT

Detected page from OCR
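
The Page -> Block -> Paragraph -> Word -> Symbol hierarchy described above can be walked with plain nested loops; a sketch that rebuilds each word from its symbols (the fixture is a minimal stand-in for a real response):

```python
def collect_words(full_text_annotation):
    # Descend the documented hierarchy, joining symbol texts into words.
    words = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for para in block.get("paragraphs", []):
                for word in para.get("words", []):
                    words.append("".join(s["text"] for s in word.get("symbols", [])))
    return words

sample = {"pages": [{"blocks": [{"paragraphs": [{"words": [
    {"symbols": [{"text": "H"}, {"text": "i"}]}]}]}]}]}
collect_words(sample)  # -> ['Hi']
```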

responses[].textAnnotations[] OBJECT

Set of detected entity features

responses[].textAnnotations[].locale STRING

The language code for the locale in which the entity textual description is expressed

responses[].textAnnotations[].boundingPoly OBJECT

A bounding polygon for the detected image annotation

responses[].textAnnotations[].description STRING

Entity textual description, expressed in its locale language

responses[].textAnnotations[].topicality FLOAT

The relevancy of the ICA (Image Content Annotation) label to the image. For example, the relevancy of "tower" is likely higher to an image containing the detected "Eiffel Tower" than to an image containing a detected distant towering building, even though the confidence that there is a tower in each image may be the same. Range [0, 1]

responses[].textAnnotations[].score FLOAT

Overall score of the result. Range [0, 1]

responses[].textAnnotations[].mid STRING

Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API

responses[].textAnnotations[].confidence FLOAT

Deprecated. Use score instead. The accuracy of the entity detection in an image. For example, for an image in which the "Eiffel Tower" entity is detected, this field represents the confidence that there is a tower in the query image. Range [0, 1]
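
Since confidence is deprecated in favor of score, downstream code can prefer the newer field with a fallback; a minimal sketch:

```python
def entity_score(entity: dict) -> float:
    # Prefer score; fall back to the deprecated confidence field
    # only when score is absent.
    if "score" in entity:
        return entity["score"]
    return entity.get("confidence", 0.0)
```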

responses[].imagePropertiesAnnotation OBJECT

Stores image properties, such as dominant colors

responses[].imagePropertiesAnnotation.dominantColors OBJECT

Set of dominant colors and their corresponding scores

responses[].logoAnnotations[] OBJECT

Set of detected entity features

responses[].logoAnnotations[].locale STRING

The language code for the locale in which the entity textual description is expressed

responses[].logoAnnotations[].boundingPoly OBJECT

A bounding polygon for the detected image annotation

responses[].logoAnnotations[].description STRING

Entity textual description, expressed in its locale language

responses[].logoAnnotations[].topicality FLOAT

The relevancy of the ICA (Image Content Annotation) label to the image. For example, the relevancy of "tower" is likely higher to an image containing the detected "Eiffel Tower" than to an image containing a detected distant towering building, even though the confidence that there is a tower in each image may be the same. Range [0, 1]

responses[].logoAnnotations[].score FLOAT

Overall score of the result. Range [0, 1]

responses[].logoAnnotations[].mid STRING

Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API

responses[].logoAnnotations[].confidence FLOAT

Deprecated. Use score instead. The accuracy of the entity detection in an image. For example, for an image in which the "Eiffel Tower" entity is detected, this field represents the confidence that there is a tower in the query image. Range [0, 1]

responses[].context OBJECT

If an image was produced from a file (e.g. a PDF), this message gives information about the source of that image

responses[].context.uri STRING

The URI of the file used to produce the image

responses[].context.pageNumber INTEGER

If the file was a PDF or TIFF, this field gives the page number within the file used to produce the image

responses[].webDetection OBJECT

Relevant information for the image from the Internet

responses[].webDetection.visuallySimilarImages[] OBJECT

Metadata for online images

responses[].webDetection.bestGuessLabels[] OBJECT

Label to provide extra metadata for the web detection

responses[].webDetection.fullMatchingImages[] OBJECT

Metadata for online images

responses[].webDetection.webEntities[] OBJECT

Entity deduced from similar images on the Internet

responses[].webDetection.pagesWithMatchingImages[] OBJECT

Metadata for web pages

responses[].webDetection.partialMatchingImages[] OBJECT

Metadata for online images

responses[].safeSearchAnnotation OBJECT

Set of features pertaining to the image, computed by computer vision methods over safe-search verticals (for example, adult, spoof, medical, violence)

responses[].safeSearchAnnotation.adult ENUMERATION

Represents the adult content likelihood for the image. Adult content may contain elements such as nudity, pornographic images or cartoons, or sexual activities

responses[].safeSearchAnnotation.spoof ENUMERATION

Spoof likelihood. The likelihood that a modification was made to the image's canonical version to make it appear funny or offensive

responses[].safeSearchAnnotation.medical ENUMERATION

Likelihood that this is a medical image

responses[].safeSearchAnnotation.racy ENUMERATION

Likelihood that the request image contains racy content. Racy content may include (but is not limited to) skimpy or sheer clothing, strategically covered nudity, lewd or provocative poses, or close-ups of sensitive body areas

responses[].safeSearchAnnotation.violence ENUMERATION

Likelihood that this image contains violent content
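
The safe-search verticals above each return a Likelihood enumeration, so a moderation decision typically compares against a chosen threshold. A sketch (the ordering below mirrors the API's Likelihood values; the chosen verticals and threshold are arbitrary):

```python
LIKELIHOOD_ORDER = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
                    "POSSIBLE", "LIKELY", "VERY_LIKELY"]

def flag_image(annotation, threshold="LIKELY",
               verticals=("adult", "violence", "racy")):
    # Flag when any chosen vertical reaches the threshold likelihood.
    limit = LIKELIHOOD_ORDER.index(threshold)
    return any(LIKELIHOOD_ORDER.index(annotation.get(v, "UNKNOWN")) >= limit
               for v in verticals)
```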

responses[].landmarkAnnotations[] OBJECT

Set of detected entity features

responses[].landmarkAnnotations[].locale STRING

The language code for the locale in which the entity textual description is expressed

responses[].landmarkAnnotations[].boundingPoly OBJECT

A bounding polygon for the detected image annotation

responses[].landmarkAnnotations[].description STRING

Entity textual description, expressed in its locale language

responses[].landmarkAnnotations[].topicality FLOAT

The relevancy of the ICA (Image Content Annotation) label to the image. For example, the relevancy of "tower" is likely higher to an image containing the detected "Eiffel Tower" than to an image containing a detected distant towering building, even though the confidence that there is a tower in each image may be the same. Range [0, 1]

responses[].landmarkAnnotations[].score FLOAT

Overall score of the result. Range [0, 1]

responses[].landmarkAnnotations[].mid STRING

Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API

responses[].landmarkAnnotations[].confidence FLOAT

Deprecated. Use score instead. The accuracy of the entity detection in an image. For example, for an image in which the "Eiffel Tower" entity is detected, this field represents the confidence that there is a tower in the query image. Range [0, 1]

responses[].faceAnnotations[] OBJECT

A face annotation object contains the results of face detection

responses[].faceAnnotations[].tiltAngle FLOAT

Pitch angle, which indicates the upwards/downwards angle that the face is pointing relative to the image's horizontal plane. Range [-180,180]

responses[].faceAnnotations[].fdBoundingPoly OBJECT

A bounding polygon for the detected image annotation

responses[].faceAnnotations[].surpriseLikelihood ENUMERATION

Surprise likelihood

responses[].faceAnnotations[].angerLikelihood ENUMERATION

Anger likelihood

responses[].faceAnnotations[].landmarkingConfidence FLOAT

Face landmarking confidence. Range [0, 1]

responses[].faceAnnotations[].joyLikelihood ENUMERATION

Joy likelihood

responses[].faceAnnotations[].underExposedLikelihood ENUMERATION

Under-exposed likelihood

responses[].faceAnnotations[].panAngle FLOAT

Yaw angle, which indicates the leftward/rightward angle that the face is pointing relative to the vertical plane perpendicular to the image. Range [-180,180]

responses[].faceAnnotations[].detectionConfidence FLOAT

Detection confidence. Range [0, 1]

responses[].faceAnnotations[].blurredLikelihood ENUMERATION

Blurred likelihood

responses[].faceAnnotations[].headwearLikelihood ENUMERATION

Headwear likelihood

responses[].faceAnnotations[].boundingPoly OBJECT

A bounding polygon for the detected image annotation

responses[].faceAnnotations[].rollAngle FLOAT

Roll angle, which indicates the amount of clockwise/anti-clockwise rotation of the face relative to the image vertical about the axis perpendicular to the face. Range [-180,180]

responses[].faceAnnotations[].sorrowLikelihood ENUMERATION

Sorrow likelihood
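
The roll/pan/tilt angles and detectionConfidence above support a rough frontal-pose filter; a sketch (the angle and confidence cutoffs are arbitrary choices, not API values):

```python
def is_frontal(face, max_angle=15.0, min_confidence=0.7):
    # All three angles are in degrees, range [-180, 180].
    angles = (face["rollAngle"], face["panAngle"], face["tiltAngle"])
    return (face["detectionConfidence"] >= min_confidence
            and all(abs(a) <= max_angle for a in angles))

is_frontal({"rollAngle": 2.0, "panAngle": -8.5, "tiltAngle": 4.1,
            "detectionConfidence": 0.93})  # -> True
```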

responses[].cropHintsAnnotation OBJECT

Set of crop hints that are used to generate new crops when serving images

responses[].cropHintsAnnotation.cropHints[] OBJECT

Single crop hint that is used to generate a new crop when serving an image

responses[].labelAnnotations[] OBJECT

Set of detected entity features

responses[].labelAnnotations[].locale STRING

The language code for the locale in which the entity textual description is expressed

responses[].labelAnnotations[].boundingPoly OBJECT

A bounding polygon for the detected image annotation

responses[].labelAnnotations[].description STRING

Entity textual description, expressed in its locale language

responses[].labelAnnotations[].topicality FLOAT

The relevancy of the ICA (Image Content Annotation) label to the image. For example, the relevancy of "tower" is likely higher to an image containing the detected "Eiffel Tower" than to an image containing a detected distant towering building, even though the confidence that there is a tower in each image may be the same. Range [0, 1]

responses[].labelAnnotations[].score FLOAT

Overall score of the result. Range [0, 1]

responses[].labelAnnotations[].mid STRING

Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API

responses[].labelAnnotations[].confidence FLOAT

Deprecated. Use score instead. The accuracy of the entity detection in an image. For example, for an image in which the "Eiffel Tower" entity is detected, this field represents the confidence that there is a tower in the query image. Range [0, 1]