HTTP Client

The HTTP Client processor sends requests to an HTTP resource URL and writes the results to a field in the record. You can use the HTTP Client processor to perform a range of standard requests or you can use an expression to determine the request for each record.

When you configure the HTTP Client, you define the resource URL, header attributes, and method to use. For some methods, you can specify the request body and default content type.

You can configure the processor to include response header fields in the record as a set of record header attributes or as a map in a record field. You can also configure the processor to log request and response information, and to write the resolved resource URL to the Data Collector log.

You can also configure the timeout, request transfer encoding, maximum number of parallel requests, and authentication type. You can optionally use an HTTP proxy and configure SSL/TLS properties.

You can also configure the processor to use the OAuth 2 protocol to connect to an HTTP service.

HTTP Method

You can use the following methods with the HTTP Client processor:

GET
PUT
POST
DELETE
HEAD
PATCH
Expression - An expression that evaluates to one of the other methods.

Expression Method

The Expression method allows you to write an expression that evaluates to a standard HTTP method. Use the Expression method to generate a workflow. For example, you can use an expression that performs a lookup (GET) or passes data to the server (PUT) based on the data in a field.
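
As a rough illustration of the decision that such an expression encodes, the following Python sketch picks a method for each record. The `action` field and its `update` value are hypothetical, and the function only stands in for the expression. It is not Data Collector expression language syntax.

```python
def choose_method(record: dict) -> str:
    """Return the HTTP method to use for one record."""
    # Assumed convention for this sketch: records flagged for update pass
    # data to the server with PUT, all other records perform a lookup with GET.
    if record.get("action") == "update":
        return "PUT"
    return "GET"

print(choose_method({"action": "update", "id": 7}))
print(choose_method({"id": 8}))
```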

Parallel Requests

The HTTP Client processor sends multiple requests at a time. To preserve record order, the processor waits until all requests for the entire batch are completed before processing the next batch.

You can specify the maximum number of parallel requests. Default is 1. Increasing the number of parallel requests can improve performance but increases the load on the server. Network latency can also significantly impact the performance of this processor.
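
The batch behavior described above can be sketched with a thread pool. This is an illustration only, not the processor's implementation: `send_request` is a hypothetical stand-in that performs no network call, and `max_parallel` plays the role of the maximum number of parallel requests.

```python
from concurrent.futures import ThreadPoolExecutor

def send_request(record: dict) -> dict:
    """Stand-in for a real HTTP call; returns a fake response."""
    return {"id": record["id"], "status": 200}

def process_batch(batch: list, max_parallel: int = 1) -> list:
    """Send one request per record, at most max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() yields results in input order, and list() blocks until
        # every request in the batch has completed.
        return list(pool.map(send_request, batch))

batches = [[{"id": 1}, {"id": 2}, {"id": 3}], [{"id": 4}, {"id": 5}]]
for batch in batches:
    # The next batch starts only after the previous one is fully done.
    print(process_batch(batch, max_parallel=2))
```

Collecting the results in input order is what keeps record order stable, at the cost of waiting for the slowest request in each batch.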

OAuth 2 Authorization

You can configure the HTTP Client processor to use the OAuth 2 protocol to connect to an HTTP service that uses basic, digest, or universal authentication, OAuth 2 client credentials, OAuth 2 username and password, or OAuth 2 JSON Web Tokens (JWT).

The OAuth 2 protocol authorizes third-party access to HTTP service resources without sharing credentials. The HTTP Client processor uses credentials to request an access token from the service. The service returns the token to the processor, and then the processor includes the token in a header in each request to the resource URL.
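
The two-step flow can be sketched as follows. The sketch makes no network calls: `request_access_token` is a hypothetical stand-in for the call to the token URL, and sending the token as a bearer token in the `Authorization` header is an assumption based on common OAuth 2 practice (RFC 6750), not something this section states.

```python
def request_access_token(credentials: dict) -> str:
    """Stand-in for the request to the token URL (no network involved)."""
    # A real service would validate the credentials and issue a token.
    return "token-for-" + credentials["client_id"]

def build_resource_request(url: str, token: str) -> dict:
    """Attach the access token to a request to the resource URL."""
    # Assumption: the token travels as an RFC 6750 bearer token.
    return {"url": url, "headers": {"Authorization": f"Bearer {token}"}}

token = request_access_token({"client_id": "my-client", "client_secret": "placeholder"})
for path in ("/items/1", "/items/2"):
    # The same token is reused for each request to the resource URL.
    print(build_resource_request("https://example.com" + path, token))
```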

The credentials that you enter to request an access token depend on the credentials grant type required by the HTTP service. You can define the following OAuth 2 credentials grant types for HTTP Client:

Client credentials grant

HTTP Client sends its own credentials - the client ID and client secret or the basic, digest, or universal authentication credentials - to the HTTP service. For example, use the client credentials grant to process data from the Twitter API or from the Microsoft Azure Active Directory (Azure AD) API.

For more information about the client credentials grant, see https://tools.ietf.org/html/rfc6749#section-4.4.

Resource owner password credentials grant

HTTP Client sends the credentials for the resource owner - the resource owner username and password - to the HTTP service. Or, you can use this grant type to migrate existing clients using basic, digest, or universal authentication to OAuth 2 by converting the stored credentials to an access token.

For example, use this grant to process data from the Getty Images API. For more information about using OAuth 2 to connect to the Getty Images API, see http://developers.gettyimages.com/api/docs/v3/oauth2.html.

For more information about the resource owner password credentials grant, see https://tools.ietf.org/html/rfc6749#section-4.3.

JSON Web Tokens (JWT)

HTTP Client sends a JSON-based security token encoding to the HTTP service. For example, use JSON Web Tokens to process data from the Google API.
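
To make the difference between the grant types concrete, here is a sketch of the form-encoded token request bodies that the relevant specifications define. The `grant_type` values come from RFC 6749 (sections 4.4 and 4.3) and RFC 7523. Whether HTTP Client sends exactly these fields is an assumption, and all credential values are placeholders.

```python
from urllib.parse import urlencode

GRANT_TYPES = {
    "client_credentials": "client_credentials",            # RFC 6749, section 4.4
    "password": "password",                                # RFC 6749, section 4.3
    "jwt": "urn:ietf:params:oauth:grant-type:jwt-bearer",  # RFC 7523
}

def token_request_body(grant: str, **params: str) -> str:
    """Build the form-encoded body of a token request."""
    fields = {"grant_type": GRANT_TYPES[grant], **params}
    return urlencode(fields)

print(token_request_body("client_credentials"))
print(token_request_body("password", username="owner", password="placeholder"))
print(token_request_body("jwt", assertion="header.payload.signature"))
```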

Let’s look at some examples of how to configure authentication and OAuth 2 authorization to process data from Twitter, Microsoft Azure AD, and Google APIs.

Example for Twitter

To use OAuth 2 authorization to read from Twitter, configure HTTP Client to use basic authentication and the client credentials grant.

For more information about configuring OAuth 2 authorization for Twitter, see https://dev.twitter.com/oauth/application-only.

On the HTTP tab, set Authentication Type to Basic, and then select Use OAuth 2.
On the Credentials tab, enter the Twitter consumer key and consumer secret for the Username and Password properties.

Tip: To secure sensitive information such as the consumer key and secret, you can use runtime resources or credential stores.
On the OAuth 2 tab, select Client Credentials Grant for the grant type.
In the Token URL property, enter the following URL used to request the access token:
```
https://api.twitter.com/oauth2/token
```
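
For orientation, the token request that results from this configuration can be sketched as below. The sketch only builds the request and sends nothing. The consumer key and secret are placeholders, and the exact request that HTTP Client issues is an assumption based on how basic authentication and the client credentials grant are defined.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Return the value of a basic authentication Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

# Placeholder values, not real credentials.
header = basic_auth_header("my-consumer-key", "my-consumer-secret")
token_request = {
    "url": "https://api.twitter.com/oauth2/token",
    "headers": {"Authorization": header},
    "body": "grant_type=client_credentials",
}
print(token_request)
```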

The following image shows the OAuth 2 tab configured for Twitter:

Example for Microsoft Azure AD

To use OAuth 2 authorization to read from Microsoft Azure AD, configure HTTP Client to use no authentication and the client credentials grant.

Note: This example uses Microsoft Azure AD version 1.0.

For more information about configuring OAuth 2 authorization for Microsoft Azure AD, see https://docs.microsoft.com/en-us/azure/active-directory/develop/active-directory-protocols-oauth-code.

On the HTTP tab, set Authentication Type to None, and then select Use OAuth 2.
On the OAuth 2 tab, select Client Credentials Grant for the grant type.
In the Token URL property, enter the following URL used to request the access token:
```
https://login.microsoftonline.com/<tenant identifier>/oauth2/token
```
Where <tenant identifier> is the Azure AD tenant identifier.
Enter the OAuth 2 client ID and secret.

The client ID is the Application Id assigned to your app when you registered it with Azure AD, found in the Azure Classic Portal.

The client secret is the application secret that you created in the app registration portal for your app.

Tip: To secure sensitive information such as the client ID and secret, you can use runtime resources or credential stores.
Add any key-value pairs that the HTTP service requires in the token request.
In our example, we are accessing the graph.microsoft.com API in our resource URL, so we need to add the following key-value pair:
```
resource : https://graph.microsoft.com/
```
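
As a sketch of how the additional key-value pair ends up in the token request, the following Python builds a form-encoded body that includes `resource`. It sends nothing, all argument values are placeholders, and the exact request that HTTP Client issues is an assumption.

```python
from urllib.parse import urlencode

def azure_token_request(tenant_id: str, client_id: str, client_secret: str) -> dict:
    """Build (but do not send) a client credentials token request."""
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # The additional key-value pair from this example.
        "resource": "https://graph.microsoft.com/",
    })
    return {"url": token_url, "body": body}

print(azure_token_request("my-tenant", "my-app-id", "placeholder-secret"))
```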

The following image shows the OAuth 2 tab configured for Microsoft Azure AD version 1.0:

Example for Google

To use OAuth 2 authorization to read from Google service accounts, configure HTTP Client to use no authentication and the JSON Web Tokens grant.

For more information about configuring OAuth 2 authorization for Google, see https://developers.google.com/identity/protocols/OAuth2.

On the HTTP tab, set Authentication Type to None, and then select Use OAuth 2.
On the OAuth 2 tab, select JSON Web Tokens for the grant type.
In the Token URL property, enter the following URL used to request the access token:
```
https://www.googleapis.com/oauth2/v4/token
```
Select the following algorithm to sign the JWT: RSASSA-PKCS-v1_5 using SHA-256.
Enter the Base64 encoded key used to sign the JWT.

To access the key, download the JSON key file when you generate the Google credentials. Locate the "private_key" field in the file, which contains a string version of the key. Copy the string into the JWT Signing Key property, and then replace all "\n" literals with new lines.

Tip: To secure sensitive information such as the JWT signing key, you can use runtime resources or credential stores.
In the JWT Claims property, enter the required claims to use with the JWT token request, in JSON format.
For a list of the required claims for Google service accounts, see https://developers.google.com/identity/protocols/OAuth2ServiceAccount#creatingjwt.
For example, enter the claims in the following JSON format:
```
{
   "iss":"my_name@my_account.iam.gserviceaccount.com",
   "scope":"https://www.googleapis.com/auth/drive",
   "aud":"https://www.googleapis.com/oauth2/v4/token",
   "exp":${(time:dateTimeToMilliseconds(time:now())/1000) + 50 * MINUTES},
   "iat":${time:dateTimeToMilliseconds(time:now())/1000}
}
```
You can include the expression language in the JWT claims. For example, in the sample claim above, both the "exp" (expiration time) claim and the "iat" (issued at) claim include Data Collector time functions to set the expiration time and the issue time.

Tip: Google access tokens expire after 60 minutes. As a result, set the expiration time claim to be slightly less than 60 minutes so that HTTP Client can request a new token within the time limit.
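
The arithmetic behind the "exp" and "iat" claims, and the newline fix for the private key, can be sketched in Python. This is an illustration under stated assumptions: the helper names are hypothetical, the signing step is omitted because it needs a cryptography library, and `RS256` is the JWT name for RSASSA-PKCS1-v1_5 using SHA-256.

```python
import base64
import json
import time

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def build_claims(issuer, scope, audience, now, lifetime_minutes=50):
    """Build claims with iat set to now and exp a fixed lifetime later."""
    return {
        "iss": issuer,
        "scope": scope,
        "aud": audience,
        "exp": now + lifetime_minutes * 60,  # seconds since the epoch
        "iat": now,
    }

def signing_input(claims):
    """Return the header.payload string that the signing key would sign."""
    header = {"alg": "RS256", "typ": "JWT"}
    return b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())

def restore_newlines(private_key_string):
    """Turn literal backslash-n sequences from the key file into new lines."""
    return private_key_string.replace("\\n", "\n")

claims = build_claims(
    "my_name@my_account.iam.gserviceaccount.com",
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/oauth2/v4/token",
    now=int(time.time()),
)
print(claims["exp"] - claims["iat"])  # 3000 seconds, that is 50 minutes
print(signing_input(claims))
```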

The following image shows the OAuth 2 tab configured for Google service accounts:

Generated Records

The HTTP Client processor generates records based on the responses it receives.

Data in the response body is parsed based on the selected data format. For HEAD responses, when the response body contains no data, the processor creates an empty record. Information returned from the HEAD request appears in record header attributes. For all other methods, when the response body contains no data, no records are created.
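
The rules above can be summarized in a short sketch. The record layout (a dictionary with `attributes` and `value` keys) is a hypothetical simplification, and the sketch assumes the JSON data format for non-empty bodies.

```python
import json

def generate_records(method: str, response_headers: dict, body: str) -> list:
    """Return the records produced for one response."""
    if body:
        # Non-empty body: parse it based on the data format (JSON assumed here).
        return [{"attributes": {}, "value": json.loads(body)}]
    if method == "HEAD":
        # Empty body on HEAD: an empty record whose header attributes
        # carry the information returned by the request.
        return [{"attributes": dict(response_headers), "value": {}}]
    # Empty body on any other method: no records.
    return []

print(generate_records("HEAD", {"content-length": "42"}, ""))
print(generate_records("GET", {}, ""))
print(generate_records("GET", {}, '{"name": "a"}'))
```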

Including Response Headers

The HTTP Client processor can include response headers in records. You can write the data to a record as follows:

Record header attributes: The processor writes data in response headers to corresponding record header attributes. When writing to record header attributes, you define a prefix for the attributes to avoid name conflicts. The processor then appends the response header name to the prefix. For example, if you define "info-" as the prefix, then data in the content-encoding response header is written to the "info-content-encoding" record header attribute. For general information about record header attributes, see Record Header Attributes.
Record field: The processor can also write the response headers to a field in the record. The processor writes the response headers to the record field as a map of key-value pairs where the key is the response header name. When writing to a field, you specify the field to use. If the field includes data, the processor overwrites the data. If the field does not exist, the processor creates the field.
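
Both output locations can be sketched in a few lines of Python. The function names and the dictionary-based record are hypothetical, and the `/responseHeaders` field name is only an example.

```python
def headers_to_attributes(response_headers: dict, prefix: str) -> dict:
    """Name each record header attribute as prefix + response header name."""
    return {prefix + name: value for name, value in response_headers.items()}

def headers_to_field(record: dict, field: str, response_headers: dict) -> dict:
    """Write the headers as a map to a field, creating or overwriting it."""
    record[field] = dict(response_headers)
    return record

headers = {"content-encoding": "gzip", "content-type": "application/json"}
print(headers_to_attributes(headers, "info-"))
print(headers_to_field({"/responseHeaders": "old data"}, "/responseHeaders", headers))
```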

Logging Request and Response Data

The HTTP Client processor can log request and response data to the Data Collector log.

When enabling logging, you configure the following properties:

Verbosity

The type of data to include in logged messages:

Headers_Only - Includes request and response headers.
Payload_Text - Includes request and response headers as well as any text payloads.
Payload_Any - Includes request and response headers and the payload, regardless of type.

Log Level

The level of messages to include in the Data Collector log. When you select a level, higher level messages are also logged. That is, if you select the Warning log level, then Severe and Warning messages are written to the Data Collector log.

Note: The configured log level for Data Collector can limit the level of detail that is logged. For example, if you set the log level to Finest to log detailed trace information, but Data Collector is configured for ERROR, then only Severe level messages are written by the processor.

The following table describes the stage log levels and the corresponding Data Collector log levels needed to enable the logging:

Stage Log Level	Data Collector	Description
Severe	ERROR	Only messages indicating serious failures.
Warning	WARN	Messages warning of potential problems.
Info	INFO	Informational messages.
Fine	DEBUG	Basic tracing information.
Finer	DEBUG	Detailed tracing information.
Finest	TRACE	Highly detailed tracing information.

The name of this stage logger is com.streamsets.http.RequestLogger.
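
The interaction between the two settings can be checked with a small lookup based on the table above. The ordering of the Data Collector levels from least to most detailed (ERROR, WARN, INFO, DEBUG, TRACE) is an assumption that follows the usual logging convention, and `is_written` is a hypothetical helper.

```python
# Stage log level -> Data Collector log level needed, taken from the table.
STAGE_TO_DC = {
    "Severe": "ERROR",
    "Warning": "WARN",
    "Info": "INFO",
    "Fine": "DEBUG",
    "Finer": "DEBUG",
    "Finest": "TRACE",
}
# Assumed ordering, from least to most detailed.
DC_ORDER = ["ERROR", "WARN", "INFO", "DEBUG", "TRACE"]

def is_written(message_level: str, dc_level: str) -> bool:
    """True if a stage message at message_level passes the Data Collector level."""
    needed = STAGE_TO_DC[message_level]
    return DC_ORDER.index(needed) <= DC_ORDER.index(dc_level)

print(is_written("Severe", "ERROR"))
print(is_written("Finest", "ERROR"))
```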

Maximum entity size

The maximum size of message data to write to the log. Use to limit the volume of data written to the Data Collector log for any single message.

Logging the Resolved Resource URL

You can write the resolved resource URL to the Data Collector log.

The resolved resource URL is the URL that is defined in the Resource URL property after resolving any expressions included in the URL.

For example, say you configure the Resource URL property with the following URL:

https://api.twitter.com/1.1/search/tweets.json?q=${record:value('/text')}

This allows building the URL based on information in the /text field of each record. So, when a record contains %23DataOps in the /text field, then the resolved URL is:

https://api.twitter.com/1.1/search/tweets.json?q=%23DataOps

To write the resolved resource URL to the Data Collector log, set the Data Collector log level to DEBUG or higher. You do not need to use the Enable Request Logging property in the processor to log the resolved resource URL.
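
To illustrate what resolving the URL means, the following sketch substitutes a record value into the example URL. It is a toy resolver that understands only the `record:value('/path')` form shown above, not the Data Collector expression language, and the record is modeled as a hypothetical dictionary keyed by field path.

```python
import re
from urllib.parse import quote

# Matches ${record:value('/some/path')} and captures the field path.
PATTERN = re.compile(r"\$\{record:value\('(/[^']*)'\)\}")

def resolve_url(template: str, record: dict) -> str:
    """Replace each record:value expression with the field's value."""
    return PATTERN.sub(lambda match: str(record[match.group(1)]), template)

template = "https://api.twitter.com/1.1/search/tweets.json?q=${record:value('/text')}"
# %23 is the percent-encoded form of the # character.
record = {"/text": quote("#DataOps")}
print(resolve_url(template, record))
```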

Data Formats

The HTTP Client processor writes the server response to the specified field in the record.

The processor treats the response as the specified data format and processes data differently based on the data format:

JSON: Writes a parsed JSON response to the specified field. When an object exceeds the specified maximum object length, the processor processes the object based on the error handling configured for the stage.
Text: Writes the response as a string to the specified field. When a line exceeds the specified maximum line length, the processor truncates the line. The processor adds a boolean field named Truncated to indicate if the line was truncated.
XML: Writes the XML response to the specified field. When specified, the processor uses the delimiter element to nest additional elements in the field. When a record exceeds the specified maximum record length, the processor skips the record and continues processing with the next record. It sends the skipped record to the pipeline for error handling.
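
The length limits for text and JSON can be sketched as follows. The `text` field name, the choice to always include the Truncated field, and the use of an exception to represent error handling are assumptions made for this illustration.

```python
import json

def text_to_field(line: str, max_line_length: int) -> dict:
    """Truncate an over-long line and flag the result."""
    truncated = len(line) > max_line_length
    return {"text": line[:max_line_length], "Truncated": truncated}

def json_to_field(body: str, max_object_length: int):
    """Parse a JSON response, rejecting objects over the limit."""
    if len(body) > max_object_length:
        # Stands in for the error handling configured for the stage.
        raise ValueError("object exceeds maximum object length")
    return json.loads(body)

print(text_to_field("abcdef", 4))
print(json_to_field('{"a": 1}', 100))
```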

Configuring HTTP Client Processor

Configure an HTTP Client processor to perform requests against a resource URL.

In the Properties panel, on the General tab, configure the following properties:

General Property	Description
Name	Stage name.
Description	Optional description.
Required Fields	Fields that must include data for the record to be passed into the stage. Tip: You might include fields that the stage uses. Records that do not include all required fields are processed based on the error handling configured for the pipeline.
Preconditions	Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions. Records that do not meet all preconditions are processed based on the error handling configured for the stage.
On Record Error	Error record handling for the stage: Discard - Discards the record. Send to Error - Sends the record to the pipeline for error handling. Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.

On the HTTP tab, configure the following properties:

HTTP Property	Description
Output Field	Field to use for the response. You can use a new or existing field.
Header Output Location	Location to write response header field information.
Header Attribute Prefix	Prefix to use when writing response header field information to record header attributes.
Header Output Field	Field to use when writing response header field information to a field in the record.
Resource URL	HTTP resource URL.
Headers	The headers to include in the request. Using simple or bulk edit mode, click the Add icon to add additional headers.
HTTP Method	HTTP request method. Use one of the standard methods or use Expression to enter an expression.
HTTP Method Expression	Expression that evaluates to a standard HTTP method. Used for the Expression method only.
Request Data	Request data to use with the specified method. Available for the PUT, POST, DELETE, and Expression methods.
Default Request Content Type	Content-Type header to include in the request. Used only when the Content-Type header is not present. Available for the PUT, POST, DELETE, and Expression methods. Default is application/json.
Request Transfer Encoding	Use one of the following encoding types: Buffered - The standard transfer encoding type. Chunked - Transfers data in chunks. Not supported by all servers. Default is Buffered.
Connect Timeout	Maximum number of milliseconds to wait for a connection. Use 0 to wait indefinitely.
Read Timeout	Maximum number of milliseconds to wait for data. Use 0 to wait indefinitely.
Maximum Parallel Requests	Maximum number of requests to send to the server at one time.
Authentication Type	Determines the authentication type used to connect to the server: None - Performs no authentication. Basic - Uses basic authentication. Requires a username and password. Use with HTTPS to avoid passing unencrypted credentials. Digest - Uses digest authentication. Requires a username and password. Universal - Makes an anonymous connection, then provides authentication credentials upon receiving a 401 status and a WWW-Authenticate header request. Requires a username and password associated with basic or digest authentication. Use only with servers that respond to this workflow. OAuth - Uses OAuth 1.0 authentication. Requires OAuth credentials.
Use OAuth 2	Enables using OAuth 2 authorization to request access tokens. You can use OAuth 2 authorization with none, basic, digest, or universal authentication.
Use Proxy	Enables using an HTTP proxy to connect to the system.
Maximum Request Time	Maximum number of seconds to wait for a request to complete.
Rate Limit	Minimum amount of time between requests in milliseconds. Set a rate limit when sending requests to a rate-limited API. Default is 0, which means there is no delay between requests.
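
The effect of the Rate Limit property can be sketched as a minimum interval between requests. This is not the processor's implementation. `RateLimiter` is a hypothetical helper, and the clock and sleep functions are injectable so that the behavior can be checked without real delays.

```python
import time

class RateLimiter:
    """Enforce a minimum number of milliseconds between requests."""

    def __init__(self, min_interval_ms, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_ms / 1000.0
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        """Block until the minimum interval since the last request has passed."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now

limiter = RateLimiter(10)  # 10 ms between requests; 0 means no delay
for i in range(3):
    limiter.wait()
    print("request", i)
```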

When using authentication, on the Credentials tab, configure the following properties:

Credentials Property	Description
Username	User name for basic, digest, or universal authentication.
Password	Password for basic, digest, or universal authentication. Tip: To secure sensitive information such as usernames and passwords, you can use runtime resources or credential stores.
Consumer Key	Consumer key for OAuth 1.0 authentication.
Consumer Secret	Consumer secret for OAuth 1.0 authentication.
Token	Consumer token for OAuth 1.0 authentication.
Token Secret	Token secret for OAuth 1.0 authentication.

When using OAuth 2 authorization, on the OAuth 2 tab, configure the following properties.

For more information about OAuth 2 and for example OAuth 2 configurations to read from Twitter, Microsoft Azure AD, or Google APIs, see OAuth 2 Authorization.

OAuth 2 Property	Description
Credentials Grant Type	Type of credentials grant required by the HTTP service: Client credentials grant, Resource owner password credentials grant, or JSON Web Tokens (JWT).
Token URL	URL to request the access token.
Client ID	Client ID that the HTTP service uses to identify the HTTP client. Enter for a client credentials grant that uses a client ID and secret for authentication, or for a resource owner password credentials grant that requires a client ID and secret.
Client Secret	Client secret that the HTTP service uses to authenticate the HTTP client. Enter for a client credentials grant that uses a client ID and secret for authentication, or for a resource owner password credentials grant that requires a client ID and secret. Tip: To secure sensitive information such as the client ID and secret, you can use runtime resources or credential stores.
User Name	Resource owner user name. Enter for the resource owner password credentials grant.
Password	Resource owner password. Enter for the resource owner password credentials grant. Tip: To secure sensitive information such as usernames and passwords, you can use runtime resources or credential stores.
JWT Signing Algorithm	Algorithm used to sign the JSON Web Token (JWT). Default is none. Enter for the JSON Web Tokens grant.
JWT Signing Key	Base64 encoded key used to sign the JSON Web Token, if you selected a signing algorithm. Tip: To secure sensitive information such as the JWT signing key, you can use runtime resources or credential stores. Enter for the JSON Web Tokens grant.
JWT Claims	Claims to use in the JSON Web Token request, entered in JSON format. Enter each claim required by the HTTP service. You can include the expression language in the JWT claims. For example, to read from Google service accounts, enter the following claims with the appropriate values: `{ "iss":"my_name@my_account.iam.gserviceaccount.com", "scope":"https://www.googleapis.com/auth/drive", "aud":"https://www.googleapis.com/oauth2/v4/token", "exp":${(time:dateTimeToMilliseconds(time:now())/1000) + 50 * 60}, "iat":${time:dateTimeToMilliseconds(time:now())/1000} }` Enter for the JSON Web Tokens grant.
Request Transfer Encoding	Form of encoding to use when the stage requests an access token: buffered or chunked. Default is chunked.
Additional Key-Value Pairs	Optional key-value pairs to send to the token URL when requesting an access token. For example, you can define the OAuth 2 `scope` request parameter. Using simple or bulk edit mode, click the Add icon to add additional key-value pairs.

To use an HTTP proxy, on the Proxy tab, configure the following properties:

HTTP Proxy Property	Description
Proxy URI	Proxy URI.
Username	Proxy user name.
Password	Proxy password. Tip: To secure sensitive information such as usernames and passwords, you can use runtime resources or credential stores.

To use SSL/TLS, on the TLS tab, configure the following properties:

TLS Property	Description
Use TLS	Enables the use of TLS.
Keystore File	The path to the keystore file. Enter an absolute path to the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES. For more information about environment variables, see Data Collector Environment Configuration. By default, no keystore is used.
Keystore Type	Type of keystore to use. Use one of the following types: Java Keystore File (JKS) or PKCS #12 (p12 file). Default is Java Keystore File (JKS).
Keystore Password	Password to the keystore file. A password is optional, but recommended. Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores.
Keystore Key Algorithm	The algorithm used to manage the keystore. Default is SunX509.
Truststore File	The path to the truststore file. Enter an absolute path to the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES. For more information about environment variables, see Data Collector Environment Configuration. By default, no truststore is used.
Truststore Type	Type of truststore to use. Use one of the following types: Java Keystore File (JKS) or PKCS #12 (p12 file). Default is Java Keystore File (JKS).
Truststore Password	Password to the truststore file. A password is optional, but recommended. Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores.
Truststore Trust Algorithm	The algorithm used to manage the truststore. Default is SunX509.
Use Default Protocols	Determines the transport layer security (TLS) protocol to use. The default protocol is TLSv1.2. To use a different protocol, clear this option.
Transport Protocols	The TLS protocols to use. To use a protocol other than the default TLSv1.2, click the Add icon and enter the protocol name. You can use simple or bulk edit mode to add protocols. Note: Older protocols are not as secure as TLSv1.2.
Use Default Cipher Suites	Determines the cipher suite to use when performing the SSL/TLS handshake. Data Collector provides a set of cipher suites that it can use by default. For a full list, see Cipher Suites.
Cipher Suites	Cipher suites to use. To use a cipher suite that is not a part of the default set, click the Add icon and enter the name of the cipher suite. You can use simple or bulk edit mode to add cipher suites. Enter the Java Secure Socket Extension (JSSE) name for the additional cipher suites that you want to use.

On the Data Format tab, configure the following property:

Data Format Property	Description
Data Format	Data format of the response: JSON, Text, or XML.

For JSON data, on the Data Format tab, configure the following properties:

JSON Property	Description
Maximum Object Length (chars)	Maximum number of characters in a JSON object. Longer objects are diverted to the pipeline for error handling. This property can be limited by the Data Collector parser buffer size. For more information, see Maximum Record Size.
Charset	Character encoding of the data to be processed.
Ignore Ctrl Characters	Removes all ASCII control characters except for the tab, line feed, and carriage return characters.

For text data, on the Data Format tab, configure the following properties:

Text Property	Description
Max Line Length	Maximum number of characters allowed for a line. Longer lines are truncated. Adds a boolean field to the record to indicate if it was truncated. The field name is Truncated. This property can be limited by the Data Collector parser buffer size. For more information, see Maximum Record Size.
Use Custom Delimiter	Uses custom delimiters to define records instead of line breaks.
Custom Delimiter	One or more characters to use to define records.
Include Custom Delimiter	Includes delimiter characters in the record.
Charset	Character encoding of the data to be processed.
Ignore Ctrl Characters	Removes all ASCII control characters except for the tab, line feed, and carriage return characters.

For XML data, on the Data Format tab, configure the following properties:

XML Property	Description
Delimiter Element	XML element that acts as a delimiter. Omit a delimiter to treat the entire XML document as one field.
Max Record Length (chars)	The maximum number of characters in a field. Longer fields are diverted to the pipeline for error handling. This property can be limited by the Data Collector parser buffer size. For more information, see Maximum Record Size.
Charset	Character encoding of the data to be processed.
Ignore Ctrl Characters	Removes all ASCII control characters except for the tab, line feed, and carriage return characters.

On the Logging tab, configure the following properties to log request and response data:

Logging Property	Description
Enable Request Logging	Enables logging request and response data. For information about logging the resolved resource URL, see Logging the Resolved Resource URL.
Log Level	The level of detail to be logged. Choose one of the available options. The following list is in order of lowest to highest level of logging. When you select a level, messages generated by the levels above the selected level are also written to the log: Severe - Only messages indicating serious failures. Warning - Messages warning of potential problems. Info - Informational messages. Fine - Basic tracing information. Finer - Detailed tracing information. Finest - Highly detailed tracing information. Note: The log level configured for Data Collector can limit the level of messages that the stage writes. Verify that the Data Collector log level supports the level that you want to use.
Verbosity	The type of data to include in logged messages: Headers_Only - Includes request and response headers. Payload_Text - Includes request and response headers as well as any text payloads. Payload_Any - Includes request and response headers and the payload, regardless of type.
Max Entity Size	The maximum size of message data to write to the log. Use to limit the volume of data written to the Data Collector log for any single message.