  • A
    • ADLS Gen1 destination
      • configuring[1]
      • data formats[1]
      • overview[1]
      • partitions[1][2]
      • prerequisites[1]
      • retrieve authentication information[1]
    • ADLS Gen1 origin
      • configuring[1]
      • data formats[1]
      • overview[1]
      • partitions[1]
      • prerequisites[1]
      • retrieve authentication information[1]
    • ADLS Gen2 destination
      • configuring[1]
      • data formats[1]
      • overview[1]
      • prerequisites[1]
      • retrieve configuration details[1]
    • ADLS Gen2 origin
      • configuring[1]
      • data formats[1]
      • overview[1]
      • partitions[1]
      • prerequisites[1]
      • retrieve configuration details[1]
    • ADLS stages
      • local pipeline prerequisites[1]
    • Aggregate processor
      • aggregate functions[1]
      • configuring[1]
      • default output fields[1]
      • example[1]
      • overview[1]
      • shuffling of data[1]
    • Amazon S3 destination
    • Amazon S3 origin
    • Amazon S3 stages
      • local pipeline prerequisites[1]
  • B
    • batch pipelines
    • browser
      • requirements[1]
  • C
    • caching
      • for origins and processors[1]
      • ludicrous mode[1]
    • case study
      • batch pipelines[1]
      • streaming pipelines[1]
    • client deployment mode
      • Hadoop YARN cluster[1]
    • cluster
    • cluster deployment mode
      • Hadoop YARN cluster[1]
    • conditions
      • Delta Lake destination[1]
      • Filter processor[1]
      • Join processor[1]
      • Stream Selector processor[1]
      • Window processor[1]
    • cross join
      • Join processor[1]
    • custom schemas
      • application to JSON and delimited data[1]
      • DDL schema format[1]
      • error handling[1]
      • JSON schema format[1]
      • origins[1]
  • D
    • Databricks
      • cluster[1]
      • cluster configuration[1]
    • Databricks pipelines
      • staging directory[1]
    • data formats
      • ADLS Gen1 destination[1]
      • ADLS Gen1 origin[1]
      • ADLS Gen2 destination[1]
      • ADLS Gen2 origin[1]
      • Amazon S3 destination[1]
      • Amazon S3 origin[1]
      • File destination[1]
      • File origin[1]
      • Hive destination[1]
      • Kafka destination[1]
      • Kafka origin[1]
    • data preview
      • data type display[1]
      • overview[1]
    • data types
    • Deduplicate processor
    • default output fields
      • Aggregate processor[1]
    • default stream
      • Stream Selector[1]
    • Delta Lake destination
      • configuring[1]
      • overview[1]
      • overwrite condition[1]
      • partitions[1]
      • schema updates[1]
      • write mode[1]
    • deployment mode
      • Hadoop YARN cluster[1]
    • destinations
    • directory path
      • File destination[1]
      • File origin[1]
    • drivers
      • JDBC destination[1]
      • JDBC origin[1]
  • E
    • encryption zones
      • using KMS to access HDFS encryption zones[1]
    • environment variables
      • PySpark processor[1]
    • execution mode
    • expressions
      • Spark SQL Expression processor[1]
  • F
    • Field Remover processor
    • fields
    • File destination
    • File origin
    • Filter processor
    • full outer join
      • Join processor[1]
  • H
    • Hadoop impersonation mode
      • configuring KMS for encryption zones[1]
      • lowercasing user names[1]
      • overview[1]
    • Hadoop YARN
      • cluster[1]
      • deployment mode[1]
      • directory requirements[1]
      • impersonation[1]
      • Kerberos authentication[1]
    • history
      • pipeline run[1]
    • Hive destination
      • additional Hive configuration properties[1]
      • configuring[1]
      • data formats[1]
      • overview[1]
      • partitions[1]
    • Hive origin
      • additional Hive configuration properties[1]
      • configuring[1]
      • full mode query guidelines[1]
      • incremental and full query mode[1]
      • incremental mode query guidelines[1]
      • overview[1]
      • partitions[1]
      • SQL query[1]
  • I
    • impersonation mode
    • inner join
      • Join processor[1]
    • inputs variable
      • PySpark processor[1]
    • installation
      • overview[1]
      • requirements[1]
      • Spark cluster mode[1]
      • Spark local mode[1]
  • J
    • JDBC destination
      • configuring[1]
      • driver installation[1]
      • overview[1]
      • partitions[1]
      • tested versions and drivers[1]
      • write mode[1]
    • JDBC origin
      • configuring[1]
      • driver installation[1]
      • offset column[1]
      • overview[1]
      • partitions[1]
      • tested versions and drivers[1]
    • job cluster
    • Join processor
      • condition[1]
      • configuring[1]
      • criteria[1]
      • cross join[1]
      • full outer join[1]
      • inner join[1]
      • join types[1]
      • left anti join[1]
      • left outer join[1]
      • left semi join[1]
      • matching fields[1]
      • overview[1]
      • right outer join[1]
      • shuffling of data[1]
    • join types
      • Join processor[1]
  • K
    • Kafka destination
      • configuring[1]
      • data formats[1]
      • Kerberos authentication[1]
      • message[1]
      • overview[1]
      • security[1]
      • SSL/TLS encryption[1]
    • Kafka origin
      • configuring[1]
      • custom schemas[1]
      • data formats[1]
      • Kerberos authentication[1]
      • offsets[1]
      • overview[1]
      • partitions[1]
      • security[1]
      • SSL/TLS encryption[1]
    • Kerberos authentication
      • Hadoop YARN cluster[1]
      • Kafka destination[1]
      • Kafka origin[1]
  • L
    • left anti join
      • Join processor[1]
    • left outer join
      • Join processor[1]
    • left semi join
      • Join processor[1]
    • local pipelines
    • lookups
      • overview[1]
      • streaming example[1]
    • ludicrous mode
      • caching[1]
      • enabling[1]
      • optimizing pipeline performance[1]
      • pipeline statistics[1]
  • M
    • message
      • Kafka destination[1]
    • monitoring
  • O
  • P
    • partitioning
    • partitions
      • ADLS Gen1 destination[1][2]
      • ADLS Gen1 origin[1]
      • ADLS Gen2 origin[1]
      • Amazon S3 destination[1]
      • Amazon S3 origin[1]
      • based on origins[1]
      • changing[1]
      • Delta Lake destination[1]
      • File destination[1]
      • File origin[1]
      • Hive destination[1]
      • Hive origin[1]
      • initial[1]
      • initial number[1]
      • JDBC destination[1]
      • JDBC origin[1]
      • Kafka origin[1]
      • Rank processor[1]
    • performing lookups
    • pipeline performance
      • ludicrous mode[1]
    • pipeline run
    • pipelines
      • comparison with Data Collector[1]
      • configuring[1]
      • monitoring[1]
      • pause monitoring[1]
      • previewing[1]
      • run history[1]
      • Spark configuration[1]
      • stage library match requirement[1]
    • ports
    • prerequisites
      • ADLS and Amazon S3 stages[1]
      • PySpark processor[1]
      • stage-related[1]
    • preview
      • availability[1]
      • color codes[1]
      • editing properties[1]
      • output order[1]
      • overview[1]
      • pipeline[1]
      • writing to destinations[1]
    • processor
      • output order[1]
    • processors
    • Profile processor
    • proxy users
    • PySpark processor
      • configuring[1]
      • custom code[1]
      • environment variables[1]
      • examples[1]
      • inputs variable[1]
      • output variable[1]
      • overview[1]
      • prerequisites[1][2]
      • Python requirements[1]
      • referencing fields[1]
  • Q
    • query mode
  • R
    • Rank processor
    • repartitioning
    • Repartition processor
    • right outer join
      • Join processor[1]
  • S
    • schema updates
      • Delta Lake destination[1]
    • security
      • Kafka destination[1]
      • Kafka origin[1]
    • shuffling
    • sorting
      • multiple fields[1]
    • Sort processor
    • Spark
      • run locally[1]
      • run on cluster[1]
    • Spark configuration
    • Spark history server
    • Spark processing
    • Spark SQL Expression processor
    • Spark SQL processor
    • Spark SQL query
    • Spark SQL Query processor
    • Spark web UI
    • SQL query
    • SSL/TLS encryption
      • Kafka destination[1]
      • Kafka origin[1]
    • stage library match requirement
      • in a pipeline[1]
    • staging directory
      • Databricks pipelines[1]
    • statistics
    • streaming pipelines
    • Stream Selector processor
  • T
    • Technology Preview functionality
    • Transformer
      • architecture[1]
      • description[1]
      • for Data Collector users[1]
      • launching[1]
      • proxy users[1]
      • spark-submit[1]
      • starting[1]
    • Type Converter processor
      • configuring[1]
      • field type conversion[1]
      • overview[1]
  • W
    • Window processor
    • window types
      • Window processor[1]
    • write mode
      • Delta Lake destination[1]
      • JDBC destination[1]
© 2019 StreamSets, Inc.