Skip to main content
Skip to main content

HiveText

InputOutputAlias

Description

HiveText reads the text serialization format used by Apache Hive tables (the format produced by Hive's LazySimpleSerDe). It is a delimited text format, similar to CSV, in which fields are separated by the Hive default \x01 (Ctrl-A) delimiter. The field delimiter is configurable via input_format_hive_text_fields_delimiter.

HiveText is an input-only format. The data has no header row: values are mapped positionally onto the columns of the destination table, so the column names and types are taken from the table (or from an explicitly provided structure) rather than inferred from the data. While reading, ClickHouse parses dates and times in best-effort mode (see date_time_input_format), fills omitted trailing fields with column defaults, and skips fields it does not recognize.

Within a field, values are parsed using the same escaping rules as CSV rather than Hive's nested delimiters. In particular, a column of type Array is read from the bracketed representation (for example, "['a','b','c']"), not from values separated by the Hive collection delimiter \x02.

Nested delimiter settings have no effect

The input_format_hive_text_collection_items_delimiter and input_format_hive_text_map_keys_delimiter settings are accepted for compatibility but are currently not used during parsing.

By default, rows are allowed to have a variable number of fields (see input_format_hive_text_allow_variable_number_of_columns): rows with fewer fields than the table have the missing columns filled with default values, and rows with extra trailing fields have the extras skipped.

Example usage

The examples below override the default field delimiter with a comma (,) using input_format_hive_text_fields_delimiter so that the input files are easy to read.

Reading a HiveText file

Given a file hive_data.txt with comma-separated fields:

1,3
3,5,9

We create a table that defines the column names and types, and insert the file into it with FORMAT HiveText:

CREATE TABLE test_tbl (a UInt16, b UInt32, c UInt32) ENGINE = MergeTree ORDER BY a;

INSERT INTO test_tbl FROM INFILE 'hive_data.txt'
SETTINGS input_format_hive_text_fields_delimiter = ','
FORMAT HiveText;

SELECT * FROM test_tbl;
┌─a─┬─b─┬─c─┐
│ 1 │ 3 │ 0 │
│ 3 │ 5 │ 9 │
└───┴───┴───┘

Note that the first row, 1,3, has only two fields, so the missing column c is filled with its default value 0.

Variable number of columns

With the default input_format_hive_text_allow_variable_number_of_columns = 1, rows that have more fields than the table simply have the extra trailing fields skipped:

1,2,3,4,5
6,7,8
CREATE TABLE test_extras (a UInt16, b UInt32, c UInt32) ENGINE = MergeTree ORDER BY a;

INSERT INTO test_extras FROM INFILE 'hive_extras.txt'
SETTINGS input_format_hive_text_fields_delimiter = ','
FORMAT HiveText;

SELECT * FROM test_extras ORDER BY a;
┌─a─┬─b─┬─c─┐
│ 1 │ 2 │ 3 │
│ 6 │ 7 │ 8 │
└───┴───┴───┘

Setting input_format_hive_text_allow_variable_number_of_columns = 0 instead enforces a strict field count, and a row with fewer fields than the table raises a parsing exception.

Format settings

SettingDescriptionDefault
input_format_hive_text_fields_delimiterDelimiter between fields in Hive Text File\x01
input_format_hive_text_collection_items_delimiterDelimiter between collection (array or map) items in Hive Text File. Accepted but currently not used during parsing.\x02
input_format_hive_text_map_keys_delimiterDelimiter between a pair of map key/values in Hive Text File. Accepted but currently not used during parsing.\x03
input_format_hive_text_allow_variable_number_of_columnsIgnore extra columns in Hive Text input (if file has more columns than expected) and treat missing fields as default values1