Error 1261 01000 row 1 doesn t contain data for all columns

Я пытаюсь сделать infile данных загрузки базы данных. Единственная проблема, с которой я столкнулся, это получить сообщение об ошибке 1261. Раньше я получал неправильную ошибку значения даты и ...

Я пытаюсь сделать infile данных загрузки базы данных. Единственная проблема, с которой я столкнулся, это получить сообщение об ошибке 1261. Раньше я получал неправильную ошибку значения даты и времени, но я решил, что с кодом в infile load infile ниже (установите date_time =). Сейчас моя проблема заключается в том, что у меня недостаточно данных для всех столбцов. Я знаю, что вы должны указывать столбцы после имени таблицы, но я не могу заставить его работать.

Существует одна таблица и имеет 15 столбцов. Первый столбец — первичный ключ, остальные четырнадцать — обычные столбцы.

Вот инструкция файла загрузки:

load data infile 'c:/proj/test.csv' into table base (@var1,event,failure,ue,mc,mn,cell,durat,cause,ne,ims,hier,hier3,hier32)  
set date_time = STR_TO_DATE(@var1, '%Y%m%d %H%i%s')
;

Дополнительные примечания: столбец pk называется dataId и является INT. Это автоматический приращение.

Вот данные из файла csv:

2013-03-20 14:55:22,4098,1,21060800,344,930,4,1000,0,11B,344930000000011,4809532081614990000,8226896360947470000,1150444940909480000 

Important

SingleStoreDB Cloud only supports LOAD DATA with the LOCAL option. As this SQL command will not work in the SQL Editor, LOAD DATA LOCAL must be run from a SQL client running on a computer that can access your SingleStoreDB Cloud instance, such as the MySQL client (mysql-client) or SingleStoreDBe client. Refer to Connect to Your Workspace for more information on SQL clients.

Import data stored in a CSV, JSON, or Avro file into a SingleStoreDB table (referred to as the destination table in this topic).

CSV LOAD DATA

Syntax

LOAD DATA [LOCAL] INFILE '<file_name>'
  [REPLACE | IGNORE | SKIP { ALL | CONSTRAINT | DUPLICATE KEY | PARSER } ERRORS]
  INTO TABLE <table_name>
  [CHARACTER SET <character_set_name>]
  [{FIELDS | COLUMNS}
  [TERMINATED BY '<string>']
      [[OPTIONALLY] ENCLOSED BY '<char>']
      [ESCAPED BY '<char>']
  ]
  [LINES
  [STARTING BY '<string>']
  [TERMINATED BY '<string>']
  ]
  [TRAILING NULLCOLS]
  [NULL DEFINED BY <string> [OPTIONALLY ENCLOSED]]
  [IGNORE <number> LINES]
  [ ({<column_name> | @<variable_name>}, ...) ]
  [SET <column_name> = <expression>,...]
  [WHERE <expression>,...]
  [MAX_ERRORS <number>]
  [ERRORS HANDLE <string>]

Examples

Loading Data when the Order of the Columns in the Destination Table and Source File are Different

If the order of columns in the table is different from the order in the source file, you can name them explicitly. In this example, the columns are loaded in reverse order:

LOAD DATA INFILE 'foo.tsv'
  INTO TABLE foo (fourth, third, second, first);

Skipping Columns in the Source File

You can skip columns in the source file using the @ sign. In this example only the first and fourth columns are imported into table foo:

LOAD DATA INFILE 'foo.tsv'
  INTO TABLE foo (bar, @, @, baz);

Specifying the Column Delimiter

The default column delimiter is the tab (t) character, ASCII code 09. You can specify a different delimiter, even multi-character delimiters, with the COLUMNS TERMINATED BY clause:

LOAD DATA INFILE 'foo.csv'
  INTO TABLE foo
  COLUMNS TERMINATED BY ',';

In the following example, field and line delimiters are used to read a file that contains fields separated by commas and lines terminated by carriage return/newline pairs:

LOAD DATA INFILE 'foo.csv'  INTO TABLE foo  FIELDS TERMINATED BY ','  LINES TERMINATED BY 'rn';

Source File with Unusual Column Separators

The following example demonstrates loading a file that has unusual column separators (|||):

LOAD DATA INFILE 'foo.oddformat'
  INTO TABLE foo
  COLUMNS TERMINATED BY '|||';

Loading Data from Multiple Files

Using globbing, you can load data from multiple files in a single LOAD DATA query.

The following query loads data from all the .csv files with names starting with a digit:

LOAD DATA INFILE "[0-9]*.csv"
INTO TABLE cust(ID,NAME,ORDERS);

The following query loads data from all the .csv files with filenames having four characters:

LOAD DATA INFILE "????.csv"
INTO TABLE cust(ID,NAME,ORDERS);

The following query loads data from all the .csv files with filenames not starting with a number:

LOAD DATA INFILE "[!0-9]*.csv"
INTO TABLE cust(ID,NAME,ORDERS);

Note

LOAD DATA LOCAL INFILE does not support globbing.

LOAD DATA INFILE supports globbing in filenames, but not in directory names.

CREATE PIPELINE contains a LOAD DATA clause. Here, LOAD DATA supports globbing, both in directory names and filenames.

Using the TRAILING NULLCOLS Clause

The following example demonstrates how to use the TRAILING NULLCOLS clause using the file numbers.csv , with the following content:

1,2,3
4,5
6

Run the following commands:

CREATE TABLE foo(a INT, b INT, c INT);

LOAD DATA INFILE 'numbers.csv' INTO TABLE foo COLUMNS TERMINATED BY ',' TRAILING NULLCOLS;

SELECT * FROM foo;
****
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    2 |    3 |
|    4 |    5 | NULL |
|    6 | NULL | NULL |
+------+------+------+

Using the NULL DEFINED BY Clause

The following example demonstrates how to use the NULL DEFINED BY clause using the data.csv file.

cat data.csv
****
DTB,'',25
SPD,,40
SELECT * FROM stockN;
****
+------+-------------+-------+
| ID   | City        | Count |
+------+-------------+-------+
| XCN  | new york    |    45 |
| ZDF  | washington  |    20 |
| XCN  | chicago     |    32 |
+------+-------------+-------+

The following query inserts the un-enclosed empty field as a NULL value and the enclosed empty field as an empty string.

LOAD DATA INFILE '/data.csv'
INTO TABLE stockN
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY "'"
NULL DEFINED BY '';

SELECT * FROM stockN;
****
+------+-------------+-------+
| ID   | City        | Count |
+------+-------------+-------+
| XCN  | new york    |    45 |
| ZDF  | washington  |    20 |
| XCN  | chicago     |    32 |
| DTB  |             |    25 |
| SPD  | NULL        |    40 |
+------+-------------+-------+

If you add the OPTIONALLY ENCLOSED option to the NULL DEFINED BY clause in the query above, and run the following query instead, both the empty fields are inserted as a NULL value:

LOAD DATA INFILE '/data.csv'
INTO TABLE stockN
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY "'"
NULL DEFINED BY '' OPTIONALLY ENCLOSED;

SELECT * FROM stockN;
****
+------+-------------+-------+
| ID   | City        | Count |
+------+-------------+-------+
| XCN  | new york    |    45 |
| ZDF  | washington  |    20 |
| XCN  | chicago     |    32 |
| DTB  | NULL        |    25 |
| SPD  | NULL        |    40 |
+------+-------------+-------+

Using the IGNORE LINES Clause

In the following example, the IGNORE LINES clause is used to skip the header line that contains column names in the source file:

LOAD DATA INFILE '/tmp/data.txt' INTO City IGNORE 1 LINES;

Using the ESCAPED BY Clause

The following example demonstrates how to load data into the loadEsc table using the ESCAPED BY clause from the file contacts.csv, whose contents are shown below.

GALE, ADAM, Brooklyn
FLETCHER, RON, New York
WAKEFIELD, CLARA, DC
DESC loadEsc;
****
+-------+-------------+------+------+---------+-------+
| Field | Type        | Null | Key  | Default | Extra |
+-------+-------------+------+------+---------+-------+
| Name  | varchar(40) | YES  |      | NULL    |       |
| City  | varchar(40) | YES  |      | NULL    |       |
+-------+-------------+------+------+---------+-------+

Execute the following query:

LOAD DATA INFILE '/contacts.csv'
INTO TABLE loadEsc COLUMNS TERMINATED BY ',' ESCAPED BY '\' ;
SELECT * FROM loadEsc;
****
+-------------------+-----------+
| Name              | City      |
+-------------------+-----------+
| GALE, ADAM        |  Brooklyn |
| FLETCHER, RON     |  New York |
| WAKEFIELD, CLARA  |  DC       |
+-------------------+-----------+

In this query, the character escapes the comma (,) between the first two fields of the contacts.csv file. (The (backslash) is the default escape character in a SQL query. Hence, the \ (double backslash) is used escape the backslash itself inside the query.)

Warning

If you (accidentally) escape the TERMINATED BY character in a file, the SQL query may return an error. For example, if you escape both the commas in any row of the contacts.csv file mentioned above, as:

GALE, ADAM, Brooklyn
FLETCHER, RON, New York
WAKEFIELD, CLARA, DC

and then execute the following query

LOAD DATA INFILE '/contacts.csv'
INTO TABLE loadEsc COLUMNS TERMINATED BY ',' ESCAPED BY '\' ;

it returns the following error: ERROR 1261 (01000): Row 1 doesn’t contain data for all columns. Because, the (backslash) escapes both the commas and LOAD DATA perceives the first row as a single column.

Using the STARTING BY Clause

The following example demonstrates how to skip the prefix ### in the stockUpd.txt data file using the STARTING BY clause.

cat stockUpd.txt
****
###1,"xcg",
3,"dfg"
new product###4,"rfk",5
LOAD DATA INFILE 'stockUpd.txt'
INTO TABLE stock
  FIELDS TERMINATED BY ','
  LINES STARTING BY '###';

SELECT * FROM stock;
****
+----+------+----------+
| ID | Code | Quantity |
+----+------+----------+
|  1 |  xcg |       10 |
|  4 |  rfk |        5 |
+----+------+----------+

In this example, the STARTING BY clause skips the prefix ### in the first and third lines and anything before it. It skips the second line, because it does not contain ###.

Filtering out Rows from the Source File

You can also filter out unwanted rows using the WHERE clause. In this example, only rows where bar is equal to 5 will be loaded. All other rows will be discarded:

LOAD DATA INFILE 'foo.oddformat'
  INTO TABLE foo (bar, baz)
  WHERE bar = 5;

Filtering out and Transforming Rows From the Source File

Complex transformations can be performed in both the SET and WHERE clauses. For example, if you have an input file with a EventDate field and an EventId field:

10-1-2016,1
4-15-2016,2
1-10-2017,3
4-10-2017,4

You want to only load the rows with a date that is within three months from a certain date, 10/15/2016, for instance. This can be accomplished by the following:

CREATE TABLE foo (EventDate date, EventId int);

LOAD DATA INFILE 'date_event.csv'
  INTO TABLE foo
  FIELDS TERMINATED BY ','
  (@EventDate, EventId)
  SET EventDate = STR_TO_DATE(@EventDate, '%m-%d-%Y')
  WHERE ABS(MONTHS_BETWEEN(EventDate, date('2016-10-15'))) < 3;

SELECT * FROM t;
****
+------------+---------+
| EventDate  | EventId |
+------------+---------+
| 2016-10-01 |       1 |
| 2017-01-10 |       3 |
+------------+---------+

While both column names and variables can be referenced in the WHERE clause column names can only be assigned to in the SET clause. The scope of these clauses is restricted to the current row and therefore SELECT statements cannot be evaluated.

Using REPLACE

This example uses the cust table, which is defined as a columnstore table as follows:

CREATE TABLE cust(name VARCHAR(32), id INT(11), orders INT(11), SORT KEY(id), UNIQUE KEY(id) USING HASH, SHARD KEY(id));

Assume the directory /order_files has one file orders.csv, which contains the following data:

Chris,7214,6
Elen,8301,4
Adam,3412,5
Rachel,9125,2
Susan,8301,7
George,3412,9

Create a LOAD DATA statement with a REPLACE clause:

LOAD DATA INFILE '/order_files/orders.csv' REPLACE INTO TABLE cust FIELDS TERMINATED BY ',';

As LOAD DATA ingests the data from orders.csv into the cust table, it encounters the fifth and sixth records in the file, which contain the duplicate keys 8301 and 3412. The second and third records containing those duplicate keys (which have already been imported into cust), are replaced with the fifth and second records.

SELECT * FROM cust ORDER BY name;
****
+--------+------+--------+
| name   | id   | orders |
+--------+------+--------+
| Chris  | 7214 |      6 |
| George | 3412 |      9 |
| Rachel | 9125 |      2 |
| Susan  | 8301 |      7 |
+--------+------+--------+

Loading a Fixed Length File

This example demonstrates how to load the contents of the file fixed_length.csv, whose contents are shown below.

APE602020-06-01
TR 252019-08-07
HSW8 2019-10-11
YTR122020-09-02

LOAD DATA inserts each extracted row from fixed_length.csv into the table foo. Define the table as follows:

CREATE TABLE foo(a CHAR(3), b INT, c DATETIME);

Run the LOAD DATA statement:

LOAD DATA INFILE '/fixed_length.csv'
  INTO TABLE foo (@current_row)
  SET a = TRIM(SUBSTR(@current_row,1,3)),
      b = TRIM(SUBSTR(@current_row,4,2)),
      c = TRIM(SUBSTR(@current_row,6,10));

SUBSTR() extracts a substring from a string and TRIM() removes the padding (spaces in this case) from the beginning and the ending of a string. For example, after the LOAD DATA statement extracts the line HSW8 2019-10-11 in fixed_length.csv, it does the following to set b: * It extracts, from HSW8 2019-10-11, the substring starting at position 4 having a length of 2. The resulting substring is 8. * It removes the leading whitespace from 8 to yield 8.

Retrieve the data from foo:

SELECT * from foo ORDER BY a;
****
+------+------+---------------------+
| a    | b    | c                   |
+------+------+---------------------+
| APE  |   60 | 2020-06-01 00:00:00 |
| HSW  |    8 | 2019-10-11 00:00:00 |
| TR   |   25 | 2019-08-07 00:00:00 |
| YTR  |   12 | 2020-09-02 00:00:00 |
+------+------+---------------------+

Loading Data using Hex Field Terminator Syntax

Loading data into a table via a pipeline can be performed using a hexadecimal field terminator. The example below uses an AWS S3 bucket for its data source.

Syntax

CREATE TABLE <table name>(a int, b int);

CREATE PIPELINE <pipeline name> AS 
LOAD DATA S3 's3://<bucket name>/<file name>.csv'
CONFIG '{"region":"us-west-2"}'
CREDENTIALS '{"aws_access_key_id": "XXXXXXXXXXXXXXXXXX",
              "aws_secret_access_key": "XXXXXXXXXXXXX"'
INTO TABLE <table name>(a, b) fields terminated by 0x2c;

START PIPELINE <pipeline name>;

SELECT * FROM <table name>;

******
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
+------+------+

JSON LOAD DATA

Syntax

LOAD DATA [LOCAL] INFILE 'file_name'
  [REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
  INTO TABLE tbl_name
  FORMAT JSON
  subvalue_mapping
  [SET col_name = expr,...]
  [WHERE expr,...]
  [MAX_ERRORS number]
  [ERRORS HANDLE string]

subvalue_mapping:
  ( {col_name | @variable_name} <- subvalue_path [DEFAULT literal_expr], ...)

subvalue_path:
  {% | [%::]ident [::ident ...]}

Semantics

Error logging and error handling are discussed at the end of this topic.

Extract specified subvalues from each JSON value in file_name. Assign them to specified columns of a new row in tbl_name, or to variables used for a column assignment in a SET clause. If a specified subvalue can’t be found in an input JSON, assign the DEFAULT clause literal instead. Discard rows which don’t match the WHERE clause.

To specify the compression type of an input file, use the COMPRESSION clause. See Handling Data Compression for more information.

The file named by file_name must consist of concatenated UTF-8 encoded JSON values, optionally separated by whitespace. Newline-delimited JSON is accepted, for example.

Non-standard JSON values like NaN, Infinity, and -Infinity must not occur in file_name.

If file_name ends in .gz or .lz4, it will be decompressed.

JSON LOAD DATA supports a subset of the error recovery options allowed by CSV LOAD DATA. Their behavior is as described under CSV LOAD DATA.

Like CSV LOAD DATA, JSON LOAD DATA allows you to use globbing to load data from multiple files. See the CSV examples.

Writing to multiple databases in a transaction is not supported.

Converting JSON Values

Before assignment or set clause evaluation, the JSON value extracted according to a subvalue_path is converted to a binary collation SQL string whose value depends on the extracted JSON type as follows:

JSON Type

Converted Value

null

SQL NULL

true/false

"1"/"0"

number

Verbatim, from extracted string.

string

All JSON string escape sequences, including escape sequences are converted to UTF-8. Verbatim otherwise.

array

Verbatim, from extracted string. For example, '[1,2]'

object

Verbatim, from extracted string. For example, '{"k":true}'

Conversion isn’t recursive. So, for example, true isn’t converted to "1" when it’s a subvalue of an object which is being extracted whole.

JSON LOAD DATA Examples

Example 1

If example.json consists of:

{"a":{"b":1}, "c":null}
{"a":{"b":2}, "d":null}

Then it can be loaded as follows:

CREATE TABLE t(a INT);

LOAD DATA LOCAL INFILE "example.json" INTO TABLE t(a <- a::b) FORMAT JSON;

SELECT * FROM t;
****
+------+
| a    |
+------+
|    1 |
|    2 |
+------+

Example 2

If example2.json consists of:

{"b":true, "s":"Au00AEu0022A", "n":-1.4820790816978637e-25, "a":[1,2], "o":{"subobject":1}}
{"b":false}
"hello"

Then we can perform a more complicated LOAD DATA:

CREATE TABLE t(b bool NOT NULL, s TEXT, n DOUBLE, a INT, o JSON NOT NULL, whole longblob);

LOAD DATA LOCAL INFILE "example2.json" INTO TABLE t FORMAT JSON(
  b <- b default true,
  s <- s default NULL,
  n <- n default NULL,
  @avar <- a default NULL,
  o <- o default '{"subobject":"replaced"}',
  whole <- %)
  SET a = json_extract_double(@avar, 1)
  WHERE b = true;

SELECT * FROM t;
****
+---+-------+-------------------------+------+--------------------------+-----------------------------------------------------------------------------------------------+
| b | s     | n                       | a    | o                        | whole                                                                                         |
+---+-------+-------------------------+------+--------------------------+-----------------------------------------------------------------------------------------------+
| 1 | A®"A  | -1.4820790816978637e-25 |    2 | {"subobject":1}          | {"b":true, "s":"Au00AEu0022A", "n":-1.4820790816978637e-25, "a":[1,2], "o":{"subobject":1}} |
| 1 | NULL  |                    NULL | NULL | {"subobject":"replaced"} | hello                                                                                         |
+---+-------+-------------------------+------+--------------------------+-----------------------------------------------------------------------------------------------+

There are several things to note in the example above:

  • true was converted to "1" for columns b, but not for column whole. "1" was further converted to the BOOL value 1.

  • The escapes"u00AE" and "u0022" were converted to UTF-8 for column s, but not for column whole. Note that whole would have become invalid JSON if we had translated "u0022".

  • The second row was discarded because it failed to match the WHERE clause.

  • None of the paths in subvalue_mapping could be found in the third row, so DEFAULT literals like '{"subobject":"replaced"}' were assigned instead.

  • We assigned a to an intermediate variable so that we could extract an array element in the SET clause.

  • The top-level JSON values in example2.json weren’t all JSON objects. "hello" is a valid top-level JSON value.

Avro LOAD DATA

Syntax

LOAD DATA [LOCAL] INFILE 'file_name'
  [REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
  INTO TABLE tbl_name
  FORMAT AVRO SCHEMA REGISTRY {"IP" | "Hostname"}
  subvalue_mapping
  [SET col_name = expr,...]
  [WHERE expr,...]
  [MAX_ERRORS number]
  [ERRORS HANDLE string]
  [SCHEMA 'avro_schema']

subvalue_mapping:
  ( {col_name | @variable_name} <- subvalue_path, ...)

subvalue_path:
  {% | [%::]ident [::ident ...]}

See the associated GitHub repo.

Semantics

Error logging and error handling are discussed at the end of this topic.

Extract specified subvalues from each Avro value in file_name. Assign them to specified columns of a new row in tbl_name, or to variables used for a column assignment in a SET clause. Discard rows which don’t match the WHERE clause.

To specify the compression type of an input file, use the COMPRESSION clause. See Handling Data Compression for more information.

Avro LOAD DATA expects Avro data in one of two sub-formats, depending on the SCHEMA clause.

If no SCHEMA clause is provided, file_name must name an Avro Object Container File as described in version 1.8.2 of the Avro specification. In addition, the following restrictions hold:

  • The compression codec of the file must be null.

  • Array and map values must not have more than 16384 elements.

  • The type name of a record must not be used in a symbolic reference to previously defined name in any of its fields. It may still be used in a symbolic reference outside the record definition, however.

    For example, self-referential schemas like the following are rejected by LOAD DATA:

    {
      "type": "record",
      "name": "PseudoLinkedList",
      "fields" : [{"name": "value", "type": "long"},
                  {"name": "next", "type": ["null", "PseudoLinkedList"]}]
    }
    

If a SCHEMA clause is provided, the file must be a raw stream consisting of only the concatenated binary encodings of instances of avro_schema. avro_schema must be a SQL string containing a JSON Avro schema. The restrictions on Object Container Files also apply to raw stream files.

Warning

It’s an error to provide a SCHEMA clause when loading an Object Container File because it contains metadata alongside the encoded values.

All optional Avro schema attributes except the namespace attribute are ignored. Notably, logicalType attributes are ignored.

If file_name ends in .gz or .lz4, it will be decompressed.

Avro LOAD DATA supports a subset of the error recovery options allowed by CSV LOAD DATA. Their behavior is as described under CSV LOAD DATA.

Writing to multiple databases in a transaction is not supported.

The SCHEMA REGISTRY {"IP" | "Hostname"} option allows LOAD DATA to pull the schema from a schema registry. For more information, see the Avro Schema Evolution With Pipelines topic.

Converting Avro Values

Before assignment or set clause evaluation, the Avro value extracted according to a subvalue_path is converted to an unspecified SQL type which may be further explicitly or implicitly converted as if from a SQL string whose value is as follows:

Avro Type

Converted Value

null

SQL NULL

boolean

"1"/"0"

int

The string representation of the value

long

The string representation of the value

float

SQL NULL if not finite. Otherwise, a string convertible without loss of precision to FLOAT

double

SQL NULL if not finite. Otherwise, a string convertible without loss of precision to DOUBLE

enum

The string representation of the enum.

bytes

Verbatim, from input bytes

string

Verbatim, from input bytes

fixed

Verbatim, from input bytes

record

The JSON encoding of the value

map

The JSON encoding of the value

array

The JSON encoding of the value

union

The JSON encoding of the value

logicalType attributes are ignored and have no effect on conversion.

Avro LOAD DATA Examples

Example 1

Consider an Avro Object Container Fileexample.avro with the following schema:

{
  "type": "record",
  "name": "data",
  "fields": [{ "name": "id", "type": "long"},
             { "name": "payload", "type": [ "null",
                                            "string" ]}]
}

example.avro contains three Avro values whose JSON encodings are:

{"id":1,"payload":{"string":"first"}}
{"id":1,"payload":{"string":"second"}}
{"id":1,"payload":null}

example.avro can be loaded as follows:

CREATE TABLE t(payload TEXT, input_record JSON);

LOAD DATA LOCAL INFILE "example.avro"
  INTO TABLE t
  FORMAT AVRO
  ( payload <- %::payload::string,
  input_record <- % );

SELECT * FROM t;
****
+---------+----------------------------------------+
| payload | input_record                           |
+---------+----------------------------------------+
| first   | {"id":1,"payload":{"string":"first"}}  |
| second  | {"id":1,"payload":{"string":"second"}} |
| NULL    | {"id":1,"payload":null}                |
+---------+----------------------------------------+

LOAD DATA was able to parse example.avro because Avro Object Container Files have a header which contains their schema.

Example 2

Consider a file named example.raw_avro, with the same values as example.avro from Example 1 but in the raw stream format. That is, example.raw_avro consists of the binary encoded values and nothing else. We add a SCHEMA clause to tell LOAD DATA to expect a raw stream with the provided schema:

CREATE TABLE t(payload TEXT, input_record JSON);
LOAD DATA LOCAL INFILE "example.raw_avro"
  INTO TABLE t
  FORMAT AVRO
  ( payload <- %::payload::string,
  input_record <- % )
  schema
  '{
    "type": "record",
    "name": "data",
    "fields": [{ "name": "id", "type": "long"},
      { "name": "payload", "type": [ "null", "string" ]}]
  }';

SELECT * FROM t;
****
+---------+----------------------------------------+
| payload | input_record                           |
+---------+----------------------------------------+
| first   | {"id":1,"payload":{"string":"first"}}  |
| second  | {"id":1,"payload":{"string":"second"}} |
| NULL    | {"id":1,"payload":null}                |
+---------+----------------------------------------+

Example 3

Consider an Object Container Fileexample3.avro with a more complicated payload than Example 1. We illustrate extracting values from nested unions and records, and also indirectly extracting elements of nested maps and arrays.

{ "type": "record",
  "namespace": "ns",
  "name": "data",
  "fields": [
      { "name": "id", "type": "long" },
      { "name": "payload", "type":
        [ "null",
          { "type": "record",
            "name": "payload_record",
            "namespace": "ns",
            "fields": [
                { "name": "f_bytes", "type": "bytes"},
                { "name": "f_string", "type": "string"},
                { "name": "f_map", "type":
                  { "type": "map",
                    "values": { "type": "array",
                                "items": "int" }}}
            ]
          }
        ]
      }
  ]
}

The raw JSON encoding of the contents of this file can be seen in column c_whole_raw after the following LOAD DATA:

CREATE TABLE t (
  c_id bigint,
  c_bytes longblob,
  c_string longblob,
  c_array_second int,
  c_whole_raw longblob,
  c_whole_json json
  );

LOAD DATA INFILE "example3.avro"
  INTO TABLE t
  FORMAT AVRO
  ( c_id <- %::id,
  c_bytes <- %::payload::`ns.payload_record`::f_bytes,
  c_string <- %::payload::`ns.payload_record`::f_string,
  @v_map <- %::payload::`ns.payload_record`::f_map,
  c_whole_raw <- %,
  c_whole_json <- %)
  SET c_array_second = JSON_EXTRACT_JSON(@v_map, "a", 1);

SELECT * FROM t;
****
*************************** 1. row ***************************
          c_id: 1
       c_bytes: NULL
      c_string: NULL
c_array_second: NULL
   c_whole_raw: {"id":1,"payload":null}
  c_whole_json: {"id":1,"payload":null}
*************************** 2. row ***************************
          c_id: 2
       c_bytes: "A
      c_string: "A
c_array_second: 2
   c_whole_raw: {"id":2,"payload":{"ns.payload_record":{"f_bytes":"u0022u0041","f_string":""A","f_map":{"a":[1,2]}}}}
  c_whole_json: {"id":2,"payload":{"ns.payload_record":{"f_bytes":""A","f_map":{"a":[1,2]},"f_string":""A"}}}

There are several things to note:

  • We attempted to extract subvalues of the payload_record branch of the union-type payload field. Since that wasn’t the selected member of the union in record 1, LOAD DATA assigned NULL to c_bytes and @v_map.

  • We assigned the JSON encoding of f_map to @v_map and then performed JSON map and array lookups in the SET clause to ultimately extract 2.

  • f_string and f_bytes had the same contents, but we can see how their different Avro types affected their JSON encodings and interacted with the SQL JSON type

    • The JSON encoding of the Avro string value f_string, as seen in c_whole_raw, encodes special characters like " as the escape sequence ".

    • The JSON encoding of the Avro bytes value f_bytes, as seen in c_whole_raw, encodes every byte with a JSON escape.

    • When converting the JSON encoding of record 2 to the SQL JSON type while assigning to c_whole_json, LOAD DATA normalized both representations of the byte sequence "A to "A.

Loading Parquet Files

The LOAD DATA command does not support loading Parquet files. However, you can use the LOAD DATA clause in a CREATE PIPELINE .. FORMAT PARQUET statement to create a pipeline that loads Parquet files.

Handling Data Compression

The COMPRESSION clause specifies how LOAD DATA handles the compression of an input file.

Syntax

LOAD DATA INFILE 'filename' COMPRESSION { AUTO | NONE | LZ4 | GZIP } INTO TABLE ...

LOAD DATA INFILE 'filename' INTO TABLE `tablename` COMPRESSION { AUTO | NONE | LZ4 | GZIP } ...

Arguments

  • AUTO: This is the default setting, it tells LOAD DATA to identify the compression type from the input file’s extension.

  • NONE: Specifies that the input file is uncompressed.

  • LZ4: Specifies that the input file is compressed with LZ4 compression algorithm.

  • GZIP: Specifies that the input file is compressed with GZIP compression algorithm.

LOCAL

LOCAL affects the expected file location, the search behavior for relative path names, and error handling behavior.

When you specify LOCAL, the client reads file_name and sends it to the server. The expected location of the file is within the client directory. If file_name is a relative path, it is relative to the current working directory of the client.

When LOCAL is not specified, the file is read by the server, and needs to be located on the related server host. If a relative file path name is specified, it is searched for as relative to the server’s data directory, or as relative to the directory of the default database in cases where no leading components are given.

Because files need to be sent from the client to the server, specifying LOCAL can be slower. However, it does have the security advantage in that the client user must be able to read the file(s) being loaded. Where LOCAL is not specified, the server needs access to the full data directory, meaning that any user who has the permissions to LOAD DATA or CREATE PIPELINE can read the directory. This is because those permissions include FILE READPermissions Matrix

LOCAL does not support globbing (such as using wildcards in directory or filenames).

An example of using LOCAL follows:

LOAD DATA LOCAL INFILE '/example-directory/foo.csv'
  INTO TABLE foo
  COLUMNS TERMINATED BY ',';

Error Logging

When you run the LOAD DATA command and use the ERRORS HANDLE clause, LOAD DATA logs errors to the information_schema.LOAD_DATA_ERRORS table. The logged errors are the erroneous lines that LOAD DATA encountered as it processed the input file. The following table describes information_schema.LOAD_DATA_ERRORS.

Column Name

Description

DATABASE_NAME

The name of the database associated with the error.

HANDLE

The string value specified in the LOAD DATA ... ERRORS HANDLE <string> statement that caused the error.

ERROR_UNIX_TIMESTAMP

The time of the error event in Unix timestamp format.

ERROR_TYPE

Specifies what type of error occurred. Error is used when LOAD DATA returned the error to the client application. Warning is used when LOAD DATA did not return the error to the client application and instead skipped or ignored the error. Note: If LOAD DATA ... ERRORS HANDLE returns a duplicate key error to the client application, the error is not logged to information_schema.LOAD_DATA_ERRORS.

ERROR_CODE

The error code for the error.

ERROR_MESSAGE

The message associated with the error. This value contains contextual information about the error that can be used for debugging purposes.

LOAD_DATA_LINE

The erroneous line in the input file that LOAD DATA could not process. Load errors are typically caused by malformed data or attempting to write invalid schema types into a column, such as a NULL value into a non-nullable column.

LOAD_DATA_LINE_NUMBER

The line number of the input file caused a parsing error while attempting to load data into the destination table. This line number can be correlated with the value of LOAD_DATA_LINE, which contains the invalid line’s text.

HOST

The hostname or host IP address of the node that processed the erroneous line of the input file.

PORT

The port number of the node that processed the erroneous line of the input file.

PARTITION

Specifies the partition ID on the leaf node that processed the erroneous line of the input file. This value is NULL if the aggregator processed the line.

See the next section for example data that LOAD DATA ... ERRORS HANDLE populates in the information_schema.LOAD_DATA_ERRORS table.

Use the CLEAR LOAD ERRORS command to remove errors from information_schema.LOAD_DATA_ERRORS.

Error Handling

LOAD DATA has several options to handle errors that it encounters as it processes the input file. When you write a LOAD DATA statement, you can decide which option to use.

  • By default, LOAD DATA returns errors to the client application. Errors are returned one at a time.

  • To ignore duplicate key/index value errors in the input file, use the REPLACE clause to replace existing rows with input rows. This clause first deletes the existing rows that have the same value for a primary key or unique index as the input rows, and then inserts the new row.

  • To skip errors in the input file, use the SKIP ... ERRORS clause. Data in the erroneous lines will not be inserted into the destination table.

  • To ignore errors in the input file, use the IGNORE clause. This clause replaces invalid values with their defaults, discards extra fields, or discards erroneous lines completely.

  • When LOCAL is specified, duplicate-key and data interpretation errors do not stop the operation. When LOCAL is not specified, duplicate-key and data interpretation stop the operation.

Warning

In most cases, use SKIP ... ERRORS instead of IGNORE. If you use IGNORE without understanding how it behaves, LOAD DATA may produce unexpected results as it inserts data into the destination table.

The four error handling options are discussed in the following topics.

Default Error Handling

By default, LOAD DATA returns errors to the client application. Errors are returned one at a time. If it returns an error, no data will be inserted into the destination table.

Error Handling Example

Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time TIMESTAMP NOT NULL
  );

The following CSV file will loaded be into this table as orders.csv. Note that line 2 has an extra column and there is a duplicate primary key value of 2 in line 4.

1,372,Apples,2016-05-09
3,307,Oranges,2016-07-31,1000
2,138,Pears,2016-07-14
2,236,Bananas,2016-06-23

Load the data into the table:

LOAD DATA INFILE 'orders.csv'
  INTO TABLE orders
  FIELDS TERMINATED BY ',';
****
ERROR 1262 (01000): Row 2 was truncated; it contained more data than there were input columns

After removing the extra column from row 2:

ERROR 1062 (23000): Leaf Error (127.0.0.1:3308): Duplicate entry '2' for key 'PRIMARY'

After removing the duplicate primary key entry, the LOAD DATA statement is successful and the input file is loaded into the table.

REPLACE Error Handling

SingleStoreDB Cloud’s REPLACE behavior allows you to replace the existing rows with the new rows; only those rows that have the same value for a primary key or unique index as the input rows are replaced. In case of an error that arises due to duplicate key value, it first deletes the conflicting row in the destination that has the duplicate key value and then inserts the new row from the source file.

LOAD DATA inserts source file rows into the destination table in the order in which the rows appear in the source file. When REPLACE is specified, source files that contain duplicate unique or primary key values will be handled in the following way:

  • If the destination table’s schema specifies a unique or primary key column, and

  • The source file contains a row with the same primary or unique key value as the destination table, then

  • The row in the destination table that has the same unique or primary key value as the row in the source file will be deleted and a new row from the source file that matches the primary key value will be inserted into the destination table.

Note: If the source file contains multiple rows with the same primary or unique key value as the destination table, then only the last row in the source file with the same primary or unique key value (as the destination table) replaces the existing row in the destination table.

Note: REPLACE cannot be combined with SKIP DUPLICATE KEY ERRORS. The default behavior of SingleStoreDB Cloud is to throw an error for duplicate keys. However, both REPLACE and SKIP DUPLICATE KEY ERRORS does not throw a duplicate key error; REPLACE replaces the old row with the new row, while SKIP DUPLICATE KEY ERRORS discards the new row and retains the old row.

REPLACE Error Handling Example

Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL
  );

A row with a primary key 4 is inserted as follows:

INSERT INTO orders VALUES(4,236,"Bananas",2016-06-23);

The following CSV file is loaded into the table as orders.csv. Note the duplicate primary key value of 4 in line 2:

1,372,Apples,2016-05-09
4,138,Pears,2016-07-14
3,307,Oranges,2016-07-31

Load the data into the table:

LOAD DATA INFILE 'orders.csv'
  REPLACE
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

Line 2 in the source file contained a duplicate primary key 4. The REPLACE error handler deletes the row 4,236,"Bananas",2016-06-23 in the destination table and replaces it with the value 4,138,Pears,2016-07-14 from the source file.

SKIP ... ERRORS Error Handling

SingleStoreDB Cloud’s SKIP ... ERRORS behavior allows you to specify an error scenario that, when encountered, discards an offending row. Three kinds of error scenarios can be skipped:

  • SKIP DUPLICATE KEY ERRORS: Any row in the source data that contains a duplicate unique or primary key will be discarded. If the row contains invalid data other than a duplicate key, an error will be generated. See SKIP DUPLICATE KEY ERRORS below.

  • SKIP CONSTRAINT ERRORS: Inclusive of SKIP DUPLICATE KEY ERRORS. If a row violates a column’s NOT NULL constraint, or the row contains invalid JSON or Geospatial values, the row will be discarded. If the row contains invalid data outside the scope of constraint or invalid value errors, an error will be generated. See SKIP CONSTRAINT ERRORS below.

  • SKIP ALL ERRORS: Inclusive of SKIP DUPLICATE KEY ERRORS and SKIP CONSTRAINT ERRORS. Also includes any parsing errors in the row caused by issues such as an invalid number of fields. See SKIP ALL ERRORS below.

SKIP DUPLICATE KEY ERRORS

When SKIP DUPLICATE KEY ERRORS is specified, source files that contain duplicate unique or primary key values will be handled in the following way:

  • If the destination table’s schema specifies a unique or primary key column, and

  • The source file contains one or more rows with a duplicate key value that already exists in the destination table or exists elsewhere in the source file, then

  • Every duplicate row in the source file will be discarded and will not be inserted into the destination table.

SKIP DUPLICATE KEY ERRORS cannot be combined with REPLACE.

SKIP DUPLICATE KEY ERRORS Example

Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time TIMESTAMP NOT NULL
  );

The following CSV file will loaded be into this table as orders.csv. Note the duplicate primary key value of 2 in line 3:

1,372,Apples,2016-05-09
2,138,Pears,2016-07-14
2,236,Bananas,2016-06-23
3,307,Oranges,2016-07-31

Load the data into the table:

LOAD DATA INFILE 'orders.csv'
  SKIP DUPLICATE KEY ERRORS
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

Note that only 3 rows were inserted even though 4 rows were present in the source file. Line 3 in the source file contained a duplicate primary key, and you can verify that it was not inserted by querying the INFORMATION_SCHEMA.LOAD_DATA_ERRORS table:

SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;
****
+-----------------------+---------------------------+--------------------------------+
| load_data_line_number | load_data_line            | error_message                  |
+-----------------------+---------------------------+--------------------------------+
|                     3 | 2,236,Bananas,2016-06-23  | Duplicate entry for unique key |
+-----------------------+---------------------------+--------------------------------+

SKIP CONSTRAINT ERRORS

SKIP CONSTRAINT ERRORS is inclusive of SKIP DUPLICATE KEY ERRORS if REPLACE is not specified. It also applies to rows that violate a column’s NOT NULL constraint and fields that contain invalid JSON or Geospatial values, and handles the offending rows in the following ways:

NOT NULL Constraint

  • If a column in the destination table specifies a NOT NULL constraint, and

  • The source file contains one or more rows with a null value for the constraint column, then

  • The offending row(s) will be discarded and will not be inserted into the destination table.

Invalid JSON or Geospatial Data

  • If a column in the destination table specifies a JSON, GEOGRAPHYPOINT, or GEOGRAPHY data type, and

  • The source file contains one or more rows with invalid values for fields of these types, then

  • The offending row(s) will be discarded and will not be inserted into the destination table.

SKIP CONSTRAINT ERRORS can also be combined with the REPLACE clause.

SKIP CONSTRAINT ERRORS Example

Create a new table with a JSON column type that also has a NOT NULL constraint:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_properties JSON NOT NULL
  );

The following CSV file will loaded be into this table as orders.csv. Note the malformed JSON in line 2, as well as a null value (N) for JSON in line 4:

1,372,Apples,{"order-date":"2016-05-09"}
2,138,Pears,{"order-date"}
3,236,Bananas,{"order-date":"2016-06-23"}
4,307,Oranges,N

Load the data into the table:

LOAD DATA INFILE 'orders.csv'
  SKIP CONSTRAINT ERRORS
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

Note that only 2 rows were inserted even though 4 rows were present in the source file. Line 2 contained malformed JSON, and Line 4 contained an invalid null value. You can verify that both of these offending rows were not inserted by querying the INFORMATION_SCHEMA.LOAD_DATA_ERRORS table:

SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;
****
+-----------------------+-----------------------------+--------------------------------------------------------------+
| load_data_line_number | load_data_line              | error_message                                                |
+-----------------------+-----------------------------+--------------------------------------------------------------+
|                     2 | 2,138,Pears,{"order-date"}  | Invalid JSON value for column 'order_properties'             |
|                     4 | 4,307,Oranges,N            | NULL supplied to NOT NULL column 'order_properties' at row 4 |
+-----------------------+-----------------------------+--------------------------------------------------------------+

SKIP ALL ERRORS

SKIP ALL ERRORS is inclusive of SKIP DUPLICATE KEY ERRORS and SKIP CONSTRAINT ERRORS in addition to any parsing error. Offending rows are handled in the following way:

  • If one or more rows in the source file cause ... DUPLICATE KEY ... or ... CONSTRAINT ... errors, or

  • If one or more rows in the source file cause parsing errors such as invalid delimiters or an invalid number of fields,

  • The offending row(s) will be discarded and will not be inserted into the destination table.

SKIP ALL ERRORS can also be combined with REPLACE.

SKIP ALL ERRORS Example

Create a new table with a JSON column type that also has a NOT NULL constraint:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_properties JSON NOT NULL
  );

The following CSV file will loaded be into this table as orders.csv. There are three things wrong with this file:

  • Line 2 contains only 3 fields

  • Line 3 has a duplicate primary key

  • Line 4 has a null value for a NOT NULL constraint

1,372,Apples,{"order-date":"2016-05-09"}
2,138,Pears
1,236,Bananas,{"order-date":"2016-06-23"}
4,307,Oranges,N

Load the data into the table:

LOAD DATA INFILE 'orders.csv'
  SKIP ALL ERRORS
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

Only 1 row was written, despite the source file containing 4 rows. Line 2 was dropped because it contained an invalid number of fields, Line 3 was dropped because it contained a duplicate primary key, and line 4 was dropped because it contained a null value for a NOT NULL constraint. You can verify that these offending rows were not inserted by querying the INFORMATION_SCHEMA.LOAD_DATA_ERRORS table:

SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;
****
+-----------------------+--------------------------------------------+--------------------------------------------------------------+
| load_data_line_number | load_data_line                             | error_message                                                |
+-----------------------+--------------------------------------------+--------------------------------------------------------------+
|                     2 | 2,138,Pears                                | Row 2 doesn't contain data for all columns                   |
|                     3 | 1,236,Bananas,{"order-date":"2016-06-23"}. | Duplicate entry for unique key                               |
|                     4 | 4,307,Oranges,N                           | NULL supplied to NOT NULL column 'order_properties' at row 4 |
+-----------------------+--------------------------------------------+--------------------------------------------------------------+

IGNORE Error Handling

SingleStoreDB Cloud’s IGNORE behavior is identical to MySQL’s IGNORE behavior, and exists only to support backwards compatibility with applications written for MySQL. IGNORE either discards malformed rows, discards extra fields, or replaces invalid values with default data type values. In addition, if an inserted row would have produced an error if IGNORE was not specified, it will be converted to a warning instead.

Consequences of Using IGNORE Instead of SKIP ERRORS

Unlike SKIP ... ERRORS which discards offending rows, IGNORE may change the inserted row’s data to ensure that it adheres to the table schema. This behavior can have serious repercussions for the data integrity of the destination table.

In a best case scenario where a malformed row uses the proper delimiters and contains the correct number of fields, the row can be partially salvaged. Any invalid values are updated with default values, and the modified row is written to the destination table. The result is that at least some of the source data was written to the destination table.

However, the worst case scenario can be severe. For example, if a row’s values are separated by an invalid delimiter, each field is updated with meaningless default values and the modified row is written to the destination table. For the sake of the table’s data integrity, it would have been better if the offending row was discarded. But a row with meaningless data was inserted instead.

Due to the potential consequences of using IGNORE, in most cases SKIP ... ERRORS is a better option. To understand IGNORE’s behavior for each error scenario, continue reading the sections below:

  • Duplicate unique or primary key values

  • Values with invalid types according to the destination table’s schema

  • Rows that contain an invalid number of fields

Duplicate Unique or Primary Key Values

When IGNORE is specified, source files that contain duplicate unique or primary key values will be handled in the following way:

  • If the destination table’s schema specifies a unique or primary key column, and

  • The source file contains one or more rows with a duplicate key value that already exists in the destination table or exists elsewhere in the source file, then

  • Every duplicate row in the source file will be discarded (ignored) and will not be inserted into the destination table.

Duplicate Unique or Primary Key Values Example

Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL
  );

The following CSV file will loaded be into this table as orders.csv. Note the duplicate primary key value of 2 in line 3:

1,372,Apples,2016-05-09
2,138,Pears,2016-07-14
2,236,Bananas,2016-06-23
3,307,Oranges,2016-07-31

Load the data into the table:

LOAD DATA INFILE 'orders.csv'
  IGNORE
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

Note that only 3 rows were inserted even though 4 rows were present in the source file. Line 3 in the source file contained a duplicate primary key. You can verify that these offending rows were not inserted by querying the INFORMATION_SCHEMA.LOAD_DATA_ERRORS table:

SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;
****
+-----------------------+---------------------------+--------------------------------+
| load_data_line_number | load_data_line            | error_message                  |
+-----------------------+---------------------------+--------------------------------+
|                     3 | 2,236,Bananas,2016-06-23  | Duplicate entry for unique key |
+-----------------------+---------------------------+--------------------------------+

Line 3 in the source file contained a duplicate primary key and was discarded because line 2 was inserted first.

Values with Invalid Types According to the Destination Table’s Schema

When IGNORE is specified, source files that contain rows with invalid types that violate the destination table’s schema will be handled in the following way:

  • If the source file contains one or more rows with values that do not adhere to the destination table’s schema,

  • Each value of an invalid type in a row will be replaced with the default value of the appropriate type, and

  • The modified row(s) will be inserted into the destination table.

IGNORE behaves in a potentially unexpected way for columns that have a DEFAULT value specified. When an invalid value in the inserted row is replaced with the default value of the column’s type, the column’s DEFAULT value is ignored. Instead, the default value for the column’s data type is used.

Example

Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL
  );

The following CSV file will be loaded be into this table as orders.csv. Line 4 contains a NULL value for order_time, whereas the table schema does not allow NULL values for this field.

1,372,Apples,2016-05-09
2,138,Pears,2016-07-14
3,236,Bananas,2016-06-23
4,307,Oranges,N

Load the data into the table:

LOAD DATA INFILE 'orders.csv'
  IGNORE
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

Note that 4 rows were inserted despite the fact that line 4 in the source file contained a null value for a NOT NULL column. You can verify the error with the fourth row by querying the INFORMATION_SCHEMA.LOAD_DATA_ERRORS table:

SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;
****
+-----------------------+------------------+--------------------------------------------------------+
| load_data_line_number | load_data_line   | error_message                                          |
+-----------------------+------------------+--------------------------------------------------------+
|                     4 | 4,307,Oranges,N | NULL supplied to NOT NULL column 'order_time' at row 4 |
+-----------------------+------------------+--------------------------------------------------------+

To see what was inserted by replacing the invalid DATETIME value with a default value, query the table:

SELECT * FROM orders ORDER BY 1;
****
+----+-------------+------------------+---------------------+
| id | customer_id | item_description | order_time          |
+----+-------------+------------------+---------------------+
|  1 |         372 | Apples           | 2016-05-09 00:00:00 |
|  2 |         138 | Pears            | 2016-07-14 00:00:00 |
|  3 |         236 | Bananas          | 2016-06-23 00:00:00 |
|  4 |         307 | Oranges          | 0000-00-00 00:00:00 |
+----+-------------+------------------+---------------------+

In this example, the invalid null DATETIME value was replaced with its default value: 0000-00-00 00:00:00.

Rows That Contain an Invalid Number of Fields

When IGNORE is specified, source files that contain rows with an invalid number of fields will be handled in one of two ways:

Too Few Fields

  • If the source file contains one or more rows with too few fields according to the destination table’s schema,

  • Each row’s empty field(s) will be updated with default values, and

  • The row will be inserted into the destination table.

Too Many Fields

  • If the source file contains one or more rows with too many fields according to the destination table’s schema,

  • Each extra field in the row(s) will be discarded (ignored), and

  • The row will be inserted into the destination table.

Example

Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL
  );

The following CSV file will loaded be into this table as orders.csv. There are two things wrong with this file:

  • Line 2 contains only 3 fields instead of 4 and does not have a TIMESTAMP:

  • Line 4 contains an extra field, for a total of 5

1,372,Apples,2016-05-09
2,138,Pears
3,236,Bananas,2016-06-23
4,307,Oranges,2016-07-31,Berries

Load the data into the table:

LOAD DATA INFILE 'orders.csv'
  IGNORE
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

Note that 4 rows were inserted despite the invalid number of fields for two of the rows. You can verify the error with the fourth row by querying the INFORMATION_SCHEMA.LOAD_DATA_ERRORS table:

SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;
****
+-----------------------+----------------------------------+---------------------------------------------------------------------------+
| load_data_line_number | load_data_line                   | error_message                                                             |
+-----------------------+----------------------------------+---------------------------------------------------------------------------+
|                     2 | 2,138,Pears                      | Row 2 doesn't contain data for all columns                                |
|                     4 | 4,307,Oranges,2016-07-31,Berries | Row 4 was truncated; it contained more data than there were input columns |
+-----------------------+----------------------------------+---------------------------------------------------------------------------+

Note that there is a warning for the missing value in row 2 and the extra value in row 4. To see how the data was inserted, query the table:

SELECT * FROM orders ORDER BY 1;
****
+----+-------------+------------------+---------------------+
| id | customer_id | item_description | order_time          |
+----+-------------+------------------+---------------------+
|  1 |         372 | Apples           | 2016-05-09 00:00:00 |
|  2 |         138 | Pears            | 0000-00-00 00:00:00 |
|  3 |         236 | Bananas          | 2016-06-23 00:00:00 |
|  4 |         307 | Oranges          | 2016-07-31 00:00:00 |
+----+-------------+------------------+---------------------+

Line 2 did not have a DATETIME value, so the default value for its type was inserted instead. Line 4’s extra value was discarded, and otherwise the row was inserted with the expected data.

Performance Considerations

Shard Keys

Loading data into a table with a shard key requires reading the necessary columns on the aggregator to compute the shard key before sending data to the leaves. For CSV LOAD DATA only, it is recommended that columns included in the shard key appear earlier in input rows, if either the shard key design or the input format is flexible. Order does not significantly affect the performance of Avro or JSON LOAD DATA.

Keyless Sharding

Loading data into a keylessly sharded table (no shard key is declared, or shard() is specified) will result in batches of data loaded into different partitions, in a round-robin fashion. See Shard Keys for more information.

Retrieve loading status

The information_schema.LMV_LOAD_DATA_STATUS table reports information about rows and bytes read by in-progress LOAD DATA queries.

It also reports activity and database names, which you can use to find corresponding rows in workload profiling tables. See Management View Reference for more details.

Important

Result sets will only be returned if LMV_LOAD_DATA_STATUS is queried on the same aggregator as the in-progress LOAD_DATA queries.

information_schema.LMV_LOAD_DATA_STATUS Table Schema

Column Name

Description

ID

The connection ID.

ACTIVITY_NAME

The name of the database activity.

DATABASE_NAME

The name of the database associated with the file being loaded into the workspace.

BYTES_READ

Bytes read from the input file stream.

ROWS_READ

A count of rows read in from the source file (including skipped rows).

SELECT * FROM information_schema.LMV_LOAD_DATA_STATUS;
****
+------+------------------------------------+---------------+------------+-----------+
| ID   | ACTIVITY_NAME                      | DATABASE_NAME | BYTES_READ | ROWS_READ |
+------+------------------------------------+---------------+------------+-----------+
| 2351 | load_data_company_0e8dec6d07d9cba5 | trades        |   94380647 |    700512 |
+------+------------------------------------+---------------+------------+-----------+

I am trying to do a load data infile to a database. The only issue I have is I am getting a Error 1261. I was getting an incorrect datetime value error earlier but i solved that with the code in the load data infile below (set date_time = ). My problem now is it says that I don’t have enough data for all columns. I know that you are supposed to name the columns after the name of the table but I can’t seem to get it to work.

There is one table and it has 15 columns. The first column is the primary key, the other fourteen are regular columns.

Here is the load file statement:

load data infile 'c:/proj/test.csv' into table base (@var1,event,failure,ue,mc,mn,cell,durat,cause,ne,ims,hier,hier3,hier32)  
 set date_time = STR_TO_DATE(@var1, '%Y%m%d %H%i%s') 
;

Additional notes: pk column is called dataId and is an INT
It is auto increment.

Here is the data from the csv file:

2013-03-20 14:55:22,4098,1,21060800,344,930,4,1000,0,11B,344930000000011,4809532081614990000,8226896360947470000,1150444940909480000 

I am leveraging LOAD DATA INFILE to import a massive CSV file (~30 GB). There are some fields with no data and as a result I’m getting the following error code:

Error Code: 1261. Row 1 doesn't contain data for all columns

I would rather not try and edit the CSV file first due to its size. I had tried recreating the table with DEFAULT NULL for the column with the blank fields however I’m still getting the same error. Appreciate any help for a work-around on this.

Here is the create table

#c.execute("CREATE TABLE dns_part2_b (date VARCHAR(15), time VARCHAR(25), ip1 VARCHAR(20), ip2 VARCHAR(20), dns VARCHAR(150), arec VARCHAR(10), wsa VARCHAR(10) DEFAULT NULL)")

And the LOAD DATA command

LOAD DATA INFILE '/Volumes/data/mycsv.csv' INTO TABLE dns_part2_b FIELDS TERMINATED BY ',' LINES TERMINATED BY 'n' IGNORE 1 LINES;

this is an example of an incomplete record:

date,time,ip1,ip2,dns,arec,wsa
2013-07-01,00:00:14.301038+00:00,1.1.1.1,2.2.2.2,domain.com,A

and a complete one

date,time,ip1,ip2,dns,arec,wsa
2013-07-01,00:00:12.335246+00:00,2.2.2.2,1.1.1.1,WPAD,NB,(WS)

thanks!

Понравилась статья? Поделить с друзьями:
  • Error 126 не найден указанный модуль mta next rp
  • Error 126 не найден указанный модуль mta loader dll
  • Error 126 не найден указанный модуль kontakt
  • Error 126 дота 2
  • Error 126 windows 10 x64