Postgres error: missing data for column

I’m trying to import a .txt file into PostgreSQL. The txt file has 6 columns:

Laboratory_Name Laboratory_ID   Facility    ZIP_Code     City   State

And 213 rows.

I’m trying to use copy to put the contents of this file into a table called doe2 in PostgreSQL using this command:

copy DOE2 FROM '/users/nathangroom/desktop/DOE_inventory5.txt' (DELIMITER(' '))

It gives me this error:

missing data for column "facility"

I’ve looked all around for what to do when encountering this error and nothing has helped. Has anyone else encountered this?

asked Nov 5, 2014 at 0:33 by nathanmgroom

Three possible causes:

  1. One or more lines of your file has only 4 or fewer space characters (your delimiter), so at least one of the 6 columns receives no data.

  2. One or more space characters have been escaped (inadvertently), maybe with a backslash at the end of an unquoted value. For the (default) text format you are using, the manual explains:

Backslash characters (\) can be used in the COPY data to quote data
characters that might otherwise be taken as row or column delimiters.

Output from COPY TO or pg_dump would not exhibit any of these faults when reading from a table with matching layout. But maybe your file has been edited or is from a different, faulty source?

  3. You are not using the file you think you are using. The \copy meta-command of the psql command-line interface is a wrapper for COPY and reads files local to the client. If your file lives on the server, use the SQL command COPY instead. The sketch below illustrates the difference.
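
As a minimal sketch of the difference, reusing the table and path from the question (the parenthesized option list assumes PostgreSQL 9.0 or later):

-- Server-side COPY: the file must be readable by the server process
COPY doe2 FROM '/users/nathangroom/desktop/DOE_inventory5.txt' WITH (DELIMITER ' ');

-- Client-side \copy: psql reads the file on the machine where psql runs
\copy doe2 FROM '/users/nathangroom/desktop/DOE_inventory5.txt' WITH (DELIMITER ' ')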

answered Nov 5, 2014 at 1:30 by Erwin Brandstetter

Check the file carefully. In my case, a blank line at the end of the file caused the ERROR: missing data for column. I deleted it, and the import worked fine.

Making line endings visible might reveal something interesting:

cat -e $filename
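
On a large file, it can be quicker to list only the suspect lines. A small sketch using standard grep options (the filename variable mirrors the command above):

# print blank or whitespace-only lines with their line numbers
grep -n '^[[:space:]]*$' "$filename"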

answered Oct 13, 2021 at 16:17 by Nagev

I had a similar error. Check the version of pg_dump that was used to export the data and the version of the database you want to load it into, and make sure they are the same. Also, if the COPY-format export fails, export the data as INSERT statements instead.
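
A quick way to compare the two versions, plus the INSERT-style export the answer alludes to; the database and file names here are placeholders:

# version of the dump tool
pg_dump --version

# version of the target server
psql -d mydb -c 'SELECT version();'

# export as INSERT statements instead of COPY format
pg_dump --inserts mydb > mydb_inserts.sql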

answered Jan 2, 2022 at 21:20 by aniefiok

Contents

  • 1 History
  • 2 Overview
  • 3 COPY options
  • 4 Example
    • 4.1 error logging off
    • 4.2 skip bad rows
    • 4.3 turn error logging on (default logs in error_logging_table)
    • 4.4 Redirect to another table with a specific label
    • 4.5 Limit to 2 bad rows:

History

Error logging in COPY was a proposed feature developed by Aster Data against the PostgreSQL 9.0 code base. It was submitted and reviewed but not accepted into the core product for that or any later version.

Overview

The purpose of error logging in COPY is to prevent the backend from erroring out if a malformed tuple is encountered during a COPY operation. Bad tuples can either be skipped or logged into an error logging table.

The format of the error logging table is as follows:

 CREATE TABLE error_logging_table(
   tupletimestamp TIMESTAMP WITH TIME ZONE,
   targettable    VARCHAR,
   dmltype        CHAR(1),
   errmessage     VARCHAR,
   sqlerrcode     CHAR(5),
   label          VARCHAR,
   key            BIGINT,
   rawdata        BYTEA
 );

The COPY command returns the number of successfully copied tuples only.

COPY options

Error logging is set by adding options to the COPY command. Here is the list of the available options:

  • ERROR_LOGGING: enables error handling for COPY commands (when set to true). Default: true.
  • ERROR_LOGGING_SKIP_BAD_ROWS: enables skipping malformed tuples encountered in COPY commands (when set to true). Default: true.
  • ERROR_LOGGING_MAX_ERRORS: maximum number of bad rows to log before stopping the COPY operation (0 means unlimited). Default: 0.
  • ERROR_LOGGING_SCHEMA_NAME: schema name of the table where malformed tuples are inserted by the error logging module. Default: 'public'.
  • ERROR_LOGGING_TABLE_NAME: relation name where malformed tuples are inserted by the error logging module; the table is automatically created if it does not exist. Default: 'error_table'.
  • ERROR_LOGGING_LABEL: optional label used to identify malformed tuples. Default: the COPY command text.
  • ERROR_LOGGING_KEY: optional key to identify malformed tuples. Default: the index of the tuple in the COPY stream.

Bad tuples can be rejected for a number of reasons (extra or missing column, constraint violation, …). The error table, whose definition is given above, tries to capture as much context as possible about the error; if the table does not exist, it is created automatically.

tupletimestamp stores the time at which the error occurred. targettable records the table into which the row was being inserted when the error occurred, and dmltype the kind of statement that failed ('C' for COPY). The exact error message and SQL error code are recorded in errmessage and sqlerrcode, respectively. label and key hold the identifiers described in the options above, and the original data of the row can be found in rawdata.
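
Once bad tuples have been logged, a query along these lines recovers the offending input. This is a sketch against the table definition above; the convert_from() call assumes the raw data is UTF-8 text:

 SELECT key,
        sqlerrcode,
        errmessage,
        convert_from(rawdata, 'UTF8') AS original_line
 FROM   error_logging_table
 ORDER  BY tupletimestamp;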

Example

 CREATE TEMP TABLE foo (a bigint, b text);

— input_file.txt —

 1	one
 2	
 3	three	111
 four    4
 5	five

— end of input_file.txt —

error logging off

 COPY foo FROM 'input_file.txt';
 ERROR:  missing data for column "b"
 CONTEXT:  COPY foo, line 2: "2"

skip bad rows

 --skip bad rows
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING, ERROR_LOGGING_SKIP_BAD_ROWS);
 SELECT * from foo;
  a |  b   
 ---+------
  1 | one
  5 | five
 (2 rows)

turn error logging on (default logs in error_logging_table)

 --turn error logging on (default logs in error_logging_table)
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING);
 SELECT * from foo;
  a |  b   
 ---+------
  1 | one
  5 | five
 (2 rows)
 SELECT * FROM error_logging_table;
  key |           tupletimestamp            |              label              |  targettable  | dmltype |                errmessage                | sqlerrcode |         rawdata          
 -----+-------------------------------------+---------------------------------+---------------+---------+------------------------------------------+------------+--------------------------
    2 | Thu Sep 10 07:09:17.869521 2009 PDT | COPY foo FROM 'input_file.txt'; | pg_temp_2.foo | C       | missing data for column "b"              | 22P04      | x32
    3 | Thu Sep 10 07:09:17.86953 2009 PDT  | COPY foo FROM 'input_file.txt'; | pg_temp_2.foo | C       | extra data after last expected column    | 22P04      | x3309746872656509313131
    4 | Thu Sep 10 07:09:17.869538 2009 PDT | COPY foo FROM 'input_file.txt'; | pg_temp_2.foo | C       | invalid input syntax for integer: "four" | 22P02      | x666f75720934
 (3 rows)

Redirect to another table with a specific label

 -- Redirect to another table with a specific label
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING, ERROR_LOGGING_SCHEMA_NAME 'error', ERROR_LOGGING_TABLE_NAME 'table1', ERROR_LOGGING_LABEL 'batch1');
 SELECT * FROM error.table1;
  key |           tupletimestamp            | label  |  targettable  | dmltype |                errmessage                | sqlerrcode |         rawdata          
 -----+-------------------------------------+--------+---------------+---------+------------------------------------------+------------+--------------------------
    2 | Thu Sep 10 07:09:17.869521 2009 PDT | batch1 | pg_temp_2.foo | C       | missing data for column "b"              | 22P04      | x32
    3 | Thu Sep 10 07:09:17.86953 2009 PDT  | batch1 | pg_temp_2.foo | C       | extra data after last expected column    | 22P04      | x3309746872656509313131
    4 | Thu Sep 10 07:09:17.869538 2009 PDT | batch1 | pg_temp_2.foo | C       | invalid input syntax for integer: "four" | 22P02      | x666f75720934
 (3 rows)

Limit to 2 bad rows:

 -- Limit to 2 bad rows:  
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING, ERROR_LOGGING_MAX_ERRORS 2);
 ERROR:  invalid input syntax for integer: "four"
 CONTEXT:  COPY foo, line 4, column a: "four"
 SELECT count(*) from error_logging_table;
  count 
 -------
      0
 (1 row)

The count is 0 because the failing COPY aborts its own transaction, which also rolls back the rows that were logged before the third error stopped the load.

I’m working on items for migrating my database class from Oracle to PostgreSQL. I ran into an interesting limitation when I tried using the COPY command to read an external CSV file.

I had prepared the system by creating a new directory hierarchy owned by the postgres user on top of a /u01/app mount point. I set the ownership of the directories and files with the following command from the /u01/app mount point:

chown -R postgres:postgres postgres

After running the following command:

COPY transaction_upload
FROM '/u01/app/upload/postgres/transaction_upload_postgres.csv' DELIMITERS ',' CSV;

The command raised the following error:

ERROR:  must be superuser or a member of the pg_read_server_files role to COPY from a file
HINT:  Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.

There are two ways to fix the problem: make the student user a superuser, or grant the pg_read_server_files role to the student user. Making student a superuser isn’t really practical, so I connected as the postgres superuser and granted the pg_read_server_files role to the student user. Note that roles are cluster-level objects, so the grant is not limited to the videodb database.

As the postgres user, type the following command to grant the pg_read_server_files role to the student user:

GRANT pg_read_server_files TO student;
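
To confirm the grant took effect, a membership check with the standard pg_has_role() function works; 'member' asks whether student can use the role’s privileges:

SELECT pg_has_role('student', 'pg_read_server_files', 'member');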

After granting the role to the student user, I created a small test case. The test table definition is:

CREATE TABLE test
( id          INTEGER
, first_name  VARCHAR(20)
, last_name   VARCHAR(20));

I created a test.csv file in the /u01/app/upload/postgres directory with the following contents:

1,Simon,Bolivar
2,Peter,Davenport
3,Michael,Swan

The test.csv file requires the following permissions and ownerships:

-rw-r--r--. 1 postgres postgres 49 Nov 13 10:56 test.csv

The permissions are user read-write, group read, and others read. The file should be owned by postgres and by the primary group of the postgres user, which should also be postgres.

You can then connect to psql as the student user with the database set to videodb and run the following copy command:

COPY test
FROM '/u01/app/upload/postgres/test.csv' DELIMITERS ',' CSV;
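
Given the three-row test.csv above, querying the table should return:

SELECT * FROM test;
 id | first_name | last_name 
----+------------+-----------
  1 | Simon      | Bolivar
  2 | Peter      | Davenport
  3 | Michael    | Swan
(3 rows)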

If you put a comma at the end of each line, as you might for MySQL, the trailing comma raises the following error:

ERROR:  extra data after last expected column

If you forget a delimiting comma somewhere on a line, the copy command raises the following error:

ERROR:  missing data for column "last_name"
CONTEXT:  COPY test, line 3: "3,Michael Swan"

The error names the column that received no value, and the context gives the line number and displays the raw line.

You should take careful note that the copy command appends to the target table. If you run it a second time, you insert a duplicate set of values.
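
If the goal is a clean reload rather than an append, truncate first; a minimal sketch using the same test table:

TRUNCATE TABLE test;
COPY test
FROM '/u01/app/upload/postgres/test.csv' DELIMITERS ',' CSV;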

After experimenting, it’s time to fix my student instance. The transaction_upload_mysql.csv file has two critical problems that need to be fixed. They are:

  1. A comma terminates each line, which would raise an extra data after last expected column error.
  2. A comma terminates each line followed by some indefinite amount of whitespace, which would also raise an extra data after last expected column error.

Since I have students with little expertise in Unix or Linux commands, I must provide a single command that they can use to convert the problem file into a clean one. However, they should work on copies of the files, so they don’t break the equivalent functionality for the MySQL solution space.

They should copy two files as the root user from the mysql directory to the postgres directory, as follows:

cp /u01/app/mysql/upload/transaction_upload_mysql.csv /u01/app/postgres/upload/transaction_upload_postgres.csv
cp /u01/app/mysql/upload/transaction_upload2_mysql.csv /u01/app/postgres/upload/transaction_upload2_postgres.csv

As the root user in the /u01/app/postgres/upload directory, run the following command:

sed -e 's/,[[:space:]]*$//' transaction_upload_postgres.csv > y; mv y transaction_upload_postgres.csv
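
Echoing the cat -e trick from earlier, a quick spot check on the last few lines confirms that no trailing commas or whitespace survived:

tail -3 transaction_upload_postgres.csv | cat -e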

Check the file permissions and ownership with the ll (long list) command. If the file isn’t listed like this:

-rw-r--r--. 1 postgres postgres 49 Nov 13 10:56 transaction_upload_postgres.csv

Then, they should be able to change it as the root user with these commands:

chown postgres:postgres transaction_upload_postgres.csv
chmod 644 transaction_upload_postgres.csv

Lastly, they should connect to psql as the student user, using the videodb database, and run the following command:

COPY transaction_upload
FROM '/u01/app/postgres/upload/transaction_upload_postgres.csv' DELIMITERS ',' CSV;

Querying the import table with:

SELECT COUNT(*) FROM transaction_upload;

should return:

 count 
-------
 11520
(1 row)

As always, I hope this helps those looking for some explanation and example on the copy feature of PostgreSQL.

The PostgreSQL server COPY command is very simple and just aborts on a single failure. You might think that it could do far better (I know I do), but there’s a reason that the PostgreSQL codebase is so compact compared with MySQL’s (by a factor of roughly 10:1).

However, there is the (very) nice pgloader programme, which compensates for this at the price of running a separate utility.

Of course, if you’re good at the PL/pgSQL language (internal to the server), then maybe you could explore that route, but why reinvent the wheel? Python and Perl also have in-server PostgreSQL options. Then of course, there are all the languages under the sun external to the server.

From the manual:

PgLoader Reference Manual

pgloader loads data from various sources into PostgreSQL. It can
transform the data it reads on the fly and submit raw SQL before and
after the loading. It uses the COPY PostgreSQL protocol to stream the
data into the server, and manages errors by filling a pair of
reject.dat and reject.log files.

which appears to be right up your alley?

The way it works is: (sorry for the long quote)

TL;DR — pgloader loads a batch (configurable) at a time. On failure, it «marks the spot», uses COPY again up until that point, stops, then puts the bad record into a file and continues from bad-record + 1.

Batches And Retry Behaviour

To load data to PostgreSQL, pgloader uses the COPY streaming protocol.
While this is the faster way to load data, COPY has an important
drawback: as soon as PostgreSQL emits an error with any bit of data
sent to it, whatever the problem is, the whole data set is rejected by
PostgreSQL.

To work around that, pgloader cuts the data into batches of 25000 rows
each, so that when a problem occurs it’s only impacting that many rows
of data. Each batch is kept in memory while the COPY streaming
happens, in order to be able to handle errors should some happen.

When PostgreSQL rejects the whole batch, pgloader logs the error
message then isolates the bad row(s) from the accepted ones by
retrying the batched rows in smaller batches. To do that, pgloader
parses the CONTEXT error message from the failed COPY, as the message
contains the line number where the error was found in the batch, as in
the following example:

CONTEXT: COPY errors, line 3, column b: «2006-13-11»

Using that information, pgloader will reload all rows in the batch
before the erroneous one, log the erroneous one as rejected, then try
loading the remainder of the batch in a single attempt, which may or
may not contain other erroneous data.

At the end of a load containing rejected rows, you will find two files
in the root-dir location, under a directory named the same as the
target database of your setup. The filenames are the target table, and
their extensions are .dat for the rejected data and .log for the file
containing the full PostgreSQL client side logs about the rejected
data.
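
For concreteness, here is a sketch of a pgloader command file for the transaction_upload case above. It follows the LOAD CSV shape from the pgloader documentation; the connection string, credentials, and options are assumptions to adapt:

LOAD CSV
     FROM '/u01/app/postgres/upload/transaction_upload_postgres.csv'
     INTO postgresql://student@localhost/videodb?tablename=transaction_upload
     WITH truncate,
          fields terminated by ',';

Saved as transaction.load, it runs as pgloader transaction.load, with any rejected rows landing in the reject .dat and .log files described above.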
