Friday, February 24, 2012

Almost there with bulk insert

I have BULK INSERT T-SQL statements that work for all of my basic data
tables except for one.

Here is the problem statement (with obvious replacement of irrelevant
path info):

BULK INSERT igbconts_tmp FROM 'C:\\my_code_path\\IGBCONTS.txt'
WITH (KEEPNULLS,
FORMATFILE = 'C:\\my_data_path\\contacts.fmt');

And here is the output from this statement:

Msg 8152, Level 16, State 14, Line 3
String or binary data would be truncated.
The statement has been terminated.

(0 row(s) affected)

This tells me precisely nothing about where the real problem lies. I
am reluctant to post either the table definition or the format file
since they are large (the table, and thus the data file, has 104
fields. However, the first few lines in the format file are:

8.0
105
1 SQLCHAR 0 0 "\"" 0 dummy ""
2 SQLCHAR 0 0 "\",\"" 1 contact_id ""
3 SQLCHAR 0 0 "\",\"" 2 full_name ""
4 SQLCHAR 0 0 "\",\"" 3 last_name ""

And here are the last couple lines:

104 SQLCHAR 0 0 "\",\"" 103 user_defined_field15 ""
105 SQLCHAR 0 0 "\"\r\n" 104 user_defined_field16 ""

The table was created using the string length information given to us
by the data provider, and those fields that are not strings consist of
a few datetime values and a moderate number of floating point numbers.

The message suggests to me that one of the fields is too small for what
was actually found in the corresponding column in the data file for at
least one record. But in addition to there being over 100 columns,
there are several thousand records in the data file!

How do I determine precisely where the problem lies?

Thanks,

TedTed (r.ted.byers@.rogers.com) writes:

Quote:

Originally Posted by

I have BULK INSERT T-SQL statements that work for all of my basic data
tables except for one.
>
Here is the problem statement (with obvious replacement of irrelevant
path info):
>
BULK INSERT igbconts_tmp FROM 'C:\\my_code_path\\IGBCONTS.txt'
WITH (KEEPNULLS,
FORMATFILE = 'C:\\my_data_path\\contacts.fmt');
>
And here is the output from this statement:
>
Msg 8152, Level 16, State 14, Line 3
String or binary data would be truncated.
The statement has been terminated.
>
(0 row(s) affected)
>
This tells me precisely nothing about where the real problem lies.


Well, the real problem is one of two:

1) The data file contains occasional fields that are longer that
the receiving columns in the database.

2) There is a mismatch between the data file and the format file
somewhere, so the fields get out of sync with the data in the
table.

You can decide which of these cases you have by running the command
SET ANSI_WARNINGS OFF before you do the bulk insert. This legacy
setting turns of the check for truncation. Then do a SELECT on the
table, and if the data looks OK, then it's the first alternative. And
if it's a mess, it's the second.

For the first situation, if you want to track down where the errors
are, you have to turn to BCP and use the -e argument to specify an
error file. Yes, you can specify an error file with BULK INSERT too,
but BULK INSERT and BCP behaves differently in this case. BULK
INSERT just aborts, and writes nothing to the error file. BCP
imports the rest of the rows, and writes a message to the error file
so you can see which are the problematic records in the file.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Thanks Erland

Turning off the warnings helped solve the problem. The data looked
fine, except for the last column. It had over half a dozen fields
embedded in it (but no useful data). It turns out that the data file
has half a dozen more fields than the supplier documented. I modified
the format file slightly to put the extra fields into an extra column
that is ignored when data is put into the table (column number 0), and
now all works flawlessly. I'll be talking with my colleagues and our
supplier, and this issue will certainly be addressed, but for now,
ignoring the extra fields is OK since we have no idea what they're
supposed to contain in hte rare instance they have data and the
supplier didn't bother to document them.

Thanks again,

Ted

No comments:

Post a Comment