Reading stuctured files into SQL Server Part 1

Tue, Sep 14, 2010 4-minute read

From time to time we have all probably been tasked with getting a structured file into SQL Server.

It could be a comma separated file, it could be some other delimeter. It does not really matter. What matter is that there are several ways of getting that data into SQL Server, where some are fast and efficient and others slow and sometimes even impossible if you do it wrong.

The obvious way of importing a structured file into SQL server is to either use BCP or to use SQL servers built in BULK INSERT.

i.e. 

BULK
INSERT 
YourTableName
FROM 'c:\commaseparatedfilename.csv'
WITH
(
FIELDTERMINATOR ',',
ROWTERMINATOR '\n'
)
GO

 

BCP and the built in method is fine when you want to do a one time import, and if you like to stretch it a bit you could even do periodic importing using a maintenance task that every day at a certain time imports a specific file from a location into sql server.

But what if you need to do some processing of the file? Then you are pretty much stuck with writing a program that reads the file and writes the modified records into sql server.

That might be cumbersome task, and what if your file is several gigabytes in size. Then you cannot simply read in the entire file, since your program might run out of memory. So what you should do is simply if possible read one record from the file at a time and process the record and pass it onto sql server.

I have created a few classes to help with that, which I will present in this blog post and the ones to come.

The tasks you need to do to get that file into SQL Server is probably something like:

 

  1. Read the records out from file, one at a time, as efficiently as possible using as little memory as possible
  2. Parse each record into its different columns resulting in a strongly typed object that can be pased onto SQL Server easily.
  3. Optionally parse each record and its values before its being passed onto SQL Server for storing.

 

I will present a nice solution to task #1 in this first blog post, and will present a solution to #2 in the next blog post.

For task #1 I have created a nice little class that I call FileRecordReader, which basically have a single method called ReadNextRecord.

 

The method ReadNextRecord will read the next record and return that as a string and advance its internal positions to the location of the next record in the file.

 

/// <summary>
/// Reads the next record from the stream.
/// </summary>
/// <returns>The next record from the stream or null if no more records exist.</returns>
public string ReadNextRecord()
{

 

The FileRecordReader class takes a few arguments in its constructor that will help it read the file and understand where a record starts and stops.

 

/// <summary>
/// Initializes a new instance of the <see cref="FileRecordReader"/> class.
/// </summary>
/// <param name="fileStream">The file stream.</param>
/// <param name="recordSeparator">The record separator.</param>
/// <param name="fileEncoding">The file encoding.</param>
public FileRecordReader(Stream fileStream, char recordSeparator, Encoding fileEncoding)
{

 

First argument is the stream where the reader should read its data from, which in real life usages should be a FileStream instance.

Second argument is a char that will be used to separate the records from each other - normal use cases would be a newline character \n, but this class supports any arbitrary character that you would like to use, in the case your records contains linebreaks that you would like to retain in the imported data.

Last parameter is the encoding of the file. This is also very important since a UTF-8 or -16 encoded file that gets read using your standard encoding in windows will not look pretty since they will be parsed incorrectly.

If you look at the source code attached you might see that it has a similar way of working as the built in StreamReader class and the method ReadLine - but if you have a different record separator you cannot use StreamReader but have to parse the file yourself.

I have attached the source code to the FileRecordReader class and also a simple test that tests that the class is working.

FileRecordReader.cs (5.33 kb)

FileRecordReaderTests.cs (2.86 kb)

 

Stay tuned for the next post where I will describe how you can use the above FileRecordReader to present a nice interface that makes it easy to get those records into SQL Server.