Fragmented Thought

Parsing the LEIE Database (DBF file)



Lance Gliser

Heads up! This content is more than six months old. Take some time to verify everything still works as expected.

Had a client today that needed to parse the USA LEIE Database (List of Excluded Individuals and Entities). This is the database the government uses to excluded vendors from taking part in any federally funded healthcare program. You can download a copy of your own. Surprise, the file is in .DBF format!

What is a DBF? It's a "DataBase Format" file. Basically, a self-contained database table. I'm told MS access, MS excel, and a few other programs can open these allowing you to potentially resave them as a csv. But... why? It's a file, with some characters in it. Let's ignore the fact that the last time you or I saw a .DBF was that one time we happen to be working with someone still programming in Visual FoxPro. Use the definitions below to treat it like a giant string, and parse right through.

So, here's the interesting details you'll need to know.

  • The file contains businesses and individuals, with columns somewhat interspersed.
  • There are <em>no linebreaks</em> in the actual data.
  • The columns themselves are a set width. But those columns do not always appear in the data row.

So, knowing the above it's fairly simple to produce a string parse that just uses the set length of fields to fill data arrays so you can process it. Here's a sample function:

function _parse_leie_dbf(){ $fh = fopen('public://import/UPDATED.DBF', 'r'); // Pull in the first 1000 characters, and look for the last column heading, adding 1 // To get the byte location of the first record $data = fread($fh, 1000); $rows_start = strpos($data, 'WVRSTATEC') + strlen('WVRSTATEC') + 1; fseek($fh, $rows_start); $i = 0; // The lines in the file are a set length, without delimiters while($line = fgets($fh, 238)){ $i++; if($i > 15 ){ break; } // The fields in this file are a set length $data = array( 'first_name' => substr($line, 0, 20), 'last_name' => substr($line, 20, 15), 'middle_name' => substr($line, 35, 15), 'company' => substr($line, 50, 30), 'general' => substr($line, 80, 20), 'specialty' => substr($line, 100, 20), 'UPIN' => substr($line, 120, 6), 'NPI' => substr($line, 127, 10), 'DOB' => substr($line, 137, 8), 'thoroughfare' => substr($line, 144, 30), 'locality' => substr($line, 174, 20), 'administrative_area' => substr($line, 195, 2), 'postal_code' => substr($line, 197, 5), 'exclusion_type' => substr($line, 202, 9), 'exclusion_date' => substr($line, 211, 8), 'rein_date' => substr($line, 219, 8), 'waver_date' => substr($line, 227, 8), 'waver_state' => substr($line, 235, 2), ); print '<pre>'; print_r($data); print '</pre>'; } fclose($fh); }