Parsing the LEIE Database (DBF file)

Had a client today that needed to parse the USA LEIE Database (List of Excluded Individuals and Entities). This is the database the government uses to excluded vendors from taking part in any federally funded healthcare program. You can download a copy of your own here: https://oig.hhs.gov/exclusions/exclusions_list.asp. Surprise, the file is in .DBF format!

What is a DBF? It's a "DataBase Format" file. Basically, a self contained database table. I'm told MS access, MS excel, and a few other programs can open these allowing you to potentially resave them as a csv. But... why? It's a file, with some characters in it. Let's ignore the fact that the last time you or I saw a .DBF was that one time we happen to be working with someone still programming in Visual FoxPro. Use the definitions below to treat it like a giant string, and parse right through.

So, here's the interesting details you'll need to know.

  1. The file contains businesses and individuals, with columns somewhat interspersed.
  2. There are no linebreaks in the actual data.
  3. The columns themselves are a set width. But those columns do not always appear in the data row.

So, knowing the above it's fairly simple to produce a string parse that just uses the set length of fields to fill data arrays so you can process it. Here's a sample function:

  1. function _parse_leie_dbf(){
  2.   $fh = fopen('public://import/UPDATED.DBF', 'r');
  3.  
  4.   // Pull in the first 1000 characters, and look for the last column heading, adding 1
  5.   // To get the byte location of the first record
  6.   $data = fread($fh, 1000);
  7.   $rows_start = strpos($data, 'WVRSTATEC') + strlen('WVRSTATEC') + 1;
  8.   fseek($fh, $rows_start);
  9.   $i = 0;
  10.   // The lines in the file are a set length, without delimiters
  11.   while($line = fgets($fh, 238)){
  12.     $i++;
  13.     if($i > 15 ){
  14.       break;
  15.     }
  16.    
  17.     // The fields in this file are a set length
  18.     $data = array(
  19.       'first_name' => substr($line, 0, 20),
  20.       'last_name' => substr($line, 20, 15),
  21.       'middle_name' => substr($line, 35, 15),
  22.       'company' => substr($line, 50, 30),
  23.       'general' => substr($line, 80, 20),
  24.       'specialty' => substr($line, 100, 20),
  25.       'UPIN' => substr($line, 120, 6),
  26.       'NPI' => substr($line, 127, 10),
  27.       'DOB' => substr($line, 137, 8),
  28.       'thoroughfare' => substr($line, 144, 30),
  29.       'locality' => substr($line, 174, 20),
  30.       'administrative_area' => substr($line, 195, 2),
  31.       'postal_code' => substr($line, 197, 5),
  32.       'exclusion_type' => substr($line, 202, 9),
  33.       'exclusion_date' => substr($line, 211, 8),
  34.       'rein_date' => substr($line, 219, 8),
  35.       'waver_date' => substr($line, 227, 8),
  36.       'waver_state' => substr($line, 235, 2),
  37.     );
  38.     print '<pre>'; print_r($data); print '</pre>';
  39.   }
  40.  
  41.   fclose($fh);
  42. }
Tags: 

Comments

Something seems to have changed slightly in the format. Simplified my code and updated it.

Add new comment

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
By submitting this form, you accept the Mollom privacy policy.