10/28/2013 05:19 EDT | Updated 12/28/2013 05:12 EST

Pipeline safety incidents: How we organized the data

When CBC News first saw the database of incidents for pipelines from across the country, the number of blanks was shocking.

Many columns — such as the event type, substance or released volume — had not been filled out, even though the information was often contained in the description field for the incident.

CBC News decided to clean up the data obtained from the federal regulator, keeping in mind that the goal was to give Canadians the clearest possible picture of safety-related issues on pipelines.

A large part of the work undertaken by CBC News was sifting through descriptions of each event to locate clear information that could help us fill in some of the blanks.

We also tracked down links to investigation reports for the largest incidents to give people the ability to find out what exactly happened.

The National Energy Board data set, obtained by CBC News via an access-to-information request, still has many holes and cannot be considered the official record of events. 

Only the federal regulator, the National Energy Board, and the companies involved know whether the database contains the final, most accurate details.

However, we believe our interactive website gives Canadians the ability to explore, for the first time, pipeline safety incidents reported in their area and across the country.

It is our hope the NEB will publish its own website of pipeline incidents, as recommended by the Senate energy committee this summer.

Here’s a look at the methodology we followed while tackling this data set.

Updating and adding information

One of the key things missing from the data set is the resolution of each incident. Was it investigated? What caused it? Was the company fined?

Ideally, these questions would be answered. But not all of that information is publicly available.

CBC tracked down the investigation reports into some of the biggest spills and ruptures and linked to those documents on our website.

This involved several dozen large-scale cases, mostly incidents probed by the Transportation Safety Board (TSB), an independent agency that looks into pipeline occurrences that pose transport issues.

We also used a document featuring basic details about past pipeline ruptures.

A dozen of the ruptures listed there were relevant to the time period covered by our site, so we used their links to investigations by either the NEB or TSB.

The document also noted a thirteenth rupture that appeared to have been investigated by the NEB, but no report on it had been published. When we requested a copy from the NEB, we were told it could only be obtained through an access-to-information request.

We also searched the Transportation Safety Board reports, adding links to any reports that matched the incidents in the data set.

Finally, we sifted through a published document that featured basic information about several dozen liquid leaks in recent years.

When we compared these publicly reported liquid spills with the data set obtained from the NEB through access-to-information requests, more holes became apparent.

In a few cases, dates and spill amounts differed between the data set and the published accounts. In one instance, the dates were five days apart.

Where the amount of product released was inconsistent, we updated the data set with the published figures.
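For readers who want to try a similar comparison, here is a minimal Python sketch. It is not CBC's actual procedure: it assumes each record in both sources carries a shared incident ID (hypothetical; the article does not say how records were matched), takes the published amount where the two sources disagree, and lists date discrepancies in days so a person can check them rather than overwriting them automatically.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Incident:
    incident_id: str           # hypothetical key shared by both sources
    event_date: date
    volume_l: Optional[float]  # litres released; None means the field is blank


def reconcile(
    neb: List[Incident], published: List[Incident]
) -> Tuple[List[Incident], List[Tuple[str, int]]]:
    """Prefer published volumes; report date gaps (in days) for manual review."""
    published_by_id = {p.incident_id: p for p in published}
    updated: List[Incident] = []
    date_gaps: List[Tuple[str, int]] = []
    for rec in neb:
        pub = published_by_id.get(rec.incident_id)
        if pub is not None:
            # Amount disagreement: take the published figure.
            if pub.volume_l is not None and pub.volume_l != rec.volume_l:
                rec = replace(rec, volume_l=pub.volume_l)
            # Date disagreement: record the gap instead of silently overwriting.
            gap = abs((pub.event_date - rec.event_date).days)
            if gap:
                date_gaps.append((rec.incident_id, gap))
        updated.append(rec)
    return updated, date_gaps
```

If a published account dates an incident five days later than the NEB record, `reconcile` returns a gap of 5 for that incident ID.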

Filling in blanks

Because so many fields in the database were blank, CBC also tried to fill in whatever information it could.

We read through each incident summary and used the specific details it contained to fill in blank fields, such as the type of event or the amount spilled.

CBC used the NEB's own criteria for what constitutes a reportable event and made updates or changes only when the information in the summary was clear.

Most changes were made to fields that were otherwise blank or marked as Not Available. In a few cases, we changed the amount of product released when the volume field showed zero but the summary gave a specific amount.

In many instances, the summary reported that an unknown amount of product had been released. In these cases, the volume field might be blank or zero. We left it that way.
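CBC did this by reading each summary by hand. As a rough illustration of the same rule, the Python sketch below swaps in a simple regular expression: it keeps a non-zero recorded volume, fills a blank (None) or zero field only when the summary states a specific amount, and otherwise leaves the field as it was. The pattern, the litre-based field and the function name are assumptions for illustration; real summaries would need a more careful reading than one pattern can give.

```python
import re
from typing import Optional

# Hypothetical pattern: a number (commas allowed) followed by a volume unit,
# e.g. "1.5 m3", "1,200 litres" or "200 L".
VOLUME_PATTERN = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)\s*(m3|litres|liters|l)\b", re.IGNORECASE
)
LITRES_PER_M3 = 1000.0


def fill_volume(volume_l: Optional[float], summary: str) -> Optional[float]:
    """Return a volume in litres, filling a blank or zero field from the summary."""
    if volume_l:  # a non-zero recorded value is kept as-is
        return volume_l
    match = VOLUME_PATTERN.search(summary)
    if match is None:
        # No specific amount (e.g. "an unknown amount"): leave blank or zero unchanged.
        return volume_l
    amount = float(match.group(1).replace(",", ""))
    unit = match.group(2).lower()
    return amount * LITRES_PER_M3 if unit == "m3" else amount
```

Only the first amount mentioned in a summary is used, which is one reason a human check is still needed.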

If an event type was left blank or listed as Not Available (as was the case for nearly all incidents in the early 2000s), CBC changed it to the most appropriate event type or selected Other Event.

The NEB may cite only one event type for a single incident, but sometimes a case involves multiple events. We added event types if the summary indicated, for example, that an unintended fire had resulted in a release of product and that a worker had been seriously injured.

About 10 incidents had duplicate entries. This appears to be how the NEB distinguished spills in which two separate products were released. We combined these entries, noting both substances and the total amount released.
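A merge like this can be sketched in Python. The sketch assumes duplicate rows share an incident ID and that each row is a dictionary with hypothetical keys `incident_id`, `substance` and `volume_l`; it lists each substance once and sums the known volumes. How CBC actually identified and combined the rows is not described beyond the paragraph above.

```python
from collections import defaultdict
from typing import Dict, List


def merge_duplicates(rows: List[dict]) -> List[dict]:
    """Combine rows sharing an incident ID into one row.

    Each row is a dict with hypothetical keys 'incident_id', 'substance'
    and 'volume_l' (litres, or None when blank).
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row["incident_id"]].append(row)

    merged = []
    for incident_id, group in groups.items():
        # Keep each substance once, in the order it first appears.
        substances = list(dict.fromkeys(r["substance"] for r in group))
        # Sum the known volumes; unknown (None) volumes are skipped.
        known = [r["volume_l"] for r in group if r["volume_l"] is not None]
        merged.append({
            "incident_id": incident_id,
            "substance": "; ".join(substances),
            "volume_l": sum(known) if known else None,
        })
    return merged
```

Skipping unknown volumes means a combined total can understate the release when one of the duplicate rows had no amount recorded, so such cases would be worth flagging.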

The NEB data included an event type labelled “>100 m3 release of liquid.” We changed this to the more generic Release of Product label. Users can see the size of spills by viewing the map called Amount.

CBC sought the advice of the NEB on a number of issues, including whether “produced water” should be listed as a Release of Product or an Other Event. The data set was inconsistent, using both labels.

Produced water is the salty water trapped in reservoir rock that is emitted during oil or gas production and can contain minor amounts of chemicals from the process, requiring it to be treated before release.

According to the NEB, neither produced water nor contaminated water counts as a Release of Product because they are “not considered pipeline products.” As such, they are listed as Other Events in the regulator’s data collection. We followed its lead.
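The two relabelling rules in this section can be expressed as a small Python function, shown here as a sketch. The exact label and substance strings in the NEB data may differ from the ones assumed below (for instance, “m³” instead of “m3”), and the sketch handles a single event type per incident for simplicity.

```python
RELEASE_OF_PRODUCT = "Release of Product"
OTHER_EVENT = "Other Event"
# Per the NEB guidance described above, these are not pipeline products.
# The exact spellings in the data set are an assumption.
NOT_PIPELINE_PRODUCTS = {"produced water", "contaminated water"}


def normalize_event_type(event_type: str, substance: str) -> str:
    """Apply the two relabelling rules described in this section."""
    # Rule 1: fold the size-specific label into the generic one.
    if event_type.strip().lower() == ">100 m3 release of liquid":
        event_type = RELEASE_OF_PRODUCT
    # Rule 2: produced or contaminated water is an Other Event, not a release.
    if (event_type == RELEASE_OF_PRODUCT
            and substance.strip().lower() in NOT_PIPELINE_PRODUCTS):
        return OTHER_EVENT
    return event_type
```

Applying the first rule before the second means a large spill of produced water still ends up as an Other Event.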

With the help of the NEB, we also converted the released amounts for 10 incidents that had been recorded as a weight rather than a volume. All mentions of tons or tonnes were converted to the Canadian standard metric tonne, then translated into litres using the density of the product.
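The conversion itself is simple arithmetic: volume in litres = 1,000 × mass in kilograms ÷ density in kilograms per cubic metre. A minimal Python sketch follows. Which kind of “ton” an individual record meant is an assumption that would need checking case by case, and the density for each product would have to come from the NEB or product data; the 800 kg/m³ in the example is illustrative only.

```python
# Kilograms per unit of mass. Which "ton" a record meant is an assumption
# that would need checking case by case.
KG_PER_UNIT = {
    "tonne": 1000.0,           # metric tonne
    "short ton": 907.18474,    # US short ton (2,000 lb)
    "long ton": 1016.0469088,  # imperial long ton (2,240 lb)
}
LITRES_PER_M3 = 1000.0


def mass_to_litres(amount: float, unit: str, density_kg_per_m3: float) -> float:
    """Convert a released mass to a volume in litres: V = m / density."""
    if density_kg_per_m3 <= 0:
        raise ValueError("density must be positive")
    kg = amount * KG_PER_UNIT[unit]
    return kg / density_kg_per_m3 * LITRES_PER_M3
```

For example, `mass_to_litres(2, "tonne", 800)` returns 2500.0: two tonnes of a product with an assumed density of 800 kg/m³ is 2.5 cubic metres, or 2,500 litres.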

All of these changes are meant to make the data more usable for the public.

You can review the changes yourself in our data file. Its columns include both the original data and the final version used on the site.

If you have any pipeline-related stories, please email us at