Background



Data Mining Objectives

Preparing a highly curated engineering data set for an AIM program typically involves the following data mining objectives:

  • Establishing a data schema that meets the needs of the engineering and risk analysis
  • Locating and managing high-value records across silos of physical and electronic repositories in a cost-efficient manner, where the records are usually poorly indexed and not searchable
  • Locating complementary data in current and legacy structured data sources that can serve as validation and/or lookup tables for correlating data
  • Identifying gaps in record coverage and establishing a remediation plan to identify or generate missing records
  • Applying business rules (e.g., converting fractions to decimals), low-level engineering tasks (e.g., looking up applicable ASME codes) and data normalization
  • Producing a database containing the data elements collected from the available documents and databases
  • Populating software applications including risk-based inspection (RBI), plant management systems (PMS) and geographical information systems (GIS)
  • Performing trending and anomaly analysis of data such as vessel and pipeline inspection points and corrosion
  • Achieving ROI on investments in software and services to perform risk assessment
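
The business-rule objective above mentions converting fractions to decimals. A minimal Python sketch of such a rule follows. The function name `to_decimal` and the accepted input formats (a mixed number such as "10 3/4", a plain fraction, or a decimal string, optionally followed by an inch mark) are assumptions made for illustration; real source records may need additional handling.

```python
from fractions import Fraction


def to_decimal(text: str) -> float:
    """Convert a dimension such as '10 3/4', '10-3/4', '5/8' or '0.188' to a float.

    Assumes positive values only: a hyphen is treated as the separator
    between the whole number and the fraction, not as a minus sign.
    """
    # Drop surrounding whitespace and a trailing inch mark, if present.
    cleaned = text.strip().rstrip('"').strip()
    # Split a mixed number into its whole and fractional parts.
    parts = cleaned.replace("-", " ").split()
    if not parts:
        raise ValueError("empty dimension string")
    # Fraction parses integers, 'a/b' fractions and decimal strings.
    return float(sum(Fraction(part) for part in parts))


print(to_decimal("10 3/4"))   # 10.75
print(to_decimal('6 5/8"'))   # 6.625
```

Using `Fraction` rather than hand-written division keeps the parsing exact until the final conversion to a float, and malformed input raises a `ValueError` instead of silently producing a wrong value.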

The Data Gathering Processes

Our observations of the difficulties associated with current data gathering processes include the following:

  • Poor knowledge of available records needed to complete the engineering and risk assessment
  • Silos of on-site physical records and corporate repositories of images which are poorly organized and indexed
  • Poor-quality record images (PDF and TIFF) that are not searchable
  • Large volumes of components (vessels, piping, valves, meter stations, pipeline) that require engineering review
  • Engineering resources diverted to manual tasks such as locating records and performing data entry
  • Poorly defined quality control processes to ensure data integrity

The table below illustrates how much of the input data (white columns) needed to perform the risk evaluation (blue columns) is typically missing. The missing values (red cells) may exist in historical records, but operators typically lack effective methods for locating the relevant records.

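| Outside Diameter (in) | MAOP (psi) | Wall Thickness (in) | Pipe Grade (ksi) | Installation Year | Impact Radius (ft) | Class Area | Potential Impact# (Class) | Potential Impact# (HCA) | Probability Ignition (%) |
|-----------------------|------------|---------------------|------------------|-------------------|--------------------|------------|---------------------------|-------------------------|--------------------------|
|                       |            |                     |                  |                   | 0                  |            | 0                         | 0                       | 0                        |
| 10.753                |            | 0.188               | 42               | 1900              | 0                  |            | 0                         | 0                       | 0                        |
|                       |            |                     |                  |                   | 0                  |            | 0                         | 0                       | 0                        |
| 10.75                 | 627        | 0.25                | 42               | 2003              | 185.6907186        | Class 1    | 0.070871516               | 0                       | 7.106754105              |
| 10.75                 | 627        | 0.25                | 42               | 2003              | 185.6907186        | Class 1    | 0.070871516               | 0                       | 7.106754105              |
|                       |            |                     |                  |                   | 0                  |            | 0                         | 0                       | 0                        |
|                       |            |                     |                  |                   | 0                  |            | 0                         | 0                       | 0                        |
| 6.625                 |            | 0.28                | 42               | 2003              | 0                  |            | 0                         | 0                       | 0                        |
| 10.75                 | 603        | 0.188               | 42               | 1963              | 182.132763         | Class 3    | 25.72946467               | 0                       | 6.902683826              |
| 10.75                 | 603        | 0.188               | 42               | 1963              | 182.132763         | Class 3    | 25.72946467               | 0                       | 6.902683826              |
|                       |            |                     |                  |                   | 0                  |            | 0                         | 0                       | 0                        |
| 10.75                 |            | 0.188               | 42               | 1969              | 0                  |            | 0                         | 0                       | 0                        |
| 6.625                 | 573        | 0.188               | 42               | 1963              | 109.4239388        | Class 1    | 0.024614154               | 0                       | 3.597086627              |
| 6.625                 | 573        | 0.188               | 42               | 1963              | 109.4239388        | Class 1    | 0.024614154               | 0                       | 3.597086627              |
| 10.75                 | 603        | 0.188               | 42               | 1963              | 182.132763         | Class 3    | 25.72946467               | 0                       | 6.902683826              |
| 10.75                 | 603        | 0.188               | 42               | 1963              | 182.132763         | Class 3    | 25.72946467               | 0                       | 6.902683826              |
| 10.75                 | 603        | 0.188               | 42               | 1963              | 182.132763         | Class 3    | 25.72946467               | 0                       | 6.902683826              |
| 10.75                 | 603        | 0.188               | 42               | 1963              | 182.132763         | Class 3    | 25.72946467               | 0                       | 6.902683826              |
| 10.75                 | 603        | 0.188               | 42               | 1963              | 182.132763         | Class 3    | 25.72946467               | 0                       | 6.902683826              |
|                       |            |                     |                  |                   | 0                  |            | 0                         | 0                       | 0                        |
| 10.75                 |            | 0.188               | 42               | 1963              | 0                  | Class 3    | 0                         | 0                       | 0                        |
| 10.75                 |            | 0.188               | 42               | 1963              | 0                  | Class 3    | 0                         | 0                       | 0                        |
| 10.75                 |            | 0.188               | 42               | 1963              | 0                  | Class 3    | 0                         | 0                       | 0                        |
| 10.75                 |            | 0.188               | 42               | 1963              | 0                  | Class 3    | 0                         | 0                       | 0                        |
| 10.75                 |            | 0.188               | 42               | 1963              | 0                  | Class 3    | 0                         | 0                       | 0                        |
| 10.75                 |            | 0.188               | 42               | 1963              | 0                  | Class 3    | 0                         | 0                       | 0                        |
| 10.75                 |            | 0.188               | 42               | 1963              | 0                  | Class 3    | 0                         | 0                       | 0                        |
| 10.75                 |            | 0.188               | 42               | 1963              | 0                  | Class 3    | 0                         | 0                       | 0                        |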
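
To make the dependency between missing inputs and the calculated risk values concrete, the following Python sketch flags the missing inputs for a record and computes an impact radius only when the inputs it needs are present. It rests on several assumptions that the source does not confirm: the field names are hypothetical, the list of required inputs is a guess at which columns are inputs, and the radius uses the natural gas potential impact radius formula from 49 CFR 192.903, r = 0.69 * sqrt(p * d^2), with p in psi and d in inches.

```python
import math
from typing import Optional

# Hypothetical field names; assumed to be the input columns of the table.
REQUIRED_INPUTS = (
    "outside_diameter_in",
    "maop_psi",
    "wall_thickness_in",
    "pipe_grade_ksi",
    "installation_year",
    "class_area",
)


def missing_inputs(record: dict) -> list:
    """Return the names of required inputs that are absent or None."""
    return [name for name in REQUIRED_INPUTS if record.get(name) is None]


def impact_radius_ft(record: dict) -> Optional[float]:
    """Potential impact radius in feet, or None if an input is missing.

    Assumes the natural gas formula r = 0.69 * sqrt(p * d^2).
    """
    d = record.get("outside_diameter_in")
    p = record.get("maop_psi")
    if d is None or p is None:
        return None  # cannot be evaluated until the gap is remediated
    return 0.69 * math.sqrt(p * d ** 2)


complete = {"outside_diameter_in": 6.625, "maop_psi": 573,
            "wall_thickness_in": 0.188, "pipe_grade_ksi": 42,
            "installation_year": 1963, "class_area": "Class 1"}
partial = {"outside_diameter_in": 10.75, "maop_psi": None,
           "wall_thickness_in": 0.188, "pipe_grade_ksi": 42,
           "installation_year": 1963, "class_area": "Class 3"}

print(missing_inputs(partial))    # ['maop_psi']
print(impact_radius_ft(partial))  # None
print(round(impact_radius_ft(complete), 1))  # 109.4
```

Returning `None` instead of 0 for a record with missing inputs is a deliberate choice in this sketch: it keeps "not yet calculable" distinct from a genuinely zero result, which the 0 values in the table cannot do.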