errata

Version 1.1.1 (last stable release 13 years ago)
Complexity score: Low
Open issues: N/A
Dependent projects: 4
Weekly downloads (global): 180

License

  • MIT
    • Attribution: yes
    • Linking: permissive
    • Distribution: permissive
    • Modification: permissive
    • Patent grant: no
    • Private use: yes
    • Sublicensing: permissive
    • Trademark grant: no


Readme

errata

Define errata in table format (CSV) and then apply them to an arbitrary source. Inspired by RFC Errata, errata lets you keep your own errata in a transparent way.

Tested in MRI 1.8.7+, MRI 1.9.2+, and JRuby 1.6.7+. Thread safe.

Inspiration

There’s an established process for reporting errata on RFCs:

  • RFC Errata
  • Status and Type Descriptions for RFC Errata
  • How to report errata

Example

Every errata file has a table structure based on the IETF RFC Editor’s “How to Report Errata”.

| date | name | email | type | section | action | x | y | condition | notes |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | meta | Intended use | | http://example.com/original-data-with-errors.xls | | | A hypothetical document that uses non-ISO country names |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /ANTIGUA & BARBUDA/ | ANTIGUA AND BARBUDA | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /BOLIVIA/ | BOLIVIA, PLURINATIONAL STATE OF | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /BOSNIA & HERZEGOVINA/ | BOSNIA AND HERZEGOVINA | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /BRITISH VIRGIN ISLANDS/ | VIRGIN ISLANDS, BRITISH | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /COTE D'IVOIRE/ | CÔTE D'IVOIRE | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /DEM\. PEOPLE'S REP\. OF KOREA/ | KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /DEM\. REP\. OF THE CONGO/ | CONGO, THE DEMOCRATIC REPUBLIC OF THE | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /HONG KONG SAR/ | HONG KONG | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /IRAN \(ISLAMIC REPUBLIC OF\)/ | IRAN, ISLAMIC REPUBLIC OF | | |

Which would be saved as a CSV:

date,name,email,type,section,action,x,y,condition,notes
2011-03-22,Ian Hough,ian@brighterplanet.com,meta,Intended use,,http://example.com/original-data-with-errors.xls,,,A hypothetical document that uses non-ISO country names
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/ANTIGUA & BARBUDA/,ANTIGUA AND BARBUDA,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOLIVIA/,"BOLIVIA, PLURINATIONAL STATE OF",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOSNIA & HERZEGOVINA/,BOSNIA AND HERZEGOVINA,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BRITISH VIRGIN ISLANDS/,"VIRGIN ISLANDS, BRITISH",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/COTE D'IVOIRE/,CÔTE D'IVOIRE,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/DEM\. PEOPLE'S REP\. OF KOREA/,"KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/DEM\. REP\. OF THE CONGO/,"CONGO, THE DEMOCRATIC REPUBLIC OF THE",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/HONG KONG SAR/,HONG KONG,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/IRAN \(ISLAMIC REPUBLIC OF\)/,"IRAN, ISLAMIC REPUBLIC OF",,
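To make the column layout concrete, here is a minimal sketch of reading such a file with Ruby's standard `csv` library. It is not how the errata gem parses its files: `parse_errata`, the rule hash layout, and the convention that a slash-wrapped `x` value is a regular expression (while anything else is matched literally) are assumptions made for illustration. Rows without a `replace` action, such as the `meta` row, are skipped.

```ruby
require 'csv'

# Hypothetical helper, not part of the errata gem: turn errata CSV text into
# plain rule hashes. Only rows whose action is "replace" become rules, so the
# "meta" row (which has no action) is skipped.
def parse_errata(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object([]) do |row, rules|
    next unless row['action'] == 'replace'

    x = row['x'].to_s
    # Assumption: a value wrapped in slashes, such as /BOLIVIA/, is a regular
    # expression; any other value is matched literally.
    matcher =
      if x.length > 1 && x.start_with?('/') && x.end_with?('/')
        Regexp.new(x[1..-2])
      else
        Regexp.new(Regexp.escape(x))
      end
    rules << { section: row['section'], matcher: matcher, replacement: row['y'].to_s }
  end
end

# Three rows adapted from the example above; the single-quoted heredoc keeps
# the backslashes in the Iran pattern intact.
ERRATA_CSV = <<~'ERRATA'
  date,name,email,type,section,action,x,y,condition,notes
  2011-03-22,Ian Hough,ian@brighterplanet.com,meta,Intended use,,http://example.com/original-data-with-errors.xls,,,A hypothetical document that uses non-ISO country names
  2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOLIVIA/,"BOLIVIA, PLURINATIONAL STATE OF",,
  2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/IRAN \(ISLAMIC REPUBLIC OF\)/,"IRAN, ISLAMIC REPUBLIC OF",,
ERRATA

rules = parse_errata(ERRATA_CSV)
rules.each { |rule| puts "#{rule[:section]}: #{rule[:matcher].inspect} -> #{rule[:replacement]}" }
```

Keeping each rule as a small hash (column, matcher, replacement) mirrors the `section`, `x` and `y` columns one-to-one, which keeps the mapping from spreadsheet row to correction easy to audit.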

It can then be used like this:

require 'errata'
require 'remote_table'

errata = Errata.new(:url => 'http://example.com/errata.csv')
original = RemoteTable.new(:url => 'http://example.com/original-data-with-errors.xls')
original.each do |row|
  errata.correct! row # destructively correct each row
end
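For readers curious what "destructively correct each row" means, here is a minimal sketch of the idea in plain Ruby. It is not the errata gem's implementation: `apply_errata!` and the rule hash layout are hypothetical, and the sketch assumes that a `replace` erratum overwrites the whole cell in the `section` column whenever `x` matches.

```ruby
# Hypothetical rules in the shape an errata CSV row might be parsed into
# (not the errata gem's internal representation).
RULES = [
  { section: 'Country Name', matcher: /BOLIVIA/, replacement: 'BOLIVIA, PLURINATIONAL STATE OF' },
  { section: 'Country Name', matcher: /HONG KONG SAR/, replacement: 'HONG KONG' }
].freeze

# Hypothetical stand-in for a row-correcting method. It mutates the Hash it is
# given ("destructive" correction) and also returns it for convenience.
# Assumption: a "replace" erratum swaps the whole cell for the y value
# whenever the x pattern matches.
def apply_errata!(row, rules = RULES)
  rules.each do |rule|
    value = row[rule[:section]]
    next unless value.is_a?(String) && rule[:matcher].match?(value)

    row[rule[:section]] = rule[:replacement]
  end
  row
end

rows = [{ 'Country Name' => 'BOLIVIA' }, { 'Country Name' => 'HONG KONG SAR' }, { 'Country Name' => 'FRANCE' }]
rows.each { |row| apply_errata!(row) }
rows.each { |row| puts row['Country Name'] }
```

Because the cell is replaced outright, running the rules a second time leaves already-corrected rows unchanged. A substring-based variant (for example with `String#gsub!`) would need anchored patterns to keep that property, since `/BOLIVIA/` also matches the corrected value.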

UTF-8

Errata assumes that all input strings are UTF-8. Otherwise there can be problems with Ruby 1.9 and Regexp::FIXEDENCODING: specifically, ASCII-8BIT regexps might be applied to UTF-8 strings (or vice versa), resulting in Encoding::CompatibilityError.
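The failure described above can be reproduced in a few lines on a current Ruby. This sketch builds a fixed-encoding ASCII-8BIT regexp from the bytes of "CÔTE D'IVOIRE", shows that matching it against a UTF-8 string raises Encoding::CompatibilityError, and then rebuilds the pattern as UTF-8 with a hypothetical `utf8_regexp` helper, which is not part of errata. The helper assumes the pattern's bytes really are UTF-8 and raises if they are not.

```ruby
# The UTF-8 bytes of "CÔTE D'IVOIRE", tagged as binary (ASCII-8BIT), as can
# happen when data is read in binary mode.
BINARY_SOURCE = "C\xC3\x94TE D'IVOIRE".b
# FIXEDENCODING pins the regexp to ASCII-8BIT.
BINARY_RE = Regexp.new(BINARY_SOURCE, Regexp::FIXEDENCODING)

begin
  BINARY_RE =~ "CÔTE D'IVOIRE" # a UTF-8 string containing a non-ASCII character
rescue Encoding::CompatibilityError => e
  warn "encoding mismatch: #{e.message}"
end

# Hypothetical workaround: reinterpret the pattern's bytes as UTF-8 and rebuild
# the regexp, keeping only the i/x/m flags.
def utf8_regexp(re)
  source = re.source.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, 'pattern bytes are not valid UTF-8' unless source.valid_encoding?

  Regexp.new(source, re.options & (Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE))
end

puts utf8_regexp(BINARY_RE) =~ "CÔTE D'IVOIRE"
```

Converting the pattern rather than the data keeps the input rows untouched; the simpler alternative, consistent with the assumption above, is to make sure both the errata file and the source data are read as UTF-8 in the first place.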

More advanced usage

The earth library has dozens of real-life examples showing errata in action:

| Model | Reference | Errata file |
| Country | data_miner.rb | wri_errata.csv |
| Aircraft | data_miner.rb | faa_errata.csv |
| Airports | data_miner.rb | openflights_errata.csv |
| Automobile model variants | data_miner.rb | feg_errata.csv |

Real-world usage

We use errata for data science at Brighter Planet and in production in:

  • Brighter Planet’s reference data web service
  • Brighter Planet’s impact estimate web service

The killer combination:

  1. active_record_inline_schema - define table structure
  2. remote_table - download data and parse it
  3. errata (this library!) - apply corrections in a transparent way
  4. data_miner - import data idempotently
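The four-step combination above can be illustrated with a standard-library-only sketch. None of the four gems is called here: a Struct stands in for the table definition, `CSV.parse` for downloading and parsing, a hypothetical `correct_row!` for applying errata, and a Hash keyed by ISO code for an idempotent import, so running the import twice leaves the same records. The sample rows and all method names are illustrative assumptions.

```ruby
require 'csv'

# 1. Table structure (stand-in for active_record_inline_schema).
Country = Struct.new(:iso_code, :name)

# 2. Parse raw data (stand-in for remote_table, which would also download it).
def parse_rows(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

# 3. Apply corrections (stand-in for errata); hypothetical pattern => replacement pairs.
NAME_CORRECTIONS = {
  /\AHONG KONG SAR\z/ => 'HONG KONG',
  /\ABOLIVIA\z/ => 'BOLIVIA, PLURINATIONAL STATE OF'
}.freeze

def correct_row!(row)
  NAME_CORRECTIONS.each do |pattern, replacement|
    row['Country Name'] = replacement if pattern.match?(row['Country Name'].to_s)
  end
  row
end

# 4. Import idempotently (stand-in for data_miner): upsert by primary key.
def import!(table, csv_text)
  parse_rows(csv_text).each do |row|
    correct_row!(row)
    table[row['ISO Code']] = Country.new(row['ISO Code'], row['Country Name'])
  end
  table
end

RAW = <<~'DATA'
  ISO Code,Country Name
  HK,HONG KONG SAR
  BO,BOLIVIA
DATA

countries = {}
2.times { import!(countries, RAW) } # the second run overwrites, never duplicates
countries.each_value { |c| puts "#{c.iso_code}: #{c.name}" }
```

Keeping each step separate is what makes the corrections transparent: the raw data stays as published, and every change to it lives in one reviewable table.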

Authors

  • Seamus Abshere seamus@abshere.net
  • Andy Rossmeissl andy@rossmeissl.net
  • Ian Hough ijhough@gmail.com

Copyright

Copyright (c) 2012 Brighter Planet. See LICENSE for details.


CVE issues (active): 0
Follows SemVer: Yes
GitHub stars: 21
Dependencies: 3 total, 0 outdated, 0 deprecated


gem install errata

16 Releases
