errata
Define an errata in table format (CSV) and then apply it to an arbitrary source. Inspired by RFC Errata, it lets you keep your own errata in a transparent way.
Tested in MRI 1.8.7+, MRI 1.9.2+, and JRuby 1.6.7+. Thread safe.
Inspiration
There’s a process for reporting errata on RFCs:
- RFC Errata
- Status and Type Descriptions for RFC Errata
- How to report errata
Example
Every errata has a table structure based on the IETF RFC Editor’s “How to Report Errata”.
| date | name | email | type | section | action | x | y | condition | notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | meta | Intended use | | http://example.com/original-data-with-errors.xls | | A hypothetical document that uses non-ISO country names | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/ANTIGUA & BARBUDA/` | ANTIGUA AND BARBUDA | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/BOLIVIA/` | BOLIVIA, PLURINATIONAL STATE OF | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/BOSNIA & HERZEGOVINA/` | BOSNIA AND HERZEGOVINA | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/BRITISH VIRGIN ISLANDS/` | VIRGIN ISLANDS, BRITISH | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/COTE D'IVOIRE/` | CÔTE D'IVOIRE | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/DEM\. PEOPLE'S REP\. OF KOREA/` | KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/DEM\. REP\. OF THE CONGO/` | CONGO, THE DEMOCRATIC REPUBLIC OF THE | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/HONG KONG SAR/` | HONG KONG | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/IRAN \(ISLAMIC REPUBLIC OF\)/` | IRAN, ISLAMIC REPUBLIC OF | | |

Which would be saved as a CSV:
date,name,email,type,section,action,x,y,condition,notes
2011-03-22,Ian Hough,ian@brighterplanet.com,meta,Intended use,,http://example.com/original-data-with-errors.xls,,A hypothetical document that uses non-ISO country names
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/ANTIGUA & BARBUDA/,ANTIGUA AND BARBUDA,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOLIVIA/,"BOLIVIA, PLURINATIONAL STATE OF",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOSNIA & HERZEGOVINA/,BOSNIA AND HERZEGOVINA,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BRITISH VIRGIN ISLANDS/,"VIRGIN ISLANDS, BRITISH",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/COTE D'IVOIRE/,CÔTE D'IVOIRE,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/DEM\. PEOPLE'S REP\. OF KOREA/,"KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/DEM\. REP\. OF THE CONGO/,"CONGO, THE DEMOCRATIC REPUBLIC OF THE",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/HONG KONG SAR/,HONG KONG,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/IRAN \(ISLAMIC REPUBLIC OF\)/,"IRAN, ISLAMIC REPUBLIC OF",,
And then used:
require 'errata'
require 'remote_table'

errata = Errata.new(:url => 'http://example.com/errata.csv')
original = RemoteTable.new(:url => 'http://example.com/original-data-with-errors.xls')
original.each do |row|
  errata.correct! row # destructively correct each row
end
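To make the idea of "applying" an errata concrete, here is a minimal sketch in plain Ruby that reads two of the rows above and applies their replace actions to a single row. It is not the library's implementation: the helper names `pattern_from` and `apply_replacements` are hypothetical, and it assumes the x column always holds a regexp written between slashes and that rows are hashes keyed by column name.

```ruby
require 'csv'

# Two corrections in the same column layout as the example errata.
ERRATA_TEXT = <<-EOS
date,name,email,type,section,action,x,y,condition,notes
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/ANTIGUA & BARBUDA/,ANTIGUA AND BARBUDA,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOLIVIA/,"BOLIVIA, PLURINATIONAL STATE OF",,
EOS

# Hypothetical helper: strip the surrounding slashes from the x column
# and compile what is left as a Regexp.
def pattern_from(x)
  Regexp.new(x[1..-2])
end

# Hypothetical helper: apply every technical "replace" erratum to one row.
# The row is a Hash keyed by column name; the corrected row is returned.
def apply_replacements(row, errata_rows)
  errata_rows.each do |erratum|
    next unless erratum['type'] == 'technical' && erratum['action'] == 'replace'
    value = row[erratum['section']]
    next if value.nil?
    row[erratum['section']] = value.gsub(pattern_from(erratum['x']), erratum['y'])
  end
  row
end

errata_rows = CSV.parse(ERRATA_TEXT, :headers => true).map { |r| r.to_hash }
row = { 'Country Name' => 'BOLIVIA' }
apply_replacements(row, errata_rows)
puts row['Country Name'] # prints "BOLIVIA, PLURINATIONAL STATE OF"
```

Because every correction lives in the CSV rather than in code, anyone can review who changed what and when by reading the errata file itself.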
UTF-8
Assumes all input strings are UTF-8. Otherwise there can be problems with Ruby 1.9 and Regexp::FIXEDENCODING. Specifically, ASCII-8BIT regexps might be applied to UTF-8 strings (or vice-versa), resulting in Encoding::CompatibilityError.
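The mismatch described above can be reproduced with a few lines of plain Ruby (1.9 or later). This is only an illustrative sketch: the helper `as_utf8` is hypothetical and not part of this library, and it assumes the incoming bytes are already valid UTF-8 and merely carry the wrong encoding label.

```ruby
# A regexp containing a non-ASCII character gets a fixed encoding (UTF-8 here).
pattern = Regexp.new("C\u00D4TE")

# The same bytes as "CÔTE D'IVOIRE", but labelled as binary, as can happen
# when data is read without an encoding being declared.
binary = "C\u00D4TE D'IVOIRE".dup.force_encoding('ASCII-8BIT')

begin
  pattern =~ binary
  outcome = :matched
rescue Encoding::CompatibilityError
  outcome = :incompatible
end

# Hypothetical helper: relabel a copy of the string as UTF-8, refusing
# input whose bytes are not valid UTF-8.
def as_utf8(str)
  utf8 = str.dup.force_encoding('UTF-8')
  raise ArgumentError, 'not valid UTF-8' unless utf8.valid_encoding?
  utf8
end

fixed = as_utf8(binary)
puts outcome                    # what happened with the binary string
puts((pattern =~ fixed).inspect) # match position once both sides are UTF-8
```

Normalizing strings in this way before handing rows to the corrections is one possible way to stay within the UTF-8 assumption.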
More advanced usage
The earth library has dozens of real-life examples showing errata in action.
Real-world usage
We use errata for data science at Brighter Planet and in production at:
- Brighter Planet’s reference data web service
- Brighter Planet’s impact estimate web service
The killer combination:
- active_record_inline_schema - define table structure
- remote_table - download data and parse it
- errata (this library!) - apply corrections in a transparent way
- data_miner - import data idempotently
Authors
- Seamus Abshere seamus@abshere.net
- Andy Rossmeissl andy@rossmeissl.net
- Ian Hough ijhough@gmail.com
Copyright
Copyright (c) 2012 Brighter Planet. See LICENSE for details.