Data collection

For the purposes of the following discussion, the data under consideration include those found on store receipts and product packages. The notion of pubwan, of course, applies to all potentially tabular data that are not already in the machine-readable part of the public domain.

manual data entry
The early stages of pubwan development will probably not be possible without manual data entry. This entails participants volunteering time as well as information. Manually entering the information content of store receipts, store shelf tags, product packages and perhaps the occasional catalog (legally questionable?) is bound to be tedious. Software must be devised that addresses this tedium directly and elegantly. Because the source documents are non-standardized, seemingly by intent, input forms should be configurable by the end user on the fly.
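One way to picture an end-user-configurable input form is as a plain list of field definitions that can be edited at run time. The following is only a minimal sketch under that assumption; the names `Field`, `parse_entry` and `receipt_form`, and the sample store and item, are hypothetical and not part of any existing pubwan software.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable


@dataclass
class Field:
    """One box on the input form (hypothetical structure)."""
    name: str
    convert: Callable[[str], object] = str  # how to turn keyed text into a value
    required: bool = True


def parse_entry(fields, raw):
    """Convert raw keyed-in strings into typed values, field by field."""
    record = {}
    for field in fields:
        text = raw.get(field.name, "").strip()
        if not text:
            if field.required:
                raise ValueError(f"missing required field: {field.name}")
            continue  # optional field left blank
        record[field.name] = field.convert(text)
    return record


# A layout a volunteer might define for one store's receipts.
receipt_form = [
    Field("store"),
    Field("item"),
    Field("price", Decimal),
    Field("loyalty_discount", Decimal, required=False),
]

entry = parse_entry(
    receipt_form,
    {"store": "Example Mart", "item": "Oat milk 1L", "price": "2.49"},
)

# Reconfiguring "on the fly" is just editing the list.
receipt_form.append(Field("unit_size", required=False))
```

Because the form is data rather than code, a volunteer facing an unfamiliar receipt layout could add, drop or reorder fields without waiting for a software release.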

XML tagging is another tool for making source data machine-readable and, in some cases, tabular. Barcode wiki is offered as an example of how this might be approached. An apropos technology for the pubwan movement to work on would be some sort of XML-to-email-to-(my?)SQL gateway.
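The XML-to-SQL end of such a gateway might look roughly like the sketch below. It makes several assumptions that should be read as placeholders: the element and attribute names (`observation`, `barcode`, `store`, `date`, `price`) are invented for illustration, the sample values are made up, SQLite stands in for MySQL so that the example is self-contained, and the email transport step is left out entirely.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical tagging scheme; no such schema is defined by pubwan.
SAMPLE = """
<observations>
  <observation barcode="036000291452" store="Example Mart" date="2008-01-15">
    <price currency="USD">2.49</price>
  </observation>
  <observation barcode="012345678905" store="Example Mart" date="2008-01-15">
    <price currency="USD">0.99</price>
  </observation>
</observations>
"""


def load_observations(xml_text, conn):
    """Flatten tagged observations into rows of one SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observation ("
        "barcode TEXT, store TEXT, date TEXT, currency TEXT, price TEXT)"
    )
    root = ET.fromstring(xml_text.strip())
    rows = []
    for obs in root.iter("observation"):
        price = obs.find("price")
        rows.append((
            obs.get("barcode"),
            obs.get("store"),
            obs.get("date"),
            price.get("currency"),
            price.text.strip(),  # kept as text to avoid float rounding
        ))
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real server
count = load_observations(SAMPLE, conn)
```

Once the rows are in a table, ordinary SQL queries (cheapest store for a given barcode, price history of one item) become possible, which is the sense in which tagging leads to tabularity.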

automated data collection
Pubwan will need a large data set in order to be useful to people, which is to say, for the “network effects” to kick in. Populating a large data set will take a large number of contributors, and attracting a large community of volunteers will require that contributing be easy.

semi-automated data collection
The most valuable interfaces are the barcode reader, the GPS receiver and the humble clock (included of course in the GPS system). If one can automatically space-time-stamp instances of barcodes, the only manual data entry is price, which is conveniently amenable to the numeric keypad. What is needed is a single portable instrument which contains a barcode reader, GPS receiver and numeric keypad. Ideally both the hardware and software should be open source. A nice (and eventually necessary) addition is an RFID reader. RFID may be the most potent weapon against early-stage pubwan, and of course against information symmetry in general.
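The record such an instrument would produce can be sketched as follows. This is an illustration only: the names `PriceObservation` and `record_observation` are hypothetical, the scanner and GPS readings are passed in as ordinary arguments rather than read from real hardware, and the barcode is assumed to be a 12-digit UPC-A code, whose standard check digit lets the software reject a misread before the user keys in a price.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


def upc_a_is_valid(code: str) -> bool:
    """Check the length and check digit of a 12-digit UPC-A barcode."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    # Digits in odd positions (1st, 3rd, ... 11th) are weighted by 3.
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    return (10 - total % 10) % 10 == digits[11]


@dataclass(frozen=True)
class PriceObservation:
    """One space-time-stamped sighting of a barcode with its price."""
    barcode: str
    latitude: float
    longitude: float
    observed_at: datetime
    price: Decimal


def record_observation(barcode, latitude, longitude, keyed_price, now=None):
    """Combine the automatic readings with the one manually keyed value."""
    if not upc_a_is_valid(barcode):
        raise ValueError(f"bad UPC-A length or check digit: {barcode!r}")
    stamp = now or datetime.now(timezone.utc)
    return PriceObservation(barcode, latitude, longitude, stamp, Decimal(keyed_price))


# Coordinates and price are made-up sample values.
obs = record_observation("036000291452", 40.7128, -74.0060, "2.49")
```

Only the last argument comes from the keypad; everything else would arrive from the reader, the receiver and the clock.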

fully automated data collection
Collection of data about consumer behavior by business entities is now fully automated. Granted, there are still some manual interfaces such as the warranty card and the form to apply for a loyalty card, but the point of sale is fully automated in the sense that consumers volunteer information for proprietary use with virtually no increase in personal effort over executing the transaction itself. Data collection and warehousing are fully integrated into each transaction. An obvious goal for the pubwan movement is integrating consumer-controlled harvesting of transaction information into the act of purchasing. This will be difficult if not impossible, because all the information technologies involved in the transaction (the cash register, the item database, the market research database, and even the consumer's “own” credit or debit card) are the private property, the proprietary information system, of someone else. Full automation of data collection for pubwan use may require persuading companies in the retail, financial and IT sectors to offer API libraries, akin to developments already underway in the “social networking” industry. This is of dubious potential for pubwan development, given the demonstrable fact that information does not want to be free.

Another approach to the problem may involve the creation of consumer-controlled financial institutions (existing credit unions?) capable of issuing credit and/or debit cards that offer cardholder-controlled information flows. Basically this means providing online statements that are machine-readable, database-importable, high-resolution and in a nonproprietary format. This still leaves breaking down receipts by line item as a chore for the consumer. More bleakly, there would still be the contractual hooks to the data processing entities behind the MasterCard and Visa brands. This, too, may be an insurmountable obstacle.
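To make "machine-readable, database-importable and nonproprietary" concrete, a statement could be as simple as a CSV file. The sketch below assumes a hypothetical three-column layout (`date`, `merchant`, `amount`) with made-up sample rows; no real card issuer's export format is implied.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical statement export: one row per card transaction.
STATEMENT = """date,merchant,amount
2008-01-15,Example Mart,3.48
2008-01-17,Corner Grocer,2.79
2008-01-22,Example Mart,10.00
"""


def totals_by_merchant(statement_text):
    """Sum transaction amounts per merchant from a CSV statement."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in csv.DictReader(io.StringIO(statement_text)):
        totals[row["merchant"]] += Decimal(row["amount"])
    return dict(totals)


totals = totals_by_merchant(STATEMENT)
```

Each row here is a whole transaction, not a line item, so the per-item breakdown would still have to come from the receipt itself.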