Wednesday, September 2, 2009

Democratization of Data (or not?)

Last week I had an interesting conversation with someone who manages corporate customer data for a major investment bank. She said that she had read our blog and developed the impression that Avox was attempting to democratize entity data. In her opinion (and, as it turns out, in mine), this alone is likely to degrade the overall quality of the content.

After this discussion, it dawned on me that we had better be clear about our objectives at Avox, particularly as they relate to wiki-data:

1) Maximize the number of productive "challenges" we receive that help us improve the quality of the data. Every challenge to a record with an AVID will be verified by our expert team of analysts before being applied (assuming, of course, the challenge is proven correct). The "democratization" element of this activity equates to opening up a chunk of our content for the world to see and for anyone to challenge. BTW, the quality improvement benefits all our data vendor partner offerings too.

2) Provide a platform for others to link to and from. Our next release of wiki-data will incorporate additional identifiers from other firms. The AVID will be part of a URI for each entity, enabling the technology community to efficiently leverage a single version of the truth. This will in turn attract more usage and more challenges, leading to lower latency and increased accuracy.

3) Create a forum for anyone to comment on data records, propose changes in public and, ultimately, add "non-verified" data records which Avox or other firms can verify for clients. This is the big stretch and, if I'm honest, it's the iteration of wiki-data we are least certain about.

To be clear, Avox will never provide clients with data that has not been verified according to the terms of their service level agreements. We do, however, regularly get asked for large volumes of business entity data, for example to facilitate marketing campaigns, where data quality is not as important as it is for credit risk management or regulatory compliance. In these situations, information on millions of entities is sometimes required but the budget for procuring the content is meager. Perhaps in these cases it is not necessary to have an independent and rigorous analysis performed on every entity at regular intervals, and the community self-checking mechanism is adequate. Moreover, a global, community-maintained model may be a great solution for a free and universal yellow pages capability (for example). Over time, more and more of these entities will be verified by expert third parties such as Avox and assigned identifiers as the market demands.

We don't have all the answers; however, our aim is to give you, the user community, a platform to access and use the content cheaply and efficiently. We are looking for your views and guidance on how to shape it into a more market-friendly service that represents real value for you.

BTW, keep your eyes open for v1 later in September. You can check out Jonathan Lister's blog for progress. The software is open source.