Open Data represents an extraordinary business opportunity, and the sector's pioneers have not waited to seize it. Philippe Very, Lead Data Scientist at Sidetrade, explains Open Data and the possibilities it offers.
How would you define the concept of Open Data?
Philippe Very – The term Open Data describes data that anyone can access, use, and share. The essential criteria of Open Data are:
- Availability: the data can be obtained through interfaces or, more conveniently, through bulk download;
- Reuse: the data can be used to develop internal tools without access restrictions (for example, commercial limitations or restrictions of use for certain sectors);
- Redistribution: the data can be passed on to partners that lack the time to search for or manage data themselves.
Taken together, these conditions make it easier to develop more sophisticated, higher-quality products and services.
What are the benefits of Open Data? Are we heading toward a free-access model?
P.V. – Opening up data is a deep-rooted trend. In the past, data was treated as an asset and sold; today, it is published openly. The Digital Republic Act (loi pour une République numérique), applied since October 2016, gave the movement new momentum in France. Many institutions, such as INSEE and other public entities, are now required to open their data.
The main goal is to revitalize the economy and build a new business model based on free access. Public bodies are starting to realize that the true innovators (SMEs, start-ups and students) will stop developing new services if they have to pay for the data. San Francisco understood this years ago: its open data strengthened Silicon Valley's industries and gave rise to many new companies that built their business on that data.
Paradoxically, companies that can afford to keep their data to themselves do not hesitate to do so. More and more private APIs (Application Programming Interfaces) are closing again: Twitter, LinkedIn… Others are adopting a “freemium” model. In Belgium, for example, a Creative Commons license is applied that requires any service built on the open data to be open as well. Data is precious, and its value has grown with the emergence of social networks. Hence the paradox: some economic players are willing to open their doors, while others are building data monopolies.
What are the concrete business applications of Open Data?
P.V. – Thanks to free, open sources, companies can develop artificial intelligence solutions and advanced machine learning algorithms. The difficulty lies in building a comprehensive, high-quality database. In that respect, Open Data is a breath of fresh air for Data Scientists!
Sidetrade's primary data source is transactions: 230 million payment experiences covering 5.3 million companies. Today, this wealth of data can be supplemented with free-access firmographic data from INSEE. This scenario is ideal for Sidetrade because it allows us to improve the precision and relevance of our predictive models.
In addition, the comprehensiveness of INSEE data on French companies enables firms such as Sidetrade to build a central customer data repository with confidence, including INSEE information on the size, sector and revenue of companies.
When does the Data Scientist team get involved?
P.V. – Sidetrade's Data Scientist team starts by validating and quality-checking every data source, whether transactional or firmographic.
Since INSEE opened its data in January, we have enriched our existing customer repository with comprehensive firmographic data. Nevertheless, aggregating data from various sources can be particularly complex. The challenge here was to link our internal transactional data with the INSEE data by assigning a SIREN number to each company in our database. The Data Scientist team developed a matching tool to build that link.
However, the race for data is never won, and the need to enrich the volume of processed data is never fully met. That is the price of predictive models!
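The interview does not describe how Sidetrade's matching tool works. As a purely illustrative sketch of the general idea (assigning a SIREN number to an internal company record), the Python below normalizes company names, restricts candidates to the same postal code, and keeps the closest name match above a similarity threshold. The function names, the registry field names (`siren`, `name`, `postal_code`), the threshold value and the sample records are all assumptions made for this example; they do not reflect the INSEE file layout or any Sidetrade code. The only external fact relied on is that a SIREN is a 9-digit identifier validated with the Luhn checksum.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Legal-form tokens dropped before comparison (illustrative, not exhaustive).
LEGAL_FORMS = {"sa", "sas", "sarl", "sasu", "eurl", "sci"}


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, drop legal-form tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = re.findall(r"[a-z0-9]+", ascii_name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def is_valid_siren(siren: str) -> bool:
    """Check that a SIREN has 9 digits and passes the Luhn checksum."""
    if not re.fullmatch(r"\d{9}", siren):
        return False
    total = 0
    for i, ch in enumerate(reversed(siren)):
        digit = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def assign_siren(name, postal_code, registry, threshold=0.85):
    """Return the SIREN of the best registry match, or None if none is close enough."""
    target = normalize(name)
    best_siren, best_score = None, 0.0
    for entry in registry:
        # Hypothetical blocking rule: only compare companies in the same postal code.
        if entry["postal_code"] != postal_code or not is_valid_siren(entry["siren"]):
            continue
        score = SequenceMatcher(None, target, normalize(entry["name"])).ratio()
        if score > best_score:
            best_siren, best_score = entry["siren"], score
    return best_siren if best_score >= threshold else None


# Synthetic registry: invented names, SIREN values chosen only to pass the checksum.
registry = [
    {"siren": "123456782", "name": "Société Exemple SAS", "postal_code": "75001"},
    {"siren": "111111118", "name": "Autre Entreprise SA", "postal_code": "75001"},
]
match = assign_siren("SOCIETE EXEMPLE", "75001", registry)
no_match = assign_siren("Inconnue SARL", "75001", registry)
```

Returning `None` below the threshold, rather than forcing the nearest candidate, leaves ambiguous records unlinked so they can be reviewed separately. A production tool would likely need further signals (address, VAT number) and a more scalable candidate search than this linear scan.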