FAQ & Definitions

We maintain a living FAQ and a set of definitions here. If you don’t find what you are looking for, please reach out at hello@substra.org or via your preferred channel.

 

Frequently asked questions

How did this start?

Following an earlier collaborative initiative called Morpheo, the main concepts underlying the Substra framework were designed between H2 2017 and H1 2018, while the proposal for a multi-partner research project, HealthChain, was being prepared. This project is supported by Bpifrance and is a result of the Digital Investments Program for the major challenges of the future RFP. As part of the HealthChain project, a consortium coordinated by Owkin (a private company) has been established. It includes Substra Foundation, Apricity (a private company), the Assistance Publique des Hôpitaux de Paris, the University Hospital Center of Nantes, the Léon Bérard Center, the French National Center for Scientific Research, the École Polytechnique, the Institut Curie and the University of Paris.

 

Who is making the Substra software framework?

So far, the Substra framework has been developed by Owkin's Substra team in Nantes (France). Development started in April 2018, and the framework was released under the Apache 2.0 open source license in fall 2019.

 

Is this only a software project?

The software project is central, but we'd like the Substra initiative to be broader. We plan to produce content about the underlying approaches and methodologies. We are also working to identify challenges specific to multi-partner collaborative machine learning and how best to approach them (for example, how to share revenue between data science partners and data providers; see our distributed-learning-contributivity repository on GitHub). Substra Foundation serves the open source initiative and also participates in collaborative research projects such as HealthChain and Melloddy.

 

Under what license is the Substra framework released?

It has been released under the Apache 2.0 license.

 

How can one contribute?

Substra is a collaborative initiative and we'd like to foster contributions from motivated individuals and interested organizations. Federating a vibrant community is a thrilling objective.

You can head directly to the GitHub repositories, read the Contributing Guide, or participate in the Discourse forum.

Software engineering is not the only way to contribute to the initiative and join the community. Enthusiasts wishing to participate in other ways (dissemination and communication efforts, local events and meetups, research studies...) are welcome! If you already have an interest, an idea, or just some comments, we'd be really glad to hear them. Please contact us at hello@substra.org or via your preferred channel (see all options on the Contact page).

 

 

Definitions

Hyperledger Fabric
A leading private, permissioned blockchain framework. Hyperledger Fabric is one of the Hyperledger open source projects hosted by the Linux Foundation. It has been widely adopted as a reference framework for implementing blockchain-based services in business ecosystems. Substra Framework is built upon Hyperledger Fabric and its core components (distributed ledger, identity and membership mechanisms, smart contracts, consensus mechanisms, etc.).

 

Distributed ledger
A distributed ledger is a consensus of replicated, shared, and synchronized digital data geographically spread across multiple sites, countries, or institutions. There is no central administrator or centralized data storage. A peer-to-peer network is required as well as consensus algorithms to ensure replication across nodes is undertaken (source: Wikipedia, June 2019).
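
To make this definition more concrete, here is a minimal Python sketch of a replicated, hash-chained ledger. It is a toy illustration only and does not reflect how Hyperledger Fabric or Substra implement their ledger. The names `Ledger`, `append`, `is_valid` and `majority_head` are hypothetical, and the "consensus" is a naive majority vote on the hash of the latest block.

```python
import hashlib
import json
from collections import Counter

GENESIS = "0" * 64  # placeholder hash used before any block exists


def block_hash(prev, payload):
    """Hash a block's content together with the hash of the previous block."""
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


class Ledger:
    """Toy append-only ledger held by one participant (hypothetical class)."""

    def __init__(self):
        self.blocks = []

    def head(self):
        """Return the hash of the latest block, or GENESIS if empty."""
        return self.blocks[-1]["hash"] if self.blocks else GENESIS

    def append(self, payload):
        """Add a block chained to the current head and return its hash."""
        prev = self.head()
        block = {"prev": prev, "payload": payload, "hash": block_hash(prev, payload)}
        self.blocks.append(block)
        return block["hash"]

    def is_valid(self):
        """Recompute every hash to detect modified or reordered blocks."""
        prev = GENESIS
        for block in self.blocks:
            if block["prev"] != prev or block["hash"] != block_hash(prev, block["payload"]):
                return False
            prev = block["hash"]
        return True


def majority_head(replicas):
    """Naive 'consensus': the head hash shared by a strict majority, else None."""
    value, count = Counter(r.head() for r in replicas).most_common(1)[0]
    return value if count > len(replicas) / 2 else None


# Three organisations each keep their own copy; there is no central store.
replicas = [Ledger() for _ in range(3)]
for replica in replicas:
    replica.append({"op": "register_dataset", "by": "org-a"})
    replica.append({"op": "train_model", "by": "org-b"})
```

Because each block's hash covers the previous hash, changing an earlier entry in one copy makes `is_valid()` return `False` for that copy, while the other replicas still agree on the same head.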

 

Trustless
Substra Framework is a ‘trustless’ ML orchestration framework. The word ‘trustless’ can be ambiguous. We believe it should be understood as ‘doesn’t require trust a priori between parties’: the code implementation of the software enables parties to collaborate without trusting each other, because it technically guarantees that actions and transactions will be performed as defined in the rules agreed upon. What is required is to ‘trust the code’. This might not be straightforward and may even require some audit effort, but in many cases it is easier than trusting a number of other independent organisations.

 

Privacy-preserving
Substra Framework is a tool in the quest for ‘privacy-preserving’ ML (with the word ‘privacy’ referring to both the privacy of the dataset for the organisation managing it and the privacy of personal data for the individuals these data refer to). It enables data analysis and machine learning computations on data without transferring the data to anyone and without giving data scientists read access to these data. It has to be combined with privacy enhancement approaches in ML algorithms (contractual requirements, algorithm audits...) and data pre-processing (differential privacy, anonymization of PII...).
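
The idea of running computations on data without transferring them can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Substra's API: `DataNode`, `run` and `fit_slope` are hypothetical names, and Python's name mangling only hides the records by convention, whereas a real deployment would enforce the boundary at the infrastructure level.

```python
class DataNode:
    """Hypothetical data holder: records stay inside, algorithms come to them."""

    def __init__(self, records):
        # Kept "private" by convention only; a real system would isolate
        # the data on the organisation's own infrastructure.
        self.__records = list(records)

    def run(self, algorithm):
        """Execute the submitted algorithm locally and return only its output."""
        return algorithm(self.__records)


def fit_slope(records):
    """Least-squares slope of a line through the origin for (x, y) pairs."""
    numerator = sum(x * y for x, y in records)
    denominator = sum(x * x for x, _ in records)
    return numerator / denominator


# The data scientist submits the algorithm and receives a single parameter,
# never the underlying records.
hospital = DataNode([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
slope = hospital.run(fit_slope)
```

Nothing in this sketch stops a submitted algorithm from returning the raw records as its "output", which is one reason such a setup has to be combined with algorithm audits and the other approaches listed above.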

 

Machine learning orchestration
In contexts where multiple parties collaborate to build machine learning models, the different operations (e.g. algorithm transfers, training computations, model evaluations, predictions…) need to be orchestrated in time and space. Such orchestration is done over a network connecting the parties, and requires complete traceability of all operations, certification of identities, and security, among other things. Substra Framework enables the implementation of applications or services requiring secure, traceable, distributed machine learning orchestration.
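
As a rough illustration of orchestration with traceability, the Python sketch below runs a small set of dependent tasks in a valid order and records who ran what. It is not Substra's implementation: the `orchestrate` function, the task names and the organisation identifiers are hypothetical, and the "training" functions simply return fixed numbers. It relies on `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def orchestrate(tasks):
    """Run tasks in dependency order and keep a trace of every operation.

    `tasks` maps a task name to (owner, dependencies, function), where the
    function receives a dict of its dependencies' results.
    """
    graph = {name: set(deps) for name, (_, deps, _) in tasks.items()}
    results, trace = {}, []
    for name in TopologicalSorter(graph).static_order():
        owner, deps, function = tasks[name]
        results[name] = function({dep: results[dep] for dep in deps})
        # Each entry records the step number, the task, its owner and inputs.
        trace.append({"step": len(trace) + 1, "task": name,
                      "owner": owner, "inputs": list(deps)})
    return results, trace


# Two organisations train locally, one aggregates, a third evaluates.
tasks = {
    "train_a": ("org-a", [], lambda inputs: 1.0),
    "train_b": ("org-b", [], lambda inputs: 3.0),
    "aggregate": ("org-a", ["train_a", "train_b"],
                  lambda inputs: sum(inputs.values()) / len(inputs)),
    "evaluate": ("org-c", ["aggregate"],
                 lambda inputs: abs(inputs["aggregate"] - 2.0)),
}
results, trace = orchestrate(tasks)
```

In a multi-party setting the trace would be written to a shared ledger and each owner's identity would be certified, rather than being held in a local list as it is here.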