Improving the movement quality of athletes and patients is important for reducing injury risk and enhancing quality of life. Currently, movement quality is assessed by eye, relying on the experience of a physiotherapist. However, such assessments are difficult to quantify objectively, and it is hard to record improvements in movement quality over time. In a market feasibility study that we performed with physiotherapists and coaches, we found that there is a significant market for a tool to quantify and record movement. It is important that this tool is accurate and easy to use. We have been working to build a mobile app that uses deep learning/computer vision technologies so that physios can quantify and track human motion. These technologies are still very new, and we believe it is important that physiotherapists understand how they work, so that they can choose the right trade-offs with respect to privacy and ownership. Below, we discuss how most tech startups or companies would build this product today, highlight the issues with this approach, and finally propose our solution.
How does deep learning/computer vision work?
As with all deep learning algorithms, the aim is to learn a mapping from an input (in this case, an image or video) to a target label of interest (the skeletal joint positions of an athlete or patient). To train the algorithm, we need to collect a dataset of input-target pairs. We pass the inputs to the algorithm (e.g. one by one) and tune its many weights until its outputs generally match the target labels for each training sample. If this is done well, the algorithm should generalise to unseen samples.
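As a rough illustration of this training loop (not the model used in our app), the sketch below replaces the deep network with a hypothetical one-input linear model, y ≈ w·x + b. It also replaces images and joint positions with toy numbers generated from y = 2x + 1. The loop works the same way in spirit: predict, measure the error against the target label, and nudge the weights to reduce it, passing samples one by one (stochastic gradient descent). The last line checks the prediction on an input that was not in the training set.

```python
import random


def train(pairs, epochs=500, lr=0.05, seed=0):
    """Fit y ≈ w * x + b to (input, target) pairs with stochastic gradient descent."""
    rng = random.Random(seed)
    pairs = list(pairs)      # copy so shuffling doesn't alter the caller's list
    w, b = 0.0, 0.0          # the model's tunable weights, initialised to zero
    for _ in range(epochs):
        rng.shuffle(pairs)   # visit training samples one at a time, in random order
        for x, y in pairs:
            err = (w * x + b) - y   # prediction minus target label
            # Gradient step on the squared error 0.5 * err**2
            w -= lr * err * x
            b -= lr * err
    return w, b


def mse(pairs, w, b):
    """Mean squared error of the model on a set of (input, target) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in pairs) / len(pairs)


# Toy stand-in for (image, joint positions) pairs: targets follow y = 2x + 1
train_pairs = [(i / 10, 2 * (i / 10) + 1) for i in range(10)]
w, b = train(train_pairs)

print(f"w = {w:.3f}, b = {b:.3f}, training MSE = {mse(train_pairs, w, b):.2e}")
print(f"prediction for unseen input 1.5: {w * 1.5 + b:.3f}")
```

On this noise-free toy data, the learned weights approach w = 2 and b = 1. A real pose estimator has millions of weights and a far harder mapping to learn, which is why it needs large, diverse datasets.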
What type of data do we train the algorithm with?
To achieve the most accurate human skeleton estimates, we often use expensive motion capture (MoCap) systems in a laboratory environment. We record videos with standard cameras (the inputs) and the skeleton joint positions with the MoCap system (the labels). This is one type of dataset. The issue with using this approach alone is that the videos from the standard cameras are not diverse: they are captured under very specific conditions (e.g. indoor lighting). Algorithms trained on these images will only perform well on similar images and will not work on different types of images (e.g. outdoor scenes). To get around this, we supplement the dataset with diverse images of humans taken from the internet, whose approximate skeleton joint positions are labelled by human annotators. In this combination, the MoCap data contributes accurate joint labels, while the internet images contribute diversity. Other labels, such as an assessment of movement quality, are also of interest.
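One simple way to combine the two sources during training is to build each batch partly from MoCap samples and partly from internet images. The sketch below does this under our own assumptions. The `Sample` class, the `mixed_batches` function and the `mocap_fraction` parameter are hypothetical names for illustration. A real pipeline would also carry the images and joint labels, and would tune the mixing ratio empirically.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    """One training example; real samples would also hold pixels and joint labels."""
    sample_id: str
    source: str  # "mocap" (accurate lab labels) or "internet" (diverse, annotated)


def mixed_batches(mocap, internet, batch_size=8, mocap_fraction=0.5, seed=0):
    """Endlessly yield batches drawing a fixed fraction from each source.

    Sampling is with replacement, so a small MoCap set can be mixed with a much
    larger internet set without being exhausted.
    """
    rng = random.Random(seed)
    n_mocap = round(batch_size * mocap_fraction)
    while True:
        batch = rng.choices(mocap, k=n_mocap) + rng.choices(internet, k=batch_size - n_mocap)
        rng.shuffle(batch)  # avoid a fixed MoCap-then-internet order within a batch
        yield batch


# Hypothetical data: a few lab recordings and many annotated web images
mocap = [Sample(f"lab_{i}", "mocap") for i in range(5)]
internet = [Sample(f"web_{i}", "internet") for i in range(50)]

batch = next(mixed_batches(mocap, internet, batch_size=8, mocap_fraction=0.25))
print(sum(s.source == "mocap" for s in batch))  # 2
```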
How would most startups/companies create this technology today?
The main task of a new startup, or of a new project within a company, is to acquire datasets. For our use case, videos of human movements are acquired by setting up cameras in the target environment (e.g. a gym or hospital). Separately, motion capture may also be performed in a lab. The subjects captured by the videos and the MoCap system give their permission and sign consent forms for the data to be used for a particular use case. The videos are labelled by human annotators, either in-house under the supervision of the company or outsourced to another company or a platform like Amazon Mechanical Turk. The company pays the annotators around $1 per image, and in return the annotators give away their rights to the data. The company may also pay physiotherapists (domain experts) to add movement quality labels to the videos. Labels provided by medical professionals can cost up to $100 per image/video. The company then pays data scientists to train algorithms on the data. The data scientists also sign contracts that assign all rights to their ideas and intellectual property (IP) to the company. The datasets and algorithms are stored on centralised servers controlled by the company.
What are the issues with this approach and how can new technology help?
Value Flow (Who benefits?)
The annotators and domain experts receive a fixed, limited reward. In contrast, the potential rewards for the tech company are unlimited (although there is also a risk that the technology will not create any value). The domain experts may also benefit from a useful tool that makes their lives easier, but this tool will also continually extract their data to improve the algorithms. This is similar to the Facebook model, where a tool is provided “for free” in exchange for the users’ data. There is an additional risk for domain experts: the data that they label is being used to automate tasks within their area of expertise. Motion analysis is just one of many tasks that physiotherapists perform, and the new technology may initially be useful only for simpler diagnoses. However, it is not inconceivable that continued automation over years or decades could reduce the amount of work required from domain experts. Ironically, the data that they labelled may be used to train an AI model that eventually puts them out of a job.
Wouldn’t it be better to set up a system in which the value of automation flows to those whose jobs are directly affected by it? With self-driving cars on the horizon, what if we had started an initiative where cameras were mounted on the vehicles of professional drivers, such as truck and taxi drivers, rather than on the cars of automotive manufacturers and tech companies? If these professional drivers were put out of work, at least the value produced by the automation technology would flow to them rather than to a centralised tech entity. For physiotherapists, can we set up the system so that the value of a future physioAI flows to this community, and not to Facebook Health? This idea maps very well to the field of deep learning, since training such automation algorithms directly requires the domain experts’ input in the form of labelled data. One step towards this is to encourage data ownership. Data unions (like a trade union, but for data) can be formed to aggregate individual data into a valuable dataset.
Using the traditional method, the dataset and labels are stored on a centralised server controlled by the company. Any new data captured with an app using this technology is also transferred to the centralised server. This creates a single point of failure for hacks, which happen all the time. While subjects in the dataset may have the right to request that their data be deleted, many are not aware of this right, and such deletions rarely happen in practice.
Privacy concerns also prevent useful existing datasets from being used optimally. Take the analysis of human movement, for example. MoCap systems have been around since the 1980s, and there is a mountain of data out there in universities, hospitals and sports clubs. However, this data is considered sensitive and cannot be widely shared. Many of the best-performing deep learning algorithms we have are benchmarked on small public MoCap datasets (the most popular one has fewer than 10 subjects). In contrast, a single MoCap dataset at a university can have hundreds of subjects. It is well known that more (high-quality) data leads to improved performance of deep learning algorithms; in many cases, it matters more than the choice of algorithm itself. What if we could train today’s algorithms on MoCap datasets with hundreds of subjects rather than fewer than 10? What if we could connect a network of these datasets containing thousands, tens of thousands or even hundreds of thousands of subjects, while maintaining safety and privacy? Would this not likely result in an order-of-magnitude increase in performance?
Luckily, private AI technologies are reaching maturity. Compute-to-data and federated learning allow deep learning algorithms to be trained on a dataset, or a collection of datasets, without the data being transferred to a centralised location controlled by the tech company creating the algorithm. This means that the data can stay on the users’ mobile devices or on the premises of a trusted third party (such as a sports club, university or hospital), while still being used to advance our knowledge and understanding. This approach is compatible with GDPR and can offer stronger privacy protection than GDPR alone requires. Even better, much of this technology has been open-sourced by groups like Ocean Protocol and OpenMined, so that new startups can quickly build on top of it.
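To make federated learning more concrete, below is a minimal sketch of federated averaging (FedAvg), one common federated learning scheme, using a toy linear model on made-up data. It is not the API of Ocean Protocol or OpenMined. Ocean's Compute-to-Data works differently: it sends an approved algorithm to run where the data lives. The site names, model and hyperparameters here are hypothetical. The key property is visible in the code: each site trains on its own data locally and returns only model weights. The coordinating server then averages those weights in proportion to each site's number of samples.

```python
def local_update(weights, data, epochs=5, lr=0.05):
    """Runs on-site: a few epochs of SGD on the site's own (input, target) pairs."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b  # only the updated weights leave the site, never the data


def federated_averaging(sites, rounds=200):
    """Coordinator loop: broadcast global weights, collect local updates, average them."""
    global_w, global_b = 0.0, 0.0
    total = sum(len(data) for data in sites.values())
    for _ in range(rounds):
        updates = [(local_update((global_w, global_b), data), len(data))
                   for data in sites.values()]
        # Weighted average: sites with more samples count for more
        global_w = sum(w * n for (w, _), n in updates) / total
        global_b = sum(b * n for (_, b), n in updates) / total
    return global_w, global_b


# Hypothetical private datasets, each generated from y = 2x + 1
sites = {
    "university": [(i / 10, 2 * i / 10 + 1) for i in range(0, 10)],
    "hospital": [(i / 10, 2 * i / 10 + 1) for i in range(5, 15)],
    "sports_club": [(i / 10, 2 * i / 10 + 1) for i in range(10, 20)],
}
w, b = federated_averaging(sites)
print(f"global model: w = {w:.3f}, b = {b:.3f}")
```

All three toy sites are consistent with y = 2x + 1, so the averaged weights approach w = 2 and b = 1 even though no site ever shares its raw pairs with the coordinator.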
Every company that wants to enter the space needs to collect a dataset. This dataset is considered a barrier to entry and is almost never shared. Datasets that already exist are often re-collected because they are inaccessible, and the data acquisition process can take a company years. This is a huge inefficiency for technological progress.
Individuals are less likely than companies to have competitive considerations, so increased individual data ownership may help to reduce this problem. Beyond this, improved incentives for collaboration can lift all boats. With the Ocean data marketplace, companies can open a new income stream by monetising their datasets while maintaining full control of the data. If a competitor trains an algorithm on the data, the company could receive royalties every time that algorithm is used, while still being free to train its own algorithm. Is this not a more desirable competitive environment, one that encourages innovation rather than barriers to entry?
What is a better way to create this technology?
We now suggest a new approach that differs from the typical approach of today. A new data science group (like VisioTherapy or Algovera) has an idea for an algorithm that provides business value, and checks various data marketplaces to see whether the data that they require exists. With the ability to bring in new revenue streams while maintaining privacy, many universities, hospitals and sports clubs make their data available for research and commercial purposes. If the data exists, the group purchases access to the required datasets and trains their algorithms using private AI infrastructure. The data providers are rewarded with fees while retaining full control over the data (it never leaves their servers). If the data doesn’t exist, the group can make use of new apps (like our VisioTherapy app or the DataUnion app) to crowdsource labels from annotators and domain experts. Unlike other crowdsourcing platforms, these apps reward contributors with ownership shares of the dataset. The contributors can exchange these shares for cash or hold on to them in the expectation of royalties. The new algorithm is a success, and the group makes it available on an algorithm marketplace and within a user app. Whenever the algorithm or app is bought, the value flows back to the data science group and the data contributors. A community of domain experts and other individuals receives the value, rather than a single centralised tech entity.
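The value flow in this scenario can be illustrated with a simple pro-rata split. The sketch below is a hypothetical example, not how Ocean Protocol, DataUnion or our app actually calculate payouts. It assumes a fixed share of each sale (here 50%, set by a hypothetical `group_cut` parameter) goes to the data science group. The remainder is divided among contributors in proportion to the dataset ownership shares they hold. Amounts are kept in whole cents. Any cents left over after rounding down go to the contributors with the largest remainders, so the payouts always add up to the sale price.

```python
from fractions import Fraction


def distribute(revenue_cents, shares, group_cut=Fraction(1, 2)):
    """Split one sale between the data science group and dataset shareholders.

    group_cut is a hypothetical fixed fraction for the group; the rest is shared
    pro rata by ownership shares. All amounts are whole cents.
    """
    group_amount = int(revenue_cents * group_cut)  # round the group's cut down
    pool = revenue_cents - group_amount            # what contributors share
    total_shares = sum(shares.values())
    exact = {who: Fraction(pool * n, total_shares) for who, n in shares.items()}
    payout = {who: int(amount) for who, amount in exact.items()}  # round down first
    leftover = pool - sum(payout.values())
    # Give the leftover cents to the largest fractional remainders (ties by name)
    by_remainder = sorted(exact, key=lambda who: (payout[who] - exact[who], who))
    for who in by_remainder[:leftover]:
        payout[who] += 1
    return group_amount, payout


shares = {"annotator_a": 60, "annotator_b": 30, "physio_c": 10}  # hypothetical holders
group, payout = distribute(1001, shares)  # a $10.01 sale
print(group, payout)  # 500 {'annotator_a': 301, 'annotator_b': 150, 'physio_c': 50}
```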
How have VisioTherapy been working towards this?
With the VisioTherapy project (funded by OceanDAO), we have developed an app (in collaboration with DataUnion) that can be used to crowdsource videos of human movements. We are collecting a dataset, providing ownership of that dataset to the contributors and making the dataset available on the Ocean marketplace. Within the app, users can upload and annotate videos, and also manage their ownership shares of the dataset. We have also begun exploring the roadblocks to making MoCap datasets owned by universities, hospitals and sports clubs available (privately and safely) on the marketplace. The next steps of the project are to continue acquiring and curating more data, and to incentivise communities of data scientists (e.g. the Algovera community) to build algorithms, and maybe even apps, on top. This is a proof of concept for decentralised AI applied to the physio space: one that is created and controlled by the community rather than by a tech company.
Richard Blythman. Richard holds a PhD from Trinity College Dublin and is an expert in AI and machine learning.