Context

This research internship is part of the Momentum project “Managing your data without leakage of information”, funded by the CNRS and coordinated by Pierre Bourhis. The project aims to strengthen the privacy of Internet users by identifying potential leaks of private information and proposing effective countermeasures that give better control over which personal data can be accessed by unauthorized third parties. It therefore addresses a societal issue that concerns the entire population.

Social networks, like other web applications, are now massively adopting the REST architectural style as the standard for designing the APIs that allow them to share information with authorized third parties. While this mediation layer between the third parties and the database controls which data a third party can retrieve, it can also reveal information about the structure of the database to those same third parties.

To address this problem, we want to study to what extent a database schema can be learned from the exchange protocol imposed by a REST API, which would make it possible to establish links between data that are not shared a priori.

Objectives

In particular, we propose to identify logical rules for extracting a database schema from the traces of request-response exchanges produced by a REST API.

These traces can be obtained with an HTTP robot (e.g., headless Chrome) that explores the REST API of a given site and records the information exchanged between a client and the web application.
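The exploration step can be sketched as a breadth-first crawl that records every exchange. This is a minimal sketch, not the project's robot: instead of driving headless Chrome, it assumes that the API returns JSON and exposes links to related resources as absolute URLs inside its responses (a convention that not every API follows), and it only issues GET requests. The names `Trace`, `explore`, `find_links` and `http_get` are hypothetical.

```python
import json
import urllib.error
import urllib.request
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Iterator


@dataclass
class Trace:
    """One request-response exchange observed by the robot (hypothetical format)."""
    method: str
    url: str
    status: int
    body: Any  # decoded JSON payload, or None if the request failed


def http_get(url: str) -> tuple[int, str]:
    """Default fetcher: plain HTTP GET with the standard library."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Error statuses (404, 403, ...) are kept as traces too.
        return err.code, ""


def find_links(value: Any, prefix: str) -> Iterator[str]:
    """Yield every string in a JSON value that looks like a link into the API."""
    if isinstance(value, dict):
        for v in value.values():
            yield from find_links(v, prefix)
    elif isinstance(value, list):
        for v in value:
            yield from find_links(v, prefix)
    elif isinstance(value, str) and value.startswith(prefix):
        yield value


def explore(seeds: list[str], prefix: str,
            fetch: Callable[[str], tuple[int, str]] = http_get,
            max_requests: int = 100) -> list[Trace]:
    """Breadth-first exploration of a JSON REST API, recording each exchange."""
    traces: list[Trace] = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(traces) < max_requests:
        url = queue.popleft()
        status, text = fetch(url)
        # Assumption: successful responses are JSON documents.
        body = json.loads(text) if status == 200 else None
        traces.append(Trace("GET", url, status, body))
        # Follow links to resources not visited yet.
        for link in find_links(body, prefix):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return traces
```

The `fetch` parameter makes the crawler independent of the transport: a real run would pass `http_get` (or a browser-driven fetcher), while an offline test can pass a function that serves canned responses. The `max_requests` bound keeps the exploration polite towards the target site.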

In a second step, these traces will form a raw dataset from which extraction rules can be identified in order to infer a probable schema of the data stored in the database.
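To make the notion of an extraction rule concrete, here is a minimal sketch of three candidate rules applied to traces reduced to `(path, JSON body)` pairs. The rules and all names (`resource_of`, `infer_schema`) are illustrative assumptions, not rules established by the project: (1) each field observed in the objects returned under a resource becomes an attribute, typed by its JSON values; (2) a field named `id` with distinct values across all observed objects is a candidate key; (3) a field named `<x>_id` whose values are all included in the `id` values of another resource is a candidate foreign key (an inclusion dependency). The resource name is assumed to be the last non-numeric path segment, so `/users/1/posts` maps to `posts`.

```python
import json
import re
from collections import defaultdict
from typing import Any

ID_SEGMENT = re.compile(r"^\d+$")


def resource_of(path: str) -> str:
    """Hypothetical convention: the resource is the last non-numeric path segment."""
    segments = [s for s in path.strip("/").split("/") if s]
    names = [s for s in segments if not ID_SEGMENT.match(s)]
    return names[-1] if names else ""


def infer_schema(traces: list[tuple[str, Any]]):
    """Infer (attributes, keys, foreign_keys) from (path, JSON body) pairs."""
    # Group the distinct JSON objects observed under each resource;
    # identical objects reached through several endpoints are counted once.
    objects: dict[str, dict[str, dict]] = defaultdict(dict)
    for path, body in traces:
        items = body if isinstance(body, list) else [body]
        for obj in items:
            if isinstance(obj, dict):
                objects[resource_of(path)][json.dumps(obj, sort_keys=True)] = obj

    attributes: dict[str, dict[str, set[str]]] = {}
    keys: dict[str, str] = {}
    for res, seen in objects.items():
        objs = list(seen.values())
        # Rule 1: every observed field is an attribute, typed by its JSON values.
        attrs: dict[str, set[str]] = defaultdict(set)
        for obj in objs:
            for name, value in obj.items():
                attrs[name].add(type(value).__name__)
        attributes[res] = dict(attrs)
        # Rule 2: a scalar "id" present in every object with distinct values.
        ids = [obj.get("id") for obj in objs]
        if all(isinstance(i, (int, str)) for i in ids) and len(set(ids)) == len(ids):
            keys[res] = "id"

    # Rule 3: R.<x>_id values included in S.id suggest a foreign key R.<x>_id -> S.
    foreign_keys: list[tuple[str, str, str]] = []
    for res, seen in objects.items():
        for attr in attributes[res]:
            if not attr.endswith("_id"):
                continue
            values = {o[attr] for o in seen.values() if o.get(attr) is not None}
            for target, key in keys.items():
                if target == res or not values:
                    continue
                target_ids = {o[key] for o in objects[target].values()}
                if values <= target_ids:
                    foreign_keys.append((res, attr, target))
    return attributes, keys, foreign_keys
```

Traces can only support rule (3), never prove it: with few observations, small integer identifiers may be included in another resource's keys by coincidence, which is one reason the result should be read as a probable schema rather than the actual one.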

Finally, the identified schema will serve not only i) to highlight potential vulnerabilities to the privacy of users but also ii) to make recommendations to web application developers for strengthening their data access control strategies.