News 2017

Security & Privacy

Paul Francis to lead session at the IAPP Europe Data Protection Congress 2017

April 2017
The session, entitled “Challenges and Strategies for Certifying Data Anonymization for Data Sharing,” brings together technical and legal experts to explore how Data Protection Officers (DPOs) can manage the complexities and uncertainties of GDPR-compliant data anonymization. The IAPP Congress will be held November 7-9 in Brussels.

Session Abstract:

Data sharing is increasingly important. Companies share data internally across business units to gain business insights, they share data externally with data analytics vendors, and they often share data simply to make money. Ensuring the anonymity of users in the data set is necessary. The process of approving or certifying anonymization, however, is costly, time-consuming, and uncertain. Current approaches to anonymization are ad hoc at best. They require a custom strategy for each new data sharing scenario, and it is often unclear whether the data is really anonymized or not.

In this informative and lively session, corporate DPOs, vendors of analytics solutions, and privacy researchers share their experiences with data anonymization and the approval process. They provide case studies illustrating the pitfalls of "do it yourself" anonymization, and show how new ready-to-use anonymization approaches can eliminate the delays and guesswork of data anonymization.

Paul Francis to give keynote at Oakland '17 Workshop on Privacy Engineering

April 2017
Paul Francis will give the keynote address at the Oakland (IEEE S&P) Workshop on Privacy Engineering. The talk, entitled "The Diffix Framework: Revisiting Noise, Again", presents the first database anonymization system that exhibits low noise, unlimited queries, simple configuration, and rich query semantics while still giving strong anonymity.

The workshop will be held May 25 in San Jose, CA.

Talk Abstract:

For over 40 years, the holy grail of database anonymization has been a system that allows a wide variety of statistical queries with minimal answer distortion, places no limits on the number of queries, is easy to configure, and gives strong protection of individual user data. This keynote presents Diffix, a database anonymization system that promises to finally bring us within reach of that goal. Diffix adds noise to query responses, but "fixes" the noise to the response so that repeated instances of the same response produce the same noise. While this addresses the problem of averaging attacks, it opens the system to "difference attacks", which can reveal individual user data merely through the fact that two responses differ. Diffix proactively examines queries and responses to defend against difference attacks. This talk presents the design of Diffix, gives a demo of a commercial-quality implementation, and discusses shortcomings and next steps.
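
The idea of noise that is "fixed" to a response can be illustrated with a minimal sketch. This is not the Diffix implementation: the function names, the secret salt, the Gaussian noise, and the choice to seed the noise from the set of users behind a response are all assumptions made purely for illustration. The sketch only shows why repeating a query does not help an averaging attack when the same response always carries the same noise.

```python
import hashlib
import random


def sticky_noisy_count(user_ids, secret_salt, sd=2.0):
    """Count distinct users and add noise that is fixed for this exact user set.

    Hypothetical helper: the noise generator is seeded from a hash of the
    (sorted) user IDs plus a secret salt, so the same response always
    receives the same noise value.
    """
    distinct = sorted({str(u) for u in user_ids})
    canonical = ",".join(distinct)
    digest = hashlib.sha256(f"{secret_salt}|{canonical}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return len(distinct) + rng.gauss(0.0, sd)


def fresh_noisy_count(user_ids, rng, sd=2.0):
    """Baseline for comparison: independent noise on every call."""
    return len({str(u) for u in user_ids}) + rng.gauss(0.0, sd)


if __name__ == "__main__":
    users = [101, 102, 103, 104, 105]
    salt = "example-secret"  # assumed to be known only to the anonymizing system

    # Repeating the query yields the identical answer, so averaging gains nothing.
    sticky = [sticky_noisy_count(users, salt) for _ in range(1000)]
    print("distinct sticky answers:", len(set(sticky)))

    # With fresh noise, the average of many answers drifts toward the true count.
    rng = random.Random(0)
    fresh = [fresh_noisy_count(users, rng) for _ in range(1000)]
    print("mean of fresh answers:", sum(fresh) / len(fresh))
```

In this sketch, two responses built from user sets that differ by a single user receive unrelated noise values. Fixing the noise this way does nothing by itself about the difference attacks the abstract mentions, which is why the abstract describes a separate defense that examines queries and responses.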

Targeted malware paper accepted at NDSS '17

January 2017
The paper "A Broad View of the Ecosystem of Socially Engineered Exploit Documents" was accepted at NDSS '17 (Network and Distributed System Security Symposium). The authors include Stevens Le Blond, Cédric Gilbert, Utkarsh Upadhyay, and Manuel Gomez Rodriguez from MPI-SWS, as well as David Choffnes from Northeastern University.

Our understanding of exploit documents as a vector to deliver targeted malware is limited to a handful of studies done in collaboration with the Tibetans, Uyghurs, and political dissidents in the Middle East. In this measurement study, we present a complementary methodology relying only on publicly available data to capture and analyze targeted attacks with both greater scale and depth. In particular, we detect exploit documents uploaded over one year to a large anti-virus aggregator (VirusTotal) and then mine the social engineering information they embed to infer their likely targets and contextual information of the attacks. We identify attacks against two ethnic groups (Tibet and Uyghur) as well as 12 countries spanning America, Asia, and Europe. We then analyze the exploit documents dynamically in sandboxes to correlate and compare the exploited vulnerabilities and malware families targeting different groups. Finally, we use machine learning to infer the role of the uploaders of these documents to VirusTotal (i.e., attacker, targeted victim, or third-party), which enables their classification based only on their metadata, without any dynamic analysis. We make our datasets available to the academic community.