CHARACTERIZE Characterization of Malicious Code in Mobile Apps

01/06/2018 to 31/05/2021

The current momentum of Android has attracted malicious app writers, who distribute an increasingly large number of malware samples through official and alternative markets. A malicious app is typically designed and advertised as providing specific user-desired functionality, yet it is implemented to behave in a way that contradicts the user's interests. The palette of techniques used by malware ranges from the simple use of sensitive API methods (such as sendSMS) to more sophisticated exploitation of new vulnerabilities (such as data residue attacks after the uninstallation of popular apps). Malware writers can further leverage evasion techniques to complicate the job of security analysts, either by challenging static analysis approaches through the use of reflection, native code and string encryption, or by limiting the efficiency of dynamic analysis techniques by suppressing malicious behavior when the app runs in an emulated environment. Nonetheless, reports from antivirus vendors and studies in the literature regularly highlight the predominance of a set of malware families, within which samples are categorized based on the runtime behavior of app code (i.e., the malware activation process as well as the actions and data used by the malicious payloads).

In CHARACTERIZE, we build on the key assumption that malicious behavior types are, to some extent, instantiated by similar code patterns across the variety of malware samples. The first challenge is then, for each family, to identify recurring samples of malicious code and infer their common patterns or features in order to “characterize” them. The second challenge is to leverage these patterns or features to detect Android malware. To that end, CHARACTERIZE plans to explore two parallel directions:

(1) Explainable Per-Family Machine Learning Malware Detection;

(2) Pattern Matching Based Malware Detection.

We foresee the following contributions:

  • By relying on our previous work and on our AndroZoo dataset, CHARACTERIZE contributes by releasing a unique dataset of piggybacked apps and apps with lineage (i.e., different versions of the same app).
  • CHARACTERIZE provides a large-scale characterization of malware families by extracting patterns and features by means of “diff” computations.
  • CHARACTERIZE proposes per-family machine learning malware detectors using new (semantic) features. Per-family detection with semantic features makes it possible to explain why an app has been classified as malware.
  • CHARACTERIZE proposes a pattern-matching based approach, robust against obfuscation, to detect occurrences of malicious code fragments. The analyzed apps will first be instrumented to reduce the impact of obfuscation and ease the analysis (i.e., the pattern finding). The analysis will be static or dynamic according to the level of sophistication of the obfuscation.

The novelty of CHARACTERIZE lies in building on our capacity to learn from localized pieces of code that implement maliciousness in an app. A funded research project on this theme will allow us to reach new levels of expertise and yield significant contributions in the field of mobile malware detection in app markets.
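To make the “diff” idea concrete, the following is a minimal sketch and not the project's actual tooling. It assumes each app has already been reduced to a set of method signatures, and that each piggybacked app is paired with its original carrier app. The function names (`added_methods`, `recurring_patterns`, `matches`), the `min_support` threshold and the signature strings are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def added_methods(original, piggybacked):
    """Signatures present in the piggybacked app but not in its original."""
    return set(piggybacked) - set(original)


def recurring_patterns(pairs, min_support=2):
    """Keep added signatures that recur in at least `min_support` pairs.

    `pairs` is an iterable of (original, piggybacked) signature sets that
    are assumed to belong to one malware family.
    """
    counts = Counter()
    for original, piggybacked in pairs:
        counts.update(added_methods(original, piggybacked))
    return {sig: n for sig, n in counts.items() if n >= min_support}


def matches(app_methods, patterns):
    """Sorted list of family patterns that occur in the analyzed app."""
    return sorted(set(app_methods) & set(patterns))


# Hypothetical signatures, for illustration only.
family_pairs = [
    ({"game.Main.onCreate"},
     {"game.Main.onCreate", "ads.Hook.sendSms", "ads.Hook.init"}),
    ({"notes.Editor.save"},
     {"notes.Editor.save", "ads.Hook.sendSms", "util.Log.debug"}),
]

patterns = recurring_patterns(family_pairs)
print(matches({"chat.Ui.draw", "ads.Hook.sendSms"}, patterns))
```

Exact signature matching like this would break under identifier renaming, which is one reason the approach described above instruments apps before looking for patterns.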

Wednesday, 7 November, 2018

