Short Text Classification Approach to Identify Child Sexual Exploitation Material
Date
2020
Publisher
Journal of Applied Soft Computing
Abstract
Producing or sharing Child Sexual Exploitation Material (CSEM) is a serious crime that Law Enforcement Agencies (LEAs) fight vigorously. When an LEA seizes a computer from a potential producer or consumer of CSEM, it must analyze the files on the suspect's hard disk for evidence. However, manually inspecting file content for CSEM is time-consuming and, in most cases, infeasible within the time available to the Spanish police under a search warrant. A faster alternative to analyzing file content is to identify CSEM from file names and their absolute paths. The main challenge of this task is handling short text that the owners of this material deliberately distort with obfuscated words and user-defined naming patterns. This paper presents and compares two approaches based on short text classification to identify CSEM files. The first employs two independent supervised classifiers, one for the file name and one for the path, whose outputs are then fused into a single score. The second uses only the file name classifier, applying it iteratively over the file's absolute path. Both approaches operate on character n-grams, enrich the file name representation with binary and orthographic features, and use a binary Logistic Regression model for classification. The presented file classifier achieved an average class recall of 0.98. This solution could be integrated into forensic tools and services to help Law Enforcement Agencies identify CSEM without processing every file's visual content, which is computationally far more demanding.
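The two approaches compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `char_ngrams`, `toy_score`, `fused_score`, and `iterated_score` are hypothetical, `toy_score` is a neutral stand-in for a trained Logistic Regression classifier, and both the weighted-average fusion and the maximum over path components are assumptions, since the abstract does not state how scores are fused or aggregated.

```python
from pathlib import PurePosixPath
from typing import Callable, List


def char_ngrams(text: str, n_min: int = 2, n_max: int = 3) -> List[str]:
    """Return every character n-gram of length n_min..n_max (lowercased)."""
    text = text.lower()
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a trained classifier.

    Scores a string by the fraction of its character n-grams that also
    occur in the neutral placeholder word "flagged". A real system would
    use a model fitted on labelled file names instead.
    """
    watch = set(char_ngrams("flagged"))
    grams = char_ngrams(text)
    if not grams:
        return 0.0
    return sum(g in watch for g in grams) / len(grams)


def fused_score(
    path: str,
    name_clf: Callable[[str], float],
    path_clf: Callable[[str], float],
    weight: float = 0.5,
) -> float:
    """First approach: score file name and parent path separately, then fuse.

    The weighted average is an assumed fusion rule for illustration only.
    """
    p = PurePosixPath(path)
    return weight * name_clf(p.name) + (1.0 - weight) * path_clf(str(p.parent))


def iterated_score(path: str, name_clf: Callable[[str], float]) -> float:
    """Second approach: apply the file name classifier to each path component.

    Taking the maximum component score is an assumed aggregation rule.
    """
    parts = [part for part in PurePosixPath(path).parts if part != "/"]
    return max((name_clf(part) for part in parts), default=0.0)


if __name__ == "__main__":
    example = "/home/user/flagged/img001.jpg"
    print("fused:   ", fused_score(example, toy_score, toy_score))
    print("iterated:", iterated_score(example, toy_score))
```

In this toy setup, a suspicious directory name is diluted when the whole parent path is scored as one string, whereas iterating over components scores that directory in isolation. The abstract does not say whether this affects the two approaches in the paper.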
Type
Article
Keywords
International Resources, Spain, Child Sexual Abuse, Supervised Learning, Short Text Classification, File Name Classification, File Path Classification
Citation
Al-Nabki, M. W., Fidalgo, E., Alegre, E., & Alaiz-Rodríguez, R. (2020). Short Text Classification Approach to Identify Child Sexual Exploitation Material. arXiv preprint arXiv:2011.01113.