Romanian Journal of Information Science and Technology (ROMJIST)

An open – access publication

  |  HOME  |   GENERAL INFORMATION  |   ROMJIST ON-LINE  |  KEY INFORMATION FOR AUTHORS  |   COMMITTEES  |  

ROMJIST is a publication of Romanian Academy,
Section for Information Science and Technology

Editor – in – Chief:
Academician Dan Dascalu

Secretariate (office):
Adriana Neagu
Adress for correspondence: romjist@nano-link.net (after 1st of January, 2019)

Editing of the printed version: Mihaela Marian (Publishing House of the Romanian Academy, Bucharest)

Technical editor
of the on-line version:
Lucian Milea (University POLITEHNICA of Bucharest)

Sponsors:
• National Institute for R & D
in Microtechnologies
(IMT Bucharest), www.imt.ro
• Association for Generic
and Industrial Technologies (ASTEGI), www.astegi.ro

ROMJIST Volume 23, No. T, 2020, pp. T94-T105, Paper no. 676/2020
 

Xun LI, Lishui CHEN, Yazhe TANG
HARD: Bit-Split String Matching Using a Heuristic Algorithm to Reduce Memory Demand

ABSTRACT: High-speed content inspection relies on a fast multi-pattern matching algorithm to detect predefined rules. When the number of target rules becomes large, the memory requirements of the matching engine become a critical issue. An effective technique to design high-performance matching engines is to divide the target rule set into multiple subgroups and to use a parallel matching hardware unit for each subgroup. The key to this effective technique is how to find a strategy to divide subgroups. This paper proposes an effective rule classifying method referred to as HARD for heterogeneous bit-split string matching architectures. HARD uses the uniqueness of the target pattern to classify all target rule characters. This paper also presents a method to estimate the distance between strings in unique pattern category. The distance formula is next used to find a class for each rule. Furthermore, each class will be processed on different sizes of finite state machine. The experimental results show that the more the number of rules in the rule set, the more obvious the effect of HARD. In popular data sets, when the number of rules is above 4000, HARD can save nearly 50% of memory consumption compared to the previous bit-split string matching methods mentioned in the paper.

KEYWORDS: String Matching; State Machine Splitting; Finite State Machines; Fine-grained Traffic Identification

Read full text (pdf)






  |  HOME  |   GENERAL INFORMATION  |   ROMJIST ON-LINE  |  KEY INFORMATION FOR AUTHORS  |   COMMITTEES  |