I did some benchmarking of regular expression engines and the results are quite interesting. The benchmark involved finding all matches of different patterns in a 15 Mb text file. It is a similar benchmark to that found in other comparisons of regular expression engines (
here and
here). The following engines were compared.
- Delphi’s built-in engine (PCRE)
- A modified version of Jcl’s regular expression engine, also PCRE based. Three versions of this set up were tested:
- with Study and JIT enabled
- with Study and no JIT
- without Study and JIT
- Python’s built-in regular expression engine and
- A relative new comer FLRE a pascal-based regular expression engine.
Here follow the results:
Time | Match count
==============================================================================
Delphi's own TRegEx:
/Twain/ : 35.00 ms | 811
/(?i)Twain/ : 69.00 ms | 965
/[a-z]shing/ : 510.00 ms | 1540
/Huck[a-zA-Z]+|Saw[a-zA-Z]+/ : 581.00 ms | 262
/\b\w+nn\b/ : 548.00 ms | 262
/[a-q][^u-z]{13}x/ : 566.00 ms | 4094
/Tom|Sawyer|Huckleberry|Finn/ : 743.00 ms | 2598
/(?i)Tom|Sawyer|Huckleberry|Finn/ : 875.00 ms | 4152
/.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ : 2617.00 ms | 2598
/.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ : 2814.00 ms | 1976
/Tom.{10,25}river|river.{10,25}Tom/ : 421.00 ms | 2
/[a-zA-Z]+ing/ : 763.00 ms | 78423
/\s[a-zA-Z]{0,12}ing\s/ : 485.00 ms | 55201
/([A-Za-z]awyer|[A-Za-z]inn)\s/ : 1468.00 ms | 209
/["'][^"']{0,30}[?!\.]["']/ : 302.00 ms | 8885
Total Time: 12812.00 ms
==============================================================================
Modified TJclWideRegEx with Study and JIT:
/Twain/ : 9.00 ms | 811
/(?i)Twain/ : 30.00 ms | 965
/[a-z]shing/ : 74.00 ms | 1540
/Huck[a-zA-Z]+|Saw[a-zA-Z]+/ : 17.00 ms | 262
/\b\w+nn\b/ : 115.00 ms | 262
/[a-q][^u-z]{13}x/ : 181.00 ms | 4094
/Tom|Sawyer|Huckleberry|Finn/ : 18.00 ms | 2598
/(?i)Tom|Sawyer|Huckleberry|Finn/ : 50.00 ms | 4152
/.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ : 194.00 ms | 2598
/.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ : 216.00 ms | 1976
/Tom.{10,25}river|river.{10,25}Tom/ : 24.00 ms | 2
/[a-zA-Z]+ing/ : 168.00 ms | 78423
/\s[a-zA-Z]{0,12}ing\s/ : 97.00 ms | 55248
/([A-Za-z]awyer|[A-Za-z]inn)\s/ : 95.00 ms | 209
/["'][^"']{0,30}[?!\.]["']/ : 25.00 ms | 8885
Total Time: 1336.00 ms
==============================================================================
Modified TJclWideRegEx with Study no JIT:
/Twain/ : 11.00 ms | 811
/(?i)Twain/ : 42.00 ms | 965
/[a-z]shing/ : 272.00 ms | 1540
/Huck[a-zA-Z]+|Saw[a-zA-Z]+/ : 19.00 ms | 262
/\b\w+nn\b/ : 418.00 ms | 262
/[a-q][^u-z]{13}x/ : 384.00 ms | 4094
/Tom|Sawyer|Huckleberry|Finn/ : 23.00 ms | 2598
/(?i)Tom|Sawyer|Huckleberry|Finn/ : 209.00 ms | 4152
/.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ : 2664.00 ms | 2598
/.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ : 2730.00 ms | 1976
/Tom.{10,25}river|river.{10,25}Tom/ : 45.00 ms | 2
/[a-zA-Z]+ing/ : 627.00 ms | 78423
/\s[a-zA-Z]{0,12}ing\s/ : 279.00 ms | 55248
/([A-Za-z]awyer|[A-Za-z]inn)\s/ : 593.00 ms | 209
/["'][^"']{0,30}[?!\.]["']/ : 40.00 ms | 8885
Total Time: 8389.00 ms
==============================================================================
Modified TJclWideRegEx no Study no JIT:
/Twain/ : 10.00 ms | 811
/(?i)Twain/ : 43.00 ms | 965
/[a-z]shing/ : 341.00 ms | 1540
/Huck[a-zA-Z]+|Saw[a-zA-Z]+/ : 383.00 ms | 262
/\b\w+nn\b/ : 500.00 ms | 262
/[a-q][^u-z]{13}x/ : 659.00 ms | 4094
/Tom|Sawyer|Huckleberry|Finn/ : 716.00 ms | 2598
/(?i)Tom|Sawyer|Huckleberry|Finn/ : 984.00 ms | 4152
/.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ : 2769.00 ms | 2598
/.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ : 3130.00 ms | 1976
/Tom.{10,25}river|river.{10,25}Tom/ : 409.00 ms | 2
/[a-zA-Z]+ing/ : 845.00 ms | 78423
/\s[a-zA-Z]{0,12}ing\s/ : 501.00 ms | 55248
/([A-Za-z]awyer|[A-Za-z]inn)\s/ : 815.00 ms | 209
/["'][^"']{0,30}[?!\.]["']/ : 281.00 ms | 8885
Total Time: 12411.00 ms
============================================================================== Python Regular Expressions
Twain time: 0.00505 found mathces: 811
(?i)Twain time: 0.209 found mathces: 965
[a-z]shing time: 0.24 found mathces: 1540
Huck[a-zA-Z]+|Saw[a-zA-Z]+ time: 0.0807 found mathces: 262
\b\w+nn\b time: 0.513 found mathces: 262
[a-q][^u-z]{13}x time: 0.685 found mathces: 4081
Tom|Sawyer|Huckleberry|Finn time: 0.0715 found mathces: 2598
(?i)Tom|Sawyer|Huckleberry|Finn time: 0.93 found mathces: 4152
.{0,2}(Tom|Sawyer|Huckleberry|Finn) time: 0.957 found mathces: 2598
.{2,4}(Tom|Sawyer|Huckleberry|Finn) time: 0.935 found mathces: 1976
Tom.{10,25}river|river.{10,25}Tom time: 0.0922 found mathces: 2
[a-zA-Z]+ing time: 0.52 found mathces: 78423
\s[a-zA-Z]{0,12}ing\s time: 0.299 found mathces: 55201
([A-Za-z]awyer|[A-Za-z]inn)\s time: 0.358 found mathces: 209
["][^"]{0,30}[?!\.]["] time: 0.0175 found mathces: 5261
total time: 5910 ms
Time | Match count
==============================================================================
FLRE:
/Twain/ : 7.83 ms | 811
/(?i)Twain/ : 4.78 ms | 965
/[a-z]shing/ : 8.95 ms | 1540
/Huck[a-zA-Z]+|Saw[a-zA-Z]+/ : 8.14 ms | 262
/\b\w+nn\b/ : 50.53 ms | 262
/[a-q][^u-z]{13}x/ : 94.96 ms | 4094
/Tom|Sawyer|Huckleberry|Finn/ : 12.24 ms | 2598
/(?i)Tom|Sawyer|Huckleberry|Finn/ : 21.34 ms | 4152
/.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ : 47.25 ms | 2598
/.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ : 47.37 ms | 1976
/Tom.{10,25}river|river.{10,25}Tom/ : 46.06 ms | 2
/[a-zA-Z]+ing/ : 60.95 ms | 78423
/\s[a-zA-Z]{0,12}ing\s/ : 55.11 ms | 55248
/([A-Za-z]awyer|[A-Za-z]inn)\s/ : 10.76 ms | 209
/["'][^"']{0,30}[?!\.]["']/ : 49.21 ms | 8885
Total time: 542.93 ms
And the winner by a big margin is FLRE!
Notes:
- The time to read the file is not included in the timings.
- PCRE with JIT is also quite impressive. In other test it was found to compete well with the engines released by Intel and Google. However I found the JIT is very buggy (crashes often) and limited (works only with very few options).
No comments:
Post a Comment