iogf/crocs
Regex-like and Backus-Naur-like on python classes. A concrete and minimalist parsing library.
repo name | iogf/crocs |
repo link | https://github.com/iogf/crocs |
homepage | |
language | Python |
size (curr.) | 178 kB |
stars (curr.) | 470 |
created | 2017-07-09 |
license | Apache License 2.0 |
crocs
Write regexes using pure Python class/function syntax and test them more easily.
The idea behind crocs is to simplify the construction and debugging of regexes. You implement a regex using a function/class syntax, and the resulting structure is then compiled into a regex string. It is also possible to generate random inputs that match the regex pattern.
The examples below make this clearer.
Wildcard
from crocs.regex import Join, X
e = Join('a', X(), 'b')
e.test()
e.hits()
The above code gives you the regex string and also some possible matches.
Regex: a.b
Input: aob
Group dict: {}
Group 0: aob
Groups: ()
Match with:
akb a)b aKb aSb atb a{b aTb a!b a&b a7b
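As a cross-check that does not depend on crocs, the printed regex string can be fed to Python's standard `re` module. This is a minimal sketch that assumes only the regex `a.b` and the sample hits shown above; the variable names are illustrative.

```python
import re

# Regex string printed by e.test() in the example above.
pattern = re.compile(r'a.b')

# Sample hits copied from the "Match with:" line above.
hits = 'akb a)b aKb aSb atb a{b aTb a!b a&b a7b'.split()

# Every generated input should match the whole pattern.
all_match = all(pattern.fullmatch(h) for h in hits)
print(all_match)
```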
A regex can be thought of as a sequence of patterns that are joined together. Crocs offers regex operators as Python classes. You reason in terms of these classes to implement the search patterns you want.
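To illustrate the general idea of operators as classes that compile to a regex string, here is a toy sketch. It is not the crocs implementation: `ToyJoin`, `ToyAny` and `to_regex` are hypothetical names invented for this illustration only.

```python
import re

class ToyAny:
    """Hypothetical stand-in for a wildcard operator (not a crocs class)."""
    def to_regex(self):
        return '.'

class ToyJoin:
    """Hypothetical stand-in for a concatenation operator (not a crocs class)."""
    def __init__(self, *parts):
        self.parts = parts

    def to_regex(self):
        # Plain strings are escaped; operator objects compile themselves.
        return ''.join(
            re.escape(p) if isinstance(p, str) else p.to_regex()
            for p in self.parts
        )

toy_pattern = ToyJoin('a', ToyAny(), 'b').to_regex()
print(toy_pattern)  # a.b
```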
Sets
A simple set built from a character range looks like this:
from crocs.regex import Join, Include, Seq
e = Join('x', Include(Seq('0', '9')))
e.test()
e.hits()
That would give you the possible hits:
Regex: x[0-9]
Input: x0
Group dict: {}
Group 0: x0
Groups: ()
Match with:
x0 x2 x4 x9 x2 x5 x0 x5 x7 x3
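The same kind of check works for the set example. The sketch below assumes only the printed regex `x[0-9]` and the hits listed above, and uses the standard `re` module rather than crocs.

```python
import re

# Regex string printed for Join('x', Include(Seq('0', '9'))) above.
pattern = re.compile(r'x[0-9]')

# Sample hits copied from the "Match with:" line above.
hits = 'x0 x2 x4 x9 x2 x5 x0 x5 x7 x3'.split()

# Keep only the inputs that the pattern accepts in full.
matched = [h for h in hits if pattern.fullmatch(h)]

# A letter after 'x' falls outside the 0-9 range and is rejected.
rejected = pattern.fullmatch('xa')
print(matched == hits, rejected)
```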
Groups
from crocs.regex import Join, Group, X
e = Join('a', Group('b', X()))
e.test()
e.hits()
This outputs:
[tau@archlinux demo]$ python group.py
Regex: a(b.)
Input: abH
Group dict: {}
Group 0: abH
Groups: ('bH',)
Match with:
abH abH abH abH abH abH abH abH abH abH
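The "Group 0", "Groups" and "Group dict" lines correspond to what a match object from the standard `re` module reports. A minimal sketch, assuming only the regex `a(b.)` and the input `abH` shown above:

```python
import re

# Regex string and input copied from the output above.
m = re.fullmatch(r'a(b.)', 'abH')

group0 = m.group(0)        # the whole match
groups = m.groups()        # the captured groups, in order
groupdict = m.groupdict()  # empty, because the group is not named

print(group0, groups, groupdict)
```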
Concrete Example
This example catches mail addresses whose domain is ‘br’ and whose hostname starts with ‘python’. It also makes sure that the first letter of the mail name is in the set a-z.
from crocs.regex import Seq, Include, Repeat, Join, NamedGroup
# First we define what our patterns look like.
name_valid_letters = Seq('a', 'z')
name_valid_numbers = Seq('0', '9')
name_valid_signs = '_.-'
# Include works a bit like Repeat, except that it matches a single char.
# You can think of it as fetching one char from the described sets.
name_valid_chars = Include(name_valid_letters,
    name_valid_numbers, name_valid_signs)
# Think of the Repeat class as meaning: fetch the
# described pattern one or more times.
name_chunk = Repeat(name_valid_chars, 1)
# The first letter in the mail name has to be in 'a-z'.
name_fmt = Join(Include(name_valid_letters), name_chunk)
# Think of group as a way to keep reference
# to the fetched chunk.
name = NamedGroup('name', name_fmt)
# The hostname part looks like the name, except that
# it starts with 'python' and the chars that follow
# are restricted to 'a-z'.
hostname_chars = Include(name_valid_letters)
hostname_chunk = Repeat(hostname_chars, 1)
# Finally we build the complete hostname with Join.
hostname_fmt = Join('python', hostname_chunk)
# Keep reference for the group.
hostname = NamedGroup('hostname', hostname_fmt)
# Keep reference of the domain chunk.
domain = NamedGroup('domain', 'br')
# Finally we generate the regex and check what it looks like.
match_mail = Join(name, '@', hostname, '.', domain)
match_mail.test()
match_mail.hits()
That would output:
[tau@archlinux demo]$ python xmails.py
Regex: (?P<name>[a-z][a-z0-9_\.\-]{1,})@(?P<hostname>python[a-z]{1,})\.(?P<domain>br)
Input: jd7.gs2@pythontritd.br
Group dict: {'name': 'jd7.gs2', 'hostname': 'pythontritd', 'domain': 'br'}
Group 0: jd7.gs2@pythontritd.br
Groups: ('jd7.gs2', 'pythontritd', 'br')
Match with:
ppm4nh5s@pythong.br xc61_c_qic@pythonvbyzldk.br qpq.63@pythonzwwl.br
t8@pythongfmwhje.br pqf@pythonbofrqbrcfk.br k65vyirxs@pythonttahjeup.br
i.e3ui._@pythonylsg.br m0@pythonubjdm.br ijbf_ktux@pythonhdlh.br rtza45@pythonerypbo.br
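Once generated, the regex string can be used directly with the standard `re` module. The sketch below copies the printed regex verbatim; `parse_mail` is a hypothetical helper introduced for illustration, and the use of `fullmatch` (rather than whatever matching mode crocs uses internally) is an assumption.

```python
import re

# Regex string printed by match_mail.test() above, copied verbatim.
MAIL_RE = re.compile(
    r'(?P<name>[a-z][a-z0-9_\.\-]{1,})'
    r'@(?P<hostname>python[a-z]{1,})'
    r'\.(?P<domain>br)'
)

def parse_mail(address):
    """Hypothetical helper: return the named groups, or None if no match."""
    m = MAIL_RE.fullmatch(address)
    return m.groupdict() if m else None

print(parse_mail('jd7.gs2@pythontritd.br'))
print(parse_mail('7abc@pythonx.br'))  # first char is not in a-z
print(parse_mail('abc@python.br'))    # nothing follows 'python'
```

The first call reproduces the "Group dict" line from the output above; the other two return None because they break the first-letter and hostname rules respectively.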
Install
Note: Works with Python 3 only.
pip install crocs