User:MarkMYoung/regular expressions

Useful Regular Expressions

Incomplete or incorrect regular expressions are scattered all over the Internet, and books are reluctant to list them because the author would likely have to publish errata at some point. So I am compiling a list of regular expressions here (these may also be incorrect, but at least they are in one place). One decent source is the Regexp::Common module available from CPAN. However, I was prompted to maintain this page when I discovered that the Regexp::Common::net (v2.120) regular expression for a decimal IPv4 address unit, (?k:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}), accepts '05' as a decimal unit (a leading zero is commonly read as octal), and that the module has no IPv6 regular expression.
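The leading-zero problem can be checked directly. The following is a minimal sketch: $OLD_OCTET is the pattern quoted above with the (?k:...) capture marker replaced by a plain group, and $STRICT_OCTET is an illustrative alternative (an assumption of this sketch, not taken from any module) that rejects leading zeros.

```perl
use strict;
use warnings;

# Octet pattern quoted above from Regexp::Common::net v2.120 (capture marker dropped).
my $OLD_OCTET = qr/(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})/;

# Illustrative stricter pattern (assumption): 0-255 without leading zeros.
my $STRICT_OCTET = qr/(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])/;

for my $unit (qw(0 5 05 005 99 255 256)) {
    # Anchor both patterns so the whole unit must match.
    my $old    = $unit =~ /\A$OLD_OCTET\z/    ? 'yes' : 'no';
    my $strict = $unit =~ /\A$STRICT_OCTET\z/ ? 'yes' : 'no';
    printf "%-4s old=%-3s strict=%s\n", $unit, $old, $strict;
}
```

The old pattern should report '05' and '005' as matches, while the stricter one should not.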

CSV

my $CSV_REGEXP = qr/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/;
my $tsv = join( "\t", split( $CSV_REGEXP, $csv ));

Note that the split does not remove the double quotes around quoted fields, which are superfluous once the data is tab-separated.
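If the quotes should go as well, each field can be unquoted after the split. This is only a sketch: split_csv_line is a hypothetical helper name, and the handling of doubled quotes ("" inside a quoted field meaning one literal quote) follows the common CSV convention rather than anything stated above.

```perl
use strict;
use warnings;

# Same pattern as above: a comma followed by an even number of double quotes.
my $CSV_REGEXP = qr/,(?=(?:[^"]*"[^"]*")*(?![^"]*"))/;

# Hypothetical helper: split one CSV line and strip the enclosing quotes.
sub split_csv_line {
    my ($line) = @_;
    my @fields = split $CSV_REGEXP, $line, -1;    # -1 keeps trailing empty fields
    for my $field (@fields) {
        if ( $field =~ /\A"(.*)"\z/s ) {
            $field = $1;
            $field =~ s/""/"/g;    # a doubled quote stands for one literal quote
        }
    }
    return @fields;
}

my @fields = split_csv_line('1,"Smith, John","say ""hi""",');
print join( '|', @fields ), "\n";
```

For the sample line this should print four fields, the last one empty, with the comma inside "Smith, John" preserved.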

Decimal Number

my $DECIMAL_REGEXP = qr/^([-+]?(?:(?:\d+\.?\d*)|(?:\d*\.?\d+)))$/;
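A quick way to see what this pattern accepts is to run it over a few candidates. The sample strings below are illustrative only. Note that it accepts a trailing or leading decimal point ('7.' and '.5') but not a lone '.', and that it has no notion of exponents. Because the pattern ends in $ rather than \z, a single trailing newline is also tolerated.

```perl
use strict;
use warnings;

my $DECIMAL_REGEXP = qr/^([-+]?(?:(?:\d+\.?\d*)|(?:\d*\.?\d+)))$/;

for my $candidate ( '42', '-3.14', '+.5', '7.', '.', '1e5', '1.2.3', '' ) {
    my $verdict = $candidate =~ $DECIMAL_REGEXP ? 'match' : 'no match';
    printf "%-7s %s\n", "'$candidate'", $verdict;
}
```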

Domain Name

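This regular expression merely checks that each dot-separated label consists of permitted characters and is between 1 and 63 characters long. Note that \w also admits the underscore, which is not valid in a host name, and that the length limit only holds when the match is anchored to the whole string.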
my $DOMAINNAME_CHARSET_REGEXP = qr/[\w\-]/;
my $DOMAINNAME_UQ_REGEXP = qr/(?:(?:$DOMAINNAME_CHARSET_REGEXP){1,63})(?:\.(?:$DOMAINNAME_CHARSET_REGEXP){1,63})*/;

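One can use either the specific (enumerated) or the more general top-level domain regular expression.
# The leading dot applies to every alternative, not only to the two-letter country codes.
my $DOMAINNAME_TLD_ENUM_REGEXP = qr/(?:\.(?:[a-zA-Z]{2}|aero|biz|com|gov|info|jobs|museum|name|net|org))/i;
my $DOMAINNAME_TLD_REGEXP = qr/(?:\.[a-zA-Z]{2,6})/;
# Interpolate into qr// so the result is a compiled regular expression rather than a plain string.
my $DOMAINNAME_FQ_REGEXP = qr/$DOMAINNAME_UQ_REGEXP$DOMAINNAME_TLD_REGEXP/;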

This regular expression excludes hyphens at the beginning, after a dot, consecutively, before a dot, and at the end.
my $DOMAINNAME_MISPLACED_HYPHENS_REGEXP = qr/(?:\A\-)|(?:\.\-)|(?:\-\-)|(?:\-\.)|(?:\-\z)/;

Keep in mind that something as simple as the word "a" or the text "0.7-1.2" matches as an unqualified hostname. So, this regular expression is good for validation, but not for searching.
my $domainLength_i = length( $hostName_str );
my $isValidDomainLength_b = (($domainLength_i >= 1) && ($domainLength_i <= 255));
# Anchor with \A and \z so that the whole string, not just a substring, must match.
my $isUqDomainName_b = ($isValidDomainLength_b && ($hostName_str !~ $DOMAINNAME_MISPLACED_HYPHENS_REGEXP) && ($hostName_str =~ /\A$DOMAINNAME_UQ_REGEXP\z/));

Additionally requiring a top-level domain at the end of the name is much better suited for searching.
my $isFqDomainName_b = ($isUqDomainName_b && ($hostName_str =~ /$DOMAINNAME_TLD_REGEXP\z/));
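The checks above can be wrapped into two small functions. This is a sketch under some assumptions: the function names is_uq_domain_name and is_fq_domain_name are hypothetical, the patterns are repeated under shorter names so that the block runs on its own, and the matches are anchored with \A and \z so that the whole string has to be a domain name.

```perl
use strict;
use warnings;

my $CHARSET           = qr/[\w\-]/;
my $UQ                = qr/(?:(?:$CHARSET){1,63})(?:\.(?:$CHARSET){1,63})*/;
my $TLD               = qr/(?:\.[a-zA-Z]{2,6})/;
my $MISPLACED_HYPHENS = qr/(?:\A\-)|(?:\.\-)|(?:\-\-)|(?:\-\.)|(?:\-\z)/;

# Hypothetical helper: length, hyphen placement, then character set per label.
sub is_uq_domain_name {
    my ($name) = @_;
    my $length = length $name;
    return 0 if $length < 1 || $length > 255;
    return 0 if $name =~ $MISPLACED_HYPHENS;
    return $name =~ /\A$UQ\z/ ? 1 : 0;
}

# Hypothetical helper: an unqualified name that also ends in a top-level domain.
sub is_fq_domain_name {
    my ($name) = @_;
    return ( is_uq_domain_name($name) && $name =~ /$TLD\z/ ) ? 1 : 0;
}

for my $name ( 'a', '0.7-1.2', 'www.example.com', '-bad.example.com', 'exa mple.com' ) {
    printf "%-18s uq=%d fq=%d\n", $name, is_uq_domain_name($name), is_fq_domain_name($name);
}
```

As noted above, 'a' and '0.7-1.2' pass the unqualified check; only 'www.example.com' should pass both.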

Here is a reasonable one-line regular expression that does not check for overall length greater than 255 or misplaced hyphens.
my $DOMAINNAME_FQ_REASONABLE_REGEXP = qr/(?:[\w\-]{1,63})(?:\.[\w\-]{1,63})*(?:\.[a-zA-Z]{2,6})/;
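For searching, the one-line expression can be applied with the /g modifier to pull candidate names out of running text. The sample sentence below is made up for illustration. The sketch also shows a limitation: anything shaped like name.letters is reported, including file names.

```perl
use strict;
use warnings;

my $DOMAINNAME_FQ_REASONABLE_REGEXP = qr/(?:[\w\-]{1,63})(?:\.[\w\-]{1,63})*(?:\.[a-zA-Z]{2,6})/;

# Illustrative input text (not from the original page).
my $text = 'Mail relay.example.org or see www.example.com/index.html for details.';

# In list context with /g, the match returns every captured candidate.
my @found = $text =~ /($DOMAINNAME_FQ_REASONABLE_REGEXP)/g;
print "$_\n" for @found;
```

Here the search should report relay.example.org and www.example.com, but also index.html, so the results still need filtering (for example against the enumerated top-level domain list).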

E-Mail Address / Username

my $USERNAME_CHARSET_REGEXP = qr/[\w\!\#\$\%\&\'\`\*\+\/\=\?\^\{\|\}\~\-]/;
my $USERNAME_REASONABLE_REGEXP = qr/(?:$USERNAME_CHARSET_REGEXP)+(?:\.(?:$USERNAME_CHARSET_REGEXP)+)*/;
my $EMAIL_REASONABLE_REGEXP = qr/$USERNAME_REASONABLE_REGEXP\@$DOMAINNAME_FQ_REGEXP/;
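A self-contained sketch of the e-mail check follows. It repeats the user name patterns under shorter names and, as an assumption, uses the "reasonable" one-line domain pattern from the previous section for the host part. The helper name is_reasonable_email is hypothetical. The match is anchored so that the whole string must be an address.

```perl
use strict;
use warnings;

my $USERNAME_CHARSET = qr/[\w\!\#\$\%\&\'\`\*\+\/\=\?\^\{\|\}\~\-]/;
my $USERNAME         = qr/(?:$USERNAME_CHARSET)+(?:\.(?:$USERNAME_CHARSET)+)*/;
# Assumption: the one-line domain pattern stands in for the fully qualified one.
my $DOMAIN           = qr/(?:[\w\-]{1,63})(?:\.[\w\-]{1,63})*(?:\.[a-zA-Z]{2,6})/;
my $EMAIL            = qr/$USERNAME\@$DOMAIN/;

# Hypothetical helper: true if the whole string looks like an e-mail address.
sub is_reasonable_email {
    my ($address) = @_;
    return $address =~ /\A$EMAIL\z/ ? 1 : 0;
}

for my $address ( 'john.doe+tag@example.com', '.john@example.com',
                  'john..doe@example.com', 'john@localhost' ) {
    printf "%-26s %s\n", $address, is_reasonable_email($address) ? 'ok' : 'rejected';
}
```

Dots are only allowed between non-empty runs of user name characters, so a leading dot or two consecutive dots are rejected, as is a host part without a top-level domain.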

IP Address

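# One decimal unit (0-255, no leading zeros); longest alternatives first so that an unanchored search does not stop early.
my $IP4_OCTET_REGEXP = qr/(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|\d)/;
my $IP4_REGEXP = qr/$IP4_OCTET_REGEXP(?:\.$IP4_OCTET_REGEXP){3}/;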
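# Expects eight colon-separated groups of which the six middle ones may be empty. General "::" compression (e.g. ::1) is therefore not matched, and strings such as 1:::::::2 are wrongly accepted.
my $IP6_REGEXP = qr/(?:[\dA-Fa-f]{1,4})(?:\:[\dA-Fa-f]{0,4}){6}(?:\:[\dA-Fa-f]{1,4})/;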
my $IP_REGEXP = qr/(?:$IP4_REGEXP|$IP6_REGEXP)/;
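The following sketch shows one way to combine an IPv4 and an IPv6 pattern into a single anchored check. The names $OCTET, $IP6_FULL and is_ip_address are hypothetical, and the IPv6 part is deliberately limited to the full eight-group notation (an assumption of this sketch), so compressed addresses are rejected. The alternation is built inside qr// because joining the parts with the . operator and a separate qr/|/ would concatenate the patterns instead of alternating between them.

```perl
use strict;
use warnings;

# One decimal unit, 0-255, without leading zeros.
my $OCTET    = qr/(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])/;
my $IP4      = qr/$OCTET(?:\.$OCTET){3}/;
# Assumption: only the uncompressed eight-group form is handled.
my $IP6_FULL = qr/[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4}){7}/;
my $IP       = qr/(?:$IP4|$IP6_FULL)/;

# Hypothetical helper: true if the whole string is an address in either form.
sub is_ip_address {
    my ($text) = @_;
    return $text =~ /\A$IP\z/ ? 1 : 0;
}

for my $text ( '192.168.1.255', '192.168.01.1', '256.0.0.1',
               '2001:0db8:0000:0000:0000:ff00:0042:8329', '2001:db8::1' ) {
    printf "%-40s %s\n", $text, is_ip_address($text) ? 'ok' : 'rejected';
}
```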

MAC Address

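# Six groups of one or two hexadecimal digits separated by colons or hyphens, e.g. 00:1A:2b:3C:4d:5E.
my $MAC_REGEXP = qr/(?:[0-9a-fA-F]{1,2})(?:[:\-][0-9a-fA-F]{1,2}){5}/;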

URI

my $URI_REGEXP = qr/(?:([^:\/?#]+):)?(?:\/\/([^\/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?/;
my($scheme, $authority, $path, $query, $fragment) = $uri =~ m/$URI_REGEXP/;
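The pattern corresponds to the generic component-splitting expression given in RFC 3986, Appendix B. As a usage sketch, the five components can be collected into a hash. The helper name parse_uri and the sample URIs are illustrative assumptions. Components that are absent from the input come back as undef rather than as empty strings.

```perl
use strict;
use warnings;

my $URI_REGEXP = qr/(?:([^:\/?#]+):)?(?:\/\/([^\/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?/;

# Hypothetical helper: return the five components as a hash reference.
sub parse_uri {
    my ($uri) = @_;
    my %part;
    @part{qw(scheme authority path query fragment)} = $uri =~ m/$URI_REGEXP/;
    return \%part;
}

for my $uri ( 'http://www.example.com/a/b.html?x=1&y=2#top', 'mailto:user@example.com' ) {
    my $part = parse_uri($uri);
    print "$uri\n";
    for my $name (qw(scheme authority path query fragment)) {
        my $value = defined $part->{$name} ? $part->{$name} : '(absent)';
        print "  $name: $value\n";
    }
}
```

Since every component is optional, the expression matches any string; it splits a URI into parts but does not validate it.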
