About validateUrlSyntax():
This function checks whether an HTTP URL is formatted properly and
returns either true or false.

I used RFC 2396 (URI: Generic Syntax) as my guide when creating the
regular expression. For all the details, see the comments below.

validateUrlSyntax( url_to_check[, options])

url_to_check - string - The URL to check

options - string - An optional string of options to set which parts of
        the URL are required, optional, or not allowed. Each option
        must be followed by a "+" for required, "?" for optional, or
        "-" for not allowed.

        s - Scheme. Allows "+?-", defaults to "s?"
            H - http:// Allows "+?-", defaults to "H?"
            S - https:// (SSL). Allows "+?-", defaults to "S?"
            E - mailto: (email). Allows "+?-", defaults to "E-"
            F - ftp:// Allows "+?-", defaults to "F-"
                Dependent on scheme being enabled
        u - User section. Allows "+?-", defaults to "u?"
            P - Password in user section. Allows "+?-", defaults to "P?"
                Dependent on user section being enabled
        a - Address (IP or domain). Allows "+?-", defaults to "a+"
            I - IP address. Allows "+?-", defaults to "I?"
                If I+, then domains are disabled
                If I-, then domains are required
                Dependent on address being enabled
        p - Port number. Allows "+?-", defaults to "p?"
        f - File path. Allows "+?-", defaults to "f?"
        q - Query section. Allows "+?-", defaults to "q?"
        r - Fragment (anchor). Allows "+?-", defaults to "r?"
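
As a rough illustration of how an options string maps onto the list above, the sketch below resolves every option letter to "+", "?" or "-", using the documented default when a letter is absent. The helper name resolveUrlOptions() is hypothetical and not part of this file, and throwing an exception instead of calling trigger_error() is an assumption made to keep the sketch short.

```php
<?php
// Hypothetical helper for illustration; it is not part of the original file.
// Resolves each option key to "+", "?" or "-", falling back to the defaults
// documented above when a key is absent from the options string.
function resolveUrlOptions(string $options): array
{
    // Defaults copied from the option list above.
    $defaults = [
        's' => '?', 'H' => '?', 'S' => '?', 'E' => '-', 'F' => '-',
        'u' => '?', 'P' => '?', 'a' => '+', 'I' => '?',
        'p' => '?', 'f' => '?', 'q' => '?', 'r' => '?',
    ];

    // Every option letter must be followed by exactly one of "+", "?" or "-".
    if (!preg_match('/^([sHSEFuPaIpfqr][+?-])*$/D', $options)) {
        throw new InvalidArgumentException('Options attribute malformed');
    }

    $resolved = [];
    foreach ($defaults as $key => $default) {
        $pos = strpos($options, $key); // case sensitive: "s" and "S" differ
        $resolved[$key] = ($pos === false) ? $default : $options[$pos + 1];
    }
    return $resolved;
}

// 's+u-I-' requires a scheme and forbids the user section and IP addresses;
// every other part keeps its default.
$resolved = resolveUrlOptions('s+u-I-');
```

Because the lookup uses strpos(), the first occurrence of a letter wins if the same option is given twice.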

Paste the function code, or include_once() this template, at the top of the page
in which you wish to use this function.

validateUrlSyntax('http://george@www.cnn.com/#top')

validateUrlSyntax('https://games.yahoo.com:8080/board/chess.htm?move=true')

validateUrlSyntax('http://www.hotmail.com/', 's+u-I-p-q-r-')

validateUrlSyntax('/directory/file.php#top', 's-u-a-p-f+')

if (validateUrlSyntax('http://www.canowhoopass.com/', 'u-')) {
    echo 'URL SYNTAX IS VERIFIED';
} else {
    echo 'URL SYNTAX IS ILLEGAL';
}

-Added new TLDs - .jobs, .mobi, .post and .travel. They are official, but not yet active.

-Fixed bug allowing empty username even when it was required
-Changed and added a few options to add extra schemes
-Added mailto: ftp:// and http:// options
-https option was 'l' now it is 'S' (capital)
-Added password option. Now passwords can be disabled while usernames are ok (for email)
-IP Address option was 'i' now it is 'I' (capital)
-Options are now case sensitive
-Added validateEmailSyntax() and validateFtpSyntax() functions below

-IP group range is more specific. Used to allow 0-299. Now it is 0-255
-Port range more specific. Used to allow 0-69999. Now it is 0-65535
-Fixed bug disallowing 'i-' option.
-Changed license to GPL

-Fixed bug disallowing 'l-' option. Thanks Dr. Cheap

-Added options parameter to make it easier for people to plug the function in
 without needing to rework the code.
-Split the example application away from the function

-Easier to disable sections
-Easier to port to other languages
-Easier to port to verify email addresses
-Uses only simple regular expressions so it is more portable
-Follows the RFC more closely for domain names. Some "play" domains may break
-Renamed from 'verifyUrl()' to 'validateUrlSyntax()'
-Removed extra code which added 'http://' and trailing '/' if it was missing
    -That code was better suited for a massaging function, not verifying
-Now splits up and forces '/path?query#fragment' order
-No longer requires a path when using a query or fragment

-Allowed port numbers above 9999. Now allows up to 69999

-Added new top level domains
    -aero, coop, museum, name, info, biz, pro

Intentional Limitations:
-Does not verify that the URL actually exists. It only validates the syntax.
-Strictly follows the RFC standards. Some URLs that exist in the wild will
 not validate, including ones with square brackets '[]' in the query section.
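
One possible workaround for the square bracket limitation, offered only as a sketch and not as part of the original code, is to percent-encode the brackets in the query section before validating. The helper name encodeQueryBrackets() and the example URL are hypothetical, and whether a given server treats "%5B%5D" the same as "[]" is an assumption you would need to check.

```php
<?php
// Hypothetical helper for illustration; it is not part of the original file.
function encodeQueryBrackets(string $url): string
{
    $pos = strpos($url, '?');
    if ($pos === false) {
        return $url; // no query section, nothing to encode
    }
    // Leave everything before the "?" untouched; encode brackets after it.
    return substr($url, 0, $pos)
        . str_replace(['[', ']'], ['%5B', '%5D'], substr($url, $pos));
}

echo encodeQueryBrackets('http://www.example.com/list.php?ids[]=1&ids[]=2'), "\n";
```

The encoded form uses only "%XX" escape sequences, which the query section of the regular expression accepts.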

Rod Apeldoorn - rod(at)canowhoopass(dot)com

http://www.canowhoopass.com/

-WEAV -Several members of Weav helped to test - http://weav.bc.ca/
-There were also a number of emails from other developers expressing
 thanks and suggestions. It is nice to be appreciated. Thanks!

Copyright 2004, Rod Apeldoorn

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

To view the license online, go to: http://www.gnu.org/copyleft/gpl.html

Alternate Commercial Licenses:
For information in regards to alternate licensing, contact me.

// BEGINNING OF validateUrlSyntax() function
function validateUrlSyntax( $urladdr, $options="" ){

    // Force Options parameter to be lower case
    // DISABLED PERMANENTLY - OK to remove from code
    //    $options = strtolower($options);

    // Check Options Parameter (ereg() was removed in PHP 7, so preg_match() is used)
    if (!preg_match( '/^([sHSEFuPaIpfqr][+?-])*$/D', $options ))
    {
        trigger_error("Options attribute malformed", E_USER_ERROR);
    }

    // Set Options Array, set defaults if options are not specified
    $aOptions = array();
    // Scheme
    if (strpos( $options, 's') === false) $aOptions['s'] = '?';
    else $aOptions['s'] = substr( $options, strpos( $options, 's') + 1, 1);
    // http://
    if (strpos( $options, 'H') === false) $aOptions['H'] = '?';
    else $aOptions['H'] = substr( $options, strpos( $options, 'H') + 1, 1);
    // https:// (SSL)
    if (strpos( $options, 'S') === false) $aOptions['S'] = '?';
    else $aOptions['S'] = substr( $options, strpos( $options, 'S') + 1, 1);
    // mailto: (email)
    if (strpos( $options, 'E') === false) $aOptions['E'] = '-';
    else $aOptions['E'] = substr( $options, strpos( $options, 'E') + 1, 1);
    // ftp://
    if (strpos( $options, 'F') === false) $aOptions['F'] = '-';
    else $aOptions['F'] = substr( $options, strpos( $options, 'F') + 1, 1);
    // User section
    if (strpos( $options, 'u') === false) $aOptions['u'] = '?';
    else $aOptions['u'] = substr( $options, strpos( $options, 'u') + 1, 1);
    // Password in user section
    if (strpos( $options, 'P') === false) $aOptions['P'] = '?';
    else $aOptions['P'] = substr( $options, strpos( $options, 'P') + 1, 1);
    // Address section
    if (strpos( $options, 'a') === false) $aOptions['a'] = '+';
    else $aOptions['a'] = substr( $options, strpos( $options, 'a') + 1, 1);
    // IP Address in address section
    if (strpos( $options, 'I') === false) $aOptions['I'] = '?';
    else $aOptions['I'] = substr( $options, strpos( $options, 'I') + 1, 1);
    // Port number
    if (strpos( $options, 'p') === false) $aOptions['p'] = '?';
    else $aOptions['p'] = substr( $options, strpos( $options, 'p') + 1, 1);
    // File path
    if (strpos( $options, 'f') === false) $aOptions['f'] = '?';
    else $aOptions['f'] = substr( $options, strpos( $options, 'f') + 1, 1);
    // Query section
    if (strpos( $options, 'q') === false) $aOptions['q'] = '?';
    else $aOptions['q'] = substr( $options, strpos( $options, 'q') + 1, 1);
    // Fragment (anchor)
    if (strpos( $options, 'r') === false) $aOptions['r'] = '?';
    else $aOptions['r'] = substr( $options, strpos( $options, 'r') + 1, 1);

    // Loop through options array, replacing "-" with "{0}" and "+" with ""
    foreach($aOptions as $key => $value)
    {
        if ($value === '-') { $aOptions[$key] = '{0}'; }
        if ($value === '+') { $aOptions[$key] = ''; }
    }

    // DEBUGGING - Uncomment following line to display to screen current option values
    // echo '<pre>'; print_r($aOptions); echo '</pre>';

    // Preset Allowed Characters
    $alphanum    = '[a-zA-Z0-9]';  // Alpha Numeric
    $unreserved  = '[a-zA-Z0-9_.!~*' . '\'' . '()-]';
    $escaped     = '(%[0-9a-fA-F]{2})'; // Escape sequence - In Hex - %6d would be a 'm'
    $reserved    = '[;/?:@&=+$,]'; // Special characters in the URI

    // Beginning Regular Expression
    // Scheme - Allows for 'http://', 'https://', 'mailto:', or 'ftp://'
    $scheme = '(';
    if     ($aOptions['H'] === '') { $scheme .= 'http://'; }
    elseif ($aOptions['S'] === '') { $scheme .= 'https://'; }
    elseif ($aOptions['E'] === '') { $scheme .= 'mailto:'; }
    elseif ($aOptions['F'] === '') { $scheme .= 'ftp://'; }
    else
    {
        if ($aOptions['H'] === '?') { $scheme .= '|(http://)'; }
        if ($aOptions['S'] === '?') { $scheme .= '|(https://)'; }
        if ($aOptions['E'] === '?') { $scheme .= '|(mailto:)'; }
        if ($aOptions['F'] === '?') { $scheme .= '|(ftp://)'; }
        $scheme = str_replace('(|', '(', $scheme); // fix first pipe
    }
    $scheme .= ')' . $aOptions['s'];
    // End setting scheme

    // User Info - Allows for 'username@' or 'username:password@'. Note: contrary to the RFC, I removed ':' from the username section, allowing it only in the password.
    //           /---------------- Username -----------------------\  /-------------------------------- Password ------------------------------\
    $userinfo = '((' . $unreserved . '|' . $escaped . '|[;&=+$,]' . ')+(:(' . $unreserved . '|' . $escaped . '|[;:&=+$,]' . ')+)' . $aOptions['P'] . '@)' . $aOptions['u'];

    // IP ADDRESS - Allows 0.0.0.0 to 255.255.255.255
    $ipaddress = '((((2(([0-4][0-9])|(5[0-5])))|([01]?[0-9]?[0-9]))\.){3}((2(([0-4][0-9])|(5[0-5])))|([01]?[0-9]?[0-9])))';

    // Tertiary Domain(s) - Optional - Multi - Although some sites may use other characters, the RFC says tertiary domains have the same naming restrictions as second level domains
    $domain_tertiary = '(' . $alphanum . '(([a-zA-Z0-9-]{0,62})' . $alphanum . ')?\.)*';

    // Second Level Domain - Required - First and last characters must be Alpha-numeric. Hyphens are allowed inside.
    $domain_secondary = '(' . $alphanum . '(([a-zA-Z0-9-]{0,62})' . $alphanum . ')?\.)';

    // we want more relaxed URLs in Moodle: MDL-11462
    // This relaxed regex is enabled on purpose; the more exact TLD list below is disabled
    // Top Level Domain - First character must be Alpha. Last character must be AlphaNumeric. Hyphens are allowed inside.
    $domain_toplevel = '([a-zA-Z](([a-zA-Z0-9-]*)[a-zA-Z0-9])?)';

/*  // Top Level Domain - Required - Domain List Current As Of December 2004. Use the relaxed line above to be forgiving of possible future TLDs
    $domain_toplevel = '(aero|biz|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|post|pro|travel|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|az|ax|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)';
*/

    // Address can be IP address or Domain
    if ($aOptions['I'] === '{0}') {       // IP Address Not Allowed
        $address = '(' . $domain_tertiary . $domain_secondary . $domain_toplevel . ')';
    } elseif ($aOptions['I'] === '') {  // IP Address Required
        $address = '(' . $ipaddress . ')';
    } else {                            // IP Address Optional
        $address = '((' . $ipaddress . ')|(' . $domain_tertiary . $domain_secondary . $domain_toplevel . '))';
    }
    $address = $address . $aOptions['a'];

    // Port Number - :80 or :8080 or :65534 Allows range of :0 to :65535
    //                   (0-59999)         |(60000-64999)   |(65000-65499)    |(65500-65529)  |(65530-65535)
    $port_number = '(:(([0-5]?[0-9]{1,4})|(6[0-4][0-9]{3})|(65[0-4][0-9]{2})|(655[0-2][0-9])|(6553[0-5])))' . $aOptions['p'];

    // Path - Can be as simple as '/' or have multiple folders and filenames
    $path = '(/((;)?(' . $unreserved . '|' . $escaped . '|' . '[:@&=+$,]' . ')+(/)?)*)' . $aOptions['f'];

    // Query Section - Accepts ?var1=value1&var2=value2 or ?2393,1221 and much more
    $querystring = '(\?(' . $reserved . '|' . $unreserved . '|' . $escaped . ')*)' . $aOptions['q'];

    // Fragment Section - Accepts anchors such as #top
    $fragment = '(#(' . $reserved . '|' . $unreserved . '|' . $escaped . ')*)' . $aOptions['r'];

    // Building Regular Expression
    $regexp = '^' . $scheme . $userinfo . $address . $port_number . $path . $querystring . $fragment . '$';

    // DEBUGGING - Uncomment Line Below To Display The Regular Expression Built
    // echo '<pre>' . htmlentities(wordwrap($regexp,70,"\n",1)) . '</pre>';

    // Running the regular expression, case insensitively as eregi() did.
    // Each "/" is escaped because "/" is used as the pattern delimiter.
    if (preg_match( '/' . str_replace('/', '\/', $regexp) . '/iD', $urladdr ))
    {
        return true; // The domain passed
    }
    else
    {
        return false; // The domain didn't pass the expression
    }

} // END Function validateUrlSyntax()

About validateEmailSyntax():
This function uses the validateUrlSyntax() function to easily check the
syntax of an email address. It accepts the same options as validateUrlSyntax()
but defaults them for email addresses.

validateEmailSyntax( url_to_check[, options])

url_to_check - string - The email address to check

options - string - An optional string of options to set which parts of
        the address are required, optional, or not allowed. Each option
        must be followed by a "+" for required, "?" for optional, or
        "-" for not allowed. See validateUrlSyntax() docs for option list.

        The default options are changed to:
            s-H-S-E?F-u+P-a+I-p-f-q-r-

        This only allows an address of "name@domain".
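
To make the defaulting concrete, the sketch below shows one way the email defaults and any caller-supplied options could be merged into the single options string that is handed on to validateUrlSyntax(). The helper name mergeEmailOptions() is hypothetical, the defaults are taken from the function body that follows, and the sketch assumes the options string has already passed the format check.

```php
<?php
// Hypothetical helper for illustration; it is not part of the original file.
function mergeEmailOptions(string $options): string
{
    // Defaults as set in the body of validateEmailSyntax().
    $defaults = [
        's' => '-', 'H' => '-', 'S' => '-', 'E' => '?', 'F' => '-',
        'u' => '+', 'P' => '-', 'a' => '+', 'I' => '-',
        'p' => '-', 'f' => '-', 'q' => '-', 'r' => '-',
    ];

    $merged = '';
    foreach ($defaults as $key => $default) {
        $pos = strpos($options, $key);
        // Use the caller's setting when present, otherwise the email default.
        $merged .= $key . (($pos === false) ? $default : $options[$pos + 1]);
    }
    return $merged;
}

echo mergeEmailOptions(''), "\n";   // s-H-S-E?F-u+P-a+I-p-f-q-r-
echo mergeEmailOptions('q?'), "\n"; // s-H-S-E?F-u+P-a+I-p-f-q?r-
```

With no overrides, the scheme, password, IP address, port, path, query and fragment are all switched off, which is why only "name@domain" validates.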

validateEmailSyntax('george@fakemail.com')
validateEmailSyntax('mailto:george@fakemail.com', 's+')
validateEmailSyntax('george@fakemail.com?subject=Hi%20George', 'q?')
validateEmailSyntax('george@212.198.33.12', 'I?')

Rod Apeldoorn - rod(at)canowhoopass(dot)com

http://www.canowhoopass.com/

Copyright 2004 - Rod Apeldoorn

Released under same license as validateUrlSyntax(). For details, contact me.

function validateEmailSyntax( $emailaddr, $options="" ){

    // Check Options Parameter (ereg() was removed in PHP 7, so preg_match() is used)
    if (!preg_match( '/^([sHSEFuPaIpfqr][+?-])*$/D', $options ))
    {
        trigger_error("Options attribute malformed", E_USER_ERROR);
    }

    // Set Options Array, set defaults if options are not specified
    $aOptions = array();
    // Scheme
    if (strpos( $options, 's') === false) $aOptions['s'] = '-';
    else $aOptions['s'] = substr( $options, strpos( $options, 's') + 1, 1);
    // http://
    if (strpos( $options, 'H') === false) $aOptions['H'] = '-';
    else $aOptions['H'] = substr( $options, strpos( $options, 'H') + 1, 1);
    // https:// (SSL)
    if (strpos( $options, 'S') === false) $aOptions['S'] = '-';
    else $aOptions['S'] = substr( $options, strpos( $options, 'S') + 1, 1);
    // mailto: (email)
    if (strpos( $options, 'E') === false) $aOptions['E'] = '?';
    else $aOptions['E'] = substr( $options, strpos( $options, 'E') + 1, 1);
    // ftp://
    if (strpos( $options, 'F') === false) $aOptions['F'] = '-';
    else $aOptions['F'] = substr( $options, strpos( $options, 'F') + 1, 1);
    // User section
    if (strpos( $options, 'u') === false) $aOptions['u'] = '+';
    else $aOptions['u'] = substr( $options, strpos( $options, 'u') + 1, 1);
    // Password in user section
    if (strpos( $options, 'P') === false) $aOptions['P'] = '-';
    else $aOptions['P'] = substr( $options, strpos( $options, 'P') + 1, 1);
    // Address section
    if (strpos( $options, 'a') === false) $aOptions['a'] = '+';
    else $aOptions['a'] = substr( $options, strpos( $options, 'a') + 1, 1);
    // IP Address in address section
    if (strpos( $options, 'I') === false) $aOptions['I'] = '-';
    else $aOptions['I'] = substr( $options, strpos( $options, 'I') + 1, 1);
    // Port number
    if (strpos( $options, 'p') === false) $aOptions['p'] = '-';
    else $aOptions['p'] = substr( $options, strpos( $options, 'p') + 1, 1);
    // File path
    if (strpos( $options, 'f') === false) $aOptions['f'] = '-';
    else $aOptions['f'] = substr( $options, strpos( $options, 'f') + 1, 1);
    // Query section
    if (strpos( $options, 'q') === false) $aOptions['q'] = '-';
    else $aOptions['q'] = substr( $options, strpos( $options, 'q') + 1, 1);
    // Fragment (anchor)
    if (strpos( $options, 'r') === false) $aOptions['r'] = '-';
    else $aOptions['r'] = substr( $options, strpos( $options, 'r') + 1, 1);

    // Generate options string from the resolved values
    $newoptions = '';
    foreach($aOptions as $key => $value)
    {
        $newoptions .= $key . $value;
    }

    // DEBUGGING - Uncomment line below to display generated options
    // echo '<pre>' . $newoptions . '</pre>';

    // Send to validateUrlSyntax() and return result
    return validateUrlSyntax( $emailaddr, $newoptions);

} // END Function validateEmailSyntax()

About validateFtpSyntax():
This function uses the validateUrlSyntax() function to easily check the
syntax of an FTP address. It accepts the same options as validateUrlSyntax()
but defaults them for FTP addresses.

validateFtpSyntax( url_to_check[, options])

url_to_check - string - The URL to check

options - string - An optional string of options to set which parts of
        the URL are required, optional, or not allowed. Each option
        must be followed by a "+" for required, "?" for optional, or
        "-" for not allowed. See validateUrlSyntax() docs for option list.

        The default options are changed to:
            s?H-S-E-F+u?P?a+I?p?f?q-r-

validateFtpSyntax('ftp://netscape.com')
validateFtpSyntax('moz:iesucks@netscape.com')
validateFtpSyntax('ftp://netscape.com:2121/browsers/ns7/', 'u-')

Rod Apeldoorn - rod(at)canowhoopass(dot)com

http://www.canowhoopass.com/

Copyright 2004 - Rod Apeldoorn

Released under same license as validateUrlSyntax(). For details, contact me.

function validateFtpSyntax( $ftpaddr, $options="" ){

    // Check Options Parameter (ereg() was removed in PHP 7, so preg_match() is used)
    if (!preg_match( '/^([sHSEFuPaIpfqr][+?-])*$/D', $options ))
    {
        trigger_error("Options attribute malformed", E_USER_ERROR);
    }

    // Set Options Array, set defaults if options are not specified
    $aOptions = array();
    // Scheme
    if (strpos( $options, 's') === false) $aOptions['s'] = '?';
    else $aOptions['s'] = substr( $options, strpos( $options, 's') + 1, 1);
    // http://
    if (strpos( $options, 'H') === false) $aOptions['H'] = '-';
    else $aOptions['H'] = substr( $options, strpos( $options, 'H') + 1, 1);
    // https:// (SSL)
    if (strpos( $options, 'S') === false) $aOptions['S'] = '-';
    else $aOptions['S'] = substr( $options, strpos( $options, 'S') + 1, 1);
    // mailto: (email)
    if (strpos( $options, 'E') === false) $aOptions['E'] = '-';
    else $aOptions['E'] = substr( $options, strpos( $options, 'E') + 1, 1);
    // ftp://
    if (strpos( $options, 'F') === false) $aOptions['F'] = '+';
    else $aOptions['F'] = substr( $options, strpos( $options, 'F') + 1, 1);
    // User section
    if (strpos( $options, 'u') === false) $aOptions['u'] = '?';
    else $aOptions['u'] = substr( $options, strpos( $options, 'u') + 1, 1);
    // Password in user section
    if (strpos( $options, 'P') === false) $aOptions['P'] = '?';
    else $aOptions['P'] = substr( $options, strpos( $options, 'P') + 1, 1);
    // Address section
    if (strpos( $options, 'a') === false) $aOptions['a'] = '+';
    else $aOptions['a'] = substr( $options, strpos( $options, 'a') + 1, 1);
    // IP Address in address section
    if (strpos( $options, 'I') === false) $aOptions['I'] = '?';
    else $aOptions['I'] = substr( $options, strpos( $options, 'I') + 1, 1);
    // Port number
    if (strpos( $options, 'p') === false) $aOptions['p'] = '?';
    else $aOptions['p'] = substr( $options, strpos( $options, 'p') + 1, 1);
    // File path
    if (strpos( $options, 'f') === false) $aOptions['f'] = '?';
    else $aOptions['f'] = substr( $options, strpos( $options, 'f') + 1, 1);
    // Query section
    if (strpos( $options, 'q') === false) $aOptions['q'] = '-';
    else $aOptions['q'] = substr( $options, strpos( $options, 'q') + 1, 1);
    // Fragment (anchor)
    if (strpos( $options, 'r') === false) $aOptions['r'] = '-';
    else $aOptions['r'] = substr( $options, strpos( $options, 'r') + 1, 1);

    // Generate options string from the resolved values
    $newoptions = '';
    foreach($aOptions as $key => $value)
    {
        $newoptions .= $key . $value;
    }

    // DEBUGGING - Uncomment line below to display generated options
    // echo '<pre>' . $newoptions . '</pre>';

    // Send to validateUrlSyntax() and return result
    return validateUrlSyntax( $ftpaddr, $newoptions);

} // END Function validateFtpSyntax()