About validateUrlSyntax():
This function verifies whether an HTTP URL is formatted properly, returning
either true or false.

I used RFC 2396 (URI: Generic Syntax) as my guide when creating the
regular expression. For all the details, see the comments below.

validateUrlSyntax( url_to_check[, options])

url_to_check - string - The url to check

options - string - An optional string of options to set which parts of
          the url are required, optional, or not allowed. Each option
          must be followed by a "+" for required, "?" for optional, or
          "-" for not allowed.

    s - Scheme. Allows "+?-", defaults to "s?"
        H - http:// Allows "+?-", defaults to "H?"
        S - https:// (SSL). Allows "+?-", defaults to "S?"
        E - mailto: (email). Allows "+?-", defaults to "E-"
        F - ftp:// Allows "+?-", defaults to "F-"
            Dependent on scheme being enabled
    u - User section. Allows "+?-", defaults to "u?"
        P - Password in user section. Allows "+?-", defaults to "P?"
            Dependent on user section being enabled
    a - Address (ip or domain). Allows "+?-", defaults to "a+"
        I - Ip address. Allows "+?-", defaults to "I?"
            If I+, then domains are disabled
            If I-, then domains are required
            Dependent on address being enabled
    p - Port number. Allows "+?-", defaults to "p?"
    f - File path. Allows "+?-", defaults to "f?"
    q - Query section. Allows "+?-", defaults to "q?"
    r - Fragment (anchor). Allows "+?-", defaults to "r?"
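
The option list above can be read as "one letter, one flag, repeated". A minimal sketch of that parsing step follows, assuming a stand-alone helper named sketchParseUrlOptions() (a hypothetical name, not part of this file) and using preg_match() for the format check. It is only an illustration of the documented defaults, not the function's actual code.

```php
<?php
// Hypothetical helper (not part of the original file): expands an option
// string such as 's+u-I-' into a full map, falling back to the documented
// validateUrlSyntax() defaults for every letter the caller leaves out.
function sketchParseUrlOptions(string $options): array
{
    // Defaults copied from the option list above.
    $defaults = [
        's' => '?', 'H' => '?', 'S' => '?', 'E' => '-', 'F' => '-',
        'u' => '?', 'P' => '?', 'a' => '+', 'I' => '?', 'p' => '?',
        'f' => '?', 'q' => '?', 'r' => '?',
    ];

    // Each option is one letter followed by one of "+", "?" or "-".
    if (!preg_match('/^([sHSEFuPaIpfqr][+?-])*$/', $options)) {
        throw new InvalidArgumentException('Options attribute malformed');
    }

    $parsed = $defaults;
    // Walk the pairs in reverse so that the first occurrence of a letter
    // wins, which matches a strpos()-based lookup.
    foreach (array_reverse(str_split($options, 2)) as $pair) {
        if (strlen($pair) === 2) {
            $parsed[$pair[0]] = $pair[1];
        }
    }
    return $parsed;
}

$opts = sketchParseUrlOptions('s+u-I-p-q-r-');
echo $opts['s'], $opts['u'], $opts['a'], $opts['f'], "\n"; // prints "+-+?"
```

Letters that are not mentioned ("a" and "f" in the call above) keep their defaults, which is why a short option string is usually enough.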

Paste the function code, or include_once() this template, at the top of the page
where you wish to use this function.

validateUrlSyntax('http://george@www.cnn.com/#top')

validateUrlSyntax('https://games.yahoo.com:8080/board/chess.htm?move=true')

validateUrlSyntax('http://www.hotmail.com/', 's+u-I-p-q-r-')

validateUrlSyntax('/directory/file.php#top', 's-u-a-p-f+')

if (validateUrlSyntax('http://www.canowhoopass.com/', 'u-'))
{
    echo 'URL SYNTAX IS VERIFIED';
} else {
    echo 'URL SYNTAX IS ILLEGAL';
}

-Added new TLDs - .jobs, .mobi, .post and .travel. They are official, but not yet active.

-Fixed bug allowing empty username even when it was required
-Changed and added a few options to add extra schemes
-Added mailto:, ftp:// and http:// options
-https option was 'l', now it is 'S' (capital)
-Added password option. Now passwords can be disabled while usernames are ok (for email)
-IP Address option was 'i', now it is 'I' (capital)
-Options are now case sensitive
-Added validateEmailSyntax() and validateFtpSyntax() functions below

-IP group range is more specific. Used to allow 0-299. Now it is 0-255
-Port range more specific. Used to allow 0-69999. Now it is 0-65535
-Fixed bug disallowing 'i-' option.
-Changed license to GPL

-Fixed bug disallowing 'l-' option. Thanks Dr. Cheap

-Added options parameter to make it easier for people to plug the function in
 without needing to rework the code.
-Split the example application away from the function

-Easier to disable sections
-Easier to port to other languages
-Easier to port to verify email addresses
-Uses only simple regular expressions so it is more portable
-Follows the RFC more closely for domain names. Some "play" domains may break
-Renamed from 'verifyUrl()' to 'validateUrlSyntax()'
-Removed extra code which added 'http://' and trailing '/' if it was missing
  -That code was better suited for a massaging function, not verifying
-Now splits up and forces '/path?query#fragment' order
-No longer requires a path when using a query or fragment

-Allowed port numbers above 9999. Now allows up to 69999

-Added new top level domains
  -aero, coop, museum, name, info, biz, pro

Intentional Limitations:
-Does not verify that the url actually exists. Only validates the syntax
-Strictly follows the RFC standards. Some urls that exist in the wild will
 not validate, including ones with square brackets '[]' in the query section
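
The square-bracket limitation can be reproduced in isolation. The sketch below copies the three character classes defined later in validateUrlSyntax() and assembles only the query part; running it through preg_match() with a backtick delimiter is a choice made for this illustration, not something the original code does.

```php
<?php
// Character classes as defined in validateUrlSyntax().
$unreserved = '[a-zA-Z0-9_.!~*\'()-]';
$escaped    = '(%[0-9a-fA-F]{2})';
$reserved   = '[;/?:@&=+$,]';

// Query pattern on its own, anchored so the whole string must match.
// The backtick delimiter avoids having to escape "/" and "#".
$query = '`^\?(' . $reserved . '|' . $unreserved . '|' . $escaped . ')*$`';

// Raw brackets are in none of the three classes, so the query is rejected.
var_dump(preg_match($query, '?ids[]=1&ids[]=2') === 1);         // bool(false)

// The same query with the brackets percent-encoded is accepted.
var_dump(preg_match($query, '?ids%5B%5D=1&ids%5B%5D=2') === 1); // bool(true)
```

Percent-encoding the brackets as %5B and %5D before validation is one way to work within the limitation.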

Rod Apeldoorn - rod(at)canowhoopass(dot)com

http://www.canowhoopass.com/

-WEAV -Several members of Weav helped to test - http://weav.bc.ca/
-There were also a number of emails from other developers expressing
 thanks and suggestions. It is nice to be appreciated. Thanks!

Copyright 2004, Rod Apeldoorn

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

To view the license online, go to: http://www.gnu.org/copyleft/gpl.html

Alternate Commercial Licenses:
For information regarding alternate licensing, contact me.

// BEGINNING OF validateUrlSyntax() function
function validateUrlSyntax( $urladdr, $options="" ){

    // Force Options parameter to be lower case
    // DISABLED PERMANENTLY - OK to remove from code
    // $options = strtolower($options);

    // Check Options Parameter
    if (!preg_match( '/^([sHSEFuPaIpfqr][+?-])*$/', $options ))
    {
        trigger_error("Options attribute malformed", E_USER_ERROR);
    }

    // Set Options Array, set defaults if options are not specified
    $aOptions = array();

    // Scheme
    if (strpos( $options, 's') === false) $aOptions['s'] = '?';
    else $aOptions['s'] = substr( $options, strpos( $options, 's') + 1, 1);
    // http://
    if (strpos( $options, 'H') === false) $aOptions['H'] = '?';
    else $aOptions['H'] = substr( $options, strpos( $options, 'H') + 1, 1);
    // https:// (SSL)
    if (strpos( $options, 'S') === false) $aOptions['S'] = '?';
    else $aOptions['S'] = substr( $options, strpos( $options, 'S') + 1, 1);
    // mailto: (email)
    if (strpos( $options, 'E') === false) $aOptions['E'] = '-';
    else $aOptions['E'] = substr( $options, strpos( $options, 'E') + 1, 1);
    // ftp://
    if (strpos( $options, 'F') === false) $aOptions['F'] = '-';
    else $aOptions['F'] = substr( $options, strpos( $options, 'F') + 1, 1);
    // User section
    if (strpos( $options, 'u') === false) $aOptions['u'] = '?';
    else $aOptions['u'] = substr( $options, strpos( $options, 'u') + 1, 1);
    // Password in user section
    if (strpos( $options, 'P') === false) $aOptions['P'] = '?';
    else $aOptions['P'] = substr( $options, strpos( $options, 'P') + 1, 1);
    // Address section
    if (strpos( $options, 'a') === false) $aOptions['a'] = '+';
    else $aOptions['a'] = substr( $options, strpos( $options, 'a') + 1, 1);
    // IP Address in address section
    if (strpos( $options, 'I') === false) $aOptions['I'] = '?';
    else $aOptions['I'] = substr( $options, strpos( $options, 'I') + 1, 1);
    // Port number
    if (strpos( $options, 'p') === false) $aOptions['p'] = '?';
    else $aOptions['p'] = substr( $options, strpos( $options, 'p') + 1, 1);
    // File path
    if (strpos( $options, 'f') === false) $aOptions['f'] = '?';
    else $aOptions['f'] = substr( $options, strpos( $options, 'f') + 1, 1);
    // Query section
    if (strpos( $options, 'q') === false) $aOptions['q'] = '?';
    else $aOptions['q'] = substr( $options, strpos( $options, 'q') + 1, 1);
    // Fragment (anchor)
    if (strpos( $options, 'r') === false) $aOptions['r'] = '?';
    else $aOptions['r'] = substr( $options, strpos( $options, 'r') + 1, 1);

    // Loop through the options array and replace "-" with "{0}" and "+" with ""
    foreach($aOptions as $key => $value)
    {
        if ($value == '-')
        {
            $aOptions[$key] = '{0}';
        }
        if ($value == '+')
        {
            $aOptions[$key] = '';
        }
    }

    // DEBUGGING - Uncomment the following line to display the current option values
    // echo '<pre>'; print_r($aOptions); echo '</pre>';

    // Preset Allowed Characters
    $alphanum   = '[a-zA-Z0-9]';                      // Alpha Numeric
    $unreserved = '[a-zA-Z0-9_.!~*' . '\'' . '()-]';  // Unreserved characters
    $escaped    = '(%[0-9a-fA-F]{2})';                // Escape sequence - In Hex - %6d would be a 'm'
    $reserved   = '[;/?:@&=+$,]';                     // Special characters in the URI

    // Beginning Regular Expression
    // Scheme - Allows for 'http://', 'https://', 'mailto:', or 'ftp://'
    $scheme = '(';
    if     ($aOptions['H'] === '') { $scheme .= 'http://'; }
    elseif ($aOptions['S'] === '') { $scheme .= 'https://'; }
    elseif ($aOptions['E'] === '') { $scheme .= 'mailto:'; }
    elseif ($aOptions['F'] === '') { $scheme .= 'ftp://'; }
    else
    {
        if ($aOptions['H'] === '?') { $scheme .= '|(http://)'; }
        if ($aOptions['S'] === '?') { $scheme .= '|(https://)'; }
        if ($aOptions['E'] === '?') { $scheme .= '|(mailto:)'; }
        if ($aOptions['F'] === '?') { $scheme .= '|(ftp://)'; }
        $scheme = str_replace('(|', '(', $scheme); // fix first pipe
    }
    $scheme .= ')' . $aOptions['s'];
    // End setting scheme

    // User Info - Allows for 'username@' or 'username:password@'.
    // Note: contrary to the RFC, I removed ':' from the username section, allowing it only in the password.
    // The first group is the username, the second (after ':') is the password.
    $userinfo = '((' . $unreserved . '|' . $escaped . '|[;&=+$,]' . ')+(:(' . $unreserved . '|' . $escaped . '|[;:&=+$,]' . ')+)' . $aOptions['P'] . '@)' . $aOptions['u'];

    // IP ADDRESS - Allows 0.0.0.0 to 255.255.255.255
    $ipaddress = '((((2(([0-4][0-9])|(5[0-5])))|([01]?[0-9]?[0-9]))\.){3}((2(([0-4][0-9])|(5[0-5])))|([01]?[0-9]?[0-9])))';

    // Tertiary Domain(s) - Optional - Multi - Although some sites may use other characters, the RFC says tertiary domains have the same naming restrictions as second level domains
    $domain_tertiary = '(' . $alphanum . '(([a-zA-Z0-9-]{0,62})' . $alphanum . ')?\.)*';

    // Second Level Domain - Required - First and last characters must be Alpha-numeric. Hyphens are allowed inside.
    $domain_secondary = '(' . $alphanum . '(([a-zA-Z0-9-]{0,62})' . $alphanum . ')?\.)';

    /* // This regex is disabled on purpose in favour of the more exact version below
    // Top Level Domain - First character must be Alpha. Last character must be AlphaNumeric. Hyphens are allowed inside.
    $domain_toplevel = '([a-zA-Z](([a-zA-Z0-9-]*)[a-zA-Z0-9])?)';
    */

    // Top Level Domain - Required - Domain List Current As Of December 2004. Use the commented-out line above to be forgiving of possible future TLDs
    $domain_toplevel = '(aero|biz|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|post|pro|travel|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|az|ax|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)';

    // Address can be IP address or Domain
    if ($aOptions['I'] === '{0}') {       // IP Address Not Allowed
        $address = '(' . $domain_tertiary . $domain_secondary . $domain_toplevel . ')';
    } elseif ($aOptions['I'] === '') {    // IP Address Required
        $address = '(' . $ipaddress . ')';
    } else {                              // IP Address Optional
        $address = '((' . $ipaddress . ')|(' . $domain_tertiary . $domain_secondary . $domain_toplevel . '))';
    }
    $address = $address . $aOptions['a'];

    // Port Number - :80 or :8080 or :65534 Allows range of :0 to :65535
    //                 (0-59999)         |(60000-64999)   |(65000-65499)    |(65500-65529)  |(65530-65535)
    $port_number = '(:(([0-5]?[0-9]{1,4})|(6[0-4][0-9]{3})|(65[0-4][0-9]{2})|(655[0-2][0-9])|(6553[0-5])))' . $aOptions['p'];

    // Path - Can be as simple as '/' or have multiple folders and filenames
    $path = '(/((;)?(' . $unreserved . '|' . $escaped . '|' . '[:@&=+$,]' . ')+(/)?)*)' . $aOptions['f'];

    // Query Section - Accepts ?var1=value1&var2=value2 or ?2393,1221 and much more
    $querystring = '(\?(' . $reserved . '|' . $unreserved . '|' . $escaped . ')*)' . $aOptions['q'];

    // Fragment Section - Accepts anchors such as #top
    $fragment = '(#(' . $reserved . '|' . $unreserved . '|' . $escaped . ')*)' . $aOptions['r'];

    // Building Regular Expression
    $regexp = '^' . $scheme . $userinfo . $address . $port_number . $path . $querystring . $fragment . '$';

    // DEBUGGING - Uncomment Line Below To Display The Regular Expression Built
    // echo '<pre>' . htmlentities(wordwrap($regexp,70,"\n",1)) . '</pre>';

    // Running the regular expression, case-insensitively.
    // The backtick delimiter is safe because it never appears in $regexp.
    if (preg_match( '`' . $regexp . '`i', $urladdr ))
    {
        return true; // The domain passed
    }
    else
    {
        return false; // The domain didn't pass the expression
    }

} // END Function validateUrlSyntax()
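
The IP address and port expressions built inside validateUrlSyntax() can be tried on their own. The sketch below copies those two expressions and wraps each in an anchored preg_match() call. The helper names sketchIsIpv4() and sketchIsPort() and the backtick delimiter are assumptions made for this illustration; they are not part of the function above.

```php
<?php
// Hypothetical helper: checks a dotted-quad string against the IP
// expression used in validateUrlSyntax() (each group limited to 0-255).
function sketchIsIpv4(string $value): bool
{
    $ipaddress = '((((2(([0-4][0-9])|(5[0-5])))|([01]?[0-9]?[0-9]))\.){3}'
               . '((2(([0-4][0-9])|(5[0-5])))|([01]?[0-9]?[0-9])))';
    return preg_match('`^' . $ipaddress . '$`', $value) === 1;
}

// Hypothetical helper: checks a ":port" suffix against the port
// expression used in validateUrlSyntax() (limited to 0-65535).
function sketchIsPort(string $value): bool
{
    $port = '(:(([0-5]?[0-9]{1,4})|(6[0-4][0-9]{3})|(65[0-4][0-9]{2})'
          . '|(655[0-2][0-9])|(6553[0-5])))';
    return preg_match('`^' . $port . '$`', $value) === 1;
}

var_dump(sketchIsIpv4('192.168.0.1'));     // bool(true)
var_dump(sketchIsIpv4('255.255.255.255')); // bool(true)
var_dump(sketchIsIpv4('256.1.1.1'));       // bool(false)
var_dump(sketchIsPort(':8080'));           // bool(true)
var_dump(sketchIsPort(':65535'));          // bool(true)
var_dump(sketchIsPort(':65536'));          // bool(false)
```

Spelling the numeric ranges out as alternations keeps the whole check inside one regular expression, at the cost of readability compared with converting to an integer and comparing.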

About validateEmailSyntax():
This function uses the validateUrlSyntax() function to easily check the
syntax of an email address. It accepts the same options as validateUrlSyntax()
but defaults them for email addresses.

validateEmailSyntax( url_to_check[, options])

url_to_check - string - The email address to check

options - string - An optional string of options to set which parts of
          the url are required, optional, or not allowed. Each option
          must be followed by a "+" for required, "?" for optional, or
          "-" for not allowed. See validateUrlSyntax() docs for option list.

          The default options are changed to:
              s-H-S-E+F-u+P-a+I-p-f-q-r-

          This only allows an address of "name@domain".
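
The wrapper idea described above (start from email-specific defaults, then let the caller override individual letters) can be sketched as a small string merge. The helper name sketchMergeOptions() is hypothetical, and the sketch stops at producing the combined option string rather than calling the validator.

```php
<?php
// Hypothetical helper: overlays caller options on a default option string
// and returns the combined string, ready to hand to a URL validator.
function sketchMergeOptions(string $defaults, string $overrides): string
{
    $merged = [];
    foreach ([$defaults, $overrides] as $source) {
        foreach (str_split($source, 2) as $pair) {
            if (strlen($pair) === 2) {
                // Later sources win; the key keeps its original position.
                $merged[$pair[0]] = $pair[1];
            }
        }
    }

    // Serialise the map back into "letter + flag" pairs.
    $out = '';
    foreach ($merged as $key => $value) {
        $out .= $key . $value;
    }
    return $out;
}

// Default string taken from the documentation above.
$emailDefaults = 's-H-S-E+F-u+P-a+I-p-f-q-r-';
echo sketchMergeOptions($emailDefaults, 'q?'), "\n";
// prints "s-H-S-E+F-u+P-a+I-p-f-q?r-"
```

With this shape, allowing a query string on an email address only requires passing 'q?'; every other section stays locked to the email defaults.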

validateEmailSyntax('george@fakemail.com')
validateEmailSyntax('mailto:george@fakemail.com', 's+')
validateEmailSyntax('george@fakemail.com?subject=Hi%20George', 'q?')
validateEmailSyntax('george@212.198.33.12', 'I?')

Rod Apeldoorn - rod(at)canowhoopass(dot)com

http://www.canowhoopass.com/

Copyright 2004 - Rod Apeldoorn

Released under the same license as validateUrlSyntax(). For details, contact me.

function validateEmailSyntax( $emailaddr, $options="" ){

    // Check Options Parameter
    if (!preg_match( '/^([sHSEFuPaIpfqr][+?-])*$/', $options ))
    {
        trigger_error("Options attribute malformed", E_USER_ERROR);
    }

    // Set Options Array, set defaults if options are not specified
    $aOptions = array();

    // Scheme
    if (strpos( $options, 's') === false) $aOptions['s'] = '-';
    else $aOptions['s'] = substr( $options, strpos( $options, 's') + 1, 1);
    // http://
    if (strpos( $options, 'H') === false) $aOptions['H'] = '-';
    else $aOptions['H'] = substr( $options, strpos( $options, 'H') + 1, 1);
    // https:// (SSL)
    if (strpos( $options, 'S') === false) $aOptions['S'] = '-';
    else $aOptions['S'] = substr( $options, strpos( $options, 'S') + 1, 1);
    // mailto: (email)
    if (strpos( $options, 'E') === false) $aOptions['E'] = '?';
    else $aOptions['E'] = substr( $options, strpos( $options, 'E') + 1, 1);
    // ftp://
    if (strpos( $options, 'F') === false) $aOptions['F'] = '-';
    else $aOptions['F'] = substr( $options, strpos( $options, 'F') + 1, 1);
    // User section
    if (strpos( $options, 'u') === false) $aOptions['u'] = '+';
    else $aOptions['u'] = substr( $options, strpos( $options, 'u') + 1, 1);
    // Password in user section
    if (strpos( $options, 'P') === false) $aOptions['P'] = '-';
    else $aOptions['P'] = substr( $options, strpos( $options, 'P') + 1, 1);
    // Address section
    if (strpos( $options, 'a') === false) $aOptions['a'] = '+';
    else $aOptions['a'] = substr( $options, strpos( $options, 'a') + 1, 1);
    // IP Address in address section
    if (strpos( $options, 'I') === false) $aOptions['I'] = '-';
    else $aOptions['I'] = substr( $options, strpos( $options, 'I') + 1, 1);
    // Port number
    if (strpos( $options, 'p') === false) $aOptions['p'] = '-';
    else $aOptions['p'] = substr( $options, strpos( $options, 'p') + 1, 1);
    // File path
    if (strpos( $options, 'f') === false) $aOptions['f'] = '-';
    else $aOptions['f'] = substr( $options, strpos( $options, 'f') + 1, 1);
    // Query section
    if (strpos( $options, 'q') === false) $aOptions['q'] = '-';
    else $aOptions['q'] = substr( $options, strpos( $options, 'q') + 1, 1);
    // Fragment (anchor)
    if (strpos( $options, 'r') === false) $aOptions['r'] = '-';
    else $aOptions['r'] = substr( $options, strpos( $options, 'r') + 1, 1);

    // Rebuild the option string from the array
    $newoptions = '';
    foreach($aOptions as $key => $value)
    {
        $newoptions .= $key . $value;
    }

    // DEBUGGING - Uncomment line below to display generated options
    // echo '<pre>' . $newoptions . '</pre>';

    // Send to validateUrlSyntax() and return result
    return validateUrlSyntax( $emailaddr, $newoptions);

} // END Function validateEmailSyntax()

About validateFtpSyntax():
This function uses the validateUrlSyntax() function to easily check the
syntax of an FTP address. It accepts the same options as validateUrlSyntax()
but defaults them for FTP addresses.

validateFtpSyntax( url_to_check[, options])

url_to_check - string - The FTP address to check

options - string - An optional string of options to set which parts of
          the url are required, optional, or not allowed. Each option
          must be followed by a "+" for required, "?" for optional, or
          "-" for not allowed. See validateUrlSyntax() docs for option list.

          The default options are changed to:
              s?H-S-E-F+u?P?a+I?p?f?q-r-

validateFtpSyntax('ftp://netscape.com')
validateFtpSyntax('moz:iesucks@netscape.com')
validateFtpSyntax('ftp://netscape.com:2121/browsers/ns7/', 'u-')
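
Each flag ends up as a regex quantifier on its section: "+" leaves the group as is, "?" makes it optional, and "-" forces it to occur zero times. The sketch below shows that mapping on a deliberately simplified FTP shape. The helper names sketchFlagToQuantifier() and sketchFtpShape() and the loose host pattern are assumptions for this illustration; the real expression is much stricter.

```php
<?php
// Hypothetical helper: maps an option flag to a regex quantifier.
function sketchFlagToQuantifier(string $flag): string
{
    if ($flag === '-') {
        return '{0}';   // not allowed: the group must occur zero times
    }
    if ($flag === '+') {
        return '';      // required: the group must occur exactly once
    }
    return '?';         // optional: zero or one occurrence
}

// Simplified stand-in for the real expression: FTP scheme, a bare
// host name and a port, each section governed by its flag.
function sketchFtpShape(string $addr, string $schemeFlag = '?', string $portFlag = '?'): bool
{
    $scheme = '(ftp://)' . sketchFlagToQuantifier($schemeFlag);
    $host   = '([a-z0-9-]+\.)+[a-z]{2,}';
    $port   = '(:[0-9]{1,5})' . sketchFlagToQuantifier($portFlag);
    // Backtick delimiter so that "/" needs no escaping.
    return preg_match('`^' . $scheme . $host . $port . '$`i', $addr) === 1;
}

var_dump(sketchFtpShape('ftp://netscape.com'));                // bool(true)
var_dump(sketchFtpShape('netscape.com'));                      // bool(true)
var_dump(sketchFtpShape('netscape.com', '+'));                 // bool(false)
var_dump(sketchFtpShape('ftp://netscape.com:2121', '?', '-')); // bool(false)
```

Because the scheme defaults to optional for FTP, an address without "ftp://" is accepted unless the caller passes 's+'.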

Rod Apeldoorn - rod(at)canowhoopass(dot)com

http://www.canowhoopass.com/

Copyright 2004 - Rod Apeldoorn

Released under the same license as validateUrlSyntax(). For details, contact me.

function validateFtpSyntax( $ftpaddr, $options="" ){

    // Check Options Parameter
    if (!preg_match( '/^([sHSEFuPaIpfqr][+?-])*$/', $options ))
    {
        trigger_error("Options attribute malformed", E_USER_ERROR);
    }

    // Set Options Array, set defaults if options are not specified
    $aOptions = array();

    // Scheme
    if (strpos( $options, 's') === false) $aOptions['s'] = '?';
    else $aOptions['s'] = substr( $options, strpos( $options, 's') + 1, 1);
    // http://
    if (strpos( $options, 'H') === false) $aOptions['H'] = '-';
    else $aOptions['H'] = substr( $options, strpos( $options, 'H') + 1, 1);
    // https:// (SSL)
    if (strpos( $options, 'S') === false) $aOptions['S'] = '-';
    else $aOptions['S'] = substr( $options, strpos( $options, 'S') + 1, 1);
    // mailto: (email)
    if (strpos( $options, 'E') === false) $aOptions['E'] = '-';
    else $aOptions['E'] = substr( $options, strpos( $options, 'E') + 1, 1);
    // ftp://
    if (strpos( $options, 'F') === false) $aOptions['F'] = '+';
    else $aOptions['F'] = substr( $options, strpos( $options, 'F') + 1, 1);
    // User section
    if (strpos( $options, 'u') === false) $aOptions['u'] = '?';
    else $aOptions['u'] = substr( $options, strpos( $options, 'u') + 1, 1);
    // Password in user section
    if (strpos( $options, 'P') === false) $aOptions['P'] = '?';
    else $aOptions['P'] = substr( $options, strpos( $options, 'P') + 1, 1);
    // Address section
    if (strpos( $options, 'a') === false) $aOptions['a'] = '+';
    else $aOptions['a'] = substr( $options, strpos( $options, 'a') + 1, 1);
    // IP Address in address section
    if (strpos( $options, 'I') === false) $aOptions['I'] = '?';
    else $aOptions['I'] = substr( $options, strpos( $options, 'I') + 1, 1);
    // Port number
    if (strpos( $options, 'p') === false) $aOptions['p'] = '?';
    else $aOptions['p'] = substr( $options, strpos( $options, 'p') + 1, 1);
    // File path
    if (strpos( $options, 'f') === false) $aOptions['f'] = '?';
    else $aOptions['f'] = substr( $options, strpos( $options, 'f') + 1, 1);
    // Query section
    if (strpos( $options, 'q') === false) $aOptions['q'] = '-';
    else $aOptions['q'] = substr( $options, strpos( $options, 'q') + 1, 1);
    // Fragment (anchor)
    if (strpos( $options, 'r') === false) $aOptions['r'] = '-';
    else $aOptions['r'] = substr( $options, strpos( $options, 'r') + 1, 1);

    // Rebuild the option string from the array
    $newoptions = '';
    foreach($aOptions as $key => $value)
    {
        $newoptions .= $key . $value;
    }

    // DEBUGGING - Uncomment line below to display generated options
    // echo '<pre>' . $newoptions . '</pre>';

    // Send to validateUrlSyntax() and return result
    return validateUrlSyntax( $ftpaddr, $newoptions);

} // END Function validateFtpSyntax()