tt_products & cooluri - Problems with URLs


  • flotisso
    11. 11. 2011, 08:46

    Here is an excerpt from blooberry.com:

    RFC 1738: Uniform Resource Locators (URL) specification
    --------------------------------------------------------------------------------
    The specification for URLs (RFC 1738, Dec. '94) poses a problem, in that it limits the use of allowed characters in URLs to only a limited subset of the US-ASCII character set:

    "...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."

    HTML, on the other hand, allows the entire range of the ISO-8859-1 (ISO-Latin) character set to be used in documents - and HTML4 expands the allowable range to include all of the Unicode character set as well. In the case of non-ISO-8859-1 characters (characters above FF hex/255 decimal in the Unicode set), they just can not be used in URLs, because there is no safe way to specify character set information in the URL content yet [RFC2396.]

    URLs should be encoded everywhere in an HTML document that a URL is referenced to import an object (A, APPLET, AREA, BASE, BGSOUND, BODY, EMBED, FORM, FRAME, IFRAME, ILAYER, IMG, ISINDEX, INPUT, LAYER, LINK, OBJECT, SCRIPT, SOUND, TABLE, TD, TH, and TR elements.)

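The rule quoted from RFC 1738 can be sketched as a simple membership test. This is only an illustration of the quoted sentence, not code from the excerpt; the names `ALWAYS_SAFE` and `may_appear_unencoded` are made up for this example, and reserved characters (which are allowed only in their reserved role) are deliberately not treated as safe here.

```python
import string

# Characters the quoted RFC 1738 sentence allows unencoded anywhere in a URL:
# alphanumerics plus the specials $-_.+!*'(),
# (hypothetical name, chosen for this sketch)
ALWAYS_SAFE = set(string.ascii_letters + string.digits + "$-_.+!*'(),")


def may_appear_unencoded(ch: str) -> bool:
    """Return True if the single character ch is in the always-safe set."""
    return ch in ALWAYS_SAFE


# Characters of a sample string that would need a closer look.
print([c for c in "Größe 42?" if not may_appear_unencoded(c)])
# prints ['ö', 'ß', ' ', '?']
```

For a shop URL containing a product title such as "Größe 42?", the umlaut, the sharp s, the space and the question mark are the characters that cannot simply be copied into the URL.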
    What characters need to be encoded and why?
    --------------------------------------------------------------------------------
    ASCII Control characters
      Why: These characters are not printable.
      Characters: Includes the ISO-8859-1 (ISO-Latin) character ranges 00-1F hex (0-31 decimal) and 7F (127 decimal.)

    Non-ASCII characters
      Why: These are by definition not legal in URLs since they are not in the ASCII set.
      Characters: Includes the entire "top half" of the ISO-Latin set 80-FF hex (128-255 decimal.)
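The two ranges above translate directly into code point comparisons. A minimal sketch, assuming Python strings (which are Unicode, so code points up to FF hex coincide with ISO-Latin); the function name `classify` and its return labels are invented for this example.

```python
def classify(ch: str) -> str:
    """Sort a single character into the categories described above."""
    cp = ord(ch)
    if cp <= 0x1F or cp == 0x7F:
        return "control"            # 00-1F hex and 7F hex
    if 0x80 <= cp <= 0xFF:
        return "non-ASCII"          # "top half" of ISO-Latin, 80-FF hex
    if cp > 0xFF:
        return "outside ISO-Latin"  # not representable in ISO-8859-1
    return "printable ASCII"


for sample in ["\t", "ü", "A", "€"]:
    print(repr(sample), classify(sample))
```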
    "Reserved characters"
      Why: URLs use some characters for special use in defining their syntax. When these characters are not used in their special role inside a URL, they need to be encoded.
      Characters:

      Character                      Code Points (Hex)   Code Points (Dec)
      Dollar ("$")                   24                  36
      Ampersand ("&")                26                  38
      Plus ("+")                     2B                  43
      Comma (",")                    2C                  44
      Forward slash/Virgule ("/")    2F                  47
      Colon (":")                    3A                  58
      Semi-colon (";")               3B                  59
      Equals ("=")                   3D                  61
      Question mark ("?")            3F                  63
      'At' symbol ("@")              40                  64

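As an illustration of the reserved characters, the following sketch recomputes the code points from the table and then encodes a value in which reserved characters appear as plain data. It uses `urllib.parse.quote` from the Python standard library; the sample string is an assumption made up for this example, and this is not how tt_products or cooluri build their URLs.

```python
from urllib.parse import quote

RESERVED = "$&+,/:;=?@"

# Recompute the hex and decimal code points listed in the table.
for ch in RESERVED:
    print(f"{ch}  {ord(ch):02X}  {ord(ch)}")

# A reserved character used as plain data (here a made-up query value)
# must be encoded. safe="" tells quote() not to leave "/" untouched.
value = "Tom & Jerry: 1+1=2?"
print(quote(value, safe=""))
# prints Tom%20%26%20Jerry%3A%201%2B1%3D2%3F
```

Passing `safe=""` matters here: by default `quote()` leaves "/" alone because it assumes it is encoding a path.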
    "Unsafe characters"
      Why: Some characters present the possibility of being misunderstood within URLs for various reasons. These characters should also always be encoded.
      Characters:

      Character                      Code Points (Hex)   Code Points (Dec)   Why encode?
      Space                          20                  32                  Significant sequences of spaces may be lost in some uses (especially multiple spaces)

      Quotation marks                22                  34                  These characters are often used to delimit URLs in plain text.
      'Less Than' symbol ("<")       3C                  60
      'Greater Than' symbol (">")    3E                  62

      'Pound' character ("#")        23                  35                  This is used in URLs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins.

      Percent character ("%")        25                  37                  This is used to URL encode/escape other characters, so it should itself also be encoded.

      Misc. characters:                                                      Some systems can possibly modify these characters.
      Left Curly Brace ("{")         7B                  123
      Right Curly Brace ("}")        7D                  125
      Vertical Bar/Pipe ("|")        7C                  124
      Backslash ("\")                5C                  92
      Caret ("^")                    5E                  94
      Tilde ("~")                    7E                  126
      Left Square Bracket ("[")      5B                  91
      Right Square Bracket ("]")     5D                  93
      Grave Accent ("`")             60                  96

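To make the "unsafe" list concrete, here is a small sketch that encodes exactly the characters from the table above and nothing else. The helper name `encode_unsafe` and the sample file name are assumptions for illustration only; a standard library function is intentionally not used here because its safe set differs from this list (for example, Python's `quote()` leaves "~" unencoded).

```python
# The characters from the "unsafe" table, in table order.
UNSAFE = ' "<>#%{}|\\^~[]`'


def encode_unsafe(text: str) -> str:
    """Percent-encode only the characters listed as unsafe."""
    return "".join(f"%{ord(c):02X}" if c in UNSAFE else c for c in text)


print(encode_unsafe("my file [v1].html#top"))
# prints my%20file%20%5Bv1%5D.html%23top
```

Because "%" is itself in the list, the function should only be applied to raw text; running it on an already encoded string would encode the percent signs a second time.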
    How are characters URL encoded?
    --------------------------------------------------------------------------------
    URL encoding of a character consists of a "%" symbol, followed by the two-digit hexadecimal representation (case-insensitive) of the ISO-Latin code point for the character.

    Example
      Space = decimal code point 32 in the ISO-Latin set.
      32 decimal = 20 in hexadecimal
      The URL encoded representation will be "%20"
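The encoding rule can be written out by hand in a few lines. This is a minimal sketch of the rule as the excerpt states it (ISO-Latin code points); `percent_encode` is a name invented for this example. Note that Python's `urllib.parse.quote()` defaults to UTF-8, so `encoding="latin-1"` has to be passed explicitly to get the same result as the excerpt's rule.

```python
from urllib.parse import quote, unquote


def percent_encode(ch: str) -> str:
    """Encode one character as "%" plus its two-digit hex ISO-Latin code point."""
    code_point = ch.encode("latin-1")[0]  # raises UnicodeEncodeError above U+00FF
    return f"%{code_point:02X}"


print(percent_encode(" "))                       # prints %20
print(percent_encode("ü"))                       # prints %FC
print(quote("ü", safe="", encoding="latin-1"))   # prints %FC

# The hex digits are case-insensitive: both spellings decode to the same character.
print(unquote("%fc", encoding="latin-1") == unquote("%FC", encoding="latin-1"))
```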
