URL and URI

URL encoding/decoding

The URLEncoder and URLDecoder classes encode and decode strings used in the query string of GET requests or in the body of POST requests with the "application/x-www-form-urlencoded" MIME format. When encoding a String, the following rules apply:

  • the alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same
  • the special characters ".", "-", "*", and "_" remain the same
  • the space character is converted into a plus sign
  • all other characters are first converted to bytes using the chosen encoding scheme, and each byte is represented as %xy, where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme is UTF-8.

Finally, encode a space as "+" or "%20" in the query string, but only as "%20" in the rest of the URL. URLEncoder always produces "+", so it is suitable for query parameters but not for the path.

// prepare query string
StringJoiner sj = new StringJoiner("&");
for (Map.Entry<String, String> entry : params.entrySet()) {
    sj.add(URLEncoder.encode(entry.getKey(), "UTF-8") + "="
        + URLEncoder.encode(entry.getValue(), "UTF-8"));
}
    
// make something like http://example.com?param1=blabla&param2=34
String url = urlBase + "?" + sj.toString();

// source: "http://example.com/with spaces/ã.jpg?param=Ы Ю"
URI uri = new URI("http", "example.com", "/with spaces/ã.jpg", "param=Ы Ю", null);

//: http://example.com/with%20spaces/%C3%A3.jpg?param=%D0%AB%20%D0%AE
System.out.println(uri.toASCIIString());

//: http://example.com/with%20spaces/ã.jpg?param=Ы%20Ю
System.out.println(uri.toURL()); // or uri.toString()

// encoding the whole URL with URLEncoder also escapes the delimiters
//: http%3A%2F%2Fexample.com%2Fwith+spaces%2F%C3%A3.jpg%3Fparam%3D%D0%AB+%D0%AE
System.out.println(URLEncoder.encode(
    "http://example.com/with spaces/ã.jpg?param=Ы Ю", "UTF-8"));
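The encoding rules listed earlier can be checked with a small round trip. This is only a sketch: the class name FormCodecDemo and its two helpers are made up for illustration, and the Charset overloads of encode/decode used here assume Java 10 or later (on older versions, pass "UTF-8" as a String and handle UnsupportedEncodingException).

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class FormCodecDemo {
    // Encodes a single query-string component (a key or a value).
    static String encode(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8);
    }

    // Reverses encode(): "+" becomes a space, %xy sequences become bytes.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("a b&c=d");
        System.out.println(encoded);            // a+b%26c%3Dd
        System.out.println(decode(encoded));    // a b&c=d
        System.out.println(encode("x.y-z*_"));  // x.y-z*_ (unchanged)
        System.out.println(encode("\u00e3"));   // %C3%A3 (the two UTF-8 bytes of ã)
    }
}
```

Encode each key and value separately, as the loop above does. Encoding an already assembled query string would also escape the "&" and "=" delimiters.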

URL

The URL class represents a URL. Its main uses are accessing the resource the URL points to and reading the individual parts of the URL.

method description
openConnection() Returns a URLConnection instance that represents a connection to the remote object referred to by the URL. No network connection is made at this point. It is established when URLConnection.connect() is called, either explicitly or implicitly by methods that need it, such as getInputStream().
openConnection(proxy) Same as openConnection(), except that the connection will be made through the specified proxy if possible.
openStream() Opens a connection to this URL and returns an InputStream for reading from that connection. This method is a shorthand for openConnection().getInputStream().
toURI() Returns a URI equivalent to this URL.
getAuthority() Gets the authority part of this URL.
getContent() Gets the contents of this URL. This method is a shorthand for: openConnection().getContent().
getFile() Gets the file name of this URL. The returned value is the same as getPath(), followed by "?" and the value of getQuery(), if there is a query. Without a query, getFile() and getPath() return identical results.
getHost() Gets the host name of this URL.
getPath() Returns the path part of this URL, or an empty string if it does not exist.
getPort() Gets the port number of this URL. Returns -1 if the port is not set.
getProtocol() Gets the protocol name of this URL.
getQuery() Returns the query part of this URL, or null if it does not exist.
getUserInfo() Returns the userInfo part of this URL, or null if it does not exist.
getRef() Returns the anchor of this URL, or null if it does not exist.
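As an illustration of the getters in the table, the sketch below takes one URL apart. The URL itself, the class name UrlPartsDemo, and the parse helper are made up for the example. The URL is obtained through URI.create(...).toURL(), which is one of several ways to construct it.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical demo class, not part of any library.
class UrlPartsDemo {
    // Wraps the checked MalformedURLException so callers stay simple.
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL url = parse("http://user@example.com:8080/docs/index.html?name=a#top");
        System.out.println(url.getProtocol());  // http
        System.out.println(url.getAuthority()); // user@example.com:8080
        System.out.println(url.getUserInfo());  // user
        System.out.println(url.getHost());      // example.com
        System.out.println(url.getPort());      // 8080
        System.out.println(url.getPath());      // /docs/index.html
        System.out.println(url.getQuery());     // name=a
        System.out.println(url.getFile());      // /docs/index.html?name=a
        System.out.println(url.getRef());       // top

        // Missing parts: no port, no query, no anchor
        URL bare = parse("http://example.com/index.html");
        System.out.println(bare.getPort());     // -1
        System.out.println(bare.getQuery());    // null
        System.out.println(bare.getRef());      // null
    }
}
```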

URI

The URI class represents a URI.

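When you create a URI from components, illegal characters in the components (such as spaces) are percent-encoded automatically. Non-ASCII letters are kept as they are by toString() and are percent-encoded only by toASCIIString().

When you create a URI by parsing a string, an exception is thrown if the string contains an illegal character: the URI(String) constructor throws URISyntaxException, and URI.create(String) throws IllegalArgumentException.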

// url contains invalid characters
String strUrl = "http://example.com/with spaces/ã.jpg?param=Ы Ю";
URL url = new URL(strUrl); // URL does not validate the characters

URI uri1 = new URI(strUrl);    // throws URISyntaxException
URI uri2 = URI.create(strUrl); // throws IllegalArgumentException
URI uri3 = url.toURI();        // throws URISyntaxException

// ok: the components are encoded by the constructor
URI uri = new URI("http", "example.com", "/with spaces/ã.jpg", "param=Ы Ю", null);
URI uri4 = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                   url.getPort(), url.getPath(), url.getQuery(),
                   url.getRef());
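The two approaches can be combined into a fallback: parse strictly first, and rebuild from components only when parsing fails. The sketch below is one possible way to do that, not an established recipe. The class LenientUriDemo and the helper toUri are hypothetical names. It relies on the URL(String) constructor, which is deprecated in recent JDK releases but still available.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo class, not part of any library.
class LenientUriDemo {
    // Tries strict parsing first; on failure rebuilds the URI from URL parts,
    // letting the multi-argument constructor quote illegal characters.
    static URI toUri(String spec) {
        try {
            return new URI(spec);
        } catch (URISyntaxException strictFailure) {
            try {
                URL url = new URL(spec); // accepts spaces and other illegal characters
                return new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                        url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            } catch (MalformedURLException | URISyntaxException e) {
                throw new IllegalArgumentException("Cannot convert: " + spec, e);
            }
        }
    }

    public static void main(String[] args) {
        // Already valid: returned as parsed
        System.out.println(toUri("http://example.com/a?b=c"));
        // Contains spaces: rebuilt from components
        // http://example.com/with%20spaces/file.jpg?q=a%20b
        System.out.println(toUri("http://example.com/with spaces/file.jpg?q=a b"));
    }
}
```

One caveat: the multi-argument URI constructors always quote the "%" character, so a string that mixes existing escapes with illegal characters (for example "a%20b c") ends up double-encoded on the fallback path.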

Like the URL class, URI gives access to the individual parts of the URI. It also provides methods that transform URIs, listed in the table below.

method description
normalize() Normalizes this URI's path:
  1. All "." segments are removed
  2. If a ".." segment is preceded by a non-".." segment then both of these segments are removed. This step is repeated until it is no longer applicable.
  3. If the path is relative, and if its first segment contains a colon character (':'), then a "." segment is prepended. This prevents a relative URI with a path such as "a:b/c/d" from later being re-parsed as an opaque URI with a scheme of "a" and a scheme-specific part of "b/c/d".
A normalized path will begin with one or more ".." segments if there were insufficient non-".." segments preceding them to allow their removal.
resolve(uri) Resolves the given URI against this URI:
  1. A new URI is constructed with this URI's scheme and the given URI's query and fragment components.
  2. If the given URI has an authority component then the new URI's authority and path are taken from the given URI.
  3. Otherwise the new URI's authority component is copied from this URI, and its path is computed as follows:
    • If the given URI's path is absolute then the new URI's path is taken from the given URI.
    • Otherwise the given URI's path is relative, and so the new URI's path is computed by resolving the path of the given URI against the path of this URI. This is done by concatenating all but the last segment of this URI's path, if any, with the given URI's path and then normalizing the result as if by invoking the normalize method.
If the given URI is already absolute, or if this URI is opaque, then the given URI is returned.
relativize(uri) Relativizes the given URI against this URI:
  1. If either this URI or the given URI is opaque, or if the scheme and authority components of the two URIs are not identical, or if the path of this URI is not a prefix of the path of the given URI, then the given URI is returned.
  2. Otherwise a new relative hierarchical URI is constructed with query and fragment components taken from the given URI and with a path component computed by removing this URI's path from the beginning of the given URI's path.
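
The three transformations in the table can be tried on small inputs. This is a minimal sketch with made-up example URIs and a hypothetical class name UriTransformDemo. The expected results in the comments follow from the rules listed in the table.

```java
import java.net.URI;

// Hypothetical demo class, not part of any library.
class UriTransformDemo {
    public static void main(String[] args) {
        // normalize(): "." is dropped, "b/.." cancels out
        URI messy = URI.create("http://example.com/a/./b/../c");
        System.out.println(messy.normalize()); // http://example.com/a/c

        URI base = URI.create("http://example.com/docs/guide/index.html");

        // resolve() with a relative path: "/docs/guide/" + "../img/logo.png", then normalized
        System.out.println(base.resolve("../img/logo.png")); // http://example.com/docs/img/logo.png

        // resolve() with an absolute path: path and query come from the given URI
        System.out.println(base.resolve("/top.html?x=1"));   // http://example.com/top.html?x=1

        // resolve() with an absolute URI: the given URI is returned
        System.out.println(base.resolve("https://other.org/a")); // https://other.org/a

        // relativize(): this URI's path is a prefix of the given URI's path
        URI root = URI.create("http://example.com/docs/");
        URI child = URI.create("http://example.com/docs/guide/index.html?x=1#s");
        System.out.println(root.relativize(child)); // guide/index.html?x=1#s

        // relativize(): not a prefix, so the given URI is returned unchanged
        URI other = URI.create("http://example.com/other/a");
        System.out.println(root.relativize(other)); // http://example.com/other/a
    }
}
```

For inputs like these, resolve() and relativize() are inverses: resolving the relativized result against root yields the original child URI again.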