UrlBuilder

Origin

In the JDK, the java.net.URL class can format URLs, but it cannot parse or handle some special URLs, such as those containing encoded characters or non-standard paths and parameters. Older versions of Hutool normalized URLs entirely through string replacement, which was inefficient and made the processing logic complex. Starting with version 5.3.1, Hutool therefore provides the UrlBuilder class, which splits a URL into its components and processes and formats each one separately to produce a normalized URL.

According to the standard URI (Uniform Resource Identifier) syntax, a URL has the following structure:

  • [scheme:]scheme-specific-part[#fragment]

  • [scheme:][//authority][path][?query][#fragment]

  • [scheme:][//host:port][path][?query][#fragment]

According to this format, UrlBuilder divides a URL into six parts: scheme, host, port, path, query, and fragment. The path and query parts are more complex and are encapsulated using the UrlPath and UrlQuery classes respectively.
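To make the six components concrete, the sketch below uses the JDK's own java.net.URI class (not UrlBuilder, whose internals are not shown here) to pull the same six parts out of a sample URL. The class name UrlParts and the sample URL with port 8080 are illustrative assumptions.

```java
import java.net.URI;

// Illustration only: uses the JDK's java.net.URI (not Hutool) to show the six
// components UrlBuilder works with: scheme, host, port, path, query, fragment.
public class UrlParts {

    // Returns the six components of a URL as one readable line.
    static String describe(String url) {
        URI uri = URI.create(url);
        return "scheme=" + uri.getScheme()
                + ", host=" + uri.getHost()
                + ", port=" + uri.getPort()          // -1 when the URL has no explicit port
                + ", path=" + uri.getPath()
                + ", query=" + uri.getQuery()        // decoded query string
                + ", fragment=" + uri.getFragment();
    }

    public static void main(String[] args) {
        System.out.println(describe("https://www.hutool.cn:8080/aaa/bbb?ie=UTF-8&wd=test#frag1"));
    }
}
```

Note that java.net.URI exposes the query as a single string; UrlBuilder goes further and wraps the path and query in UrlPath and UrlQuery so individual segments and parameters can be handled separately.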

Usage

Compared with the JDK's URL class, UrlBuilder is more forgiving of incomplete input. For example:

URL url = new URL("www.hutool.cn");

This throws java.net.MalformedURLException: no protocol. UrlBuilder, by contrast, falls back to a default scheme (http):

// Output: http://www.hutool.cn/
String buildUrl = UrlBuilder.create().setHost("www.hutool.cn").build();
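The JDK behaviour can be reproduced with plain Java. The sketch below shows the MalformedURLException for a scheme-less string and then a hypothetical withDefaultScheme helper that simply prefixes "http://"; this helper is an assumption for illustration and is not how UrlBuilder is implemented.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class NoProtocolDemo {

    // Tries to construct a java.net.URL; returns the exception message, or null on success.
    static String tryJdkUrl(String spec) {
        try {
            new URL(spec);   // this constructor is deprecated since JDK 20 but still works
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    // Hypothetical helper: assume http when the string has no "scheme://" prefix.
    static String withDefaultScheme(String spec) {
        return spec.contains("://") ? spec : "http://" + spec;
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(tryJdkUrl("www.hutool.cn"));   // no protocol: www.hutool.cn
        URL url = new URL(withDefaultScheme("www.hutool.cn"));
        System.out.println(url.getProtocol() + " " + url.getHost());   // http www.hutool.cn
    }
}
```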

Full Construction

// https://www.hutool.cn/aaa/bbb?ie=UTF-8&wd=test
String buildUrl = UrlBuilder.create()
   .setScheme("https")
   .setHost("www.hutool.cn")
   .addPath("/aaa").addPath("bbb")
   .addQuery("ie", "UTF-8")
   .addQuery("wd", "test")
   .build();
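The chained calls above store each component separately and join them only in build(). The following is a deliberately simplified, hypothetical MiniUrlBuilder (not Hutool's UrlBuilder) that mirrors this idea under stated assumptions: http as the default scheme, slashes in addPath normalized, query parameters kept in insertion order with duplicates allowed, and values encoded with the JDK's URLEncoder.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified builder (NOT Hutool's UrlBuilder): each component is
// stored separately and only joined and encoded in build().
public class MiniUrlBuilder {
    private String scheme = "http";                            // default scheme when none is set
    private String host = "";
    private final List<String> segments = new ArrayList<>();   // path segments without slashes
    private final List<String[]> params = new ArrayList<>();   // keeps order and duplicate keys
    private Charset charset = StandardCharsets.UTF_8;          // default query charset

    public static MiniUrlBuilder create() { return new MiniUrlBuilder(); }

    public MiniUrlBuilder setScheme(String scheme) { this.scheme = scheme; return this; }
    public MiniUrlBuilder setHost(String host) { this.host = host; return this; }
    public MiniUrlBuilder setCharset(Charset charset) { this.charset = charset; return this; }

    // Accepts "/aaa" as well as "bbb": slashes are stripped here and re-added in build().
    public MiniUrlBuilder addPath(String path) {
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                segments.add(s);
            }
        }
        return this;
    }

    public MiniUrlBuilder addQuery(String key, String value) {
        params.add(new String[] {key, value});
        return this;
    }

    public String build() {
        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (segments.isEmpty()) {
            sb.append('/');                                    // a bare host gets the root path
        }
        for (String s : segments) {
            sb.append('/').append(s);                          // path segments are not encoded in this sketch
        }
        for (int i = 0; i < params.size(); i++) {
            // URLEncoder uses form encoding (a space becomes '+'); a real builder may differ.
            sb.append(i == 0 ? '?' : '&')
              .append(URLEncoder.encode(params.get(i)[0], charset))
              .append('=')
              .append(URLEncoder.encode(params.get(i)[1], charset));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(MiniUrlBuilder.create().setHost("www.hutool.cn").build());
        System.out.println(MiniUrlBuilder.create()
                .setScheme("https")
                .setHost("www.hutool.cn")
                .addPath("/aaa").addPath("bbb")
                .addQuery("ie", "UTF-8")
                .addQuery("wd", "test")
                .build());
    }
}
```

Storing parameters in a list rather than a map is what allows a repeated key such as ie to appear twice in the output.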

Chinese Encoding

When query parameters contain Chinese characters, they are automatically encoded using UTF-8 by default. To use a different charset, call the setCharset method. For example:

// https://www.hutool.cn/s?ie=UTF-8&ie=GBK&wd=%E6%B5%8B%E8%AF%95
String buildUrl = UrlBuilder.create()
   .setScheme("https")
   .setHost("www.hutool.cn")
   .addPath("/s")
   .addQuery("ie", "UTF-8")
   .addQuery("ie", "GBK")
   .addQuery("wd", "测试")
   .build();
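Why the charset matters can be seen with the JDK's URLEncoder alone: the same two Chinese characters produce different percent-encoded bytes under UTF-8 and GBK. This is a plain-JDK illustration, not a call into UrlBuilder; the class name CharsetEncodingDemo is an assumption, and GBK support assumes a standard JDK with the extended charsets installed.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Shows that the same text percent-encodes differently depending on the charset.
// URLEncoder uses form encoding: letters, digits and ".-*_" are left unchanged.
public class CharsetEncodingDemo {

    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        String text = "\u6d4b\u8bd5";   // the characters written as 测试 in the example above
        System.out.println(encode(text, StandardCharsets.UTF_8));   // 3 bytes per character
        System.out.println(encode(text, Charset.forName("GBK")));   // 2 bytes per character
    }
}
```

The UTF-8 result is the %E6%B5%8B%E8%AF%95 shown in the expected output above; a server that decodes with a different charset than the one used for encoding will see garbled text.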

Parse

To parse an existing URL string, use the of methods. The example below uses ofHttp, which treats a URL without a scheme as an http URL:

UrlBuilder builder = UrlBuilder.ofHttp("www.hutool.cn/aaa/bbb/?a=张三&b=%e6%9d%8e%e5%9b%9b#frag1", CharsetUtil.CHARSET_UTF_8);

// Output: 张三
Console.log(builder.getQuery().get("a"));
// Output: 李四
Console.log(builder.getQuery().get("b"));

In this example, parameter "a" in the original URL is not encoded, while parameter "b" is percent-encoded. Given such a mixed URL, Hutool recognizes both forms and decodes each parameter correctly. Both are re-encoded when build() is called.
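One way such a mixed query can be decoded is sketched below with the JDK's URLDecoder, which leaves raw characters untouched and decodes %XX sequences (upper- or lowercase hex). This is a simplified assumption for illustration, not Hutool's actual parsing code; the class name MixedQueryDemo is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch (not Hutool's implementation): decode a query string in which
// some values are raw text and others are already percent-encoded.
public class MixedQueryDemo {

    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();   // duplicate keys: last value wins here
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');                        // split on the FIRST '=' only
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Raw characters pass through unchanged; %XX sequences are decoded.
            // URLDecoder also turns '+' into a space.
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // "a" is raw text (张三), "b" is percent-encoded (李四)
        Map<String, String> q = parseQuery("a=\u5f20\u4e09&b=%e6%9d%8e%e5%9b%9b");
        System.out.println(q.get("a"));
        System.out.println(q.get("b"));
    }
}
```

A raw "%" that is not followed by two hex digits makes URLDecoder throw IllegalArgumentException, so a parser meant to accept arbitrary user input needs to handle that case as well.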

Parsing Special URLs

Sometimes a URL uses the HTML entity "&amp;" as its parameter separator. Google Chrome converts this sequence into a plain "&", and Hutool does the same:

String urlStr = "https://mp.weixin.qq.com/s?__biz=MzI5NjkyNTIxMg==&amp;mid=100000465&amp;idx=1";
UrlBuilder builder = UrlBuilder.ofHttp(urlStr, CharsetUtil.CHARSET_UTF_8);

// Output: https://mp.weixin.qq.com/s?__biz=MzI5NjkyNTIxMg==&mid=100000465&idx=1
Console.log(builder.build());
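The effect of this conversion on query splitting can be sketched in a few lines. The normalizeSeparators and queryPairs helpers below are hypothetical and only illustrate the idea of treating "&amp;" as "&" before splitting; they are not Hutool's code.

```java
// Hypothetical pre-processing step (not Hutool's code): treat the HTML entity
// "&amp;" as a plain "&" before splitting the query into key=value pairs.
public class AmpEntityDemo {

    static String normalizeSeparators(String url) {
        return url.replace("&amp;", "&");
    }

    // Returns the raw key=value pairs of the query, or an empty array if there is none.
    static String[] queryPairs(String url) {
        String normalized = normalizeSeparators(url);
        int q = normalized.indexOf('?');
        return q < 0 ? new String[0] : normalized.substring(q + 1).split("&");
    }

    public static void main(String[] args) {
        String urlStr = "https://mp.weixin.qq.com/s?__biz=MzI5NjkyNTIxMg==&amp;mid=100000465&amp;idx=1";
        System.out.println(normalizeSeparators(urlStr));
        for (String pair : queryPairs(urlStr)) {
            System.out.println(pair);   // __biz=MzI5NjkyNTIxMg==, then mid=100000465, then idx=1
        }
    }
}
```

Without the conversion, splitting on "&" would yield keys such as "amp;mid", which is why the entity has to be recognized before the parameters are separated.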

(Note: UrlBuilder is used mainly in the http module. Users pass URLs in many different forms when building an HttpRequest, so Hutool normalizes them with UrlBuilder to be as tolerant as possible and to spare users from preprocessing URLs themselves.)