FileTypeUtil

Origins

When uploading files, sometimes we need to determine the file type. However, we can’t simply rely on the file extension to do this (to prevent malicious scripts from uploading to the server), so we need to determine the file type by reading the first few binary bits on the server side.

Usage

This utility class is very simple to use. You can call the FileTypeUtil.getType method to determine the file type. This method also provides numerous overloaded methods for reading different files and streams.

File file = FileUtil.file("d:/test.jpg");
String type = FileTypeUtil.getType(file);
// Output jpg if it's a jpg file
Console.log(type);

Principles and Limitations

This class determines the file type by reading the first N bytes of the file stream. We have mapped common file types to their corresponding values in a Map, which were collected from various networks. This means that we can only identify a limited number of file types. However, these types have covered common image, audio, video, and Office document types, which can meet most usage scenarios.

For some text-based files, we can’t determine their types by the first few bytes of the file, such as JSON files. These files are essentially text files, and we should read their text content and determine their types based on their syntax.

Custom Types

To improve the extensibility of FileTypeUtil, we can use the putFileType method to add custom file types.

FileTypeUtil.putFileType("ffd8ffe000104a464946", "new_jpg");

The first parameter is the hexadecimal representation of the first N bytes of the file stream, which we can read from a custom file and select a certain length (the longer the length, the more accurate). The second parameter is the file type, and then you can use the FileTypeUtil.getType method.

Note xlsx and docx files are essentially various XML packages zipped into a zip file, so they will be recognized as zip format.