Skip to main content

About File Encodings

tip

You probably don't have to worry about this.

sqlfmt uses Python, and Python uses Unicode to represent strings of text, including your SQL code while it is being formatted. When Unicode strings are saved to disk (or read from disk), they must be encoded into a sequence of bytes. This is a surprisingly complex subject! There are dozens of ways to encode characters into bytes, and a complex web of partial compatibility between them.

By default, sqlfmt assumes your .sql files are encoded in UTF-8, which is becoming the de facto standard Unicode encoding.

If your .sql files have a different encoding, you can specify that encoding at runtime by configuring sqlfmt with the --encoding option. For example, to decode files using the cp1252 encoding, you can run sqlfmt with:

sqlfmt --encoding cp1252 ./path/to/cp1252_file.sql

Alternatively, sqlfmt can detect its host machine's locale and use that locale's preferred encoding. For this behavior, pass the special word inherit to the --encoding option, like this:

sqlfmt --encoding inherit .
danger

Using --encoding inherit can cause compatibility issues between users on different operating systems, or even different versions of an operating system. It is provided as an option to replicate the default behavior of sqlfmt, before v0.17.0.

The BOM

sqlfmt will automatically detect the presence of a UTF BOM in a source file. If a BOM is in the source, it will also be written to the formatted file.

To always write a BOM to the formatted file, whether or not the source contains a BOM, you can use the utf-8-sig encoding:

sqlfmt --encoding utf-8-sig .

Newlines

In addition to different encodings, different platforms use different symbols to represent a newline (or line break). Unix platforms use \n, Windows uses \r\n, and Mac Classic (prior to OS X) used just \r.

sqlfmt reads in your file in "universal newline" mode, which translates every newline to \n while in memory. It then writes your file using your host machine's default line ending. This means that sqlfmt won't write a formatted file just to change its line endings, and a file won't fail in --check mode if it uses different line endings from what the host machine would use. This also plays better with git's default newline behavior, which is also machine-specific. This behavior is not configurable.