Article: Idiosyncratic argument parsing behaviour in Unreal Engine 4

Prior to 4.21.0, the command-line parsing logic in Epic Games’ Unreal Engine 4 deviates from standard conventions in unexpected ways.

Tags: C++, Unreal Engine

UPDATE: Unreal Engine 4.21.0 includes my pull request (login required), which fixes this behaviour. The details in this article apply only to Unreal Engine 4.20.3 and older.

Overview

While adding automation testing functionality to ue4cli, I observed some rather unexpected behaviour when invoking the Unreal Editor from the command line across different operating systems.

According to standard argument parsing conventions, the following two invocations should be completely equivalent:

UE4Editor <UPROJECT> -ExecCmds="automation list;quit"    # Works correctly under all platforms
UE4Editor <UPROJECT> "-ExecCmds=automation list;quit"    # Does not work correctly under Windows

However, I observed that while the former invocation functioned correctly on all platforms, the latter invocation failed under Windows. Upon further investigation, I discovered that the failure of the second invocation was due to a quirk in the way that Unreal Engine 4 implements its command-line parsing logic. This idiosyncrasy represents a potential pitfall for all developers creating tools or Continuous Integration (CI) pipelines that invoke the Editor from the command line.

If you are already familiar with standard command-line argument parsing conventions across both POSIX-based operating systems and Windows, feel free to skip ahead to the section Unreal Engine 4 argument parsing behaviour.

Standard argument parsing conventions

Invocation basics

By convention, a command-line invocation of an application is represented by a single string consisting of a series of whitespace-delimited components, like so:

program arg1 arg2 arg3

The first component specifies the name of the application itself, and the remaining components represent the list of arguments that should be provided as input to the application. When a command interpreter or shell processes the invocation string, it first splits the string into the list of individual components. The initial component is used to locate the executable file for the application, which is then loaded and run. The remaining components are made available to the application’s entry point in the form of an array, so that the application can consume the list of arguments. In the case of C++, the entry point for a command-line application is called main and typically has the following prototype:

int main(int argc, char* argv[]);

The parameter argc contains the number of command-line arguments, and the parameter argv is the array of argument values. Typically, the first element in the argv array contains the application name, whilst the remaining array elements contain the arguments.
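As a minimal sketch of how an application might consume these parameters, the helper below copies everything after the application name into a container. The name collectArguments is hypothetical and is used purely for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies the arguments that follow the application name
// (argv[0]) into a vector so they can be processed more conveniently.
std::vector<std::string> collectArguments(int argc, char* argv[])
{
    std::vector<std::string> arguments;

    // Start at index 1 to skip the application name stored in argv[0].
    for (int i = 1; i < argc; ++i)
    {
        arguments.emplace_back(argv[i]);
    }

    return arguments;
}
```

An entry point would typically call this as collectArguments(argc, argv) at the start of main.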

Operating system support

At an operating system level, POSIX-based systems such as macOS and Linux provide native support for the convention of using argc and argv through the exec() family of functions, ensuring that all applications can rely on receiving arguments that have already been processed.

Windows, on the other hand, does not enforce this convention at a system level, as the CreateProcess() function represents an application’s invocation using the original, unprocessed string. This raw string (sans the application name) is passed to GUI applications via the WinMain() entry point, whilst command-line applications receive the array of processed arguments via the same main() entry point as is used under POSIX-based systems. However, GUI applications can still opt in to argument processing and retrieve the argc and argv values by calling the CommandLineToArgvW() function. Conversely, command-line applications can opt out of argument processing and retrieve the original, unprocessed invocation string by calling the GetCommandLine() function.

Delimiter processing rules

Since whitespace is used as the delimiter for the components of a command-line invocation string, it is necessary to support escape sequences which permit the use of whitespace inside individual argument values. Under POSIX-based systems, the bash shell supports escaping whitespace characters by preceding them with a backslash, like so:

program single\ argument\ with\ spaces

However, this syntax becomes verbose for arguments with a large number of whitespace characters and is not particularly readable. Instead, the more common convention is to wrap arguments containing whitespace in quotes, like so:

program "single argument with spaces"

This convention is supported by all popular shells under POSIX-based systems and also by cmd.exe under Windows. It is important to note that the quote characters are removed by the shell during parsing, and are not present in the values of the argv array that the application receives. This is because the quotes are intended purely to assist the shell in parsing the invocation string, and are not part of the application’s input data. If an argument value needs to contain actual quote characters, these will need to be escaped using the specific rules of the shell or command interpreter. Under Windows, there are actually several layers of rules to account for.

It is also important to note that the exact placement of the quotes around an argument is flexible, so long as no whitespace characters appear outside of the quotes. For example, the following two invocations are equivalent:

program "single argument with spaces"    # argv = ['program', 'single argument with spaces']
program single" argument with spaces"    # argv = ['program', 'single argument with spaces']

However, this next invocation is not, and would be interpreted as two separate arguments:

program single "argument with spaces"    # argv = ['program', 'single', 'argument with spaces']
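The rules above can be illustrated with a deliberately simplified splitter. This is only a sketch: the function name splitArguments is hypothetical, and the code models just the three behaviours discussed here (whitespace delimiters, double quotes that are discarded after grouping, and backslash escapes outside quotes). Real shells and the Windows C runtime apply considerably more rules than this.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of shell-style splitting (an illustration, not a real shell):
// whitespace separates arguments, double quotes group characters and are then
// discarded, and a backslash outside quotes makes the next character literal.
std::vector<std::string> splitArguments(const std::string& invocation)
{
    std::vector<std::string> args;
    std::string current;
    bool inQuotes = false;  // true while inside a "..." span
    bool hasToken = false;  // true once the current argument has started

    for (std::size_t i = 0; i < invocation.size(); ++i)
    {
        const char c = invocation[i];
        if (c == '"')
        {
            // Quotes toggle grouping but never reach the argument value.
            inQuotes = !inQuotes;
            hasToken = true;
        }
        else if (c == '\\' && !inQuotes && i + 1 < invocation.size())
        {
            // Escaped character: take the next character literally.
            current += invocation[++i];
            hasToken = true;
        }
        else if ((c == ' ' || c == '\t') && !inQuotes)
        {
            // Unquoted whitespace ends the current argument, if there is one.
            if (hasToken)
            {
                args.push_back(current);
                current.clear();
                hasToken = false;
            }
        }
        else
        {
            current += c;
            hasToken = true;
        }
    }

    if (hasToken)
    {
        args.push_back(current);
    }
    return args;
}
```

Because the quotes only toggle grouping, it makes no difference where inside an argument they open, which is why the first two invocations above produce the same array.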

Unreal Engine 4 argument parsing behaviour

Now that the standard conventions for parsing command-line arguments have been described, we can examine the specific ways in which Unreal Engine 4 deviates from these conventions.

As of Unreal Engine 4.19.2, command-line argument parsing in the Engine is based around the FCommandLine and FParse classes. The FParse class is designed to be flexible enough for generic data parsing tasks in addition to command-line parsing, and so operates on strings rather than argv-style arrays. In instances where command-line parameters with values need to be parsed (e.g. -ExecCmds=COMMANDS), the FCommandLine::Get() function is called to retrieve the command-line invocation string, which is then parsed by the FParse::Value() function to extract the value. It is from the implementation of this parsing function that the deviation from standard conventions stems.

The current implementation of the FParse::Value() function only considers a parameter value to be quoted if the opening quote appears immediately after the equals sign (login required). If a quote character does not appear in this exact position, the parameter value is treated as an unquoted string that does not contain any whitespace: the parser skips any initial whitespace and then halts as soon as it encounters the next whitespace character. This means that if the opening quote appears at the start of the entire argument (i.e. "-ExecCmds=COMMANDS" instead of -ExecCmds="COMMANDS"), the argument value will end up being truncated.
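To make the truncation concrete, here is a small standalone model of the behaviour just described. It is not Epic's code: the function parseValue, its signature, and the use of std::string are assumptions made for illustration, and the real FParse::Value() handles cases that this sketch ignores.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the behaviour described above (NOT the Engine's code).
// Searches commandLine for key (e.g. "-ExecCmds=") and extracts the value after it.
// Returns false if the key is not present.
bool parseValue(const std::string& commandLine, const std::string& key, std::string& outValue)
{
    const std::size_t keyPos = commandLine.find(key);
    if (keyPos == std::string::npos)
    {
        return false;
    }

    std::size_t pos = keyPos + key.size();

    // Only a quote immediately after the key marks the value as quoted.
    if (pos < commandLine.size() && commandLine[pos] == '"')
    {
        const std::size_t start = pos + 1;
        const std::size_t end = commandLine.find('"', start);
        outValue = commandLine.substr(
            start, end == std::string::npos ? std::string::npos : end - start);
        return true;
    }

    // Otherwise skip leading whitespace, then stop at the next whitespace character.
    while (pos < commandLine.size() && std::isspace(static_cast<unsigned char>(commandLine[pos])))
    {
        ++pos;
    }
    const std::size_t start = pos;
    while (pos < commandLine.size() && !std::isspace(static_cast<unsigned char>(commandLine[pos])))
    {
        ++pos;
    }
    outValue = commandLine.substr(start, pos - start);
    return true;
}
```

With this model, the string UE4Editor Game.uproject -ExecCmds="automation list;quit" yields the full value automation list;quit, whereas UE4Editor Game.uproject "-ExecCmds=automation list;quit" yields only automation, because the quote precedes the key rather than following the equals sign. The project name Game.uproject is a placeholder.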

The fact that this behaviour contradicts standard argument parsing conventions is not a problem in and of itself, since the FParse class is designed to handle generic data processing tasks and this is simply the format that it is designed to consume. Command-line argument handling is merely a layer built on top of this functionality and can simply transform the strings it passes to the underlying parser into the expected format. The problem is that this format conversion is not currently performed under Windows.

The Unreal Engine entry points for both macOS (login required) and Linux (login required) reconstruct the command-line invocation string from the values of the argv array, injecting quotes in the relevant locations to ensure the generated strings adhere to the requirements of the FParse class. However, the entry point for Windows (login required) simply retrieves the raw command-line invocation string using the GetCommandLine() function and leaves it unmodified, completely bypassing the standard Windows argument parsing rules and leaving all quote characters in the exact positions where they appeared in the original invocation.

The end result of this is that command-line parsing under macOS and Linux behaves in exactly the way that users have come to expect, whereas seemingly correct invocations will result in truncated argument values and incorrect behaviour under Windows.

Solution

The solution is for the Windows entry point to simply perform the same transformations that are performed under macOS and Linux. The raw invocation string can be passed to the CommandLineToArgvW() function to retrieve the argc and argv values, which can then be used to build a string suitable for consumption by the FParse class. This brings the command-line behaviour under Windows into line with both the other operating systems and standard argument parsing conventions. I submitted pull request #4812 (login required) which implements this fix, and the patch was subsequently merged in Unreal Engine 4.21.0.
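The rebuilding step can be sketched in portable C++ as follows. This is an illustration under stated assumptions rather than the code from the pull request or the Engine: the names quoteForParser and buildCommandLine are hypothetical, arguments are assumed to have already been split into an argv-style list, and quote characters embedded in argument values are not escaped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: re-quotes a single, already-split argument so that a
// parser expecting -Key="value" sees the quote immediately after the '='.
std::string quoteForParser(const std::string& arg)
{
    // Arguments without spaces need no quoting at all.
    if (arg.find(' ') == std::string::npos)
    {
        return arg;
    }

    // By default the whole argument is wrapped in quotes. For "-Key=value"
    // style parameters the opening quote is moved to just after the '='.
    std::size_t quotePos = 0;
    if (arg[0] == '-')
    {
        const std::size_t equals = arg.find('=');
        if (equals != std::string::npos)
        {
            quotePos = equals + 1;
        }
    }

    return arg.substr(0, quotePos) + '"' + arg.substr(quotePos) + '"';
}

// Hypothetical helper: joins the re-quoted arguments into one invocation string.
std::string buildCommandLine(const std::vector<std::string>& args)
{
    std::string result;
    for (const std::string& arg : args)
    {
        if (!result.empty())
        {
            result += ' ';
        }
        result += quoteForParser(arg);
    }
    return result;
}
```

Under Windows, the input list would come from CommandLineToArgvW() after converting its wide strings; that conversion is omitted here to keep the sketch platform-independent.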

Developers creating tools or Continuous Integration (CI) pipelines that invoke version 4.20.3 or older of the Editor from the command line will need to place the opening quote immediately after the equals sign (i.e. -ExecCmds="COMMANDS") to ensure correct functionality across all operating systems.