Windows command line with spaces

Command-line environments like the Windows Command Prompt and PowerShell use spaces to separate commands and arguments—but file and folder names can also contain spaces. To specify a file path with a space inside it, you’ll need to “escape” it.

Command Line 101: Why You Have to Escape Spaces

“Escaping” a character changes its meaning. For example, escaping a space will cause the shell to treat it like a standard space character rather than a special character that separates command-line arguments.

For example, let’s say you have a text file whose contents you want to see. You can do that with the type command. Assuming the text file is at C:\Test\File.txt, the following command in Command Prompt will show its contents:

type C:\Test\File.txt

Great. Now, what if you have the same file at C:\Test Folder\Test File.txt? If you try running the below command, it won’t work—those spaces in the file path are getting in the way.

type C:\Test Folder\Test File.txt

The command line thinks you’re trying to look for a file called C:\Test and says it “cannot find the path specified.”
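To see why the command fails, it can help to imitate the split in a few lines of Python. This is only a rough sketch: it assumes a naive split on spaces, which is simpler than what Command Prompt and the type command really do, and the variable names are made up for illustration.

```python
# Rough illustration only: split a command line on spaces the naive way.
# Real shells apply more rules (quotes, escapes); this ignores all of them.
command_line = r"type C:\Test Folder\Test File.txt"

tokens = command_line.split(" ")
command, arguments = tokens[0], tokens[1:]

print(command)    # type
print(arguments)  # ['C:\\Test', 'Folder\\Test', 'File.txt']
```

The single path has turned into three separate arguments, and the first one, C:\Test, is what the command tries to open.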

Three Ways to Escape Spaces on Windows

There are three different ways you can escape file paths on Windows:

By enclosing the path (or parts of it) in double quotation marks ( " ).
By adding a caret character ( ^ ) before each space. (This only works in Command Prompt/CMD, and it doesn’t seem to work with every command.)
By adding a grave accent character ( ` ) before each space. (This only works in PowerShell, but it always works.)

We’ll show you how to use each method.

Enclose the Path in Quotation Marks ( " )

The standard way to ensure Windows treats a file path properly is to enclose it in double quotation mark ( " ) characters. For example, with our sample command above, we’d just run the following instead:

type "C:\Test Folder\Test File.txt"

You can actually enclose parts of the path in quotation marks if you prefer. For example, let’s say you had a file named File.txt in that folder. You could run the following:

type C:\"Test Folder"\File.txt

However, that isn’t necessary—in most cases, you can just use quotation marks around the whole path.

This solution works both in the traditional Command Prompt (CMD) environment and in Windows PowerShell.
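If you build commands from a script, the same rule can be applied automatically. The sketch below is one possible helper, not a standard API: quote_path is a hypothetical name, and it assumes the path contains no double quote characters of its own and does not end in a backslash (a backslash directly before the closing quote can be misread by some programs).

```python
def quote_path(path: str) -> str:
    """Wrap a path in double quotes when it contains a space.

    Hypothetical helper for illustration. Assumes the path has no embedded
    double quotes and no trailing backslash.
    """
    if " " in path:
        return f'"{path}"'
    return path


print("type " + quote_path(r"C:\Test Folder\Test File.txt"))
# type "C:\Test Folder\Test File.txt"
print("type " + quote_path(r"C:\Test\File.txt"))
# type C:\Test\File.txt
```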

Sometimes: Use the Caret Character to Escape Spaces ( ^ )

In the Command Prompt, the caret character ( ^ ) will let you escape spaces—in theory. Just add it before each space in the file name. (You’ll find this character in the number row on your keyboard. To type the caret character, press Shift+6.)

Here’s the problem: While this should work, and it does sometimes, it doesn’t work all the time. The Command Prompt’s handling of this character is strange.

For example, with our sample command, you’d run the following, and it wouldn’t work:

type C:\Test^ Folder\Test^ File.txt

On the other hand, if we try opening our file directly by typing its path into the Command Prompt, we can see that the caret character escapes the spaces properly:

C:\Test^ Folder\Test^ File.txt

So when does it work? Based on our research, it seems to work with some applications and not others, so your mileage may vary depending on the command you’re using. If you’re interested, give it a try with whatever command you’re using; it may or may not work.

For consistency, we recommend you stick with double quotes in the Command Prompt—or switch to PowerShell and use the grave accent method below.

PowerShell: Use the Grave Accent Character ( ` )

PowerShell uses the grave accent ( ` ) character as its escape character. Just add it before each space in the file name. (You’ll find this character above the Tab key and below the Esc key on your keyboard.)

type C:\Test` Folder\Test` File.txt

Each grave accent character tells PowerShell to escape the following character.

Note that this only works in the PowerShell environment. You’ll have to use the caret character in Command Prompt.

If you’re familiar with UNIX-like operating systems like Linux and macOS, you might be used to using the backslash ( \ ) character before a space to escape it. Windows uses the backslash as the separator in normal file paths, so that doesn’t work. Instead, the caret ( ^ ) and grave accent ( ` ) characters are the Windows version of backslash, depending on which command-line shell you’re using.
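For comparison, here is a small sketch that produces the escaped form of a path for each shell. The helper name escape_spaces and the ESCAPE_CHAR table are made up for this example. It only builds the text; as noted above, the caret form may still be rejected by some Command Prompt commands.

```python
# Escape character per shell, as described in the text above.
ESCAPE_CHAR = {"cmd": "^", "powershell": "`"}


def escape_spaces(path: str, shell: str) -> str:
    """Put the shell's escape character before every space in the path."""
    return path.replace(" ", ESCAPE_CHAR[shell] + " ")


path = r"C:\Test Folder\Test File.txt"
print(escape_spaces(path, "cmd"))         # C:\Test^ Folder\Test^ File.txt
print(escape_spaces(path, "powershell"))  # C:\Test` Folder\Test` File.txt
```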

Source: How-To Geek


We share a lot of tips and tricks that involve running commands in Command Prompt on Windows 10. A lot of common things, such as pinging a server or checking the status of your network switch, are done via Command Prompt. If you’re not comfortable using the Command Prompt beyond running commands that are already written out for you, you tend to miss out on lots of useful things you can do from it. One rather frequent question from new users is how to enter the name or address of a folder or file that has a space in its name or in its path.

Generally speaking, if you’re trying to run a command that involves specifying the path to a folder or file, and the path is incorrect (i.e., Command Prompt is unable to find it), the error message won’t tell you as much. The message that Command Prompt returns varies depending on the command you’ve run, and it often looks as though something is wrong with the command rather than the path, which makes the problem harder to troubleshoot. The fix is really simple.

Entering paths with spaces

The trick is the double-quotes. Make it a rule of thumb to enclose any and all file paths that you enter in Command Prompt in double quotes.

The following command will not run. The path has a space in it and at that space, the command breaks and Command Prompt thinks you’ve entered a new command or parameter.

XCOPY C:\Users\fatiw\OneDrive\Desktop\My test Folder D:\ /T /E

This command will work. The only difference between the two is that in the second one, the path is in double-quotes.

XCOPY "C:\Users\fatiw\OneDrive\Desktop\My test Folder" D:\ /T /E

Even if your path doesn’t have a space in it, it’s a good idea to enclose it in double-quotes and develop the habit of doing it. If you forget, or you’re dealing with a longer path, a simple error like this might be hard to spot.

This holds true for all command line apps that you use on Windows 10. In PowerShell, too, any file or folder path that you pass to a command should be enclosed in double-quotes. If the path doesn’t have a space in it, you’ll be fine without them, but if it does, the command won’t run. Again, this is about developing a habit to save yourself trouble later.
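If you generate commands from a script, you can let a library do the quoting. The sketch below uses Python’s subprocess.list2cmdline, a helper in the standard library that joins a list of arguments following the Microsoft C runtime quoting rules. As far as I know it is not part of the module’s documented public interface, so treat relying on it as an assumption. The example only builds the command string and does not run XCOPY.

```python
import subprocess

# Arguments as a list: each element is one argument, spaces and all.
args = [
    "XCOPY",
    r"C:\Users\fatiw\OneDrive\Desktop\My test Folder",
    "D:\\",
    "/T",
    "/E",
]

# list2cmdline adds double quotes only around arguments that need them.
command_line = subprocess.list2cmdline(args)
print(command_line)
# XCOPY "C:\Users\fatiw\OneDrive\Desktop\My test Folder" D:\ /T /E
```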

Fatima Wahab

Fatima has been writing for AddictiveTips for six years. She began as a junior writer and has been working as the Editor in Chief since 2014.


Working with file paths in the Windows Command Line can at times be a cumbersome experience, especially when dealing with spaces in file names and paths. Spaces are often treated as delimiters or argument separators by the command line, which can lead to errors when trying to access files or directories with spaces in their names. This article will outline various methods to handle spaces in file paths effectively in the Windows Command Line environment. We’ll explore a mix of simple techniques, command examples, and common use cases to help you become proficient in navigating and manipulating file paths in the command line interface.

Understanding the Problem

When you enter a command in the Windows Command Line, the interpreter uses spaces to determine where one argument ends and the next begins. For example, if you enter dir C:\Program Files, the command line sees the command dir followed by two separate arguments, C:\Program and Files, and typically fails because it cannot find either one. (The built-in cd command is a notable exception: with command extensions enabled, it does not treat spaces as delimiters, so cd Program Files usually works. Most other commands are not this forgiving.)

Why It Matters

Efficiently managing file paths with spaces is crucial for system administrators, developers, and anyone using the command line. Missing or misinterpreting spaces can lead to incorrect commands, resulting in frustration and wasted time. Thus, understanding how to properly handle paths with spaces enhances your ability to navigate the command line effectively.

Approaches to Escape Spaces in File Paths

In the Windows Command Line, there are a few established methods to handle spaces in file paths. Below, we’ll discuss these methods in detail.

1. Enclosing Paths in Double Quotes

The most straightforward method to handle spaces in file paths is to enclose the entire path in double quotes. When the command line sees a quoted string, it treats everything inside those quotes as a single argument.

Example:

cd "C:\Program Files\MyApplication"

In this example, the command line understands that you want to change the directory to C:\Program Files\MyApplication as a single entity instead of separating it into C:\Program and Files\MyApplication.

Tip:

Whenever you compose any command that involves a path containing spaces, wrapping the entire path in quotes is a safe practice.

2. Using Escape Characters

When using the Windows Command Line, there are special characters you can use to escape spaces. The caret (^) character can act as an escape character in Windows Command Line. This means that you can use it to signal to the command line to treat the subsequent space as a character in a file name rather than as an argument separator.

Example:

cd C:\Program^ Files\MyApplication

In this case, ^ tells the command line to treat the space in Program Files as part of the path rather than as an argument separator.

Case with Multiple Spaces:

If you have multiple spaces in a file name, you must precede each space with the caret:

cd C:\Folder^ Name\Another^ Folder\My^ File.txt

This approach allows you to specify paths without using quotes, although it can be more cumbersome and less readable.

3. Using 8.3 Short File Names

Windows has a legacy feature known as the "8.3 filename convention." This allows long file names to be represented by a shorter form. When using this method, you can often avoid dealing with spaces completely.

Find the Short Name:
To find the short name of a directory or file, run the dir /x command on the folder that contains it:

dir /x C:\

You might see output that looks like this:

Directory of C:\

07/01/2023  10:00 AM    <DIR>          PROGRA~1     Program Files

You can then use the short name (e.g., PROGRA~1) in your commands:

cd C:\PROGRA~1\MyApplication

While this method can save space-related headaches, short names are harder to read than full names, and 8.3 name generation can be disabled on a volume, in which case no short name exists.

4. Using the Tab Key for Autocompletion

The Windows Command Line features a built-in autocompletion function that can be particularly useful when dealing with complex paths. You can begin typing a path and then press the Tab key to cycle through files and folders that match the text you’ve input so far.

Example:

cd C:\Prog

If C:\Program Files is one of the options, simply pressing Tab will complete the path for you and add the quotation marks needed for the space.

5. Utilizing Batch Files for Longer Commands

For lengthy commands frequently executed, create a batch file (.bat). Batch files allow you to enter commands in a text format, making it easier to include spaces since you can enclose paths within quotes.

Example:
Create a navigate.bat file that contains:

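@echo off
REM /d also switches drives if the current drive is not C:
cd /d "C:\Program Files\MyApplication"
REM Launch the application from that folder
start myapp.exe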

You can then execute this batch file directly, effectively abstracting away the complexities involved in manual command entry.

6. Using PowerShell for Alternative Solutions

If you prefer a more modern command line, consider using PowerShell. It still splits arguments on spaces, so a path that contains spaces must be quoted (single or double quotes both work) or have each space escaped with a grave accent ( ` ).

Example:

cd "C:\Program Files\MyApplication"

PowerShell also allows for automatic tab completion. Since it is more robust for scripting and file manipulations compared to the traditional command line, many users prefer it for more complex tasks.

Common Use Cases

Understanding how to handle spaces effectively can enhance your command-line experience in numerous scenarios. Let’s explore a few common use cases:

Navigating Directories:
Changing directories often involves spaces, especially in program installations.
```
cd "C:\Program Files"
```
Accessing Files:
If you have files with spaces in their names, access them using enclosing quotes.
```
type "C:\My Documents\Report.txt"
```
Executing Applications:
When launching applications from the command line that reside in paths with spaces, ensure the full path is quoted.
```
start "" "C:\Program Files\MyApp\MyApp.exe"
```
Using Copy and Move Commands:
With copy or move, paths with spaces need to be quoted to avoid errors.
```
copy "C:\Program Files\OriginalFile.txt" "C:\My Documents\BackupFile.txt"
```

Best Practices

Always Quote Paths with Spaces: When you are writing commands, especially ones that take file paths, prioritize quoting to avoid unintentional errors.
Familiarize Yourself with 8.3 Names: When working in legacy systems or with older software, knowing how to access short names might come in handy.
Consider Scripting with Batch Files: If you frequently run the same commands, consider creating batch files to automate the process and eliminate the risk of errors due to spaces.
Explore PowerShell: As a more powerful alternative to Command Prompt, PowerShell manages spaces more elegantly and is worth considering for your scripting needs.
Utilize Autocomplete: Take advantage of the Command Line’s built-in autocomplete feature to reduce typing errors and navigate to paths easily.

Conclusion

Handling spaces in file paths on the Windows Command Line may seem tedious, but with the proper techniques, you can navigate it seamlessly. Whether you choose to use quotes, escape characters, or PowerShell, becoming familiar with these methods will enhance your efficiency. Each of these methods has its advantages; thus, the choice of which to use can depend on your specific task or preference.

Mastering command line navigation empowers you to become a more effective user, allowing you to utilize the full capabilities of your operating system without getting bogged down by common pitfalls associated with spaces in file paths. With practice, you’ll find these techniques can save you time and frustration, whether you’re a beginner or an experienced user.


There are several ways to open or run a file in Windows. You can double-click the file to open it or use the Run command box. You can also use command prompt to open the file. Sometimes it happens that when we manually enter the full path and filename in command prompt, the file doesn’t open or the program doesn’t run.

Usually when we want to open a file in another program (which Windows recognizes), we use the “start” command before the file path and filename to open the file. Here is the syntax of opening a text file:

C:\>start sample.txt

But when it comes to filenames with spaces, Windows will only take the first word and search for that name. If it doesn’t find a file with that name, it reports an error.

I have been putting double quotes at the start and end of the filename to open the file:

C:\>start "sample text file.txt"

But sometimes this doesn’t work. Windows will still take the first word i.e., sample as filename and search for it in the specified folder path.

You can also try out the following syntax which I have used successfully to open files through command line:

C:\>start "sample text file".txt

If one option doesn’t work, try the next one; most probably one of the options discussed above will work. The most likely explanation for the inconsistency is that the start command treats the first quoted string on its command line as a window title rather than as the file to open, so a quoted filename should not be the first quoted string after start.
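The help text for start (start /?) lists an optional quoted title as its first argument, so one way to sidestep the problem is to always pass an empty title before the filename. Here is a minimal sketch of building such a command. The name start_command is a hypothetical helper; it only builds the string and does not run anything.

```python
def start_command(file_path: str) -> str:
    """Build a cmd.exe 'start' command line for a file (hypothetical helper).

    The empty "" is the window title, so the quoted file path that follows
    is not mistaken for a title. Assumes file_path contains no double quotes.
    """
    return f'start "" "{file_path}"'


print(start_command("sample text file.txt"))
# start "" "sample text file.txt"
```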


I’d like to have an argument, please

This post presents a better way to understand the quoting and escaping of Windows command line arguments.

A Problem Scenario

In order for you to appreciate why it’s important to understand the things we’re discussing, I’m going to start with a typical problem scenario that illustrates the kind of things that can go wrong.

We know from the Microsoft documentation for CreateProcess that command lines are split into individual arguments based on the location of spaces on the command line. We’re told to enclose arguments that contain spaces between a pair of double quote characters. The typical description of how whitespace (space and tab) and quotes are dealt with while parsing a command line uses terms like double-quoted string and inside- or outside a quoted part. Even the Microsoft description of how a command line is parsed says a string surrounded by double quotation marks (“string”) is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.

We’re led to believe that analyzing a command line is just a matter of looking for matching pairs of quote characters that enclose individual arguments or embedded quotes. It sounds simple enough and we’re pretty sure we understand how it works. But still, occasionally we come across an example we can’t explain, or (more likely), some command line breaks an existing program or script.

Suppose we have a program, CreateDocs.exe, that generates documentation for a set of C++ source files. It’s driven by a batch file that accepts a starting directory and an optional “VERBOSE” switch. The batch file first passes the starting directory to the program, then generates a list of subdirectories which it also passes one at a time to the program. Our batch file faithfully quotes each directory in case it contains spaces. Recall that %~1 is the first batch argument (%1) with any existing quotes removed, so ["%~1"] unconditionally double quotes the string whether it was quoted or not on the command line:

@echo off
REM %1 is the root of the directory tree to process.
REM %2 is the optional string VERBOSE
cls
setlocal
echo.
IF '%1'=='' echo No Path Specified& goto:eof
set ROOT_PATH=%~1

REM Process root directory:
CreateDocs "%ROOT_PATH%" %2

REM Process subdirectories:
FOR /F %%S IN ('dir /ad /b "%ROOT_PATH%"') DO (
   CreateDocs "%%S" %2
   )
echo.
goto:eof

Once in a while the script neglects to process files in the starting directory. We discover that it fails whenever the user appends a trailing backslash to the specified directory. The backslash is interpreted by CreateDocs as an escape for the closing double quote. So instead of receiving [SomeDirectory\] in argv[1] when processing ["SomeDirectory\" VERBOSE], it receives [SomeDirectory" VERBOSE]. If you don’t already understand, you will later see why this is happening. It may seem a little mysterious that the FOR loop works correctly, but that’s only because cmd.exe parses things differently than the executable does.
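The effect can be reproduced with a small parser. The sketch below is a simplified reading of the documented Microsoft C runtime rules: a double quote toggles whether spaces split arguments, a run of backslashes before a quote is halved, and an odd run makes the quote literal. It is not the runtime’s actual code. The name split_args is hypothetical, the function expects only the argument portion of the command line, and it ignores the special handling of the program name and of doubled quotes.

```python
def split_args(command_line: str) -> list:
    """Split the argument portion of a command line (simplified sketch)."""
    args, current, in_arg = [], [], False
    ignore_special = False  # True while spaces must NOT split arguments
    i, n = 0, len(command_line)
    while i < n:
        ch = command_line[i]
        if ch == "\\":
            # Measure the run of consecutive backslashes.
            j = i
            while j < n and command_line[j] == "\\":
                j += 1
            count = j - i
            if j < n and command_line[j] == '"':
                # Before a quote: every pair becomes one backslash.
                current.append("\\" * (count // 2))
                if count % 2 == 1:
                    current.append('"')  # odd run: the quote is literal
                else:
                    ignore_special = not ignore_special  # even run: toggle
                i = j + 1
            else:
                current.append("\\" * count)  # not before a quote: literal
                i = j
            in_arg = True
        elif ch == '"':
            ignore_special = not ignore_special
            in_arg = True
            i += 1
        elif ch in " \t" and not ignore_special:
            if in_arg:
                args.append("".join(current))
                current, in_arg = [], False
            i += 1
        else:
            current.append(ch)
            in_arg = True
            i += 1
    if in_arg:
        args.append("".join(current))
    return args


print(split_args(r'"SomeDirectory" VERBOSE'))    # ['SomeDirectory', 'VERBOSE']
print(split_args(r'"SomeDirectory\" VERBOSE'))   # ['SomeDirectory" VERBOSE']
print(split_args(r'"SomeDirectory\\" VERBOSE'))  # ['SomeDirectory\\', 'VERBOSE']
```

The second call shows the failure described above. The third shows that, under these rules, doubling the trailing backslash keeps the closing quote working.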

Although we now know what the problem is, we can’t just tell our users not to append a trailing backslash. Someone will get it wrong! To fix this in CreateDocs would require some relatively complex logic. We could, for example, detect that argv[1] contains a ["] at the end of a valid path (and the directory exists), possibly followed by [ VERBOSE].

We opt instead to add logic to our script to detect and remove the trailing backslash:

@echo off
REM %1 is the root of the directory tree to process.
REM %2 is the optional string VERBOSE
cls
setlocal
echo.
IF '%1'=='' echo No Path Specified& goto:eof
set ROOT_PATH=%~1

REM Strip off trailing backslash:
IF [%ROOT_PATH:~-1,1%]==[\] set ROOT_PATH=%ROOT_PATH:~0,-1%

REM Process root directory:
CreateDocs "%ROOT_PATH%" %2

REM Process subdirectories:
FOR /F %%S IN ('dir /ad /b "%ROOT_PATH%"') DO (
   CreateDocs "%%S" %2
   )
echo.
goto:eof

We congratulate ourselves on our cleverness and we use the script successfully for months.

Then one day we change our build process. We want to be able to debug using binaries built on different development machines. Because the source code is installed in different places on different machines (something we don’t want to change), the debugger can’t always find the .pdb file because it’s specified in the executable image with an absolute path. We could fix this various ways, but we decide to use a fixed drive letter in the build scripts and use subst to map the root of the source code tree, wherever it is on a particular machine, to the root of this drive.

Once again we start to notice that our script occasionally (but not always) fails.

What’s happening now?

The answer is that our script was smart, but not smart enough. The problem this time turns out to be caused by the fact that we are now specifying a root directory (ex. [x:\]). When the batch file removes the trailing backslash it generates [x:] which (for a root directory only) is not the same as with the backslash. The former specifies the default working directory while the latter is the root directory.

Most of the time our users open a dedicated console for running this tool and never do any other work there. In that case x:\ is the same as x:. But occasionally someone switches to the substed drive and goes to a subdirectory to do some other work. This makes x: different from x:\ and on those infrequent occasions the script fails.

We fix it by adding a special case: when the root directory of a drive is explicitly specified, the trailing backslash is not removed.
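The logic of that special case is easier to see outside of batch syntax. Here is a minimal sketch in Python rather than in the batch file itself. The function name strip_trailing_backslash is made up, and it assumes that the only roots that matter are drive-letter roots such as x:\ (UNC roots are not considered).

```python
import re

# Matches a bare drive root: one drive letter, a colon, one backslash.
DRIVE_ROOT = re.compile(r"[A-Za-z]:\\")


def strip_trailing_backslash(path: str) -> str:
    """Remove one trailing backslash, except from a drive root like x:\\."""
    if DRIVE_ROOT.fullmatch(path):
        return path  # keep the root as-is: "x:" alone means something else
    if path.endswith("\\"):
        return path[:-1]
    return path


print(strip_trailing_backslash("C:\\Source\\Docs\\"))  # C:\Source\Docs
print(strip_trailing_backslash("x:\\"))                # x:\
```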

This example was not intended to teach you what to do, but to give a reasonable example of how problems can creep in.

The above batch file still has some issues that I will leave unresolved for now (the “IF” statements may fail with certain strings containing double quote characters).

The Problem With the Existing Way of Looking at Things

Consider the following set of partial command lines:

["Argument With Spaces"]

[Argument" "With" "Spaces]

["Argument "With" Spaces"]

[Argument" With Sp"aces]

["Argument With Spaces]

["Ar"g"um"e"n"t" With Sp"aces""]

Before continuing, take a few moments and try to pick out the outer and the embedded quoted strings.

You may be surprised to learn that all of these are interpreted exactly the same way by the command line parser— as a single argument: [Argument With Spaces].

But how can all of these possibly generate the same argument? One of the quotes is not even closed!

The command line parser rules must be either inconsistent, incomprehensible, or just plain stupid.

But the problem isn’t with the rules, but simply that the conventional way of looking at things is wrong. We need a different way of analyzing command lines that’s easy to apply and always works.

(For an even more extreme and humorous example, take a look at 50 Ways to Say Hello.)

A Better Way To Look at Things

We come now to a very important concept.

Double quote characters in a command line string have no relation to the boundaries between arguments and do not necessarily enclose arguments. Each individual double quote character by itself acts simply as a switch to enable or disable the recognition of space as a divider between arguments.

Attempting to find pairs of double quote characters that enclose meaningful chunks of text is possibly the biggest conceptual mistake that people make when looking at a complicated command line.

This point is extremely important (possibly the most important concept in this article) so I’m going to discuss it in depth before I move on to explain how a command line is received and processed by an executable program.

Each of the command line parsers we will consider (parse_cmdline, CommandLineToArgvW, and cmd.exe) has a set of characters that in some contexts it considers special (examples are whitespace, double quote, caret, and backslash) and which cause some action to be taken when one of them is encountered— such as beginning a new argument (and removing the special character).

In other contexts the parser treats these same characters like regular text. So at any given time the parser is in one of two states— recognizing (i.e., interpreting) special characters, or ignoring them.

The Parser States Named

In order to be explicit when referring to these two states in the text, I invented names for them. I call the first state InterpretSpecialChars and the second state IgnoreSpecialChars. They correspond to what some writers refer to as being outside or inside a double quoted string. I purposely avoided giving them names containing quote or quoting because doing so reinforces the misconception that double quote characters somehow delineate arguments. The colors shown are used later to show the parser state in images I created to demonstrate how example command lines are parsed.

I originally considered calling these InterpretWhitespace and IgnoreWhitespace, but I wanted to use the same state names when discussing cmd.exe where the state governs the interpretation of a different set of special characters, not whitespace.

The key to understanding any command line, no matter how complex, is to pay attention to which of these two states the parser is in at any given time and understanding what causes the state to change.
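The two-state idea is small enough to write down as code. The following Python sketch is an illustration of the concept only, not the code of any real parser: the function name split_arguments is made up, backslash escapes are deliberately left out, and the only special characters it knows are the double quote, space, and tab. Every double quote simply flips the state, and a space starts a new argument only in InterpretSpecialChars.

```python
INTERPRET = "InterpretSpecialChars"
IGNORE = "IgnoreSpecialChars"


def split_arguments(command_line):
    """Split a command line using only the two-state quote toggle."""
    state = INTERPRET
    arguments, current, started = [], "", False
    for ch in command_line:
        if ch == '"':
            # A quote never ends an argument; it only flips the state.
            state = IGNORE if state == INTERPRET else INTERPRET
            started = True
        elif ch in " \t" and state == INTERPRET:
            # Whitespace is special only in InterpretSpecialChars.
            if started:
                arguments.append(current)
            current, started = "", False
        else:
            current += ch
            started = True
    if started:
        arguments.append(current)
    return arguments


examples = [
    '"Argument With Spaces"',
    'Argument" "With" "Spaces',
    '"Argument "With" Spaces"',
    'Argument" With Sp"aces',
    '"Argument With Spaces',
    '"Ar"g"um"e"n"t" With Sp"aces""',
]
for example in examples:
    print(split_arguments(example))  # ['Argument With Spaces'] every time
```

The same function splits an unquoted line as expected: split_arguments('Argument With Spaces') gives three separate arguments.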

The Last Example Explained

Let’s examine the last example in the above list to see how it is parsed, character by character, from left to right. The special character that we will see either interpreted or ignored, depending on the parser state, is the space character.

You’ll see that the parser state is IgnoreSpecialChars when both the spaces are read, so they do not cause a new argument to be started.

Each line below represents a single character read from the command line. The first column is the actual character read, followed by the parser state before processing the character, the action that was taken, and the value of the partial argument after processing the character.

Here’s the command line again:

["Ar"g"um"e"n"t" With Sp"aces""]

Char read: State when char read:    Action:                        Argument (after processing char):
           InterpretSpecialChars    Start                          []
    "      InterpretSpecialChars    Go to IgnoreSpecialChars       []
    A      IgnoreSpecialChars       Add char [A]                   [A]
    r      IgnoreSpecialChars       Add char [r]                   [Ar]
    "      IgnoreSpecialChars       Go to InterpretSpecialChars    [Ar]
    g      InterpretSpecialChars    Add char [g]                   [Arg]
    "      InterpretSpecialChars    Go to IgnoreSpecialChars       [Arg]
    u      IgnoreSpecialChars       Add char [u]                   [Argu]
    m      IgnoreSpecialChars       Add char [m]                   [Argum]
    "      IgnoreSpecialChars       Go to InterpretSpecialChars    [Argum]
    e      InterpretSpecialChars    Add char [e]                   [Argume]
    "      InterpretSpecialChars    Go to IgnoreSpecialChars       [Argume]
    n      IgnoreSpecialChars       Add char [n]                   [Argumen]
    "      IgnoreSpecialChars       Go to InterpretSpecialChars    [Argumen]
    t      InterpretSpecialChars    Add char [t]                   [Argument]
    "      InterpretSpecialChars    Go to IgnoreSpecialChars       [Argument]
           IgnoreSpecialChars       Add char [ ]                   [Argument ]
    W      IgnoreSpecialChars       Add char [W]                   [Argument W]
    i      IgnoreSpecialChars       Add char [i]                   [Argument Wi]
    t      IgnoreSpecialChars       Add char [t]                   [Argument Wit]
    h      IgnoreSpecialChars       Add char [h]                   [Argument With]
           IgnoreSpecialChars       Add char [ ]                   [Argument With ]
    S      IgnoreSpecialChars       Add char [S]                   [Argument With S]
    p      IgnoreSpecialChars       Add char [p]                   [Argument With Sp]
    "      IgnoreSpecialChars       Go to InterpretSpecialChars    [Argument With Sp]
    a      InterpretSpecialChars    Add char [a]                   [Argument With Spa]
    c      InterpretSpecialChars    Add char [c]                   [Argument With Spac]
    e      InterpretSpecialChars    Add char [e]                   [Argument With Space]
    s      InterpretSpecialChars    Add char [s]                   [Argument With Spaces]
    "      InterpretSpecialChars    Go to IgnoreSpecialChars       [Argument With Spaces]
    "      IgnoreSpecialChars       Go to InterpretSpecialChars    [Argument With Spaces]

Quoting Defined

Enabling or disabling the recognition of special characters (switching between InterpretSpecialChars and IgnoreSpecialChars) is done by strategically placing double quote characters at specific locations on the command line— wherever you want the state to change. This is under the control of the person who writes the command line. The act of placing double quote characters for this purpose is how I define the term quoting. It is not the delineation of a piece of text by enclosing it in a pair of double quote characters.

Quoting is the placement of individual double quote characters on the command line to control the switching between InterpretSpecialChars and IgnoreSpecialChars.

Since double quote characters work individually and not in pairs, there is no such concept as a dangling (unclosed) or mismatched quote as discussed by other writers. The command line parser simply ends in one or the other of the two states. If there is an even number of un-escaped double quote characters on the command line (we will define escape later) the parser ends in InterpretSpecialChars. If there is an odd number of un-escaped double quote characters it ends in IgnoreSpecialChars.
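The even/odd rule can be checked with a couple of lines of Python. This is only a sketch: final_parser_state is a made-up name, and it assumes the command line contains no escaped double quotes, so every quote counts as a state switch.

```python
def final_parser_state(command_line: str) -> str:
    """Return the state the parser ends in, assuming no escaped quotes."""
    quote_count = command_line.count('"')
    if quote_count % 2 == 0:
        return "InterpretSpecialChars"
    return "IgnoreSpecialChars"


print(final_parser_state('"Argument With Spaces"'))  # InterpretSpecialChars
print(final_parser_state('"Argument With Spaces'))   # IgnoreSpecialChars
```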

What would traditionally be called a quoted string (text between two double quote characters) can span multiple arguments or be contained completely within an argument— more evidence that it’s hopeless to use pairs of double quote characters to find the arguments!

The seemingly arcane parse rules in the Microsoft documentation actually make sense once you understand that they don’t directly tell you how the command line is split into arguments, but merely describe what causes the parser to switch state.

By training your mind to scan a command line from left to right like the parser does, instead of trying to pick out the quoted chunks, you will have little problem understanding or correctly generating even the most complicated command line.

We’ll talk more about this when we go over the specifics of the parsers later.

How a Program Receives Its Command Line

Everyone knows the purpose of command line arguments— to customize a particular run of a program or a script. This section covers how an executable program interprets the command line string received from CreateProcess. Elsewhere I will cover the behavior of cmd.exe. For now, keep the following in mind:

The way cmd.exe interprets the command line is different and completely independent of how an executable program interprets the command line. You must separate your thinking about what cmd.exe does from your thinking about what an executable program does.

From a programmer’s perspective, a Windows application written in C or C++ begins execution in main or WinMain and makes available (as function parameters) either an array of individual arguments (main’s argv[]) or a single command line string (WinMain’s lpCmdLine). Both types of application also have available an alternate source for the same set of arguments contained in argv[]— the global variables __argc and __argv. These are filled in before main or WinMain is called so many programmers believe that their programs receive the command line already split up into individual arguments, perhaps by Windows itself. But that is not the case.

A program receives a single command line string which the program itself, not the operating system, splits into individual arguments.

This string normally consists of the executable name followed by the actual arguments. I will have a lot more to say about this, but for now the important thing to remember is there is just one command line string.

All programs executed under Windows are ultimately started by CreateProcess (or variants such as CreateProcessAsUser). CreateProcess is a complex subject and I’m only going to look at two of the parameters it accepts: the executable name of the program to start (lpApplicationName) and the command line string (lpCommandLine), both of which are optional. Obviously, you have to tell Windows what program to run, so if the first parameter is not supplied (is NULL) then the program name must be given at the start of the command line string. By convention, even if you do specify the program name in the first parameter, you’re supposed to repeat it at the beginning of the command line (if you supply one), but this is not enforced and leaving it out can cause problems. If you don’t supply a command line Windows creates one containing just the program name.

A program does not know how its name was specified and most programs blindly assume the first argument is the program name.

The first command line argument is usually interpreted differently than the rest of the command line or ignored altogether. A Windows GUI app will not even see the first argument (at least not in lpCmdLine). If it’s something important it will get lost:

[ImportantArgument-0 Argument-1] is received in lpCmdLine as just [Argument-1]
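To make the loss of the first argument concrete, here is a minimal sketch of how the tail that ends up in lpCmdLine could be derived from the full command line string. TailAfterFirstArgument is a hypothetical helper written for this illustration, not a Windows or runtime library function, and it assumes the first argument ends at the first space or tab unless it begins with a double quote, in which case it ends at the next double quote. The real startup code may differ in details.

```cpp
#include <string>

// Hypothetical helper (not a Windows API): returns roughly what a GUI program
// would see in lpCmdLine for a given full command line string.
// Assumption: the first argument ends at the first space or tab, unless it
// starts with a double quote, in which case it ends at the next double quote.
std::string TailAfterFirstArgument(const std::string& fullCommandLine)
{
    std::size_t pos = 0;
    if (!fullCommandLine.empty() && fullCommandLine[0] == '"') {
        // Quoted first argument: skip to the closing quote (or end of string).
        pos = fullCommandLine.find('"', 1);
        pos = (pos == std::string::npos) ? fullCommandLine.size() : pos + 1;
    } else {
        // Unquoted first argument: skip to the first space or tab.
        pos = fullCommandLine.find_first_of(" \t");
        if (pos == std::string::npos) {
            pos = fullCommandLine.size();
        }
    }
    // Skip the whitespace that separates the first argument from the rest.
    pos = fullCommandLine.find_first_not_of(" \t", pos);
    return (pos == std::string::npos) ? std::string() : fullCommandLine.substr(pos);
}
```

Under these assumptions, passing "ImportantArgument-0 Argument-1" yields "Argument-1": whatever sits in the first position is dropped, whether or not it is actually the program name.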

To avoid problems caused by special handling of the first argument, or the assumption that it is the executable name, always include the program name at the beginning of any command line string you supply to CreateProcess.

The Microsoft documentation for CreateProcess implies that including the program name at the start of lpCommandLine is optional, something only “C programmers generally” do. After we show how the parsers give special treatment to the first argument, you will understand why I recommend that you always specify the executable name as the first thing on the command line. An interesting thing to note (which we won’t pursue) is that if you specify the executable name in the first parameter of CreateProcess (lpApplicationName), the PATH is not searched, but if you pass NULL for that parameter, the PATH is used. For security reasons, Microsoft recommends that you always supply the first parameter to CreateProcess and always enclose the executable name at the beginning of the command line string in quotes, though this is strictly only necessary if the path contains spaces.
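As a small illustration of that recommendation, the sketch below builds a command line string with the quoted executable name in front. BuildCommandLine is a hypothetical helper, not part of any Windows API. It assumes the executable path contains no double quote characters and that the argument text has already been formatted the way the target program expects, so it is appended unchanged.

```cpp
#include <string>

// Hypothetical helper: builds the string to hand to CreateProcess as lpCommandLine.
// exePath      - path of the executable; assumed to contain no double quotes.
// rawArguments - argument text that is already formatted; appended unchanged.
std::string BuildCommandLine(const std::string& exePath, const std::string& rawArguments)
{
    // Always quote the program name so that spaces in the path are harmless.
    std::string commandLine = "\"" + exePath + "\"";
    if (!rawArguments.empty()) {
        commandLine += ' ';
        commandLine += rawArguments;
    }
    return commandLine;
}
```

The result would typically be copied into a writable buffer before the call, since CreateProcessW is documented as being allowed to modify the command line string it is given.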

How the Command Line is Split

Programs generally split their command line by treating a sequence of one or more whitespace characters (space, tab) as the separator between arguments. I say generally because a program is free to do whatever it wants and there are no standards to guide us. There is not even universal agreement on the set of characters that constitute whitespace.

A program can interpret its command line any way it wants. There is no method to ensure with 100 percent certainty that a given command line is correct or guarantee how it will be broken up into individual components.

This may sound hopeless, but fortunately there’s only a small set of facilities for parsing the command line that most programs use. Once you understand these, you will understand how the vast majority of programs behave.

A program that links with the Microsoft C/C++ runtime library automatically splits the command line by calling a function named parse_cmdline and passes the result to main in argc and argv, or to WinMain through the global variables __argc and __argv.

A Windows GUI program can access the command line string (minus the program name) through the lpCmdLine argument to WinMain, and any program can access the full command line, including the program name (or whatever else was specified at the start of the command line when CreateProcess was called) by calling the aptly named GetCommandLine function. The program can then split the resulting string explicitly either by calling CommandLineToArgvW or a custom command line parser.

The splitting rule used by both these parsers is simple: splitting always occurs on a whitespace boundary, and only when the parser state is InterpretSpecialChars.
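The splitting rule can be sketched as a tiny state machine. This is a deliberately simplified illustration, not the code of parse_cmdline or CommandLineToArgvW: it ignores the backslash rules, doubled quotes and the special handling of the first argument. InterpretSpecialChars is the state name used above; the name of the second state, InsideQuotes, is made up for this sketch, as is the function SplitCommandLine.

```cpp
#include <string>
#include <vector>

// InterpretSpecialChars is the state named in the text; InsideQuotes is a
// made-up name for the other state.
enum class ParserState { InterpretSpecialChars, InsideQuotes };

// Simplified splitter: a double quote toggles the state and is not copied;
// a space or tab ends the current argument only in InterpretSpecialChars.
// Backslash rules, doubled quotes and first-argument handling are ignored.
std::vector<std::string> SplitCommandLine(const std::string& commandLine)
{
    std::vector<std::string> arguments;
    std::string current;
    bool haveArgument = false; // true once the current argument has started
    ParserState state = ParserState::InterpretSpecialChars;

    for (char ch : commandLine) {
        if (ch == '"') {
            // A quote only switches the parser state.
            state = (state == ParserState::InterpretSpecialChars)
                        ? ParserState::InsideQuotes
                        : ParserState::InterpretSpecialChars;
            haveArgument = true; // a bare "" still starts an (empty) argument
        } else if ((ch == ' ' || ch == '\t') &&
                   state == ParserState::InterpretSpecialChars) {
            // Whitespace splits only in InterpretSpecialChars.
            if (haveArgument) {
                arguments.push_back(current);
                current.clear();
                haveArgument = false;
            }
        } else {
            current += ch;
            haveArgument = true;
        }
    }
    if (haveArgument) {
        arguments.push_back(current);
    }
    return arguments;
}
```

Scanning left to right, the string a"b c"d e comes out as the two arguments [ab cd] and [e]: the quoted text sits in the middle of an argument, and the space inside it does not split because the parser is not in InterpretSpecialChars at that point.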

Almost all programs use the arguments generated by either parse_cmdline or CommandLineToArgvW, so we will focus on these.

Getting a View on Things

In the next post I’ll go into great detail about the algorithms used by parse_cmdline and CommandLineToArgvW to split a command line. We’ll see some ugliness caused by a special Microsoft rule governing backslashes and we’ll go over the special handling of the first argument. But first I’m going to give you some tools for looking at things (later I’ll give you additional tools that may actually make your job easier).

Download DumpArgs Project
Download RunTest Project

Both of these are console programs.

The first utility is called DumpArgs and is used to see exactly how a given command line is split into arguments.

DumpArgs first displays the command that it received, both ANSI and Unicode in case there’s a difference. Since it is a console app, it already has the command line split into arguments by parse_cmdline (in argv[]). The program retrieves the raw command line using GetCommandLine and splits it again using CommandLineToArgvW then outputs both sets of arguments for comparison.

Simply run it with any command line you want.

Example Run of DumpArgs

Command line:

DumpArgs "First NotSecond" Second!

Output:

Unicode Command Line (from GetCommandLineW): [dumpargs  "First NotSecond" Second!]


              00 01 02 03 04 05 06 07 | 08 09 0a 0b 0c 0d 0e 0f
              -------------------------------------------------
   00000000:  64 00 75 00 6d 00 70 00 | 61 00 72 00 67 00 73 00   d.u.m.p. | a.r.g.s.
   00000010:  20 00 20 00 22 00 46 00 | 69 00 72 00 73 00 74 00   _._.".F. | i.r.s.t.
   00000020:  20 00 4e 00 6f 00 74 00 | 53 00 65 00 63 00 6f 00   _.N.o.t. | S.e.c.o.
   00000030:  6e 00 64 00 22 00 20 00 | 53 00 65 00 63 00 6f 00   n.d."._. | S.e.c.o.
   00000040:  6e 00 64 00 21 00 00 00                             n.d.!...


ANSI Command Line (from GetCommandLineA): [dumpargs  "First NotSecond" Second!]


              00 01 02 03 04 05 06 07 | 08 09 0a 0b 0c 0d 0e 0f
              -------------------------------------------------
   00000000:  64 75 6d 70 61 72 67 73 | 20 20 22 46 69 72 73 74   dumpargs | __"First
   00000010:  20 4e 6f 74 53 65 63 6f | 6e 64 22 20 53 65 63 6f   _NotSeco | nd"_Seco
   00000020:  6e 64 21 00                                         nd!.


CommandLineToArgvW Found 3 Argument(s)
   arg 0   = [dumpargs]
   arg 1   = [First NotSecond]
   arg 2   = [Second!]

Command Line Arguments From argv Array (argc = 3):
   argv[0] = [dumpargs]
   argv[1] = [First NotSecond]
   argv[2] = [Second!]

I show the source code below, but you can download a zip archive containing a pre-built executable, source code, and a Visual Studio project.

Notes on the VS solution (these notes apply to both the DumpArgs project and the RunTest project, below):

Just unzip the archive to an empty directory and open the .sln file.
You can switch between ANSI and Unicode build by first clicking on the project in the Solution Explorer in Visual Studio and selecting Project->DumpArgs Properties. Under Configuration Properties->General you will see Character Set in the right pane. Select the one you want (counter-intuitively, you select Multi-Byte Character Set for an ANSI build).

// DumpArgs.cpp : Defines the entry point for the console application.
//

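#include "stdafx.h"
// Headers this file relies on, listed explicitly in case stdafx.h does not already include them:
#include <windows.h>
#include <shellapi.h> // CommandLineToArgvW
#include <tchar.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>    // isprint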

void APIENTRY DumpHex (unsigned char* Buffer, int Length) // Displays a buffer of bytes
{
for (int BufIdx = 0; BufIdx < Length; ++BufIdx) {
   if ((BufIdx % 256) == 0) {
      // Print header every 256 bytes
      _tprintf (_T("\n              00 01 02 03 04 05 06 07 | 08 09 0a 0b 0c 0d 0e 0f\n"));
      _tprintf (_T("              -------------------------------------------------\n"));
      }
   if ((BufIdx % 16) == 0) {
      // Print buffer offset at start of lines (BufIdx is an int, so use %x rather than %lx)
      _tprintf (_T("   %08x:  "), BufIdx);
      }
   _tprintf (_T("%02x "), Buffer[BufIdx]);
   // Output separator in middle (but not after last byte):
   if (((BufIdx % 16) == 7) && (BufIdx < (Length - 1))) {
      _tprintf (_T("| "));
      }
   if (((BufIdx % 16) == 15) || (BufIdx == (Length - 1))) {
      // Pad last line with spaces if necessary
      int Padding = 3 * (16 - ((BufIdx % 16) + 1));
      if (Padding > 21) {
         Padding += 2; // For where separator would have been
         }
      while (Padding--) {
         _tprintf (_T(" "));
         }
      // Output printable characters
      _tprintf (_T("  "));
      for (int CharIdx = 16 * (BufIdx / 16); CharIdx <= BufIdx; ++CharIdx) {
         _tprintf (_T("%c"), isprint(Buffer[CharIdx])?Buffer[CharIdx]==' '?'_':Buffer[CharIdx]:'.');
         // Output separator in middle (but not after last byte):
         if (((CharIdx % 16) == 7) && (CharIdx != BufIdx)) {
            _tprintf (_T(" | "));
            }
         }
      _tprintf (_T("\n"));
      }
   }
}

int _tmain(int argc, _TCHAR* argv[])
{
wchar_t* CommandLineUnicode = GetCommandLineW();
wprintf(L"Unicode Command Line (from GetCommandLineW): [%s]\n\n", CommandLineUnicode);
DumpHex((unsigned char*)CommandLineUnicode, (2 * wcslen(CommandLineUnicode)) + 2);
_tprintf (_T("\n\n"));

char* CommandLineAnsi = GetCommandLineA();
printf("ANSI Command Line (from GetCommandLineA): [%s]\n\n", CommandLineAnsi);
DumpHex((unsigned char*)CommandLineAnsi, 1 + strlen(CommandLineAnsi));
_tprintf (_T("\n\n"));

int NumArgs = 0;
wchar_t** Args = CommandLineToArgvW(CommandLineUnicode, &NumArgs);
_tprintf (_T("CommandLineToArgvW Found %d Argument(s)\n"), NumArgs);
for (int arg = 0; arg < NumArgs; ++arg) {
    wprintf (L"   arg %s%d   = [%s]\n", ((NumArgs >= 10) && (arg < 10))?L" ":L"", arg, Args[arg]);
   }

_tprintf (_T("\nCommand Line Arguments From argv Array (argc = %d):\n"), argc);
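for (int arg = 0; arg < argc; ++arg) {
   // The padding strings must be _T() literals so that %s matches in both ANSI and Unicode builds
   _tprintf (_T("   argv[%s%d] = [%s]\n"), ((argc >= 10) && (arg < 10))?_T(" "):_T(""), arg, argv[arg]);
   }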
_tprintf (_T("\n"));
LocalFree(Args);

return 0;
}

The next utility, RunTest, lets you drive DumpArgs with specific command lines without worrying about how cmd.exe first mangles the command line.

You specify each command line in a file named CommandLines.txt which must be a plain ANSI text file (not Unicode) in the current directory. DumpArgs must also be in the current directory.

Put each command line on a separate line in CommandLines.txt exactly as you want DumpArgs to receive it. Empty lines are ignored and so are lines starting with a semicolon (in the first column), so you can include comments.

RunTest reads each command line from CommandLines.txt and starts DumpArgs using CreateProcess, passing it the specified command line in the lpCommandLine parameter. It waits up to 5 seconds for DumpArgs to finish. RunTest explicitly specifies DumpArgs.exe in lpApplicationName and does not insert DumpArgs.exe at the beginning of lpCommandLine so that you can test what happens if you pass something other than the executable name as the first command line argument (for example, to see how the first argument is parsed differently).

As I did for DumpArgs, I again show the source code here and you can download a zip archive containing pre-built executables (for both DumpArgs and RunTest), source code and a Visual Studio project for RunTest and a sample CommandLines.txt file.

// RunTest.cpp : Defines the entry point for the console application.
//
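#include "stdafx.h"
// Headers this file relies on, listed explicitly in case stdafx.h does not already include them:
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <string.h>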
/*
 * Very simple-minded read line function:
 *
 *    - Writes the next line into Buffer and returns true if more data, false if no data or error.
 *    - Trusts that Buffer and BufSize are both valid.
 *    - Always NULL terminates Buffer, unless Buffer is 0 bytes.
 *    - Works only with ANSI file (not Unicode).
 *    - Skips empty lines and lines that start with ";" (for comments)
 *    - Silently truncates line if Buffer too small (but consumes entire line).
 *
 */
bool ReadLine (HANDLE hFile, char* Buffer, int BufSize)
{
bool Result = false;
try {
   if (BufSize >= 1) {
      int BytesWritten = 0;
      char* BufPtr = Buffer;
      while (1) {
         char Char;
         DWORD BytesRead;
         if ((ReadFile(hFile, &Char, sizeof(char), &BytesRead, NULL) != 0) && (BytesRead == sizeof(char))) {
            if (Char == 0x0a) {
               if (BytesWritten > 0) {
                  if (Buffer[0] == ';') { // Ignore comments
                     Result = false;
                     BytesWritten = 0;
                     BufPtr = Buffer;
                     }
                  else {
                     break;
                     }
                  }
               }
            else if ((Char != 0x0d) && (BytesWritten++ < (BufSize - 1))) {
               *BufPtr++ = Char;
               Result = true;
               }
            }
         else {
            break;
            }
         }
      *BufPtr = '\0';
      }
   }
catch (...)
   {
   Result = false;
   }
return Result;
}

int _tmain(int argc, _TCHAR* argv[])
{
HANDLE hFile = CreateFileA (".\\CommandLines.txt", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
if (hFile == INVALID_HANDLE_VALUE) {
   _tprintf(_T("\n\nERROR: Could not find CommandLines.txt in the current directory\n\n"));
   return 1;
   }
while (1) {
   char CommandLine[1024];
   if (ReadLine (hFile, CommandLine, 1024) == false) {
      break;
      }
   // Create the process:
   STARTUPINFOA si;
   PROCESS_INFORMATION pi;
   memset (&si, 0, sizeof(si));
   memset (&pi, 0, sizeof(pi));
   GetStartupInfoA(&si);
   si.cb = sizeof(si);
   si.dwFlags = STARTF_USESHOWWINDOW;

   if (CreateProcessA (".\\DumpArgs.exe", CommandLine, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi) != 0) {
      // Successfully created the process. Wait for it to finish:
      if (WaitForSingleObject(pi.hProcess, 5000) == WAIT_TIMEOUT) {
         _tprintf (_T("Timed out waiting for process to exit\n"));
         }
      // Release the thread and process handles returned by CreateProcess
      CloseHandle (pi.hThread);
      CloseHandle (pi.hProcess);
      }
   else {
      _tprintf (_T("CreateProcess failed\n"));
      }
   }
CloseHandle (hFile);
return 0;
}

Now that you understand how a command line is given to a program, and have some tools for looking at things, we can begin next time to look closely at exactly how a command is parsed.
