Return the input C-style escaped string where all instances of C-style escape sequences are converted to the corresponding ASCII characters or left intact if they are not convertible.
Escape sequences are used in the programming languages C and C++ whose conventions are also followed by many other languages.
An escape sequence is a sequence of characters that does not represent itself when used inside a character or string literal.
Instead, it is translated into another character or a sequence of characters that may be difficult or impossible to represent directly.
In C, all escape sequences consist of two or more characters, the first of which is the backslash, \ (called the escape character).
The remaining characters determine the interpretation of the escape sequence. For example, \n is an escape sequence that denotes a newline character.
The following escape sequences are defined in standard C which commonly (but not always) represent an ASCII (frequently nongraphical) character:
Escape sequence | ASCII Octal | ASCII Decimal | ASCII Hex | ASCII Character Representation |
\a | 07 | 07 | 07 | Alert (Beep, Bell) (added in C89) |
\b | 10 | 08 | 08 | Backspace |
\f | 14 | 12 | 0C | Formfeed Page Break |
\n | 12 | 10 | 0A | Newline (Line Feed) |
\r | 15 | 13 | 0D | Carriage Return |
\t | 11 | 09 | 09 | Horizontal Tab |
\v | 13 | 11 | 0B | Vertical Tab |
\\ | 134 | 92 | 5C | Backslash |
\' | 47 | 39 | 27 | Apostrophe or single quotation mark |
\" | 42 | 34 | 22 | Double quotation mark |
\? | 77 | 63 | 3F | Question mark (used to avoid trigraphs) |
\nnn | any | any | any | The byte whose numerical value is given by nnn interpreted as an octal number |
\xhh… | any | any | any | The byte whose numerical value is given by hh… interpreted as a hexadecimal number |
\uhhhh | none | none | none | Unicode code point below 10000 hexadecimal (added in C99) |
\Uhhhhhhhh | none | none | none | Unicode code point where h is a hexadecimal digit |
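The single-character rows of the table above can be read as a lookup from the character after the backslash to an ASCII code. The following is a minimal Python sketch of that lookup, not the library's Fortran implementation; the names `SIMPLE_ESCAPES` and `unescape_simple` are hypothetical, and the sketch deliberately ignores octal, hex, and universal character name escapes.

```python
# Illustrative sketch only: single-character C escapes and their ASCII codes,
# transcribed from the table above. Names here are hypothetical.
SIMPLE_ESCAPES = {
    "a": 0x07,   # alert (bell)
    "b": 0x08,   # backspace
    "f": 0x0C,   # form feed
    "n": 0x0A,   # newline (line feed)
    "r": 0x0D,   # carriage return
    "t": 0x09,   # horizontal tab
    "v": 0x0B,   # vertical tab
    "\\": 0x5C,  # backslash
    "'": 0x27,   # apostrophe
    '"': 0x22,   # double quotation mark
    "?": 0x3F,   # question mark
}


def unescape_simple(text: str) -> str:
    """Replace single-character escapes; leave every other sequence intact."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in SIMPLE_ESCAPES:
            out.append(chr(SIMPLE_ESCAPES[text[i + 1]]))
            i += 2  # skip the backslash and the escape letter
        else:
            out.append(text[i])  # ordinary character or unrecognized escape
            i += 1
    return "".join(out)


print(repr(unescape_simple(r"tab:\t quote:\" unknown:\q")))
```

An unrecognized sequence such as `\q` passes through unchanged, which mirrors the "left intact if not convertible" behavior described for this interface.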
The following remarks are in order:
Octal Escape Sequences
An octal escape sequence consists of \ followed by one, two, or three octal digits.
- The octal escape sequence ends when it either contains three octal digits already, or the next character is not an octal digit. For example,
  - \11 is a single octal escape sequence denoting a byte with numerical value 9 (11 in octal), rather than the escape sequence \1 followed by the digit 1.
  - However, \1111 is the octal escape sequence \111 followed by the digit 1.
  - In order to denote the byte with numerical value 1, followed by the digit 1, one could use "\1""1", since C automatically concatenates adjacent string literals.
- Some three-digit octal escape sequences may be too large to fit in a single byte. This results in an implementation-defined value for the byte actually produced.
- The escape sequence \0 is a commonly used octal escape sequence, which denotes the null character, with value zero.
- The procedures of this generic interface leave any octal sequence that is not convertible to an ASCII character intact (as is) in the output.
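The termination rule for octal escapes (at most three digits, stopping early at the first non-octal character) can be sketched as follows. This is an illustrative Python sketch, not the library's Fortran code; `parse_octal_escape` and `unescape_octal` are hypothetical names, and the sketch assumes that "convertible to ASCII" means a value in the range 0 to 127.

```python
# Illustrative sketch only; function names are hypothetical.
OCTAL_DIGITS = "01234567"


def parse_octal_escape(text: str, start: int):
    """Parse an octal escape whose backslash sits at text[start].

    Returns (value, next_index), or None if no octal escape starts there.
    """
    if start >= len(text) or text[start] != "\\":
        return None
    end = start + 1
    # Consume at most three octal digits; stop at the first non-octal character.
    while end < len(text) and end - start <= 3 and text[end] in OCTAL_DIGITS:
        end += 1
    if end == start + 1:
        return None  # the backslash is not followed by an octal digit
    return int(text[start + 1:end], 8), end


def unescape_octal(text: str) -> str:
    """Convert octal escapes, leaving non-ASCII values intact (assumed: > 127)."""
    out = []
    i = 0
    while i < len(text):
        parsed = parse_octal_escape(text, i)
        if parsed is not None and parsed[0] <= 127:
            out.append(chr(parsed[0]))
            i = parsed[1]
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(parse_octal_escape(r"\11", 0))    # one escape with value 9
print(parse_octal_escape(r"\1111", 0))  # escape \111 (value 73); the trailing 1 is left over
print(repr(unescape_octal(r"\47\41\777")))
```

Under the stated assumption, `\777` (value 511) is copied to the output unchanged because it does not fit the ASCII range.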
Hex Escape Sequences
A hex escape sequence must have at least one hex digit following \x, with no upper bound.
- It continues for as many hex digits as there are. For example, \xABCDEFG denotes the byte with the numerical value ABCDEF (base 16), followed by the letter G, which is not a hex digit.
- However, if the resulting integer value is too large to fit in a single byte, the actual numerical value assigned is implementation-defined. Most platforms have 8-bit character types, which limits a useful hex escape sequence to two hex digits. However, hex escape sequences longer than two hex digits might be useful inside a wide character or wide string.
- The procedures of this generic interface leave any hex sequence that is not convertible to an ASCII character intact (as is) in the output.
- The alphabetical hex digits, if any are present, must be upper-case letters.
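The "as many hex digits as there are" rule can be sketched in a few lines. This is an illustrative Python sketch, not the library's Fortran code; `parse_hex_escape` is a hypothetical name, and the digit set is restricted to upper-case letters to follow the remark above.

```python
# Illustrative sketch only; the function name is hypothetical.
HEX_DIGITS = "0123456789ABCDEF"  # upper-case letters only, per the remark above


def parse_hex_escape(text: str, start: int):
    """Parse a hex escape beginning with backslash-x at text[start].

    Returns (value, next_index), or None if no hex digit follows the x.
    """
    if not text.startswith("\\x", start):
        return None
    end = start + 2
    # No upper bound: keep consuming while the next character is a hex digit.
    while end < len(text) and text[end] in HEX_DIGITS:
        end += 1
    if end == start + 2:
        return None  # at least one hex digit is required
    return int(text[start + 2:end], 16), end


value, next_index = parse_hex_escape(r"\xABCDEFG", 0)
print(hex(value), next_index)  # the G at next_index is not part of the escape
print(parse_hex_escape(r"\xZF", 0))
```

Because the value of `\xABCDEF` is far larger than one byte, a converter that targets ASCII would treat it as non-convertible; the sketch only shows where the sequence ends.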
Universal Character Names
The C99 standard also supports escape sequences that denote Unicode code points in string literals.
Such escape sequences are called universal character names (UCN) and have the form \uhhhh or \Uhhhhhhhh, where h stands for a hex digit.
- Unlike other escape sequences considered, a universal character name may expand into more than one code unit.
- The sequence \uhhhh denotes the code point hhhh, interpreted as a hexadecimal number.
- The sequence \Uhhhhhhhh denotes the code point hhhhhhhh, interpreted as a hexadecimal number.
- The code points located at U+10000 or higher must be denoted with the \U syntax, whereas lower code points may use \u or \U.
- The code point is converted into a sequence of code units in the encoding of the destination type on the target system.
- The procedures of this generic interface leave any UCN that is not convertible to an ASCII character intact (as is) in the output.
- The alphabetical hex digits in the UCN, if any are present, must be upper-case letters.
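The fixed widths of the two UCN forms, and the fact that one code point can expand into several code units, can be illustrated as follows. This is an illustrative Python sketch, not the library's Fortran code; `parse_ucn` and `code_units` are hypothetical names, and UTF-8 and UTF-16 stand in for "the encoding of the destination type".

```python
# Illustrative sketch only; function names are hypothetical.
HEX_DIGITS = "0123456789ABCDEF"  # upper-case letters only, per the remark above


def parse_ucn(text: str, start: int):
    """Parse a universal character name at text[start].

    Returns (code_point, next_index), or None if the form is malformed.
    """
    for marker, width in (("\\u", 4), ("\\U", 8)):
        if text.startswith(marker, start):
            digits = text[start + 2:start + 2 + width]
            # Exactly `width` upper-case hex digits are required.
            if len(digits) == width and all(d in HEX_DIGITS for d in digits):
                return int(digits, 16), start + 2 + width
    return None


def code_units(code_point: int):
    """Return the number of (UTF-8, UTF-16) code units for a code point."""
    ch = chr(code_point)
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le")) // 2


print(parse_ucn(r"\u002C", 0))      # code point 0x2C (a comma), ASCII
print(parse_ucn(r"\U0000007E", 0))  # code point 0x7E (a tilde), ASCII
print(parse_ucn(r"\U+2661", 0))     # malformed: not eight hex digits
print(code_units(0x2661), code_units(0x1F600))
```

Only code points up to 7F hexadecimal have an ASCII equivalent, so a converter that targets ASCII would copy any larger UCN to the output unchanged.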
The functionality of this generic interface is highly useful for handling and manipulating C-style strings, for example, when processing the output of runtime shell commands or user-specified strings that may or should contain nongraphical ASCII characters in a portable way.
- Parameters
-
[in,out] | str | : The scalar character of any kind supported by the processor (e.g., SK, SKA, SKD, or SKU), containing the C-style escaped string to be converted to ASCII.
  - If the output argument ascii is present, then str has the intent(in) attribute.
  - If the output argument ascii is missing, then str has the intent(inout) attribute. On output, the contents of str(1:endloc) will be overwritten with the ASCII equivalent. |
[out] | ascii | : The output scalar of the same type and kind as the input str, of length type parameter equal to or larger than the length of str, containing the input string where all instances of escape sequences with an ASCII representation are replaced with their corresponding ASCII character. (optional. If missing, the output ASCII will be written to str.) |
[out] | endloc | : The output scalar of type integer of default kind IK, containing the position of the last character of the resulting ASCII-converted string (in either str or ascii). |
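The role of endloc when the result is written back into str can be illustrated with a fixed-length buffer. A converted string is never longer than its escaped form, so the write position never overtakes the read position and the conversion can reuse the input storage. The following is an illustrative Python sketch of that idea, not the library's Fortran code; `unescape_in_place` is a hypothetical name, and only three simple escapes are handled to keep the example short.

```python
# Illustrative sketch only; names are hypothetical and only three escapes are handled.
SIMPLE = {ord("n"): 0x0A, ord("t"): 0x09, ord("\\"): 0x5C}
BACKSLASH = 0x5C


def unescape_in_place(buf: bytearray) -> int:
    """Overwrite the front of buf with the converted text.

    Returns the 1-based position of the last converted character,
    playing the role of endloc in the description above.
    """
    read = 0
    write = 0
    while read < len(buf):
        if buf[read] == BACKSLASH and read + 1 < len(buf) and buf[read + 1] in SIMPLE:
            buf[write] = SIMPLE[buf[read + 1]]
            read += 2  # an escape consumes two input characters
        else:
            buf[write] = buf[read]
            read += 1
        write += 1  # every step produces exactly one output character
    return write


buf = bytearray(rb"a\tb\nc")  # seven characters including the two backslashes
endloc = unescape_in_place(buf)
print(endloc, bytes(buf[:endloc]))
```

The buffer keeps its original length; everything after position endloc is stale data, which is why callers in the examples of this page always slice the result as (1:endloc).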
Possible calling interfaces
character(len(str,IK), kind(str)) :: ascii
integer(IK) :: endloc
call setAsciiFromEscaped(str, endloc)
call setAsciiFromEscaped(str, ascii, endloc)
- Warning
- The pure procedure(s) documented herein become impure when the ParaMonte library is compiled with preprocessor macro CHECK_ENABLED=1.
By default, these procedures are pure in release build and impure in debug and testing builds. The impurity of the procedures is caused by the dependence on other conditionally impure procedures.
- Note
- Returning the output inside the input str should generally lead to faster runtime performance.
- See also
- getAsciiFromEscaped
- isFailedList
Example usage
character(:), allocatable :: strascii
character(255) :: str, ascii
type(display_type) :: disp
call disp%show("call setAsciiFromEscaped(trim(str), ascii, endloc)")
call disp%show( ascii(1:endloc) , deliml = """" )
call disp%show("strascii = trim(str)")
call disp%show("call setAsciiFromEscaped(strascii, endloc)")
call disp%show( strascii(1:endloc) , deliml = """" )
call disp%show("strascii(1:endloc) == ascii(1:endloc)")
call disp%show( strascii(1:endloc) == ascii(1:endloc) )
call disp%show("call setAsciiFromEscaped(trim(str), ascii, endloc)")
call disp%show( ascii(1:endloc) , deliml = """" )
call disp%show("strascii = trim(str)")
call disp%show("call setAsciiFromEscaped(strascii, endloc)")
call disp%show( strascii(1:endloc) , deliml = """" )
call disp%show("strascii(1:endloc) == ascii(1:endloc)")
call disp%show( strascii(1:endloc) == ascii(1:endloc) )
call disp%show("call setAsciiFromEscaped(trim(str), ascii, endloc)")
call disp%show( ascii(1:endloc) , deliml = """" )
call disp%show("strascii = trim(str)")
call disp%show("call setAsciiFromEscaped(strascii, endloc)")
call disp%show( strascii(1:endloc) , deliml = """" )
call disp%show("strascii(1:endloc) == ascii(1:endloc)")
call disp%show( strascii(1:endloc) == ascii(1:endloc) )
call disp%show("call setAsciiFromEscaped(trim(str), ascii, endloc)")
call disp%show( ascii(1:endloc) , deliml = """" )
call disp%show("strascii = trim(str)")
call disp%show("call setAsciiFromEscaped(strascii, endloc)")
call disp%show("strascii(1:endloc)")
call disp%show( strascii(1:endloc) , deliml = """" )
call disp%show("strascii(1:endloc) == ascii(1:endloc)")
call disp%show( strascii(1:endloc) == ascii(1:endloc) )
call disp%show("str = '\nn\fn\04\? \r\b\\a\47\41\x009f\xZFa\u002C\U0000007E\U+2661'")
str = '\nn\fn\04\? \r\b\\a\47\41\x009f\xZFa\u002C\U0000007E\U+2661'
call disp%show("call setAsciiFromEscaped(trim(str), ascii, endloc)")
call disp%show( ascii(1:endloc) , deliml = """" )
call disp%show("strascii = trim(str)")
call disp%show("call setAsciiFromEscaped(strascii, endloc)")
call disp%show("strascii(1:endloc)")
call disp%show( strascii(1:endloc) , deliml = """" )
call disp%show("strascii(1:endloc) == ascii(1:endloc)")
call disp%show( strascii(1:endloc) == ascii(1:endloc) )
Example Unix compile command via Intel ifort compiler
ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
Example Windows Batch compile command via Intel ifort compiler
set PATH=..\..\..\lib;%PATH%
ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
Example Unix / MinGW compile command via GNU gfortran compiler
gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
Example output
strascii(1:endloc) == ascii(1:endloc)
strascii(1:endloc) == ascii(1:endloc)
strascii(1:endloc) == ascii(1:endloc)
strascii(1:endloc) == ascii(1:endloc)
str = '\nn\fn\04\? \r\b\\a\47\41\x009f\xZFa\u002C\U0000007E\U+2661'
strascii(1:endloc) == ascii(1:endloc)
- Test:
- test_pm_strASCII
- Todo:
- High Priority: A performance benchmarking of the different interfaces of this generic interface should be added in the future.
Final Remarks
If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.
- If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
- If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.
This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.
- Copyright
- Computational Data Science Lab
- Author:
- Amir Shahmoradi, September 1, 2017, 11:02 PM, Institute for Computational Engineering and Sciences (ICES), The University of Texas Austin
Definition at line 4733 of file pm_strASCII.F90.