ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
pm_strASCII::setAsciiFromEscaped Interface Reference

Return the input C-style escaped string where all instances of C-style escape sequences are converted to the corresponding ASCII characters or left intact if they are not convertible.

Detailed Description

Escape sequences are used in the programming languages C and C++ whose conventions are also followed by many other languages.
An escape sequence is a sequence of characters that does not represent itself when used inside a character or string literal.
Instead, it is translated into another character or a sequence of characters that may be difficult or impossible to represent directly.
In C, all escape sequences consist of two or more characters, the first of which is the backslash, \ (called the escape character).
The remaining characters determine the interpretation of the escape sequence. For example, \n is an escape sequence that denotes a newline character.
The following escape sequences are defined in standard C which commonly (but not always) represent an ASCII (frequently nongraphical) character:

Escape sequence   ASCII Octal   ASCII Decimal   ASCII Hex   ASCII Character Representation
\a                07            07              07          Alert (Beep, Bell) (added in C89)
\b                10            08              08          Backspace
\f                14            12              0C          Formfeed Page Break
\n                12            10              0A          Newline (Line Feed)
\r                15            13              0D          Carriage Return
\t                11            09              09          Horizontal Tab
\v                13            11              0B          Vertical Tab
\\                134           92              5C          Backslash
\'                47            39              27          Apostrophe or single quotation mark
\"                42            34              22          Double quotation mark
\?                77            63              3F          Question mark (used to avoid trigraphs)
\nnn              any           any             any         The byte whose numerical value is given by nnn interpreted as an octal number
\xhh…             any           any             any         The byte whose numerical value is given by hh… interpreted as a hexadecimal number
\uhhhh            none          none            none        Unicode code point below 10000 hexadecimal (added in C99)
\Uhhhhhhhh        none          none            none        Unicode code point where h is a hexadecimal digit
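
As a rough illustration of the simple (two-character) escape sequences listed above, the following C sketch maps the character after the backslash to its ASCII code. The name `simple_escape_value` is hypothetical and is not part of ParaMonte or of the C standard library; the numeric values are the ones given in the table.

```c
/* Hypothetical helper (an illustration, not the ParaMonte implementation):
   map the character that follows a backslash in a simple escape sequence
   to its ASCII code. Returns -1 if the character does not name a simple
   escape sequence. */
static int simple_escape_value(char letter)
{
    switch (letter) {
    case 'a':  return 0x07; /* alert (bell)     */
    case 'b':  return 0x08; /* backspace        */
    case 'f':  return 0x0C; /* form feed        */
    case 'n':  return 0x0A; /* newline          */
    case 'r':  return 0x0D; /* carriage return  */
    case 't':  return 0x09; /* horizontal tab   */
    case 'v':  return 0x0B; /* vertical tab     */
    case '\\': return 0x5C; /* backslash        */
    case '\'': return 0x27; /* apostrophe       */
    case '"':  return 0x22; /* double quote     */
    case '?':  return 0x3F; /* question mark    */
    default:   return -1;   /* not a simple escape sequence */
    }
}
```

For instance, `simple_escape_value('n')` yields 10, while `simple_escape_value('p')` yields -1, which is the situation in which a sequence such as \p would be left as is.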

The following remarks are in order:

Octal Escape Sequences

An octal escape sequence consists of \ followed by one, two, or three octal digits.

  1. The octal escape sequence ends when it either contains three octal digits already, or the next character is not an octal digit.
    For example,
    1. \11 is a single octal escape sequence denoting a byte with numerical value 9 (11 in octal), rather than the escape sequence \1 followed by the digit 1.
    2. However, \1111 is the octal escape sequence \111 followed by the digit 1.
  2. In order to denote the byte with numerical value 1, followed by the digit 1, one could use "\1""1", since C automatically concatenates adjacent string literals.
  3. Some three-digit octal escape sequences may be too large to fit in a single byte.
    This results in an implementation-defined value for the byte actually produced.
  4. The escape sequence \0 is a commonly used octal escape sequence, which denotes the null character, with value zero.
  5. The procedures of this generic interface leave any octal sequence that cannot be converted to an ASCII character intact (as is) in the output.
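
The termination rule in item 1 can be sketched in C as follows. The helper `parse_octal_escape` is a hypothetical name used only for illustration and is not the ParaMonte implementation. The three array definitions rely on the C compiler's own handling of octal escapes and of adjacent string literals.

```c
#include <stddef.h>

/* String literals decoded by the C compiler itself. */
static const char tab_only[]     = "\11";     /* one byte with value 9 (11 in octal) */
static const char i_then_one[]   = "\1111";   /* \111 (value 73) followed by the digit '1' */
static const char soh_then_one[] = "\1" "1";  /* value 1 followed by the digit '1' */

/* Hypothetical helper: parse an octal escape sequence that starts at s[0].
   At most three octal digits are consumed. On success, the numerical value
   is stored in *value and the number of characters consumed (2 to 4,
   including the backslash) is returned. Returns 0 if s does not start
   with a backslash followed by at least one octal digit. */
static size_t parse_octal_escape(const char *s, unsigned *value)
{
    size_t i = 1;
    unsigned v = 0;

    if (s[0] != '\\') return 0;
    while (i <= 3 && s[i] >= '0' && s[i] <= '7') {
        v = v * 8 + (unsigned)(s[i] - '0');
        ++i;
    }
    if (i == 1) return 0; /* no octal digit after the backslash */
    *value = v;
    return i;
}
```

A three-digit sequence such as \400 parses to 256, which lies outside the ASCII range 0 to 127; under item 5, a caller would presumably copy such a sequence to the output unchanged.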

Hex Escape Sequences

A hex escape sequence must have at least one hex digit following \x, with no upper bound.

  1. It continues for as many hex digits as there are.
    For example, \xABCDEFG denotes the byte with the numerical value ABCDEF (base 16), followed by the letter G, which is not a hex digit.
  2. However, if the resulting integer value is too large to fit in a single byte, the actual numerical value assigned is implementation-defined.
    Most platforms have 8-bit character types, which limits a useful hex escape sequence to two hex digits.
    However, hex escape sequences longer than two hex digits might be useful inside a wide character or wide string.
  3. The procedures of this generic interface leave any hex sequence that cannot be converted to an ASCII character intact (as is) in the output.
  4. Alphabetic hex digits (A through F), if present, must be upper-case letters.
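
The hex rules above can be sketched in C as follows, assuming the upper-case-only convention of item 4. Both function names are hypothetical and the code is an illustration, not the ParaMonte implementation. Every hex digit is consumed, and the value is reported as -1 once it no longer fits the ASCII range 0 to 127.

```c
#include <stddef.h>

/* Return the value of an upper-case hex digit, or -1 if c is not one.
   Lower-case letters are deliberately rejected (item 4 above). */
static int upper_hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: parse a hex escape sequence that starts at s[0].
   All consecutive hex digits after "\x" are consumed. Returns the number
   of characters consumed (including "\x"), or 0 if no hex digit follows.
   *ascii_value receives the value if it is at most 127, otherwise -1. */
static size_t parse_hex_escape(const char *s, int *ascii_value)
{
    size_t i = 2;
    int v = 0, fits = 1, d;

    if (s[0] != '\\' || s[1] != 'x') return 0;
    while ((d = upper_hex_digit(s[i])) >= 0) {
        if (fits) {
            v = v * 16 + d;
            if (v > 0x7F) fits = 0; /* too large for ASCII; stop accumulating */
        }
        ++i;
    }
    if (i == 2) return 0; /* "\x" without any hex digit */
    *ascii_value = fits ? v : -1;
    return i;
}
```

With this sketch, \x009f consumes only \x009 (value 9) and stops at the lower-case f, \xZFa is not recognized at all, and \xABCDEFG consumes eight characters but reports -1 because the value exceeds 127.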

Universal Character Names

The C99 standard also supports escape sequences that denote Unicode code points in string literals.
Such escape sequences are called universal character names and have the form \uhhhh or \Uhhhhhhhh, where h stands for a hex digit.

  1. Unlike other escape sequences considered, a universal character name may expand into more than one code unit.
  2. The sequence \uhhhh denotes the code point hhhh, interpreted as a hexadecimal number.
  3. The sequence \Uhhhhhhhh denotes the code point hhhhhhhh, interpreted as a hexadecimal number.
  4. The code points located at U+10000 or higher must be denoted with the \U syntax, whereas lower code points may use \u or \U.
  5. The code point is converted into a sequence of code units in the encoding of the destination type on the target system.
  6. The procedures of this generic interface leave any UCN that cannot be converted to an ASCII character intact (as is) in the output.
  7. Alphabetic hex digits (A through F) in the UCN, if present, must be upper-case letters.
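
A universal character name has a fixed number of hex digits, which the following C sketch checks. The names `upper_hex_digit` and `parse_ucn` are hypothetical, and the code is an illustration under the upper-case convention of item 7, not the ParaMonte implementation. A code point is treated here as convertible to ASCII only if it is at most 7F hexadecimal.

```c
#include <stddef.h>

/* Return the value of an upper-case hex digit, or -1 if c is not one. */
static int upper_hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: parse \uhhhh (4 digits) or \Uhhhhhhhh (8 digits)
   starting at s[0]. On success, store the code point in *code_point and
   return the number of characters consumed (6 or 10). Return 0 if the
   text is not a well-formed universal character name. */
static size_t parse_ucn(const char *s, unsigned long *code_point)
{
    size_t ndigits, i;
    unsigned long cp = 0;

    if (s[0] != '\\') return 0;
    if (s[1] == 'u') ndigits = 4;
    else if (s[1] == 'U') ndigits = 8;
    else return 0;

    for (i = 0; i < ndigits; ++i) {
        int d = upper_hex_digit(s[2 + i]);
        if (d < 0) return 0; /* too few digits or a non-hex character */
        cp = cp * 16 + (unsigned long)d;
    }
    *code_point = cp;
    return 2 + ndigits;
}
```

Under these assumptions, \u002C parses to code point 2C (a comma) and \U0000007E to 7E (a tilde), whereas \U+2661 is rejected as malformed and \u2661 parses but lies outside the ASCII range.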

The functionality of this generic interface is useful for handling and manipulating C-style strings, for example, when processing the output of runtime shell commands or user-specified strings that may or should contain nongraphical ASCII characters in a portable way.

Parameters
[in,out] str: The scalar character of any kind supported by the processor (e.g., SK, SKA, SKD, or SKU), containing the C-style escaped string to be converted to ASCII.
  1. If the output argument ascii is present, then str has the intent(in) attribute.
  2. If the output argument ascii is missing, then str has the intent(inout) attribute.
    On output, the contents of str(1:endloc) will be overwritten with the ASCII equivalent.
[out] ascii: The output scalar of the same type and kind as the input str, with a length type parameter equal to or larger than the length of str, containing the input string where all instances of escape sequences with an ASCII representation are replaced with their corresponding ASCII characters.
(optional. If missing, the output ASCII will be written to str.)
[out] endloc: The output scalar of type integer of default kind IK, containing the position of the last character of the resulting ASCII-converted string (in either str or ascii).
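
The in-place calling mode (ascii missing) can be pictured with the following C sketch. It is an analogy only: `simple_escape_value` and `unescape_in_place` are hypothetical names, only the simple two-character escape sequences are handled (octal, hex, and universal character names are omitted for brevity), and the returned count plays the role of endloc. Because the converted text is never longer than the input, the write position never overtakes the read position.

```c
#include <stddef.h>

/* Hypothetical helper: ASCII code of a simple escape, or -1 if none. */
static int simple_escape_value(char letter)
{
    switch (letter) {
    case 'a':  return 0x07;
    case 'b':  return 0x08;
    case 'f':  return 0x0C;
    case 'n':  return 0x0A;
    case 'r':  return 0x0D;
    case 't':  return 0x09;
    case 'v':  return 0x0B;
    case '\\': return 0x5C;
    case '\'': return 0x27;
    case '"':  return 0x22;
    case '?':  return 0x3F;
    default:   return -1;
    }
}

/* Convert the simple escape sequences of a null-terminated string in place.
   Returns the number of characters in the converted prefix (the analog of
   endloc). Characters beyond that position are left as they were, and
   unrecognized sequences are copied unchanged. */
static size_t unescape_in_place(char *str)
{
    size_t r = 0; /* read position  */
    size_t w = 0; /* write position */

    while (str[r] != '\0') {
        if (str[r] == '\\' && str[r + 1] != '\0') {
            int v = simple_escape_value(str[r + 1]);
            if (v >= 0) {
                str[w++] = (char)v;
                r += 2;
                continue;
            }
        }
        str[w++] = str[r++]; /* ordinary character or unrecognized escape */
    }
    return w;
}
```

Applied to the six characters \\nn\t, the sketch returns 4 and leaves a backslash, two n characters, and a tab in the first four positions; applied to \paramonte, it returns 10 and changes nothing.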


Possible calling interfaces

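use pm_kind, only: IK
use pm_strASCII, only: setAsciiFromEscaped
character(len(str,IK), kind(str)) :: ascii
integer(IK) :: endloc
call setAsciiFromEscaped(str, endloc) ! in-place conversion: the result is str(1:endloc)
call setAsciiFromEscaped(str, ascii, endloc) ! out-of-place conversion: the result is ascii(1:endloc)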
Warning
The pure procedure(s) documented herein become impure when the ParaMonte library is compiled with preprocessor macro CHECK_ENABLED=1.
By default, these procedures are pure in release builds and impure in debug and testing builds. The impurity is caused by their dependence on other conditionally impure procedures.
Note
Returning the output inside the input str should generally lead to faster runtime performance.
See also
getAsciiFromEscaped
isFailedList


Example usage

program example

    use pm_kind, only: IK
    use pm_kind, only: SK ! all other processor kinds are also supported.
    use pm_io, only: display_type
    use pm_strASCII, only: setAsciiFromEscaped

    implicit none

    character(:), allocatable :: strascii
    character(255) :: str, ascii
    integer(IK) :: endloc

    type(display_type) :: disp
    disp = display_type(file = "main.out.F90")

    ! Empty string: out-of-place conversion, then in-place conversion.
    call disp%skip()
    call disp%show("str = ''")
                    str = ''
    call disp%show("call setAsciiFromEscaped(trim(str), ascii, endloc)")
                    call setAsciiFromEscaped(trim(str), ascii, endloc)
    call disp%show("endloc")
    call disp%show( endloc )
    call disp%show("ascii(1:endloc)")
    call disp%show( ascii(1:endloc) , deliml = """" )
    call disp%skip()

    call disp%skip()
    call disp%show("strascii = trim(str)")
                    strascii = trim(str)
    call disp%show("call setAsciiFromEscaped(strascii, endloc)")
                    call setAsciiFromEscaped(strascii, endloc)
    call disp%show("endloc")
    call disp%show( endloc )
    call disp%show("strascii(1:endloc)")
    call disp%show( strascii(1:endloc) , deliml = """" )
    call disp%show("strascii(1:endloc) == ascii(1:endloc)")
    call disp%show( strascii(1:endloc) == ascii(1:endloc) )
    call disp%skip()

    ! String without any escape sequence.
    call disp%skip()
    call disp%show("str = 'A'")
                    str = 'A'
    call disp%show("call setAsciiFromEscaped(trim(str), ascii, endloc)")
                    call setAsciiFromEscaped(trim(str), ascii, endloc)
    call disp%show("endloc")
    call disp%show( endloc )
    call disp%show("ascii(1:endloc)")
    call disp%show( ascii(1:endloc) , deliml = """" )
    call disp%skip()

    call disp%skip()
    call disp%show("strascii = trim(str)")
                    strascii = trim(str)
    call disp%show("call setAsciiFromEscaped(strascii, endloc)")
                    call setAsciiFromEscaped(strascii, endloc)
    call disp%show("endloc")
    call disp%show( endloc )
    call disp%show("strascii(1:endloc)")
    call disp%show( strascii(1:endloc) , deliml = """" )
    call disp%show("strascii(1:endloc) == ascii(1:endloc)")
    call disp%show( strascii(1:endloc) == ascii(1:endloc) )
    call disp%skip()

    ! Unrecognized escape sequence (\p): left intact.
    call disp%skip()
    call disp%show("str = '\paramonte'")
                    str = '\paramonte'
    call disp%show("call setAsciiFromEscaped(trim(str), ascii, endloc)")
                    call setAsciiFromEscaped(trim(str), ascii, endloc)
    call disp%show("endloc")
    call disp%show( endloc )
    call disp%show("ascii(1:endloc)")
    call disp%show( ascii(1:endloc) , deliml = """" )
    call disp%skip()

    call disp%skip()
    call disp%show("strascii = trim(str)")
                    strascii = trim(str)
    call disp%show("call setAsciiFromEscaped(strascii, endloc)")
                    call setAsciiFromEscaped(strascii, endloc)
    call disp%show("endloc")
    call disp%show( endloc )
    call disp%show("strascii(1:endloc)")
    call disp%show( strascii(1:endloc) , deliml = """" )
    call disp%show("strascii(1:endloc) == ascii(1:endloc)")
    call disp%show( strascii(1:endloc) == ascii(1:endloc) )
    call disp%skip()

    ! Escaped backslash, octal escape, and horizontal tab.
    call disp%skip()
    call disp%show("str = '\\nn\004\t'")
                    str = '\\nn\004\t'
    call disp%show("call setAsciiFromEscaped(trim(str), ascii, endloc)")
                    call setAsciiFromEscaped(trim(str), ascii, endloc)
    call disp%show("endloc")
    call disp%show( endloc )
    call disp%show("ascii(1:endloc)")
    call disp%show( ascii(1:endloc) , deliml = """" )
    call disp%skip()

    call disp%skip()
    call disp%show("strascii = trim(str)")
                    strascii = trim(str)
    call disp%show("call setAsciiFromEscaped(strascii, endloc)")
                    call setAsciiFromEscaped(strascii, endloc)
    call disp%show("endloc")
    call disp%show( endloc )
    call disp%show("strascii(1:endloc)")
    call disp%show( strascii(1:endloc) , deliml = """" )
    call disp%show("strascii(1:endloc) == ascii(1:endloc)")
    call disp%show( strascii(1:endloc) == ascii(1:endloc) )
    call disp%skip()

    ! Mixture of simple, octal, hex, and universal-character-name sequences.
    call disp%skip()
    call disp%show("str = '\nn\fn\04\? \r\b\\a\47\41\x009f\xZFa\u002C\U0000007E\U+2661'")
                    str = '\nn\fn\04\? \r\b\\a\47\41\x009f\xZFa\u002C\U0000007E\U+2661'
    call disp%show("call setAsciiFromEscaped(trim(str), ascii, endloc)")
                    call setAsciiFromEscaped(trim(str), ascii, endloc)
    call disp%show("endloc")
    call disp%show( endloc )
    call disp%show("ascii(1:endloc)")
    call disp%show( ascii(1:endloc) , deliml = """" )
    call disp%skip()

    call disp%skip()
    call disp%show("strascii = trim(str)")
                    strascii = trim(str)
    call disp%show("call setAsciiFromEscaped(strascii, endloc)")
                    call setAsciiFromEscaped(strascii, endloc)
    call disp%show("endloc")
    call disp%show( endloc )
    call disp%show("strascii(1:endloc)")
    call disp%show( strascii(1:endloc) , deliml = """" )
    call disp%show("strascii(1:endloc) == ascii(1:endloc)")
    call disp%show( strascii(1:endloc) == ascii(1:endloc) )
    call disp%skip()

end program example

Example Unix compile command via Intel ifort compiler
#!/usr/bin/env sh
rm main.exe
ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe

Example Windows Batch compile command via Intel ifort compiler
del main.exe
set PATH=..\..\..\lib;%PATH%
ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
main.exe

Example Unix / MinGW compile command via GNU gfortran compiler
#!/usr/bin/env sh
rm main.exe
gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe

Example output

str = ''
call setAsciiFromEscaped(trim(str), ascii, endloc)
endloc
+0
ascii(1:endloc)
""


strascii = trim(str)
call setAsciiFromEscaped(strascii, endloc)
endloc
+0
strascii(1:endloc)
""
strascii(1:endloc) == ascii(1:endloc)
T


str = 'A'
call setAsciiFromEscaped(trim(str), ascii, endloc)
endloc
+1
ascii(1:endloc)
"A"


strascii = trim(str)
call setAsciiFromEscaped(strascii, endloc)
endloc
+1
strascii(1:endloc)
"A"
strascii(1:endloc) == ascii(1:endloc)
T


str = '\paramonte'
call setAsciiFromEscaped(trim(str), ascii, endloc)
endloc
+10
ascii(1:endloc)
"\paramonte"


strascii = trim(str)
call setAsciiFromEscaped(strascii, endloc)
endloc
+10
strascii(1:endloc)
"\paramonte"
strascii(1:endloc) == ascii(1:endloc)
T


str = '\\nn\004\t'
call setAsciiFromEscaped(trim(str), ascii, endloc)
endloc
+5
ascii(1:endloc)
"\nn␄ "


strascii = trim(str)
call setAsciiFromEscaped(strascii, endloc)
endloc
+5
strascii(1:endloc)
"\nn␄ "
strascii(1:endloc) == ascii(1:endloc)
T


str = '\nn\fn\04\? \r\b\\a\47\41\x009f\xZFa\u002C\U0000007E\U+2661'
call setAsciiFromEscaped(trim(str), ascii, endloc)
endloc
+29
ascii(1:endloc)
"
n␌n␄? 
␈\a'! f\xZFa,~\U+2661"


strascii = trim(str)
call setAsciiFromEscaped(strascii, endloc)
endloc
+29
strascii(1:endloc)
"
n␌n␄? 
␈\a'! f\xZFa,~\U+2661"
strascii(1:endloc) == ascii(1:endloc)
T

Test:
test_pm_strASCII
Todo:
High Priority: A performance benchmark of the different interfaces of this generic interface should be added in the future.


Final Remarks


If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.

  1. If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
  2. If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Author:
Amir Shahmoradi, September 1, 2017, 11:02 PM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin

Definition at line 4733 of file pm_strASCII.F90.


The documentation for this interface was generated from the following file: