Generate and return the best-guess field separator of a (sequential-access) file stored in the input file.
The current algorithm takes from the user a set of best-guess candidate separators (single characters in a string, or whole strings when passed as css_type containers).
It then counts the frequency of the separators in the first two lines from the current file position.
If the file format is list-directed or CSV, the search skips single-quoted or double-quoted strings in each line.
The search also continues into the next line if an opening quotation mark does not close within the same line (characteristic of CSV files).
Finally, the first input separator whose frequency is the same in both lines is returned as the potential field separator of the file.
At the end of the search, the file position is restored to where it was on entry to the procedure.
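The heuristic described above can be sketched in standalone Fortran. This is a minimal illustration under stated assumptions, not the library's implementation: the names guessSep and countSep are hypothetical, the candidates are single characters only, the two records are passed as strings rather than read from a file, quoted fields spanning several lines are not handled, and requiring a nonzero count before accepting a candidate is an assumption made here so that absent candidates are not reported as separators.

```fortran
program sep_sketch
    implicit none
    character(*), parameter :: line1 = '"a,b",1,2,3'
    character(*), parameter :: line2 = '"c",4,5,6'
    character(:), allocatable :: sep
    integer :: nfield

    ! Without quote skipping, the comma inside "a,b" makes the counts differ (4 vs. 3).
    sep = guessSep(line1, line2, ';|,', skipQuoted = .false., nfield = nfield)
    print '(3a,i0)', 'raw   : sep = "', sep, '", nfield = ', nfield
    if (len(sep) /= 0 .or. nfield /= 1) error stop 'unexpected raw result'

    ! With quote skipping, both records contain three commas, hence four fields.
    sep = guessSep(line1, line2, ';|,', skipQuoted = .true., nfield = nfield)
    print '(3a,i0)', 'quoted: sep = "', sep, '", nfield = ', nfield
    if (sep /= ',' .or. nfield /= 4) error stop 'unexpected quoted result'

contains

    ! Count the occurrences of the candidate sep in line, optionally ignoring quoted text.
    function countSep(line, sep, skipQuoted) result(nsep)
        character(*), intent(in) :: line
        character(1), intent(in) :: sep
        logical, intent(in) :: skipQuoted
        integer :: nsep, i
        character(1) :: quote
        quote = ' ' ! A blank means "currently outside any quoted string".
        nsep = 0
        do i = 1, len(line)
            if (quote /= ' ') then
                if (line(i:i) == quote) quote = ' ' ! closing quotation mark
            else if (skipQuoted .and. (line(i:i) == '"' .or. line(i:i) == "'")) then
                quote = line(i:i) ! opening quotation mark
            else if (line(i:i) == sep) then
                nsep = nsep + 1
            end if
        end do
    end function countSep

    ! Return the first candidate whose (nonzero) count is identical in both lines.
    function guessSep(line1, line2, seps, skipQuoted, nfield) result(sep)
        character(*), intent(in) :: line1, line2, seps
        logical, intent(in) :: skipQuoted
        integer, intent(out) :: nfield
        character(:), allocatable :: sep
        integer :: i, n1
        sep = ''
        nfield = 1 ! no separator found: the record is a single field
        do i = 1, len(seps)
            n1 = countSep(line1, seps(i:i), skipQuoted)
            if (n1 > 0) then
                if (n1 == countSep(line2, seps(i:i), skipQuoted)) then
                    sep = seps(i:i)
                    nfield = n1 + 1
                    return
                end if
            end if
        end do
    end function guessSep

end program sep_sketch
```

The two calls differ only in whether quoted text is skipped, which mirrors the difference between calling the interface with and without form = csv on a file whose quoted fields contain the separator.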
- Parameters
-
[in] | file | : The input scalar character of default kind SK containing the path to the file whose separator is to be inferred.
On input, the file position will be rewound to the starting point in the file and the file is closed upon return.
If search from a particular point in the file is desirable, open the file first and pass the file unit as input to this generic interface.
(optional. It must be present if and only if the input argument unit is missing.) |
[in] | unit | : The input scalar integer of default kind IK containing the unit of the connected file whose separator is to be inferred.
(optional. It must be present if and only if the input argument file is missing.) |
[in] | seps | : The input object that can be either,
- a scalar character of default kind SK of arbitrary length type parameter, each character of which will be considered as a potential field separator in the file records.
This form of inputting seps is useful when all potential separators are single characters.
- a vector of css_type containers of arbitrary size, each element of which will be considered as a potential field separator in the file records.
This form of inputting seps is useful when the potential separators have differing length type parameters.
If there is only one separator candidate, simply wrap the scalar container in the Fortran array constructor [seps] and pass it to the interface.
In such a case, the output sep is either
- the input seps, implying that sep is the potential separator in the file record, or
- an empty string, implying that the input seps is not the field separator or that an error occurred.
|
[in] | form | : The input scalar constant that can be any of the following:
- the constant csv or an object of type csv_type, implying that the contents of quoted strings must be excluded from the search for the field separators.
- the constant fld or an object of type fld_type, implying that the contents of quoted strings must be excluded from the search for the field separators.
Additionally, any multiple adjacent appearances of (unquoted) blank characters will be counted as one separator instance.
(optional. default = unknown) |
[out] | nfield | : The output scalar integer of default kind IK containing the number of fields (i.e., the number of output sep instances per file record plus 1) identified in each row of file, separated by the output separator.
If no separator is identified (i.e., the output sep is empty) and no runtime IO error occurs, the output value for nfield is 1 implying that there is only one field in the file record.
(optional.) |
[in,out] | iomsg | : The input/output scalar character of default kind SK containing the error message, if any error occurs.
A length type parameter value of LEN_IOMSG is generally sufficient for iomsg to contain the output error messages.
(optional. If missing, no information other than an empty output sep will be given if the algorithm fails.) |
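The blank-collapsing rule described for form = fld can be illustrated with a small standalone sketch. It is an assumption-laden illustration rather than the library's code: the function name countBlank is hypothetical, quoted strings are ignored for brevity, and the treatment of leading or trailing blanks in a record is not specified by this page and is not modeled here.

```fortran
program fld_blank_sketch
    implicit none
    ! Two blanks separate "a" from 1, so a plain count sees 3 blanks
    ! while the collapsed count sees 2 separator instances (3 fields).
    character(*), parameter :: record = 'a  1 2'

    print '(a,i0)', 'plain count     = ', countBlank(record, collapse = .false.)
    print '(a,i0)', 'collapsed count = ', countBlank(record, collapse = .true.)
    if (countBlank(record, collapse = .false.) /= 3) error stop 'unexpected plain count'
    if (countBlank(record, collapse = .true.) /= 2) error stop 'unexpected collapsed count'

contains

    ! Count blanks in record; if collapse is true, a run of adjacent blanks counts once.
    pure function countBlank(record, collapse) result(nsep)
        character(*), intent(in) :: record
        logical, intent(in) :: collapse
        integer :: nsep, i
        nsep = 0
        do i = 1, len(record)
            if (record(i:i) /= ' ') cycle
            ! In collapsed mode, only the first blank of a run is counted.
            ! The test is nested because Fortran does not guarantee short-circuit evaluation.
            if (collapse .and. i > 1) then
                if (record(i-1:i-1) == ' ') cycle
            end if
            nsep = nsep + 1
        end do
    end function countBlank

end program fld_blank_sketch
```

Under this rule a record with ragged blank padding yields the same separator count on every line, which is what allows the frequency comparison between two lines to succeed for list-directed files.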
- Returns
sep : The output allocatable scalar character of default kind SK that will contain the inferred separator if the algorithm returns successfully.
Otherwise, the output sep will be set to an empty string if,
- the target file does not exist,
- an IO error occurs while reading the file,
- the algorithm fails to detect the separator.
Possible calling interfaces
character(:, SK), allocatable :: sep
sep = getFieldSep(unit, seps, form, nfield, iomsg = iomsg)
sep = getFieldSep(file, seps, form, nfield, iomsg = iomsg)
- Warning
- The condition 0 < len(seps) must hold for the corresponding input arguments.
The condition 0 < size(seps) must hold for the corresponding input arguments.
The condition all(0 < [(len(seps(i)%val), i = 1, size(seps))]) must hold for the corresponding input arguments.
- See also
- setRecordFrom
getContentsFrom
setContentsFrom
isPreconnected
getFileUnit
Example usage
use iso_fortran_env, only: output_unit, input_unit, error_unit
character(:, SK), allocatable :: file
type(display_type) :: disp

call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
call disp%show("!Search for single-character field separators.")
call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
character(2,SKG), allocatable :: table(:,:)
call disp%show("table = getUnifRand('aa', 'zz', 3_IK, 6_IK)")
call disp%show( table , deliml = SK_"""" )
call disp%show("if (0 /= getErrTableWrite(file, table, deliml = SKG_'''')) error stop 'table write failed.'")
if (0 /= getErrTableWrite(file, table, deliml = SKG_'''')) error stop 'table write failed.'
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps = SKG_';|, ')")
call disp%show("getFieldSep(file, seps = SKG_';|, ', form = csv)")
call disp%show("getFieldSep(file, seps = SKG_';|, ', form = csv, nfield = nfield)")

call disp%show("call setContentsTo(file, contents = repeat(SK_'""a"", '//getStr([1, 2, 3])//NLC, 2_IK))")
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps = SKG_', ')")
call disp%show("getFieldSep(file, seps = SKG_' ,')")

call disp%show("call setContentsTo(file, contents = repeat(SK_'(1, -1), (2, -2), (3, -3)'//NLC, 2_IK))")
call setContentsTo(file, contents = repeat(SK_'(1, -1), (2, -2), (3, -3)'//NLC, 2_IK))
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps = SKG_', ', nfield = nfield)")
call disp%show("getFieldSep(file, seps = SKG_' ,', nfield = nfield)")
call disp%show("getFieldSep(file, seps = SKG_', ', form = fld, nfield = nfield)")
call disp%show("getFieldSep(file, seps = SKG_' ,', form = fld, nfield = nfield)")

call disp%show("call setContentsTo(file, contents = SK_'""a,"",'//getStr([1, 2, 3])//NLC//SK_'""a"",'//getStr([1, 2, 3])//NLC)")
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps = SKG_', ')")
call disp%show("getFieldSep(file, seps = SKG_', ', form = csv) ! quoted strings are respected in csv format.")
call disp%show("getFieldSep(file, seps = SKG_' ,')")

call disp%show("call setContentsTo(file, contents = SK_'""a"" '//getStr([1, 2, 3])//NLC//SK_'""a"" '//getStr([1, 2, 3])//NLC)")
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps = SKG_' ')")
call disp%show("getFieldSep(file, seps = SKG_' ', form = fld) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (fld) format.")
call disp%show("getFieldSep(file, seps = SKG_' ', form = fld, nfield = nfield) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (fld) format.")

call disp%show("call setContentsTo(file, contents = SK_'""a'//NLC//'"" '//getStr([1, 2, 3])//NLC//SK_'""a"" '//getStr([1, 2, 3])//NLC) ! double-line csv field")
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps = SKG_',')")
call disp%show("getFieldSep(file, seps = SKG_',', form = csv) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (csv) format.")
call disp%show("getFieldSep(file, seps = SKG_',', form = csv, nfield = nfield) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (csv) format.")
call disp%show("getFieldSep(file, seps = SKG_',', form = csv, nfield = nfield) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (csv) format.")

call disp%show("call setContentsTo(file, contents = SK_'""a'//NLC//NLC//'"" '//getStr([1, 2, 3])//NLC//SK_'""a"" '//getStr([1, 2, 3])//NLC) ! three-line FLD field")
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps = SKG_' ')")
call disp%show("getFieldSep(file, seps = SKG_' ', form = fld) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (fld) format.")
call disp%show("getFieldSep(file, seps = SKG_' ', form = fld, nfield = nfield) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (fld) format.")
call disp%show("getFieldSep(file, seps = SKG_',', form = fld, nfield = nfield) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (fld) format.")

call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
call disp%show("!Retrieve whole string field separators.")
call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
use pm_io, only: css_type
character(2,SKG), allocatable :: table(:,:)
type(css_type), allocatable :: seps(:)
call disp%show("table = getUnifRand('aa', 'zz', 3_IK, 6_IK)")
call disp%show( table , deliml = SK_"""" )
call disp%show("file = 'temp.temp'")
call disp%show("if (0 /= getErrTableWrite(file, table, deliml = SKG_'''')) error stop 'table write failed.'")
if (0 /= getErrTableWrite(file, table, deliml = SKG_'''')) error stop 'table write failed.'
call disp%show("getContentsFrom(file)")
call disp%show("seps = [css_type(SK_'|||'), css_type(SK_',')]")
seps = [css_type(SK_'|||'), css_type(SK_',')]
call disp%show( seps , deliml = SK_"""" )
call disp%show("getFieldSep(file, seps)")
call disp%show("getFieldSep(file, seps, form = csv)")
call disp%show("getFieldSep(file, seps, form = csv, nfield = nfield)")

call disp%show("call setContentsTo(file, contents = repeat(SK_'""a"", '//getStr([1, 2, 3])//NLC, 2_IK))")
call disp%show("getContentsFrom(file)")
call disp%show("seps = css_type([SK_',', SK_' '], trimmed = .true._LK)")
seps = css_type([SK_',', SK_' '], trimmed = .true._LK)
call disp%show( seps , deliml = SK_"""" )
call disp%show("getFieldSep(file, seps)")
call disp%show("getFieldSep(file, seps(2:1:-1))")

call disp%show("call setContentsTo(file, contents = repeat(SK_'(1, -1), (2, -2), (3, -3)'//NLC, 2_IK))")
call setContentsTo(file, contents = repeat(SK_'(1, -1), (2, -2), (3, -3)'//NLC, 2_IK))
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps, nfield = nfield)")
call disp%show("getFieldSep(file, seps(2:1:-1), nfield = nfield)")
call disp%show("getFieldSep(file, seps, form = fld, nfield = nfield)")
call disp%show("getFieldSep(file, seps(2:1:-1), form = fld, nfield = nfield)")

call disp%show("call setContentsTo(file, contents = SK_'""a,"",'//getStr([1, 2, 3])//NLC//SK_'""a"",'//getStr([1, 2, 3])//NLC)")
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps)")
call disp%show("getFieldSep(file, seps, form = csv) ! quoted strings are respected in csv format.")
call disp%show("getFieldSep(file, seps(2:1:-1))")

call disp%show("call setContentsTo(file, contents = SK_'""a"" '//getStr([1, 2, 3])//NLC//SK_'""a"" '//getStr([1, 2, 3])//NLC)")
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps(2:2))")
call disp%show("getFieldSep(file, seps(2:2), form = fld) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (fld) format.")
call disp%show("getFieldSep(file, seps(2:2), form = fld, nfield = nfield) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (fld) format.")

call disp%show("call setContentsTo(file, contents = SK_'""a'//NLC//'"" '//getStr([1, 2, 3])//NLC//SK_'""a"" '//getStr([1, 2, 3])//NLC) ! double-line csv field")
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps(1:1))")
call disp%show("getFieldSep(file, seps(1:1), form = csv) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (csv) format.")
call disp%show("getFieldSep(file, seps(1:1), form = csv, nfield = nfield) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (csv) format.")
call disp%show("getFieldSep(file, seps(1:1), form = csv, nfield = nfield) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (csv) format.")

call disp%show("call setContentsTo(file, contents = SK_'""a'//NLC//NLC//'"" '//getStr([1, 2, 3])//NLC//SK_'""a"" '//getStr([1, 2, 3])//NLC) ! three-line FLD field")
call disp%show("getContentsFrom(file)")
call disp%show("getFieldSep(file, seps(2:2))")
call disp%show("getFieldSep(file, seps(2:2), form = fld) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (fld) format.")
call disp%show("getFieldSep(file, seps(2:2), form = fld, nfield = nfield) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (fld) format.")
call disp%show("getFieldSep(file, seps(1:1), form = fld, nfield = nfield) ! multiple adjacent blanks count as a single separator in Fortran-list-directed (fld) format.")
Example Unix compile command via Intel ifort compiler
ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
Example Windows Batch compile command via Intel ifort compiler
set PATH=..\..\..\lib;%PATH%
ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
Example Unix / MinGW compile command via GNU gfortran compiler
gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
Example output
"tl", "be", "yw", "nn", "by", "uz"
"ps", "hw", "kw", "jh", "dx", "ae"
"ax", "xs", "nd", "xx", "df", "cm"
if (0 /= getErrTableWrite(file, table, deliml = SKG_'''')) error stop 'table write failed.'
'tl', 'be', 'yw', 'nn', 'by', 'uz'
'ps', 'hw', 'kw', 'jh', 'dx', 'ae'
'ax', 'xs', 'nd', 'xx', 'df', 'cm'
call setContentsTo(file, contents = repeat(SK_'(1, -1), (2, -2), (3, -3)'//NLC, 2_IK))
(1, -1), (2, -2), (3, -3)
(1, -1), (2, -2), (3, -3)
"ww", "cr", "sz", "mj", "ag", "kp"
"ly", "vy", "hc", "zw", "nf", "oj"
"ru", "uw", "it", "hs", "cu", "ey"
if (0 /= getErrTableWrite(file, table, deliml = SKG_'''')) error stop 'table write failed.'
'ww', 'cr', 'sz', 'mj', 'ag', 'kp'
'ly', 'vy', 'hc', 'zw', 'nf', 'oj'
'ru', 'uw', 'it', 'hs', 'cu', 'ey'
seps = [css_type(SK_'|||'), css_type(SK_',')]
seps = css_type([SK_',', SK_' '], trimmed = .true._LK)
call setContentsTo(file, contents = repeat(SK_'(1, -1), (2, -2), (3, -3)'//NLC, 2_IK))
(1, -1), (2, -2), (3, -3)
(1, -1), (2, -2), (3, -3)
- Test:
- test_pm_io
- Todo:
- Normal Priority: An optional input argument maxlenfield must be added to return the maximum inferred length of a field in all records of the table.
This option is potentially useful for parsing tables of string fields, where the proper field length is unknown a priori.
Without this option, the read action may fail, or long fields may be truncated to fit the fixed length of the table fields.
- Todo:
- Normal Priority: The optional input arguments deliml and delimr of rank 1 of type css_type must be added to allow field recognition with arbitrary left/right delimiters within a record.
Final Remarks
If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.
-
If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
-
If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.
This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.
- Copyright
- Computational Data Science Lab
- Author:
- Amir Shahmoradi, Tuesday March 7, 2017, 3:50 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas Austin
Definition at line 9407 of file pm_io.F90.