ParaMonte MATLAB 3.0.0
Parallel Monte Carlo and Machine Learning Library
glob.m
%> \brief
%> Find all files and directory names matching the input pattern by expanding wildcards.<br>
%>
%> \details
%> This function performs pattern matching of file and directory names, based on wildcard characters.<br>
%> This function is similar to the wildcard expansion performed by the Unix shell
%> and the Python ``glob.glob`` function, but it can handle more types of wildcards.<br>
%> The following list highlights the key differences between
%> this function and the MATLAB intrinsic ``dir()``.<br>
%> <ol>
%> <li> ``glob()`` supports wildcards for directories.<br>
%> <li> ``glob()`` includes the directory part of ``pattern`` in the returned names.<br>
%> <li> ``glob()`` returns a MATLAB string array of matching names.<br>
%> <li> ``glob()`` does not return hidden files and directories that
%> start with ``'.'`` unless explicitly specified in ``pattern``.<br>
%> <li> ``glob()`` does not return ``'.'`` and ``'..'`` unless explicitly specified in ``pattern``.<br>
%> <li> ``glob()`` adds a trailing file separator to directory names.<br>
%> <li> ``glob()`` does not return the contents of a directory when a directory is specified.<br>
%> To return the contents of a directory, add a trailing ``'/*'``.<br>
%> <li> ``glob()`` returns only directory names when a trailing file separator is specified.<br>
%> <li> On Windows, ``glob()`` is not case sensitive, but it returns matching
%> names exactly in the case in which they are defined on the filesystem.<br>
%> The case of the host and share names of a UNC path, and the case of drive
%> letters, are returned as specified in ``pattern``.<br>
%> </ol>
%>
%> \param[in] pattern : The input scalar MATLAB string, containing the search pattern.<br>
%> Wildcards may be used for basenames and for the directory parts.<br>
%> If ``pattern`` contains directory parts, then these will be included in the output ``pathList``.<br>
%> The following wildcards can be used:<br>
%> <ol>
%> <li> ``*`` matches zero or more characters.
%> <li> ``?`` matches any single character.
%> <li> ``[ab12]`` matches one of the specified characters.
%> <li> ``[^ab12]`` matches any single character that is not one of the specified characters.
%> <li> ``[a-z]`` matches one character in the specified range of characters.
%> <li> ``{a,b,c}`` matches any one of the strings ``a``, ``b``, or ``c``.<br>
%> <li> None of the above wildcards matches a file separator.<br>
%> <li> ``**`` matches zero or more characters, including file separators.<br>
%> This can be used to match zero or more directory parts
%> and will recursively list matching names.<br>
%> Beware that **symbolically linked directories or
%> junctions may cause an infinite loop** when using ``**``.<br>
%> </ol>
%> \param[in] anycase : The input scalar MATLAB logical.<br>
%> If ``true``, the search will be case-insensitive.<br>
%> If ``false``, the search will be case-sensitive.<br>
%> On Windows, ``anycase`` is always reset to ``true`` even if user-specified.<br>
%> (**optional**. default = ``false`` on Unix and ``true`` on Windows.)
%>
%> \return
%> ``pathList`` : The output MATLAB string array containing the names of the files
%> or directories that match the path specified by ``pattern``.<br>
%> ``isdirList`` : The output MATLAB logical array of the same size as ``pathList``,
%> each element of which is ``true`` if and only if
%> the corresponding element of ``pathList`` is a directory.<br>
%>
%> \interface{glob}
%> \code{.m}
%>
%> [pathList, isdirList] = pm.sys.path.glob(pattern)
%> [pathList, isdirList] = pm.sys.path.glob(pattern, anycase)
%>
%> \endcode
%>
%> \example{glob-raw}
%> \code{.m}
%>
%> pm.sys.path.glob("*.m")                 % list all .m files in the current directory.
%> pm.sys.path.glob("baz/*")               % list all files and directories in subdirectory "baz".
%> pm.sys.path.glob("b*/*.m")              % list all .m files in subdirectories whose names start with "b".
%>                                         % The returned names include the matching subdirectory names.
%> pm.sys.path.glob("?z*.m")               % list all .m files whose second character is 'z'.
%> pm.sys.path.glob("baz.[ch]")            % matches baz.c and baz.h
%> pm.sys.path.glob("test.[^ch]")          % matches test.a but not test.c or test.h
%> pm.sys.path.glob("demo.[a-c]")          % matches demo.a, demo.b, and demo.c
%> pm.sys.path.glob("test.{foo,bar,baz}")  % matches test.foo, test.bar, and test.baz
%> pm.sys.path.glob(".*")                  % list all hidden files in the current directory, excluding '.' and '..'
%> pm.sys.path.glob("*/")                  % list all subdirectories.
%> pm.sys.path.glob("**")                  % recursively list all files and directories,
%>                                         % starting in the current directory (the current directory name,
%>                                         % hidden files, and hidden directories are excluded).
%> pm.sys.path.glob("**.m")                % list all .m files anywhere in the directory tree,
%>                                         % including .m files in the current directory.
%>                                         % This is equivalent to '**/*.m'.
%> pm.sys.path.glob("foo/**/")             % recursively list all directories, starting in directory 'foo'.
%> pm.sys.path.glob("**/.svn/")            % list all .svn directories in the directory tree.
%> pm.sys.path.glob("**/.*/**")            % recursively list all files in hidden directories only.
%> [paths, isdirs] = pm.sys.path.glob('**'); paths(~isdirs) % get all files in the directory tree.
%>
%> \endcode
%>
%> \example{glob}
%> \include{lineno} example/sys/path/glob/main.m
%> \output{glob}
%> \include{lineno} example/sys/path/glob/main.out.m
%>
%> \final{glob}
%>
%> Copyright (c) 2013, Peter van den Biggelaar
%> All rights reserved.
%>
%> Redistribution and use in source and binary forms, with or without
%> modification, are permitted provided that the following conditions are met:
%>
%> * Redistributions of source code must retain the above copyright
%> notice, this list of conditions and the following disclaimer.
%> * Redistributions in binary form must reproduce the above copyright
%> notice, this list of conditions and the following disclaimer in
%> the documentation and/or other materials provided with the distribution
%>
%> This software is provided by the copyright holders and contributors "as is"
%> and any express or implied warranties, including, but not limited to, the
%> implied warranties of merchantability and fitness for a particular purpose
%> are disclaimed. in no event shall the copyright owner or contributors be
%> liable for any direct, indirect, incidental, special, exemplary, or
%> consequential damages (including, but not limited to, procurement of
%> substitute goods or services; loss of use, data, or profits; or business
%> interruption) however caused and on any theory of liability, whether in
%> contract, strict liability, or tort (including negligence or otherwise)
%> arising in any way out of the use of this software, even if advised of the
%> possibility of such damage.
%>
%> \author
%> \JoshuaOsborne, May 21 2024, 5:24 AM, University of Texas at Arlington<br>
%> \FatemehBagheri, May 20 2024, 1:25 PM, NASA Goddard Space Flight Center (GSFC), Washington, D.C.<br>
%> \AmirShahmoradi, May 16 2016, 9:03 AM, Oden Institute for Computational Engineering and Sciences (ICES), UT Austin<br>
function [pathList, isdirList] = glob(pattern, anycase)

    if isstring(pattern)
        pattern = convertStringsToChars(pattern);
    end

    %%%%
    %%%% check pattern input
    %%%%

    if ischar(pattern)
        if isempty(pattern)
            % return empty outputs (of the same types as a normal call) when pattern is empty
            pathList = strings(0);
            isdirList = false(0);
            return
        elseif size(pattern,1)>1
            error('glob:invalidInput', 'pattern must be a single string.')
        end
    else
        error('glob:invalidInput', 'pattern must be a string.')
    end

    %%%%
    %%%% check anycase option
    %%%%

    if nargin < 2
        anycase = [];
    end
    if ~isempty(anycase)
        pm.introspection.verify(anycase, "logical", 1, "anycase");
        if ispc
            % As documented, the search is always case-insensitive on Windows.
            anycase = true;
        end
    else
        % Windows is not case sensitive.
        % Unix is case sensitive.
        anycase = ispc;
    end

    %%%%
    %%%% define function handle to regular expression function for the specified case sensitivity
    %%%%

    if anycase
        regexp_fhandle = @regexpi;
    else
        regexp_fhandle = @regexp;
    end

    %%%%
    %%%% only use forward slashes as file separator to prevent escaping backslashes in regular expressions
    %%%%

    filespec = strrep(pattern, '\', '/');

    %%%%
    %%%% split pathroot part from pattern
    %%%%

    if strncmp(filespec, '//',2)
        if ispc
            % pattern specifies a UNC path.
            % It is not allowed to get a directory listing of share names of a
            % host with the DIR command.
            % pathroot will contain, e.g., //host/share/
            pathroot = regexprep(filespec, '(^//+[^/]+/[^/]+/)(.*)', '$1');
            filespec = regexprep(filespec, '(^//+[^/]+/[^/]+/)(.*)', '$2');
        else
            % On Unix, multiple leading file separators are equivalent to a single file separator,
            % so the pattern specifies an absolute path.
            pathroot = '/';
            filespec = regexprep(filespec, '^/+', '');
        end
    elseif strncmp(filespec, '/', 1)
        % pattern specifies an absolute path
        pathroot = '/';
        filespec(1) = [];
    elseif ispc && numel(filespec)>=2 && filespec(2) == ':'
        % pattern specifies an absolute path starting with a drive letter.
        % Check for a file separator after ':', e.g., 'C:\'.
        if numel(filespec)<3 || filespec(3)~='/'
            error('glob:invalidInput','Drive letter must be followed by '':\''.')
        end
        pathroot = filespec(1:3);
        filespec(1:3) = [];
    else
        % pattern specifies a relative path
        pathroot = './';
    end

    %%%% replace multiple file separators by a single file separator

    filespec = regexprep(filespec, '/+', '/');

    %%%% replace 'a**' with 'a*/**', where 'a' can be any character but not '/'

    filespec = regexprep(filespec, '([^/])(\.\*\.\*)', '$1\*/$2');

    %%%% replace '**a' with '**/*a', where a can be any character but not '/'

    filespec = regexprep(filespec, '(\.\*\.\*)([^/])', '$1/\*$2');

    %%%% split filespec into chunks at file separator

    chunks = strread(filespec, '%s', 'delimiter', '/'); %#ok<FPARK>

    %%%% add empty chunk at the end when filespec ends with a file separator

    if ~isempty(filespec) && filespec(end)=='/'
        chunks{end+1} = '';
    end

    %%%% translate chunks to regular expressions

    for i=1:numel(chunks)
        chunks{i} = glob2regexp(chunks{i});
    end

    %%%% determine file list using LS_REGEXP
    %%%% this function requires that PATHROOT does not contain any wildcards

    if ~isempty(chunks)
        list = ls_regexp(regexp_fhandle, pathroot, chunks{1:end});
    else
        list = {pathroot};
    end
    if strcmp(pathroot, './')
        % remove relative pathroot from result
        list = regexprep(list, '^\./', '');
    end
    if nargout == 2
        % determine directories by checking for '/' at the end
        I = regexp(list', '/$');
        isdirList = ~cellfun('isempty', I);
    end

    %%%%
    %%%% convert to standard file separators for PC
    %%%%

    if ispc
        list = strrep(list, '/', '\');
    end

    %%%%
    %%%% return output
    %%%%

    if nargout == 0
        if ~isempty(list)
            % display list
            disp(string(list))
        else
            disp(['''' pattern ''' not found.']);
        end
    else
        pathList = string(list');
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    function regexp_str = glob2regexp(glob_str)
        %%%%
        %%%% translate glob_str to a regular expression string
        %%%%
        % initialize
        regexp_str = '';
        in_curlies = 0; % is > 0 within curly braces
        %%%%
        %%%% handle characters in glob_str one-by-one
        %%%%
        for c = glob_str

            if c=='^' && ~isempty(regexp_str) && regexp_str(end)=='['
                % keep '^' unescaped right after '[' so that '[^...]' is a negated character set
                regexp_str = [regexp_str c]; %#ok<AGROW>

            elseif any(c=='.()|+^$@%')
                % escape simple special characters
                regexp_str = [regexp_str '\' c]; %#ok<AGROW>

            elseif c=='*'
                % '*' should not match '/'
                regexp_str = [regexp_str '[^/]*']; %#ok<AGROW>

            elseif c=='?'
                % '?' should not match '/'
                regexp_str = [regexp_str '[^/]']; %#ok<AGROW>

            elseif c=='{'
                regexp_str = [regexp_str '(']; %#ok<AGROW>
                in_curlies = in_curlies+1;
            elseif c=='}' && in_curlies
                regexp_str = [regexp_str ')']; %#ok<AGROW>
                in_curlies = in_curlies-1;
            elseif c==',' && in_curlies
                regexp_str = [regexp_str '|']; %#ok<AGROW>

            else
                regexp_str = [regexp_str c]; %#ok<AGROW>
            end
        end
        % replace original '**' (that has now become '[^/]*[^/]*') with '.*.*'
        regexp_str = strrep(regexp_str, '[^/]*[^/]*', '.*.*');
    end

    function L = ls_regexp(regexp_fhandle, path, varargin)
        % List files that match PATH/r1/r2/r3/... where PATH is a string without
        % any wildcards and r1..rn are regular expressions that contain the parts of
        % a filespec between the file separators.
        % L is a cell array with matching file or directory names.
        % REGEXP_FHANDLE contains a function handle to REGEXP or REGEXPI depending
        % on the specified case sensitivity.

        % if the first regular expression contains '**', examine the complete file tree
        if nargin >= 3 && any(regexp(varargin{1}, '\.\*\.\*'))
            L = ls_regexp_tree(regexp_fhandle, path, varargin{:});

        else
            % get contents of path
            list = dir(path);

            if nargin >= 3
                if strcmp(varargin{1}, '\.') || strcmp(varargin{1}, '\.\.')
                    % keep explicitly specified '.' or '..' in first regular expression
                    if ispc && ~any(strcmp({list.name}, '.'))
                        % fix strange Windows behaviour: the root of a volume has no '.' and '..'
                        list(end+1).name = '.';
                        list(end).isdir = true;
                        list(end+1).name = '..';
                        list(end).isdir = true;
                    end
                else
                    % remove '.' and '..'
                    list(strcmp({list.name},'.')) = [];
                    list(strcmp({list.name},'..')) = [];

                    % remove names starting with '.' unless the first regular expression starts with '.'
                    if ~strncmp(varargin{1},'\.',2)
                        % remove files starting with '.' from list
                        list(strncmp({list.name},'.',1)) = [];
                    end
                end
            end

            % define shortcuts
            list_isdir = [list.isdir];
            list_name = {list.name};

            L = {}; % initialize
            if nargin==2 % no regular expressions
                % return filename
                if ~isempty(list_name)
                    % add a trailing slash to directories
                    trailing_fsep = repmat({''}, size(list_name));
                    trailing_fsep(list_isdir) = {'/'};
                    L = strcat(path, list_name, trailing_fsep);
                end
            elseif nargin==3 % last regular expression
                % return list_name matching regular expression
                I = regexp_fhandle(list_name, ['^' varargin{1} '$']);
                I = ~cellfun('isempty', I);
                list_name = list_name(I);
                list_isdir = list_isdir(I);
                if ~isempty(list_name)
                    % add a trailing slash to directories
                    trailing_fsep = repmat({''}, size(list_name));
                    trailing_fsep(list_isdir) = {'/'};
                    L = strcat(path, list_name, trailing_fsep);
                end

            elseif nargin==4 && isempty(varargin{2})
                % only return directories when last regexp is empty
                % return list_name matching regular expression and that are directories
                I = regexp_fhandle(list_name, ['^' varargin{1} '$']);
                I = ~cellfun('isempty', I);
                % only return directories
                list_name = list_name(I);
                list_isdir = list_isdir(I);
                if any(list_isdir)
                    % add a trailing file separator
                    L = strcat(path, list_name(list_isdir), '/');
                end
            else
                % traverse for list_name matching regular expression
                I = regexp_fhandle(list_name, ['^' varargin{1} '$']);
                I = ~cellfun('isempty', I);
                for name = list_name(I)
                    L = [L ls_regexp(regexp_fhandle, [path char(name) '/'], varargin{2:end})]; %#ok<AGROW>
                end
            end
        end
    end

    function L = ls_regexp_tree(regexp_fhandle, path, varargin)
        % use this function when the first argument of varargin contains '**'
        % build a list of the complete directory tree
        % if any regexp starts with '\.', keep hidden files and directories
        I = regexp(varargin, '^\\\.');
        I = ~cellfun('isempty', I);
        keep_hidden = any(I);
        list = dir_recur(path, keep_hidden);
        L = {list.name};
        % make one regular expression of all individual regexps
        expression = [regexptranslate('escape',path) sprintf('%s/', varargin{1:end-1}) varargin{end}];
        % note that /**/ must also match zero directories:
        % replace '/**/' (now '/.*.*/') with '(/.*.*/|/)'
        expression = regexprep(expression, '/\.\*\.\*/', '(/\.\*\.\*/|/)');
        % return matching names
        if ~isempty(varargin{end})
            % determine matching names ignoring trailing '/'
            L_no_trailing_fsep = regexprep(L, '/$', '');
            I = regexp_fhandle(L_no_trailing_fsep, ['^' expression '$']);
        else
            % determine matching names including trailing '/'
            I = regexp_fhandle(L, ['^' expression '$']);
        end
        I = cellfun('isempty', I);
        L(I) = [];
    end

    function d = dir_recur(startdir, keep_hidden)
        % determine recursive directory contents
        % get directory contents
        d = dir(startdir);
        % remove '.' and '..', and also all hidden entries unless keep_hidden is true
        if keep_hidden
            % only remove '.' and '..'
            d(strcmp({d.name},'.')) = [];
            d(strcmp({d.name},'..')) = [];
        else
            % remove all hidden files and directories
            d(strncmp({d.name},'.',1)) = [];
        end
        if ~isempty(d)
            % add a trailing file separator to directories
            trailing_fsep = repmat({''}, size(d));
            trailing_fsep([d.isdir]) = {'/'};
            % prefix startdir to each name and append a file separator to directory names
            dname = strcat(startdir, {d.name}, trailing_fsep');
            [d(:).name] = deal(dname{:});
            % recurse into subdirectories
            for subd = {d([d.isdir]).name}
                d = [d; dir_recur(char(subd), keep_hidden)]; %#ok<AGROW>
            end
        end
    end

end
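The wildcard rules listed under ``pattern`` are applied by translating each path component into a regular expression, which is the job of the nested function ``glob2regexp``. The following is a minimal, self-contained sketch of that translation, restricted to single path components (no ``/`` and no ``**``). The variable names are hypothetical and the code is an illustration under these assumptions, not the ParaMonte implementation; it treats a ``^`` immediately after ``[`` as negation, matching the documented meaning of ``[^ab12]``.

```matlab
% Minimal sketch (hypothetical variable names, not the ParaMonte implementation):
% translate single path components (no '/' and no '**') into anchored regular
% expressions, following the wildcard rules documented for pm.sys.path.glob.
globPatterns = {'test.{foo,bar}', 'test.[^ch]', '?z*.m'};
regexList = cell(size(globPatterns));
for k = 1:numel(globPatterns)
    regexStr = '';
    braceDepth = 0;                             % > 0 while inside {...}
    for c = globPatterns{k}
        if c == '^' && ~isempty(regexStr) && regexStr(end) == '['
            regexStr = [regexStr c];            % '^' right after '[' negates the set
        elseif any(c == '.()|+^$@%')
            regexStr = [regexStr '\' c];        % escape regex metacharacters
        elseif c == '*'
            regexStr = [regexStr '[^/]*'];      % any run of non-separator characters
        elseif c == '?'
            regexStr = [regexStr '[^/]'];       % exactly one non-separator character
        elseif c == '{'
            regexStr = [regexStr '('];          % start a group of alternatives
            braceDepth = braceDepth + 1;
        elseif c == '}' && braceDepth > 0
            regexStr = [regexStr ')'];          % close the group
            braceDepth = braceDepth - 1;
        elseif c == ',' && braceDepth > 0
            regexStr = [regexStr '|'];          % separate alternatives inside {...}
        else
            regexStr = [regexStr c];            % '[', ']', '-', letters, digits pass through
        end
    end
    regexList{k} = ['^' regexStr '$'];          % the whole name must match
end
disp(regexList)

% Apply the negated character set to a few candidate names.
names = {'test.a', 'test.c', 'test.h'};
isMatch = ~cellfun('isempty', regexp(names, regexList{2}, 'once'));
disp(names(isMatch))                            % only 'test.a' is kept
```

Each expression is anchored with ``^`` and ``$`` so that it must match a whole file or directory name, which is also how ``ls_regexp`` applies the translated chunks to the output of ``dir``.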
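For patterns containing ``**``, ``ls_regexp_tree`` lists the whole directory tree, joins the translated chunks into a single expression, and widens ``/**/`` so that it also matches zero intermediate directories. The sketch below reproduces that step for the pattern ``'**/*.m'`` on a hard-coded, hypothetical list of relative paths instead of a real directory scan, and it uses ``strrep`` in place of the ``regexprep`` call used by ``ls_regexp_tree``. It also flags directories by checking for a trailing ``/``, which is how ``isdirList`` is computed.

```matlab
% Minimal sketch (hypothetical data): how the joined expression for '**/*.m'
% is widened so that '/**/' also matches zero directories. The path list is
% hard-coded in the form produced by a recursive listing (directories end with '/').
paths = {'./a.m', './sub/b.m', './sub/deep/c.m', './sub/', './notes.txt'};

% '**' translates to '.*.*' and '*.m' to '[^/]*\.m'; the relative root './'
% is escaped to '\./' and the chunks are joined with '/'.
expression = ['\./' '.*.*' '/' '[^/]*\.m'];

% Let '/.*.*/' match either one or more directory levels or a single '/'.
expression = strrep(expression, '/.*.*/', '(/.*.*/|/)');

isMatch = ~cellfun('isempty', regexp(paths, ['^' expression '$'], 'once'));
isDir = ~cellfun('isempty', regexp(paths, '/$', 'once'));   % trailing '/' marks a directory
disp(paths(isMatch))    % './a.m', './sub/b.m', './sub/deep/c.m'
```

Without the widening step, the expression would require at least one intermediate directory and would miss ``./a.m`` in the starting directory.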