
Consider the following code:

x = input('Input an array: ');

If the user types [1 2 3], variable x will be assigned that numeric vector. Similarly, if they type {1, [2 3], 'abc'}, variable x will be a cell array containing those values. Fine.

Now, if the user types [sqrt(2) sin(pi/3)], variable x will be assigned the resulting values: [1.414213562373095 0.866025403784439]. That's because the provided data is evaluated by input:

input Prompt for user input.
result = input(prompt) displays the prompt string on the screen, waits for input from the keyboard, evaluates any expressions in the input, and returns the value in result. [...]

This can cause problems. For example, what happens if the user types addpath('c:\path\to\folder') as input? Since the input is evaluated, it is actually a command, which Matlab will execute. So the user can add a folder to the path. Worse yet, if they input path(''), the path will effectively be set to nothing, and Matlab will stop working properly.

Another potential source of problems is that

[...] To evaluate expressions, input accesses variables in the current workspace.

For example, if the user inputs fprintf(1,'%f', varname) and varname is an existing numeric array, the user will know its current value.

This behaviour is probably by design. Users of a Matlab program are trusted when inputting data, much like they are trusted not to press Control-C to halt the program (and then issue any commands or inspect any variables they like!).

But in certain cases the programmer may want to have a more "secure" input function, by which I mean

  1. prevent any function calls when evaluating user input; and
  2. prevent the input from accessing variables of the program.

So [1 2] would be valid input, but [sqrt(2) sin(pi/3)] or path('') would not because of item 1; and [1 2 3 varname(1)] would be invalid too because of item 2.
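The leak behind item 2 can be reproduced without an interactive prompt. This sketch assumes that `input` evaluates the typed text much as `eval` would in the current workspace (which is what its help text describes), so it feeds the `[1 2 3 varname(1)]` example to `eval` directly; the contents of `varname` are made up for illustration.

```matlab
% Simulate what input() does with the text the user types: the expression is
% evaluated in the current workspace, so program variables are visible to it.
varname = [0.5 7 9];              % illustrative "private" variable of the program
typed = '[1 2 3 varname(1)]';     % text a user could type at the prompt
x = eval(typed);                  % stands in for x = input('Input an array: ');
disp(x)                           % the "data" now contains varname(1)
```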

Luis Mendo
  • Thanks for sharing! If it is only about reading an array of numbers, you could again read the input as string, and use `str2num` on the string. Or would there be any other drawbacks with that approach? – hbaderts Oct 14 '15 at 11:34
  • @hbaderts For that particular case (array of numbers) I think that would be a very good approach: simple and effective. It won't work for more general input, though (cell arrays, char arrays etc) – Luis Mendo Oct 14 '15 at 11:39
  • Are any operations, e.g. *,+,-,/ etc valid? Or do you only want numbers, text or a combination of numeric and char (stored in a cell)? Should all text be enclosed in `'`? – matlabgui Oct 14 '15 at 13:00
  • Operations are valid, but a solution in which they are not valid is fine too. Arrays will be combinations of numbers, logical values, chars; whether "standard" arrays or cell arrays. Examples: `[1 2 3]`, `{1 2 'abc'}`, `['ABC' 68]`, `[1+cos(2) 2]`(not valid: there's a function call), `{'cos(1)' 'sin(1)'}` (valid: they are strings) – Luis Mendo Oct 14 '15 at 13:23
  • I assume you're familiar with little Bobby Tables? Failure to do input validation is always a BadThing(TM) . I can't imagine why (ok, yes I can) MathWorks allowed stuff like this to get into production. – Carl Witthoft Oct 14 '15 at 13:45
  • @CarlWitthoft Yes, I read about that at xkcd :-) The thing is, `input`'s documentation recognizes it works that way. So it seems to be a feature, not a bug. Anyway, it would be nice if Matlab included a "secure input" function. It seems they have already done most of the work in `getcallinfo` – Luis Mendo Oct 14 '15 at 13:59
  • I need to add to this discussion. Note that in all circumstances where you would use `input`, the user can Ctrl+C stop the program and run arbitrary commands on the MATLAB command line. There is no need to protect a MATLAB installation against a MATLAB user. – Cris Luengo Mar 21 '21 at 14:37
  • @CrisLuengo I agree (see the third-to-last paragraph in my question) – Luis Mendo Mar 21 '21 at 16:47
  • @LuisMendo: sorry, didn’t see that part. I was reacting to the short-sighted comment by Carl “I can't imagine why (ok, yes I can) MathWorks allowed stuff like this to get into production.” – Cris Luengo Mar 21 '21 at 16:56
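On the `str2num` suggestion in the comments: `str2num` itself converts its argument with `eval` (its documentation warns about side effects and recommends `str2double` to avoid them), so on its own it does not block function calls. A sketch that parses a plain numeric row vector without evaluating anything, assuming the input is a bracketed list of numeric literals separated by blanks or commas; the helper names are hypothetical:

```matlab
% Hypothetical alternative to str2num for text such as '[1 2.5, -3e2]':
% strip the surrounding brackets, split on blanks and commas, and convert
% each token with str2double, which parses numbers but never evaluates code.
stripBrackets = @(s) strtrim(regexprep(strtrim(s), '^\[|\]$', ''));
toTokens = @(s) regexp(stripBrackets(s), '[\s,]+', 'split');
parseRow = @(s) str2double(toTokens(s));

values = parseRow('[1 2.5, -3e2]');   % numeric literals are converted
bad = parseRow('[sqrt(2) 1]');        % 'sqrt(2)' is not a number: NaN, not executed
isValid = ~any(isnan(bad));           % reject the whole input if any token failed
```

Like `str2num` for this use, it covers only arrays of numbers, not the cell arrays or char arrays mentioned in the comments.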

2 Answers

I have found a not very satisfactory solution (and I'd love to read about a better one). It uses a semi-documented function and implies saving the user input to a temporary file. The function, referred to in Yair Altman's blog, is getcallinfo. According to help getcallinfo:

getcallinfo

Returns called functions and their first and last lines
This function is unsupported and might change or be removed without notice in a future version.

This function solves issue 1 (prevent function calls). As for issue 2 (prevent access to variables), it would suffice to evaluate the input within a function, so that it can't see other variables. Apparently (see example 2 below), getcallinfo detects not only called functions, but variables too. Anyway, it's probably a good idea to do the evaluation in the isolated scope of a function.

The procedure is then:

  1. Use the string version of input to prevent evaluation:

    x = input('Input an array: ', 's');
    
  2. Save the string to a file:

    filename = 'tmp.m';
    fid = fopen(filename,'w');
    fprintf(fid, '%s',x);
    fclose(fid);
    
  3. Check the input string with getcallinfo to detect possible function calls:

    gci = getcallinfo(filename);
    if ~isempty(gci.calls.fcnCalls.names)
        %// Input includes function calls: ignore / ask again / ...
    else
        x = evalinput(x); %// evaluate input in a function
    end
    

where evalinput is the following function

function x = evalinput(x)
x = eval(x);
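Putting steps 1 to 3 and evalinput together, a sketch could look like this. It is not part of the original answer: it assumes getcallinfo still returns the `calls.fcnCalls.names` field shown in the examples (the function is unsupported and may change), writes to a unique `tempname`-based file instead of a fixed `tmp.m` in the current folder, and deletes that file afterwards via `onCleanup`. The name `checkedEval` is hypothetical. Local functions at the end of a script need R2016b or later; otherwise put each function in its own file.

```matlab
% Sketch combining steps 1-3 with evalinput. The text is hard-coded so the
% example runs without a prompt; in practice: typed = input('Input an array: ', 's');
typed = '[1 2 3]';
x = checkedEval(typed);                  % no calls found, so it is evaluated

try
    checkedEval('[sqrt(2) sin(pi/3)]');  % getcallinfo lists sqrt, sin, pi
    rejected = false;
catch
    rejected = true;                     % checkedEval raised an error, as intended
end

function x = checkedEval(typed)
% Hypothetical helper: write the text to a temporary .m file, let getcallinfo
% list what it calls, and refuse to evaluate if that list is non-empty.
% Variable names show up in the list too (see Example 2).
filename = [tempname '.m'];                   % unique name instead of 'tmp.m'
fid = fopen(filename, 'w');
if fid < 0
    error('checkedEval:file', 'Could not create a temporary file.');
end
fprintf(fid, '%s', typed);
fclose(fid);
cleanup = onCleanup(@() delete(filename));    % delete the file on any exit
gci = getcallinfo(filename);
if ~isempty(gci(1).calls.fcnCalls.names)
    error('checkedEval:calls', 'Input contains function calls or variable names.');
end
x = evalinput(typed);
end

function x = evalinput(s)
% Evaluate in this function's own workspace, which contains only s.
x = eval(s);
end
```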

Example 1

Consider

x = input('Input an array: ', 's');

with user input

[sqrt(2) sin(pi/3)]

Then

filename = 'tmp.m';
fid = fopen(filename,'w');
fprintf(fid, '%s',x);
fclose(fid);
gci = getcallinfo(filename);

produces a non-empty gci.calls.fcnCalls.names,

>> gci.calls.fcnCalls.names
ans = 
    'sqrt'    'sin'    'pi'

which tells us that the user input would call functions sqrt, sin and pi if evaluated. Note that operators such as / are not detected as functions.

Example 2

y = [10 20 30];
x = input('Input an array: ', 's');

User enters

[1 y y.^2]

Then

filename = 'tmp.m';
fid = fopen(filename,'w');
fprintf(fid, '%s',x);
fclose(fid);
gci = getcallinfo(filename);

produces

>> gci.calls.fcnCalls.names
ans = 
    'y'    'y'

So variables are detected by getcallinfo as if they were functions.

Luis Mendo
  • @AnderBiguri Thanks! Yair Altman's blog is an amazing source of weird stuff :-) – Luis Mendo Oct 14 '15 at 12:05
  • Just wondering: if you build a dialog box with `inputdlg` , there's a config option `options.Interpreter` . Can this be used to force input to be treated as either numeric or string (and thus NOT a function or path call)? – Carl Witthoft Oct 14 '15 at 13:48
  • @CarlWitthoft That option seems to control rendering of the _prompt_ string (tex or plain) http://es.mathworks.com/help/matlab/ref/inputdlg.html – Luis Mendo Oct 14 '15 at 13:56
  • Any reason why you cannot use regex or some pattern matching to identify approved vs. non-approved strings, then evaluate them if they pass? – pragmatist1 Oct 14 '15 at 22:33
  • @pragmatist1 I thought about that when I was dealing with this problem a few days ago. At first glance it seemed complicated: the regexp should be able to identify when a function name is within a string, and ignore it in that case. But perhaps it's simpler than I thought: a potential function name is within a string if and only if it has an odd number of quote signs to its left. It's a promising approach. Care to write an answer about that? If you don't, maybe I'll look into it in the future. Thanks for the suggestion in any case! – Luis Mendo Oct 14 '15 at 22:46
  • I'll give it a stab later tonight. Is it safe to assume that rules 1 and 2 you've listed above are the only ones? Also it is not clear to me: are you rejecting any invalid input, or are you "cleaning it up" and removing the function calls? – pragmatist1 Oct 14 '15 at 23:02
  • @pragmatist1 Great! No hurry of course. Well, those rules are the only ones I could think of as being important, so in principle yes. The idea is to prevent malicious input, meaning input that could _execute_ something or _inspect_ the program state when being evaluated. If you think some other rule should be included for that, please include it – Luis Mendo Oct 14 '15 at 23:05

If I understand your question correctly, it is possible to use regular expressions to accomplish what you are trying to do.

No function or variable calls

At its simplest, this means checking that there are no alphabetic characters in the input string. For x containing the input, the check would then be:

expr = '[a-zA-Z]';
x = input('Input an array: ', 's');
valid = isempty(regexp(x,expr));

This alone works for the few examples you give above.
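Applied to the examples from the question, the check behaves as follows (a small sketch; `isValid` and the `ok*` names are chosen here for illustration). It also rejects legitimate input that contains letters, such as the exponent in `1e3`, which the update below and the comments address.

```matlab
expr = '[a-zA-Z]';
isValid = @(s) isempty(regexp(s, expr, 'once'));   % true if no letter appears

ok1 = isValid('[1 2 3]');              % plain numbers: accepted
ok2 = isValid('[sqrt(2) sin(pi/3)]');  % function calls: rejected
ok3 = isValid('path('''')');           % the path('') example: rejected
ok4 = isValid('[1 2 3 varname(1)]');   % variable access: rejected
ok5 = isValid('1e3');                  % a valid number, but rejected too
```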

Allowing some functions or variables

Suppose you want to allow the user to access some variables or functions, maybe simple trig functions or pi. Then it's no longer so simple. I've been playing around with an expression like the one below:

expr = '(?!cos\(|pi|sin\()[a-zA-Z]+';

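But it doesn't quite do what is expected: it matches the `in` of `sin(`, because the lookahead rejects a match starting at `s` but not one starting a character later at `i`. If you know regex better than I do, you can massage it to work properly.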

Otherwise, an alternative would be to do this:

isempty(regexp(regexprep(x,'(sin\(|cos\(|pi|x)',''),expr))

so that the allowed functions and variables are removed before the check.
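One way to make the lookahead attempt behave is to let a match start only at the beginning of a run of letters, using a negative lookbehind, so the engine cannot restart inside `sin`. This is a sketch, not the answerer's expression: it assumes the whitelist is `sin(`, `cos(` and a standalone `pi`, and it flags every other run of letters.

```matlab
% (?<![a-zA-Z])  : start only at the beginning of a run of letters
% (?!...)        : skip whitelisted runs: 'sin(', 'cos(', or 'pi' not followed by a letter
% [a-zA-Z]+      : any other run of letters is reported as a match
expr = '(?<![a-zA-Z])(?!(?:sin|cos)\(|pi(?![a-zA-Z]))[a-zA-Z]+';
isValid = @(s) isempty(regexp(s, expr, 'once'));
```

Like the simplest check, it still rejects exponents such as `1e3`.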

Hope this helps.

Update: allowing imaginary/exponent values, and paths

The new expression to match becomes

expr = '[iIeE][a-zA-Z]+';

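This pattern only matches an `i`, `I`, `e` or `E` that is followed by more letters, so imaginary units and exponents such as `2i` or `34e12` are accepted (you can augment the character class as you see fit). Note that it does not match letter runs starting with any other character, such as `path`. You can also impose a two-character limit with the quantifier `{2,}`, though people could still use one-character anonymous functions.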

The check on the input then becomes:

isempty(regexp(regexprep(x,'(sin\(|cos\(|pi|x|''(.*?)'')',''),expr))

Now custom functions (which you can keep in an array and join together with `|`) and quoted paths are excluded from the check.
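The joining can be done with `strjoin`. A sketch, assuming the whitelist is kept as a cell array of regular-expression fragments (`allowed` is a hypothetical name); the quoted-string fragment from the expression above is appended so that text inside quotes is removed too:

```matlab
allowed = {'sin\(', 'cos\(', 'pi'};                   % hypothetical whitelist
fragments = [allowed, {'''(.*?)'''}];                 % plus anything in single quotes
removePattern = ['(' strjoin(fragments, '|') ')'];    % (sin\(|cos\(|pi|'(.*?)')
stripped = regexprep('[cos(5), sin(3+2i)]', removePattern, '');
```

`stripped` is then what gets tested against `expr`.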

Here are a few examples tested along with the usual:

Passes

'[1+2i, 34e12]'
'''this is a path'''
'[cos(5), sin(3+2i)]'

Fails

'[1+2ii, 34e12]'
'this is not a path'
'''this is a path'' this is not'
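Luis Mendo's criterion from the comments (forbid two adjacent letters, but only outside quoted text) can be checked with the single detection regex he gives there. This sketch wraps it in a test function; it assumes the input never uses the transpose operator `'`, since the pattern treats every single quote as a string delimiter.

```matlab
% ^[^']*           : text before the first quote
% ('[^']*'[^']*)*  : complete quoted strings, each with the text that follows it
% [a-zA-Z]{2}      : two adjacent letters outside any quoted string
expr = '^[^'']*(''[^'']*''[^'']*)*[a-zA-Z]{2}';
isValid = @(s) isempty(regexp(s, expr, 'once'));
```

Unlike the whitelist approach, this rejects every function name of two or more letters, including `cos` and `sin`.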
pragmatist1
  • Thanks. You should allow characters `e`, `E`, `i`, `j`, so that numbers such as `-1.2e-3j` are accepted. I think a good criterion to screen functions could be to forbid _two_ alphabetic characters together. Also, things like `'Fear is the path to the dark side'` should be allowed too (`path` appears but is within a string) – Luis Mendo Oct 15 '15 at 06:28
  • Ah yes, I don't know why that didn't cross my mind. I will update my answer accordingly. – pragmatist1 Oct 15 '15 at 12:35
  • OK, I updated it. Let me know if it needs changing. I'm no regex guru, but hopefully this does the job! – pragmatist1 Oct 15 '15 at 16:33
  • Actually I don't want to remove the bad parts from the string, just to detect if the string contains at least one such bad part (and reject the whole string in that case). So I think something like `regexp(x, '^[^'']*(''[^'']*''[^'']*)*[a-zA-Z]{2}')` works (gives non-empty if the string has two letters together outside of a string). I'll have to do more thorough testing, but it's promising. – Luis Mendo Oct 15 '15 at 23:54
  • Yes, my use of regexprep was not to sanitize the string, but to remove potentially allowed functions/variables before running the general expression against it. In either case, hopefully it works out for you! – pragmatist1 Oct 16 '15 at 00:20