SQLite Forum

Efficiently checking if a TEXT column exclusively contains integers or floating point numbers (as strings)
Login
I basically use the equivalent of a regex to find out if a whup of text is an integer or real.

```
**  integer             if the field data looks like an integer,
**                      (regex "^ *(\+|-)?\d+ *$"),
**                      then an integer will be returned as
**                      provided by the compiler and platform
**                      runtime strtoll function
**                      otherwise the field will be processed as
**                      text as defined above
**
**  real                if the field data looks like a number,
**                      (regex "^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$")
**                      then a double will be returned as
**                      provided by the compiler and platform
**                      runtime strtold function otherwise the
**                      field will be processed as text as
**                      defined above
**
```

There is a regex extension in the standard extensions place in the SQLite3 repository that can do this, but I did it by intelli-force in the following C code:

```
/*
**
** Determine affinity of field
**
** ignore leading space
** then may have + or -
** then may have digits or . (if . found then type=real)
** then may have digits (if another . then not number)
** then may have e (if found then type=real)
** then may have + or -
** then may have digits
** then may have trailing space
*/
static int vsv_isValidNumber(char *arg)
{
    char *start;
    char *stop;
    int isValid = 0;
    int hasDigit = 0;

    start = arg;
    stop = arg + strlen(arg) - 1;
    while (start <= stop && *start==' ')                // strip spaces from begining
    {
        start++;
    }
    while (start <= stop && *stop==' ')                 // strip spaces from end
    {
        stop--;
    }
    if (start > stop)
    {
        goto vsv_end_isValidNumber;
    }
    if (start <= stop && (*start=='+' || *start=='-'))  // may have + or -
    {
        start++;
    }
    if (start <= stop && isdigit(*start))               // must have a digit to be valid
    {
        hasDigit = 1;
        isValid = 1;
    }
    while (start <= stop && isdigit(*start))            // bunch of digits
    {
        start++;
    }
    if (start <= stop && *start=='.')                   // may have .
    {
        isValid = 2;
        start++;
    }
    if (start <= stop && isdigit(*start))
    {
        hasDigit = 1;
    }
    while (start <= stop && isdigit(*start))            // bunch of digits
    {
        start++;
    }
    if (!hasDigit)                                      // no digits then invalid
    {
        isValid = 0;
        goto vsv_end_isValidNumber;
    }
    if (start <= stop && (*start=='e' || *start=='E'))  // may have 'e' or 'E'
    {
        isValid = 3;
        start++;
    }
    if (start <= stop && isValid == 3 && (*start == '+' || *start == '-'))
    {
        start++;
    }
    if (start <= stop && isValid == 3 && isdigit(*start))
    {
        isValid = 2;
    }
    while (start <= stop && isdigit(*start))            // bunch of digits
    {
        start++;
    }
    if (isValid == 3)
    {
        isValid = 0;
    }
vsv_end_isValidNumber:
    if (start <= stop)
    {
        isValid = 0;
    }
    return isValid;
}
```

This will return 0, 1, or 2.  0 means not NUMERIC, 1 means matches the integer regex above, and 2 means matches the float regex above.  Whether the number is actually an integer or a real must be determined "by other means" because the number 1.1e2 "looks like" a float but is really just the integer 110.  (So really, if the result is NOT 0 then the affinity is NUMERIC).