Skip to main content

Overflow While Parsing IPv4 Addresses (Part 3)

I went for a hike in the mountains this afternoon, and I realized that a comment in my ip overflow bug fix was inaccurate. I wrote:

The digitToInt function is avoided because it supports more than ASCII digits and is therefore slower than ord d - 48.

One should indeed avoid digitToInt, but I misremembered the details. It only supports ASCII digits, but it supports hex digits and therefore does various range checking. I checked the source to confirm and realized that there is room for further improvement!

isDigit :: Char -> Bool
isDigit c = (fromIntegral (ord c - ord '0') :: Word) <= 9
digitToInt :: Char -> Int
digitToInt c
  | (fromIntegral dec::Word) <= 9 = dec
  | (fromIntegral hexl::Word) <= 5 = hexl + 10
  | (fromIntegral hexu::Word) <= 5 = hexu + 10
  | otherwise = errorWithoutStackTrace ("Char.digitToInt: not a digit " ++ show c) -- sigh
  where
    dec = ord c - ord '0'
    hexl = ord c - ord 'a'
    hexu = ord c - ord 'A'

One should follow Alexis King’s advice: “Parse, don’t validate!” For example, here is a function that parses a decimal digit, using a different function name primarily to distinguish it from the function above. (A function that parses hex digits could be named maybeHexDigitToInt.)

maybeDigitToInt :: Char -> Maybe Int
maybeDigitToInt c =
    let d = Char.ord c - 48
    in  if d >= 0 && d <= 9 then Just d else Nothing

This function checks that the character is a digit and coverts it to an Int. This function is also total, while the digitToInt function is not total because it can throw an error.

As for the ip overflow bug fix, I updated the readOctet function to the following:

readOctet :: TextRead.Reader Word
readOctet t = do
  let (digits, rest) = Text.span Char.isDigit t
      go d f n =
        let d' = Char.ord d - 48
            n' = n * 10 + d'
        in  if d' >= 0 && d' <=9 && n' <= 255 then f n' else Nothing
  when (Text.null digits) $ Left "octet does not start with a digit"
  case Text.foldr go Just digits 0 of
    Just n  -> Right (fromIntegral n, rest)
    Nothing -> Left ipOctetSizeErrorMsg

(Note: This code is written to match the style of the ip source.)

The relevant comment now reads:

The isDigit and digitToInt functions are avoided in order to avoiding checking the range more than once. This implementation calls ord (once) and uses the result for both the range check and the calculation.