Overflow While Parsing IPv4 Addresses (Part 3)
I went for a hike in the mountains this afternoon, and I realized that a comment in my ip overflow bug fix was inaccurate. I wrote:
The
digitToInt
function is avoided because it supports more than ASCII digits and is therefore slower thanord d - 48
.
One should indeed avoid digitToInt, but I misremembered the details. It only supports ASCII digits, but it supports hex digits and therefore does various range checking. I checked the source to confirm and realized that there is room for further improvement!
isDigit :: Char -> Bool
isDigit c = (fromIntegral (ord c - ord '0') :: Word) <= 9
digitToInt :: Char -> Int
digitToInt c
| (fromIntegral dec::Word) <= 9 = dec
| (fromIntegral hexl::Word) <= 5 = hexl + 10
| (fromIntegral hexu::Word) <= 5 = hexu + 10
| otherwise = errorWithoutStackTrace ("Char.digitToInt: not a digit " ++ show c) -- sigh
where
= ord c - ord '0'
dec = ord c - ord 'a'
hexl = ord c - ord 'A' hexu
One should follow Alexis King’s advice: “Parse,
don’t validate!” For example, here is a function that parses a
decimal digit, using a different function name primarily to distinguish
it from the function above. (A function that parses hex digits could be
named maybeHexDigitToInt
.)
maybeDigitToInt :: Char -> Maybe Int
=
maybeDigitToInt c let d = Char.ord c - 48
in if d >= 0 && d <= 9 then Just d else Nothing
This function checks that the character is a digit and coverts it to
an Int
. This function is also total, while the digitToInt
function is not total because it can throw an error.
As for the ip overflow
bug fix, I
updated the readOctet
function to the following:
readOctet :: TextRead.Reader Word
= do
readOctet t let (digits, rest) = Text.span Char.isDigit t
=
go d f n let d' = Char.ord d - 48
= n * 10 + d'
n' in if d' >= 0 && d' <=9 && n' <= 255 then f n' else Nothing
$ Left "octet does not start with a digit"
when (Text.null digits) case Text.foldr go Just digits 0 of
Just n -> Right (fromIntegral n, rest)
Nothing -> Left ipOctetSizeErrorMsg
(Note: This code is written to match the style of the ip source.)
The relevant comment now reads:
The
isDigit
anddigitToInt
functions are avoided in order to avoiding checking the range more than once. This implementation callsord
(once) and uses the result for both the range check and the calculation.