Collapse SQL Whitespace (Part 2)
In the Collapse
SQL Whitespace blog entry, I presented my implementation of
sqlCollapseWhitespace, a function that collapses whitespace
and removes comments from PostgreSQL queries. My motivation
is to take advantage of Linus’s law and
get feedback. I generally try to avoid parsing SQL because there are so
many edge cases, as well as functionality that I am (blissfully?)
unaware of. This blog entry presents an updated version of the function
that supports formatting of statements that include semicolons.
I learned that some PostgreSQL statements make use of the semicolon,
so a semicolon does not necessarily mark the end of a
statement. For example, a CREATE RULE
statement may be defined with multiple commands that are separated by
semicolons. It is interesting that dollar-quoted strings are not used
for this.
CREATE [ OR REPLACE ] RULE name AS ON event
TO table_name [ WHERE condition ]
DO [ ALSO | INSTEAD ] { NOTHING | command | ( command ; command ... ) }
where event can be one of:
SELECT | INSERT | UPDATE | DELETE
My previous implementation collapsed whitespace after such a semicolon delimiter to a newline. The result is still valid, but it does not meet my goal of only using newlines in between statements.
Thanks go to Marcelo Zabani, the author of Codd.
This issue is easily resolved by parsing parenthesis and keeping track of the number of open parenthesis. A semicolon ends a statement only when there are no open parenthesis. The updated function is included below, and the full source is available on GitHub.
Code
sqlCollapseWhitespace :: String -> String
sqlCollapseWhitespace = goSpace 0 Nothing
where
dropStdComment :: String -> String
dropStdComment = dropWhile (/= '\n')
dropCStyleComment :: Int -> String -> String
dropCStyleComment !count = \case
('*':'/':xs)
| count < 1 -> xs
| otherwise -> dropCStyleComment (count - 1) xs
('/':'*':xs) -> dropCStyleComment (count + 1) xs
(_x:xs) -> dropCStyleComment count xs
[] -> []
goSpace :: Int -> Maybe Char -> String -> String
goSpace !numParens mC xs = case dropWhile isSpace xs of
('-':'-':ys) -> goSpace numParens mC $ dropStdComment ys
('/':'*':ys) -> goSpace numParens mC $ dropCStyleComment 0 ys
[] -> []
ys -> case mC of
Just c -> c : goSql numParens ys
Nothing -> goSql numParens ys
goSql :: Int -> String -> String
goSql !numParens = \case
('"':xs) -> '"' : goIdentDQ numParens xs
('\'':xs) -> '\'' : goStringSQ numParens xs
('$':xs) -> '$' : goStringDSBegin numParens "" xs
('(':xs) -> '(' : goSql (numParens + 1) xs
(')':xs) -> ')' : goSql (max 0 $ numParens - 1) xs
(';':xs)
| numParens > 0 -> ';' : goSql numParens xs
| otherwise -> ';' : goSpace numParens (Just '\n') xs
('-':'-':xs) -> goSpace numParens (Just ' ') $ dropStdComment xs
('/':'*':xs) -> goSpace numParens (Just ' ') $ dropCStyleComment 0 xs
('E':'\'':xs) -> 'E' : '\'' : goStringE numParens xs
(x:xs)
| isSpace x -> goSpace numParens (Just ' ') xs
| otherwise -> x : goSql numParens xs
[] -> []
goIdentDQ :: Int -> String -> String
goIdentDQ !numParens = \case
('"':'"':xs) -> '"' : '"' : goIdentDQ numParens xs
('"':xs) -> '"' : goSql numParens xs
(x:xs) -> x : goIdentDQ numParens xs
[] -> []
goStringSQ :: Int -> String -> String
goStringSQ !numParens = \case
('\'':'\'':xs) -> '\'' : '\'' : goStringSQ numParens xs
('\'':xs) -> goStringSQEnd numParens False xs
(x:xs) -> x : goStringSQ numParens xs
[] -> []
goStringSQEnd :: Int -> Bool -> String -> String
goStringSQEnd !numParens hasSpace = \case
('\n':xs) -> goStringSQEndNewline numParens xs
('-':'-':xs) ->
-- avoid unused @hasSpace@
goStringSQEndNewline numParens $ dropStdComment xs
('/':'*':xs) -> goStringSQEnd numParens True $ dropCStyleComment 0 xs
xs@(y:ys)
| isSpace y -> goStringSQEnd numParens True ys
| hasSpace -> '\'' : ' ' : goSql numParens xs
| otherwise -> '\'' : goSql numParens xs
[] -> "'"
goStringSQEndNewline :: Int -> String -> String
goStringSQEndNewline !numParens xs = case dropWhile isSpace xs of
('\'':ys) -> goStringSQ numParens ys
('-':'-':ys) -> goStringSQEndNewline numParens $ dropStdComment ys
('/':'*':ys) -> goStringSQEndNewline numParens $ dropCStyleComment 0 ys
[] -> "'"
ys -> '\'' : ' ' : goSql numParens ys
goStringE :: Int -> String -> String
goStringE !numParens = \case
('\\':x:xs) -> '\\' : x : goStringE numParens xs
('\'':'\'':xs) -> '\'' : '\'' : goStringE numParens xs
('\'':xs) -> '\'' : goSql numParens xs
(x:xs) -> x : goStringE numParens xs
[] -> []
goStringDSBegin :: Int -> String -> String -> String
goStringDSBegin !numParens acc = \case
('$':xs) -> '$' : goStringDS numParens (reverse acc) xs
(x:xs) -> x : goStringDSBegin numParens (x : acc) xs
[] -> []
goStringDS :: Int -> String -> String -> String
goStringDS !numParens tag = \case
('$':xs) -> '$' : goStringDSEnd numParens tag (tag, xs)
(x:xs) -> x : goStringDS numParens tag xs
[] -> []
goStringDSEnd :: Int -> String -> (String, String) -> String
goStringDSEnd !numParens tag = \case
(t:ts, x:xs)
| x == t -> x : goStringDSEnd numParens tag (ts, xs)
| x == '$' -> x : goStringDSEnd numParens tag (tag, xs)
| otherwise -> x : goStringDS numParens tag xs
([], '$':xs) -> '$' : goSql numParens xs
(_ts, xs) -> goStringDS numParens tag xs