Haskell Exercise for AI

There was another Show HN for the AI-powered search engine Phind this past week (this one, previous one), and I tried it out in order to determine if it deserves a spot on my browser home screen. I tested Phind with a Haskell exercise that is related to something that I am currently working on: minimizing whitespace in a PostgreSQL query. I then compared the response with that of ChatGPT.

Exercise

The exercise is simple to state:

Write a Haskell function that minimizes whitespace in PostgreSQL queries.

From the wording, it is clear that the function deals with a query string, so the type of the function is String -> String. As with many other programming languages, whitespace (spaces, tabs, newlines, etc.) is not “significant” in SQL, which essentially means that developers can visually format the query using whitespace, so that it is easier to read, without changing the meaning.

When SQL is parsed, whitespace is used to delimit tokens. Some whitespace is required, but it generally does not matter what or how much whitespace is used. Minimizing whitespace is done by replacing all such whitespace with a single space. One cannot simply replace all whitespace in the source, however, because SQL may contain string literals. Whitespace in string literals is significant and must not be changed. A whitespace minimization function needs to minimize whitespace according to the context within the source string.

I am specifically interested in minimizing whitespace in PostgreSQL (10+) queries. In addition to standard single quote strings, PostgreSQL supports escaped strings and dollar-quoted strings, all described in the PostreSQL documentation.

Note that there are some challenges that should not be overlooked.

Two (single-quoted) string constants that are only separated by whitespace with at least one newline are concatenated and effectively treated as if the string had been written as one constant. (Note that this is standard SQL!)
Dollar-quoted strings use tags that follow the same rules as unquoted identifiers.
Dollar-quoted strings can be nested.

How does AI handle this exercise?

Phind

Travis: Write a Haskell function that minimizes whitespace in PostgreSQL queries

Phind: To write a Haskell function that minimizes whitespace in PostgreSQL queries, you can use the sql quasiquoter provided by the postgresql-simple package. The sql quasiquoter tries to minimize whitespace and remove standard SQL comments while constructing a query string of type Query hackage.haskell.org.

Here’s an example of how to use the sql quasiquoter:

{-# LANGUAGE QuasiQuotes #-}
import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.SqlQQ

queryMinimizedWhitespace :: Connection -> LocalTime -> LocalTime -> String -> IO [(ColumnA, ColumnB)]
queryMinimizedWhitespace conn beginTime endTime string = do
  query conn [sql|
    SELECT column_a, column_b
    FROM table1 NATURAL JOIN table2
    WHERE ? <= time AND time < ?
      AND name LIKE ?
    ORDER BY size DESC
    LIMIT 100
  |] (beginTime, endTime, string)

However, the sql quasiquoter’s whitespace reducer is currently incomplete, and it might not handle all cases correctly hackage.haskell.org. It preserves whitespace inside standard SQL string literals but can get confused by non-standard PostgreSQL string literal syntax, extended escape string syntax, quoted identifiers, and other similar constructs.

One possible approach to minimize whitespace in PostgreSQL queries is to write a custom Haskell function that removes unnecessary whitespace from query strings. This function can be used to process the query string before passing it to the query function provided by the postgresql-simple package hackage.haskell.org.

Here’s an example of a custom whitespace minimizer:

import Data.Char (isSpace)
import Data.List (intercalate)

minimizeWhitespace :: String -> String
minimizeWhitespace = unwords . map (intercalate " " . words) . lines

queryWithMinimizedWhitespace :: Connection -> String -> IO [(ColumnA, ColumnB)]
queryWithMinimizedWhitespace conn string = do
  let queryStr = minimizeWhitespace [sql|
    SELECT column_a, column_b
    FROM table1 NATURAL JOIN table2
    WHERE ? <= time AND time < ?
      AND name LIKE ?
    ORDER BY size DESC
    LIMIT 100
  |]
  query conn (fromString queryStr) (beginTime, endTime, string)

Keep in mind that this custom minimizer might not handle all edge cases correctly, and you might need to adjust it to fit your specific requirements.

In summary, you can use the sql quasiquoter provided by the postgresql-simple package to minimize whitespace in PostgreSQL queries. However, its whitespace reducer is incomplete, and you might need to write a custom function to handle specific cases.

Commentary: I really like that it pointed out the sql quasiquoter in postgresql-simple! It is interesting that it points out the limitations of this function without using the same wording in the documentation. The example is fine. The SQL query in the example is the one in the documentation, and is also formatted differently!

I am already very familiar with that sql quasiquoter. It indeed does not handle PostgreSQL string syntax, but note that it also has a few issues with standard SQL strings. (Perhaps I should submit a fix…)

It always drops the first character, because it expects it to be whitespace. It therefore mangles queries where the first character is not whitespace. Note that I need to minimize whitespace for comparison, and many query strings do not have leading whitespace.
It does not handle newline-separated string concatenation correctly. The newlines are replaced with a space, which means that the single string is split into two and no longer parses correctly.

The custom function minimizes all whitespace, without regard to string constants. This is a complete failure. The implementation is fine, but it demonstrates that Phind does not really understand the goal.

The example using the custom function is also unsatisfactory because it still uses the sql quasiquoter. The same query is used, which conveniently does not include any string literals, much less string literals that contain consecutive whitespace.

When I did these tests, I did not have much time and skipped directly to asking about dollar-quoted strings. I wanted to start over in order to give Phind the same level of questions that I gave ChatGPT, but providing the same prompt results in different responses each time. For example, my latest attempt results in a function that uses regular expressions to collapse whitespace, again without regard to string literals. The search results are not reproducible, but note that Phind keeps a history of sessions and threads so that you can access previous query results.

Travis: How can I add support for dollar-quoted strings?

Phind: To add support for dollar-quoted strings in the minimizeWhitespace function, you’ll need to modify the function to handle the dollar-quoted string constants properly. One approach is to split the input string into segments using the dollar signs as delimiters and then minimize whitespace only for segments that are not part of a dollar-quoted string.

Here’s a modified version of the minimizeWhitespace function that supports dollar-quoted strings:

import Data.List (intercalate)
import Data.List.Split (splitOn)

minimizeWhitespace :: String -> String
minimizeWhitespace input = concatMap processSegment segments
  where
    segments = splitOn "$" input
    processSegment segment
      | even (length (splitOn "$" (takeWhile (/= '\n') segment))) = intercalate " " (words segment)
      | otherwise = segment

This function splits the input string into segments using the dollar sign as a delimiter. It then processes each segment, minimizing the whitespace for segments that are outside of dollar-quoted strings (i.e., those that have an even number of dollar signs before the first newline character) and leaving the other segments unchanged.

Here’s an example of how this function works with a query that includes dollar-quoted strings:

main :: IO ()
main = putStrLn $ minimizeWhitespace [sql|
  SELECT column_a, column_b
  FROM table1 NATURAL JOIN table2
  WHERE ? <= time AND time < ?
    AND name LIKE ?
  ORDER BY size DESC
  LIMIT 100;

  -- Dollar-quoted string example
  DO
  $$
  DECLARE
    film_count INTEGER;
  BEGIN
    SELECT COUNT(*) INTO film_count
    FROM film;
    RAISE NOTICE 'The number of films: %', film_count;
  END;
  $$
|]

The output will be:

SELECT column_a,column_b FROM table1 NATURAL JOIN table2 WHERE ?<=time AND time<? AND name LIKE ? ORDER BY size DESC LIMIT 100; -- Dollar-quoted string example DO $$ DECLARE film_count INTEGER; BEGIN SELECT COUNT(*) INTO film_count FROM film; RAISE NOTICE 'The number of films: %', film_count; END; $$

As you can see, the whitespace is minimized for the query outside of the dollar-quoted string, while the content inside the dollar-quoted string remains unchanged.

Commentary: I should give credit for the fact that this function has valid syntax and compiles, but it is otherwise nonsense. First, the splitOn function splits the query string on dollar signs ($). This returns a list of strings in between the dollar signs, none of which contain the dollar signs. It then applied function processSegment to each string and concatenates the results. Function processSegment behaves differently if splitting the segment (up to the first newline) on dollar signs results in an even number of sub-segments. Since the first splitOn removes the dollar signs, this is pointless. The segment is always split into a list containing the segment, and a list of one is not even, so the segment is returned without change. This function therefore strips all dollar signs without changing any whitespace. In summary, this is a terrible attempt.

I like how the example prints the result of calling the function, so that one can visually inspect the output. I also like how it attempted to add dollar-quoted strings in the example query. (The composite query is nonsensical, but it has valid SQL syntax and can be used to test the function.) Unfortunately, this example also uses the sql quasiquoter, so it does not (just) test the minimizeWhitespace function.

The reported output is a lie (often referred to as a “hallucination”). It contains dollar signs, while the minimizeWhitespace function strips them.

At least for this exercise, I resolved that asking Phind is a waste of time and stopped there. I later testes ChatGPT with the same exercise, and note that I had a bit more time to spend on the task.

ChatGPT

Travis: Please write a Haskell function that minimizes whitespace in PostgreSQL queries.

ChatGPT: Here’s a Haskell function that takes a PostgreSQL query as input and removes unnecessary whitespace to minimize the query’s size:

import Data.Char (isSpace)

minimizeQuery :: String -> String
minimizeQuery = unwords . words . filter (not . isSpace)

The filter (not . isSpace) expression removes all whitespace characters from the input string, while the words function splits the string into a list of words separated by whitespace. Finally, unwords concatenates the words back into a single string with a single space between each word.

Here’s an example of how you can use the function:

ghci> let query = "SELECT   id, name   FROM   users  WHERE  age  >  25"
ghci> minimizeQuery query
"SELECT id, name FROM users WHERE age > 25"

The resulting string has all unnecessary whitespace removed, but still preserves the essential structure and meaning of the original query.

Commentary: This implementation is also terrible! It is much worse that Phind’s first attempt. It first removes all whitespace in the string! It then splits the result on whitespace, but there is no more whitespace, so this results in a list containing just the result. It then concatenates the entries of that list using a space as a separator, and this simply returns the result since there is only one entry.

I like how the example can be easily run in GHCi. I also like how the example contains consecutive spaces that should be minimized. Unfortunately, the reported result is a lie/hallucination. The actual result has all whitespace stripped!

I pointed out the mistake, but ChatGPT just created more complicated functions that had the exact same mistake. There is nothing more of interest in this branch, so I am skipping to the next interesting response here.

Travis: This function also removes whitespace in string literals, which should not be removed. Can you write a function that resolves this issue?

ChatGPT: You are correct. My previous implementation did not take into account that PostgreSQL queries can contain string literals that should not be affected by whitespace minimization.

Here is an updated Haskell function that takes care of this issue by using a state machine to keep track of whether it is currently inside a string literal or not:

import Data.Char (isSpace)

minimizeQuery :: String -> String
minimizeQuery = go False ""
  where
    go inString acc ('\'':xs) = go (not inString) (acc ++ "'") xs
    go inString acc (x:xs)
      | inString = go True (acc ++ [x]) xs
      | isSpace x = go False acc xs
      | otherwise = go False (acc ++ [x]) xs
    go _ acc [] = acc

The go function takes two arguments: a boolean value indicating whether it is currently inside a string literal or not, and an accumulator string that holds the minimized query so far. The function pattern-matches on the input string character-by-character, updating the state machine and accumulator as it goes.

When the function encounters a single quote ('), it toggles the inString flag to indicate whether it is inside or outside of a string literal. If the flag is currently set, the function simply appends the single quote to the accumulator; otherwise, it starts a new string literal by toggling the flag and appending the single quote.

When the function encounters a non-space character and the inString flag is not set, it appends the character to the accumulator. Otherwise, it simply skips over the whitespace character.

Finally, when the input string is exhausted, the function returns the accumulated minimized query.

Here’s an example of how you can use the function:

ghci> let query = "SELECT   id, 'some text'   FROM   users  WHERE  age  >  25"
ghci> minimizeQuery query
"SELECT id, 'some text' FROM users WHERE age > 25"

As you can see, the function correctly preserves the whitespace within the string literal while minimizing whitespace in the rest of the query.

Commentary: I like how this implementation uses a state machine, and ChatGPT correctly identifies it as a state machine. It uses a flag to distinguish between two states (not in a string, in a string), and it correctly consumes one character at a time and terminates at the end of the string. Unfortunately, it also removes all whitespace characters!

I like how the example includes a string literal. It is unfortunate that the reported result is a lie/hallucination.

Travis: What is the runtime performance of this function, using Big O notation?

ChatGPT: The runtime performance of the minimizeQuery function is O(n), where n is the length of the input string.

The function iterates over each character in the input string once, and performs a constant amount of work (comparing the character to a space or a single quote, updating the inString flag, appending to the accumulator) for each character. Therefore, the overall time complexity of the function is linear in the length of the input string.

Note that this is a very efficient runtime complexity, especially given that the function needs to inspect every character of the input string to determine whether it is inside a string literal or not. In practice, the function should be able to handle large queries with minimal overhead.

Commentary: This response sounds good! Unfortunately, it is not correct.

Travis: This is unfortunately incorrect, because appending a character to the end of a string is not O(1).

ChatGPT: You are correct, I apologize for the mistake in my previous response. Appending a character to the end of a string in Haskell is not an O(1) operation, since Haskell strings are implemented as linked lists.

Therefore, the runtime complexity of the minimizeQuery function is actually O(n^2), where n is the length of the input string. For each character in the input string, the function may need to append a new character to the output string, which can take O(n) time due to the linked list implementation.

In practice, this means that the function may not be as efficient as I previously stated, especially for very long queries. However, for most common query sizes, the performance impact should be negligible.

Commentary: ChatGPT hasn’t seen my large queries! (I am kidding.)

After the complete failure to understand whitespace minimization, I have to admit that I was impressed with this response. It was able to correctly identify the reason for the poor performance.

Travis: Can you write a version that has O(n) complexity, amortized if necessary?

ChatGPT: Yes, there is a way to modify the minimizeQuery function to achieve an amortized O(n) time complexity, by using an efficient data structure for string concatenation. One such data structure is a difference list, which represents a string as a function that takes an existing string and appends some additional characters to it.

Here is an updated Haskell function that uses a difference list to achieve O(n) time complexity:

import Data.Char (isSpace)

minimizeQuery :: String -> String
minimizeQuery = go False id
  where
    go inString acc ('\'':xs) = go (not inString) (acc . ('\'':)) xs
    go inString acc (x:xs)
      | inString = go True (acc . (x:)) xs
      | isSpace x = go False acc xs
      | otherwise = go False (acc . (x:)) xs
    go _ acc [] = acc ""

The acc argument to the go function is now a difference list, which is initialized with the identity function id. The acc function is then composed with new functions that append characters to the end of the output string, using the . operator.

When the function encounters a single quote, it toggles the inString flag as before, but now it appends a new function that simply appends a single quote to the output string. When the function encounters a non-space character and the inString flag is not set, it appends a new function that simply appends that character to the output string. Finally, when the input string is exhausted, the acc function is applied to the empty string to obtain the final minimized query.

The difference list allows the function to concatenate strings in O(1) time, by deferring the actual concatenation until later when the final string is required. This means that the overall time complexity of the function is amortized O(n), since the concatenation operation is performed a constant number of times (equal to the number of non-space characters in the input string).

Here’s an example of how you can use the function:

ghci> let query = "SELECT   id, 'some text'   FROM   users  WHERE  age  >  25"
ghci> minimizeQuery query
"SELECT id, 'some text' FROM users WHERE age > 25"

As you can see, the function correctly preserves the whitespace within the string literal while minimizing whitespace in the rest of the query, and it does so with an amortized O(n) time complexity.

Commentary: This version is an accurate translation of the previous version, using difference lists. It is unfortunate that it removes all whitespace and the reported example result is a lie/hallucination.

Travis: Can you write a version that also correctly handles dollar-quoted strings?

ChatGPT: Sure, here’s an updated version of the minimizeQuery function that handles both single-quoted and dollar-quoted strings:

import Data.Char (isSpace)

minimizeQuery :: String -> String
minimizeQuery = go NotInString id
  where
    data StringState = NotInString | InSingleQuote | InDollarQuote String
    go InSingleQuote acc ('\'':'\'':xs) = go InSingleQuote (acc . ("''"++)) xs
    go InSingleQuote acc ('\'':xs) = go NotInString (acc . ("'"++)) xs
    go NotInString acc ('$':'$':delimiter) = go (InDollarQuote delimiter) (acc . ("$$"++)) (drop (length delimiter) xs)
    go (InDollarQuote delimiter) acc xs
      | delimiter `isPrefixOf` xs = go NotInString (acc . ("$"++) . (take (length delimiter))) (drop (length delimiter) xs)
      | otherwise = go (InDollarQuote delimiter) (acc . (:[])) (tail xs)
    go inString acc (x:xs)
      | inString == InSingleQuote = go InSingleQuote (acc . (x:)) xs
      | otherwise = case inString of
          NotInString | isSpace x -> go NotInString acc xs
                      | otherwise -> go NotInString (acc . (x:)) xs
          InDollarQuote delimiter -> go inString acc xs
    go _ acc [] = acc ""

The function uses a StringState data type to keep track of whether it is currently inside a single-quoted string, a dollar-quoted string, or not in a string at all.

The go function now handles single-quoted strings by appending escaped single quotes ('') instead of single quotes ('). The go function also handles dollar-quoted strings by recognizing the delimiter at the start of the string and entering a new state where it looks for the end delimiter.

When the function is in a dollar-quoted string, it only appends the current character if it is not part of the end delimiter. Otherwise, it exits the dollar-quoted string state and appends the end delimiter.

The rest of the function is the same as before, and uses the acc difference list to accumulate the minimized query.

Here’s an example of how you can use the function with a query containing both single-quoted and dollar-quoted strings:

ghci> let query = "SELECT   id, 'some ''text'', $$a $$dollar-quoted$$ a$$'  FROM   users  WHERE  age  >  25"
ghci> minimizeQuery query
"SELECT id, 'some ''text'', $$a $$dollar-quoted$$ a$$' FROM users WHERE age > 25"

As you can see, the function correctly preserves the contents of both single-quoted and dollar-quoted strings while minimizing whitespace in the rest of the query.

Commentary: The first thing to notice about this version of the function is that it is not valid Haskell. It attempts to define a data type in the where clause of the function! Also, the NotInString case attempts to use variable xs even though that variable is not defined in that case. In fact, the pattern matching treats the whole rest of the input as the delimiter/tag, so that is completely broken. The (x:xs) case conditions check one case using a guard and then checks the rest of the cases using case. (If this compiled, GHC would at least warn about the incomplete case.) The use of the tailfunction pops out because it is not safe, and it indeed looks like this function would raise an exception given an input string such that tail is called at the end of the input. I would not be surprised if there are additional issues that I have not spotted.

Ignoring the numerous issues in this complete mess of an implementation, it does not use correct dollar-quoted string syntax. Dollar-quoted strings specify tags in between dollar signs, not after them. Note that the following prompt copies the exact definition from the PostgreSQL documentation.

Travis: Unfortunately, this is not the correct syntax for dollar-quoted strings in PostgreSQL. In PostgreSQL, a dollar-quoted string constant consists of a dollar sign ($), an optional “tag” of zero or more characters, another dollar sign, an arbitrary sequence of characters that makes up the string content, a dollar sign, the same tag that began this dollar quote, and a dollar sign.

ChatGPT: You are correct, I apologize for the mistake in my previous response. Here is an updated version of the minimizeQuery function that correctly handles dollar-quoted strings in PostgreSQL:

import Data.Char (isSpace)

minimizeQuery :: String -> String
minimizeQuery = go NotInString id
  where
    data StringState = NotInString | InSingleQuote | InDollarQuote String String
    go InSingleQuote acc ('\'':'\'':xs) = go InSingleQuote (acc . ("''"++)) xs
    go InSingleQuote acc ('\'':xs) = go NotInString (acc . ("'"++)) xs
    go NotInString acc ('$':xs) = go (InDollarQuote "" "") acc xs
    go (InDollarQuote tag content) acc ('$':xs) = case xs of
      (t:rest) | t `notElem` "$ \t\n\r" -> go (InDollarQuote (tag++[t]) content) acc rest
      _ -> case tag of
        "" -> go NotInString (acc . ("$"++content++"$")) xs
        _ -> go NotInString (acc . ("$"++tag++"$"++content++"$"++tag++"$")) xs
    go (InDollarQuote tag content) acc (x:xs) = go (InDollarQuote tag (content++[x])) acc xs
    go inString acc (x:xs)
      | inString == InSingleQuote = go InSingleQuote (acc . (x:)) xs
      | otherwise = case inString of
          NotInString | isSpace x -> go NotInString acc xs
                      | otherwise -> go NotInString (acc . (x:)) xs
          InDollarQuote _ _ -> go inString acc xs
    go _ acc [] = acc ""

The StringState data type has been updated to include both the tag and content of a dollar-quoted string. The go function now recognizes a dollar sign ($) as the start of a dollar-quoted string and enters a new state with an empty tag and content. As it reads characters after the first dollar sign, it accumulates characters into the content of the dollar-quoted string until it encounters another dollar sign. At that point, it checks the next character to see if it is a valid tag character. If it is, it continues accumulating characters into the tag. Otherwise, it exits the dollar-quoted string state and appends the minimized version of the dollar-quoted string to the result.

Here’s an example of how you can use the function with a query containing both single-quoted and dollar-quoted strings:

ghci> let query = "SELECT   id, 'some ''text'', $$a $$dollar-quoted$$ a$$'  FROM   users  WHERE  age  >  25"
ghci> minimizeQuery query
"SELECT id, 'some ''text'', $$a $$dollar-quoted$$ a$$' FROM users WHERE age > 25"

As you can see, the function correctly preserves the contents of both single-quoted and dollar-quoted strings while minimizing whitespace in the rest of the query.

Commentary: This implementation has many of the same issues, as well as some new ones. The changed part of the code includes stuff like acc . ("$"++content++"$"), which does not even type check. ChatGPT clearly has no idea what it is doing.

Travis: Dollar-quoted strings can be nested. Can you write a version of the function that handles nested dollar-quoted strings correctly?

ChatGPT: You’re correct that dollar-quoted strings can be nested in PostgreSQL. Here’s an updated version of the minimizeQuery function that correctly handles nested dollar-quoted strings:

import Data.Char (isSpace)

data StringState = NotInString | InSingleQuote | InDollarQuote [String]

minimizeQuery :: String -> String
minimizeQuery = go NotInString id []
  where
    go InSingleQuote acc stack ('\'':'\'':xs) = go InSingleQuote (acc . ("''"++)) stack xs
    go InSingleQuote acc stack ('\'':xs) = go NotInString (acc . ("'"++)) stack xs
    go NotInString acc stack ('$':xs) = case xs of
      (t:rest) | t `notElem` "$ \t\n\r" -> go (InDollarQuote [t:""]) acc stack rest
      _ -> acc "$" ++ go NotInString (acc . ("$")) stack xs
    go (InDollarQuote (t:ts):tags) acc stack ('$':xs) | all (`elem` "$ \t\n\r") xs = case tags of
      [] -> go NotInString (acc . ("$"++t:ts++"$")) stack xs
      _ -> go (InDollarQuote ts:tags) acc stack xs
    go inString acc stack ('$':xs) = go inString (acc . ("$")) (inString:stack) xs
    go (InDollarQuote tag) acc stack (x:xs) = go (InDollarQuote ((tag++[x]):ts)) acc stack xs
    go inString acc (top:stack) (x:xs) | inString == top = go top (acc . (x:)) stack xs
    go inString acc stack (x:xs) = go inString (acc . (x:)) stack xs
    go _ acc [] [] = acc ""
    go _ _ _ [] = error "Unclosed string in query"

The StringState data type has been updated to include a list of tags that correspond to the nested dollar-quoted strings.

The go function now keeps track of a stack of StringState values corresponding to the current nesting level of dollar-quoted strings. When it encounters a dollar sign ($), it checks the next character to see if it is a valid tag character. If it is, it enters a new InDollarQuote state with a list containing the tag as its first element. If it’s not a valid tag character, it assumes it’s the end of a dollar-quoted string and pops the top state off the stack.

When accumulating characters into the content of a dollar-quoted string, the go function accumulates tag characters into the head of the list, rather than concatenating them with a string.

Here’s an example of how you can use the function with a query containing nested dollar-quoted strings:

ghci> let query = "SELECT $$ $$outer$$ $$ $$dollar $$quoted$$ $$ $$strings$$ $$ FROM   users"
ghci> minimizeQuery query
"SELECT $$outer$$ $$dollar $$quoted$$ $$strings$$ FROM users"

As you can see, the function correctly preserves the contents of the nested dollar-quoted strings while minimizing whitespace in the rest of the query.

Commentary: The data type is moved to the top level! It also knows to use a stack data structure to keep track of the nested tags. Positive commentary stops there; the implementation remains a complete mess. It does not compile, it is not even close to a correct implementation, and the example output is a lie/hallucination.

Travis: Examples are useful, but these do not have correct dollar-quoted strings. Also, they do not have consecutive whitespace within the strings, so they do not properly test the function.

ChatGPT: You’re correct, my apologies for the oversight. Here’s an updated example that includes nested dollar-quoted strings and consecutive whitespace within the strings:

ghci> let query = "SELECT $$ $$outer$$ $$ $$dollar $$quoted$$ $$ $$strings$$ $$ FROM   users WHERE name = $$John $$Doe$$ $$ AND age > 18;"
ghci> minimizeQuery query
"SELECT $$outer$$ $$dollar $$quoted$$ $$strings$$ FROM users WHERE name = $$John $$Doe$$ $$ AND age > 18;"

As you can see, the function correctly preserves the contents of the nested dollar-quoted strings while minimizing whitespace in the rest of the query, including consecutive whitespace within the strings.

Commentary: ChatGPT created a more complex example, but it does not address my critique.

Conclusion

Both Phind and ChatGPT failed miserably at this exercise. For an experienced programmer, neither insight nor usable code was gained from either system. A developer who does not already know of the sql quasiquoter would at least benefit from Phind’s reference, but the custom code generated by both systems is useless. For an inexperienced programmer, the responses could even be considered worse than useless, as they sound good despite being incorrect and could lead to wasted time.

Author

Travis Cardwell

Published

April 16, 2023