Postgres’s invisible data or the curious case of the intangible length

A few days ago at Curve, our developers had some problems dealing with data coming from our database and they asked for help. Apparently, a query that was working in dev (TM), did not work as expected in production.

Performing a sum on a certain set of rows was succeeding, whether a simple select was mysteriously failing. In fact, it was supposed to return an integer such as  “3” of length “1” was actually returning a solid integer “3” and length (“7“). Strange, isn’t it?

We’re running a fairly updated version of PostgreSQL, a database that I personally really like. It’s a battle tested, 20 years in active development solid SQL engine.

So, I connected with PSQL and looked at it with our Data Engineer. Things were pretty much like this :

SELECT id, LENGHT(id)

FROM table

WHERE name = ‘john’;


3, 7

(1 rows)

Now, this is really strange, isn’t it? We were both a bit surprised about this: “Did we just stumble upon the Postgres bug of the decade?”. On top of this, we started performing simple actions on it this row, using standard SQL math operators, like SUM. Everything was working well! However, describing the column we discovered that, instead of being an integer, as we expected, it was a :

VARCHAR(36)

Pretty curious, isn’t it? At this point, we had an idea of what was going, and a simple google search helped us: (http://bit.ly/2vvDMNM). We re-executed the query with a simple casting and found our bug:

SELECT (encode(id::bytea, ‘escape’)), LENGHT(id)

FROM table

WHERE name = ‘john’;


\u203\u20, 7

(1 rows)

So, turns out that the portion of code that was doing INSERT, actually inserted way more than the, expected, integer, adding instead some Unicode chars parsed from data coming from a 3rd party (and maybe even the type of the column was done to accommodate this?).

These Unicode characters are not displayed by PSQL but are definitely stored in the server and served back to an application doing a SELECT.

But why did the math operators succeed? My guess is that this is due to the implicit type casting done by Postgres ( http://bit.ly/2fepTgS ).

In the end, probably not a complicated problem, but definitely a curious one 🙂

Advertisements