Series data type doesn't reflect "best practices" for time-series schemas #18

@ixmatus

I love the aeson-esque interface that you've built, but there's a glaring misstep: ToSeriesData treats the data a lot like tables instead of series.

Let's say I have a datatype that's an instance of ToSeriesData that looks something like this:

data Reading = Reading UTCTimeEpoch UUID DeviceType ReadingType Double
data DeviceType = Plug | Switch
data ReadingType = Watts | Volts | Temp

Let's say these readings go into a series named "device_readings". That works great at small scale, but the minute you reach millions of points you're suddenly hitting performance problems, because InfluxDB isn't designed to handle that type of querying (SELECT uuid FROM device_readings WHERE ...) or filtering on, say, the device type column. You'll traverse the entire key space of that specific series to do it, because underneath, Influx is just a dumb key-value store.

If there are 20 million keys in the device_readings series, that's really severe pain and you've just tremendously fucked yourself, because migrating that data to another schema could take quite a bit of time...

This is my major beef with InfluxDB: they wanted to keep a "SQL-like" interface to the data, but the underlying model definitely will not handle the kind of queries that you CAN run on it. It's also their fault for not urgently writing up a document on "Schema Design".

TempoDB got it right. Your series name should contain the key, category, and attributes you want to "query".

So instead of a series name like device_readings, it would look like device_readings.2c9e4570-9b35-0131-c7ce-48e0eb16f719.Watts.Dimmer. You can then efficiently query the data you want by constructing the key from known categories, IDs, and attributes. The datatype then looks extremely simple:

data Reading = Reading UTCTimeEpoch Double
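
To make the naming scheme concrete, here is a rough sketch of building such a name from its parts. It assumes a dotted base.id.readingType.deviceType layout; readingSeries is a hypothetical helper (not something the library provides), the device ID is a plain String stand-in for a UUID, and DeviceType gets an extra Dimmer constructor purely so the example name above can be reproduced:

```haskell
import Data.List (intercalate)

-- Dimmer is an assumption added for this sketch; the type shown
-- earlier in the issue only has Plug and Switch.
data DeviceType = Plug | Switch | Dimmer deriving (Show)
data ReadingType = Watts | Volts | Temp deriving (Show)

-- | Hypothetical helper: join a base series name, a device ID,
-- a reading type, and a device type into one dotted series name.
readingSeries :: String -> String -> ReadingType -> DeviceType -> String
readingSeries base deviceId rt dt =
  intercalate "." [base, deviceId, show rt, show dt]

-- | The series name from the example above, rebuilt from its parts.
exampleName :: String
exampleName =
  readingSeries "device_readings" "2c9e4570-9b35-0131-c7ce-48e0eb16f719" Watts Dimmer
```

With everything "queryable" encoded in the name, the points stored under that name only need a timestamp and a value.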

What I would love is a data type that gives us a structured and easy way of building series names from a key, a category, and some attributes! That's what I wish this library was doing, instead of following a more table-like model.

I'm going to throw together my ideas in a fork and see what you think of them. Right now I'm building series names with functions and it's ugly; I would rather do it with specialized data types and instances of a class like ToSeriesName or something similar.
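
Roughly, the class-based version might look like the sketch below. Nothing here exists in the library: the ToSeriesName class, its toSeriesName method, the ReadingKey record, and its field names are all hypothetical, and the UUID is kept as a String so the example only depends on base.

```haskell
import Data.List (intercalate)

-- | Hypothetical class: anything that can be rendered as a series name.
class ToSeriesName a where
  toSeriesName :: a -> String

data DeviceType = Plug | Switch deriving (Show)
data ReadingType = Watts | Volts | Temp deriving (Show)

-- | Everything you would want to "query" on lives in the key,
-- not in columns of the series itself.
data ReadingKey = ReadingKey
  { keyDeviceId    :: String      -- UUID as a String (assumption)
  , keyReadingType :: ReadingType
  , keyDeviceType  :: DeviceType
  }

-- | Render the key as a dotted name under a fixed base series.
instance ToSeriesName ReadingKey where
  toSeriesName (ReadingKey deviceId rt dt) =
    intercalate "." ["device_readings", deviceId, show rt, show dt]

-- | Example: the series holding Watts readings for one plug.
plugWatts :: String
plugWatts =
  toSeriesName (ReadingKey "2c9e4570-9b35-0131-c7ce-48e0eb16f719" Watts Plug)
```

The point of the class is that the naming rules sit in one instance per key type instead of being scattered across ad hoc string-building functions.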
