@ueli-g ueli-g commented Sep 26, 2024

This removes a configured prefix from measurement names, which would otherwise be shared by many measurements in the same data bucket.

When writing to a range of different buckets, routing to the corresponding out_influxdb instances happens on tag matches. This change allows matching on tag prefixes while stripping them from the measurement name, so that measurement names in the same data bucket do not all carry an identical prefix.

To achieve this, the measurement name is read from char tag[] at an offset, provided the tag starts with the complete prefix and is at least one character longer than it (that is, the overlap is at most tag_length - 1 characters).
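
The rule above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this PR: the helper name `strip_offset` and its signature are hypothetical, and it assumes the tag is a length-delimited buffer, which is why it uses a length-bounded `memcmp`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the plugin): return how many leading
 * bytes of `tag` to skip when writing the measurement name.
 *
 * The prefix is skipped only if it matches the start of the tag in full
 * and at least one character of the tag remains afterwards, so the
 * measurement name can never become empty.
 */
size_t strip_offset(const char *tag, size_t tag_len,
                    const char *prefix, size_t prefix_len)
{
    if (prefix_len == 0 || tag_len <= prefix_len) {
        return 0;   /* nothing to strip, or stripping would leave nothing */
    }
    if (memcmp(tag, prefix, prefix_len) != 0) {
        return 0;   /* tag does not start with the prefix */
    }
    return prefix_len;
}
```

With this sketch, the tag `foo.somedata` and the prefix `foo.` give an offset of 4, so the measurement name would be read from `tag + 4` with length `tag_len - 4`, which is `somedata`. A tag that is identical to the prefix is left untouched.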


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

fluent/fluent-bit-docs#1468

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Optional prefix-stripping for InfluxDB output: when configured, the specified prefix is removed from tags before they are emitted.
  • Configuration

    • Added a new per-output setting to specify the prefix to remove from tags (and its length is handled automatically).

@ueli-g ueli-g force-pushed the influxdb-strip-tag-prefix branch from d6e4d82 to cb44828 on September 26, 2024 14:16

ueli-g commented Sep 30, 2024

Example configuration file for this change - how to write to different buckets in the same DB without adding measurement name prefixes:

[SERVICE]
    flush           1
    Daemon          off
    Log_Level       debug

[INPUT]
    Name        dummy
    Tag         foo.somedata
    Dummy       {"msg": "This is foo", "value": 1.3123}

[INPUT]
    Name        dummy
    Tag         bar.stream.importantmessage
    Dummy       {"msg": "completed", "ID": "1234", "tags": ["ID"]}

[INPUT]
    Name        dummy
    Tag         bar.stream.somesensor
    Dummy       {"value": 1, "tags": ["source", "yours"]}

[OUTPUT]
    Name          influxdb
    Match         foo.*
    strip_prefix  foo.
    Host          localhost
    Port          8086
    Bucket        foo-bucket
    Org           foobarorg
    HTTP_Token    my-super-secret-auth-token

[OUTPUT]
    Name          influxdb
    Match         bar.*
    strip_prefix  bar.stream.
    Host          localhost
    Port          8086
    Bucket        bar-bucket
    Org           foobarorg
    HTTP_Token    my-super-secret-auth-token


ueli-g commented Sep 30, 2024

Debug log output

Fluent Bit v3.2.0
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __  
|  ___| |                | |   | ___ (_) |         |____ |/  | 
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| | 
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | | 
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

[2024/09/30 08:25:33] [ info] Configuration:
[2024/09/30 08:25:33] [ info]  flush time     | 1.000000 seconds
[2024/09/30 08:25:33] [ info]  grace          | 5 seconds
[2024/09/30 08:25:33] [ info]  daemon         | 0
[2024/09/30 08:25:33] [ info] ___________
[2024/09/30 08:25:33] [ info]  inputs:
[2024/09/30 08:25:33] [ info]      dummy
[2024/09/30 08:25:33] [ info]      dummy
[2024/09/30 08:25:33] [ info]      dummy
[2024/09/30 08:25:33] [ info] ___________
[2024/09/30 08:25:33] [ info]  filters:
[2024/09/30 08:25:33] [ info] ___________
[2024/09/30 08:25:33] [ info]  outputs:
[2024/09/30 08:25:33] [ info]      influxdb.0
[2024/09/30 08:25:33] [ info]      influxdb.1
[2024/09/30 08:25:33] [ info] ___________
[2024/09/30 08:25:33] [ info]  collectors:
[2024/09/30 08:25:33] [ info] [fluent bit] version=3.2.0, commit=cb44828011, pid=7064
[2024/09/30 08:25:33] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2024/09/30 08:25:33] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/09/30 08:25:33] [ info] [cmetrics] version=0.9.6
[2024/09/30 08:25:33] [ info] [ctraces ] version=0.5.5
[2024/09/30 08:25:33] [ info] [input:dummy:dummy.0] initializing
[2024/09/30 08:25:33] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2024/09/30 08:25:33] [debug] [dummy:dummy.0] created event channels: read=24 write=25
[2024/09/30 08:25:33] [ info] [input:dummy:dummy.1] initializing
[2024/09/30 08:25:33] [ info] [input:dummy:dummy.1] storage_strategy='memory' (memory only)
[2024/09/30 08:25:33] [debug] [dummy:dummy.1] created event channels: read=26 write=27
[2024/09/30 08:25:33] [ info] [input:dummy:dummy.2] initializing
[2024/09/30 08:25:33] [ info] [input:dummy:dummy.2] storage_strategy='memory' (memory only)
[2024/09/30 08:25:33] [debug] [dummy:dummy.2] created event channels: read=28 write=29
[2024/09/30 08:25:33] [debug] [influxdb:influxdb.0] created event channels: read=30 write=31
[2024/09/30 08:25:33] [debug] [output:influxdb:influxdb.0] host=localhost port=8086
[2024/09/30 08:25:33] [debug] [influxdb:influxdb.1] created event channels: read=32 write=33
[2024/09/30 08:25:33] [debug] [output:influxdb:influxdb.1] host=localhost port=8086
[2024/09/30 08:25:33] [debug] [router] match rule dummy.0:influxdb.0
[2024/09/30 08:25:33] [debug] [router] match rule dummy.1:influxdb.1
[2024/09/30 08:25:33] [debug] [router] match rule dummy.2:influxdb.1
[2024/09/30 08:25:33] [ info] [sp] stream processor started
[2024/09/30 08:25:34] [debug] [task] created task=0x7f7e1802d730 id=0 OK
[2024/09/30 08:25:34] [debug] [task] created task=0x7f7e1802d8b0 id=1 OK
[2024/09/30 08:25:34] [debug] [task] created task=0x7f7e1802da00 id=2 OK
[2024/09/30 08:25:34] [debug] [upstream] KA connection #42 to localhost:8086 is connected
[2024/09/30 08:25:34] [debug] [http_client] not using http_proxy for header
[2024/09/30 08:25:34] [debug] [upstream] KA connection #43 to localhost:8086 is connected
[2024/09/30 08:25:34] [debug] [http_client] not using http_proxy for header
[2024/09/30 08:25:34] [debug] [upstream] KA connection #44 to localhost:8086 is connected
[2024/09/30 08:25:34] [debug] [http_client] not using http_proxy for header
[2024/09/30 08:25:34] [debug] [output:influxdb:influxdb.0] http_do=0 OK
[2024/09/30 08:25:34] [debug] [upstream] KA connection #42 to localhost:8086 is now available
[2024/09/30 08:25:34] [debug] [output:influxdb:influxdb.1] http_do=0 OK
[2024/09/30 08:25:34] [debug] [upstream] KA connection #43 to localhost:8086 is now available
[2024/09/30 08:25:34] [debug] [out flush] cb_destroy coro_id=0
[2024/09/30 08:25:34] [debug] [task] destroy task=0x7f7e1802d730 (task_id=0)
[2024/09/30 08:25:34] [debug] [out flush] cb_destroy coro_id=0
[2024/09/30 08:25:34] [debug] [task] destroy task=0x7f7e1802d8b0 (task_id=1)
[2024/09/30 08:25:34] [debug] [output:influxdb:influxdb.1] http_do=0 OK
[2024/09/30 08:25:34] [debug] [upstream] KA connection #44 to localhost:8086 is now available
[2024/09/30 08:25:34] [debug] [out flush] cb_destroy coro_id=1
[2024/09/30 08:25:34] [debug] [task] destroy task=0x7f7e1802da00 (task_id=2)
[2024/09/30 08:25:35] [debug] [task] created task=0x7f7e18039050 id=0 OK
[2024/09/30 08:25:35] [debug] [task] created task=0x7f7e1802d090 id=1 OK
[2024/09/30 08:25:35] [debug] [task] created task=0x7f7e1802d950 id=2 OK
[2024/09/30 08:25:35] [debug] [upstream] KA connection #42 to localhost:8086 has been assigned (recycled)
[2024/09/30 08:25:35] [debug] [http_client] not using http_proxy for header
[2024/09/30 08:25:35] [debug] [upstream] KA connection #43 to localhost:8086 has been assigned (recycled)
[2024/09/30 08:25:35] [debug] [http_client] not using http_proxy for header
[2024/09/30 08:25:35] [debug] [upstream] KA connection #44 to localhost:8086 has been assigned (recycled)
[2024/09/30 08:25:35] [debug] [http_client] not using http_proxy for header
[2024/09/30 08:25:35] [debug] [output:influxdb:influxdb.0] http_do=0 OK
[2024/09/30 08:25:35] [debug] [upstream] KA connection #42 to localhost:8086 is now available
[2024/09/30 08:25:35] [debug] [out flush] cb_destroy coro_id=1
[2024/09/30 08:25:35] [debug] [task] destroy task=0x7f7e18039050 (task_id=0)
[2024/09/30 08:25:35] [debug] [output:influxdb:influxdb.1] http_do=0 OK
[2024/09/30 08:25:35] [debug] [upstream] KA connection #44 to localhost:8086 is now available
[2024/09/30 08:25:35] [debug] [out flush] cb_destroy coro_id=3
[2024/09/30 08:25:35] [debug] [task] destroy task=0x7f7e1802d950 (task_id=2)
[2024/09/30 08:25:35] [debug] [output:influxdb:influxdb.1] http_do=0 OK
[2024/09/30 08:25:35] [debug] [upstream] KA connection #43 to localhost:8086 is now available
[2024/09/30 08:25:35] [debug] [out flush] cb_destroy coro_id=2
[2024/09/30 08:25:35] [debug] [task] destroy task=0x7f7e1802d090 (task_id=1)


ueli-g commented Sep 30, 2024

valgrind memcheck output

==17905== Memcheck, a memory error detector
==17905== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==17905== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==17905== Command: ./build/bin/fluent-bit -c ./test.conf
==17905== 
Fluent Bit v3.2.0
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __  
|  ___| |                | |   | ___ (_) |         |____ |/  | 
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| | 
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | | 
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

[2024/09/30 08:36:39] [ info] Configuration:
[2024/09/30 08:36:39] [ info]  flush time     | 1.000000 seconds
[2024/09/30 08:36:39] [ info]  grace          | 5 seconds
[2024/09/30 08:36:39] [ info]  daemon         | 0
[2024/09/30 08:36:39] [ info] ___________
[2024/09/30 08:36:39] [ info]  inputs:
[2024/09/30 08:36:39] [ info]      dummy
[2024/09/30 08:36:39] [ info]      dummy
[2024/09/30 08:36:39] [ info]      dummy
[2024/09/30 08:36:39] [ info] ___________
[2024/09/30 08:36:39] [ info]  filters:
[2024/09/30 08:36:39] [ info] ___________
[2024/09/30 08:36:39] [ info]  outputs:
[2024/09/30 08:36:39] [ info]      influxdb.0
[2024/09/30 08:36:39] [ info]      influxdb.1
[2024/09/30 08:36:39] [ info] ___________
[.........]
==17905== 
==17905== HEAP SUMMARY:
==17905==     in use at exit: 0 bytes in 0 blocks
==17905==   total heap usage: 5,464 allocs, 5,464 frees, 12,504,833 bytes allocated
==17905== 
==17905== All heap blocks were freed -- no leaks are possible
==17905== 
==17905== For lists of detected and suppressed errors, rerun with: -s
==17905== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@ueli-g ueli-g force-pushed the influxdb-strip-tag-prefix branch from d4aa84c to c45887d on September 30, 2024 08:41
@ueli-g ueli-g marked this pull request as ready for review September 30, 2024 08:53
@ueli-g ueli-g changed the title from "influxdb: allow stripping of tag prefix" to "out_influxdb: allow stripping of tag prefix" Sep 30, 2024

github-actions bot commented Jan 1, 2025

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Jan 1, 2025

ueli-g commented Jan 6, 2025

This PR is still pending review.

@github-actions github-actions bot removed the Stale label Jan 7, 2025

github-actions bot commented Sep 7, 2025

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Sep 7, 2025
Removes a configured prefix from measurement names

Signed-off-by: Ueli Graf <[email protected]>
@ueli-g ueli-g force-pushed the influxdb-strip-tag-prefix branch from c45887d to 79bc485 on September 29, 2025 13:16

coderabbitai bot commented Sep 29, 2025

Walkthrough

Adds per-instance tag prefix stripping to the InfluxDB output plugin: a new strip_prefix config is stored as prefix/prefix_len, used at format time to optionally remove the prefix from record tags, and freed on plugin exit.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| InfluxDB output: prefix handling (plugins/out_influxdb/influxdb.c, plugins/out_influxdb/influxdb.h) | Add per-instance strip_prefix support: read/store prefix and prefix_len in init, conditionally strip the prefix when building InfluxDB line protocol (adjust header/key emission), add config_map entry, and free prefix in exit; extend struct flb_influxdb with char *prefix and prefix_len. |
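
The init/exit lifecycle summarized above can be sketched in isolation. This is an illustrative stand-in, not the plugin's code: `struct prefix_ctx`, `prefix_ctx_init`, and `prefix_ctx_exit` are hypothetical names, and plain `malloc`/`free` are used in place of Fluent Bit's `flb_strdup`/`flb_free` so that the snippet compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two fields added to struct flb_influxdb. */
struct prefix_ctx {
    char *prefix;     /* owned copy of the configured prefix ("" if unset) */
    int prefix_len;   /* cached length of prefix */
};

/*
 * Store a private copy of the configured value. `configured` may be NULL
 * when strip_prefix is not set. Returns 0 on success, -1 if the
 * allocation fails.
 */
int prefix_ctx_init(struct prefix_ctx *ctx, const char *configured)
{
    const char *src = configured ? configured : "";
    size_t len = strlen(src);

    ctx->prefix = malloc(len + 1);
    if (!ctx->prefix) {
        ctx->prefix_len = 0;
        return -1;
    }
    memcpy(ctx->prefix, src, len + 1);   /* copies the terminating NUL too */
    ctx->prefix_len = (int) len;
    return 0;
}

/* Release the copy; safe to call more than once. */
void prefix_ctx_exit(struct prefix_ctx *ctx)
{
    free(ctx->prefix);
    ctx->prefix = NULL;
    ctx->prefix_len = 0;
}
```

Defaulting to an empty string rather than NULL keeps the formatting path free of NULL checks: an empty prefix simply never produces an offset.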

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CFG as Config Loader
  participant OI as InfluxDB Instance
  participant FM as Formatter
  participant EM as Emitter

  CFG->>OI: cb_influxdb_init (read `strip_prefix`)
  Note right of OI: store `prefix` and `prefix_len`
  loop For each record
    FM->>FM: receive record tag
    alt tag startsWith(prefix) and tag.length > prefix_len
      FM->>FM: compute offset, produce tag_without_prefix
      Note right of FM #dff0d8: (new/changed path)
    else
      FM->>FM: use original tag
    end
    FM->>EM: influxdb_bulk_append_header(tag_used, len)
    EM-->>EM: emit line protocol
  end
  OI-->>OI: cb_influxdb_exit
  Note right of OI #f8d7da: free(prefix) if allocated

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nibble prefixes from the start,
A tidy tag is little art.
Hop—trimmed names glide into the stream,
Line protocol neat as a dream.
Rabbit cheers: clean data, hop and part! 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "out_influxdb: allow stripping of tag prefix" is directly related to the main change in the changeset. The title clearly identifies the component (out_influxdb) and the primary functionality being added (ability to strip tag prefix from measurement names). The changeset confirms this purpose: it introduces a new strip_prefix configuration property and implements the logic to conditionally remove this prefix when constructing InfluxDB line protocol data. The title is concise, specific, and avoids generic or vague terminology. A teammate reviewing the history would immediately understand that this PR adds prefix stripping capability to the InfluxDB output plugin. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79bc485 and 2d115fa.

📒 Files selected for processing (2)
  • plugins/out_influxdb/influxdb.c (5 hunks)
  • plugins/out_influxdb/influxdb.h (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/out_influxdb/influxdb.c (2)
src/flb_output.c (1)
  • flb_output_get_property (1096-1099)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
🔇 Additional comments (4)
plugins/out_influxdb/influxdb.h (1)

59-61: LGTM: Fields added to support prefix stripping.

The new prefix and prefix_len fields are appropriately placed and will store the configured tag prefix for stripping during InfluxDB line protocol formatting.

plugins/out_influxdb/influxdb.c (3)

387-394: LGTM: Initialization handles both configured and default cases.

The code correctly reads the strip_prefix configuration property and defaults to an empty string when not specified, ensuring ctx->prefix is always valid.


622-624: LGTM: Proper cleanup of allocated prefix.

The cleanup correctly frees the allocated ctx->prefix with a NULL check, following the same pattern used for other dynamically allocated fields.


728-732: LGTM: Configuration property properly defined.

The strip_prefix configuration map entry is correctly defined with an appropriate description and follows the same pattern as sequence_tag.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
plugins/out_influxdb/influxdb.h (1)

59-62: Fields look fine; keep length type consistent with existing code.

No blockers. If you later use size_t with memcmp/strlen, cast appropriately to avoid sign/width issues when calling libc.

plugins/out_influxdb/influxdb.c (3)

136-141: Guard against invalid lengths when appending header.

With the above fix, prefix_offset is bounded. If you want an extra safety net:

-        ret = influxdb_bulk_append_header(bulk_head,
-                                          tag + prefix_offset,
-                                          tag_len - prefix_offset,
+        ret = influxdb_bulk_append_header(bulk_head,
+                                          tag + prefix_offset,
+                                          (int)(tag_len - prefix_offset),
                                           seq,
                                           ctx->seq_name, ctx->seq_len);

Explicit cast avoids accidental promotion if the callee expects int.


382-390: Check strdup() failure for strip_prefix.

Tiny robustness gap: ctx->prefix is used unconditionally; guard OOM.

     tmp = flb_output_get_property("strip_prefix", ins);
     if (!tmp) {
-        ctx->prefix = flb_strdup("");
+        ctx->prefix = flb_strdup("");
     } else {
-        ctx->prefix = flb_strdup(tmp);
+        ctx->prefix = flb_strdup(tmp);
     }
-    ctx->prefix_len = strlen(ctx->prefix);
+    if (!ctx->prefix) {
+        flb_errno();
+        return -1;
+    }
+    ctx->prefix_len = (int) strlen(ctx->prefix);

723-728: Config map entry reads well; consider clarifying docs.

Doc string could mention: “Only stripped when tag strictly longer than the prefix.” Matches implementation and avoids empty measurement names.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d21e8de and 79bc485.

📒 Files selected for processing (2)
  • plugins/out_influxdb/influxdb.c (5 hunks)
  • plugins/out_influxdb/influxdb.h (1 hunks)
🔇 Additional comments (1)
plugins/out_influxdb/influxdb.c (1)

617-620: Good lifecycle hygiene.

prefix is freed alongside other dynamic members.

Comment on lines +77 to 79
int prefix_match = 0;
int prefix_offset = 0;
msgpack_object map;

⚠️ Potential issue | 🔴 Critical

Fix OOB read on tag and per-record state leak (prefix_offset).

  • tag is a raw buffer with explicit length; strncmp(tag, …, ctx->prefix_len) can read past tag when prefix_len > tag_len.
  • prefix_offset is not reset inside the event loop; a prior match can incorrectly affect subsequent records.

Apply:

@@
-    char tmp[128];
-    int prefix_match = 0;
-    int prefix_offset = 0;
+    char tmp[128];
+    int prefix_offset = 0;
@@
-        prefix_match = strncmp(tag, ctx->prefix, ctx->prefix_len);
-        if (prefix_match == 0) {
-            if (tag_len > ctx->prefix_len) {
-                prefix_offset = ctx->prefix_len;
-            }
-        }
+        /* reset per-record */
+        prefix_offset = 0;
+        /* safe compare: only when prefix is non-empty and tag is longer */
+        if (ctx->prefix_len > 0 && tag_len > ctx->prefix_len) {
+            if (memcmp(tag, ctx->prefix, (size_t) ctx->prefix_len) == 0) {
+                prefix_offset = ctx->prefix_len;
+            }
+        }

And ensure memcmp is declared:

 #include <stdio.h>
+#include <string.h>

This prevents undefined reads and guarantees no negative length is passed to influxdb_bulk_append_header.

Also applies to: 129-139
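
The two points raised in this comment can be illustrated with a small standalone sketch. It is not the plugin's event loop: `struct tag_buf` and `compute_offsets` are hypothetical names introduced here, and an array of tags stands in for the decoded records.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-delimited tag, not necessarily NUL-terminated. */
struct tag_buf {
    const char *data;
    int len;
};

/*
 * For each tag, write the number of leading bytes to skip into out[i].
 * The offset is reset for every record, and the comparison never reads
 * more than tags[i].len bytes of the tag.
 */
void compute_offsets(const struct tag_buf *tags, size_t count,
                     const char *prefix, int prefix_len, int *out)
{
    size_t i;
    int prefix_offset;

    for (i = 0; i < count; i++) {
        prefix_offset = 0;   /* reset per record: no carry-over */

        /* compare only when the tag is strictly longer than the prefix */
        if (prefix_len > 0 && tags[i].len > prefix_len &&
            memcmp(tags[i].data, prefix, (size_t) prefix_len) == 0) {
            prefix_offset = prefix_len;
        }
        out[i] = prefix_offset;
    }
}
```

Because the length check comes first, `memcmp` is only reached when the tag holds at least `prefix_len + 1` bytes, and `tags[i].len - out[i]` is always positive.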

🤖 Prompt for AI Agents
In plugins/out_influxdb/influxdb.c around lines 77-79 (and similarly 129-139),
fix the OOB read and state leak by treating tag as a length-delimited buffer:
obtain tag_len from tag.via.raw.size and only compare up to that length (do not
call strncmp with ctx->prefix_len if ctx->prefix_len > tag_len); use memcmp with
the minimum of the two lengths or explicitly require tag_len >= ctx->prefix_len
before matching to avoid undefined reads, and reset prefix_offset to 0 at the
start of each event/record iteration so previous matches do not carry over; also
ensure memcmp is declared by including the proper header (string.h) so the
comparator is defined, and guard calls to influxdb_bulk_append_header so no
negative length is ever passed.

@eschabell

@ueli-g can you resolve the conflicts shown above please?

@github-actions github-actions bot removed the Stale label Oct 24, 2025

ueli-g commented Oct 24, 2025

@eschabell thanks for the nudge. I did address the valid bug that prefix_offset needs to be reset in every iteration.

Could you have another look please and let me know if there is anything else I should address? I deliberately did not switch to memcmp as suggested by the AI, and skipped some of the other nitpicks, to stay in line with the existing style rather than changing it everywhere.
