Cardano node memory leak in 1.3.0 and 1.4.0 #460

@johnalotoski

Description

Cardano node 1.3.0 consumes increasing amounts of memory and CPU over time until the process runs out of memory (OOM). The following figure illustrates the typical observation:

[Figure: cardano-node-1-3-0-leak]
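To check for the same growth on another machine, resident memory can be sampled at a fixed interval. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/status` exposes a `VmRSS` line; the function names `parse_vmrss_kb`, `growth_kb_per_sample` and `sample_rss` are hypothetical helpers and not part of cardano-node.

```python
import time
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    204800 kB"
            return int(line.split()[1])
    raise ValueError("VmRSS line not found")


def growth_kb_per_sample(samples):
    """Least-squares slope of the samples: average RSS change per interval."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def sample_rss(pid, count, interval_s):
    """Read VmRSS of the given process `count` times, `interval_s` apart."""
    samples = []
    for _ in range(count):
        status = Path(f"/proc/{pid}/status").read_text()
        samples.append(parse_vmrss_kb(status))
        time.sleep(interval_s)
    return samples
```

A slope that stays positive over a long run would be consistent with the growth described here, although a short sampling window can also show ordinary warm-up allocation.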

This occurs on all chains tested. The problem appears with the following flags (other combinations may also trigger it):

--tracing-verbosity-normal --trace-block-fetch-decisions --trace-chain-db --trace-mempool --trace-forge +RTS -N2 -A10m -qg -qb -M3G -RTS
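The command line above mixes node tracing flags with GHC runtime options, which sit between `+RTS` and `-RTS`. The sketch below separates the two groups and annotates the runtime options used here; the helper names `split_rts_args` and `describe_rts` are hypothetical, and the descriptions are short paraphrases of the GHC runtime documentation rather than authoritative text.

```python
# Paraphrased meanings of the GHC RTS options that appear in this report.
RTS_OPTIONS = {
    "-N": "number of capabilities (threads running Haskell code)",
    "-A": "allocation area (nursery) size",
    "-qg": "parallel GC generation threshold; bare -qg turns parallel GC off",
    "-qb": "parallel GC load-balancing threshold; bare -qb turns it off",
    "-M": "maximum heap size",
}


def split_rts_args(argv):
    """Separate program arguments from those between +RTS and -RTS."""
    program, rts = [], []
    in_rts = False
    for arg in argv:
        if arg == "+RTS":
            in_rts = True
        elif arg == "-RTS":
            in_rts = False
        elif in_rts:
            rts.append(arg)
        else:
            program.append(arg)
    return program, rts


def describe_rts(rts_args):
    """Return (option, value, meaning) for each RTS argument."""
    described = []
    for opt in rts_args:
        matches = [k for k in RTS_OPTIONS if opt.startswith(k)]
        if matches:
            key = max(matches, key=len)  # prefer the longest matching prefix
            described.append((key, opt[len(key):], RTS_OPTIONS[key]))
        else:
            described.append((opt, "", "not covered by this sketch"))
    return described
```

With the flags from this report, `-M3G` caps the heap at 3G, so the process is expected to fail once the leak reaches that limit instead of growing without bound.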

The following figure illustrates the heap profile from one example, where STACK is increasing with time:

[Figure: cardano-node-heap-profile-example]

With profiling enabled and the -xc RTS option, the following two logging-related exception traces appear on the console:

*** Exception (reporting due to +RTS -xc): (IND_STATIC), stack trace: 
  Cardano.BM.Counters.Linux.readProcNet,
  called from Cardano.BM.Counters.Linux.takeMeasurements.selectors,
  called from Cardano.BM.Counters.Linux.takeMeasurements.\,
  called from Cardano.BM.Counters.Linux.takeMeasurements,
  called from Cardano.BM.Counters.Linux.readCounters,
  called from Control.Concurrent.Async.asyncUsing.\.\,
  called from Control.Concurrent.Async.asyncUsing.\,
  called from Control.Concurrent.Async.asyncUsing,
  called from Control.Concurrent.Async.async,
  called from Cardano.Config.Logging.startCapturingMetrics,
  called from Cardano.Config.Logging.loggingCardanoFeatureInit,
  called from Cardano.Config.Logging.createLoggingFeature,
  called from Main.initializeAllFeatures,
  called from Cardano.Common.TopHandler.toplevelExceptionHandler,
  called from Main.main

*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace: 
  Control.Concurrent.Async.Timer.Internal.timerLoop.readCmd,
  called from Control.Concurrent.Async.Timer.Internal.timerLoop,
  called from Control.Concurrent.Async.withAsyncUsing.\.\,
  called from Control.Concurrent.Async.withAsyncUsing.\,
  called from Control.Concurrent.Async.withAsyncUsing,
  called from Control.Concurrent.Async.withAsync,
  called from UnliftIO.Internals.Async.withAsync.\,
  called from Control.Monad.IO.Unlift.withRunInIO,
  called from UnliftIO.Internals.Async.withAsync,
  called from Control.Concurrent.Async.Timer.Internal.withAsyncTimer,
  called from Cardano.BM.Data.MessageCounter.sendAndResetAfter,
  called from Control.Concurrent.Async.asyncUsing.\.\,
  called from Control.Concurrent.Async.asyncUsing.\,
  called from Control.Concurrent.Async.asyncUsing,
  called from Control.Concurrent.Async.async,
  called from Cardano.BM.Backend.Monitoring.spawnDispatcher,
  called from Cardano.BM.Backend.Monitoring.realizefrom,
  called from Cardano.BM.Backend.Monitoring.plugin,
  called from Cardano.Config.Logging.loggingCardanoFeatureInit,
  called from Cardano.Config.Logging.createLoggingFeature,
  called from Main.initializeAllFeatures,
  called from Cardano.Common.TopHandler.toplevelExceptionHandler,
  called from Main.main
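To see where the two traces overlap, they can be parsed into frame lists and compared from the outermost frame inward. This is a minimal sketch that assumes the `-xc` report layout shown above (a `***` header line followed by `called from` lines); `parse_xc_trace` and `shared_outer_frames` are hypothetical helper names.

```python
def parse_xc_trace(text):
    """Return the cost-centre frames of one -xc report, innermost first."""
    frames = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("***"):
            continue  # skip blank lines and the exception header
        if line.startswith("called from "):
            line = line[len("called from "):]
        frames.append(line.rstrip(","))
    return frames


def shared_outer_frames(trace_a, trace_b):
    """Frames common to both traces, walking inward from the outermost frame."""
    shared = []
    for a, b in zip(reversed(trace_a), reversed(trace_b)):
        if a != b:
            break
        shared.append(a)
    return shared
```

Applied to the two traces above, the shared frames run from Main.main inward to Cardano.Config.Logging.loggingCardanoFeatureInit, after which the first trace continues into startCapturingMetrics and the second into Cardano.BM.Backend.Monitoring.plugin.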

When the logging module is disabled in cardano-node, either with the TurnOnLogging = false flag or through a direct code change, the leak does not appear.

Labels

bug: Something isn't working
byron: Required for a Byron mainnet: replace the old core nodes with cardano-node.
priority high: issues/PRs that MUST be addressed. The release can't happen without this;