Operation and Maintenance Event Notes

Neil MartinsenBurrell edited this page Mar 10, 2025 · 2 revisions

Notes on Events dealt with by O&M

Our O&M rotation records what it has dealt with in the weekly issues. Sometimes events occur that are more than routine. To help us respond efficiently to future events, we take notes on them on this page.

Copy the template below and fill it out:

Date: YYYY-MM-DD

What we observed:

What happened and how did we figure it out?:

What did we do?:

Events

Date: 2025-03-10

What we observed: GSA inventory changes were not being harvested.

What happened and how did we figure it out?: New Relic logs showed a large number of Solr errors from harvesting, starting on 2025-03-06. We stopped the automated harvest process to help with debugging (#1563). We searched catalog.data.gov for the ID that was leading to the errors and discovered a single data.json from DOJ that was causing them. That harvest job was preventing other harvest jobs, such as GSA's, from ever running.

What did we do?: We stopped the DOJ data.json from being harvested. That stopped the errors. We then re-started automated harvesting (#1564). We will make a follow-up ticket to figure out what is wrong with the DOJ source (#5124). We will also consider making an easier way to pause/restart automated harvesting without having to make pull requests that change the schedule.
