Data Engineers Reveal Their Biggest Pain Points: Silos, Cleaning, and Context Switching
A founder's simple question to data engineers—"What's currently extremely painful & annoying to do in your job?"—unleashed a wave of candid feedback about the daily frustrations of working with data. The responses show that the most significant challenges are often not technical but are rooted in organizational structure, data quality, and inefficient workflows.
The Bureaucratic Gauntlet: Silos and Permissions
The most frequently cited and time-consuming problem is gaining access to data in the first place. Professionals described a frustrating gauntlet of "org silos, security, and permissions" that can stall a project before it even begins. A common scenario involves needing production data for analysis, but the team that owns the data is (understandably) unwilling to grant direct database access. Instead, they provide an API.
However, these APIs are typically designed for application use cases—accessing single records at a relatively slow rate—not for the bulk data extraction needed for exploratory analysis. This often leads to a painful process of trying to convince the other team to build a more appropriate bulk solution, sometimes only after the data engineer's repeated requests have effectively mounted a denial-of-service attack on the existing API.
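One way to limp along in the meantime is to treat the record-oriented API as a page source and throttle requests deliberately. The sketch below is a minimal, hedged example: `fetch_page` is a hypothetical callable standing in for whatever endpoint the owning team actually exposes, and the pacing values are placeholders to be tuned against the API's real limits.

```python
import time
from typing import Callable, Iterator

def paginate(fetch_page: Callable[[int], list],
             page_size: int = 100,
             min_interval: float = 0.5) -> Iterator[dict]:
    """Pull records in bulk from a page-oriented API without hammering it.

    fetch_page(page_number) is assumed to return a list of up to
    page_size records, and an empty list past the last page.
    """
    page = 0
    while True:
        records = fetch_page(page)
        if not records:
            return                      # past the last page
        yield from records
        if len(records) < page_size:
            return                      # short page: nothing left to fetch
        page += 1
        time.sleep(min_interval)        # be a polite client, not a DoS
```

Keeping the transport behind a callable also means the pacing logic can be exercised with a fake fetcher in tests, without touching the other team's API at all.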
The Never-Ending Task of Data Cleaning
Once access is granted, the next major hurdle is data quality. The "human factor" in data entry creates a seemingly endless task of cleaning and standardization. One of the most vivid examples shared was from a professional dealing with a dataset of 2.5 billion rows where the country name "Austria" was misspelled over 11,000 different ways due to accents, spaces, different encodings, and simple typos. This single example highlights a problem that scales across hundreds of other fields, from city names to product codes.
While some build custom solutions, like using a 'dictionary' of valid values to pre-validate entries or even auto-correct simple misspellings, these methods have limits. They can't easily catch more complex, combinatorial errors, such as a user entering a valid city and state that don't belong together (e.g., "London, Texas").
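The two-layer approach described above—a dictionary of valid values for normalization, plus a lookup of valid combinations for cross-field checks—can be sketched with the standard library alone. The canonical lists and the (city, state) pairs below are illustrative placeholders, and `difflib` is merely a stand-in for whatever fuzzy matcher a real pipeline would use:

```python
import difflib
from typing import Optional

# Illustrative dictionaries; a real pipeline would load these from reference data.
CANONICAL_COUNTRIES = ["Austria", "Australia", "Germany"]
VALID_CITY_STATE = {("Austin", "Texas"), ("London", "Ontario")}

def clean_country(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Normalize a free-text country entry against the dictionary.

    Exact matches (after trimming and case-folding) pass through; near
    misses are auto-corrected via fuzzy matching; anything below the
    cutoff returns None and should be routed to manual review.
    """
    candidate = raw.strip().title()
    if candidate in CANONICAL_COUNTRIES:
        return candidate
    matches = difflib.get_close_matches(candidate, CANONICAL_COUNTRIES,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

def city_state_ok(city: str, state: str) -> bool:
    """Catch combinatorial errors: values that are valid alone but not together."""
    return (city, state) in VALID_CITY_STATE
```

The point of the second function is exactly the "London, Texas" case: no amount of per-field cleaning catches it, because each value is individually legitimate; only a lookup over valid pairs does.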
The Friction of Modern Tooling and "Work About the Work"
Even with clean data, the tools and workflows themselves introduce significant friction. One founder described the soul-crushing feeling of doing the same thing multiple times in different windows—what they termed "work about the work."
A typical request might follow this pattern:
- A data request comes in via Slack.
- The analysis is performed, and a conclusion is shared back in the Slack thread.
- A Jira ticket must be created to formally document the task and its outcome.
- A Notion document is written to preserve the findings for the record.
This cycle of copying and pasting context between communication layers (chat) and execution layers (tasks, docs) is seen as a major source of "fake work" and a waste of human potential.
Other tooling-related frustrations include:
- Excel Hell: The difficulty of working with proprietary Excel spreadsheets that lack documentation, hide their logic in opaque formulas, and are difficult to reuse or repurpose.
- Schema Exploration: A specific need was identified for a tool that can quickly summarize all the different schema "variations" within a single JSON column in a database, a common challenge with semi-structured data.
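One plausible shape for such a schema-exploration tool is to reduce each JSON document to a type-annotated set of key paths and then count the distinct signatures across rows. This is a rough sketch, not the tool the commenter asked for, and the sample rows are invented for illustration:

```python
import json
from collections import Counter

def schema_signature(value, prefix=""):
    """Recursively collect type-annotated key paths for one JSON value."""
    if isinstance(value, dict):
        paths = []
        for key, val in value.items():
            paths.extend(schema_signature(val, f"{prefix}.{key}" if prefix else key))
        return paths or [f"{prefix}:object"]        # empty object
    if isinstance(value, list):
        inner = {p for item in value for p in schema_signature(item, prefix + "[]")}
        return sorted(inner) or [f"{prefix}:array"]  # empty array
    return [f"{prefix}:{type(value).__name__}"]

def summarize_schemas(json_rows):
    """Count how many rows share each distinct schema 'variation'."""
    return Counter(tuple(sorted(schema_signature(json.loads(row))))
                   for row in json_rows)
```

Running this over the raw strings of a JSON column yields a histogram of schema variations, which is usually enough to spot the one-off shapes hiding in semi-structured data.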
- Human-Centric Needs: Beyond tooling, there's a need for better human processes, such as having a dedicated Product Owner for data products to handle requirements gathering, a task that often falls to the engineers themselves.