Deploying Maxwell’s Demon: QA Automation In Droughty
Entropy is a bit of a pain, quite literally. As a 32-year-old I’m already feeling its effects here and there. The effects of entropy are felt by everyone and everything, from people trying to build perpetual motion machines, to your charging cables mysteriously tangling themselves in your drawer despite never being touched… and, more specifically in our world, trying to keep data in check.
The thing about data testing versus actual QA — tests are these neat little boxes where data either fits or it doesn’t. But real QA? The kind where a business user actually looks at the data? That’s messy. That’s where someone in finance squints at a dashboard and says “hang on, that can’t be right, our margins never look like that in Q4.” They’re not checking if values match a pattern; they’re using their hard-earned domain knowledge to spot things that just feel…off.
This is where entropy really starts to bite. Your dbt tests are all green, your CI/CD pipeline is fine, but then someone who actually knows the business looks at the output and spots something that slipped through all those carefully crafted boolean gates. These are the moments that show us how far traditional testing is from real quality assurance.
In this article, I’m introducing something I’ve been working on: Droughty QA. It’s a set of AI agents that think more like your domain experts, bringing some actual nuance to data validation. These agents can handle those fuzzy, context-dependent quality rules that are impossible to cram into a traditional test. What’s more, it integrates with LangGraph for agent orchestration and LangSmith for observability and tracing.
But before diving into the implementation, let’s explore why traditional testing falls short of true QA, and how AI might help us capture more of that human intelligence in our quality processes.
LLMs, Entropy and Maxwell’s Demon
In thermodynamics, Maxwell’s Demon is a thought experiment where a tiny being could theoretically defy entropy by sorting molecules between two chambers, creating order from chaos. This might seem abstract, but it’s exactly what we do in data engineering. Just as Maxwell’s Demon sorts fast and slow molecules, we sort, clean, and organise data to maintain order in our data systems.
The parallel goes deeper. Maxwell’s Demon needs energy to function; it can’t create order for free (that would violate the second law of thermodynamics). Similarly, maintaining data quality requires constant human energy and attention. As engineers, we’ve traditionally been the “demons” in our data systems, expending cognitive energy to keep entropy at bay through careful modelling, testing, and QA processes.
But here’s where it gets interesting. The introduction of LLMs is fundamentally changing this dynamic. Their ability to understand context and nuance, to interpret business rules semantically rather than literally, means we can start automating the kind of intuitive quality checks that previously required human judgment. These AI systems can now act as supplementary “demons” in our data infrastructure, helping to maintain order without the constant drain on human attention and energy. They derive their sorting power from both computational resources and human-created training data, creating a new kind of hybrid entropy-fighting system.
Think about traditional data QA. Humans have to manually inspect data, spot patterns, and maintain quality — it’s energy-intensive and often inconsistent. LLMs, on the other hand, can tirelessly process and validate data according to complex, nuanced rules. They’re not replacing human judgment, but rather amplifying our entropy-fighting capabilities in the digital realm.
How This Impacts Businesses
As I have explored previously, data quality is really about iteration and finding your ‘final vocabulary’. Doing this requires end-user QA, and end-user QA takes forever. Often, a QA cycle looks like this:
1. Assumptions about the data aren’t collected at all
2. If they are collected, it’s a rushed, fast-thinking exercise that ends up incomplete
3. The data is put through the analytics engineering sausage machine and handed over for QA
4. QA’ing data is boring, so the business user procrastinates on it. This can lead to a sort of QA apnea, much like its cousin, email apnea
5. When the data has been QA’ed, it gets a first-pass, LGTM level of rigour
6. The data goes into production, and only when someone’s genuine attention is spent on it are issues found
The reality is, in our increasingly busy world where everyone’s attention is pulled in multiple directions, there are also fundamental cognitive patterns that make data assessment challenging, like our tendency to focus on single causes rather than seeing broader patterns. QA is no different: an end-user will often find a single issue with a dataset and bat it back, and the engineer will assume that fixing it will be the panacea for all future issues.
The exciting thing about AI is that it is great at finding distributed causes, and it brings a different kind of consistency to QA. While not perfect, it can systematically apply the same level of scrutiny across massive datasets without getting fatigued. As long as you have compute, you can keep running these checks.
Droughty QA: What It Does and How to Use It
Okay, so that’s why I have worked on this feature. I’ll now run through what it does and how to use it.
The flow is straightforward. Either development or production data is held in BigQuery, and end-user or developer QA assumptions are stored in the droughty_qa_assumptions.yaml file. A set of AI agents, orchestrated with LangGraph, is called upon to QA the data against those assumptions, and the traces are sent to LangSmith for observation and analysis.
It’s very simple to use. Document your assertions thusly:
datasets:
  looker_assets:
    tables:
      wh_core_customers_dim:
        columns:
          customer_email:
            - expectation: "Check if all customer emails are in valid email format, does it expose PII and I'd like it to be informative and suitable for a marketing persona"
          customer_pk:
            - expectation: "Check if all customer pks are unique"
          persona_type:
            - expectation: "Check against ISO 8000 standards and ensure all customers have a persona"
When you have droughty installed and your profile configured, you simply run this in the CLI:
droughty qa
You can see the logs and traces within LangSmith once you have set up your LangSmith API key in your droughty profile, using the langsmith_secret yaml key. In the LangSmith interface, you can see the enquiry that the AI agent underwent, the detail of the assumptions and the evidence, and whether the data passed QA. This observability is particularly powerful when combined with tooling such as Dagster, enabling real-time data quality monitoring in-flight.
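For reference, here’s a minimal sketch of the relevant profile entry. Only the langsmith_secret key comes from the behaviour described above; the placeholder value is illustrative.

# In your droughty profile (yaml); keep any existing profile settings as they are
langsmith_secret: "ls__xxxxxxxxxxxxxxxx"  # placeholder; use your own LangSmith API key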
You might be wondering: how is this any different from just a more verbose, non-deterministic dbt test or Dataform assertion? The QA from the agents is effectively semantic. It is far more nuanced and natural than a true/false test, and much closer to human QA. The QA agents better understand contextuality and congruence. I can write assumptions such as “Check if all customer emails are in valid email format, does it expose PII and I’d like it to be informative and suitable for a marketing persona”. There’s a subtlety to this assumption that a boolean test simply cannot capture.
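To make the contrast concrete, here is roughly what the closest traditional equivalent looks like: a dbt schema test on the same column. This snippet is illustrative rather than taken from any real project, and it can only ask yes/no questions:

version: 2
models:
  - name: wh_core_customers_dim
    columns:
      - name: customer_email
        tests:
          - not_null  # boolean: a value exists
          - unique    # boolean: no duplicates

These checks can confirm that emails exist and are distinct, but they have no way of asking whether an email exposes PII or reads well for a marketing persona. That nuance is exactly what the semantic assumptions capture.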
When you run the QA, you’ll see output like this:
Violations Found:

Column: customer_pk
Expectation: All customer PKs are anonymized, and they all start with the letter P.
Evidence: All values in the 'customer_pk' column are integers, none of them start with the letter P.

Met Expectations:

Column: customer_email
Expectation: Check if all customer emails are in a valid email format, ensure it does not expose PII, and make it informative and suitable for a marketing persona.
Evidence: All entries in the 'customer_email' column follow the format 'customer[number]@example.com', which is a valid email format. The emails do not expose any personally identifiable information (PII) as they only contain a generic 'customer' identifier and a number. This format is also suitable for a marketing persona as it allows for individual identification of customers.

Column: persona_type
Expectation: Check against ISO 8000 standards and ensure all customers have a persona.
Evidence: All entries in the 'persona_type' column have a persona assigned. The personas are 'Bargain Hunter', 'Casual Shopper', 'Loyal Customer', and 'Fashion Enthusiast'. There are no null or missing values. The data type is object (string), which is compliant with ISO 8000 standards.
Notice how the QA output reads more like feedback from a business user than a test result. The agents don’t just tell you whether something passed or failed — they provide context, evidence, and reasoning that a domain expert would care about. This semantic understanding is what sets it apart from traditional boolean tests.
Interested? Find Out More!
From the testing I have done to date, this QA agent is really useful. It could help reduce entropy within your data platform, shorten QA cycles, and save the people who have to perform them from ennui. It’ll be available within the 0.14.0 release at github.com/rittmananalytics/droughty.
Rittman Analytics is focused on deploying high-quality data platforms, and we’re pushing the boundaries of QA. This is part of our commitment to building high-quality, high-value data platforms for our clients.
We’re authorised delivery partners for Google Cloud along with Oracle, Segment, Cube, Dagster, Preset, dbt Labs and Fivetran and are experts at helping you build financial analytics and scalable, cloud-hosted data stacks that are designed for your organisation’s needs, use-cases and budget.
If you’re looking for some help and expertise with your Looker implementation or would just like to talk shop and share ideas and thoughts on what’s going on in your organisation and the wider data analytics world, contact us now to organise a 100%-free, no-obligation call — we’d love to hear from you!