Don’t Waste Your Time With Big Data

Volume 139, No. 16, October 10, 2014

Can we improve with data? Not without starting with a theory, as well as knowing how it was collected, by whom, and in what context.

Take the current flavor of the month: BIG DATA. Sounds great! BIG DATA… SUPER COMPUTERS, everything you wanted to know since the beginning of time, without having to roll up your sleeves or leave your desk, like the VP of Sales for a now-defunct company who balked at having to travel to visit and talk to customers when I advised it.

“A Desk is a Dangerous Place from Which To View The World” John Le Carré
Garbage in, garbage out; huge amounts of garbage in, huge amounts out.

I did an analysis for someone in the dairy industry in the State of Wisconsin a number of years ago, right before the final phase-out of European subsidies. I looked at the amount of goat cheese imported into the US, the aim of which was to define the scale of potential opportunities for US producers.

Having soiled my hands at this type of research before, I was pretty sure the available data would be tainted, so I used what I call Triangulation, which, if you have read my columns religiously, you will know means to check at least three sources and then:
• If one says one thing and the others something else, you really don’t know.
• If two say the same thing, either they are in collusion or you can have measured confidence.
• If all three say the same thing, there is probably some truth in it.
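
The three rules above can be sketched in a few lines of Python. This is only an illustration: the function name, the 10 percent tolerance for what counts as "saying the same thing," and the numbers in the note that follows are all assumptions, not part of the original study.

```python
from math import isclose

def triangulate(a, b, c, rel_tol=0.10):
    """Classify how well three estimates of the same quantity agree.

    rel_tol is an assumed tolerance: two figures within 10 percent of
    each other are treated as "saying the same thing".
    """
    pairs = [(a, b), (a, c), (b, c)]
    agreeing = sum(isclose(x, y, rel_tol=rel_tol) for x, y in pairs)
    if agreeing == 3:
        return "all"   # probably some truth in it
    if agreeing >= 1:
        return "two"   # collusion, copying, or measured confidence
    return "none"      # you really don't know
```

With made-up figures where one source reports 100, a second reports 200, and a third simply repeats the first, `triangulate(100, 200, 100)` returns `"two"`. The code cannot tell you whether two sources agree because they are right or because one copied the other; that still takes a phone call.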

So I chose to look at the US Customs Data, the European Union data and the United Nations FAO statistics. Oh, and I called up some power players in the industry who imported and sold goat cheese who owed me some favors.

Human Nature
According to the US Import Database, one amount of goat cheese was imported into the US the year before. According to the European database, double that amount was exported to the US. FAO just copied the US figures.

But, according to my contacts, just one company imported more into the US than any of the sources I had checked. Why the disconnect? The US import data comes from importers who MUST voluntarily report. Since there is a big difference in how much import duty you pay depending on the item, and the classifications are ambiguous, it would be reasonable to assume some advantageous classification is taking place. In fact, as I pointed out in my earlier column, the entire classification system was originally designed by the importers, at the request of the government. Hmmmm.

The EEC was phasing out years of subsidies, but because they were based on export, every single gram exported was reported by the producers! While goat cheese itself did not get a direct subsidy, in France they had applied the subsidies strategically to lower logistics costs. By that time volume had grown to where they were a moot point. At the time I did the study, you could move a piece of cheese from a farm in rural France all the way across the Atlantic to the docks in NY for the price it took to move it from the western half of the US.

FAO just took the statistic from the US, taking the data without knowing how well it was collected. It was a number I had seen quoted in the media a number of times.

Scan Data
I then looked at reported supermarket sales in the goat cheese category. That would seem to work a bit better, since the supermarket companies simply sell their scan data to the services and have no vested interest in fudging or inflating it.

BUT for the fact that when I was a supermarket executive, in a very well-run supermarket chain, we were able to achieve only about 80 percent scan accuracy, which means that 20 percent of the items didn’t scan correctly or at all, and had to be hand keyed with no identifier… so the statistics that everyone uses are at best incomplete, and probability theory doesn’t work in this situation. You either need 100 percent or random sampling.

The easy move would be to assume that the other 20 percent was similar to the 80, as most people do, and therein lies a BIG LIE, because you can’t. All you can know is you don’t know, oddly enough because the sampling wasn’t homogeneous or random. (Another column, my loyal readers.) And it only included markets that report their data, all of them chains and most not in high-volume specialty food markets.
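
To see how much room the unidentified 20 percent leaves, here is a rough sketch in Python. It is an illustration under stated assumptions, not the method used in the study: the function name and the 2 percent example share are hypothetical, and only the 80 percent coverage figure comes from the text above.

```python
def share_bounds(observed_share, coverage=0.80):
    """Range of possible true category shares when some items have no identifier.

    observed_share: the category's share among items that scanned correctly.
    coverage: fraction of items that scanned correctly (80 percent in the text).
    """
    unknown = 1 - coverage
    lower = coverage * observed_share            # none of the unidentified items are in the category
    upper = coverage * observed_share + unknown  # all of the unidentified items are in the category
    naive = observed_share                       # assumes the unidentified items look like the rest
    return lower, naive, upper
```

With a hypothetical observed share of 2 percent, `share_bounds(0.02)` gives a range of roughly 1.6 percent to 21.6 percent. The naive 2 percent is just one point in that range, and without random sampling nothing in the scan data says where in the range the truth sits.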

I reported that there is no way to know how much goat cheese is imported into the US, but it is certainly more than what the Customs data reports, and concluded with

A THEORY: there was probably a great deal of room for disciplined American producers to replace imports on supermarket shelves once the European subsidies were phased out.

I was right. A great deal of American-made goat cheese from disciplined producers has “replaced” imported. What was not foreseen was that many of those producers would be European companies that think more strategically and made “partnerships” with the American companies.

But the milk suppliers remain undisciplined and could not supply enough to meet demand for high-quality milk, and an even larger amount of frozen curd is being imported into the US.

So even incomplete, altered, and perhaps inaccurate data at the service of a good theory was able to reasonably predict and help at least some people. Big or small, it is not just the data and its pedigree; it’s knowing the context that can help you craft a good theory to test to make things get better. And here’s the catch.

One American company had already benefitted, Montchevre. I don’t know how they figured it out as it was before I collected the data, but I would venture it was more than luck.

What Data Should I Use?
Which raises the question: could there be a way to provide the data needed accurately? Sure, if the classification system were set up to match what really is and the data collected in every port of entry were based on randomly verified bills of lading, input into a super computer, integrated and collated and spit out and interpreted correctly.

Lots of work for a small producer to get this done, so stick to little data from good sources, and trust going to where your customers are and interacting with them. If you do choose data, use data like actual quantities sold on invoice or at retail.

In the absence of actual, do your best. Make a theory and test it. If you are watching what is real, like money, pounds of cheese sold, etc., and your theory is right, it should foretell a change in the results. If not, you have a bum theory, so go back to the drawing board, and always worry about the context, the meaning hidden in the data, not the data itself. In other words, take the measure back to where it is REAL.

Dan Strongin runs a training and consulting company focused on delivering affordable online solutions to everyday business problems, including his udemy course: Understand Your Business, Earn More Money. Dan can be reached via email at or by phone at (408) 512-1086, or you can visit and blog or get discounts on his courses on his site:

Dan Strongin encourages your comments regarding this column. Comments can be made anonymously to

