Be constantly curious
Ask questions. Don’t be afraid to ask strange questions (if they might offend, keep them to yourself, but question anyway). In data analysis it is good to ask about the apparently obvious things, and to question the established way and its assumptions. Assumptions, limitations and status change over time, so question whether existing assumptions still stand. Think like a child: why, why, why.
Frequently ask: I wonder what…, I wonder how… Have a real wish to understand things.
Think like a child
Ask a lot and question obvious things. Adults overcomplicate matters; a child looks for simple explanations.
Think small. A big problem space has too much complexity; a smaller one is easier to tackle and quicker to yield results.
Go for virgin territory. In fields that have already been explored, it is less likely that you will find new things, unless there are new ways to look at them.
Try to understand by removing emotional biases. Seeing things detached, from the outside, is also required to be impartial.
Crime, deception, and ugly or wrong things in general bring loads of emotion. Seek to understand why they happen without emotion. There is a reason. I would argue that, in general, any person put in the same situation (current incentives), with the same upbringing (past history) and the same physiology (a mental illness, for example), would do exactly the same.
Saying “I don’t know” is a good thing
It is important to say “I don’t know” (even if only privately). People are biased against saying it, because it can be seen as a sign of lack of knowledge. But only after recognizing that we don’t fully understand something can we actually start the process of finding out the truth.
Future predicting is tricky
Guesses about the future are overrated and less accurate than they seem.
We sometimes hear of a guess that correctly predicted the future, and praise it to no end. What we are missing are all the wrong guesses that we never hear about. Guessing wrong (in general) has no cost: people often forget a prediction when it is wrong, but praise it when it is right.
In essence: the incentives are wrong, and biased in favor of guessing, since only the correct guesses are remembered.
The only way to know whether predictions are accurate is to keep track of both the wrong and the right guesses. Some studies mentioned in the book Think Like a Freak say predictions are about as accurate as random guesses.
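Keeping track of both right and wrong guesses can be as simple as a log with one row per prediction. The sketch below is only an illustration: the `Prediction` class, the `hit_rate` function and every entry in the log are made up for this example, not taken from the book.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    claim: str
    predicted: bool  # what the forecaster said would happen
    happened: bool   # what actually happened


def hit_rate(log):
    """Fraction of predictions that matched the real outcome."""
    if not log:
        raise ValueError("no predictions recorded")
    hits = sum(p.predicted == p.happened for p in log)
    return hits / len(log)


# Hypothetical log, invented for illustration only.
log = [
    Prediction("Product X sells out in a week", True, False),
    Prediction("Team A wins the final", True, True),
    Prediction("Rates rise this quarter", True, False),
    Prediction("Competitor exits the market", False, False),
]

# What we tend to remember: only the guesses that turned out right.
remembered = [p for p in log if p.predicted == p.happened]

print(f"hit rate over the full log: {hit_rate(log):.2f}")
print(f"hit rate over remembered guesses: {hit_rate(remembered):.2f}")
```

Scoring only the remembered guesses always gives a perfect record, which is exactly the bias described above. The full log is what can be compared against a coin flip.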
A key to learning is feedback. We need to know if we are on the right path, so we need a measure of success.
Experiments (A/B tests) are a way of learning. Sometimes, when people know they are part of an experiment, that knowledge influences the outcome. Observational experiments are better in that respect: we infer something from a naturally occurring change. For example, a particularly cold winter would allow us to compare behavior against a previous, normal winter.
Look up the literature on how to properly set up experiments. There are specific techniques that help: double-blind experiments, A/A tests, etc…
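As a rough illustration of how an A/A test and an A/B test are read, here is a minimal sketch using a two-proportion z-test. The choice of test, the function name `two_proportion_p_value` and all the counts are assumptions made for this example; a real experiment needs a proper design (sample size, stopping rule) from the literature.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts, invented for illustration only.
# A/A test: both groups get the same version, so no real difference is expected.
p_aa = two_proportion_p_value(100, 1000, 104, 1000)
# A/B test: group B gets the changed version.
p_ab = two_proportion_p_value(100, 1000, 150, 1000)

print(f"A/A p-value: {p_aa:.3f}")
print(f"A/B p-value: {p_ab:.4f}")
```

If the A/A comparison came out "significant", that would point to a problem in the setup (for example, how users are assigned to groups) rather than to a real effect.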
One of the cornerstones of why things happen: people respond to incentives.
That’s how governments work: they change incentives so that people change their habits and ways of doing things.
More tax -> less purchases.
Less paperwork to create a company -> more companies will be created.
Lower salaries for school teachers -> teachers will move to better paying jobs when they can, potentially leaving less able teachers in the school system (?)
Find Root Causes
Look for root causes, not symptoms. Where is it coming from? Where are the incentives? What is the past history? What influenced it? Try to surface the real underlying causes.
Things always have a reason to be, an explanation, that is often:
A consequence of:
- Past history
- Current incentives
- (sometimes) other variables
The variables might be unknown, but do trust that they exist (trust in science). Part of the work in data analysis is finding and verifying those variables.
Past history example
One reason for many armed conflicts in Africa is that colonizers divided the continent artificially, leaving different ethnic groups inside the same “country”, which tends to end in war.
Structured data analysis
Have a structured way to approach data analysis problems. Be aware of a collection of tools and techniques that help you tackle the problem.
Make time to think: block out thinking time.
Set up the problem in a structured way: properly define and re-define the problem, avoid the noise, and try to understand it well.
Tackling a problem
(Remember the hot dog eating competition from the book Think Like a Freak.) Practice it from the end user’s point of view. Question current practice, look for improvements and try them out in a scientific way: measure and analyze. Ignore mental barriers (the currently known limits); they are self-made obstacles.
When working with numbers, it is important to be exact, have a sharp eye, and be detailed and organized. It is very easy to miss something in a sea of figures. You need to put on the OCD hat.
Look for the truth
This is a key part of doing data analysis. Keep it in mind and keep reminding yourself of it.
As the analysis gets more involved, as more emotions come into play, as people argue on different sides of it, as politics starts to influence those sides, etc… the truth is the fixed, solid ground that the analyst can (and should) hold on to and safely pursue.
Set Behavior Traps
Teach the garden to weed itself
Sometimes it is possible to find creative ways to set traps that a person at fault is very likely to get caught in.
This taps into predictable human behavior.
Examples: the brown M&Ms clause used to check whether people had fully read the conditions for Van Halen’s concerts, and the Freakonomics authors suggesting that terrorists get life insurance, which would make them stand out for a strong positive identification.
When sharing a finding, or trying to get the resources required for an involved analysis, you need to know how to get the message across.
It is important to understand people and to be able to communicate well.
When approaching someone:
- Give downsides - there are always some, be honest.
- Take into account the other person’s opinion; it might also reveal new ideas.
- Avoid name calling; negative comments push people into a defensive position.
- Most important: Tell a story - That’s the best way to communicate, teach a concept in a memorable way.
Quitting and Investment
Only after trying many flavors can we choose the best one.
Beware of the sunk cost fallacy. Quitting forces you to go to new places and often opens up new possibilities.
Investing is about buying into many properties and expecting some of them to succeed and some of them to fail (and to need quitting).
- Any of the Freakonomics books
- Think Like a Freak, for a how-to
Imagine you are toying around with some idea and come up with a new question: what is the size of that, what is the total impact, could this be relevant, I wonder how many times this happens?
“I would imagine it to be very low”, for example. This is a guesstimate. Even if somewhat vague, it is the quickest, intuition-driven way to size something up.
Guesstimates are useful as:
- Quick, up-front best guesses of sizes or impacts before looking at the data. A time saver.
- A way to calculate values we don’t have data for, when the data is too expensive to get. Sometimes a key factor.
Often, people develop this skill naturally in specific areas when repeatedly exposed to them. For example, imagine you want to keep track of your caloric intake: after a year of using an application to look up the calories of each food, you will naturally start memorizing them and developing an intuition for how many calories are in each food type.
Intuition about the problem and creativity play a big role, but with practice and structure this skill can be developed further and become a great tool to have. It is also fun, and seems to be popular in modern job interviews as a way to evaluate creative and organized reasoning.
- Bound values: guess the upper and lower limits, start with extreme values, and keep closing the gap until you cannot confidently narrow it any further. Example: how tall is a 10-story building? For sure each floor is more than 1 m and less than 5 m high, so between 10 m and 50 m. Each floor has to accommodate people, so it is at least 2 m high but likely no more than 4 m, so a tighter guess is between 20 m and 40 m, etc…
- Proxy: when it is not possible to directly get the data you are looking for, what is the closest data you have for it?
- Be aware of common sizes that can be used as a reference. For example, if you are in the business of online marketing, know the typical conversion rate of an email campaign.
- Practice calculating numbers without a calculator; it will speed up your guesstimating.
- Data Analysis with Open Source Tools
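The bound-values technique from the building example above can be sketched in a few lines. The function name `scale_bounds` is made up for this illustration, and taking the geometric mean of the final bounds as a single best guess is an added assumption (a common habit in back-of-the-envelope estimation), not something stated in the notes.

```python
from math import sqrt


def scale_bounds(per_unit_low, per_unit_high, units):
    """Turn per-unit lower/upper guesses into bounds for the total."""
    if per_unit_low > per_unit_high:
        raise ValueError("lower bound must not exceed upper bound")
    return per_unit_low * units, per_unit_high * units


floors = 10

# First pass: extreme values, each floor between 1 m and 5 m.
loose = scale_bounds(1, 5, floors)
# Second pass: each floor must fit people, so between 2 m and 4 m.
tight = scale_bounds(2, 4, floors)

# Assumption: collapse the final range into one number with the geometric mean.
best_guess = sqrt(tight[0] * tight[1])

print(f"loose bounds: {loose[0]} m to {loose[1]} m")
print(f"tight bounds: {tight[0]} m to {tight[1]} m")
print(f"single best guess: about {best_guess:.0f} m")
```

Each pass only needs one new piece of reasoning to tighten the range, which is what makes the technique quick to apply by hand.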