COUNTIF to identify patterns in your data

Count If can be used to identify patterns in your data. For instance… It’s quite common to have addresses not broken out neatly. This can often be the case with imported data.
In this example I would like to know how many records of addresses are from North Carolina. However my data is not consistent. The state could be referenced as NC or North Carolina. It could also be misspelled as in cell A6 or joined together as in cell A5.
I can use the =COUNTIF formula with wildcards to give me my answer.
However I just can’t rush into my formula. I need to examine my data to identify the best solution.  If I was to search for North, I would get an incorrect count since there is an instance of the word North in Cell A8 that is not part of the state name North Carolina. Also Cell A3 does not have the word North in it.

If I search for the word Carolina with the below formula I also get an incorrect answer as there is a typo for Carolina in cell A6 (don’t you just hate typos).

=COUNTIF(A3:A8,"*Carolina*") + COUNTIF(A3:A8,"*NC*")

So I hedge my formula to avoid the common typos for my search and use the following formula…
=COUNTIF(A3:A8,"*C*ina*") + COUNTIF(A3:A8,"*NC*")
Basically I want to see if there is any short name reference to North Carolina (NC) or if there is any portion of Carolina in the range.
Now data validation is always key when you are asked to produce hard numbers. However using wildcards can help eliminate bad data from your results. 

How would you have approached the solution to this problem?

No comments:

Post a Comment