We often asked to make sense of imported data. Perhaps you are asked to extract information from within a string of data. Now in another post I showed how to extract numbers from the front of a string.
However, in this example I was asked to extract the middle numbers (starting with a 1 and going 5 digits). Well I broke out my mad crazy Excel skills and after an hour I came up with a solution utilizing 2 array functions, a Left, Mid, Find and Right function. I was smugly pleased with my result but it was not eloquent even though it solved my problem.
Line Description
85363D ATLANTA 18545 785050 369634011-10-05
85364D FLORIDA 11430 785050 369662011-10-05
85365D NY 11928 785050 363672011-10-05
85366D COLUS 13535 785050 810142011-10-05
85367D AMSTERDAM 14089 785050 810152011-10-05
85368D NEVA 14331 785050 369712011-10-05
85340D GEORGIA 14331 785050 369702011-10-05
If I had taken a moment to change the font to a true type (such as courier), I would have seen the following.
Line Description
85363D ATLANTA 18545 785050 369634011-10-05
85364D FLORIDA 11430 785050 369662011-10-05
85365D NEW YORK 11928 785050 363672011-10-05
85366D COLUMBUS 13535 785050 810142011-10-05
85367D AMSTERDAM 14089 785050 810152011-10-05
85368D NEVADA 14331 785050 369712011-10-05
85340D GEORGIA 14331 785050 369702011-10-05
With a courier font, all the columns that at first appeared to not line up, in fact did line up. In addition, in every row that I needed, there was a common denominator (785050) (The above data is just a sample of the rows I needed - they were not grouped together in the original data). To use an old saying, "I was lost in the tall grass and could not see the obvious". This one column data that I was looking at was in a grouping of about 20 other columns and the outside distraction of those other columns helped obfuscate the pattern.
Well with this clear data I could scrap all my fancy array formulas and simple extract the data with the following code.
=IFERROR(MID(A2,(FIND(785050,A2,1))-7,5),"Not found")
I added the IFERROR because there were some records that did not have 785050.
So what does this formula do?
Well…
FIND(785050,A2,1) - checks to see if the number 785050 is in my string that I am searching
MID(A2( - uses the FIND to return the string I am looking for.
IFERROR – used to avoid #Error or N/A errors.
-7 - since I want to return the 5 digit number and I located the starting position of 785050, I needed to look to the left 7 characters to get my starting point.
Another way to get my answer would be to break out my data with Text To Columns. Since my data is aligned (and the record layout is apparent when I converted the text to courier), Text To Columns is another great way to break out my data without coding.
So what's the lesson to learn here? First always examine your data before you begin. You want to always compare apples to apples so convert the data to a true type font such as courier to help you identify patterns.
No comments:
Post a Comment