Extracting numbers from the middle of a mixed string (text and numbers)

We often asked to make sense of imported data. Perhaps you are asked to extract information from within a string of data. Now in another post I showed how to extract numbers from the front of a string.
However, in this example I was asked to extract the middle numbers (starting with a 1 and going 5 digits). Well I broke out my mad crazy Excel skills and after an hour I came up with a solution utilizing 2 array functions, a Left, Mid, Find and Right function. I was smugly pleased with my result but it was not eloquent even though it solved my problem.
Line Description
85363D ATLANTA    18545  785050    369634011-10-05
85364D FLORIDA    11430  785050    369662011-10-05
85365D NY         11928  785050    363672011-10-05
85366D COLUS      13535  785050    810142011-10-05
85367D AMSTERDAM  14089  785050    810152011-10-05
85368D NEVA       14331  785050    369712011-10-05
85340D GEORGIA    14331  785050    369702011-10-05
If I had taken a moment to change the font to a true type (such as courier), I would have seen the following.
Line Description
85363D ATLANTA    18545  785050    369634011-10-05
85364D FLORIDA    11430  785050    369662011-10-05
85365D NEW YORK   11928  785050    363672011-10-05
85366D COLUMBUS   13535  785050    810142011-10-05
85367D AMSTERDAM  14089  785050    810152011-10-05
85368D NEVADA     14331  785050    369712011-10-05
85340D GEORGIA    14331  785050    369702011-10-05
With a courier font, all the columns that at first appeared to not line up, in fact did line up. In addition, in  every row that I needed, there was a common denominator (785050) (The above data is just a sample of the rows I needed - they were not grouped together in the original data). To use an old saying, "I was lost in the tall grass and could not see the obvious". This one column data that I was looking at was in a grouping of about 20 other columns and the outside distraction of those other columns helped obfuscate the pattern.
Well with this clear data I could scrap all my fancy array formulas and simple extract the data with the following code.
=IFERROR(MID(A2,(FIND(785050,A2,1))-7,5),"Not found")
I added the IFERROR because there were some records that did not have 785050.
So what does this formula do?
Well…
FIND(785050,A2,1)  - checks to see if the number 785050 is in my string that I am searching
MID(A2(  - uses the FIND to return the string I am looking for.
IFERROR – used to avoid #Error or N/A errors.
-7  - since I want to return the 5 digit number and I located the starting position of 785050, I needed to look to the left 7 characters to get my starting point.
Another way to get my answer would be to break out my data with Text To Columns. Since my data is aligned (and the record layout is apparent when I converted the text to courier), Text To Columns is another great way to break out my data without coding.
So what's the lesson to learn here? First always examine your data before you begin. You want to always compare apples to apples so convert the data to a true type font such as courier to help you identify patterns. 


No comments:

Post a Comment