### 6 digit year analysis

As an analysis it’s my job to analyze data. Quite often I run across a common problem with dates exported from a mainframe job. Often the date is 6 characters long without any delimiter. Why oh why didn’t Y2K resolve this problem. Y2K??? you ask inquisitively?

For those not knowing, Y2K was a problem back in the late 1990s where computer programmers had to correct computer code. The problem was simple. When computers were new, memory was expensive so coders only represented the year with only 2 digits. Who would have thought that the code written in the late 70’s and early 80’s would still be in use?

As a result when the year 2000 rolled around, all the old code would show 00 for the year. We were told dire consequences would ensue; planes would drop from the sky, bank accounts would empty, worldwide panic. Fortunately none of that happened. But happily for thousands of programmers, secure jobs were available to verify millions of lines of code. The fix was simple; change the year to be 4 digits.

However there was also another fix, recode the handling of two digits so that the computer would know that 00 did not represent 1900, but rather 2000. And that leads me to today’s problem.
Whenever I see dates represented with only 6 digits I start by breaking the data down into columns for Month Day and Year. However I don’t know which column is the Month, Day or Year.

So I start by looking at the first column and see if there are numbers greater the 12. If there are, then the column can’t be the Month since there are only 12 months in a year.  In this example one cell has a 13 in it so this column cannot represent a month.

Continuing with my example the third column also has numbers greater than 12 so that leaves the second column as my month.

Examples
 03 09 17 12 06 11 10 05 14 07 07 13 09 03 13 03 11 06 03 06 27 10 06 09 05 06 01 05 08 28 04 05 23 01 08 30 08 07 15 05 06 28 11 12 31 13 06 07 07 01 03 08 05 01 04 04 30 11 08 09

So now the mystery is which is the day column and which is the year column. Here is where you need to know your data. With my work I mostly deal with data from the years 1995 through 2014. So 95 – 14. Column 1 has numbers 1 – 13. However column 3 has numbers that include 31, 30, 27 and 28. Hmmmm. It looks like a day column. And it’s easy enough to verify. If the second column is the month (and we have already determined that it is), then I can verify the month in the row with the 31 in the third column. If that month has 31 days, then I then know the day is the third column of data. In my above example I have a 12 in the second column (December).

 11 12 31

So the second column is the Month and the 31 confirms that the third column is the day. That leaves the first column as the year.

So could I code this solution in Excel?

I start by identifying the max value in each column.

 A B C D E F G 1 3 9 17 Max Col 1 Max Col 2 Max Col 3 2 12 6 11 13 12 31 3 10 5 14 =MAX(A:A) =MAX(B:B) =MAX(C:C) 4 7 7 13 5 9 3 13

Well right off I know column 2 has to be month since it’s max value is 12 and column 1 and 3 max value is greater than 12.

I also see column 3 has a max value of 31. So that looks like the day.

So very quickly I can identify which columns are which.

Year  Month  Day