Excel Professor: July 2014

As an analysis it’s my job to analyze data. Quite often I run across a common problem with dates exported from a mainframe job. Often the date is 6 characters long without any delimiter. Why oh why didn’t Y2K resolve this problem. Y2K??? you ask inquisitively?

For those not knowing, Y2K was a problem back in the late 1990s where computer programmers had to correct computer code. The problem was simple. When computers were new, memory was expensive so coders only represented the year with only 2 digits. Who would have thought that the code written in the late 70’s and early 80’s would still be in use?

As a result when the year 2000 rolled around, all the old code would show 00 for the year. We were told dire consequences would ensue; planes would drop from the sky, bank accounts would empty, worldwide panic. Fortunately none of that happened. But happily for thousands of programmers, secure jobs were available to verify millions of lines of code. The fix was simple; change the year to be 4 digits.

However there was also another fix, recode the handling of two digits so that the computer would know that 00 did not represent 1900, but rather 2000. And that leads me to today’s problem.

Whenever I see dates represented with only 6 digits I start by breaking the data down into columns for Month Day and Year. However I don’t know which column is the Month, Day or Year.

So I start by looking at the first column and see if there are numbers greater the 12. If there are, then the column can’t be the Month since there are only 12 months in a year. In this example one cell has a 13 in it so this column cannot represent a month.

Continuing with my example the third column also has numbers greater than 12 so that leaves the second column as my month.

Examples

03	09	17
12	06	11
10	05	14
07	07	13
09	03	13
03	11	06
03	06	27
10	06	09
05	06	01
05	08	28
04	05	23
01	08	30
08	07	15
05	06	28
11	12	31
13	06	07
07	01	03
08	05	01
04	04	30
11	08	09

So now the mystery is which is the day column and which is the year column. Here is where you need to know your data. With my work I mostly deal with data from the years 1995 through 2014. So 95 – 14. Column 1 has numbers 1 – 13. However column 3 has numbers that include 31, 30, 27 and 28. Hmmmm. It looks like a day column. And it’s easy enough to verify. If the second column is the month (and we have already determined that it is), then I can verify the month in the row with the 31 in the third column. If that month has 31 days, then I then know the day is the third column of data. In my above example I have a 12 in the second column (December).

So the second column is the Month and the 31 confirms that the third column is the day. That leaves the first column as the year.

So could I code this solution in Excel?

I start by identifying the max value in each column.

	A	B	C	D	E	F	G
1	3	9	17		Max Col 1	Max Col 2	Max Col 3
2	12	6	11		13	12	31
3	10	5	14		=MAX(A:A)	=MAX(B:B)	=MAX(C:C)
4	7	7	13
5	9	3	13

Well right off I know column 2 has to be month since it’s max value is 12 and column 1 and 3 max value is greater than 12.

I also see column 3 has a max value of 31. So that looks like the day.

So very quickly I can identify which columns are which.

Year Month Day

6 digit year analysis

Count value types in a column of data

03	09	17
12	06	11
10	05	14
07	07	13
09	03	13
03	11	06
03	06	27
10	06	09
05	06	01
05	08	28
04	05	23
01	08	30
08	07	15
05	06	28
11	12	31
13	06	07
07	01	03
08	05	01
04	04	30
11	08	09

03	09	17
12	06	11
10	05	14
07	07	13
09	03	13
03	11	06
03	06	27
10	06	09
05	06	01
05	08	28
04	05	23
01	08	30
08	07	15
05	06	28
11	12	31
13	06	07
07	01	03
08	05	01
04	04	30
11	08	09

03	09	17
12	06	11
10	05	14
07	07	13
09	03	13
03	11	06
03	06	27
10	06	09
05	06	01
05	08	28
04	05	23
01	08	30
08	07	15
05	06	28
11	12	31
13	06	07
07	01	03
08	05	01
04	04	30
11	08	09