
string – reading dates from xls to csv through Perl

Posted by: admin, May 14, 2020

Questions:

I have a batch of Excel files with lines like

1/13/04 21

I am trying to convert them to .csv, but find that the line is converted to

36537,21

It turns out this is a side effect of Excel’s storage rules. Excel should store dates as days since Jan 1, 1900. By that rule, this is the wrong integer: it corresponds to Jan 12, 2000, not Jan 13, 2004 (which is the date meant by 1/13/04).

  • How on earth could Excel make that mistake?
  • And how can I get the raw unformatted value, sidestepping the conversion in effect here?
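As a quick arithmetic check of the serial number above, here is a minimal sketch using only core Perl. The helper name serial1900_to_iso is made up for illustration. It assumes the 1900 date system and relies on the fact that serial 25569 is 1970-01-01 in that system, so it is only valid for serials after Feb 28, 1900.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: decode an Excel serial under the 1900 date system.
# Serial 25569 is 1970-01-01, so subtracting it yields days since the
# Unix epoch (valid for serials after 1900-02-28).
sub serial1900_to_iso {
    my ($serial) = @_;
    my $unix_seconds = ( int($serial) - 25569 ) * 86400;
    return strftime( '%Y-%m-%d', gmtime($unix_seconds) );
}

print serial1900_to_iso(36537), "\n";    # 2000-01-12
```

Read this way, 36537 is almost exactly four years earlier than the date shown in the sheet.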

This is a rough sketch of the code:

my $xlsparser = Spreadsheet::ParseExcel->new();
my $xlsbook = $xlsparser->Parse('xls_test.xls');
my $xls = $xlsbook->{Worksheet}[0];
my $csv = '';

# then a loop over rows and columns with...
  my $cell = $xls->get_cell( $row, $col );
  $cellcon = $cell->unformatted();
  $csv .= $cellcon; 

In case my exposition isn’t clear enough or you can’t reproduce the issue, here is a minimal data set and script that reproduce it for me:

https://dl.dropboxusercontent.com/u/58760/softwareGrr/xls_example.pl
https://dl.dropboxusercontent.com/u/58760/softwareGrr/junk.xls

How to & Answers:

If you want to convert an Excel date serial value (such as the 36537 in 36537,21) into date and time variables in Perl, you can write your own conversion functions.
The functions are below:

sub date2excelvalue {
  my ($day1, $month, $year, $hour, $min, $sec) = @_;
  my @cumul_d_in_m = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365);
  my $doy = $cumul_d_in_m[$month - 1] + $day1;

  # full years + your day
  for my $y (1900 .. $year) {
    if ($y == $year) {
      if ($month <= 2) {
        # don't add the extra leap day manually in January or February
        last;
      }
      if ((($y % 4 == 0) && ($y % 100 != 0)) || ($y % 400 == 0) || ($y == 1900)) {
        $doy++;    # leap year (Excel also treats 1900 as one)
      }
    } else {
      # full years
      $doy += 365;
      if ((($y % 4 == 0) && ($y % 100 != 0)) || ($y % 400 == 0) || ($y == 1900)) {
        $doy++;    # leap year
      }
    }
  }    # end for y

  # calculate the time part as a fraction of 86400 seconds
  my $excel_decimaltimepart = 0;
  my $total_seconds_from_time = ($hour * 60 * 60 + $min * 60 + $sec);
  if ($total_seconds_from_time == 86400) {
    $doy++;    # just add a day
  } else {
    # add the decimal part; fixed-point formatting avoids exponent
    # notation (e.g. 1e-05) for times just after midnight
    $excel_decimaltimepart = sprintf('%.15f', $total_seconds_from_time / 86400);
    $excel_decimaltimepart =~ s/^0\.//;    # keep only the digits after the point
  }
  return "$doy.$excel_decimaltimepart";
}

sub excelvalue2date {
  my ($excelvalueintegerpart, $excelvaluedecimalpart) = @_;
  my @cumul_d_in_m = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365);
  my @cumul_d_in_m_leap = (0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366);
  my @cumul_d_in_m_selected;
  my ($day1, $month, $year, $hour, $min, $sec);
  $day1 = 0;    # days accumulated over all full years so far
  my $days_in_year;
  my $day;

  # full years + your day
  for my $y (1900 .. 3000) {
    my $leap_year = 0;
    if ((($y % 4 == 0) && ($y % 100 != 0)) || ($y % 400 == 0) || ($y == 1900)) {
      $leap_year = 1;    # leap year (Excel also treats 1900 as one)
      @cumul_d_in_m_selected = @cumul_d_in_m_leap;
    } else {
      @cumul_d_in_m_selected = @cumul_d_in_m;
    }

    # ">=" rather than ">" so the last day of a year (Dec 31) stays in that year
    if (($day1 + (365 + $leap_year)) >= $excelvalueintegerpart) {
      # found this year $y
      $year = $y;
      print "year $y\n";

      $days_in_year = $excelvalueintegerpart - $day1;
      print "excelvalueintegerpart  $excelvalueintegerpart\n";
      print "day1  $day1\n";
      print "daysinyear $days_in_year\n";
      for my $i (0 .. $#cumul_d_in_m - 1) {
        if (($days_in_year > $cumul_d_in_m_selected[$i]) && ($days_in_year <= $cumul_d_in_m_selected[$i + 1])) {
          $month = $i + 1;
          $day = $days_in_year - $cumul_d_in_m_selected[$i];
          last;
        }
      }    # end for $i months

      # end year
      last;
    } else {
      # full years
      $day1 += (365 + $leap_year);
    }
  }    # end for years, integer part comparator

  # round to the nearest second so that e.g. .333333333333333 gives 08:00:00
  my $total_seconds_inaday = int("0.$excelvaluedecimalpart" * 86400 + 0.5);

  $sec = $total_seconds_inaday;
  $hour = int($sec / (60 * 60));
  $sec -= $hour * (60 * 60);
  $min = int($sec / 60);
  $sec -= $min * 60;
  return ($day, $month, $year, $hour, $min, $sec);
}
my $excelvariable = date2excelvalue(1, 3, 2018, 14, 14, 30);
print "Excel variable: $excelvariable\n";
my ($integerpart, $decimalwithoutzero) = $excelvariable =~ m/(\d+)\.(\d+)/;
my ($day1, $month, $year, $hour, $min, $sec) = excelvalue2date($integerpart, $decimalwithoutzero);
print "Excel Date from value: $day1, $month, $year, $hour, $min, $sec\n";
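The hand-rolled year loop can be cross-checked with a shorter calculation based on the core Time::Local module. This is only a sketch: ymdhms_to_serial is a made-up name, not part of the functions above. It assumes the 1900 date system and dates from March 1, 1900 onward, and it returns a number rather than a string.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: calendar date and time -> Excel serial (1900 system).
# 25569 is the serial of 1970-01-01, so days since the Unix epoch plus
# 25569 gives the integer part; the time of day gives the fraction.
sub ymdhms_to_serial {
    my ( $year, $month, $day, $hour, $min, $sec ) = @_;
    my $midnight = timegm( 0, 0, 0, $day, $month - 1, $year );
    my $days     = $midnight / 86400 + 25569;
    my $fraction = ( $hour * 3600 + $min * 60 + $sec ) / 86400;
    return $days + $fraction;
}

printf "%.6f\n", ymdhms_to_serial( 2018, 3, 1, 14, 14, 30 );    # 43160.593403
```

The integer part, 43160, matches what date2excelvalue computes for the same date.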

Enjoy it!

Answer:

The problematic line was

$cellcon = $cell->unformatted();

Unless someone can offer a better explanation, I’ll regard this as a bug. The line that I substituted was

$cellcon = $cell->Value;
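
One possible explanation, offered as an assumption because the workbook itself is not inspected here: Excel also supports a 1904 date system (historically the default in Excel for Mac), in which serials count from Jan 1, 1904 and are 1462 days smaller than in the 1900 system. Under that reading 36537 is exactly the date shown in the sheet, and unformatted() would simply be returning the stored serial. Spreadsheet::ParseExcel documents a using_1904_date() method on the workbook object that reports which system a file uses. The sketch below uses only core Perl, and serial_to_mdy is a made-up helper name.

```perl
use strict;
use warnings;

# Days between the two Excel epochs (1900 system vs. 1904 system).
use constant EPOCH_1904_OFFSET => 1462;

# Hypothetical helper: interpret a serial under either date system and
# format it as m/d/yy. Serial 25569 is 1970-01-01 in the 1900 system.
sub serial_to_mdy {
    my ( $serial, $is_1904 ) = @_;
    my $serial_1900 = int($serial) + ( $is_1904 ? EPOCH_1904_OFFSET : 0 );
    my @t = gmtime( ( $serial_1900 - 25569 ) * 86400 );
    return sprintf( '%d/%d/%02d', $t[4] + 1, $t[3], ( $t[5] + 1900 ) % 100 );
}

print serial_to_mdy( 36537, 0 ), "\n";    # 1/12/00
print serial_to_mdy( 36537, 1 ), "\n";    # 1/13/04
```

If the file does use the 1904 system, the serial is correct and only needs the right epoch when converting it by hand; $cell->Value sidesteps the problem by returning the formatted text.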