DWH issue under non-default Timezone configuration



Hello, everyone.

This is a report of a DWH issue and how we resolved it.
Our Kaltura CE servers (13.12.0 and 14.1.0) use “Asia/Tokyo” or “Japan” for both the server timezone and PHP’s timezone setting.
One system is a 7-server cluster running version 13.12.0.
The other is a single server running version 14.1.0.
On both systems, the “kaltura-dwh” package is “kaltura-dwh-12.14.0-1.noarch”, and the following errors occurred when running “/opt/kaltura/dwh/etlsource/execute/etl_hourly.sh”.

INFO  10-07 04:00:14,634 - Create output files - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
INFO  10-07 04:00:14,635 - Mapping input specification - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
INFO  10-07 04:00:14,743 - Enrich cycle_id and file_id - play - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
INFO  10-07 04:00:14,837 - iterate file - Opening file: /opt/kaltura/dwh/cycles/process/4/cak02bs.cc.yamaguchi-u.ac.jp-kaltura_apache_access_ssl.log-20180710-03
INFO  10-07 04:00:14,841 - parse bandwidth lines - Optimization level set to 9.
INFO  10-07 04:00:14,842 - parse playManifest line - Optimization level set to 9.
INFO  10-07 04:00:14,851 - parse playManifest line - Optimization level set to 9.
INFO  10-07 04:00:14,852 - decode http string - Optimization level set to 9.
INFO  10-07 04:00:14,857 - parse bandwidth lines - Optimization level set to 9.
INFO  10-07 04:00:14,865 - decode http string - Optimization level set to 9.
ERROR 10-07 04:00:19,058 - parse bandwidth lines - Unexpected error
ERROR 10-07 04:00:19,059 - parse bandwidth lines - org.pentaho.di.core.exception.KettleValueException:
Javascript error:
Could not apply the given format dd/MMM/yyyy:HH:mm:ss on the string for 09/Jul/2018:09:56:29 : Format.parseObject(String) failed (script#15)

        at org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesMod.addValues(ScriptValuesMod.java:457)
        at org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesMod.processRow(ScriptValuesMod.java:688)
        at org.pentaho.di.trans.step.RunThread.run(RunThread.java:40)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.mozilla.javascript.EvaluatorException: Could not apply the given format dd/MMM/yyyy:HH:mm:ss on the string for 09/Jul/2018:09:56:29 : Format.parseObject(String) failed (script#15)
        at org.mozilla.javascript.DefaultErrorReporter.runtimeError(DefaultErrorReporter.java:109)
        at org.mozilla.javascript.Context.reportRuntimeError(Context.java:938)
        at org.mozilla.javascript.Context.reportRuntimeError(Context.java:994)
        at org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesAddedFunctions.str2date(ScriptValuesAddedFunctions.java:909)
        at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.mozilla.javascript.MemberBox.invoke(MemberBox.java:161)
        at org.mozilla.javascript.FunctionObject.call(FunctionObject.java:413)
        at org.mozilla.javascript.optimizer.OptRuntime.callName(OptRuntime.java:97)
        at org.mozilla.javascript.gen.c15._c0(script:15)
        at org.mozilla.javascript.gen.c15.call(script)
        at org.mozilla.javascript.ContextFactory.doTopCall(ContextFactory.java:398)
        at org.mozilla.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3065)
        at org.mozilla.javascript.gen.c15.call(script)
        at org.mozilla.javascript.gen.c15.exec(script)
        at org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesMod.addValues(ScriptValuesMod.java:376)
        ... 3 more

INFO  10-07 04:00:19,059 - parse bandwidth lines - Finished processing (I=0, O=0, R=5181, W=5180, U=0, E=1)
INFO  10-07 04:00:19,059 - process file - process file
INFO  10-07 04:00:19,060 - process file - process file
ERROR 10-07 04:00:19,060 - parse bandwidth lines - Unexpected error
ERROR 10-07 04:00:19,060 - parse bandwidth lines - org.pentaho.di.core.exception.KettleValueException:
Javascript error:
Could not apply the given format dd/MMM/yyyy:HH:mm:ss on the string for 09/Jul/2018:09:56:29 : Format.parseObject(String) failed (script#15)

        at org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesMod.addValues(ScriptValuesMod.java:457)
        at org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesMod.processRow(ScriptValuesMod.java:688)
        at org.pentaho.di.trans.step.RunThread.run(RunThread.java:40)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.mozilla.javascript.EvaluatorException: Could not apply the given format dd/MMM/yyyy:HH:mm:ss on the string for 09/Jul/2018:09:56:29 : Format.parseObject(String) failed (script#15)
        at org.mozilla.javascript.DefaultErrorReporter.runtimeError(DefaultErrorReporter.java:109)
        at org.mozilla.javascript.Context.reportRuntimeError(Context.java:938)
        at org.mozilla.javascript.Context.reportRuntimeError(Context.java:994)
        at org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesAddedFunctions.str2date(ScriptValuesAddedFunctions.java:909)
        at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.mozilla.javascript.MemberBox.invoke(MemberBox.java:161)
        at org.mozilla.javascript.FunctionObject.call(FunctionObject.java:413)
        at org.mozilla.javascript.optimizer.OptRuntime.callName(OptRuntime.java:97)
        at org.mozilla.javascript.gen.c19._c0(script:15)
        at org.mozilla.javascript.gen.c19.call(script)
        at org.mozilla.javascript.ContextFactory.doTopCall(ContextFactory.java:398)
        at org.mozilla.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3065)
        at org.mozilla.javascript.gen.c19.call(script)
        at org.mozilla.javascript.gen.c19.exec(script)
        at org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesMod.addValues(ScriptValuesMod.java:376)
        ... 3 more

Because of these errors, the DWH did not work correctly, and “plays”, “views”, “bandwidth”, etc. were not updated.
To fix this, we modified “/opt/kaltura/dwh/etlsource/events/process/process_file.ktr” and “/opt/kaltura/dwh/etlsource/common/parse_date_to_dwh_timezone.ktr”.

“/opt/kaltura/dwh/etlsource/events/process/process_file.ktr” was modified as follows:

# diff process_file.ktr.org process_file.ktr.new
3719c3719
< var parnterRegex = &quot;GET .*&#47;p&#47;([0-9]+)&#47;.*&quot;;
---
> var partnerRegex = &quot;GET .*&#47;p&#47;([0-9]+)&#47;.*&quot;;
3721c3721
< resArr = str2RegExp(http_string, parnterRegex);
---
> resArr = str2RegExp(http_string, partnerRegex);
3726c3726
<       activityDate = str2date(substr(raw_datetime,1),&apos;dd&#47;MMM&#47;yyyy:HH:mm:ss&apos;);
---
>       activityDate = str2date(substr(raw_datetime, 1, 20) + timezone_offset, &quot;dd&#47;MMM&#47;yyyy:HH:mm:ssZ&quot;, &quot;EN&quot;);
4004c4004
< eventTime = str2date(substr(raw_datetime,1),&apos;dd&#47;MMM&#47;yyyy:HH:mm:ss&apos;);
---
> eventTime = str2date(substr(raw_datetime + timezone_offset, 1, 20) + timezone_offset, &quot;dd&#47;MMM&#47;yyyy:HH:mm:ssZ&quot;, &quot;EN&quot;);
4409c4409
< var parnterRegex = &quot;GET &#47;p&#47;([0-9]+)&#47;sp&#47;[0-9]+&#47;playManifest&#47;entryId&#47;([a-zA-Z0-9_]+)&#47;.*&quot;;
---
> var partnerRegex = &quot;GET &#47;p&#47;([0-9]+)&#47;sp&#47;[0-9]+&#47;playManifest&#47;entryId&#47;([a-zA-Z0-9_]+)&#47;.*&quot;;
4411c4411
< resArr = str2RegExp(http_string, parnterRegex);
---
> resArr = str2RegExp(http_string, partnerRegex);
4413c4413
< if (resArr!=null &amp;&amp; resArr.length &gt; 0 &amp;&amp; host != &apos;cdnapi.kaltura.com&apos; &amp;&amp; host != &apos;cdnapisec.kaltura.com&apos;)
---
> if (resArr!=null &amp;&amp; resArr.length &gt; 0 )
4415c4415
<       playDate = str2date(substr(raw_datetime,1),&apos;dd&#47;MMM&#47;yyyy:HH:mm:ss&apos;);
---
>       playDate = str2date(substr(raw_datetime, 1, 20) + timezone_offset, &quot;dd&#47;MMM&#47;yyyy:HH:mm:ssZ&quot;, &quot;EN&quot;);
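
The date-related changes in this diff add an explicit locale (“EN”) and the timezone offset to each str2date call. One plausible explanation for the original failure is that the two-argument form falls back to the JVM default locale, and a Japanese locale does not recognise the English month abbreviation “Jul”. The following is only a minimal sketch of that effect in plain Java, outside Kettle; the class name LocaleParseDemo, the helper parses, and the assumption that our JVM default locale is ja_JP are illustrative and were not verified on the servers.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class LocaleParseDemo {
    // Hypothetical helper: returns true if the Apache-style timestamp
    // can be parsed with the given locale, false if parsing fails.
    static boolean parses(String text, Locale locale) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", locale);
        try {
            fmt.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same timestamp string as in the error message above.
        String raw = "09/Jul/2018:09:56:29";
        // Assumption: the failing JVM used a Japanese default locale.
        System.out.println("Locale.JAPAN   parses: " + parses(raw, Locale.JAPAN));
        System.out.println("Locale.ENGLISH parses: " + parses(raw, Locale.ENGLISH));
    }
}
```

If this explanation holds, passing the locale explicitly (as the “EN” argument does in the diff) makes the parsing independent of the server’s locale settings.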

“/opt/kaltura/dwh/etlsource/common/parse_date_to_dwh_timezone.ktr” was modified as follows:

# diff parse_date_to_dwh_timezone.ktr.org parse_date_to_dwh_timezone.ktr.new
294,295c294,295
< desiredTimeZone = java.util.TimeZone.getTimeZone(getVariable(&apos;DataTimeZone&apos;, &apos;America&#47;New_York&apos;));
< dateFormat = java.text.SimpleDateFormat(&apos;[dd&#47;MMM&#47;yyyy:HH:mm:ssZ]&apos;,java.util.Locale.ENGLISH);
---
> desiredTimeZone = java.util.TimeZone.getTimeZone(getVariable(&apos;DataTimeZone&apos;, java.util.TimeZone.getDefault().getID()));
> dateFormat = java.text.SimpleDateFormat(&quot;[dd&#47;MMM&#47;yyyy:HH:mm:ssZ]&quot;,java.util.Locale.ENGLISH);
297,298c297,298
< calculatedDateTime = substr(dateFormat.format(dateFormat.parse(raw_datetime + timezone_offset)), 1, 20);
< eventTime = str2date(calculatedDateTime, &apos;dd&#47;MMM&#47;yyyy:HH:mm:ss&apos;);</jsScript_script>
---
> calculatedDateTime = substr(dateFormat.format(dateFormat.parse(raw_datetime + timezone_offset)), 1, 26);
> eventTime = str2date(calculatedDateTime, &quot;dd&#47;MMM&#47;yyyy:HH:mm:ssZ&quot;, &quot;EN&quot;);</jsScript_script>
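
The intent of this second diff is to keep the UTC offset attached to the timestamp all the way through, and to fall back to the server’s own timezone instead of “America/New_York” when “DataTimeZone” is not set. A rough Java sketch of the same idea is shown below; the class DwhTimeZoneDemo and the helpers resolveZoneId and toZone are hypothetical names for illustration and are not part of the Kaltura DWH code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class DwhTimeZoneDemo {
    static final String PATTERN = "dd/MMM/yyyy:HH:mm:ssZ";

    // Hypothetical helper: use the configured zone if present,
    // otherwise fall back to the JVM default zone (not a hard-coded one).
    static String resolveZoneId(String configured) {
        if (configured == null || configured.isEmpty()) {
            return TimeZone.getDefault().getID();
        }
        return configured;
    }

    // Parses a timestamp that carries its own offset (e.g. "+0900") and
    // re-formats it in the target zone, keeping the offset in the output.
    static String toZone(String rawWithOffset, String zoneId) {
        SimpleDateFormat in = new SimpleDateFormat(PATTERN, Locale.ENGLISH);
        SimpleDateFormat out = new SimpleDateFormat(PATTERN, Locale.ENGLISH);
        out.setTimeZone(TimeZone.getTimeZone(zoneId));
        try {
            return out.format(in.parse(rawWithOffset));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + rawWithOffset, e);
        }
    }

    public static void main(String[] args) {
        String raw = "09/Jul/2018:09:56:29+0900";
        System.out.println(toZone(raw, "UTC"));
        System.out.println(toZone(raw, "Asia/Tokyo"));
        System.out.println(toZone(raw, resolveZoneId(null)));
    }
}
```

Because the offset is part of both the input and the output string, the parsed instant does not depend on the timezone the JVM happens to run in.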

With the above corrections, the DWH issue was resolved on both systems.

Regards