Running dwh_plays_views_sync.sh more than once per day

david.hahn1 · January 16, 2019, 5:11pm

Hey all,

Out of the box, the dwh_plays_views_sync.sh script runs once per day at 10 AM.

What should be considered if you wanted to run this more frequently? Maybe once per hour?

My goal is to give users more up-to-date information on the number of plays and the last played date.

jess · January 16, 2019, 7:23pm

You can edit /etc/cron.d/kaltura-dwh [that’s a symlink that points to /opt/kaltura/app/configurations/cron/dwh] and play with the execution times.

For instance:

6 * * * * kaltura /opt/kaltura/dwh/etlsource/execute/etl_hourly.sh -p /opt/kaltura/dwh -k /opt/kaltura/pentaho/pdi/kitchen.sh
3 1-22/2 * * * kaltura /opt/kaltura/dwh/etlsource/execute/etl_update_dims.sh -p /opt/kaltura/dwh -k /opt/kaltura/pentaho/pdi/kitchen.sh
56 1-22/2 * * * kaltura /opt/kaltura/dwh/etlsource/execute/etl_daily.sh -p /opt/kaltura/dwh -k /opt/kaltura/pentaho/pdi/kitchen.sh
33 12 * * * kaltura /opt/kaltura/dwh/etlsource/execute/etl_perform_retention_policy.sh -p /opt/kaltura/dwh -k /opt/kaltura/pentaho/pdi/kitchen.sh
47 * * * * kaltura /opt/kaltura/app/alpha/scripts/dwh/dwh_plays_views_sync.sh >> /opt/kaltura/dwh/logs/dwh_plays_views_sync.log

Note that if you change this file, you should also edit /opt/kaltura/app/configurations/cron/dwh.template; otherwise, your changes will be overwritten whenever you run the kaltura-*config.sh scripts.
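
To keep the live cron file and the template in step, the same change can be scripted. What follows is only a sketch: set_sync_schedule is a hypothetical helper, not something shipped with Kaltura, and it assumes that both files contain an uncommented line that invokes dwh_plays_views_sync.sh and starts with the usual five schedule fields. Check the template by hand first, since its exact contents are not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kaltura): replace the five schedule
# fields of the cron entry that runs dwh_plays_views_sync.sh.
# Usage: set_sync_schedule FILE "MIN HOUR DOM MON DOW"
set_sync_schedule() {
    file=$1
    schedule=$2
    tmp=$(mktemp) || return 1
    # Only touch uncommented lines that mention the sync script; every
    # other line passes through unchanged. "|" is used as the sed
    # delimiter because cron schedules may contain "/".
    sed -E "/^[^#].*dwh_plays_views_sync\.sh/ s|^([^[:space:]]+[[:space:]]+){5}|$schedule |" \
        "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back with cat so ownership, permissions and any symlink
    # pointing at the file are preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (run as a user allowed to edit both files):
# set_sync_schedule /opt/kaltura/app/configurations/cron/dwh "47 * * * *"
# set_sync_schedule /opt/kaltura/app/configurations/cron/dwh.template "47 * * * *"
```

Since /etc/cron.d/kaltura-dwh is a symlink to the first file, editing the target is enough for cron to pick up the change.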

david.hahn1 · January 16, 2019, 7:27pm

Thanks @jess .

I figured those would be the implementation steps. However, I was wondering if there were any known side effects of doing so. That is, was this process designed to run only once per day for a reason?

I wouldn’t think so, but I thought it would be worth asking.

jess · January 16, 2019, 7:44pm

Hi @david.hahn1,

It certainly wasn’t designed to be real time. The intervals you can get away with vary greatly depending on the amount of data you have to process. If you invoke these scripts too frequently, you may reach a situation in which the previously invoked process still hasn’t exited. There’s a locking mechanism [see kalturadw_ds.locks] that’s meant to prevent additional instances of the same job from running, but all the same, it would mean that you’ll have to wait for the current process to finish. Also, the order in which these run is important.
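
If you want an extra guard at the cron level so that a new run is skipped while the previous one is still going, a generic wrapper along these lines can help. This is a sketch of a common shell technique, not Kaltura's own mechanism and not a replacement for kalturadw_ds.locks; run_exclusive and the lock directory path are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command only if no other run holds the lock.
# Usage: run_exclusive LOCKDIR COMMAND [ARGS...]
run_exclusive() {
    lockdir=$1
    shift
    # mkdir is atomic: it succeeds for exactly one caller, so the
    # directory itself serves as the lock.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still in progress, skipping" >&2
        return 75
    fi
    "$@"
    rc=$?
    # Release the lock and hand back the command's own exit status.
    rmdir "$lockdir"
    return $rc
}

# Example (the lock path is an assumption):
# run_exclusive /tmp/dwh_plays_views_sync.lock \
#     /opt/kaltura/app/alpha/scripts/dwh/dwh_plays_views_sync.sh
```

One caveat: if the wrapped process is killed before rmdir runs, the lock directory is left behind and has to be removed by hand before the next run can start.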

david.hahn1 · January 17, 2019, 5:36pm

Thanks @jess. This makes sense. Looking at the timestamps in the log for this specific script, it runs in about 8 seconds now. I certainly don’t need real time. I don’t expect my little instance to get so much traffic that I would need to bump it out further.

I haven’t studied the whole dwh process end to end yet, but I’m wondering if there are other processes that need to run before this would run in order to see updated plays and “viewed at” dates.

In other words, what data does the dwh_plays_views_sync script use that is also updated by another dwh process that has its own schedule?