How to reset DWH stats and start again?

I have been managing a Kaltura server since this week, and it has a special condition: someone disabled /etc/cron.d/kaltura-dwh, so now, when the cron job tries to run normally, the load average gets too high.

In fact, we don’t need stats for old videos, but we do need them for new plays from now on.

How can I reset the stats (in the database and the log files) to avoid reprocessing everything?



The way it works is this:

  • Upon hitting play, the player makes an API call to the Kaltura endpoint [service=stats&action=collect]; that call is logged to Apache’s access log
  • Logrotate archives the log and moves it to /opt/kaltura/web/logs/
  • The DWH script then iterates over these files to determine which records to process

Therefore, if you simply delete the files from /opt/kaltura/web/logs/, or just move them somewhere else, they will not be processed by analytics.
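
A minimal sketch of the "move them somewhere else" option, assuming the archived logs sit directly in /opt/kaltura/web/logs/. The function name and the backup directory are hypothetical and not part of Kaltura:

```shell
#!/bin/sh
# Move every regular file from SRC to DEST so the DWH job no longer finds it.
# Both arguments are illustrative; this is not a Kaltura tool.
set_aside_logs() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in "$src"/*; do
        # Skip sub-directories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        mv -- "$f" "$dest"/ || return 1
    done
}

# Example (the backup path is an assumption):
# set_aside_logs /opt/kaltura/web/logs /root/dwh-logs-backup
```

Moving rather than deleting keeps the old logs available in case they are needed later.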

Let me know if you need more clarification on this.

ok… let me check and I will prepare a detailed workaround for this

Ok, I understand…

A little more information.

This is the Kaltura version I have installed (I cloned the production environment and am working in a lab now):

root@Kaltura:/opt/kaltura/log# rpm -qa |grep -i kaltura-dwh
  • I checked the state of the locks table:

    root@Kaltura:~# query "SELECT * FROM kalturadw_ds.locks"

I set lock_state to 0 with:

root@Kaltura:~# query "UPDATE kalturadw_ds.locks SET lock_state=0"
  • I deleted all the content of /opt/kaltura/web/logs/ and ran the scripts from /etc/cron.d/kaltura-dwh (in this version I don’t have /opt/kaltura/bin/), but the first script:

    root@Kaltura:/opt/kaltura/log# cat /etc/cron.d/kaltura-dwh
    00 * * * * root /opt/kaltura/dwh/etlsource/execute/ -p /opt/kaltura/dwh

takes too long and overloads my server. Can I avoid that?

  • When all the scripts from the crontab had finished, I checked the status of the tables again:

    SELECT * FROM kalturadw_ds.files WHERE insert_time >='%Y%m%d';

That is OK, I see the latest insert_time correctly, but when I run:

root@Kaltura:~# query "SELECT * FROM kalturadw.dwh_fact_events WHERE event_date_id >='%Y%m%d'"

the table is empty.
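
For reference, a small sketch of how the date filter could be filled in, assuming the '%Y%m%d' in the queries above stands for a date in YYYYMMDD form as produced by date(1). The function name is hypothetical; the table and column names are the ones quoted in this thread:

```shell
#!/bin/sh
# Print the events query with today's date (YYYYMMDD) substituted for the
# '%Y%m%d' placeholder. Assumes event_date_id uses that same format.
events_query_for_today() {
    day=$(date +%Y%m%d)
    printf "SELECT * FROM kalturadw.dwh_fact_events WHERE event_date_id >= '%s'\n" "$day"
}

# Example, piping into the mysql client (credentials are assumptions):
# events_query_for_today | mysql -u root -p
```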

And of course, in admin/content/manage the stats for the video aren’t updated (yes, I put the iframe code into an HTML file to view it outside content/manage, like a normal user).

Any suggestion…?

thank you!

I think you neglected to rotate the access logs after playing a new entry, which is why you are not seeing new stats.
BTW, you can obtain from here:
You probably are using a version older than the one I introduced it in but you can simply copy it to /opt/kaltura/bin as is.


No, even with (it ran without errors), the table

SELECT * FROM kalturadw.dwh_fact_events WHERE event_date_id >='%Y%m%d';

doesn’t show any data… :confused:

Does the following query show the files processed?

mysql> select * from kalturadw_ds.files where insert_time >='%Y%m%d';

And, in these files, do you see calls for action=collect?


And yes (in apache_access.log)
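
One way to put a number on that is to count the matching lines. This is only a sketch: the function name is hypothetical, it reads plain-text logs only (not compressed archives), and it matches on the action=collect parameter mentioned above:

```shell
#!/bin/sh
# Print how many lines in the given log files contain a stats "collect" call.
count_collect_calls() {
    # grep -c exits non-zero when the count is 0, so the status is ignored.
    cat -- "$@" | grep -c 'action=collect' || true
}

# Example (the file location is an assumption):
# count_collect_calls /opt/kaltura/web/logs/*apache_access.log*
```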

So we should still be seeing errors in /opt/kaltura/dwh/logs/
I would suggest you run /opt/kaltura/bin/ and then revisit the logs. The script removes old logs before starting, so we should be able to see the new ones from the recent run quite easily.
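
To narrow down which of the fresh log files to look at, something along these lines may help. It is a rough sketch with a hypothetical function name, and it simply searches for the word "error" in any letter case:

```shell
#!/bin/sh
# List the files under a log directory that mention "error" (any case).
logs_with_errors() {
    # grep exits non-zero when nothing matches, so the status is ignored.
    grep -rli 'error' -- "$1" || true
}

# Example, using the directory named above:
# logs_with_errors /opt/kaltura/dwh/logs
```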

I give up on this problem; I will never resolve it.

That is, of course, your right. But should you be willing to continue the investigation, feel free to post the current errors from the log here.

There is nothing here that cannot be solved; that’s what is so charming about software issues :)


Sorry but this does not help me diagnose the issue.
Please attach the output of kaltlog and the surrounding log lines.