DWH logs folder

Hi all,

I just noticed that the DWH logs folder is growing fast: it now contains more than 21k log files.

For instance, I still have hourly logs from two months ago:

-rw-r--r--. 1 kaltura kaltura 0 Jul 12 03:17 etl_hourly-20170518-13.log

And with the DWH log rotation, I get multiple versions of this file:

-rw-r--r--.  1 kaltura kaltura      20 Jun 14 03:38 etl_hourly-20170518-13.log-20170615.gz
-rw-r--r--.  1 kaltura kaltura      20 Jun 15 03:40 etl_hourly-20170518-13.log-20170616.gz
-rw-r--r--.  1 kaltura kaltura      20 Jun 16 03:22 etl_hourly-20170518-13.log-20170617.gz
-rw-r--r--.  1 kaltura kaltura      20 Jun 17 03:43 etl_hourly-20170518-13.log-20170618.gz
-rw-r--r--.  1 kaltura kaltura      20 Jun 18 03:38 etl_hourly-20170518-13.log-20170619.gz
-rw-r--r--.  1 kaltura kaltura      20 Jun 19 03:22 etl_hourly-20170518-13.log-20170708.gz
-rw-r--r--.  1 kaltura kaltura      20 Jul  8 03:36 etl_hourly-20170518-13.log-20170709.gz
-rw-r--r--.  1 kaltura kaltura      20 Jul  9 03:17 etl_hourly-20170518-13.log-20170710.gz
-rw-r--r--.  1 kaltura kaltura      20 Jul 10 03:20 etl_hourly-20170518-13.log-20170711.gz
-rw-r--r--.  1 kaltura kaltura      20 Jul 11 03:22 etl_hourly-20170518-13.log-20170712.gz

Is this standard behavior, or is there a cleanup script somewhere that is not working or not configured?

Setup: CentOS7 cluster install, Kaltura 12.17

Many thanks.

Hi @luca.guindani,

Yes, this is normal. These logs are created by /opt/kaltura/dwh/etlsource/execute/etl_hourly.sh which, as its name may imply, runs every hour:)
This is triggered from here:
/etc/cron.d/kaltura-dwh

Once the hourly run concludes, you can remove its log. If you choose to do so, I recommend first running grep -q ERROR on the log file and checking the return code (RC) to make sure the log contains no error patterns before removing it. These logs are not needed at runtime, but they can help if issues occur.
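A minimal sketch of that check, assuming the logs live in the default /opt/kaltura/dwh/logs folder and that any etl_hourly-*.log file untouched for more than 60 minutes belongs to a finished run. That age threshold and the helper name remove_clean_logs are illustrative assumptions, not part of Kaltura. It relies on grep -q exiting with 0 when ERROR is found, 1 when it is not found, and 2 on a read error:

```shell
#!/bin/sh
# Hypothetical helper: delete finished ETL hourly logs only if they contain no "ERROR".
# Assumption: a log not modified for more than 60 minutes belongs to a finished run.

remove_clean_logs() {
    log_dir="${1:-/opt/kaltura/dwh/logs}"
    find "$log_dir" -maxdepth 1 -type f -name 'etl_hourly-*.log' -mmin +60 |
    while IFS= read -r log; do
        # grep -q: rc 0 = ERROR found, rc 1 = not found, rc 2 = grep failed
        grep -q ERROR "$log"
        rc=$?
        if [ "$rc" -eq 1 ]; then
            rm -f -- "$log"
        elif [ "$rc" -eq 0 ]; then
            echo "keeping $log: contains ERROR" >&2
        else
            echo "keeping $log: grep failed (rc=$rc)" >&2
        fi
    done
}
```

Logs that contain ERROR, or that grep could not read, are kept and reported on stderr, so they stay available for troubleshooting.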

Hi @jess,

Alright, thanks for your answer and explanations.

Hello, everyone.

In Kaltura CE, the original logrotate configuration /opt/kaltura/app/configurations/logrotate/kaltura_dwh is as follows:

/opt/kaltura/dwh/logs/*.log {
 rotate 30
 daily
 missingok
 compress
}

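This configuration wastefully creates many rotated files: every etl_hourly-*.log file matches *.log, so each one is rotated again every day, even when it is empty, and up to 30 compressed copies of it are kept.
For example, etl_hourly-20170810-00.log-20170811.gz, etl_hourly-20170810-00.log-20170812.gz,
etl_hourly-20170810-00.log-20170813.gz, …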
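To see how many rotated copies each hourly log has accumulated, something like the following could be used. It is a sketch that assumes rotated names follow the <name>.log-YYYYMMDD.gz pattern seen in this thread; count_rotations is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: count rotated .gz copies per original ETL log, most copies first.
# Assumes rotated names look like etl_hourly-20170518-13.log-20170615.gz.

count_rotations() {
    log_dir="${1:-/opt/kaltura/dwh/logs}"
    find "$log_dir" -maxdepth 1 -type f -name 'etl_*.log-*.gz' |
        sed -e 's|^.*/||' -e 's/\.log-[0-9]\{8\}\.gz$/.log/' |  # strip dir and date suffix
        sort | uniq -c | sort -rn
}
```

Each output line is a copy count followed by the original log name, for example `3 etl_hourly-20170518-13.log`.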

So I am using the following configuration as /opt/kaltura/app/configurations/logrotate/kaltura_dwh.
It rotates only the non-ETL logs, for example log_aggregation_perform_aggregations.log.

/opt/kaltura/dwh/logs/log_*.log {
 rotate 30
 daily
 create 0644 kaltura kaltura
 missingok
 compress
 su root kaltura
}
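Since the new pattern only matches files whose names start with log_, a quick way to preview which files it covers is sketched below. preview_rotation is a hypothetical helper name, and the default path is the one used in the configuration above:

```shell
#!/bin/sh
# Hypothetical helper: list the files the log_*.log pattern would pick up,
# to confirm the etl_*.log files are no longer rotated.

preview_rotation() {
    log_dir="${1:-/opt/kaltura/dwh/logs}"
    for f in "$log_dir"/log_*.log; do
        # An unmatched glob stays literal in sh, so skip names that do not exist.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}
```

For the full picture, logrotate -d <config file> runs logrotate in debug mode, which prints what it would do without changing any log files or the state file.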

My Kaltura server then cleans up old ETL logs with the following cron file (/etc/cron.d/etl-cleanup):

0 4 * * * root find /opt/kaltura/dwh/logs/etl_*.log -type f -daystart -mtime +30 | xargs /bin/rm -f
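The same cleanup can also be written so that find does the name matching itself, instead of receiving the shell-expanded etl_*.log list (which becomes very long with thousands of files, and stays as a literal, unmatched pattern when no file matches). A sketch, assuming GNU find for -daystart and -delete; cleanup_etl_logs and its retention-days argument are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: delete ETL logs last modified more than N days ago (default 30).
# find matches the names itself, so no shell glob is expanded.

cleanup_etl_logs() {
    log_dir="${1:-/opt/kaltura/dwh/logs}"
    days="${2:-30}"
    find "$log_dir" -maxdepth 1 -type f -name 'etl_*.log' \
        -daystart -mtime +"$days" -delete
}
```

The equivalent line for /etc/cron.d/etl-cleanup would be:

0 4 * * * root find /opt/kaltura/dwh/logs -maxdepth 1 -type f -name 'etl_*.log' -daystart -mtime +30 -delete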

We hope you find this useful.

Hello,

Thanks for your suggestion; it looks like your solution avoids a lot of unnecessary ETL logs. It will be useful for me!

Hi @luca.guindani, @jess,

Unfortunately, updating Kaltura CE will revert kaltura_dwh to the original configuration.
So I hope Kaltura CE will adopt such a configuration or script.

Hi @t-saito and @luca.guindani

This sounds very interesting, because I also noticed that Kaltura generates a lot of log files…
I’m not very familiar with cron jobs. Does the posted cron job work just by editing the kaltura_dwh file first and then creating a file called “etl-cleanup” under /etc/cron.d/ with the following content:

Or do I have to configure something additionally?

What about the /opt/kaltura/log directory? Is it also useful to delete some log files there from time to time?

Thanks in advance

Hello, @daniel_mueller_1,

If you use my logrotate configuration and cron file, you do not need to modify any other files.
My logrotate configuration keeps 30 rotated copies of each non-ETL log file.
And my cron file deletes ETL log files last modified more than 30 days ago.
You may want to adjust the value “30” in both “kaltura_dwh” and “etl-cleanup”.

If you use neither kaltura_dwh nor any cron job, ETL log files keep accumulating indefinitely, and non-ETL log files keep growing in size indefinitely.
However, the original kaltura_dwh creates a lot of unnecessary rotated ETL log files.
To avoid this, you should use both the logrotate configuration and the cron job.

Best regards,
t-saito.

Hi @t-saito,

We’d love for you to make a pull request to the server repo so we can merge it.
The files to change are:


Thanks @jess,

I have made pull requests to the server repository.

Thank you for your contribution, @t-saito.
We’ve just merged it.