Analytics not get results

Hi experts

I have the same issue reported on Nothing in Top Content Analytics
I have installed a cluster deploying based on RPM… I have all the services installed separately but all the servers have the same features. Centos 6 updated

I have 7 servers
1 front, 1 batch, 1 index, 1 mysql, 1 VOD, 1 streaming, 1 DWH

I have followed the next url in order to achive the instalallation https://github.com/kaltura/platform-install-packages/blob/master/doc/rpm-cluster-deployment-instructions.md. All my servers are ec2-instances deployed in my aws account.

I have followed all the analytics issues that i found on line including the locks table issue, and run the cron scripts by myself

My servers info are corrected accroding to the miniaml requirements to deploy a kaltura system. I have deploy kaltura version 12. So it is implied that there is a version test run in the mysql server. That means, if my mysql version does not match the requirements i could not be able to installed, but I could, so is not a version problem.

It seems that all is working but not the analytics feautre. Mi Dwh is a separate server, and when i embebed a new entry to play it out of the kmc, the plays or views count is not increasing. In fact the events table kalturadw.dwh_fact_events is always empty. I have read and read and read looking for a solution but no success. My opinion comes through the NFS server… I do not know if a DWH needs to be connected to the NFS shared files. According to the doc the answer is YES. But when i mount the NFS in the DWH server there is NO folder call “logs” in /opt/kaltura/web/" and because of that the analytics does not get the plays or views data. However when i started the Dwh and i do not mount the NFS, the folder “logs” appears in the correct path… but still nothing happens.

I would like to know the correct flow from a counting views or plays. I mean, which are the scripts or pages loaded when I play an entry, from an embeded player, to get the plays or views data and where is the data stored in the database.

Please I will appreciate any info about this.

Hello @jlotito,

Yes, /opt/kaltura/web/logs/ should exist on your NFS mount. It should be mounted on all front servers [and batch as well, but only front nodes are relevant for DWH].

/opt/kaltura/web/logs belongs to the kaltura-dwh package as can be seen when running:

# rpm -qf /opt/kaltura/web/logs/
kaltura-dwh-11.3.0-1.noarch

or for Deb:

# dpkg -S /opt/kaltura/web/logs/
kaltura-dwh: /opt/kaltura/web/logs

My guess would be that you installed kaltura-dwh before mounting the NFS volume, hence, /opt/kaltura/web/logs was created on the local disk on the DWH machine, instead of on the NFS volume.
That’s easy enough to correct, simply mount it on the DWH machine and create the directory.

Just so that you further understand the flow of things:

  • When a user hits play, an API request similar to the below [service=stats&action=collect], is made by the player:
 api_v3/index.php?service=stats&apiVersion=3.1&expiry=86400&clientTag=kwidget%3Av2.53.2&format=1&ignoreNull=1&action=collect&event:eventType=3&event:clientVer=2.53.2&event:currentPoint=0&event:duration=29&event:eventTimestamp=1488213859683&event:isFirstInSession=false&event:objectType=KalturaStatsEvent&event:partnerId=102&event:sessionId=1b92c640-b6de-3e22-4736-0755706220d7&event:uiconfId=23448199&event:seek=false&event:entryId=0_wl3vd05h&event:historyEvents=111000000000000000000000-3-3&event:widgetId=_102&event
  • This in turn is recorded in the Apache access log for the Kaltura VHost [/opt/kaltura/log/kaltura_apache_access.log]
  • The access logs from each front node are then rotated to /opt/kaltura/web/logs using /etc/logrotate.d/kaltura_apache, in particular:
/opt/kaltura/log/kaltura_apache_access.log {
 rotate 5
 daily
 missingok
 compress
 dateext
 notifempty
 lastaction
 mv /opt/kaltura/log/kaltura_apache_access.log-`/bin/date +%Y%m%d`.gz /opt/kaltura/web/logs/`hostname`-kaltura_apache_access.log-`/bin/date +%Y%m%d-%H`.gz
 service httpd reload
 endscript
 su root kaltura
}
  • On the DWH machine, the relevant scripts are run by crond, because of this file: /etc/cron.d/kaltura-dwh
  • As a first step, the following config file is looked at /opt/kaltura/dwh/.kettle/kettle.properties, in it, the path to the log dir and the pattern to look for are defined like so:
EventsLogsDir = /opt/kaltura/web/logs
EventsWildcard = .*kaltura.*_access.*.log-.*

In summary:

  • Make sure /opt/kaltura/web/logs exist on the NFS
  • Make sure the volume is mounted on all front and batch machines and that at least /opt/kaltura/web/logs is mounted on the DWH machine [it does not need the other dirs though you can mount the entire volume]
  • Make sure /etc/logrotate.d/kaltura_apache exists on all front machines and the access log files are rotated

Note that analytics are not updated in real time but rather processed by running the cron jobs defined here /etc/cron.d/kaltura-dwh

If, after 24 hours after validating all the above, you still don’t see stats, please follow the checks here:

And report back with the results.

Thanks,

Hi Jess,
thanks for your soon answer I really appreciate. Your info has helped me so much. Now I have see the proble,s… first of all was that the logrotate must be execute in the fron services in order to create the access_log in the /web/logs directory. I have solved this issue.

After that, I run the dwh-scripts in order to update the plays in the KMC… and its works!!! showed me 3 plays for an entry… I was to exited and I tried agaidn in other entries… and after running the script again… the plays are not increasing anymore so it is wrong again.

Here where i’ve done.

  • rename the acces log created by the rotation to a name according to the path of the properties.kettle
  • After looking for an error, nothing find, I have droped the tables in the dwh and reinstall again with kaltura-dwh-config.sh.
  • Then i run the scripts again… no errors, but nothing happens… The other entries not shows any play. But in the first entry still appears “3”. Which concenrs me, because I dropped the dwh tables.

I have check the query “select * from kalturadw_ds.files where insert_time >=%Y%m%d;” And this is the result

mysql> select * from kalturadw_ds.files where insert_time >=20170228;
±--------±-------------------------------------------------------------------------------------±------------±------------±--------------------±---------±--------------±------±----------±-------------±-----------±---------±-------------------±-------+
| file_id | file_name | file_status | prev_status | insert_time | run_time | transfer_time | lines | err_lines | file_size_kb | process_id | cycle_id | compression_suffix | subdir |
±--------±-------------------------------------------------------------------------------------±------------±------------±--------------------±---------±--------------±------±----------±-------------±-----------±---------±-------------------±-------+
| 1 | kaltura_apache_access_ssl.log-. | IN_CYCLE | NULL | 2017-02-28 16:24:37 | NULL | NULL | NULL | NULL | 50 | 1 | 1 | | |
| 2 | kaltura._apache_access._ssl.log-. | IN_CYCLE | NULL | 2017-02-28 16:26:14 | NULL | NULL | NULL | NULL | 50 | 1 | 2 | | |
| 3 | ip-172-31-9-165.eu-west-1.compute.internal-kaltura_apache_access.log-20170228-16 | IN_CYCLE | NULL | 2017-02-28 16:31:59 | NULL | NULL | NULL | NULL | 7 | 1 | 3 | gz | |
| 4 | kaltura_apache_access_ssl.log-20170227 | IN_CYCLE | NULL | 2017-02-28 17:00:19 | NULL | NULL | NULL | NULL | 7 | 1 | 4 | gz | |
| 5 | ip-172-31-9-165.eu-west-1.compute.internal-kaltura_apache_access_ssl.log-20170228-17 | IN_CYCLE | NULL | 2017-02-28 17:13:29 | NULL | NULL | NULL | NULL | 9 | 1 | 5 | gz | |
±--------±-------------------------------------------------------------------------------------±------------±------------±--------------------±---------±--------------±------±----------±-------------±-----------±---------±-------------------±-------+

Which I understand that some proccess are running… what in-cycle means??.

I have tried too “select * from kalturadw.dwh_fact_events where event_date_id >=%Y%m%d ;” and shows me all the events to the other entries waiting to be processed. (I wont paste because is tedius)

Show as I see, my system are collecting the data correctly but when I execute the dwh scripts they dont update the plays field in entry table in the kaltura db.

What could be my next steps?? thanks

Hi @jlotito,

Glad to hear this info helped you and that you made progress.
If I understand correctly, the only issue now is that the ‘plays’ column in KMC->Content/entry DB table is not updated, is that right?
If so, the script responsible for syncing that data from the DWH DB to the operational Kaltura DB is:
/opt/kaltura/app/alpha/scripts/dwh/dwh_plays_views_sync.sh
this is called from /etc/cron.d/kaltura-dwh, as you can see, it runs under the ‘kaltura’ user.
What it does is sync the data back to the operational DB AFTER all other scripts in /etc/cron.d/kaltura-dwh finished.
Once again, reminding you the operation is not done in real time. If after 24 hours you still don’t see plays, I suggest you:
# su - kaltura
# /opt/kaltura/app/alpha/scripts/dwh/dwh_plays_views_sync.sh

See if this triggers any errors or, to the contrary, updates the table:) If it produces an error, please post it here [or just fix if it’s an obvious one], if it succeeds in updating the plays, I suggest you take a look at /var/log/syslog or /var/log/cron to see whether it’s being run and we can continue from there.

Hi jess

Thanks for your answer. I have still no succes in my problem.

I have run the script sync_play according you mentioned with the user kaltura. (The same as run in the dwh general script) and nothing happens.

This is how I make the test.

  • I hit the entry play button in an outside standalone page.
  • I Can see the action=coolect in teh requests in the apache_ssl_log in the front server
  • I execute logrotate -vvv -f /etc/logrotate.d/kaltura_apache (from the front server) to create the .gz file in /opt/kaltura/web/log and the file is created correctly.
  • Then I execute the /opt/kaltura/bin/kaltura-run-dwh.sh from the DWH server (I ve comment the line about the logrotate line in the script because i have done in the fornt server)
  • I have seen no errors in the /opt/kaltura/dwh/logs directory after the executing
  • I check the tables kalturadw_ds.files by select * from kalturadw_ds.files where insert_time >=20170301; and the files appear!! is not an empty table… But all the files are in the status IN_CYCLE
  • I check the table select * from kalturadw.dwh_fact_events where event_date_id >=20170301; and is not Empty Too. There are some records about the entries that I have played and the dates and times showed are correct.
  • BUT IN THE KMC THE PLAYS ARE NOT INCREASING… in the table entry in the database kaltura the data are not increasing.

There is no errors in triggers or in /var/log/syslog… I do not know how to do next. Thanks for all

Hi @jlotito,

So, a few questions:
0. Do you see the stats under KMC->Analytics?

  1. Assuming that you do and the issue is only with the play count under KMC->Content, what happens when you manually run:
    # su - kaltura
    $ /opt/kaltura/app/alpha/scripts/dwh/dwh_plays_views_sync.sh
    ?

Thanks,

  1. Do you see the stats under KMC->Analytics?

Yes I can access to the link https://myservicename/index.php/kmc/kmc4#analytics|contentDashboard but in tables are not showed no data.

  1. Assuming that you do and the issue is only with the play count under KMC->Content, what happens when you manually run:

This is the output when i run the script

mié mar 1 11:19:45 UTC 2017
start dwh triggers
2017-03-01 12:19:45 [require_once] INFO: Starting script
2017-03-01 12:19:45 [require_once] INFO: Initializing database…
2017-03-01 12:19:45 [require_once] INFO: Database initialized successfully
2017-03-01 12:19:45 [KalturaPDO->__construct] DEBUG: conn took - 0.0031540393829346 seconds to mysql:host=ip-172-31-2-7.eu-west-1.compute.internal;port=3306;dbname=kaltura;
2017-03-01 12:19:45 [Propel::initConnection] NOTICE: total conn took 0.003849983215332 mysql:host=ip-172-31-2-7.eu-west-1.compute.internal;port=3306;dbname=kaltura;
start dwh wrap
end dwh

My system is not online yet… but you can acces to it if you need… please tell me if you want to

Thanks

OK, let’s break it down a bit and debug.
First, run:
mysql -h$DWH_HOST -P$DWH_PORT -u$DWH_USER -p$DWH_PASS
and call:
mysql> CALL kalturadw.get_data_for_operational(‘entry’);

What does that output?

This is the Output.
Empty set (0.01 sec)
Query OK, 0 rows affected (0.01 sec)

That disturbs me, all the tables that I have mentioned in previous posts are not empty. I have tried the querys.

select * from kalturadw_ds.files where insert_time >=%Y%m%d; and
select * from kalturadw.dwh_fact_events where event_date_id >=%Y%m%d ;

And they both have data… I will be waiting for the next step in the debugging process.

Thanks Jess

Hi @jlotito,

This procedure is defined here: /opt/kaltura/dwh/ddl/dw/functions/get_data_for_operational.sql
Essentially, the following query is expected to return the data:

        SELECT e.entry_id, e.plays, e.views
        FROM dwh_entry_plays_views e, kalturadw_ds.parameters p
        WHERE e.updated_at > p.date_value AND p.id = 4;

What does kalturadw.dwh_entry_plays_views contain on your ENV?

Hi Jess

BRavo!!! it seems that this is the error!!! I have the kalturadw.dwh_entry_plays_views table empty!!

This is the output

select * from dwh_entry_plays_views;
Empty set (0.00 sec).

I hope this guide us to find the solution…

Thanks

Hi @jlotito,

I suspect it may have something to do with file permissions.
Can you please run:
chown -R kaltura /opt/kaltura/dwh/cycles/
and try to run a full cycle again?

If after that, it still does not work, can you try to run all scripts included in /etc/cron.d/kaltura-dwh as root instead of as the kaltura user and let me know if that worked?

Thanks,

Hi Jess,
Nothing Changes… I have change the owner as you said, I have run the cycle (I understand to play a video from the standalone page, watch the logs, make the logrotate, execute the scripts and check the KMC), and nothing. Finally I have run the scripts as root and nothing too.

As always… in the table kalturadw_ds.files, it is shown the last action I have tried to run. And in the table kalturadw.dwh_fact_events it is shown the events according to the entry that I had played. But the table kalturadw.dwh_entry_plays_views is still empty…

I do not know what more to do… :cry: :cry: :cry:

Hi @jlotito,

The procedure that calls add_plays_views is kalturadw.post_aggregation_entry which is called by kalturadw.calc_aggr_day.

Is the kalturadw.dwh_hourly_events_entry table populated with data?
If this table is also empty it means that aggregations are not running, so events do exist in kalturadw.dwh_fact_events but are not aggregated.
If empty, can you please run a full cycle and send all logs created under /opt/kaltura/dwh/logs to jess.portnoy at kaltura.com?

If kalturadw.dwh_hourly_events_entry has data please check that kaltruadw.dwh_post_aggregatin_entry procedure exists by running:
mysql> SELECT * FROM INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_NAME=‘post_aggregation_entry’

Thanks,

Hi Jess

You are right!!!, as always… The kalturadw.dwh_hourly_events_entry table is empty, so, aggregations are not running.

I have run a complete cycle and the events are registered but not the aggregations. I have packed in a tar.gz file all the logs recorded in /opt/kaltura/dwh/logs. I tried to paste the file here because i do not know how to send you across the portal. But I could not paste here too. Please tell me an email adress or somewhere where i can send the file compressed.

Thanks!

Hi @jlotito,

Please send it to jess.portnoy at kaltura.com.

Thanks,

Hi Jess
Logs sent!!

Thanks for your time

Hi @jlotito,

Welcome of course.
Can you please paste the output for:

Select * from kalturadw_ds.locks;
Select * from kalturadw_ds.parameters;
Select * from kalturadw.aggr_managment where aggr_name = ‘entry’;

Thanks,

Select * from kalturadw_ds.locks;
±--------±--------------------------------------------------±--------------------±-----------+
| lock_id | lock_name | lock_time | lock_state |
±--------±--------------------------------------------------±--------------------±-----------+
| 1 | daily_lock | 2017-03-06 12:59:10 | 0 |
| 2 | retention_lock | 2017-03-06 12:30:09 | 0 |
| 3 | hourly_ip-172-31-4-157.eu-west-1.compute.internal | 2017-03-06 14:00:20 | 0 |
| 4 | update_dims_lock | 2017-03-06 14:00:22 | 0 |
±--------±--------------------------------------------------±--------------------±-----------+

Select * from kalturadw_ds.parameters;
±—±-----------±----------------------------------±----------±--------------------+
| id | process_id | parameter_name | int_value | date_value |
±—±-----------±----------------------------------±----------±--------------------+
| 2 | 0 | dim_sync_last_update | -1 | 2017-03-06 14:00:16 |
| 3 | 2 | fms_stale_session_days_limit | 3 | 2017-02-28 16:20:28 |
| 8 | 0 | referencial_integrity_last_update | 0 | 2017-03-06 12:59:10 |
| 10 | 0 | convert_job_fact_last_update | 0 | 2017-02-27 16:20:28 |
| 4 | 0 | sync_last_execution_plays_views | 0 | 2017-02-28 16:20:28 |
| 5 | 0 | sync_start_time_plays_views | 0 | 2017-02-28 16:20:28 |
| 6 | 0 | sync_last_execution_kuser_storage | 0 | 2017-02-28 16:20:28 |
| 7 | 0 | sync_start_time_kuser_storage | 0 | 2017-02-28 16:20:28 |
| 9 | 9 | transcoding_errors_last_update | 0 | 2010-01-01 00:00:00 |
±—±-----------±----------------------------------±----------±--------------------+

Select * from kalturadw.aggr_managment where aggr_name = ‘entry’;
±----------±---------±--------±--------------------±--------------------±--------------------±--------------------+
| aggr_name | date_id | hour_id | start_time | end_time | data_insert_time | ri_time |
±----------±---------±--------±--------------------±--------------------±--------------------±--------------------+
| entry | 20170228 | 16 | 2017-02-28 17:14:40 | 2017-02-28 17:14:40 | 2017-02-28 17:14:40 | 2017-02-28 17:14:40 |
| entry | 20170228 | 18 | 2017-02-28 20:59:10 | 2017-02-28 20:59:10 | 2017-02-28 20:59:10 | 2017-02-28 20:59:10 |
| entry | 20170301 | 11 | 2017-03-01 12:59:10 | 2017-03-01 12:59:10 | 2017-03-01 12:59:10 | 2017-03-01 12:59:11 |
| entry | 20170301 | 14 | 2017-03-02 04:59:10 | 2017-03-02 04:59:10 | 2017-03-02 04:59:10 | 2017-03-02 04:59:11 |
| entry | 20170303 | 9 | 2017-03-03 09:01:00 | 2017-03-03 09:01:00 | 2017-03-03 09:01:00 | 2017-03-03 09:01:02 |
| entry | 20170306 | 12 | 2017-03-06 12:59:09 | 2017-03-06 12:59:09 | 2017-03-06 12:59:09 | 2017-03-06 12:59:10 |
±----------±---------±--------±--------------------±--------------------±--------------------±--------------------+

Hi Jess

I have followed All the instructions you made to me here and by email… apparently It was an issue related to the TZ of the machines.

I have set all the nodes in the UTC TZ… And it seems that works… I say it seems, because it works somtimes yes and sometimes not. It does not work with new entries… I have created a new one. and his plawys or views are not increasing.

I have checked all the tables that we usually look at… for example select * from dwh_entry_plays_views; and what is concerns to me is that there is no register for my new entry. But in the dwh_fact_events table exist the events related to that entry

How I can relaod the kalturadw database in order to load new entries created…??

Thanks