DWH analytics problem

Hello @t-saito,

Is it possible to empty this table so it starts again ?

Thank you for yoyr help,

David

Hello @david.eusse

Please see “locks” table in “kalturadw_ds” database.
When any cron does not work, “lock_state” field of each record must be set to 0.
If “lock_state” is set to 1 when no ETL process is working, ETL processes will not work well.

Regards

Thank you @t-saito but I checked and there are no locks. It’s something else.

I’l try to empty the etl_log table and see how it works.

Regards,

David

Hello @david.eusse

Error message “Deadlock found when trying to get lock” is posted by database system (MySQL / MariaDB).
If you face the same error again, please connect database console and execute the following command:

mysql> SHOW ENGINE INNODB STATUS;

And, please see “LATEST DETECTED DEADLOCK” block in results.
If there exists the block, you will find SQL statement that caused the deadlock.

Regards

Hello @t-saito

I ran mysql> SHOW ENGINE INNODB STATUS; but didn’t get any DEADLOCK results.

I got again this this morning:

ERROR 11-03 16:00:06,918 - Update Dimensions - A serious error occurred during job execution: org.pentaho.di.core.exception.KettleJobException:
Unable to begin processing by logging start in logtable etl_log

Unable to write log record to log table etl_log

Error inserting/updating row
Deadlock found when trying to get lock; try restarting transaction

I ran mysqlcheck and it’s fine.

My questions are:
Is there a way to empty some temp tables so I can reprocess the log files again ?
Can I empty the etl_log table ?

Thank you in advande for your help,

David Eusse

Hello @david.eusse

Normally, temporary tables are deleted when current session is closed.
So that, users cannot delete the temporary tables from other session.

I think that you can empty etl_log table when no process uses the table.

Regards

Hello @t-saito,
Hope you are doing well during these difficult times.
After a lot of trouble, I reinstalled kaltura-dwh. I don’t get any errors anymore but I still don’t get any plays views.

After debugging all steps, I noticed that the final triggers don’t do anything. When /opt/kaltura/app/alpha/scripts/dwh/dwh_plays_views_sync.sh runs, it connects to de database and exits.

I checked and all kalturadw tables are filled with data.

Any ideas ?

Thank you again for your help,

David

Hello @david.eusse,

In the KMCng, other statistics (for example, bandwidth usage) are updated according to users’ actions?
Or, any statistic value is not updated?
If other statistics are updated correctly, we faced a similar trouble in earlier version of the Kaltuta CE.

Hello @t-saito,

After a lot of work and making a clean dwh install, it started showing data again. The dwh_entry_plays_views and dwh_hourly_events_entry were not been filled and I don’t not why.

Now I have a question, if you don’t mind.

If I put back the apache log files into /opt/kaltura/web/logs it will recalculate the stats, won’t it ?.
I will do that little by little so it doesn’t disturb the cycles.

I clarify that in order to avoid a server lock, moved all the apache logs into another folder before performing the clean dwh install.

regards,

David

Hello @david.eusse,

If you put the apache log files in /opt/kaltura/web/logs, it will recalculate the stats.

Note that, kalturadw_ds.files table has a list of log files are processed in ETL cycles.
Before placing the log files in /opt/kaltura/web/logs folder, you must delete records for those files from the kalturadw_ds.files table.

Regards

Hello @t-saito

Thank you. That’s what I’l do.

My new setup worked (more or less). It procressed the files and got stuck updating only two entries. :frowning:

Unfortunately, no errors appear on my log files.

Any ideas ?

Regards,

David

Hello @david.eusse,

The log file size may be too large to process.
If the size of the log file is issue, splitting the old log file before placing it in the /opt/kaltura/web/logs folder will work.

For example, hostname-kaltura_apache_access.log-20200327-01.gz, hostname-kaltura_apache_access.log-20200327-02.gz, hostname-kaltura_apache_access.log-20200327-03.gz, and so on.

First, extract old log file.
Next, split the log file.
Next, compress each split log file.
Finally, place the compressed split log files in the /opt/kaltura/web/logs folder.

If splitting the old log files can be fortunate, please try modifying log rotation script and cron, in order to process new log files.

Regards

Hello @t-saito,

Thank you. I’l try and let you know.

Regards,

David

Hello @t-saito,

I finally managed to load and old backup made with xtrabackup and it started recording again !
I deleted all records from the files table but the days that didn’t process well still don’t show after putting the archived log files in /opt/kaltura/web/logs and reprocessing without errors.

Is there another table where these processed records are kept ?

Thank you for your help,

David

Hello @david.eusse,

Sorry, I don’t know anything more.
I think that there is no other table.
And, Old log files may have been added to the statistics on the day they were processed by ETL.

Regards

Hello @t-saito,

I think I found the problem and I hope you have some ideas about the solution.

I installed kaltura dwh on a different server (web03) from the main kaltura /database server (web01).

web01 hosts the kaltura DB and web03 hosts kalturadw, kalturadw_ds, etc. All cron processes run on web03.

When kaltura /opt/kaltura/app/alpha/scripts/dwh/dwh_plays_views_sync.sh runs, I get these errors:

/opt/kaltura/app/alpha/scripts/dwh/updateEntryPlaysViews.php(19): KalturaLog::err(‘Couldn’t find e…’)

[updateEntryPlaysViews.php] [global] ERR: exception ‘Exception’ with message ‘Couldn’t find entry [0_1ndwt52w]’ in /opt/kaltura/app/infra/log/KalturaLog.php:88

I notice that updateEntryPlaysViews.php seems to not be able to find the kaltura.entries table on web01 although I have checked all my db settings (system.ini, db.ini) (even dwh/.kettle/kettle.propertites)

Do you know which configuration file or setting should I change so it is able to connect to the correct database ?

Thanks again for your help,

David

Hello @david.eusse,

Is the media entry that caused the error the deleted media?
Is the media entry is able to view on the KMCng?
If a media entry is removed from the Kaltura system, the log for that media is used for statistics.
Therefore, errors related to updateEntryPlaysViews.php (19) may not be relevant to this issue, if those media have been deleted.

I have a question.
Logs about plays/views are stored on “kalturadw.dwh_fact_events” table?
Record that “event_type_id” is 2 means “view”.
Record that “event_type_id” is 3 means “play”.

Previously, our Kaltura system faced a similar problem.
Other stats (bandwidth usage, storage usage, etc.) are updated correctly.
But, plays and views were not updated.
Then, we wrote a PHP script that gets logs from the “kalturadw.dwh_fact_events” table and updates the plays / views for each media entry.
We ran that script every day by using cron.

Later, we upgraded our Kaltura system several times, but this problem was not solved.
However , this problem has recently been resolved when upgrading our system.

Regards

Hello @t-saito,

The media is definitely not deleted and it shows on the KMC. I also ran a sphinx reindex, according to an old post from Jess and tried a second sphinx server to make sure everything is up to date.

It’s exactly the same problem yo experienced and haven’t been able to solve it. I think I will write a script as well in order to fix the problem.

I hope this error will get fixed someday.

Thank you for your patience and help,

David

Hello @david.eusse,

If correct logs are stored the “kalturadw.dwh_fact_events” table, you may be able to use same script.
For reference, here is the script we created.

<?php

$dbhost = $argv[1];
$dbport = $argv[2];
$dbuser = $argv[3];
$dbpassword = $argv[4];
$usage_file = $argv[5];

$custom_update = 0; // If plays/views are updated by the Kaltura system, set to 0. Otherwise, set to 1.

$link = mysql_connect($dbhost . ':' . $dbport , $dbuser, $dbpassword);
if (!$link) {
    die('Cannot connection to Kaltura database.'. mysql_error() . PHP_EOL);
}
else {
    echo "Connect to Kaltura database." . PHP_EOL;
}

$yesterday = date("Y-m-d", mktime(0, 0, 0, date("m"), date("d") - 1,   date("Y")));
$yesterday = $yesterday . " 00:00:00";

$today = date("Y-m-d", mktime(0, 0, 0, date("m"), date("d"),   date("Y")));
$today = $today . " 00:00:00";

$update_array = array();

$daily_plays = 0;
$daily_views = 0;

$view_sql = "select entry_id,count(event_type_id) from kalturadw.dwh_fact_events where event_type_id='2' and event_time>='" . $yesterday . "' and event_time<'" . $today . "' group by entry_id";

$result = mysql_query($view_sql);
if (!$result) {
    die('ERRO: cannot get entry views. : ' . mysql_error() . PHP_EOL);
}

while ($row = mysql_fetch_assoc($result)) {
    $update_array += array($row['entry_id'] => array('views'=> $row['count(event_type_id)'] , 'plays' => 0, 'last_played_at' => null));
    $daily_views += $row['count(event_type_id)'];
}

$play_sql = "select entry_id,count(event_type_id),max(event_time) from kalturadw.dwh_fact_events where event_type_id='3' and event_time>='" . $yesterday . "' and event_time<'" . $today . "' group by entry_id";

$result = mysql_query($play_sql);
if (!$result) {
    die('ERROR: cannot get entry plays : ' . mysql_error() . PHP_EOL);
}

while ($row = mysql_fetch_assoc($result)) {
    if ($update_array[$row['entry_id']] == null) {
        $update_array += array($row['entry_id'] => array('views' => 0, 'plays' => $row['count(event_type_id)']), 'last_played_at' => $row['max(event_time)']);
    }
    else {
        $update_array[$row['entry_id']]['plays'] = $row['count(event_type_id)'];
        $update_array[$row['entry_id']]['last_played_at'] = $row['max(event_time)'];
    }

    $daily_plays += $row['count(event_type_id)'];
}

$update_entries = 0;

foreach ($update_array as $entry_id => $playsViews) {
    if ($custom_update == 0) {
        $update_entries += 1;
    }
    else {
        $update_sql = "update kaltura.entry set plays=plays+" . $playsViews['plays'] . ",views=views+" . $playsViews['views'];
        if ($playsViews['last_played_at'] != null) {
            $update_sql .= ",last_played_at='" . $playsViews['last_played_at'] . "'";
        }

        $update_sql .= " where id='" . $entry_id . "'";

        $result = mysql_query($update_sql);
        if ($result) {
            $update_entries += 1;
        }
    }
}

echo $update_entries . " entries' plays/views has been updated." . PHP_EOL;

$bandwidth_usage = 0;
$added_storage = 0;
$deleted_storage = 0;

$yesterday = date("Ymd", mktime(0, 0, 0, date("m"), date("d") - 1,   date("Y")));

$bandwidth_sql = "select sum(count_bandwidth_kb),sum(added_storage_mb),sum(deleted_storage_mb) from kalturadw.dwh_hourly_partner_usage where date_id='" . $yesterday . "'";

$result = mysql_query($bandwidth_sql);
if (!$result) {
    die('ERROR: cannot get entry plays : ' . mysql_error() . PHP_EOL);
}

while ($row = mysql_fetch_assoc($result)) {
    $bandwidth_usage += $row['sum(count_bandwidth_kb)'];
    $added_storage += $row['sum(added_storage_mb)'];
    $deleted_storage += $row['sum(deleted_storage_mb)'];
}

$bandwidth_usage /= 1000;

$contents = 0;

$contents_sql = "select count(id) from kaltura.entry where status='2' and source='1' and puser_id != '' and categories != '' and conversion_profile_id != ''";

$result = mysql_query($contents_sql);
if (!$result) {
    die('ERROR: cannot get entries : ' . mysql_error() . PHP_EOL);
}

while ($row = mysql_fetch_assoc($result)) {
    $contents += $row['count(id)'];
}

$usage_date = date("Ymd", mktime(0, 0, 0, date("m"), date("d") - 1,   date("Y")));

$insert_sql = "";
$insert_sql .= "insert into customlog.daily_usage (date_id, contents, plays, views, bandwidth_usage, added_storage, deleted_storage) values(";
$insert_sql .= "'" . $usage_date . "',";
$insert_sql .= "'" . $contents . "',";
$insert_sql .= "'" . $daily_plays . "',";
$insert_sql .= "'" . $daily_views . "',";
$insert_sql .= "'" . $bandwidth_usage . "',";
$insert_sql .= "'" . $added_storage . "',";
$insert_sql .= "'" . $deleted_storage . "'";
$insert_sql .= ")";

$result = mysql_query($insert_sql);
if (!$result) {
    die('ERROR: cannot insert daily usage : ' . mysql_error() . PHP_EOL);
}

$usage_date = date("Y-m-d", mktime(0, 0, 0, date("m"), date("d") - 1,   date("Y")));

$write_string = '';
$write_string .= "Daily Usage on " . $usage_date . " :" . PHP_EOL;
$write_string .= "  " . $contents . " contents" . PHP_EOL;
$write_string .= "  " . $daily_plays . " plays, " . $daily_views . " views" . PHP_EOL;
$write_string .= "  Bandwidth Usage " . $bandwidth_usage . " MB" . PHP_EOL;
$write_string .= "  Added Storeage " . $added_storage . " MB, Deleted Storage " . $deleted_storage . " MB" . PHP_EOL;

$close_flag = mysql_close($link);
if ($close_flag) {
    echo "Kaltura server has been closed safely." . PHP_EOL;
}
else {
    echo "Kaltura server has been closed unsafely." . PHP_EOL;
}

file_put_contents($usage_file, $write_string, FILE_APPEND | LOCK_EX);

?>

This script gets the logs from database and updates plays/views of each media entry.
And, this script writes a record to “customlog.daily_usage” table, and writes result messages.
The “customlog.daily_usage” table is our original.
If you use this script, please delete unnecessary statements.

Regards

Thank you @t-saito !