DWH analytics problem

Hello @t-saito,

I think I found the problem and I hope you have some ideas about the solution.

I installed kaltura dwh on a different server (web03) from the main kaltura /database server (web01).

web01 hosts the kaltura DB and web03 hosts kalturadw, kalturadw_ds, etc. All cron processes run on web03.

When kaltura /opt/kaltura/app/alpha/scripts/dwh/dwh_plays_views_sync.sh runs, I get these errors:

/opt/kaltura/app/alpha/scripts/dwh/updateEntryPlaysViews.php(19): KalturaLog::err(‘Couldn’t find e…’)

[updateEntryPlaysViews.php] [global] ERR: exception ‘Exception’ with message ‘Couldn’t find entry [0_1ndwt52w]’ in /opt/kaltura/app/infra/log/KalturaLog.php:88

I notice that updateEntryPlaysViews.php seems to not be able to find the kaltura.entries table on web01 although I have checked all my db settings (system.ini, db.ini) (even dwh/.kettle/kettle.propertites)

Do you know which configuration file or setting should I change so it is able to connect to the correct database ?

Thanks again for your help,

David

Hello @david.eusse,

Is the media entry that caused the error the deleted media?
Is the media entry is able to view on the KMCng?
If a media entry is removed from the Kaltura system, the log for that media is used for statistics.
Therefore, errors related to updateEntryPlaysViews.php (19) may not be relevant to this issue, if those media have been deleted.

I have a question.
Logs about plays/views are stored on “kalturadw.dwh_fact_events” table?
Record that “event_type_id” is 2 means “view”.
Record that “event_type_id” is 3 means “play”.

Previously, our Kaltura system faced a similar problem.
Other stats (bandwidth usage, storage usage, etc.) are updated correctly.
But, plays and views were not updated.
Then, we wrote a PHP script that gets logs from the “kalturadw.dwh_fact_events” table and updates the plays / views for each media entry.
We ran that script every day by using cron.

Later, we upgraded our Kaltura system several times, but this problem was not solved.
However , this problem has recently been resolved when upgrading our system.

Regards

Hello @t-saito,

The media is definitely not deleted and it shows on the KMC. I also ran a sphinx reindex, according to an old post from Jess and tried a second sphinx server to make sure everything is up to date.

It’s exactly the same problem yo experienced and haven’t been able to solve it. I think I will write a script as well in order to fix the problem.

I hope this error will get fixed someday.

Thank you for your patience and help,

David

Hello @david.eusse,

If correct logs are stored the “kalturadw.dwh_fact_events” table, you may be able to use same script.
For reference, here is the script we created.

<?php

$dbhost = $argv[1];
$dbport = $argv[2];
$dbuser = $argv[3];
$dbpassword = $argv[4];
$usage_file = $argv[5];

$custom_update = 0; // If plays/views are updated by the Kaltura system, set to 0. Otherwise, set to 1.

$link = mysql_connect($dbhost . ':' . $dbport , $dbuser, $dbpassword);
if (!$link) {
    die('Cannot connection to Kaltura database.'. mysql_error() . PHP_EOL);
}
else {
    echo "Connect to Kaltura database." . PHP_EOL;
}

$yesterday = date("Y-m-d", mktime(0, 0, 0, date("m"), date("d") - 1,   date("Y")));
$yesterday = $yesterday . " 00:00:00";

$today = date("Y-m-d", mktime(0, 0, 0, date("m"), date("d"),   date("Y")));
$today = $today . " 00:00:00";

$update_array = array();

$daily_plays = 0;
$daily_views = 0;

$view_sql = "select entry_id,count(event_type_id) from kalturadw.dwh_fact_events where event_type_id='2' and event_time>='" . $yesterday . "' and event_time<'" . $today . "' group by entry_id";

$result = mysql_query($view_sql);
if (!$result) {
    die('ERRO: cannot get entry views. : ' . mysql_error() . PHP_EOL);
}

while ($row = mysql_fetch_assoc($result)) {
    $update_array += array($row['entry_id'] => array('views'=> $row['count(event_type_id)'] , 'plays' => 0, 'last_played_at' => null));
    $daily_views += $row['count(event_type_id)'];
}

$play_sql = "select entry_id,count(event_type_id),max(event_time) from kalturadw.dwh_fact_events where event_type_id='3' and event_time>='" . $yesterday . "' and event_time<'" . $today . "' group by entry_id";

$result = mysql_query($play_sql);
if (!$result) {
    die('ERROR: cannot get entry plays : ' . mysql_error() . PHP_EOL);
}

while ($row = mysql_fetch_assoc($result)) {
    if ($update_array[$row['entry_id']] == null) {
        $update_array += array($row['entry_id'] => array('views' => 0, 'plays' => $row['count(event_type_id)']), 'last_played_at' => $row['max(event_time)']);
    }
    else {
        $update_array[$row['entry_id']]['plays'] = $row['count(event_type_id)'];
        $update_array[$row['entry_id']]['last_played_at'] = $row['max(event_time)'];
    }

    $daily_plays += $row['count(event_type_id)'];
}

$update_entries = 0;

foreach ($update_array as $entry_id => $playsViews) {
    if ($custom_update == 0) {
        $update_entries += 1;
    }
    else {
        $update_sql = "update kaltura.entry set plays=plays+" . $playsViews['plays'] . ",views=views+" . $playsViews['views'];
        if ($playsViews['last_played_at'] != null) {
            $update_sql .= ",last_played_at='" . $playsViews['last_played_at'] . "'";
        }

        $update_sql .= " where id='" . $entry_id . "'";

        $result = mysql_query($update_sql);
        if ($result) {
            $update_entries += 1;
        }
    }
}

echo $update_entries . " entries' plays/views has been updated." . PHP_EOL;

$bandwidth_usage = 0;
$added_storage = 0;
$deleted_storage = 0;

$yesterday = date("Ymd", mktime(0, 0, 0, date("m"), date("d") - 1,   date("Y")));

$bandwidth_sql = "select sum(count_bandwidth_kb),sum(added_storage_mb),sum(deleted_storage_mb) from kalturadw.dwh_hourly_partner_usage where date_id='" . $yesterday . "'";

$result = mysql_query($bandwidth_sql);
if (!$result) {
    die('ERROR: cannot get entry plays : ' . mysql_error() . PHP_EOL);
}

while ($row = mysql_fetch_assoc($result)) {
    $bandwidth_usage += $row['sum(count_bandwidth_kb)'];
    $added_storage += $row['sum(added_storage_mb)'];
    $deleted_storage += $row['sum(deleted_storage_mb)'];
}

$bandwidth_usage /= 1000;

$contents = 0;

$contents_sql = "select count(id) from kaltura.entry where status='2' and source='1' and puser_id != '' and categories != '' and conversion_profile_id != ''";

$result = mysql_query($contents_sql);
if (!$result) {
    die('ERROR: cannot get entries : ' . mysql_error() . PHP_EOL);
}

while ($row = mysql_fetch_assoc($result)) {
    $contents += $row['count(id)'];
}

$usage_date = date("Ymd", mktime(0, 0, 0, date("m"), date("d") - 1,   date("Y")));

$insert_sql = "";
$insert_sql .= "insert into customlog.daily_usage (date_id, contents, plays, views, bandwidth_usage, added_storage, deleted_storage) values(";
$insert_sql .= "'" . $usage_date . "',";
$insert_sql .= "'" . $contents . "',";
$insert_sql .= "'" . $daily_plays . "',";
$insert_sql .= "'" . $daily_views . "',";
$insert_sql .= "'" . $bandwidth_usage . "',";
$insert_sql .= "'" . $added_storage . "',";
$insert_sql .= "'" . $deleted_storage . "'";
$insert_sql .= ")";

$result = mysql_query($insert_sql);
if (!$result) {
    die('ERROR: cannot insert daily usage : ' . mysql_error() . PHP_EOL);
}

$usage_date = date("Y-m-d", mktime(0, 0, 0, date("m"), date("d") - 1,   date("Y")));

$write_string = '';
$write_string .= "Daily Usage on " . $usage_date . " :" . PHP_EOL;
$write_string .= "  " . $contents . " contents" . PHP_EOL;
$write_string .= "  " . $daily_plays . " plays, " . $daily_views . " views" . PHP_EOL;
$write_string .= "  Bandwidth Usage " . $bandwidth_usage . " MB" . PHP_EOL;
$write_string .= "  Added Storeage " . $added_storage . " MB, Deleted Storage " . $deleted_storage . " MB" . PHP_EOL;

$close_flag = mysql_close($link);
if ($close_flag) {
    echo "Kaltura server has been closed safely." . PHP_EOL;
}
else {
    echo "Kaltura server has been closed unsafely." . PHP_EOL;
}

file_put_contents($usage_file, $write_string, FILE_APPEND | LOCK_EX);

?>

This script gets the logs from database and updates plays/views of each media entry.
And, this script writes a record to “customlog.daily_usage” table, and writes result messages.
The “customlog.daily_usage” table is our original.
If you use this script, please delete unnecessary statements.

Regards

Thank you @t-saito !