Migration considerations


#1

Hello all,

I am putting some thoughts down around a migration project I’m taking on. Here is the landscape

Kaltura CE5 installation.two servers. One app, one DB. ~1k videos are in the repository. There is a C#-based app that interacts with CE 5 to provide a “Youtube” like experience for our customers internally. We have multiple categories established as well.

The goal is to migrate to a Kaltura CE 12 cluster and adapt the C# app as needed. I would like to migrate such that entry id’s are preserved, but this isn’t insurmountable. The C# app provides links that depend on the entry id and those links would be broken if we didn’t migrate them. I could build in some translation into the new version of the C# app, but I’d rather not. We have limited embeds so changing them would not be hard.

I would also like to migrate the analytics, however, what I really care about here is play count. If i had to migrate just this piece, that would be fine.

The new cluster is installed and seems to be working properly. I have a clone of our existing production

In perusing the upgrade guide and also some similar discussions on the forums, I’m concerned about a few things. It seems one may just export the db’s from the CE 5 installation (data only), import into the new db and then migrate the /opt/kaltura/web/ entries and perhaps ask sphinx to re-index as it’s likely to need it. However, this seems to simple :).

First, my source system is so old that there are undoubtedly multiple schema changes to deal with. This makes me think that any DB migration is likely asking for trouble, but I’d like some input on that. Perhaps if the schema changes have been purely additive it would be less of a pain. I don’t think schema changes are addressed in the upgrade guide.

Second, what db\table\column records the number of plays for an entry? I’ve started looking around the database, and trying to find the number that the KMC presents but I haven’t quite found it yet. Along those lines, if I just wanted to set the play count would it be as simple as finding the entry by ID in the right table and updating the play count column. I’m assuming that play count is actually stored in a single column, of course.

Thanks for the thoughts around this and helping me put together a plan!


/opt/kaltura/web/content/entry/data/*/* structure question
#2

Hello Mr. @david.hahn1,

are you looking for professional help, maybe a consultant job for this particular project, or are you looking for a community advice only?

I have made migrations in the past, and I have done a lot of research about that process, I could migrate for you in a virtual environment so you can test it, and then promote to production, or help you with specific steps.

if you are interested in my services or consultant services please PM me your contact information.

if you are just looking for advice or help, also I’m willing to do that of course, as I have learned a lot from this great community.

Bryan


#3

Hey @aquileasfx1

Thanks for the reply. I’m looking for advice on this as we are capable of doing it internally. I think I just need a little guidance in a decent direction. :slight_smile:. Thanks


#4

Hi @david.hahn1,

When the old server and the new server are of the same version, migrating is indeed pretty easy, I detailed the process in broad lines here:

However, your case is very different and so, I’m afraid

will not suffice:)

The DB schema undergoes changes between versions and as a result of that, so does the Sphinx schema.
A guide on how to migrate from Kaltura CE 6 to 9 [the first RPM based version] is available here:


It was contributed by Joost from VIXY who did it 3 years ago.
Migrating from 5 will require even more DB transitions and while it can be done, the whole operation can get pretty complex.

If it is not utterly crucial to preserve the original IDs [and we’re not just talking about entry IDs but also, partner IDs, UI conf IDs, etc], a faster and less error prone approach would be to export all your assets from the original server re-ingest them on the new one.

For the entries, you can use this code:

<?php
if ($argc < 4){
    echo "\nUsage: ".$argv[0] . ' <partner id> <admin secret> <service url>'."\n\n";
    exit (1);
}
require_once('/opt/kaltura/web/content/clientlibs/php5/KalturaClient.php');
$userId = null;
$expiry = null;
$privileges = "disableentitlement";
$partnerId=$argv[1];
$secret = $argv[2];
$type = KalturaSessionType::ADMIN;
$config = new KalturaConfiguration($partnerId);
$config->serviceUrl = $argv[3];
$client = new KalturaClient($config);
$ks = $client->session->start($secret, $userId, $type, $partnerId, $expiry, $privileges);
$client->setKs($ks);
$filter = new KalturaMediaEntryFilter();
$total_media_entries = $client->media->count($filter);
$pager = new KalturaFilterPager();
$page_index=1;
$pager->pageSize = 500;
$processed_entries=0;
$csv_file='/tmp/entries.csv';
error_log(" * title,description,tags,url,category\n",3,$csv_file);
while ($processed_entries < $total_media_entries){
    $pager->pageIndex=$page_index;
    $result = $client->media->listAction($filter, $pager);
        foreach ($result->objects as $entry) {
            $filter = new KalturaAssetFilter();
            $filter->entryIdEqual = $entry->id;
            $filter->tagsLike = 'source';
            $pager = null;
            $result = $client->flavorAsset->listAction($filter, $pager);
            if(isset($result->objects[0]->id)){
                $url= $client->flavorAsset->geturl($result->objects[0]->id, null, null, null);
            }
            if(is_string($url)){
                $filter = new KalturaCategoryEntryFilter();
                $filter->entryIdEqual=$entry->id;
                $pager = null;
                $cat_entry = $client->categoryEntry->listAction($filter, $pager);
                $cat_id=$cat_entry->objects[0]->categoryId;
                $cat = $client->category->get($cat_id);

                error_log( '"'.$entry->name.'","'.$entry->description.'","'.$entry->tags.'","'.$url.'","'.$cat->fullName."\"\n",3,$csv_file);
            }
            $processed_entries++;
        }
    $page_index++;
}
echo "Written entries to $csv_file\n";

Running this on the original server would generate /tmp/entries.csv which you can then use on the new server to bulk ingest the sources from the original server.

In regards to analytics, obviously, if the entry IDs change, the current data will become useless, however, it is rather easy to populate this data on the new server. Basically, DWH works like so:

  • The player makes a request to the stats->collect() API
  • The request, like any other API request, is logged in the Apache access log
  • The DWH scripts parse the access log looking for the service=stats&action=collect pattern, extract the relevant data and proceed to populate the DWH tables accordingly.

And so, if you prepare a CSV file with the following fields:

NEW_ENTRY_ID, VIEWS, LIKES

You could use the below script to make the needed requests and then the DWH process will handle the rest.
In regards to ‘likes’, as far as my memory goes, this feature was not even available in CE 5 and was only introduced in 6 so you may want to change the script accordingly and skip that part.

You can get the amount of views/plays for the original entries using the API or just by querying the entry table.
Then you will need to look up the corresponding entry ID on the new server and generate the CSV.

#!/bin/sh
if [ $# -lt 1 ];then
        echo "USAGE $0 </path/to/csv/file>"
        exit 1
fi
FILE=$1
LOG=$$.log
PARTNER_ID=
SERVICE_URL=''
DB_HOST=''
DB_NAME=kaltura
DB_USER=kaltura
DB_PASSWD=''
KS=""
while read LINE;do
        LINE=`echo $LINE|sed 's@, @ @g;s@"@@g'`
        ENTRY_ID=`echo $LINE|awk -F "," '{print $1}'`
        VIEWS=`echo $LINE|awk -F "," '{print $2}'`
        LIKES=`echo $LINE|awk -F "," '{print $3}'`
        I=$VIEWS
        echo "I will work on $NAME $ENTRY_ID" >>"$TMPDIR/$LOG"
        TMPDIR="/tmp/`basename $0`.out"
        mkdir -p $TMPDIR
        while [ $I -gt 0 ];do
                echo "Working on $ENTRY_ID">>"$TMPDIR/$LOG"
                echo "Working on $ENTRY_ID"
                wget -q "$SERVICE_URL//api_v3/index.php?service=stats&action=collect&kalsig=4f7ee732a3a99027f659cffd952b44e0&event%3AuiconfId=6710010&clientTag=kdp%3Av3%2E5%2E34%2Esap%2Ea&event%3AeventType=2&event%3AeventTimestamp=1343730047996&event%3AobjectType=KalturaStatsEvent&ks=$KS%3D&event%3AisFirstInSession=false&event%3AentryId=$ENTRY_ID&event%3AcurrentPoint=0&event%3Aduration=0&event%3AsessionId=8CD5A0F9%2DBF67%2D5021%2D4548%2DDC8D22260FB5&ignoreNull=1&event%3Aseek=false&event%3Areferrer=https%253A%2F%2Fmediashare%2Dqa%2Ewdf%2Esap%2Ecorp%2Fpublic%2F&event%3AclientVer=3%2E0%3Av3%2E5%2E34%2Esap%2Ea&event%3ApartnerId=$PARTNER_ID" -O$TMPDIR/$ENTRY_ID
                wget -q "$SERVICE_URL//api_v3/index.php?service=stats&action=collect&kalsig=4f7ee732a3a99027f659cffd952b44e0&event%3AuiconfId=6710010&clientTag=kdp%3Av3%2E5%2E34%2Esap%2Ea&event%3AeventType=3&event%3AeventTimestamp=1343730047996&event%3AobjectType=KalturaStatsEvent&ks=$KS%3D&event%3AisFirstInSession=false&event%3AentryId=$ENTRY_ID&event%3AcurrentPoint=0&event%3Aduration=0&event%3AsessionId=8CD5A0F9%2DBF67%2D5021%2D4548%2DDC8D22260FB5&ignoreNull=1&event%3Aseek=false&event%3Areferrer=https%253A%2F%2Fmediashare%2Dqa%2Ewdf%2Esap%2Ecorp%2Fpublic%2F&event%3AclientVer=3%2E0%3Av3%2E5%2E34%2Esap%2Ea&event%3ApartnerId=$PARTNER_ID" -O$TMPDIR/$ENTRY_ID
                if [ $? -ne 0 ];then
                        echo "wget $URL for $ENTRY_ID failed." >> "$TMPDIR/$LOG"
                else
                        echo "wget $URL for $ENTRY_ID succeeded." >> "$TMPDIR/$LOG"
                fi
                I=`expr $I - 1`
        done
        echo "update $DB_NAME.entry set total_rank=$LIKES,rank=$LIKES*1000,votes=$LIKES where id='$ENTRY_ID';"|mysql -h$DB_HOST -u$DB_USER -p$DB_PASSWD
done <$FILE

You didn’t mention metadata profiles or other objects but if you do have a lot of these, you could use the API to export and import these as well.

In regards to your C# app, while there is a chance the Kaltura C# client you are currently using will work, I’d strongly recommend migrating to the new client, which you can generate on the new server by calling:
# php /opt/kaltura/app/generator/generate.php csharp
The client will then be placed under /opt/kaltura/web/content/clientlibs/csharp

For code samples utilising the new C# client, see developer.kaltura.com.


#5

Sure,

well, I have never installed a fresh kaltura and then import data from another version, I always update the old data, from v5 to current, doing a migration process. I mean, executing upgrade scrips.

Bryan


#6

also dont forget that you have to translate urls to delivery profiles.


#7

Thanks for the info @jess. I appreciate it

Your script requires some adaptation for CE 5 as the API has changed a bit but it is certainly adaptable.

I am preparing the CSV for import and I’m wondering if the referenceID field can be populated via CSV? I’ve tried testing it with a single entry CSV file and the ReferenceID was not brought in from the CSV… Here is a sample of the csv

* name,description,tags,url,category,mediaType,referenceId
"Nov2013_ConferenceCall.mp4","","","http://URLhere","Health and Safety","video","user@email.com"

#8

****Please disregard the last message. I had an error in my CSV format which was causing the referenceID data to not import properly. It works fine.


#9

Hey @jess,

Regarding the wget statements in the script you provided. There are some arguments in the query string like kalsig, clienttag, eventTimestamp, sessionID, referrer and clientVer that maybe specific to an instance of Kaltura.

Are they? If they are, how would I find these values in my instance?

Edit: Additionally, why is it not a good idea to just update entry.plays in my new instance directly in the database? What would I be losing by not sending the plays through the DWH?


#10

Hi Dave,

eventTimestamp should be the epoch representing the time of the event.
kalsig is a unique ID we later use for debugging purposes, we look for it in the log to track down the request we’re interested in investigating, it really has no bearings as far as analytics go.
clientTag and clientVer just indicate which API client and of what version made the request.
This is again useful for debugging purposes. For instance when we see:

[clientTag] => batch

then we know this request was made by the batch daemon.

[clientTag] => Kaltura-admin

request made by the Admin Console

[clientTag] => kmc:5.43.1

request made by KMC

and so forth.
Each API client [C#, Java, PHP, whatnot] has a unique name and a version.
In the example I gave you, clientTag=kdp, which means it was made by the player, clientTag=kwidget would usually mean the same thing.
You can look at your /opt/kaltura/log/kaltura_apache_access*log for /api_v3/index.php?service=stats.*action=collect to get a better understand of the structure and values.


#11

Thanks @jess.

I had another question related to the play count that I added after submitting the post.

Why is it not a good idea to just update entry.plays in my new instance directly in the database? What would I be losing by not sending the plays through the DWH?

It seems the difference would be that I would not be able to see plays over time, but since I’m submitting plays all at once, through the DWH, that data wouldn’t be terribly useful anyway. I’m assuming if I set the plays value,the DWH will just increment it when actual plays come through, but I haven’t tested if that’s the behavior.

Thoughts? Are there other things to take into consideration?


#12

Hi @david.hahn1,

entry.plays is only used to show the amount of plays per entry but analytics is about more than just that:)
The DWH DBs store data about geo locations, about user names [when there is a KS in the player’s embed], about the clients/browsers used to play/view an entry and so on… if you only care about plays per entry and do not need more than that, by all means, you can just update entry.plays and entry.views but note that when you look at the Analytics widget in KMC’s Dashboard or under KMC->Analytics, it will be empty. Only place in KMC where entry.plays is used is in the content tab, since it’s faster to load the value for entry.plays per record from the operational DB than to call the ‘report’ API.


#13

Thanks @jess

I appreciate the explanation! Have a great day.


#14

Most welcome and to you too.
Please note that I added a bit more info to my original reply:)


#15

Thanks! I do see that now.

Right now I don’t really care about the analytics. Just the counts as they’re displayed in my front end application.

Going forward, yes, the analytics will start working as designed when “real” plays and views come through.

If I need information about previous plays, my approach will be to just go back to the CE 5 installation and get the data. After all, to really replicate the analytics data it seems I would have to figure out views per day per entry, then use the script to submit the right number of plays on the right days. This is all not really necessary in my use case right now. :slight_smile:


#16

Hi Dave,

In that case, sure, just go ahead and update the play column in the entry table.