Distributed Kaltura CE installation works but can't log in (lots of questions)

Hello Again.

I have successfully deployed two Kaltura AIO instances recently, and now I want to deploy a fully functional clustered Kaltura. I have followed the instructions here:

Everything is fine in principle.

I deployed two hosts for each component:

10.0.2.10 db-back1
10.0.2.11 db-back2

10.0.2.12 sphinx1
10.0.2.13 sphinx2

10.0.2.14 front1
10.0.2.15 front2

10.0.2.16 batch1
10.0.2.17 batch2

10.0.2.18 dwh1
10.0.2.19 dwh2

10.0.2.20 vod1
10.0.2.21 vod2

I have HAProxy set up as per the reference configuration (platform-install-packages/haproxy.cfg at master · kaltura/platform-install-packages · GitHub), with the obvious changes.

All machines can ping each other, and there is no firewall. SELinux is disabled.

In front of everything there is an HAProxy.
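For reference, the load-balancing part of my haproxy.cfg looks roughly like this (a sketch adapted from the reference file; the hostnames and addresses are mine, everything else is my assumption):

frontend kaltura-http
    bind *:80
    default_backend kaltura-fronts

backend kaltura-fronts
    # round-robin across the two front nodes
    balance roundrobin
    server front1 10.0.2.14:80 check
    server front2 10.0.2.15:80 check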

The obvious first issue is that when I try to log in to the admin console I get a login error, RC: 301.


Now, in the first batch host I see:

--
==> /opt/kaltura/log/batch/validatelivemediaservers-0-2021-04-28.err.log <==
PHP Fatal error:  Uncaught exception 'KalturaClientException' with message 'failed to unserialize server result
' in /opt/kaltura/app/batch/client/KalturaClientBase.php:401
Stack trace:
#0 /opt/kaltura/app/batch/client/KalturaClient.php(2401): KalturaClientBase->doQueue()
--
2021-04-28 21:56:07 [0.004964] [1159369195] [8] [BATCH] [KalturaClientBase->doQueue] NOTICE: result (serialized): 
2021-04-28 21:56:07 [0.000155] [1159369195] [9] [BATCH] [KScheduleHelper->run] ERR: exception 'Exception' with message 'System is not yet ready - ping failed' in /opt/kaltura/app/infra/log/KalturaLog.php:88
Stack trace:
#0 /opt/kaltura/app/batch/batches/KScheduleHelper.class.php(42): KalturaLog::err('System is not y...')

This got me thinking. How does it know where to go? There is no mention of ANY batch host in /etc/kaltura.d/system.ini after all…
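If I understand the design correctly, it doesn't have to know: the batch hosts are not listed anywhere because each batch daemon calls the API at the configured service URL and registers itself in the DB. So the only thing batch should need from /etc/kaltura.d/system.ini is something like this (my assumption, based on my own file; domain masked):

# /etc/kaltura.d/system.ini (excerpt) - assumption: batch only needs the API endpoint
SERVICE_URL=http://media.mydomain.com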


Then the questions started to flood in:

  • Does the DWH need to have its own DB?
  • How does each component know where to go?

There are some obvious ones:

DB is replicated (and I guess that can be LB'd too; see the sketch below)
Front is LB'd
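For the DB, I assume a plain TCP-mode listener in HAProxy would do (a sketch, untested; addresses from my hosts file):

listen mysql-cluster
    bind *:3306
    mode tcp
    option tcp-check
    balance leastconn
    server db-back1 10.0.2.10:3306 check
    server db-back2 10.0.2.11:3306 check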

But then…

Sphinx seems to be set by config file (but how does it know where the second one is?)

Same for batch: the instructions say each batch host writes itself into the DB… but how does it know where to find the DB? Via the API?

What about Elasticsearch? Can't it run on a separate host?

Finally, for VOD, where do I specify where to find it? My system.ini only had:
PRIMARY_MEDIA_SERVER_HOST=

Although in my answers file I added:

VOD_PACKAGER_HOST="vod1"


So, after success with AIO, I now find new challenges in a clustered Kaltura. I am trying to find official Kaltura documentation, but aside from the instructions on GitHub, there seems to be nothing at all. Please correct me; I am more than happy to read and go through documentation for this type of set-up, if there's any.

In the Sanity check I get:
[Check kaltura-sphinx daemon status] [SKIPPED as kaltura-sphinx is not installed]
[Check kaltura-sphinx daemon init] [SKIPPED as kaltura-sphinx is not installed]
It looks like the checks are performed locally only?
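Presumably yes, so I will just run the sanity script on every node; a sketch of what I plan to do (assuming the script lives at the same path on all hosts and root SSH works):

for h in db-back{1,2} sphinx{1,2} front{1,2} batch{1,2} dwh{1,2} vod{1,2}; do
    echo "== $h =="
    ssh root@$h /opt/kaltura/bin/kaltura-sanity.sh
done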

BTW I get:

curl: no URL specified!
curl: try ‘curl --help’ or ‘curl --manual’ for more information
[check_start_page] [FAILED, RC: 1] - [.084134187]


If I test it manually:

curl -I -L http://media.mydomain.com/
HTTP/1.1 301 Moved Permanently
Content-length: 0
Location: https://media.mydomain.com/

HTTP/1.1 302 Found
Date: Thu, 29 Apr 2021 10:16:28 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/5.4.16
Location: http://media.mydomain.com/start/index.php
Content-Type: text/html; charset=iso-8859-1
Set-Cookie: DYNSRV=s2; path=/

HTTP/1.1 301 Moved Permanently
Content-length: 0
Location: https://media.mydomain.com/start/index.php

HTTP/1.1 200 OK
Date: Thu, 29 Apr 2021 10:16:28 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/5.4.16
X-Powered-By: PHP/5.4.16
X-Me: media.mydomain.com
Content-Type: text/html; charset=UTF-8
Set-Cookie: DYNSRV=s1; path=/
Cache-control: private


When trying to run kaltlog on a front I get this too:

[root@ndoamsel114 ~]# kaltlog 
tail: cannot open ‘/opt/kaltura/log/batch/*.log’ for reading: No such file or directory

Why is it trying to read batch logs on a front server?

Which points me back to the original question: system.ini does not say where the batch servers are. How do we tell it? Via the LB?


Should I just give up on the idea of scaling each component individually and use just two big systems with everything on them? That does not seem right for a production environment.


I get this in the kaltlog on the first batch server:

2021-04-29 12:21:29 [0.000124] [1692245302] [9] [BATCH] [KScheduleHelper->run] ERR: exception 'Exception' with message 'System is not yet ready - ping failed' in /opt/kaltura/app/infra/log/KalturaLog.php:88
Stack trace:
#0 /opt/kaltura/app/batch/batches/KScheduleHelper.class.php(42): KalturaLog::err('System is not y...')
PHP Fatal error:  Uncaught exception 'KalturaClientException' with message 'failed to unserialize server result
' in /opt/kaltura/app/batch/client/KalturaClientBase.php:401

The error messages are not very helpful to me. It does not look like a permissions issue, and the deployment went perfectly well.

Something I noticed is that the AIO Kaltura deployment gets Elasticsearch deployed, while the clustered one does not.


And about this script: /opt/kaltura/bin/kaltura-config-all.sh

It is, of course, not used during the clustered steps, yet it is mentioned in many post-deployment troubleshooting guides. The cluster steps never execute it.

I have replayed the steps, and in the front configuration step I get this:

kaltura-db-config.sh FAILED with: 255 on line 207

insertPermissions.log:2021-04-29 13:39:34 [KalturaStatement->execute] DEBUG: /* ndoamsel114.xxxxxxx.online[916130315][propel] */ INSERT INTO permission (`ID`,`TYPE`,`NAME`,`FRIENDLY_NAME`,`DESCRIPTION`,`PARTNER_ID`,`STATUS`,`DEPENDS_ON_PERMISSION_NAMES`,`TAGS`,`CREATED_AT`,`UPDATED_AT`,`CUSTOM_DATA`) VALUES (NULL,'1','SYSTEM_ADMIN_BATCH_CONTROL_FAILED','Batch Control failed','','-2','1','','','2021-04-29 13:39:34','2021-04-29 13:39:34','a:1:{s:13:"partner_group";s:0:"";}')
insertPermissions.log:2021-04-29 13:39:34 [PermissionPeer::addToPartner] NOTICE: Adding permission [SYSTEM_ADMIN_BATCH_CONTROL_FAILED] to partner [-2].

Queuing action [userRole.add]
Queuing action [user.add]
Executing multirequest
service url: [http://media.xxxxxx.com:80]
curl: http://media.xxxxx.com:80/api_v3/service/multirequest
post: {"format":"3","ignoreNull":true,"clientTag":"php5:21-04-28","apiVersion":"16.14.0","0":{"service":"userrole","action":"add","userRole":{"objectType":"KalturaUserRole","name":"System Administrator","systemName":"System Administrator","description":"System Administrator","status":"1","permissionNames":"*","tags":"admin_console"},"ks":"YjA1ZjIxZWM3NThhYmY2ZmNiY2YwZmY5MGVjM2Y0ZGRlZDViYTMwYnwtMjstMjsxNjE5NzgyOTAzOzI7NzkxMDs7"},"1":{"service":"user","action":"add","user":{"objectType":"KalturaUser","isAdmin":"1","roleIds":"{1:result:id}","password":"xxxxxx.","id":"guillem.liarte@xxxxxx.com","screenName":"guillem.liarte@xxxxxxx.com","fullName":"Kaltura Administrator","email":"guillem.liarte@xxxxx.com","status":"1","allowedPartnerIds":"*"},"ks":"YjA1ZjIxZWM3NThhYmY2ZmNiY2YwZmY5MGVjM2Y0ZGRlZDViYTMwYnwtMjstMjsxNjE5NzgyOTAzOzI7NzkxMDs7"},"kalsig":"2c3266e3db5a5e4d048c48adc68f00de"}
result (serialized): 
PHP Fatal error:  Uncaught exception 'KalturaClientException' with message 'failed to unserialize server result
' in /opt/kaltura/app/tests/lib/KalturaClientBase.php:401
Stack trace:
#0 /opt/kaltura/app/tests/lib/KalturaClientBase.php(971): KalturaClientBase->doQueue()
#1 /opt/kaltura/app/tests/standAloneClient/exec.php(345): KalturaClientBase->doMultiRequest()
#2 {main}
  thrown in /opt/kaltura/app/tests/lib/KalturaClientBase.php on line 401

Yes, and that was after dropping the DB and starting over.

What is wrong here?

I have:

  • completely removed and reinstalled the DB nodes
  • made sure the settings in DNS are correct
  • verified root can connect to MySQL from the front, sphinx and batch hosts

Running it from the first front gives me this:

]# /opt/kaltura/bin/kaltura-db-config.sh db-back1 db-back1 root xxxxxxxxxx 3306
Checking MySQL version…
Ver 5.5.68-MariaDB found compatible

CREATE USER kaltura;
CREATE USER etl;
CREATE DATABASE kaltura;
CREATE DATABASE kaltura_sphinx_log;
CREATE DATABASE kalturadw;
CREATE DATABASE kalturadw_ds;
CREATE DATABASE kalturadw_bisources;
CREATE DATABASE kalturalog;
Checking connectivity to needed daemons…
Connectivity test passed:)
Cleaning cache…
Populating DB with data… please wait…
Output for /opt/kaltura/app/deployment/base/scripts/installPlugins.php being logged into /opt/kaltura/log/installPlugins.log
Output for /opt/kaltura/app/deployment/base/scripts/insertDefaults.php being logged into /opt/kaltura/log/insertDefaults.log
Output for /opt/kaltura/app/deployment/base/scripts/insertPermissions.php being logged into /opt/kaltura/log/insertPermissions.log
Output for /opt/kaltura/app/deployment/base/scripts/insertContent.php being logged into /opt/kaltura/log/insertContent.log

kaltura-db-config.sh FAILED with: 255 on line 207

Archving logs to /opt/kaltura/log/log_29_04_21_14_20.tar.gz…
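One way to narrow this down is to replay the kind of call the install script makes by hand from the front (a sketch; my working assumption is that the empty "result (serialized):" means the client receives the 301 redirect instead of a PHP-serialized body):

# From front1: if this returns a 301/302 instead of a serialized PHP body,
# the 'failed to unserialize server result' exception is explained.
curl -s -D - 'http://media.mydomain.com/api_v3/service/system/action/ping'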


So I can see that I can log in IF I go to plain HTTP.


Just to make it clearer: following the suggestions in the instructions, I am using SSL termination in HAProxy.

The HAProxy frontend listens on both HTTP and HTTPS, as per the given example.

In KMC, I can make it to the login screen, but logins (and the reset-password function), for some reason, instead of going to https://media.domain.com/, go to https://media.domain.com:80/

So like this:

OPTIONS https://media.domain.com:80/api_v3/service/multirequest?format=1&clientTag=kmcng

The rest of the elements stay in place, for example:

https://media.domain.com/apps/kmcng/v5.17./runtime.c45f8bb68a1741984dab.js

Why does it shove that port 80 there?

My guess is because that is what it says in the configuration…

SERVICE_URL=http://media.fetfilms.com:80
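If that guess is right, then presumably just dropping the explicit port (or switching the whole thing to HTTPS) and regenerating the configs should change what KMC generates. Something like this (an untested assumption on my part):

# /etc/kaltura.d/system.ini - the URL I believe the app uses to build API links
SERVICE_URL=http://media.fetfilms.com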


(Quoting the install instructions:)

Load Balancers can perform SSL offloading (aka SSL Acceleration). Using SSL offloading can dramatically reduce the load on the systems by only encrypting the communications between the Load Balancer and the public network while communicating over HTTP with the internal nodes.

We recommend that you utilise SSL offloading. In such a case, you will only need to deploy the SSL certificates on your Load Balancer.

Just to make it clear, I am quite familiar with this configuration, as I use it in other projects.

I tried adding:
http-request set-header X-Forwarded-Proto https

That seems to make no difference.
I also tried rewriting the URL to remove the port 80 from the request. That does not work either. Concretely, both attempts looked roughly like the sketch below.
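(A sketch of the relevant part of my HTTPS frontend; the replace-header line is my own experiment, not something from the Kaltura docs:)

frontend kaltura-https
    bind *:443 ssl crt /etc/haproxy/certs/media.pem
    # tell the backends the original scheme was https
    http-request set-header X-Forwarded-Proto https
    # my attempt at stripping a stray :80 from the Host header
    http-request replace-header Host ^(.*):80$ \1
    default_backend kaltura-fronts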

What am I doing wrong here?


Let me know what else I can provide to get the right answer.

I also tried using the Lua CORS module for HAProxy as described here:

I get exactly the same.

Can anyone help?

I have spent some hours with the HAProxy community. The issue's root cause is clearly that protocol mismatch: Kaltura is sending, in the API requests, a Host with :80 attached to it (see my screenshot), and that is not well digested by HAProxy.

So it is interpreted by the browsers as a CORS issue.
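The mechanism, as far as I can tell: the browser is told to call https://media.mydomain.com:80/..., so it starts a TLS handshake against the plain-HTTP port. You can reproduce what the browser does with curl (domain masked):

# TLS to port 80 - the broken URL KMC generates. The handshake fails because
# the plain-HTTP frontend cannot speak TLS; HAProxy logs a PR BADREQ and the
# browser surfaces the failed preflight as a CORS error.
curl -v 'https://media.mydomain.com:80/api_v3/service/multirequest?format=1&clientTag=kmcng'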

Is there a way to make the API not add the port to its URLs? For other paths, like app/, it does not add it.

@jess? This is not explained in the documentation (well, in the GitHub repo). Can someone point me to the documentation for making this work properly with HAProxy?

I am happy to help improve it and provide working examples too :)

I have tried putting more frontends in HAProxy. The traffic separates perfectly. It is when you try to get to KMC from a public host that the generated URL injects a :80 into the host; HAProxy then gets a protocol mismatch, which results in a PR BADREQ and is later interpreted by the browser as a CORS issue.

I can't believe I am the only one getting this issue. Has nobody tried to deploy a Kaltura cluster recently?

Is everyone running AIO only?

I have also submitted a bug report here:

After double-checking with the HAProxy people and the author of the CORS Lua module, we came to the conclusion that the application code is not sending sane requests that can work well over SSL in modern browsers.

Hello. I was wondering if anyone has an update on this issue. Currently the only workable solution seems to be to use HTTPS everywhere, including internal API calls.
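(If HTTPS end-to-end is indeed the expectation, I assume the HAProxy side would have to re-encrypt towards the fronts instead of offloading; a sketch, assuming the front Apaches also serve TLS:)

backend kaltura-fronts-ssl
    balance roundrobin
    # re-encrypt to the internal nodes; 'verify none' only because my lab
    # hosts use self-signed certificates
    server front1 10.0.2.14:443 ssl verify none check
    server front2 10.0.2.15:443 ssl verify none check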

This is not the setup expected in at least a good portion of deployments, correct?

Can someone from Kaltura shed some light on this? Is the current expectation to use HTTPS end-to-end?