Issue after upgrading.. no content, no categories

jlotito · March 7, 2018, 2:19pm

Hi experts.
I have the same issue as All entries gone, but i did not write it down in the same issue because i have test all the steps showed in the issue and still no content, and no categories. After upgrading to kaltura 13

The main problem is that when i tried the api media-list action, it returns different results each time. If make the request from php code, sometimes i get all the entries, sometiemes i get 0 results, with no categories. sometiems 0 rersults with all categories. If i make the request from the developer section in the admin-console, setting the api session and with no filter or no pager… sometimes i get 0 results, sometimes only 3!!!.

Things that I have checked

Reindex the sphinx like the preivuos issue reccomend. stop sphinkx, stop monit, touch /opt/kaltura/app/configurations/sphinx_schema_update and run kaltura-sphinx-schema-update.sh -> No erros in the ouput, no errors in kalt_up.log
Execute each script populateSphinx* with no errors in the ouput
As every time I invoke the request for media-list() I get different results, I wonder if it is a cache problem… I have remove all files in the cache directory in all my nodes.
Run the script /opt/kaltura/app/configurations/sphinx_schema_reindex.sh with no errors

The most surprising event is that it seems all the enviroemnt works fine. I can upload videos from KMC and thrid partys, I can play/delivery videos from KMC and thrid partys, but the problem occurs in the list action executed from kmc or from third parties

Env Info

Cluester based on RPM ENV
All nodes centos 6
Because I have this problem I have only now one front, one index, one batch, one VOD and one DWH
Production ENviroment
Connection between nodes are checked
Database under RDS in AWS, but this is not the problem… I assume because the new fields in the tables that are populate in the kaltura 13 release EXISTS in th DATABASE

I do not understand why the api give me those nonsense results in every query… please can you tell me where i can make my own request to the api… including de api session, using the postman app (por example), where is and example of invoking the api manually

Thanks for all

jess · March 7, 2018, 3:03pm

Hi @jlotito,

You didn’t state how many Sphinx servers you have. Judging by the varying results, I believe it is more than one.
You can check that in /opt/kaltura/app/configurations/db.ini where you will see these directives:

sphinx1.adapter
sphinx2.adapter

When reindexing, you should of course make sure to do so on all Sphinx instances. Also, make sure that the kaltura-populate service is up and running.

In addition, did you verify that Sphinx is actually restarted? Perhaps there is a defunct searchd process that doesn’t exit when you restart?
You should also check for errors in /opt/kaltura/log/sphinx/kaltura_sphinx_searchd.log and /opt/kaltura/log/sphinx/kaltura_sphinx_query.log.

Lastly, when making the media->list() request, check the /opt/kaltura/log/kaltura_api_v3.log and follow the whole flow, starting from:

    [service] => media
    [action] => list

You should see the exact queries that went down to Sphinx and the MySQL server as a result of that request and also the server that they hit.

jlotito · March 7, 2018, 3:40pm

Hi jess,

Thanks for your soon answer. I forgot to tell That I have only ONE SPHINXS SERVER. Because to solve this problem i have reduce all my enviroment to one node for each service.

I Have check the logs /opt/kaltura/log/sphinx/kaltura_sphinx_searchd.log… and you were right I have the error
"FATAL: failed to lock pid file ‘/opt/kaltura/sphinx/searchd.pid’: Resource temporarily unavailable (searchd already running?)" So I think That I have one proccess already running… So how can I stop this zombie proccess, reindex and restart the index correctly… Please I wolud like the correct steps… Maybe I am doing wrong beacuse I am changing the steps order

jess · March 7, 2018, 3:55pm

Hi @jlotito,

Start by sending SIGKILL to that searchd process [kill -9] and make sure it’s stopped. Also remove /opt/kaltura/sphinx/searchd.pid.
Then, try to restart Sphinx and make sure it starts correctly. Call # service kaltura-sphinx status just to verify it’s OK and the PID is correct.
After that and because of all the issues you had, I recommend that you re-run the reindex script and make sure it finishes w/o issues.

jlotito · March 7, 2018, 4:33pm

Hi jess
Thanks for your answer.

I have tried all you have told me… But still have the same issue. Sometimes media-list() return all the entries… sometimes only a few… sometimes 0 results. I did not tell you that this is only happening in one publisher I have another publuisher with a bigger number of entrys in the same env and not happens. Thats why I think this is a cache problem and not sphinkx problem.

I have recreate the api request and I have made it from a third part program… (postman)… and occurs the same… I have different results in the responses. I do not know how to proceed… And the editors have many problems because they cant see the videos that have just uploaded… And the curious thing is that the video is uploaded indeed, but cant list it.

jlotito · March 8, 2018, 9:12am

Hi jess,

I Have been checking the api flow for a list action over base entry objects… I have record in a file the log of two actions. One return 402 results and the other 0 results. Comparing the log… I have noticed some features that might help me to solve the problem.

The main query over the database is the select over the entry table. I am going to paste the two queries one in the OK test, and the other in the ERROR test

SELECT entry.ID, entry.KSHOW_ID, entry.KUSER_ID, entry.NAME, entry.TYPE, entry.MEDIA_TYPE, entry.DATA, entry.THUMBNAIL, entry.VIEWS, entry.VOTES, entry.COMMENTS, entry.FAVORITES, entry.TOTAL_RANK, entry.RANK, entry.TAGS, entry.ANONYMOUS, entry.STATUS, entry.SOURCE, entry.SOURCE_ID, entry.SOURCE_LINK, entry.LICENSE_TYPE, entry.CREDIT, entry.LENGTH_IN_MSECS, entry.CREATED_AT, entry.UPDATED_AT, entry.PARTNER_ID, entry.DISPLAY_IN_SEARCH, entry.SUBP_ID, entry.CUSTOM_DATA, entry.SCREEN_NAME, entry.SITE_URL, entry.PERMISSIONS, entry.GROUP_ID, entry.PLAYS, entry.PARTNER_DATA, entry.INT_ID, entry.INDEXED_CUSTOM_DATA_1, entry.DESCRIPTION, entry.MEDIA_DATE, entry.ADMIN_TAGS, entry.MODERATION_STATUS, entry.MODERATION_COUNT, entry.MODIFIED_AT, entry.PUSER_ID, entry.ACCESS_CONTROL_ID, entry.CONVERSION_PROFILE_ID, entry.CATEGORIES, entry.CATEGORIES_IDS, entry.START_DATE, entry.END_DATE, entry.FLAVOR_PARAMS_IDS, entry.AVAILABLE_FROM, entry.LAST_PLAYED_AT FROM entry WHERE entry.PARTNER_ID=‘110’ AND entry.ID IN (‘0_kvcza1y0’,‘0_elsph34l’,‘0_tp966189’,‘0_wvo6pcqd’,‘0_f09iu2zt’,‘0_05m2vxfn’,‘0_wjs3gr43’,‘0_j97f1zit’,‘0_ovrj82e9’,‘0_wgdsjazn’,‘0_z9bzr5jh’,‘0_bzp75657’,‘0_llrmqp1z’,‘0_ymm43e2y’,‘0_nr2s83of’,‘0_4jstpwb7’) LIMIT 16

SELECT entry.ID, entry.KSHOW_ID, entry.KUSER_ID, entry.NAME, entry.TYPE, entry.MEDIA_TYPE, entry.DATA, entry.THUMBNAIL, entry.VIEWS, entry.VOTES, entry.COMMENTS, entry.FAVORITES, entry.TOTAL_RANK, entry.RANK, entry.TAGS, entry.ANONYMOUS, entry.STATUS, entry.SOURCE, entry.SOURCE_ID, entry.SOURCE_LINK, entry.LICENSE_TYPE, entry.CREDIT, entry.LENGTH_IN_MSECS, entry.CREATED_AT, entry.UPDATED_AT, entry.PARTNER_ID, entry.DISPLAY_IN_SEARCH, entry.SUBP_ID, entry.CUSTOM_DATA, entry.SCREEN_NAME, entry.SITE_URL, entry.PERMISSIONS, entry.GROUP_ID, entry.PLAYS, entry.PARTNER_DATA, entry.INT_ID, entry.INDEXED_CUSTOM_DATA_1, entry.DESCRIPTION, entry.MEDIA_DATE, entry.ADMIN_TAGS, entry.MODERATION_STATUS, entry.MODERATION_COUNT, entry.MODIFIED_AT, entry.PUSER_ID, entry.ACCESS_CONTROL_ID, entry.CONVERSION_PROFILE_ID, entry.CATEGORIES, entry.CATEGORIES_IDS, entry.START_DATE, entry.END_DATE, entry.FLAVOR_PARAMS_IDS, entry.AVAILABLE_FROM, entry.LAST_PLAYED_AT FROM entry WHERE entry.PARTNER_ID=‘110’ AND 1<>1 LIMIT 16;

The difference is clear!!. in the second query the result is always 0 because the “AND 1<>1” that means that in the logic is not including the array in the query like the first one. So the fist question would be… where is the code to debug this query and why is not filling the array??

The second Thing I have noticed is this query
SELECT str_entry_id FROM kaltura_entry WHERE partner_id = 110 AND moderation_status not in(1,3) AND type in(1,10118) AND entry_status = 2 AND display_in_search <> -1 AND MATCH(’( @sphinx_match_optimizations P110ST2)’) ORDER BY created_at DESC LIMIT 16 OPTION ranker=none, max_matches=16, comment=‘ip-172-31-3-143[1161679356][sphinx1]’

I could not find this table in any of my database “kaltura,kaltura_sphinx_log,kalturadw,kalturadw_bisources,kalturadw_ds,kalturalog”. Maybe in the upgrade my database is not upgraded because I have an RDS aws and I have to do manually… Therefore I thought that the upgrading in the database would be achieve in the front upgrading, it means, when you update the fron NODE automatically creates the new tables or fields in the database… is that true??

On the other hand I have noticed happens in the Index node… There is a file /opt/kaltura/log/kaltura_populate.log which is increasing and increasing without stop… That is maybe becuase my index process is stuck in something
And the last Thing I have noticed in the index is… in the file /opt/kaltura/log/sphinx/kaltura_sphinx_query.log the file is always increasing too… with this query.
SELECT str_entry_id , ((end_date >= 1520500190) OR (end_date = 0)) as cnd0 FROM kaltura_entry WHERE MATCH(’( @(name,tags,description,entry_id,reference_id,roots,puser_id,user_names) esports ) ( ( @sphinx_match_optimizations P110ST2) )’) AND partner_id IN (0,110) AND partner_id=110 AND moderation_status IN (1,2,5,6) AND moderation_status NOT IN (1,3) AND int_entry_id!=1740526941 AND start_date<=1520500190 AND entry_status IN (1,2) AND entry_status!=3 AND entry_status=2 AND display_in_search!=-1 AND type IN (1,7,10118) AND type IN (1,2,7,10118) AND cnd0>0 LIMIT 0,12 OPTION max_matches=12, comment=’‘ip-172-31-3-143[1417300880][sphinx1]’’

Again the table kaltura_entry

Those are the things that I have seen… Please give me more advices to check.
Thanks for all your time

jlotito · March 8, 2018, 1:51pm

Hi Jess

I Think I have found the solution to my problem… The problem was to use the .ans file in the upgrade proccess. Beacuse of that I have earned from the previuos version the configuration of the SECOND sphinx server as “127.0.0.1” The problem i found in the doc is that when you deploy a new RPM cluster system you have the ability to introduce a second sphinks server if you want. But in the help notice which came along this step it is said… “Leave empty if you do not have”. That was I made I left empty… but automatically it fills in the configuration section the second sphinks index as “127.0.0.1” which for a front is an error because in a front there is no index. Because of that I Made the .ans file according to my previos version and that configuration was carried.

That was the mistake in this version… In my front configuration it is configured the second index as “127.0.0.1” an error in a cluster deployment. So I have edit the file /opt/kaltura/app/configurations/db.ini, replace the second sphinks with the same url as the first. Reload the apache. And until now IS WORKING!!!

So many thanks for your guide, it was very helpful. I only recomend to erase the notice tip in the index step when you are deploy the RPM cluster, or redesing the script in order to avoid the second index WHEN YOU LEAVE THE FIELD EMPTY. Or make required the second index configuration. That would be erase the optional thing…

Thanks for all