Kaltura_sphinx_searchd.log file growing due to errors occurring

Hi @msl2000,

Your Sphinx keeps segfaulting [signal 11].
The stack trace is of no use since the searchd binary is stripped of debug symbols. We may reach a point where you’ll have to compile a custom Sphinx binary and not strip the debug symbols so that we can get a more meaningful trace but before we go there, what Linux distro are you using and what’s the output for:
# rpm -q kaltura-sphinx kaltura-base
or, if using Debian/Ubuntu:
# dpkg -l kaltura-sphinx kaltura-base

Also, does the segfault occur after performing a particular action or at predicated intervals [for instance, does it only occur once a day, like when rotating logs for instance]?

Hi Jess

The output we get is shown below

[root@media01 genisys]# rpm -q kaltura-sphinx kaltura-base
kaltura-sphinx-2.2.1-20.x86_64
kaltura-base-12.17.0-20.noarch

The segfault seems to occur all the time. So much so that in 24 hours we increase the log size to 4GB worth of data relating to segfault errors

OK…
Is this RHEL 6 or 7?
Also, what is the output for:
# ldd /opt/kaltura/sphinx/bin/searchd
?

Hi Jess

We are using CentOS Linux release 7.3.1611

As for the output, as shown below:

Sphinx 2.2.1-id64-dev (r4097)
Copyright © 2001-2013, Andrew Aksyonoff
Copyright © 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

FATAL: no readable config file (looked in /opt/kaltura/app/configurations/sphinx/sphinx.conf, ./sphinx.conf).

Sorry, I meant:
# ldd /opt/kaltura/sphinx/bin/searchd

Hi Jess

Response below:

    linux-vdso.so.1 =>  (0x00007ffdf2bfc000)
    libmysqlclient.so.18 => /usr/lib64/mysql/libmysqlclient.so.18 (0x00007f2f06976000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2f0675a000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f2f06543000)
    libssl.so.10 => /lib64/libssl.so.10 (0x00007f2f062d5000)
    libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f2f05eeb000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f2f05ce6000)
    libexpat.so.1 => /lib64/libexpat.so.1 (0x00007f2f05abc000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f2f057ba000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f2f055b1000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f2f052a8000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f2f05092000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f2f04cd0000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f2f06e81000)
    libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f2f04a82000)
    libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f2f0479b000)
    libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f2f04596000)
    libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f2f04364000)
    libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f2f04154000)
    libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f2f03f50000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f2f03d36000)
    libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f2f03b0e000)
    libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f2f038ad000)

Hi @msl2000,

I built a debug RPM for you, please run:

# rpm -ihv http://installrepo.origin.kaltura.org/repo/releases/debug/kaltura-sphinx-debug-2.2.1-22.x86_64.rpm

then, edit /etc/init.d/kaltura-sphinx so that line 24 reads:

exec="$BASE_DIR/sphinx/bin/searchd.debug"

and line 33 reads:

su $OS_KALTURA_USER -c "$exec --logdebugvv --config $config"

Then restart the Sphinx daemon with:

# /etc/init.d/kaltura-sphinx restart

This will cause Sphinx to log much more info into /opt/kaltura/log/sphinx/kaltura_sphinx_searchd.log and also produce a [hopefully] meaningful backtrace.
Next time it crushes, please provide the log and we’ll see what we can do to pinpoint the cause and fix it.

Hi Jess

i ran the debug RPM and then edited the file.

When I restarted however, these were dumped after /etc/init.d/kaltura-sphinx restart:

Stopping searchd: Sphinx 2.2.1-id64-dev (r4097)
Copyright © 2001-2013, Andrew Aksyonoff
Copyright © 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file ‘/opt/kaltura/app/configurations/sphinx/kaltura.conf’…
FATAL: stop: failed to read valid pid from ‘/opt/kaltura/sphinx/searchd.pid’

Starting searchd: Sphinx 2.2.1-id64-dev (r4097)
Copyright © 2001-2013, Andrew Aksyonoff
Copyright © 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file ‘/opt/kaltura/app/configurations/sphinx/kaltura.conf’…
listening on all interfaces, port=9312
WARNING: index ‘kaltura_base’: no fields configured (use rt_field directive) - NOT SERVING
WARNING: index ‘kaltura_base_gt_in_charset’: no fields configured (use rt_field directive) - NOT SERVING
precaching index 'kaltura_entry’
WARNING: index ‘kaltura_entry’: preload: invalid meta file /opt/kaltura/sphinx/kaltura_rt.meta; NOT SERVING
precaching index 'kaltura_category’
WARNING: infix definition changed (from len=8, hashes=2 to len=0, hashes=0) - rebuilding…
Crash!!! Handling signal 11

Hi @msl2000,

Are you sure you’re now running with the debug binary? The output you provided does not seem to agree with that:)
What’s the output for:
# ps fax|grep searchd
?

Hi Jess

The output shown below:

31488 ? S 0:00 _ su kaltura -c /opt/kaltura/sphinx/bin/searchd.debug --logdebugvv --config /op t/kaltura/app/configurations/sphinx/kaltura.conf
31489 ? Ss 0:00 _ /opt/kaltura/sphinx/bin/searchd.debug --logdebugvv --config /opt/kaltura/ app/configurations/sphinx/kaltura.conf
31490 ? Zs 0:00 _ [searchd.debug]
31524 pts/0 S+ 0:00 _ grep --color=auto searchd
31491 ? S 0:00 /opt/kaltura/sphinx/bin/searchd.debug --logdebugvv --config /opt/kaltura/app/configur ations/sphinx/kaltura.conf
31521 ? S 0:00 _ /opt/kaltura/sphinx/bin/searchd.debug --logdebugvv --config /opt/kaltura/app/conf igurations/sphinx/kaltura.conf
31522 ? R 0:00 _ addr2line -e /opt/kaltura/sphinx/bin/searchd.debug 0x5882d2 0x41ad09 0xe2d6d3 70 0xe1422c86 0x49b684 0x6c740c 0x6d9327 0x424b2f 0x4490d6 0x4677d7 0x415d96 0xe12f9b35 0x4163e9

OK, looks like you’re running with the debug binary and --logdebugvv, which is good.
Has it crushed since? if so, what’s in /opt/kaltura/log/sphinx/kaltura_sphinx_searchd.log?

hi Jess

Thanks for the response. It is still continously crashing.

This is the output of the log file

Sphinx 2.2.1-id64-dev (r4097)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with gcc 4.9.2
Configured with flags: ‘–build=x86_64-redhat-linux-gnu’ ‘–host=x86_64-redhat-linux-gnu’ ‘–program-prefix=’ ‘–disable-dependency-tracking’ ‘–prefix=/usr’ ‘–exec-prefix=/usr’ ‘–bindir=/usr/bin’ ‘–sbindir=/usr/sbin’ ‘–sysconfdir=/etc’ ‘–datadir=/usr/share’ ‘–includedir=/usr/include’ ‘–libdir=/usr/lib64’ ‘–libexecdir=/usr/libexec’ ‘–localstatedir=/var’ ‘–sharedstatedir=/var/lib’ ‘–mandir=/usr/share/man’ ‘–infodir=/usr/share/info’ ‘–sysconfdir=/opt/kaltura/app/configurations/sphinx’ ‘–with-mysql’ ‘–with-unixodbc’ ‘–with-iconv’ ‘–enable-id64’ ‘–with-syslog’ ‘–prefix=/opt/kaltura/sphinx’ ‘–mandir=/opt/kaltura/sphinx/share/man’ ‘–bindir=/opt/kaltura/sphinx/bin’ ‘–with-debug’ ‘build_alias=x86_64-redhat-linux-gnu’ ‘host_alias=x86_64-redhat-linux-gnu’ ‘CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic’ 'LDFLAGS=-Wl,-z,relro ’ 'CXXFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic’
Host OS is Linux installrep 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Stack bottom = 0x7fff9d029cf7, thread stack size = 0x10000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x7fff9d027ce0)
Stack looks OK, attempting backtrace.
0x41ad09
0x7f001bd61c86
0x6c740c
0x6d9327
0x424b2f
0x4490d6
0x4677d7
0x415d96
0x7f001bc38b35
Something wrong in frame pointers, manual backtrace failed (fp=0)
Trying system backtrace:
begin of system symbols:
/opt/kaltura/sphinx/bin/searchd.debug[0x5882d2]
/opt/kaltura/sphinx/bin/searchd.debug[0x41ad09]
/lib64/libpthread.so.0(+0xf370)[0x7f001d6ac370]
/lib64/libc.so.6(+0x14ac86)[0x7f001bd61c86]
/opt/kaltura/sphinx/bin/searchd.debug[0x49b684]
/opt/kaltura/sphinx/bin/searchd.debug[0x6c740c]
/opt/kaltura/sphinx/bin/searchd.debug[0x6d9327]
/opt/kaltura/sphinx/bin/searchd.debug[0x424b2f]
/opt/kaltura/sphinx/bin/searchd.debug[0x4490d6]
/opt/kaltura/sphinx/bin/searchd.debug[0x4677d7]
/opt/kaltura/sphinx/bin/searchd.debug[0x415d96]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f001bc38b35]
/opt/kaltura/sphinx/bin/searchd.debug[0x4163e9]
-------------- backtrace ends here ---------------
Backtrace looks OK. Now you have to do following steps:

  1. Run the command over the crashed binary (for example, ‘searchd’):
    nm -n searchd > searchd.sym
  2. Attach the binary, generated .sym and the text of backtrace (see above) to the bug report.
    Also you can read the section about resolving backtraces in the documentation.
    — BT to source lines (depth 13): —
    /home/jess/rpmbuild/BUILD/kaltura-sphinx-2.2.1/src/sphinxutils.cpp:1981
    /home/jess/rpmbuild/BUILD/kaltura-sphinx-2.2.1/src/searchd.cpp:1963
    ??:0
    ??:0
    /home/jess/rpmbuild/BUILD/kaltura-sphinx-2.2.1/src/sphinx.cpp:7964
    /home/jess/rpmbuild/BUILD/kaltura-sphinx-2.2.1/src/sphinxrt.cpp:4040
    /home/jess/rpmbuild/BUILD/kaltura-sphinx-2.2.1/src/sphinxrt.cpp:4000
    /home/jess/rpmbuild/BUILD/kaltura-sphinx-2.2.1/src/searchd.cpp:18369
    /home/jess/rpmbuild/BUILD/kaltura-sphinx-2.2.1/src/searchd.cpp:20717
    /home/jess/rpmbuild/BUILD/kaltura-sphinx-2.2.1/src/searchd.cpp:21336
    /home/jess/rpmbuild/BUILD/kaltura-sphinx-2.2.1/src/searchd.cpp:21705
    ??:0
    ??:?
    — BT to source lines finished —
    — 0 active threads —
    ------- CRASH DUMP END -------
    [Fri Oct 20 09:53:25.189 2017] [12605] Child process 12947 has been killed with signal 11, will be restarted
    [Fri Oct 20 09:53:25.189 2017] [12605] Child process 12949 has been forked
    [Fri Oct 20 09:53:25.190 2017] [12949] listening on all interfaces, port=9312
    [Fri Oct 20 09:53:25.190 2017] [12949] WARNING: index ‘kaltura_base’: no fields configured (use rt_field directive) - NOT SERVING
    [Fri Oct 20 09:53:25.190 2017] [12949] WARNING: index ‘kaltura_base_gt_in_charset’: no fields configured (use rt_field directive) - NOT SERVING
    [Fri Oct 20 09:53:25.191 2017] [12949] WARNING: index ‘kaltura_entry’: preload: invalid meta file /opt/kaltura/sphinx/kaltura_rt.meta; NOT SERVING
    [Fri Oct 20 09:53:25.191 2017] [12949] WARNING: infix definition changed (from len=8, hashes=2 to len=0, hashes=0) - rebuilding…

— crashed SphinxAPI request dump —
— request dump end —

Hi @msl2000,

I’m afraid I am going to need more info in order to help you get to the bottom of this. You’ll have to provide a coredump.
Before I go into explaining how, it may also be worth trying to reindex the Kaltura indeces.
This can be done by running:
# /opt/kaltura/bin/kaltura-sphinx-reindex.sh
Note that this will cause an outage so if you can reproduce the segfault on a testing ENV rather than your production one, you should.

After reindexing, if the issue does not go away, here is what you should do:

  • disable Monit’s Sphinx configuration by removing this symlink:
/opt/kaltura/app/configurations/monit/monit.d/enabled.sphinx.rc 
  • stop the Sphinx daemon with:
# /etc/init.d/kaltura-sphinx stop
  • make sure no orphan searchd procs are running with:
# ps fax|grep searchd
  • run sphinx with GDB like so:
# gdb  --args /opt/kaltura/sphinx/bin/searchd --config /opt/kaltura/app/configurations/sphinx/kaltura.conf --console

you’ll probably want to do that using screen, see:

When it crushes, in the GDB console, you will see the stacktrace and you’ll also be able to save generate a coredump file with:

(gdb) generate-core-file

The output will tell you the file’s name, for instance:

Saved corefile core.18423

Then attach this file so I can take a look.

Hi jess

As per instructed, I am trying to reindex the Kaltura indeces with the command below,

/opt/kaltura/bin/kaltura-sphinx-reindex.sh

Possibly a rookie problem, but am encountering the following error and unsure what it signifies:

searchd is stopped
Backing up files to /opt/kaltura/sphinx.bck.1508800044. Once the upgrade is done and tested, please remove this directory to save space
mv: cannot stat ‘/opt/kaltura/sphinx/kaltura’: No such file or directory_
_mv: cannot stat ‘/opt/kaltura/log/sphinx/data/binlog.
’: No such file or directory_
Starting searchd: Sphinx 2.2.1-id64-dev (r4097)
Copyright © 2001-2013, Andrew Aksyonoff
Copyright © 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file ‘/opt/kaltura/app/configurations/sphinx/kaltura.conf’…
FATAL: failed to lock pid file ‘/opt/kaltura/sphinx/searchd.pid’: Resource temporarily unavailable (searchd already running?)

Restarting kaltura-monit (via systemctl): [ OK ]
Starting searchd: Sphinx 2.2.1-id64-dev (r4097)
Copyright © 2001-2013, Andrew Aksyonoff
Copyright © 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file ‘/opt/kaltura/app/configurations/sphinx/kaltura.conf’…
FATAL: failed to lock pid file ‘/opt/kaltura/sphinx/searchd.pid’: Resource temporarily unavailable (searchd already running?)

When I attempt to run sphinx with GDB, I receive:

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright © 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and “show warranty” for details.
This GDB was configured as “x86_64-redhat-linux-gnu”.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/
Reading symbols from /opt/kaltura/sphinx/bin/searchd…
warning: the debug information found in “/opt/kaltura/sphinx/bin/searchd.debug” does not match “/opt/kaltura/sphinx/bin/searchd” (CRC mismatch).

Reading symbols from /opt/kaltura/sphinx/bin/searchd…(no debugging symbols found)…done.
(no debugging symbols found)…done.
Missing separate debuginfos, use: debuginfo-install kaltura-sphinx-2.2.1-20.x86_64

In the GBD console, I tried to generate the core dump files but receive the following response.

You can’t do that without a process to debug.

Hi @msl2000,

In regards to reindexing, I believe the Sphinx daemon was not actually stopped, hence causing the issue.
Start by manually stopping the watchdog with:
# service kaltura-monit stop
Then stop Sphinx:
# service kaltura-sphinx stop
Then make sure Sphinx is not running:
# ps fax|grep searchd
Should return no PIDs.

After doing that, rerun /opt/kaltura/bin/kaltura-sphinx-reindex.sh.
It should finish successfully, if it doesn’t please post the output.

As for GDB, in your case, you should use /opt/kaltura/sphinx/bin/searchd.debug not /opt/kaltura/sphinx/bin/searchd, so the GDB command will be:

# gdb  --args /opt/kaltura/sphinx/bin/searchd.debug --config /opt/kaltura/app/configurations/sphinx/kaltura.conf --console

Hi Jess

After stopping the watchdog and sphinx, I get the following message after calling ps fax|grep searchd

[root@media01 ~]# ps fax|grep searchd

21166 pts/0    S+     0:00  |       \_ grep --color=auto searchd
 9183 ?        S      0:00 /opt/kaltura/sphinx/bin/searchd.debug --logdebugvv --                                                       config /opt/kaltura/app/configurations/sphinx/kaltura.conf
 9184 ?        Sl     1:15  \_ /opt/kaltura/sphinx/bin/searchd.debug --logdebugv                                                       v --config /opt/kaltura/app/configurations/sphinx/kaltura.conf

And am encountering the same problem as before when trying to generate the core files with the commands given.

[root@media01 ~]# gdb --args /opt/kaltura/sphinx/bin/searchd.debug --config /opt/kaltura/app/configurations/sphinx/kaltura.conf --console

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright © 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and “show warranty” for details.
This GDB was configured as “x86_64-redhat-linux-gnu”.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/
Reading symbols from /opt/kaltura/sphinx/bin/searchd.debug…done.

(gdb) generate-core-file
You can’t do that without a process to debug.
(gdb)

Thanks jess

Hi @msl2000,

If service kaltrua-sphinx stop does not kill these, then just kill them manually, then try to reindex.
If this solves the issue and it no longer crushes, no need to continue with generating a coredump, but if it does…
After launching GDB with:

# gdb --args /opt/kaltura/sphinx/bin/searchd.debug --config /opt/kaltura/app/configurations/sphinx/kaltura.conf --console

You need to type ‘r’ to actually make it run.
Only after it crushes, are you supposed to type generate-core-file

Hi jess

Problem persists as before. where we were unable to generate core-files:

[root@media01 ~]# gdb --args /opt/kaltura/sphinx/bin/searchd.debug --config /opt/kaltura/app/configurations/sphinx/kaltura.conf --console
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright © 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and “show warranty” for details.
This GDB was configured as “x86_64-redhat-linux-gnu”.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/
Reading symbols from /opt/kaltura/sphinx/bin/searchd.debug…done.
(gdb) run
Starting program: /opt/kaltura/sphinx/bin/searchd.debug --config /opt/kaltura/app/configurations/sphinx/kaltura.conf --console
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib64/libthread_db.so.1”.
Sphinx 2.2.1-id64-dev (r4097)
Copyright © 2001-2013, Andrew Aksyonoff
Copyright © 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file ‘/opt/kaltura/app/configurations/sphinx/kaltura.conf’…
listening on all interfaces, port=9312
bind() failed on 0.0.0.0, retrying…
bind() failed on 0.0.0.0, retrying…
bind() failed on 0.0.0.0, retrying…
bind() failed on 0.0.0.0, retrying…
bind() failed on 0.0.0.0, retrying…
bind() failed on 0.0.0.0, retrying…
bind() failed on 0.0.0.0, retrying…
bind() failed on 0.0.0.0, retrying…
bind() failed on 0.0.0.0, retrying…
bind() failed on 0.0.0.0, retrying…
bind() failed on 0.0.0.0, retrying…
bind() failed on 0.0.0.0, retrying…
FATAL: bind() failed on 0.0.0.0: Illegal seek
[Inferior 1 (process 3536) exited with code 01]
Missing separate debuginfos, use: debuginfo-install kaltura-sphinx-debug-2.2.1-22.x86_64
(gdb) generate-core-file
You can’t do that without a process to debug.
(gdb)

Apologies but would it be easier for you to have access to the server to make it easier to pinpoint the problem?

Thanks in advance!

Hi @msl2000,

Sphinx was not launched from GDB because it was already running. Hence the bind() failed on 0.0.0.0.
Like I said, before you try to launch it from GDB, you must stop the currently running Sphinx and the Monit daemon which serves as a watchdog. Reason being that if you do not, Monit will relaunch it.

Also, did you manage to reindex Sphinx?

Hi Jess

Apologies for the late response but been trying several things, and decided to redo everything from the start of this thread.

There is always 1 error that pops up. This time it appeared when I tried to restart the Sphinx Daemon.

failed to lock pid file ‘/opt/kaltura/sphinx/searchd.pid’: Resource temporarily unavailable (searchd already running?)