Retrieving and Updating Media Captions

Jamie · May 19, 2022, 11:03am

Hello,

I have a word which is often confused with another in the auto-captioning produced by Kaltura. For example, the word “Arup” gets confused with “Arab”, understandably. I would like to automate the process of correcting the word Arab to Arup. It seems to be possible in the KMC to find and replace words in the captions; is it possible to do the same via the API?

I have in mind a process along these lines: for a MediaSpace video, get the captions, make some alterations, and then make another call to the API to update the video with the altered version.
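
To make the "alterations" step concrete, here is a minimal sketch in plain Python. It assumes the captions arrive as SRT text (numbered cues, a timing line containing "-->", then the cue text). The function name `replace_word_in_srt` and the sample text are made up for illustration and are not part of any Kaltura library.

```python
import re


def replace_word_in_srt(srt_text, wrong, right):
    """Replace whole-word matches of `wrong` with `right` in cue text only."""
    # \b keeps "Arab" from matching inside longer words such as "Arabic".
    pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")
    fixed_lines = []
    for line in srt_text.splitlines():
        # Cue numbers and timing lines are copied through unchanged.
        # (A cue whose text is only digits is also skipped by this check.)
        if line.strip().isdigit() or "-->" in line:
            fixed_lines.append(line)
        else:
            fixed_lines.append(pattern.sub(right, line))
    return "\n".join(fixed_lines)


# Hypothetical sample, not taken from a real entry.
sample = """1
00:00:01,000 --> 00:00:03,000
Welcome to Arab, an Arabic speaker said.

2
00:00:03,500 --> 00:00:05,000
Arab designs buildings."""

fixed = replace_word_in_srt(sample, "Arab", "Arup")
print(fixed)
```

A blanket replacement like this also rewrites any place where "Arab" was the intended word, so it is probably only safe for videos where that word is not expected to come up.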

Thanks,
Jamie

jess · May 25, 2022, 9:47pm

Hi @Jamie ,

In general, anything that can be done using Kaltura’s web interfaces (KMS, KMC, editing app, etc.) can also be done using the API; in fact, the Kaltura interfaces invoke the same API actions to get the job done.

For editing captions, see the captionAsset service actions (captionAsset - Kaltura VPaaS API Documentation). Also, have a look here for documentation of the dictionary feature:
REACH - Ordering Captions For Media Per Entry | Kaltura Knowledge Center

Cheers,

Jamie · May 30, 2022, 9:09am

Hi @jess,

I have looked at the captionAsset endpoint and I am confused about how I get the captionAssetId for the .get endpoint. In the .list endpoint I can add a filter on entryIdEqual but for some reason, this returns empty. I tried it with a video ID that I know has captions.

filter = KalturaAssetFilter()
filter.entryIdEqual = "1_3bux4j0v"
pager = KalturaFilterPager()

result = client.caption.captionAsset.list(filter, pager)
print(result)

{
  "objects": [],
  "totalCount": 0,
  "objectType": "KalturaCaptionAssetListResponse"
}

I have also looked at what is returned from media.list and I don’t see a captionAssetId there.

Any help would be greatly appreciated!

jess · May 30, 2022, 9:48am

Hi @Jamie ,

The below code returns the expected result for me (one captionasset object) so my guess would be you’ve neglected to set disableentitlement when generating the KS…

from KalturaClient import *
from KalturaClient.Plugins.Core import *
from pprint import pprint

config = KalturaConfiguration()
config.serviceUrl = "https://www.kaltura.com/"
client = KalturaClient(config)
ks = client.generateSessionV2(
    "ADMIN_SECRET",
    "YOUR_USER_ID",
    KalturaSessionType.ADMIN,
    PARTNER_ID,  # replace with your numeric partner ID
    86400,  # session expiry in seconds
    "disableentitlement")  # privileges string
client.setKs(ks)


filter = KalturaAssetFilter()
filter.entryIdEqual = "1_3bux4j0v"
pager = KalturaFilterPager()
result = client.caption.captionAsset.list(filter, pager)

for obj in result.objects:
    pprint(vars(obj))

Let me know if that’s not the case,

Jamie · May 31, 2022, 1:55pm

Hi @jess,

I managed to fix it. The problem was that I was running it on the web API, and then when I tried it with disableentitlement on my PC I was using the wrong credentials.

So using your code I am able to get the caption asset ID, and with this, I can make a request to captionAsset.get which returns…

{'accuracy': 88,
 'actualSourceAssetParamsIds': '',
 'associatedTranscriptIds': '1_oe9aired,1_gbivq2ut',
 'captionParamsId': 0,
 'createdAt': 1651107994,
 'deletedAt': None,
 'description': '',
 'displayOnPlayer': True,
 'entryId': '1_3bux4j0v',
 'fileExt': 'srt',
 'format': <KalturaClient.Plugins.Caption.KalturaCaptionType object at 0x000002048BCDE148>,
 'id': '1_dsr85xbx',
 'isDefault': <KalturaClient.Plugins.Core.KalturaNullableBoolean object at 0x000002048BCDEC08>,
 'label': 'English',
 'language': <KalturaClient.Plugins.Core.KalturaLanguage object at 0x000002048BCBD5C8>,
 'languageCode': <KalturaClient.Plugins.Core.KalturaLanguageCode object at 0x000002048BCBD448>,
 'parentId': '',
 'partnerData': '',
 'partnerDescription': '',
 'partnerId': 529921,
 'relatedObjects': {},
 'size': 6013,
 'sizeInBytes': 6013,
 'source': None,
 'status': <KalturaClient.Plugins.Caption.KalturaCaptionAssetStatus object at 0x000002048BCDE1C8>,
 'tags': '',
 'updatedAt': 1651107997,
 'version': 1}

This has the associated transcript IDs. With those, how do I get the transcript data and then update it?

jess · May 31, 2022, 2:37pm

Hi @Jamie ,

To obtain the caption asset, use captionAsset.serve(); to update it, call captionAsset.setContent().
If you want to update the transcript as well, use the same actions on the attachmentAsset service.
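
As a rough sketch only, the serve / edit / setContent round trip could be wired together as below. It assumes captionAsset.serve() returns a URL (as attachmentAsset.serve does in this thread) and that the file is UTF-8. `fix_caption`, `fetch_text` and `make_resource` are hypothetical names introduced here; the stub client at the bottom exists only so the snippet runs without a Kaltura account.

```python
import re
import urllib.request
from types import SimpleNamespace


def fetch_text(url):
    """Download the body at `url`, decoding as UTF-8 (an assumption)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def fix_caption(client, caption_asset_id, wrong, right,
                make_resource, fetch=fetch_text):
    """Serve a caption asset, replace a word, and push the result back."""
    # Assumption: serve() returns a URL pointing at the caption file.
    url = client.caption.captionAsset.serve(caption_asset_id)
    original = fetch(url)
    # Whole-word replacement across the entire file.
    corrected = re.sub(r"\b" + re.escape(wrong) + r"\b", right, original)
    if corrected == original:
        return False  # nothing changed, so skip the update call
    # make_resource wraps the text in whatever resource setContent expects.
    client.caption.captionAsset.setContent(
        caption_asset_id, make_resource(corrected))
    return True


# Stub standing in for a real client, for demonstration only.
class FakeCaptionAsset:
    def __init__(self):
        self.saved = None

    def serve(self, asset_id):
        return "https://example.invalid/" + asset_id

    def setContent(self, asset_id, resource):
        self.saved = (asset_id, resource)


fake = FakeCaptionAsset()
client = SimpleNamespace(caption=SimpleNamespace(captionAsset=fake))
changed = fix_caption(
    client, "1_dsr85xbx", "Arab", "Arup",
    make_resource=lambda text: text,               # stub: pass text through
    fetch=lambda url: "Arab designs buildings.",   # stub: no network call
)
print(changed, fake.saved)
```

With a real client, `make_resource` would need to build a content resource object for setContent; a KalturaStringResource with its content field set to the corrected text may be suitable, but that is an assumption to check against the API documentation rather than something tested here.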

Cheers,

Jamie · May 31, 2022, 4:15pm

Hi @jess,

Sorry to keep asking questions, but when I run

client = auth(secret, partner_id, user_id, service_url, privileges)

filter = KalturaAssetFilter()
filter.entryIdEqual = "1_3bux4j0v"
pager = KalturaFilterPager()
result = client.caption.captionAsset.list(filter, pager)

for id in result.objects[0].associatedTranscriptIds.split(","):
    serve_options = KalturaAttachmentServeOptions()
    r = client.attachment.attachmentAsset.serve(id, serve_options)
    pprint(r)

I get the output in the form of two URLs which both give me the below error… do I need to configure the KalturaAttachmentServeOptions() or is it to do with the URL formatting?

thanks for all the help btw!

jess · May 31, 2022, 7:23pm

Hi @Jamie ,

There’s no need to set/pass a KalturaAttachmentServeOptions object.
When I run the below code, I get two valid URLs that return the text in the transcript:

filter = KalturaAssetFilter()
filter.entryIdEqual = "1_3bux4j0v"
pager = KalturaFilterPager()
result = client.caption.captionAsset.list(filter, pager)
for id in result.objects[0].associatedTranscriptIds.split(","):
    r = client.attachment.attachmentAsset.serve(id)
    pprint(r)

If you’re not, please send me a private message with the URLs you get back and we can debug further.

Cheers,

Jamie · June 1, 2022, 8:12am

Hi @jess,

That was what was causing the problem; as soon as I removed the serve options object it worked. I was following the docs, which included it.

thanks!

jess · June 6, 2022, 5:44pm

Hi @Jamie ,

Glad we’re good.
I’ve also fixed the issue you’ve found in this pull:

Not crucial to you since, as noted previously, you don’t need to pass that object when making the request but just FYI.

Cheers,