Disparity between KMC entries and Media returned from API

Jamie · May 16, 2022, 3:09pm

I have written a script that gets all the media under a specific category, skills networks. I exported the data to a .csv for testing purposes and it had 4863 media entries. However, when I look at the KMC, it says there are only 4364 entries.

I think this is to do with a “status” field which is a number. I have a feeling this may mean hidden, private or something to that effect. However, I couldn’t find an endpoint that has this information.

Would anyone know why there is a disparity, and whether it is to do with the status field?

Thanks,
Jamie

jess · May 16, 2022, 3:35pm

Hi @Jamie ,

When calling media.list(), you can pass along a KalturaMediaEntryFilter object. That object has a member called statusEqual as well as a statusIn one.
See media.list - Kaltura VPaaS API Documentation.
If you use statusIn, the expected value is a comma-separated list of statuses (these are numeric, but we have enums for them; see https://developer.kaltura.com/api-docs/General_Objects/Enums/KalturaEntryStatus).
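
As a minimal sketch of what a statusIn value looks like, the snippet below joins numeric statuses into a comma-separated string. The helper name build_status_in is hypothetical and not part of the Kaltura client. The only status value taken from this thread is 2 (ready); the second value is a placeholder, so look up the real ones on the enum page linked above.

```python
def build_status_in(statuses):
    """Join numeric entry statuses into a comma-separated string."""
    return ",".join(str(int(s)) for s in statuses)


READY = 2          # this thread confirms that status 2 means "ready"
OTHER_STATUS = 1   # placeholder only; check KalturaEntryStatus for real values

status_in_value = build_status_in([READY, OTHER_STATUS])
print(status_in_value)

# With the real client, the string would be assigned to the filter, e.g.:
#   filter = KalturaMediaEntryFilter()
#   filter.statusIn = status_in_value
```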

Also, did you remember to set disableentitlement in the privileges param when generating the session? See How to get all media entries order by lastPlayedAt in PHP - #3 by kenpeter

Jamie · May 16, 2022, 3:55pm

Hi @Jess,

Looks like this isn’t the issue, all of my entries have a status of 2, ready.

Any other ideas?

jess · May 16, 2022, 3:56pm

Hi @Jamie ,

Did you see my question about disableentitlement?

Jamie · May 16, 2022, 3:57pm

Sorry I managed to miss that!

Jamie · May 16, 2022, 4:03pm

Hi @jess,
After reading your message properly: yes, I set disableentitlement, and looking at kenpeter’s problem, mine seems to be the opposite. He was getting fewer entries from the API than from KMC, whereas I am getting more.

jess · May 16, 2022, 4:05pm

Hi @Jamie ,

Please share your full code.

Jamie · May 16, 2022, 4:18pm

Hi @Jess,

The code below is giving me 4863 entries:

from datetime import datetime, timedelta
import KalturaClient.exceptions
import pandas as pd
from KalturaClient import *
from KalturaClient.Plugins.Core import *
import re


def auth(secret, partner_id, user_id, service_url, privileges):
    config = KalturaConfiguration(partner_id)
    config.serviceUrl = service_url
    k_type = KalturaSessionType.ADMIN
    expiry = 86400
    client = KalturaClient(config)
    ks = client.session.start(secret, user_id, k_type, partner_id, expiry, privileges)
    client.setKs(ks)
    return client


def write_to_excel(data, columns):
    df = pd.DataFrame(data, columns=columns)
    df.to_excel("data.xlsx", index=False)


def get_media(client, first_run=False):
    columns = ["Media ID", "Title", "URL", "Description", "Thumbnail Image URL", "Created By", "Created On", "Category", "Tags",
               "Length", "Plays", "Likes"]

    PAGESIZE = 500

    error = False
    i = 0

    data = []

    while not error:
        try:
            print(f"Downloading Page {i}...")
            filter = KalturaMediaEntryFilter()
            filter.categoryAncestorIdIn = "19562491"
            if not first_run:
                filter.createdAtGreaterThanOrEqual = int((datetime.utcnow() - timedelta(days=1)).timestamp())
            pager = KalturaFilterPager()
            pager.pageSize = PAGESIZE
            pager.pageIndex = i
            result = client.media.list(filter, pager)

            for obj in result.objects:
                result_data = vars(obj)

                obj_data = [result_data['id'], result_data["name"],
                            f"https://media.arup.com/media/{result_data['id']}", "",
                            result_data["thumbnailUrl"], result_data["userId"], "",
                            "", result_data["tags"], result_data["duration"],
                            result_data["plays"], result_data["votes"]]
                data.append(obj_data)



            i += 1
        except Exception as err:
            print(err)
            error = True
    write_to_excel(data, columns)


if __name__ == '__main__':
    secret = 
    user_id = 
    service_url = 
    partner_id = 
    privileges = "*,disableentitlement"

    client = auth(secret, partner_id, user_id, service_url, privileges)

    get_media(client, True)
    print("...Done")

jess · May 16, 2022, 4:32pm

Hi @Jamie ,

I have not run your code but when I run this simple snippet:

from KalturaClient import *
from KalturaClient.Plugins.Core import *

config = KalturaConfiguration()
config.serviceUrl = "https://www.kaltura.com/"
client = KalturaClient(config)
ks = client.generateSessionV2(
    "SECRET",
    "YOUR_USER_ID",
    KalturaSessionType.ADMIN,
    PARTNER_ID, 86400, 'disableentitlement')
client.setKs(ks)

filter = KalturaMediaEntryFilter()
filter.categoryAncestorIdIn = "19562491"
result = client.media.list(filter)
print(result.totalCount)

I get 4363. Same is true when invoking media.count().
I believe your timestamp filter may be off (I have not debugged it yet). Firstly, do you get the same result as I do when running the above? And secondly (assuming that you do), did you check whether your CSV includes duplicate entry IDs?
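
A quick way to run the duplicate check is sketched below. The function name find_duplicate_ids and the sample IDs are made up for illustration; in the script above, the real IDs would come from the "Media ID" column.

```python
from collections import Counter


def find_duplicate_ids(entry_ids):
    """Return a dict of entry ID -> occurrence count for IDs seen more than once."""
    counts = Counter(entry_ids)
    return {entry_id: n for entry_id, n in counts.items() if n > 1}


# Made-up IDs standing in for the exported "Media ID" column.
exported_ids = ["0_aaa", "0_bbb", "0_aaa", "0_ccc"]

duplicates = find_duplicate_ids(exported_ids)
print("rows:", len(exported_ids))
print("unique IDs:", len(set(exported_ids)))
print("duplicates:", duplicates)
```

If the data is already in a pandas DataFrame, df["Media ID"].duplicated().sum() should give the number of repeated rows directly.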

Cheers,

Jamie · May 16, 2022, 4:50pm

Hi @jess,

How strange, there were 500 duplicates: 4863 - 500 = 4363.

…

Figured it out. To get all the data I was indexing the pages starting from 0, which, it turns out, returns exactly the same page as index 1. So I was getting 500 duplicates, as that is the page size I was using.

Thanks for your help!

jess · May 17, 2022, 1:02pm

Hi @Jamie ,

I reviewed your script. Indeed, pager.pageIndex should always be set to 1 on the first iteration (I agree it’s somewhat confusing).
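
For anyone landing here later, the pagination pattern can be sketched roughly as follows. This is not the Kaltura client itself: fetch_all and fake_list_page are hypothetical names, and the fake listing only imitates the behaviour reported in this thread (page 0 returning the same objects as page 1). The loop starts at page 1, stops on an empty or short page, and skips any ID it has already seen as a safety net.

```python
def fetch_all(list_page, page_size=500):
    """Collect every object from a paged listing, starting at page 1."""
    items = []
    seen_ids = set()
    page_index = 1  # the first page is 1, not 0
    while True:
        objects = list_page(page_index, page_size)
        if not objects:
            break  # an empty page means there is nothing left
        for obj in objects:
            if obj["id"] not in seen_ids:  # guard against repeated entries
                seen_ids.add(obj["id"])
                items.append(obj)
        if len(objects) < page_size:
            break  # a short page is the last page
        page_index += 1
    return items


def fake_list_page(page_index, page_size):
    """Stand-in for a real list call; page 0 behaves like page 1."""
    all_entries = [{"id": f"0_{n:04d}"} for n in range(1234)]
    start = (max(page_index, 1) - 1) * page_size
    return all_entries[start:start + page_size]


entries = fetch_all(fake_list_page, page_size=500)
print(len(entries))
```

With the real client, list_page could be a small wrapper that sets pager.pageIndex and pager.pageSize, calls client.media.list(filter, pager) and returns result.objects; the objects would then be read via attributes (obj.id) rather than dictionary keys.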

Glad we’re good now.

Cheers,