
getDocumentsPerTopicsProbabilities Undefined offset: 0 #64

@slava-vishnyakov

I'm following the tutorial at http://php-nlp-tools.com/posts/introducing-latent-dirichlet-allocation.html,
but calling getDocumentsPerTopicsProbabilities at the end fails:

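use NlpTools\Tokenizers\WhitespaceTokenizer;
use NlpTools\Documents\TrainingSet;
use NlpTools\Documents\TokensDocument;
use NlpTools\FeatureFactories\DataAsFeatures;
use NlpTools\Models\Lda;

$docs = [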
    'The queen does something',
    'Queen is very good queen',
    'Mission mission mission',
    'What is mission your mission'
];

$tok = new WhitespaceTokenizer();
$tset = new TrainingSet();
foreach ($docs as $line) {
    $tset->addDocument(
        '', // the class is not used by the lda model
        new TokensDocument(
            $tok->tokenize(
                mb_strtolower($line)
            )
        )
    );
}

$lda = new Lda(
    new DataAsFeatures(), // a feature factory to transform the document data
    2, // the number of topics we want
    1, // the dirichlet prior assumed for the per document topic distribution
    1  // the dirichlet prior assumed for the per word topic distribution
);

$lda->train($tset, 50);

$lda->getDocumentsPerTopicsProbabilities(2);

This results in:

Undefined offset: 0 at
vendor/nlp-tools/nlp-tools/src/NlpTools/Models/Lda.php:243

This probably requires something along the lines of:

if (!isset($count_topics_docs[$doc])) {
    $count_topics_docs[$doc] = [];
}
if (!isset($count_topics_docs[$doc][$t])) {
    $count_topics_docs[$doc][$t] = 0;
}
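
A minimal, self-contained sketch of that guard pattern, outside the library. The array name mirrors the snippet above, but the helper incrementCount and the sample indices are hypothetical, and this is not the actual code in Lda.php. It assumes PHP 7.1 or later for the type hints:

```php
<?php
// Hypothetical helper: increment a nested counter, creating missing keys first.
function incrementCount(array &$counts, int $doc, int $topic): void
{
    if (!isset($counts[$doc])) {
        $counts[$doc] = [];
    }
    if (!isset($counts[$doc][$topic])) {
        $counts[$doc][$topic] = 0;
    }
    $counts[$doc][$topic]++;
}

$count_topics_docs = [];
incrementCount($count_topics_docs, 0, 1);
incrementCount($count_topics_docs, 0, 1);
incrementCount($count_topics_docs, 2, 0);

// Reading a key that was never written needs a default as well,
// otherwise PHP reports an undefined offset.
$missing = $count_topics_docs[1][0] ?? 0;

var_dump($count_topics_docs[0][1], $count_topics_docs[2][0], $missing);
```

The same idea applies to reads: any lookup of a document/topic pair that was never counted has to fall back to 0 instead of indexing the array directly.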

Also, further down you have a variable $limit_docs, which is undefined. Maybe the method signature, public function getDocumentsPerTopicsProbabilities($limit_docs = -1), is incorrect, or maybe it should be $limit_words there?

Anyway, after running this method on this input:

$docs = [
    'The queen does something',
    'Queen is very good queen',

    'Mission mission mission',
    'What is mission your mission'
];
...
$lda->getDocumentsPerTopicsProbabilities(2);

I get this result:

[
    0.3333333333333333,
    0.3333333333333333,
    0.3333333333333333,
    0.3333333333333333
]

And I'm not sure how to interpret that... :)
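
For comparison, here is a minimal sketch of the textbook smoothed document-topic estimate, theta[d][k] = (n[d][k] + alpha) / (n[d] + K * alpha). The function name documentTopicDistribution and the counts are hypothetical (the counts only match the token lengths of the four documents above), and I have not confirmed that Lda.php computes it this way:

```php
<?php
// Textbook smoothed estimate: theta[d][k] = (n[d][k] + alpha) / (n[d] + K * alpha).
// $counts holds hypothetical per-document topic counts, not output from NlpTools.
function documentTopicDistribution(array $counts, int $numTopics, float $alpha): array
{
    $theta = [];
    foreach ($counts as $doc => $topicCounts) {
        // n[d]: total number of tokens assigned to any topic in this document.
        $denom = array_sum($topicCounts) + $numTopics * $alpha;
        $row = [];
        for ($k = 0; $k < $numTopics; $k++) {
            // A missing count is treated as 0 rather than indexed directly.
            $row[$k] = (($topicCounts[$k] ?? 0) + $alpha) / $denom;
        }
        $theta[$doc] = $row;
    }
    return $theta;
}

// Hypothetical topic assignments for four documents and two topics.
$counts = [
    [4, 0],
    [5, 0],
    [0, 3],
    [1, 4],
];
$theta = documentTopicDistribution($counts, 2, 1.0);
print_r($theta);
```

Under this formula each document gets one probability per topic and each row sums to 1, so with 2 topics I would have expected four pairs of numbers rather than four identical scalars.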

Thanks!
