Old(?) bug: agents on calls stuck in “DISPO” state

Posted: Tue May 26, 2020 11:46 am
by Daniel Beardsmore
Good afternoon

I am looking after an old VICIdial installation, 2.8-404a, which has a curious bug that I am led to believe was there from the start.

Specifically, the Real-Time Main Report will sometimes show agents who are on calls as being in status “DISPO” instead of in status “INCALL”. The internal symptoms are that rows within the vicidial_live_agents table are sometimes set such that status = 'PAUSED' AND lead_id <> 0. This is the set of criteria used by the “DISPO” pseudo-status, but these are agents who are on calls and their status is not being set back to 'INCALL' when they receive or place the next call. There are no SQL errors being logged for updates to vicidial_live_agents and no apparent indication that any of the rest of agc/vdc_db_query.php is failing (updating the status to INCALL is only a minuscule fraction of the 12,000 lines within that script, and one would like to imagine that there would be much more drastic problems if invocations of that script were failing).
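
To make the symptom concrete, the display rule described above can be sketched as a small function. This is only an illustration of the criteria stated in this post, not the report's actual code: the function name displayed_status and the dict-shaped row are assumptions made for the example.

```python
def displayed_status(row):
    """Return the status a report following the rule above would show.

    `row` is a hypothetical dict holding the 'status' and 'lead_id'
    columns of one live-agent row.
    """
    status = row["status"]
    lead_id = row["lead_id"] or 0  # treat NULL/None as "no lead attached"
    # Pseudo-status: a paused agent that still has a lead attached is
    # displayed as DISPO rather than PAUSED.
    if status == "PAUSED" and lead_id != 0:
        return "DISPO"
    return status


if __name__ == "__main__":
    for row in (
        {"status": "INCALL", "lead_id": 1234},
        {"status": "PAUSED", "lead_id": 0},
        {"status": "PAUSED", "lead_id": 1234},
    ):
        print(row, "->", displayed_status(row))
```

The third row is the problem case: the agent is really on a call, but because the stored status is 'PAUSED' while lead_id is still set, the rule reports DISPO.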

Does this ring a bell for anyone? Does anyone recall seeing such a bug before? This bug is intermittent with no known trigger.

Regards

Daniel.

Re: Old(?) bug: agents on calls stuck in “DISPO” state

Posted: Tue May 26, 2020 4:36 pm
by mflorell
I can't say I've heard of that issue before, but that code is about 7 years old at this point, and the system is probably older than that.

First of all, we've fixed hundreds of bugs and completely rewritten code related to this in several places in the last 7 years, so a VICIdial upgrade may fix whatever this issue is. Of course there may be an underlying issue with your 7+ year old system that needs to be addressed first, possibly database maintenance or archiving of data.

Our usual suggestion when approached by a client in this type of situation is to put together a new system on new hardware, export the database from the old system, upgrade its DB schema, install it on the new system and leave the old system behind. When we do this, the old problem almost always goes away.

Re: Old(?) bug: agents on calls stuck in “DISPO” state

Posted: Wed May 27, 2020 3:29 am
by Daniel Beardsmore
As it happens, that is the only version that was ever installed, and it’s on virtual hardware which has already been replaced recently. vicidial_live_agents is a tiny table that would not struggle with excessive lock time, and although it is possible that vdc_db_query.php could completely time out on a previous table lock or table operation before it reaches updating that table, I think we would be seeing far worse problems if that were the case. The agents are not aware of the bug at all: it’s only the people who monitor the Real-Time Main Report who ever notice it. If the agents had any awareness of it, that might at least offer some clue as to what might trigger it, as it seems to be intermittent (it’s hard to be sure as I am not getting answers to most of my questions about the problem, and since I don’t use their phone system I cannot speak from experience).

Hopefully one day it will get rebuilt, but from what I understand, that is not currently planned.

Re: Old(?) bug: agents on calls stuck in “DISPO” state

Posted: Tue Jun 16, 2020 5:21 am
by Daniel Beardsmore
From what I can observe, the situation is as follows:

  1. The agent’s browser stops sending ping requests, so last_update_time under vicidial_live_agents stops getting updated
  2. VICIdial detects that the agent has been gone for 30 seconds, and changes the agent’s status within vicidial_live_agents to “PAUSED”
  3. As the agent is still on a call, and thus has lead_id in vicidial_live_agents set, the Real-Time Main Report misreports the agent’s status as “DISPO”
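
The three steps above can be sketched as a small simulation. This is a model of the observed behaviour only, not VICIdial code: the function names are hypothetical, the row is a plain dict, and the 30-second threshold is simply the figure quoted in step 2.

```python
from datetime import datetime, timedelta

LAG_THRESHOLD = timedelta(seconds=30)  # figure taken from step 2 above


def apply_lag_check(row, now):
    """Step 2: pause an agent whose last update is older than the threshold."""
    if now - row["last_update_time"] > LAG_THRESHOLD:
        row["status"] = "PAUSED"
    return row


def report_status(row):
    """Step 3: a paused row that still carries a lead is displayed as DISPO."""
    if row["status"] == "PAUSED" and row["lead_id"] != 0:
        return "DISPO"
    return row["status"]


def simulate(seconds_silent):
    """Run steps 1 to 3 for an agent whose pings stopped `seconds_silent` ago."""
    start = datetime(2020, 6, 16, 5, 0, 0)
    row = {"status": "INCALL", "lead_id": 1234, "last_update_time": start}
    # Step 1: pings have stopped, so last_update_time stays frozen at `start`.
    now = start + timedelta(seconds=seconds_silent)
    apply_lag_check(row, now)
    return report_status(row)


if __name__ == "__main__":
    print(simulate(10))
    print(simulate(45))
```

In this model, an agent who has been silent for less than the threshold still shows as INCALL, while one silent for longer is paused and, because lead_id is still set, shows as DISPO.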

There is a small problem.

Step 2 is performed by AST_VDauto_dial.pl. I have added custom logging to this script, and that action never occurs (but if I add an “else” statement, it will log that no lagged agents were detected). I even broke this script by mistake (copied my PHP logging code into a Perl script) and the symptoms persisted even with that script unable to start.

This means that somehow the code portion of AST_VDauto_dial.pl responsible for idle detection is dead, and some other portion of the software handles this task instead.

I have tried the following search to locate anywhere that this table is updated in any fashion:

cd /
grep -n -i -r -E --exclude-dir=proc --exclude-dir=sys --exclude-dir=usr/src --include=*.{php,pl,cgi,agi} 'UPDATE.+vicidial_live_agents' /

Having examined every single result, I can say that every line of code that could possibly make that change is covered by my own logging, and no such change is ever detected.

Is there something else besides AST_VDauto_dial.pl that handles lag detection? Does something unexpectedly call into /usr/src/astguiclient/ somehow? Some obscure script with no filename extension (relying on #! for execution)? I just searched for *.pm just in case, although I’ve never seen any Perl modules for this.
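
One way to widen the search to the cases raised above (scripts with no filename extension that rely on #!, and *.pm files) is a small walker like the one below. It is a sketch under stated assumptions, not a tested replacement for the grep command: the names is_candidate and find_updates are hypothetical, the extension list just mirrors the one used above plus .pm, and unlike the grep it does not exclude usr/src.

```python
import os
import re

PATTERN = re.compile(r"UPDATE.+vicidial_live_agents", re.IGNORECASE)
EXTENSIONS = {".php", ".pl", ".cgi", ".agi", ".pm"}
SKIP_DIRS = {"proc", "sys"}  # pseudo-filesystems, pruned at the top level only


def is_candidate(path):
    """True for known script extensions, or extensionless files with a #! line."""
    ext = os.path.splitext(path)[1].lower()
    if ext in EXTENSIONS:
        return True
    if ext:
        return False
    try:
        with open(path, "rb") as handle:
            return handle.read(2) == b"#!"
    except OSError:
        return False


def find_updates(root):
    """Yield (path, line_number, line) for every matching line under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip FIFOs, sockets and device nodes so open() cannot block.
            if not os.path.isfile(path) or not is_candidate(path):
                continue
            try:
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if PATTERN.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue


if __name__ == "__main__":
    for path, number, line in find_updates("/"):
        print(f"{path}:{number}:{line}")
```

Like the grep, this only finds statements where UPDATE and the table name appear on the same line; a query assembled across several lines or from variables would still be missed.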

Re: Old(?) bug: agents on calls stuck in “DISPO” state

Posted: Thu Apr 21, 2022 9:43 am
by Ikram_Ali
I am facing the same problem. Is there any solution for it?

Re: Old(?) bug: agents on calls stuck in “DISPO” state

Posted: Thu Apr 21, 2022 9:52 am
by Daniel Beardsmore
Not from me. VICIdial’s haphazard construction makes it unmaintainable. Instead of centralised routine libraries for co-ordinating management of data (read/write/permissions/access logging), SQL data access is strewn across the codebase. You can’t trace activity as data changes are not funnelled through proper access control (where you could use existing logging or add your own tracing/logging), so it becomes a silly guessing game, and I ran out of time to keep investigating. From what I could tell, this is some kind of weird front-end JavaScript bug and that is not something that I have the means to monitor.

Re: Old(?) bug: agents on calls stuck in “DISPO” state

Posted: Thu Apr 21, 2022 10:19 am
by carpenox
It's usually related to poor internet connections, in my experience. Check your agents' LAG on sip show peers and see what type of connection they have to the server.

Re: Old(?) bug: agents on calls stuck in “DISPO” state

Posted: Thu Apr 21, 2022 11:48 am
by martinch
Hey guys,

We've discussed the reasons why this happens in viewtopic.php?f=4&t=41269. As carpenox has stated, this is linked to the LAGGED status, where the agent interface loses its connection with the ViCi backend.

I've also logged a ticket on Mantis with ID #0001356 and provided a quick and dirty patch, and Matt is looking at it.

Thanks,
Martin.