7.0.4 server with ongoing time sync issues - also on 8.0.1

Posted: Thu Sep 07, 2017 11:20 am
by thirdhatch
2 - 4C Xeon Processors
48 GB Memory
256 GB SSD

I am getting a lot of time sync errors on this server. I have tried changing the NTP servers that I am syncing with, but it still seems to happen at least once a week. It isn't the PHP or DB time, it is the top line. Sometimes it is out of sync by several hours, sometimes by just a few minutes. I have read through the previous solutions for this, but I still can't seem to solve the problem. Any additional insight or advice would be most welcome.

Re: 7.0.4 server with ongoing time sync issues

Posted: Thu Sep 07, 2017 2:30 pm
by williamconley
1) Welcome to the Party! 8-)

2) As you are obviously new here, I have some suggestions to help us all help you:

When you post, please post your entire configuration including (but not limited to) your installation method (7.X.X?) and vicidial version with build (VERSION: 2.X-XXXx ... BUILD: #####-####).

This IS a requirement for posting, along with reading the stickies (at the top of each forum) and the manager's manual (available on EFLO.net, both free and paid versions).

You should also post: Asterisk version, telephony hardware (model number is helpful here), cluster information if you have one, and whether any other software is installed in the box. If your installation method is "manual/from scratch" you must post your operating system with version (and the .iso version from which you installed your original operating system) plus a link to the installation instructions you used. If your installation is "Hosted" list the site name of the host.

If this is a "Cloud" or "Virtual" server, please note the technology involved along with the version of that technology (e.g., VMware Server Version 2.0.2). If it is not, merely stating the Motherboard model # and CPU would be helpful.

Similar to This:

Vicibox X.X from .iso | Vicidial X.X.X-XXX Build XXXXXX-XXXX | Asterisk X.X.X | Single Server | No Digium/Sangoma Hardware | No Extra Software After Installation | Intel DG35EC | Core2Quad Q6600

3) Try this:

/etc/ntp.conf

Code:
# Stock configuration
driftfile /var/lib/ntp/drift/ntp.drift
logfile /var/log/ntp
keys /etc/ntp.keys
trustedkey 1
requestkey 1
controlkey 1
restrict -4  default notrap nomodify nopeer noquery
restrict -6  default notrap nomodify nopeer noquery
restrict 127.0.0.1
restrict ::1
# PT suggested Servers
server time-c.nist.gov
server time-a.nist.gov
server time-b.nist.gov
server time-d.nist.gov
server time.nist.gov
disable monitor
# PT additions
server 127.127.1.0
restrict 127.127.1.0
fudge 127.127.1.0 stratum 10


service ntp restart

wait a minute

ntpq -p
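
If you would rather check that output from a script than by eye, here is a minimal sketch in Python. It assumes the usual ten-column peer table that `ntpq -p` prints, with the tally code in the first character of each line (`*` marks the peer the daemon is currently synced to) and the offset given in milliseconds. The function names and the sample text are made up for illustration; the sample is not output from this server.

```python
# Invented sample of `ntpq -p` output, for illustration only.
SAMPLE_OUTPUT = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*time-a.nist.gov .NIST.           1 u   33   64  377   45.123   -1.250   0.480
+time-b.nist.gov .NIST.           1 u   30   64  377   46.002    2.100   0.910
 LOCAL(0)        .LOCL.          10 l   12   64  377    0.000    0.000   0.000
"""


def parse_ntpq_peers(text):
    """Parse raw (untrimmed) `ntpq -p` output into a list of dicts."""
    peers = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header, and the ===== separator.
        if not stripped or stripped.startswith("remote") or set(stripped) == {"="}:
            continue
        fields = line[1:].split()
        if len(fields) < 10:
            continue  # not a peer row we know how to read
        peers.append({
            "tally": line[0],               # ' ', '*', '+', '-', 'x', ...
            "remote": fields[0],
            "refid": fields[1],
            "stratum": int(fields[2]),
            "reach": int(fields[6], 8),     # reach is printed in octal
            "offset_ms": float(fields[8]),
            "jitter_ms": float(fields[9]),
        })
    return peers


def selected_peer(peers):
    """Return the peer marked '*' (the current sync source), or None."""
    for peer in peers:
        if peer["tally"] == "*":
            return peer
    return None


if __name__ == "__main__":
    peer = selected_peer(parse_ntpq_peers(SAMPLE_OUTPUT))
    if peer is None:
        print("No peer selected yet; wait a bit and run ntpq -p again.")
    else:
        print(peer["remote"], "offset (ms):", peer["offset_ms"])
```

If no line carries the `*`, the daemon has not picked a source yet. A reach of 377 (octal) means the last eight polls of that peer were all answered.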

Re: 7.0.4 server with ongoing time sync issues

Posted: Fri Sep 08, 2017 11:50 am
by thirdhatch
Thanks, William. I'm not new, it has just been a while since I posted. I will mend my wayward posting actions.

2 - 4C Xeon Processors
48 GB Memory
256 GB SSD
VERSION: 2.14-620a
BUILD: 170623-2142
Vicidial (vicibox) 7.0.4 in Express mode (single server)

I have updated the ntp servers to see if this helps. The server was already syncing with different ntp servers, but still losing sync. I'll report back in a few days.

Re: 7.0.4 server with ongoing time sync issues

Posted: Fri Sep 08, 2017 12:37 pm
by williamconley
thirdhatch wrote:Thanks, William. I'm not new, it has just been a while since I posted. I will mend my wayward posting actions.
Utter Chaos. lol


thirdhatch wrote:I have updated the ntp servers to see if this helps. The server was already syncing with different ntp servers, but still losing sync. I'll report back in a few days.

Only ONE server in your cluster should have this configuration. That would be the master for time for the rest of the servers.

The other servers only need to have the same time as the master. The actual time is irrelevant.

So the other servers in the cluster should have this:
Code:
driftfile /var/lib/ntp/drift/ntp.drift
logfile /var/log/ntp

#server 0.north-america.pool.ntp.org
#server 1.north-america.pool.ntp.org
#server 2.north-america.pool.ntp.org
#server 3.north-america.pool.ntp.org
#server pool.ntp.org
server 127.127.1.0

#restrict default nomodify notrap
restrict 127.127.1.0

#fudge 127.127.1.0 stratum 10

server xxx.xxx.xxx.xxx iburst


where xxx.xxx.xxx.xxx is the local IP of that master server.

Re: 7.0.4 server with ongoing time sync issues

Posted: Fri Sep 08, 2017 1:06 pm
by thirdhatch
Got it. Typo on the "servers". There is only one server in this configuration. Thanks for the help, William!

Re: 7.0.4 server with ongoing time sync issues

Posted: Fri Sep 08, 2017 2:38 pm
by williamconley
thirdhatch wrote:Got it. Typo on the "servers". There is only one server in this configuration. Thanks for the help, William!

Just cuz you're too cheap to have more servers doesn't make it a typo on my end. Get another server, ya cheapskate!

Re: 7.0.4 server with ongoing time sync issues

Posted: Fri Sep 22, 2017 12:48 pm
by thirdhatch
I meant my typo. But as a follow-up, we got this error again today for the first time since I moved to the new NTP servers. Any ideas what could still be causing this? It was only Apache that was out of sync; PHP and the database were fine.

Re: 7.0.4 server with ongoing time sync issues

Posted: Fri Sep 22, 2017 1:17 pm
by williamconley
thirdhatch wrote:I meant my typo. But as a follow-up, we got this error again today for the first time since I moved to the new NTP servers. Any ideas what could still be causing this? It was only Apache that was out of sync; PHP and the database were fine.

1) Overload? What was the average server load on the server at the time in question?

2) How much "off" was apache?

3) Is one of your servers the master for the others, or are you trying to use external NTP sources on all servers (bad idea)?

4) Did everyone get this error, or only one person, or only everyone on ONE server? Often this error is not related to time sync, but to packets being dropped between the agent screen and the agent's web server.

Re: 7.0.4 server with ongoing time sync issues

Posted: Mon Sep 25, 2017 10:59 am
by thirdhatch
1. Not overload, there are 4 agents on a server that could easily accommodate 30+ agents. (less than 1% load most of the time)
2. Apache is always off by less than a minute, but around 40 seconds usually.
3. There is only one server, it runs ntp as a daemon to sync with the servers specified above.
4. Everyone is getting the error, and when I go to the report page the server is highlighted in red.

Re: 7.0.4 server with ongoing time sync issues

Posted: Mon Sep 25, 2017 2:03 pm
by thirdhatch
This has happened like 4 times this morning now. It is happening around every 90 minutes.

I have noticed that since I moved to 7.0.4 I have had a lot more problems. I'm wondering if I should update suse to the new version like Kumba posted about at the top of the forum. This is unmanageable.

Re: 7.0.4 server with ongoing time sync issues

Posted: Mon Sep 25, 2017 3:12 pm
by mflorell
I've been testing vicibox 8 in our lab since last week and it seems pretty stable. We don't have any production clients on it, but if you run into issues we will be looking at them since it's still in a testing phase.

Re: 7.0.4 server with ongoing time sync issues

Posted: Mon Sep 25, 2017 5:50 pm
by thirdhatch
Matt, do you have a place I can download the iso and test?

Thanks!

Re: 7.0.4 server with ongoing time sync issues

Posted: Mon Sep 25, 2017 10:26 pm
by mflorell
I've been told there's no link yet. I guess I shouldn't have posted that; we're apparently still testing VICIbox 8.

Kumba had installed it on two servers for me here at the office from a USB stick last week.

He did say it should be wrapped up soon, apparently he's still refining the screens.

Re: 7.0.4 server with ongoing time sync issues

Posted: Tue Sep 26, 2017 2:59 pm
by mflorell
Looks like Kumba's released vicibox 8!
viewtopic.php?f=8&t=37680

Re: 7.0.4 server with ongoing time sync issues

Posted: Fri Sep 29, 2017 3:17 pm
by thirdhatch
As an FYI, this is still an ongoing issue. We couldn't upgrade to v8 yet because of some bugs in the initial build that prevent us from using SSL. We will try again on Monday after Kumba wraps the 8.0.1 iso.

But what we have managed to do is prevent this from having too large an impact by setting NTP to update every minute. This seems to have headed off the problem for the moment, but makes me think there is something else going on either with the hardware or the OS software that is no longer up to date. Hoping V8 fixes it for us permanently.

Thanks everyone for the assistance!

Re: 7.0.4 server with ongoing time sync issues

Posted: Thu Oct 05, 2017 8:43 pm
by Kumba
Usually when NTP is not able to keep the clocks synchronized it means the BIOS clock has a significant clock drift. Are these servers overclocked? It's very common when they are. Otherwise it's usually a BIOS issue. If you try a different motherboard/BIOS you might not have this issue.

Re: 7.0.4 server with ongoing time sync issues - Now on 8.0.

Posted: Fri Oct 06, 2017 3:37 pm
by thirdhatch
Kumba,

We upgraded to 8.0.1 and moved to different hardware and are still seeing this issue. It is happening less frequently, but still happening. These servers are not overclocked. They are pretty run of the mill IBM Blade Servers. The last server was an HS22 with 2X 6C XEON and 48GB of memory and SSD hard drives. We stepped down to an HS21 with 2X 4C XEON and 8GB of memory and SSD hard drives on the new server. It worked no problem for a few days, then this happened again today.

I will try a BIOS update on this server tonight, but we already did that on the HS22 and it made no difference. What I am seeing is on the report page, the top line of the server is out of sync with the database time and the php time. I assume this is the apache time, but when I check the clock (date command from ssh) it agrees with the php and db time, not the apache time. Restarting apache does not fix the issue.

Re: 7.0.4 server with ongoing time sync issues - also on 8.0

Posted: Sat Oct 07, 2017 3:19 pm
by thirdhatch
Is it possible that this is a result of introducing SSD drives into the environment? We were using 10k SAS drives before this and never had time sync errors. But since we started using SATA SSD drives this seems to have come up an awful lot.

Re: 7.0.4 server with ongoing time sync issues - also on 8.0

Posted: Mon Oct 09, 2017 8:58 pm
by alo
When you say the time sync issue, are you just talking about the screen saying time sync and the reports page going red?
We had this happening a lot on vicibox 7. How do you recover from the time sync error? Restarting vicidial? Just trying to see if it's the same problem so I can help.

Re: 7.0.4 server with ongoing time sync issues - also on 8.0

Posted: Tue Oct 10, 2017 11:51 am
by Kumba
Time slippage is usually a result of virtualization or an unstable system clock. If you are using virtualization then there really is no fix other than to decrease your load until your guest quits losing time. In the second case this is a hardware issue and there isn't much that can be done about it short of trying different hardware until the problem goes away.

Does the time slip on a server with no clients on it? Or is it constant regardless of whether there are clients on the server or not? I'd try manually running the 'date' command on all servers in order to figure out where the slip is. It could be database, dialer, or web server. If this is an all-in-one or an 'express' install, then that shouldn't apply, so likely you have a hardware issue or Asterisk is getting overloaded and not reporting back in. If the date command always shows the clocks in sync within a few seconds, then the error is being caused by something else.

The last time we ran into a client that had a hardware issue with time slipping was a customer using an overclocked gamer board. This doesn't sound like it applies to you, but we have ViciBox v.8 running in our lab and also on a dozen servers without any time issues.

The other thing is that unless the 'date' command shows an actual time difference at the Linux CLI (not the vicidial web interface) of more than 3 seconds, you do not have an actual NTP/time issue. You have a software issue somewhere instead. The time synchronization error is somewhat generic in vicidial and doesn't always mean it's an actual clock issue.
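
To make that 3-second rule of thumb concrete, here is a small Python sketch that compares a reference clock reading against other readings and flags any that differ by more than a threshold. How the readings are collected is left out. The function name, the labels, and the sample timestamps are all hypothetical, and the sketch assumes the readings were taken at effectively the same instant.

```python
from datetime import datetime


def clock_skews(reference, readings, threshold_seconds=3.0):
    """Map each reading name to (skew in seconds, True if beyond threshold)."""
    report = {}
    for name, reading in readings.items():
        skew = (reading - reference).total_seconds()
        report[name] = (skew, abs(skew) > threshold_seconds)
    return report


# Hypothetical readings, made up for illustration.
cli_time = datetime(2017, 10, 10, 11, 51, 0)           # e.g. from `date` over ssh
readings = {
    "db": datetime(2017, 10, 10, 11, 51, 1),
    "php": datetime(2017, 10, 10, 11, 51, 0),
    "report_top_line": datetime(2017, 10, 10, 11, 50, 20),
}

report = clock_skews(cli_time, readings)

if __name__ == "__main__":
    for name, (skew, flagged) in report.items():
        status = "OUT OF RANGE" if flagged else "ok"
        print(f"{name}: {skew:+.1f}s {status}")
```

If the CLI, database, and PHP readings all agree and only one displayed value is off, that points away from the system clock and NTP, which is the distinction being drawn above.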

Re: 7.0.4 server with ongoing time sync issues - also on 8.0

Posted: Thu Oct 12, 2017 3:54 pm
by thirdhatch
Kumba, we are not using virtualization. The servers are not overclocked. I'm not even sure I could if I wanted to, since these are official IBM blades. This is an express install. The only thing special about it is we use webrtc for the pbxwebphone.

It is happening much less frequently than before since I changed hardware. But we did a fresh install on this installation and just restored the database and audio files. So, the software shouldn't have an issue, right?

I have checked the DATE command from CLI and it matched the DB and PHP time, which both always stay in sync.

Re: 7.0.4 server with ongoing time sync issues - also on 8.0

Posted: Fri Oct 13, 2017 9:29 am
by Kumba
I don't know what the cause is but it's something localized to your install or environment.

Re: 7.0.4 server with ongoing time sync issues - also on 8.0

Posted: Fri Oct 13, 2017 11:20 am
by chornyi_taras
thirdhatch wrote:Kumba, we are not using virtualization. The servers are not overclocked. I'm not even sure I could if I wanted to, since these are official IBM blades. This is an express install. The only thing special about it is we use webrtc for the pbxwebphone.

It is happening much less frequently than before since I changed hardware. But we did a fresh install on this installation and just restored the database and audio files. So, the software shouldn't have an issue, right?

I have checked the DATE command from CLI and it matched the DB and PHP time, which both always stay in sync.

Can you confirm that Asterisk has not crashed (e.g., the asterisk process still has the same PID)?

Re: 7.0.4 server with ongoing time sync issues - also on 8.0

Posted: Fri Oct 13, 2017 12:24 pm
by williamconley
thirdhatch wrote:Kumba, we are not using virtualization. The servers are not overclocked. I'm not even sure I could if I wanted to, since these are official IBM blades. This is an express install. The only thing special about it is we use webrtc for the pbxwebphone.

It is happening much less frequently than before since I changed hardware. But we did a fresh install on this installation and just restored the database and audio files. So, the software shouldn't have an issue, right?

I have checked the DATE command from CLI and it matched the DB and PHP time, which both always stay in sync.


Time sync is rarely from "time" problems (especially when time is not showing "off" between servers or packages, you've done your due diligence there).

Especially since you have experienced it on two different installs ... I'd say you have a dropped packet scenario.

Usually it's from dropped packets to/from the agent workstation's web browsers. Are you dropping packets? Those "every second" packets from the agent browser update a field. If those are dropped (or if the page is customized so they are not sent out properly!), they don't update the field. The process that checks that field then decides that the update time being off for "x" seconds is too much, time must be "out of sync". But it's not actually out of sync: it's just not updated recently enough. The system that reports time sync error has no way of knowing which problem it is (time vs no update) and reports time sync error instead of "time sync or not updating" which I think would be a better generic message. 8-)
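
A toy model of that ambiguity, sketched in Python. This is not vicidial's actual code: the function names, the 5-second limit, and the numbers are all assumptions chosen for illustration. It only shows why a check that compares a stored "last update" timestamp against the current time cannot tell a skewed clock from updates that stopped arriving.

```python
def update_looks_stale(stored_timestamp, checker_now, max_age=5.0):
    """True when the stored timestamp is more than max_age seconds away
    from the checker's own clock (hypothetical threshold)."""
    return abs(checker_now - stored_timestamp) > max_age


def flagged(clock_offset, last_delivered_at, check_at, max_age=5.0):
    """Simulate one check.

    clock_offset      -- seconds the writing side's clock is off (0 = in sync)
    last_delivered_at -- real time of the last update that actually arrived
    check_at          -- real time at which the checker runs
    """
    last_write_real = min(last_delivered_at, check_at)
    stored = last_write_real + clock_offset    # timestamp as it was written
    return update_looks_stale(stored, check_at, max_age)


healthy = flagged(clock_offset=0, last_delivered_at=1000, check_at=100)
dropped = flagged(clock_offset=0, last_delivered_at=90, check_at=100)
skewed = flagged(clock_offset=-40, last_delivered_at=1000, check_at=100)

if __name__ == "__main__":
    print("healthy:", healthy)   # updates arriving, clocks agree
    print("dropped:", dropped)   # clocks agree, updates stopped 10s ago
    print("skewed:", skewed)     # updates arriving, writer clock 40s behind
```

Both the dropped-update case and the skewed-clock case trip the same check, so from the checker's side they are indistinguishable without looking at the clocks directly.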

There has been a lot of discussion on this topic.

However: As a test, I would recommend putting a regular SIP Soft Phone on one workstation and see if that workstation is then exempt from the failure. Perhaps WebRTC is interfering with the other web traffic.

Try to narrow it down to a specific workstation or group of workstations to isolate any specific hardware involved (such as a switch in a server closet that may run a group of agents that don't actually sit near each other). This problem can be tricky to track down, but eventually it can be resolved 100%.