2600hz Expert Q&A Recap
Whew, the first part of this year has really gone quickly. We want to take some time out to reflect on the stuff we’ve discussed over the last few weeks and to talk about where the Expert Q&A is going from here.
The first Q&A session we did was on Virtualization. This is a huge topic, but we focused on Virtual Machine Timing because that has such a big impact on communications applications. Lots of great stuff in this Q&A featuring our friends from Voxeo and PSSC Labs.
The second Q&A session we did was on Faxing. Again, another huge topic, but we decided to go after Faxing because it impacts almost all of our clients. What was really interesting to me about this was the idea that the delimiter in faxing is silence, as opposed to most IP protocols, where signaling comes from packets. It’s very clever and I encourage you to dive into this presentation; it’s jam-packed!
The third Q&A session we did was on Provisioning. We think the industry can do better when it comes to provisioning. There isn’t one provisioner that handles all handsets, but there should be, and that’s what we’re working on at 2600hz. This Q&A presentation explains why this is such a hard thing to do and how our team is solving this problem. This presentation features Andrew Nagy of Provisioner.net fame!
The fourth Q&A session we did was on Databases. Could we have picked a more massive topic? Maybe, but we decided to focus critically on Layer 1 Failures and Network Partitions, which are some of the biggest issues operationally for distributed systems. Featuring Sam Bisbee from Cloudant, we dive headfirst into a TON of content on Databases. This one is not to be missed; it’s a great deck and a ton of awesome commentary. (The image at the top of this email is about Zombie, Flapping Datacenters, which have to be removed from the cluster.)
Where to now?
First up, we’ve got two more awesome Q&A sessions scheduled. The one next Friday, the 14th of June, is on Session Border Controllers! If you can’t tell the difference between an SBC and a Firewall, this is the presentation for you.
Two weeks after that we’re covering DTMF! Ever wonder where those tones on your phone came from? We’re going to give you the answers.
If you want to resell 2600hz, we have two great trainings coming up. The first one is this Friday, the 7th of June and the second one is two weeks later on Friday, the 21st of June.
Last but not least, our sales team just got a new shiny 800 #. Ring them at:
Thanks, and we look forward to bringing you tons of new content. Be on the lookout for an announcement of our API Trainings which should be starting later this month or early next!
How Squirrels break Datacenters and other Database Conjectures
This Q&A presentation was influenced by Kyle Kingsbury’s work on Jepsen, an exploration of modern databases. If you haven’t seen his work and you like this stuff, you should go check it out. It’s awesome.
We just did our Epic Database Expert Q&A featuring Sam Bisbee of Cloudant and Darren Schreiber of 2600hz. We covered a range of topics but focused on these three kinds of failures:
- Network Partitions
- Layer 1 Disasters
- Flapping Internet (Special Class of Network Partitions)
All public networks are unreliable; such is the reality of modern distributed database management (and let’s face it, because of AWS we’re all managing distributed databases, whether we like it or not). Sometimes, when these unreliable networks break down, a partition can form. These partitions, depending on your database configuration, can wreak havoc across a wide gamut of scenarios.
Arguably, most of what a database admin does is prepare for network partitions and how to resolve them.
-Joshua Goldbard, 2600hz
Yes, modern databases run fairly well when they’re not in a failure state, but, frankly, the only thing that matters is the failure state. During a partition, it’s important to understand your database’s behavior, which can vary wildly. At 2600hz, we leverage BigCouch, which uses a Master-Master replication strategy with Dynamo-style quorum. What that means in plain English is that every node is a master node, and the cluster uses consistent hashing to redistribute the load in the event of a partition.
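To make the consistent hashing idea concrete, here is a minimal sketch of a hash ring. It is an illustration only, not BigCouch’s actual implementation: the `HashRing` class, the node names and the virtual-node count are all assumptions made for the example. The point it shows is that when a node leaves the ring, only the keys that node owned get reassigned; everything else stays put.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative, not BigCouch's code)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes  # virtual points per node, smooths the distribution
        self._ring = []       # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        # Dropping a node only removes its own points from the ring.
        self._ring = [point for point in self._ring if point[1] != node]

    def owners(self, key, n=3):
        """Return the first n distinct nodes clockwise from the key's hash."""
        if not self._ring:
            return []
        start = bisect.bisect(self._ring, (self._hash(key), ""))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == n:
                break
        return found


ring = HashRing(["dc1", "dc2", "dc3", "dc4"])
print(ring.owners("account-42"))  # three distinct datacenters for this key
ring.remove("dc2")                # dc2 is partitioned away
print(ring.owners("account-42"))  # remaining nodes pick up dc2's share
```

A real Dynamo-style system layers quorum reads and writes on top of this placement; that part is left out here to keep the sketch short.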
The best advice we can give here is to know the failure modes and behavior of your database and understand the partition realities of the software.
-Darren Schreiber, 2600hz
Layer 1 Disasters
Hurricanes, Earthquakes or Squirrels? Squirrels eating glass. Squirrels caught in HVAC units. Squirrels tampering with power lines. All of these are examples of Layer 1 Disasters, but we tend to think only about the really massive outages, not the unexpected ones that affect critical infrastructure.
Darren, the 2600hz CEO, has a lot of experience managing Datacenters. Here’s a quick story from back-in-the-day about managing racks in a DC:
Once upon a time a Datacenter vendor decided to give my company a couple of months notice that they were going to 10x our rates. They assumed we couldn’t migrate out of that Datacenter easily, and they were right. Because we were cheap, we did everything ourselves, which meant loading the racks into a pickup truck by hand that we drove in the rain to another Datacenter. Not my definition of Fun.
-Darren Schreiber, 2600hz
Contrast that with our experience during Sandy, when we were using BigCouch:
On the day before the storm, we just turned off the Datacenter. That was it.
We can evade storms, earthquakes and Squirrels because of Cloudant.
-Darren Schreiber, 2600hz
If a Datacenter gets into a Layer 1 issue, we just kill it and move on. When the disaster is mitigated we bring the service back up, but losing an entire DC (or even multiple DCs) is not an issue because of our database choice.
Protip: If you can’t predict disasters, have a plan to avoid them.
Is it up or is it down? Flapping internet is a special case of the Network Partition. Basically, a flapping connection is one that goes down, then comes up, then goes down again; this is actually worse than a server going hard down because of the reconciliation process that happens every time the networks reconnect. We’ve got one answer for this and one only: Zombie Servers get Double Tapped.
Basically, if a Datacenter is flapping, it’s better to just disconnect that Datacenter manually until it can be confirmed as restored. There’s no easy way to say this: Flapping is one of those scenarios that requires manual intervention. If the DC is flapping you have to take it out or you may never get back online.
Protip: If it flaps, Double Tap.
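For the curious, here is one way the “Double Tap” rule could be sketched in code. This is a hypothetical illustration, not our production tooling: the `FlapDetector` class, the threshold of three transitions and the five-minute window are all assumptions. The important design choice is that a flapping datacenter stays out of the cluster until a human restores it, even if its health checks go green again.

```python
from collections import deque


class FlapDetector:
    """Quarantine a datacenter whose health flips too often (illustrative)."""

    def __init__(self, max_transitions=3, window_seconds=300):
        self.max_transitions = max_transitions
        self.window_seconds = window_seconds
        self._transitions = deque()  # timestamps of recent up/down flips
        self._last_state = None
        self.quarantined = False

    def observe(self, is_up, now):
        """Record a health check result taken at time `now` (in seconds)."""
        if self._last_state is not None and is_up != self._last_state:
            self._transitions.append(now)
        self._last_state = is_up
        # Forget flips that fell out of the sliding window.
        while self._transitions and now - self._transitions[0] > self.window_seconds:
            self._transitions.popleft()
        if len(self._transitions) >= self.max_transitions:
            self.quarantined = True  # sticky: never cleared automatically
        return self.quarantined

    def in_service(self):
        return bool(self._last_state) and not self.quarantined

    def operator_restore(self):
        """Manual step: a human confirms the datacenter is really back."""
        self._transitions.clear()
        self.quarantined = False


dc = FlapDetector()
for second, up in [(0, True), (30, False), (60, True), (90, False), (120, True)]:
    dc.observe(up, now=second)
print(dc.in_service())  # False: the link is up right now, but the DC was double tapped
```

A server that simply goes hard down produces a single transition and never trips the quarantine, which matches the point above: the flapping, not the outage, is what needs the manual step.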
Darren chose to use the last few minutes to pontificate about how ridiculous life was before BigCouch. There was a point very early on where we simply could not get BigCouch to work and we thought we might have to fold the company. Thanks to incredible support from the Cloudant team, we got everything working and the rest is history.
It’s night and day. We just don’t spend any time on the database anymore… We just don’t have problems with the Software.
-Darren Schreiber, 2600hz
Sam chose to talk about write reliability, specifically the way in which other systems buffer writes and respond to concerns about write availability.
There are a lot of other databases out there that will reply “Write confirmed” when you buffer the write, NOT when it actually commits to disk. The practical effect of this is that if the disk dies before the write moves out of the buffer, you’re missing writes, which is death to a database.
Durable databases write to disk and confirm, they don’t just buffer. Databases that buffer can be very dangerous depending on the workload.
-Sam Bisbee, Cloudant
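The difference Sam describes can be shown with plain files. The sketch below is a generic illustration using Python’s standard library, not the internals of any particular database; the function names and the "write confirmed" return value are made up for the example.

```python
import os


def buffered_write(path, record):
    """Confirm once the data is handed to the OS; it may still sit in a
    cache and be lost if the machine dies before it reaches the disk."""
    with open(path, "ab") as f:
        f.write(record)
    return "write confirmed"


def durable_write(path, record):
    """Confirm only after asking the OS to push the data to the device."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # move Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to storage
    return "write confirmed"
```

Both functions return the same message, which is exactly the problem: from the client’s side the two confirmations look identical, so the only way to know what your database does is to read up on its commit behavior. The durable version is slower per write, which is why some systems default to buffering.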
We had a blast doing this presentation with Cloudant and we can’t wait for the next Q&A Session on Session Border Controllers in two weeks. Click here to join us!
Two weeks after that, we’re going to discuss DTMF and how all of that nonsense works in VoIP. Register for free here:
Lastly, if you’d like to talk to our friends at Cloudant, check them out at Cloudant.com or in IRC on Freenode in channel #Cloudant.
Thanks so much for checking out our Q&A. If this all sounds like it’s too much work, you should call our Sales team at 8554642600 or email@example.com. We power some of the biggest infrastructures on the planet and we’d love to talk about how we can help your business eliminate the pains of operating communications infrastructure :).
When Darren isn’t busy working on stuff in the guts of the world’s biggest Telecom Infrastructures, he’s helping to write books about FreeSWITCH with the epic FreeSWITCH team. Their latest work is available now!
You can buy the book here: http://www.packtpub.com/freeswitch-1-2/book
Learn more about FreeSWITCH by checking out their site: http://freeswitch.org
Thanks for letting us be a part of such an awesome open-source project!! This is the culmination of a lot of very hard work from our friends Michael, Raymond and Anthony, without whom this project would be impossible.
The much requested deck by James Aimonetti from Kamailio World earlier this year (Epic Powerpoint)
Every now and again we let our engineers out of the office to wreak havoc on the conference circuit. James gave an awesome presentation at Kamailio World earlier this year and we’ve had a bunch of requests for the deck.
SO, without further ado, I’d like to present “Kazoo” by James Aimonetti, presented earlier this year at Kamailio World (THANKS FOR THE INVITE DANIEL!!!)
2600hz Kazoo Kamailio Integration Deck from Kamailio World by James Aimonetti
Do you like what you’re reading here? Is this the kind of problem you’re trying to tackle? Send us an email at firstname.lastname@example.org to talk about how we can help you scale, or if you’d like to solve these problems, send us an email at email@example.com. We’re looking for engineers, hackers, and people that are really passionate about communications. What are you waiting for, send us an email!!
Provisioning Sucks (And here’s what 2600hz is doing about it)
This is the recap of our Provisioning Q&A session featuring Andrew Nagy of Provisioner.net and Francis Genet, lead engineer on the 2600hz Provisioning project. If you dig this kind of stuff you should check out our next Q&A on Database Management here.
If you have ever dealt with phones, chances are you hate provisioning. If you do it manually, it is exceedingly tedious. If you do it automatically, it can be disastrous. Many organizations opt for a homogeneous equipment mix, supporting only one manufacturer with a proprietary provisioning solution because it works and that’s good enough.
Here at 2600hz, almost all of our clients run heterogeneous infrastructures, which means we have to handle all different manufacturers so we couldn’t use the proprietary solution. Second, we work with a lot of handsets and we realized pretty early on that manual provisioning wouldn’t work for us. So we did what any self-respecting group of telecom engineers would do: we built our own provisioner! And, since we’re awesome open-source citizens, we’ve made the code publicly available HERE too! Let’s take a look at the work we’ve done and why we’re doing it:
On the shoulders of Giants
It’s worth mentioning that we are hardly the only organization to wrestle with the realities of provisioning. Our work is based on Andrew Nagy’s Provisioner.net and, to quote Isaac Newton or Linus Torvalds, depending on who you ask, “If we have seen further, it is because we stand on the shoulders of Giants”. Before we start diving into what we consider the state-of-the-art, we’d like to acknowledge the great work our predecessors have done in bringing us to the point where what we’d like to achieve is possible. Alright, let’s dive in!
Why is this hard?
(Quick note: Cisco Handsets take up to 2.1 hours per phone to provision. That’s why having auto-provisioning is so important. Source)
It’s actually not that hard to provision a single phone, or even 100 phones. Hell, it’s not that hard to provision 1000 phones if they’re at the same site and from the same manufacturer. See, routers have this awesome feature called DHCP Option 66, which lets you point phones en masse towards a provisioning server. All of the devices that connect to the router will receive the provisioning server’s address (often a URL) as an option in the DHCP response, and that points the phone towards the config files. That’s how the process works on a local network, but it’s worth diving into how this works over the WAN in a little more detail. Let’s lay out the process for setting up a handset over a Wide Area Network:
- Phone arrives brand new from factory
- Phone has Provisioning URL added to the on-Device GUI (this is the step DHCP Option 66 automates on a local network)
- Provisioning server creates a provisioning profile for the handset containing all of the configuration files (MAC Address used for identification)
- The Phone is attached to the corporate network and attempts to connect to the provisioning URL in the GUI
- The provisioning server recognizes the MAC ID of the handset and sends the corresponding configuration files after authenticating the phone
- The phone receives the firmware and if this is a secure environment, performs a checksum on the configuration files to make sure they match
- If everything is Kosher, the phone will begin the update process. Once complete it will enter service.
- Every few minutes (or days) or when the phone powers on, it will repeat this process, starting from the step where the phone connects to the provisioning URL
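The server side of the steps above can be sketched in a few lines. This is a simplified illustration, not the 2600hz provisioner: the `PROFILES` store, the sample MAC address, the config keys and the choice of SHA-256 as the checksum are all assumptions for the example, and the authentication step is left out.

```python
import hashlib
import re

# Hypothetical in-memory store: normalized MAC -> rendered config text.
PROFILES = {
    "001122aabbcc": "line.1.user=1001\nline.1.server=sip.example.com\n",
}


def normalize_mac(raw):
    """Reduce 00:11:22:AA:BB:CC, 00-11-22-aa-bb-cc, etc. to 12 lowercase hex digits."""
    mac = re.sub(r"[^0-9a-fA-F]", "", raw).lower()
    if len(mac) != 12:
        raise ValueError(f"not a MAC address: {raw!r}")
    return mac


def fetch_config(raw_mac):
    """Server side: return (config, checksum), or None for an unknown phone."""
    config = PROFILES.get(normalize_mac(raw_mac))
    if config is None:
        return None
    return config, hashlib.sha256(config.encode("utf-8")).hexdigest()


def verify_config(config, expected_checksum):
    """Phone side: make sure the files match before applying them."""
    return hashlib.sha256(config.encode("utf-8")).hexdigest() == expected_checksum


config, checksum = fetch_config("00:11:22:AA:BB:CC")
if verify_config(config, checksum):
    print("config accepted, entering service")
```

Normalizing the MAC first matters more than it looks, because different phones report the same address with different separators and letter case.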
This process has to work every time for every handset. Now, one would think that after the 150 years of telecom that we’ve had, there might be some standardization between vendors, but that’s certainly not the case with respect to provisioning. Every manufacturer has a different way of crafting their provisioning files, even down to the number of files required to boot a phone or even the names of the files. It’s enough to drive a developer batty, but this is what we have to work with in Telecom. Seriously, go look at the Polycom firmware grid; it’s like a forest of incompatible firmwares.
The Polycom Nightmare Grid
If you want to present your users with simple-to-consume services, you must first conceal complexity. That’s a recurring theme in all the work we do at 2600hz, but it’s perhaps no more true than what we’re doing with respect to provisioning.
What are we doing about it?
At 2600hz, we believe in presenting simple interfaces for complex services. When we think of provisioning, we want our clients to experience a service that “just works”. We don’t allow folks to see firmware file names because we know what works with our servers. Power users can get this functionality back with trivial difficulty, but for the majority of use cases, the default settings are perfect. Here’s what our provisioning interface looks like in our GUI:
You’ll notice that we ask the user to select the make and model of their phone, a name and a MAC address. The only piece of really specialized information is the MAC address; everything else is immediately obvious to the user. But provisioning the handset doesn’t govern how the handset might interact with the network. That’s why we’ve included some extra tools to take the experience just a bit further. Take, for example, segregating Voice and Data traffic without physically separate ports. That’s hard to do without VLAN tags, but who wants to manually go into each phone to program a VLAN? That’s complicated, and remember, 2600hz is all about hiding complexity:
Here you’ll see a place to enter a VLAN tag. It really is that simple to push VLANs to all of your clients’ equipment.
How do we hide all this complexity?
When you check new boxes in the management interface for provisioning your handset, we make on-the-fly changes to the provisioning file for that phone. If you want to have a Yealink T-22 change from 1 line to 2 lines, you can execute that change NOT with a site visit to your client, but with the click of a mouse. This dramatically reduces labor and time wasted on client site visits by eliminating unnecessary troubleshooting.
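A rough sketch of that idea: keep the settings as data and regenerate the provisioning file whenever a setting changes. Everything here is hypothetical, including the `Device` fields and the generic key=value output; real vendors each use their own file formats, and this is not the format our provisioner emits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    mac: str
    brand: str
    model: str
    lines: int = 1
    vlan_id: Optional[int] = None  # None means untagged


def render_config(device):
    """Render a generic key=value config from the device's current settings."""
    if device.lines < 1:
        raise ValueError("a phone needs at least one line")
    out = [f"# {device.brand} {device.model} ({device.mac})"]
    for n in range(1, device.lines + 1):
        out.append(f"line.{n}.enabled=1")
    if device.vlan_id is not None:
        if not 1 <= device.vlan_id <= 4094:  # usable 802.1Q VLAN IDs
            raise ValueError("VLAN ID must be between 1 and 4094")
        out.append(f"network.vlan.id={device.vlan_id}")
    return "\n".join(out) + "\n"


phone = Device(mac="001122aabbcc", brand="Yealink", model="T-22")
print(render_config(phone))  # one line, no VLAN
phone.lines = 2              # the "click of a mouse"
phone.vlan_id = 100
print(render_config(phone))  # two lines plus a VLAN tag, picked up on the next check-in
```

Because the file is rebuilt from the stored settings each time, the phone simply picks up the new version the next time it contacts the provisioning server.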
2600hz has built an awesome suite of provisioning tools for our clients to use in managing their systems. Provisioning is hard because hardware manufacturers make it hard, but that’s why there’s an opportunity for us to innovate in the first place. By concealing complexity from our clients, we make things run smoother and in a much more controllable fashion.
See our Powerpoint here:
Do provisioning servers make you feel weak in the knees? Does the prospect of reading SIP Packets for a living intrigue you? You might have a future working with 2600hz. If this is interesting, shoot us an email at firstname.lastname@example.org and we’ll chat :D.
Expert Q&A Faxing Edition
We wrapped up an excellent Q&A this morning on the subject of faxing. Yes, the Darth Vader of VoIP has been slain. We broke down a ton of helpful information in our attached powerpoint and hopefully we were able to debunk a lot of myths.
If you missed the event, here are some great quotes:
Silence is the signaling protocol for Faxing
-Darren Schreiber, CEO, 2600hz
Fax machines were designed to deal with noise, not dropouts. IP is designed to deal with dropouts but not noise. The secret to doing faxing at scale is finding the happy medium.
-Joshua Goldbard, VP of Marketing, 2600hz
We also had some great “Pro Tips” for folks doing faxing at scale:
- Turn up the Jitter Buffer
- Turn off the adaptive buffer (Variable timing is death for faxing)
- Turn off Echo Cancellation (Remember silence is the delimiter)
- Reduce Confounding Factors (If it doesn’t help, eliminate it so it can’t break later)
Finally, one piece of the discussion I wanted to pull out was the talk on codec negotiation. Some carriers will ask folks to start calls as G729 and then change to T.38 midway through the call. The reason a carrier does this is that expensive gear costs more to run calls over. Ideally, a carrier will only run the calls that require expensive gear over the expensive gear and everything else should run over the cheap gear, but one of the major features of the expensive gear is T.38 tolerance. What this means is that you have to first send a signal to your carrier to get onto the expensive switch (starting the call in G729) and then convert the call to T.38 afterwards.
THE REASON CALLS WITH NAKED T.38 FAIL is because many carriers are not set up to route T.38 media unless you tell them it goes on the expensive gear. Therefore, you have to signal G729 first to get on the expensive gear before you can begin a fax transmission. This is a huge pain, but it exists to lower the costs of carrier operations.
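In SIP terms, that two-step dance shows up in the SDP bodies. The sketch below builds the two offers as strings: first a plain G729 audio offer for the initial INVITE, then the re-INVITE that switches the stream to T.38. The function names, IP address and ports are placeholders, and the T.38 attribute values are common defaults rather than anything a specific carrier requires.

```python
def audio_offer(ip, port):
    """SDP body for the initial INVITE: a plain G729 audio call."""
    return "\r\n".join([
        "v=0",
        f"o=- 1 1 IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        f"m=audio {port} RTP/AVP 18",  # 18 is the static payload type for G729
        "a=rtpmap:18 G729/8000",
    ]) + "\r\n"


def t38_reinvite(ip, port):
    """SDP body for the mid-call re-INVITE that switches the stream to T.38."""
    return "\r\n".join([
        "v=0",
        f"o=- 1 2 IN IP4 {ip}",  # same session id, incremented version
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        f"m=image {port} udptl t38",
        "a=T38FaxVersion:0",
        "a=T38MaxBitRate:14400",
        "a=T38FaxRateManagement:transferredTCF",
        "a=T38FaxUdpEC:t38UDPRedundancy",
    ]) + "\r\n"


print(audio_offer("192.0.2.10", 4000))   # step 1: get onto the expensive gear
print(t38_reinvite("192.0.2.10", 4002))  # step 2: convert the call to T.38
```

A “naked” T.38 call is one where the very first INVITE already carries the `m=image` offer, which is the case many carriers are not set up to route.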
Last note: Asterisk and FreeSWITCH both use spandsp, so if someone tells you it works better on one platform over the other, they’re probably commenting about the network and not the switch.
Don’t forget our provisioning Q&A, two Fridays from today. Here’s the link:
Thanks and see you soon!
Visualizing a Cyber Attack on a VoIP server. Really cool visualization!
The best part is that the honeypot (the server set up by the security researchers) is not even broadcast to the public internet as a VoIP server. Imagine the scale of attacks advertised servers receive.
2600hz Virtualization Expert Q&A Recap
2600hz hosted a panel this morning on Virtualization in Communications applications. The panel featured expert commentary on OS timing, Lost Ticks, Virtualized hardware demands and best practices in the Telecom industry. The panelists were:
- Darren Schreiber, Co-Founder, 2600hz
- Adam Kalsey, Product Manager, VoxeoLabs
- Chris Spearman, Big Data Systems Architect, PSSC Labs
Here are some of the highlights from the event:
Virtualization for virtualization’s sake is not the right approach. We don’t build databases without reason, and even though it’s sexy to do cloud infrastructure, there has to be a business case driving adoption.
-Joshua Goldbard, Moderator, 2600hz
We see degraded performance on virtual hosts operating near capacity. In our experience, running virtual hosts at 25-30% of capacity will allow them to perform at a similar level to physical boxes.
-Adam Kalsey, Panelist, VoxeoLabs
For 2600hz, virtualization is about control. It’s impossible for me to provision physical boxes at a remote datacenter via API or with sufficient urgency, but the elasticity of the cloud means response times are consequently lower. Virtualized infrastructure can be much more responsive.
-Darren Schreiber, Panelist, 2600hz
VIRTUALIZE ALL THE THINGS!!
-Chris Spearman, Panelist, PSSC Labs
Honestly, I think virtualization makes a lot of sense. I run virtualization on everything because of the backup and retention benefits. Recovering a physical box is not a trivial task, but virtualized instances are easy to maintain and restore.
-Chris Spearman, Panelist, PSSC Labs
We had a spirited talk about Virtualization and I hope everyone had a blast. We’ll be doing this again in two weeks at our next Expert Q&A on Faxing.
Join us here: http://2600hzqa2.eventbrite.com/
Thanks to our awesome panelists and I look forward to talking to all of our attendees in the future.
-2600hz Training Team
Do you run a VoIP Company? Did you enjoy the training? We’d love to hear your feedback about the event! Email email@example.com with your questions or comments. If you’re interested in speaking on an upcoming panel drop us a line! Once again just email firstname.lastname@example.org and we’ll get you sorted.
Ivar walked into our office on February 22nd, 2012 and I’ll remember the meeting for a long time. I was sitting there with Patrick and Ivar was telling us about his plans and all of the different things he was going to enable. I have to admit I was skeptical, as many companies have tried to enter the space OnRelay plays in and most of them have failed miserably. Mobile phone communications is really hard and I think everyone has seen a company fail in the past. But then, Ivar took out his phone and showed me the product.
From that day on, I knew we were going to become partners and I’m tremendously happy to announce the introduction of the OnRelay Mobile Office Phone System, powered by 2600hz. Starting at $9.99 per user per month, OnRelay is offering a robust communications platform that’s truly mobile.
It’s an exciting time to work in communications and we’re honored to have been selected as one of OnRelay’s infrastructure partners. The Kazoo cloud infrastructure supports OnRelay’s business and provides the provisioning and communications technology that really reduced their time to market.
We look forward to watching OnRelay grow and we anticipate many great things from our partner. Keep an eye on this one, we think it’ll be one to watch in the coming years.
-Joshua Goldbard, VP of Marketing, 2600hz
Voice and Video are Dead. Here’s the future:
We at 2600hz believe that the debate over voice and video has been going on for far too long. People have been dwelling in the past and are conditioned to believe that they have to use mediums like voice and video to communicate. Here at 2600hz we look towards the future. The time has come when desk phones, cell phones and Web RTC are no longer relevant. The time when people communicate how they were always supposed to is upon us and it is… TELEPATHY!
Telepathy (from the ancient Greek, tele meaning “distant” and pathe or patheia meaning “feeling, perception, passion, affliction, experience”) is the transmission of information from one person to another without using any of our known sensory channels or physical interaction.
2600hz has been putting a lot of research into understanding how brains can communicate via a direct connection. Some people call Telepathy the Bluetooth for your cranium and we couldn’t agree more. Our engineers have worked diligently to communicate with thoughts vs. words or gestures. We have made some huge strides with this technology and look forward to releasing it to the world. We call this project BrainRTC™.
We hope you join us on this adventure of truly revolutionizing communications. Welcome to 2600hz, the “Future of Cloud Telepathy!”