HAProxy and DNS in the cloud

HAProxy is a great tool that we all know and love. (Well, in case you don’t… go here!)
It is, however, surprising how even basic features are not enabled by default.
In particular, today I stumbled upon the configuration HAProxy needs for dynamic DNS resolution.
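As a teaser, here is a minimal sketch of the kind of configuration involved (the names and addresses are illustrative): an HAProxy `resolvers` section lets a backend server’s DNS name be re-resolved at runtime, instead of only once at startup — essential in the cloud, where the IP behind a hostname changes.

```
resolvers mydns
    nameserver dns1 10.0.0.2:53      # e.g. the VPC-provided resolver
    hold valid 10s                   # how long a DNS answer stays valid

backend app
    # "resolvers mydns" makes HAProxy re-resolve app.internal at runtime,
    # picking up IP changes behind a load balancer or service-discovery name
    server app1 app.internal:80 check resolvers mydns resolve-prefer ipv4
```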
Continue reading “HAProxy and DNS in the cloud”


A “DevOps team” work organization (I)

We often hear about what DevOps is, or what tools achieve DevOps in your organization (whatever that means); we know of Terraform and CloudFormation, but we rarely see a definition of the principles behind the work organization of our teams.
At Curve, I was hired specifically to create and structure the SRE/DevOps team of the company. In this article, instead of the usual technical deep dive, I’d like to share some of the inspiring principles of the DevOps culture and how they were adapted to define the work in a startup.

Continue reading “A “DevOps team” work organization (I)”

A “DevOps team” work organization (II)

This is the second part of an article about the work organization of my DevOps team. You can find the first part here.

  1. Small batches of work
    Without entering the rabbit hole of the Toyota production system and the theory of the value stream, I remember how most of the IT professionals I’ve worked with have had an “in-house big side project”: a large refactoring of that part of the infrastructure, or an update of all the operating systems in that part of the company network. It’s something they’re doing every day for a small percentage of their time. It’s hidden work as well: as usual, their managers don’t know about it. It cannot be measured, no one knows if/when it will be finished, and it ends up providing little value to both the company and the employee, who gets no reward for the job done.  Continue reading “A “DevOps team” work organization (II)”

Postgres’s invisible data or the curious case of the intangible length

A few days ago at Curve, our developers had some problems dealing with data coming from our database and they asked for help. Apparently, a query that was working in dev (TM), did not work as expected in production.

Performing a sum on a certain set of rows was succeeding, while a simple select was mysteriously failing. In fact, a value that was supposed to be an integer such as “3” with length “1” was actually a solid integer “3” with length “7”. Strange, isn’t it?
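A plausible way to reproduce the symptom locally (a hypothetical illustration, not our actual data): invisible characters such as zero-width spaces render identically to the bare digit, but inflate the string length:

```python
visible = "3"
# six zero-width spaces (U+200B) appended: prints exactly like "3",
# but the string is 7 characters long
hidden = "3" + "\u200b" * 6

print(len(visible))  # 1
print(len(hidden))   # 7
```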

Continue reading “Postgres’s invisible data or the curious case of the intangible length”

Site Reliability Engineer – In Search of a Unicorn

At Curve, we’re rolling on the “Great Fintech Adventure”™ of revolutionizing the way in which you spend and manage your money. At its very core, the company is a blend of Finance and Engineering: two disciplines that come together to deliver at your doorstep the Curve card that you know and love.

Engineering is working hard these days to support and enable the organic growth of the team and, as part of our process, we’re constantly hiring new players that can help us go the extra mile and build amazing stuff!

We have openings for a lot of positions at the moment (btw, why don’t you join us? https://www.imaginecurve.com/hiring), but among them the hardest to find so far has proven to be the mythological figure of the “Site Reliability Engineer”, also known, in the wild, as “DevOps Engineer” or “Cloud Engineer”.

So…What are they?

They are a peculiar breed of software developers: usually highly motivated individuals, not scared by the complexity of code or by the configuration nuances of an operating system; they are, instead, attracted by the blurring line between development and operations, with strong fundamentals in both worlds. Ideally, they should be equally comfortable debugging the interactions of a Docker container and writing a piece of code that automates a manual task.

 

Has this not always been the case? What’s so special about this?

This role can exist only in a world revolutionized, a few years ago, by cloud computing: a world where new technologies are launched daily, legacy is almost non-existent (and regulations and compliance are still not well defined). Cloud computing enables a company, or even an individual, to quickly rent a shared pool of virtual computing resources and scale them on demand depending on the workload. It means that a company starting today does not need to buy expensive physical infrastructure upfront; it can “rent”, for a fraction of the price, resources in a public data center and scale them elastically according to its needs. This enables business models that were unthinkable only five years ago – every startup on the planet is trying to seize this opportunity. But deploying on the cloud and scaling virtual resources is a complicated problem to master, and it requires a person with a unique blend of skills.

 

So…DevOps ? SRE ? Cloud engineer? Greengrocers ? 🙂

In theory, DevOps is a larger movement that encompasses both a culture and a role, strengthening communication between the development and operations teams and trying to automate the delivery process to be as fast as possible. Site Reliability Engineering, instead, is a specialization of DevOps that was defined in a famous book written by Google [2] and can be synthesized as “what happens when you ask a software engineer to design an operations function.” It focuses on designing and coding production systems that respect their SLAs, while sharing the ideas and techniques of the DevOps movement. Truth be told, in a startup such as Curve, the difference between these roles is so blurry as to be almost non-existent; nonetheless, we believe it is important to define the right culture and practices from the beginning. SRE felt like the obvious choice in an industry where the reliability of the product is of core importance and there are heavy compliance regulations.

 

Nice, but what exactly is the SRE team doing at Curve, and why is it actually SRE?

 

At Curve, we are a very small team, but we are involved in the design and scalability of every feature developed. We are not working hidden in the background: we are doing distributed systems engineering every day. We are doing operations, making sure our containers run on updated machines and operating systems. We are doing development, writing functions that work with our firewalls or that provide insights, monitor the usage of the card and the reliability of the system and, if needed, notify our super valuable Business Operations team. We are doing Site Reliability Engineering by defining the SLOs of our systems together with the developers who code them, and by managing those systems together. But we’re also cautious about security and compliance, ensuring that all regulatory requirements are indeed taken into account. We are alive, growing and kicking!

 

Ok, great, so why is it hard to find someone?

This, in fact, is a surprisingly complicated question, but in my view it happens for many reasons: some of them intrinsic to software engineering, others due to the market and education:

 

  1. Blurring the lines – Historically, the worlds of development and operations have always been separated “by a fence”: people with different skillsets and mindsets were managing different portions of the same product, and they have always been focused on conflicting goals: creating features in the shortest possible time vs. keeping the system as stable as possible. Asking for a change in this way of working is hard to accept, even for experienced individuals.
  2. Breadth over depth – In a world where “breadth over depth” is key, it becomes very hard to find professionals who still have enough depth and are not simply unfocused, jumping from one thing to the next.
  3. Mindset – Many people who are very experienced in operations, and are now adapting to cloud environments, are adopting these new technologies “as tools” without realizing the implications they bring. They are changing their skillset instead of changing their mindset: learning how to use Terraform without understanding the potential of Infrastructure as Code and how it may help developers be faster at creating their features, instead of using it only to manage machines. They still think their role is to “operationalize the product” instead of being involved in designing it.
  4. Taking the plunge – Experienced “IT pros”, traditionally used to “isolated” systems management, are scared of learning how to code and of working with developers, Agile practices and the wider company. (If I had a penny for all the people that said: “I do operations and use Python, but I’m not a developer”)
  5. Universities – The shift towards cloud computing has been massive but occurred in a short number of years, and universities are struggling to prepare experts in the field. Only a handful of universities in the UK offer a dedicated cloud computing module (among them, City is doing a good job at it [4]). As a result, most experts in the field are self-taught. Junior DevOps engineers have a hard time deciding which path to follow to become a recognized expert: there are only a handful of certifications, coming from different vendors, and none of them actually tries to teach or verify anything cross-platform.
  6. Money – Money – Money – The shortage of professionals in this field has increased the competition, and thus the expense necessary for a company to acquire skilled workers, in a market where startups are no match for the bigger players.

So, what are you looking for, in the end?

The SRE team is already growing, but we’re always looking for someone who is ready to analyze, plan and maintain production systems as they scale in capacity and complexity. Someone who will refuse to do routine administration BUT will engineer an automated solution! Someone who will help the developers define the scalability requirements for a feature. Are you interested? Does it ring a bell? Come and join us: https://www.imaginecurve.com/hiring/sre

 

 

REFERENCES:

  1. https://www.usenix.org/publications/login/june15/hiring-site-reliability-engineers
  2. https://landing.google.com/sre/interview/ben-treynor.html
  3. https://www.docker.com/what-docker
  4. http://www.city.ac.uk/courses/postgraduate/software-engineering
  5. https://www.terraform.io/
  6. https://itrevolution.com/book/the-phoenix-project/

How to connect to an EC2 instance using PowerShell

Hi guys, I don’t exactly know why, but apparently there are no articles out there with a good step-by-step guide to connecting from your local PC to a Windows Server 2012 R2 instance hosted on Amazon AWS on EC2. This short article aims to fill that gap:

ASSUMPTIONS :

  • This article assumes some knowledge of AWS, the EC2 service and Windows Server 2012, but nothing is complicated and I’ve added many links to extensive documentation.
  • The PowerShell communication relies on the WinRM protocol, therefore it needs a specific port, 5985 TCP, reachable on the server (be advised: the default transport protocol is the insecure HTTP).
  • The WinRM service is enabled by default on WIN2012 R2, but the default Windows Firewall configuration allows connections on port 5985 only from the same subnet as the machine, therefore we need to log in to the machine using RDP and modify the default configuration of that firewall rule.
  • If your PC (the client!) is not part of the domain of the remote server, you need to add the remote server to the list of trusted hosts on YOUR PC (covered below).

GUIDE :

  1. Deploy the VM with Windows Server 2012 ( docs )
  2. Modify the security group of the instance, adding a rule to open port 5985 TCP from your IP or from anywhere ( docs )
  3. Wait a few minutes for the machine to boot up completely and then connect to it using the Remote Desktop Protocol (RDP), aka the usual way to connect to a Windows instance on EC2 ( docs )
  4. Modify the Windows Firewall configuration to allow incoming connections to port 5985 from any IP (or as you please 🙂 ). To do so: Control Panel -> Windows Firewall -> Advanced Settings -> Inbound Rules -> “Windows Remote Management (HTTP-In)” where the profile is PUBLIC (make sure to choose the right one!) -> Properties -> Scope -> Remote IP Addresses -> Any IP Address (or, if you know better, something stricter!)

    OR

    Use this simple Powershell command:
    Set-NetFirewallRule -Name "WINRM-HTTP-In-TCP-PUBLIC" -RemoteAddress "Any"

  5. Restart the Windows Firewall service (don’t ask me why, but sometimes the rules are not picked up until the service is restarted; I’ve witnessed that myself) (docs)
  6. Then make sure that WinRM is working correctly on the server, running this command in a shell (not strictly needed, just to make sure it works):  Enable-PSRemoting -Force

  7. Then move to your local machine and make sure the WinRM service is running there as well, in a privileged shell:
    Start-Service -Name WinRM
  8. Then add the remote host as a trusted host, running this command in a privileged shell:
    Set-Item WSMan:\localhost\Client\TrustedHosts -Value "XXXXXXXXX.eu-west-1.compute.amazonaws.com"
    or, simpler but not exactly secure, use a wildcard:
    Set-Item WSMan:\localhost\Client\TrustedHosts -Value "*"

  9. Then connect to the remote machine using one of the various options provided by PowerShell, such as:
    Enter-PSSession -ComputerName "XXXXXXX.eu-west-1.compute.amazonaws.com" -Credential $(Get-Credential)
    entering the login credentials of the remote machine when prompted ( docs )
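For convenience, the client-side steps (7–9) can be run together in one privileged PowerShell session; a sketch assuming the placeholder hostname used above:

```powershell
# Run on YOUR pc, in a privileged PowerShell.

# Step 7: make sure the local WinRM service is running
Start-Service -Name WinRM

# Step 8: trust the remote host (replace the placeholder, or use "*" as a wildcard)
Set-Item WSMan:\localhost\Client\TrustedHosts -Value "XXXXXXX.eu-west-1.compute.amazonaws.com" -Force

# Step 9: open the remote session, entering credentials when prompted
Enter-PSSession -ComputerName "XXXXXXX.eu-west-1.compute.amazonaws.com" -Credential $(Get-Credential)
```

(The `-Force` flag on `Set-Item` just skips the interactive confirmation prompt.)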

NOTES: 

The WinRM service can be configured server side to use the more secure HTTPS on port 5986, or a COMPATIBILITY MODE running on port 80, usually used to work around firewall-related issues ( docs ).

Older versions of Windows have different requirements to set up Powershell Remoting / WinRM ( docs ).

And, obviously, this guide is generalizable to many other IaaS services (Azure, DigitalOcean).

Hope this helps somebody!

ciAO!

How to solve (some) graphical issues with PuTTY, UTF-8, and ncurses

Hello everybody,

I’m writing this article to help all those people that may have had problems with garbled or mismatched text, or other kinds of graphical issues, in software that uses the famous ncurses libraries (libncurses5). It all started when I was using (via PuTTY) my favorite command-line log-parsing tool, the great multitail (go out there and grab it if you don’t know it). I started noticing some odd errors: part of the text was garbled, and some of the lines were wrong in size or were substituted by wrong characters, as you can see in the screenshot:

(screenshot: multitail in a CentOS environment)

This problem happened when using PuTTY on a CentOS 6.6 system, with the locale set to UTF-8, libncurses version 5.x and multitail 6.4.1.

This is the result of multiple problems, and some steps are required to fix all the issues:

  1. Download the latest version of PuTTY (0.64 as of today)
  2. Make sure that under Window -> Translation and Connection -> Data you have everything as in the images:
    Remote character set: UTF-8, with “use Unicode line drawing code points” enabled
    Terminal-type string: putty
  3. Then, you have to set an environment variable that tells the ncurses libraries to draw lines with Unicode code points instead of the alternate character set:

export NCURSES_NO_UTF8_ACS=1

you should also make it stick (echo 'export NCURSES_NO_UTF8_ACS=1' >> ~/.bashrc)
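A quick way to confirm the variable is set and exported in your current session (a minimal check, assuming bash; ncurses reads the variable at program startup, so it must be visible to child processes):

```shell
export NCURSES_NO_UTF8_ACS=1
# spawn a child shell, as ncurses programs would be spawned, and check it sees the variable
sh -c 'echo "NCURSES_NO_UTF8_ACS=$NCURSES_NO_UTF8_ACS"'
```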

This should solve all your issues with UTF-8 and the ncurses libraries.

Btw, this short guide wouldn’t have been possible without the help of Folkert… so thank you!

As usual, I hope this was useful to somebody.