Mock Firefox Geolocation with Selenium WebDriver

Geolocation is considered a hygiene factor these days and is part of most websites. Pair it with responsive web page design and faster wireless internet on devices like iPads, phones and tablets, and the mix becomes too lucrative for any web designer to miss.

A designer’s adventures with visitors’ data come at a cost though, and things become messy for an automation tester who has to test such functionality with a tool like Selenium.

Imagine running Selenium tests with the Firefox browser on a CI pipeline. You were sure your browser detected the correct location when you wrote the automation script, but you can't be certain where the test will launch the browser in a CI/CD environment. The browser would pick up whatever the current location is (you would also need to dismiss or consent to the share-location dialogs), and the test may fail if your assertion verifies the location you saw while writing the test.

The problem can be solved by faking the browser’s geolocation altogether in the test environments for these tests. Firefox provides an easy-to-automate approach for setting this up.

Note: This applies to Firefox only; I am exploring ways of doing the same for other browsers using WebDriver.

The under-the-hood Firefox setting we are interested in

You can check where your browser picks its geolocation from by entering ‘about:config’ in the Firefox address bar. Agree to any warnings related to warranty (take that leap of faith, it's worth exploring).

It opens up a plethora of configuration strings; search for the one you need, ‘geo.wifi.uri’. It has the value – ‘‘ in my case.

[Screenshot: the geo.wifi.uri preference in about:config]

Replace it with a string like:

data:application/json,{"location": {"lat": <lat>, "lng": <long>}, "accuracy": 100.0}

That’s it! Your browser now reports whatever location you want it to.

Doing this in WebDriver

Fortunately, Selenium WebDriver can help us achieve a similar result when launching a browser instance.

var profile = new FirefoxProfile();
profile.setPreference("geo.prompt.testing", true);
profile.setPreference("geo.prompt.testing.allow", true);
profile.setPreference("geo.enabled", true);
profile.setPreference("geo.wifi.uri", 'data:application/json,{"location": {"lat": <lat>, "lng": <long>}, "accuracy": 100.0}');
var browser = new SeleniumBrowser(new FirefoxDriver(profile));

The last two preferences (geo.enabled and geo.wifi.uri) add geolocation abilities to the browser, while the top two take care of any prompts by consenting to share the user’s location.

The browser object returned here can now be used for browsing web pages and automating them, with the browser behaving as if it were at the desired location.
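The value passed to geo.wifi.uri is easy to get wrong by hand (a stray quote or a locale-specific decimal comma breaks the JSON). Here is a minimal Java sketch of a helper that assembles it from coordinates. GeoLocationUri and its build method are hypothetical names made up for this illustration, not part of Selenium or Firefox, and the six-decimal formatting is just an assumption about the precision you may want.

```java
import java.util.Locale;

// Hypothetical helper, not part of Selenium or Firefox: builds the data: URI
// that the geo.wifi.uri preference is set to in the snippet above.
final class GeoLocationUri {
    private GeoLocationUri() {}

    static String build(double lat, double lng, double accuracy) {
        if (lat < -90.0 || lat > 90.0) {
            throw new IllegalArgumentException("Latitude out of range: " + lat);
        }
        if (lng < -180.0 || lng > 180.0) {
            throw new IllegalArgumentException("Longitude out of range: " + lng);
        }
        // Locale.ROOT keeps '.' as the decimal separator on every machine,
        // so the JSON stays valid whatever locale the CI agent runs with.
        return String.format(Locale.ROOT,
                "data:application/json,{\"location\": {\"lat\": %.6f, \"lng\": %.6f}, \"accuracy\": %.1f}",
                lat, lng, accuracy);
    }
}
```

The result could then be handed to the profile, for example profile.setPreference("geo.wifi.uri", GeoLocationUri.build(51.5, -0.12, 100.0)).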

Happy Automating !


Linux Kernel Modules – 101

Why Linux Kernel Modules?

It’s 2016, and the Linux kernel is about to turn 25 in a few days. Why am I writing about a “Hello World” Linux kernel module now? There is tons of material, both written and audiovisual, available on this topic. The reason is that I stumbled upon a lot of popular blogs and materials about kernels while learning about Linux myself. Popular, in our times, is whatever Google search serves us on its first page. Whatever algorithm the search may be using, there is a fair chance of a particular keyword combination reaching the reader’s eyes through Google. So I feel that if this article comes up for even one search string and helps even one person learn, its purpose is served.

What are kernel and modules?

The Linux kernel is an operating system kernel that acts as a bridge between the user and the computer hardware. It is loaded into memory early in the boot process, right after the boot loader, and once it's up it takes care of what other programs need, like reading from and writing to devices or peripherals attached to the computer. After take-off, the kernel’s responsibility is to run background processes, handle interrupts and system calls from other programs, manage devices and so on.

The beauty of the Linux kernel is that it is modular in nature. It can boot with bare minimum functionality, and programs that handle additional functionality can be attached to it on the go (device drivers are the best example). These attachable programs are known as modules, or Loadable Kernel Modules.

Let’s start

The Linux kernel source is available for download free of cost. The code is written in C. One can compile the kernel with a changed configuration to suit one’s requirements. As far as module development is concerned, the C program needs header files to compile, and these are provided by the same kernel source.

Important tools and commands

There are some handy tools and commands you will need while writing your first kernel module:

  • vi/nano/gedit editors (I will be using vim)
  • Makefile and the make command
  • Linux module commands:
    • lsmod – lists the modules currently attached to the kernel
    • modinfo – lists information about a module
    • insmod – inserts/attaches a module to the kernel
    • rmmod – removes a module
    • dmesg – prints kernel logs (you can also read the /var/log/messages file for logs)

The C program

Create a new directory named “module” in /tmp:

/tmp$ mkdir module

Change into this directory:

/tmp$ cd module

Create a new .c file:

/tmp/module$ vi Module1.c

This opens up the vi editor:

[Screenshot: the Module1.c source in the vi editor]

Use :wq to save the .c file.

The header file module.h is required by all modules. kernel.h is required because we are using KERN_INFO (a log level passed as a parameter to printk, which is the kernel's equivalent of printf() in C), and init.h is required for the __init and __exit macros, which let the kernel manage the memory taken by the module's load and unload code.

The function “hello_init” takes no parameters and returns an integer: 0 (zero) when there are no errors, and a non-zero value when something goes wrong. Our program simply prints a message while loading and unloading, so we can safely always return zero.

The code is done; now we need to compile it to get a .ko file (kernel object file). Yes, it's an object file: a binary that is linked dynamically at the time it is attached to the kernel.

This is not an ordinary C executable, so we use a Makefile to build it.

Here is one example of the Makefile you need to compile the code. Again, use the vi editor to create it:

/tmp/module$ vi Makefile

[Screenshot: Makefile, first version]

The path I am using in this Makefile is specific to my system. You can find the path for your system by looking in the /usr/src folder. This is one way of writing a Makefile; there is another one as well:

[Screenshot: Makefile, second version]

Notice the difference between the paths used for compiling the code in the make command.

Once the Makefile is in place, all that’s left is running the “make” command (the Makefile I used is the one in the first example):

/tmp/module$ make

The output looks like:

/tmp/module$ make
make -C /usr/src/linux-headers-3.13.0-71-generic  SUBDIRS=/tmp/module modules
make[1]: Entering directory `/usr/src/linux-headers-3.13.0-71-generic'
  Building modules, stage 2.
  MODPOST 1 modules
make[1]: Leaving directory `/usr/src/linux-headers-3.13.0-71-generic'

A number of files are generated now; the one we need is Module1.ko. Run the command “modinfo Module1.ko”:

/tmp/module$ modinfo Module1.ko
filename:       /tmp/module/Module1.ko
description:    A HelloWorld Module
author:         Hrmeet
license:        GPL
srcversion:     9C73E1D2DD70C36B10ED118
vermagic:       3.13.0-71-generic SMP mod_unload modversions

To get a list of the modules currently loaded in the kernel, use the lsmod command:

/tmp/module$ lsmod
Module                  Size  Used by
nfsv3                  39326  1
vboxsf                 43798  2
nfsd                  284385  2
auth_rpcgss            59338  1 nfsd
binfmt_misc            17468  1
nfs_acl                12837  2 nfsd,nfsv3
nfs                   236726  2 nfsv3
lockd                  93977  3 nfs,nfsd,nfsv3
sunrpc                289260  19 nfs,nfsd,auth_rpcgss,lockd,nfsv3,nfs_acl
fscache                63988  1 nfs
dm_crypt               23177  0
joydev                 17381  0
crct10dif_pclmul       14289  0
crc32_pclmul           13113  0
ghash_clmulni_intel    13216  0
video                  19476  0
serio_raw              13462  0

Time now to insert your first module into the kernel. Use insmod with sudo:

/tmp/module$ sudo insmod Module1.ko

If everything goes fine, you won't see any errors or messages. Run the lsmod command again to see which modules it lists now:

/tmp/module$ lsmod
Module                  Size  Used by
Module1                12428  0
nfsv3                  39326  1
vboxsf                 43798  2
nfsd                  284385  2
auth_rpcgss            59338  1 nfsd
binfmt_misc            17468  1
nfs_acl                12837  2 nfsd,nfsv3
nfs                   236726  2 nfsv3
lockd                  93977  3 nfs,nfsd,nfsv3
sunrpc                289260  19 nfs,nfsd,auth_rpcgss,lockd,nfsv3,nfs_acl
fscache                63988  1 nfs
dm_crypt               23177  0
joydev                 17381  0
crct10dif_pclmul       14289  0
crc32_pclmul           13113  0
ghash_clmulni_intel    13216  0
video                  19476  0

The module “Module1” now appears at the top. It can be removed using the rmmod command:

/tmp/module$ sudo rmmod Module1.ko

Removing the module is simple in this case, as our module has not been using any peripheral device or file to read or write data. In modules with more complex functions, removal is a critical part.

What did our module do when it was attached to the kernel? Use the dmesg command:

/tmp/module$ dmesg

[Screenshot: dmesg output after loading and unloading the module]

dmesg shows all kernel activity starting from when the kernel boots. If you don't include the licensing information in your module program, you may see a warning like the following in the dmesg output.

[Screenshot: dmesg output with the kernel taint warning]

Kernel tainting through missing licensing information is one of the ways you get a warning from the kernel. It means that your action has changed the kernel in a way that is not recommended by the developer community, and you may not get help resolving any errors you face. Tread with caution here.

That’s all. A simple module registering into the kernel is just a foot in the door. It opens up the possibility of creating programs like device drivers, which can be used for more complex tasks.

Thanks for reading, and please leave a comment or suggestion.


Selenium WebDriver – Handling Upload Dialogs using AutoIt

Handling an upload dialog while automating test cases with Selenium WebDriver can be a headache. There is a list of problems. To start with the first: WebDriver only works on the current web page, while things like upload dialogs, alerts and save-as dialogs are handled by the operating system rather than the web page.

So as the execution crosses the web page boundary, things can go out of control. WebDriver has support for handling popup alerts, but when a dialog needs keyboard input instead of a standard OK/Cancel response, a user has two ways of doing it:

  1. Use the mouse to browse – hard to replicate in code, as mouse movements are complex and need lots of assumptions about screen coordinates
  2. Set focus to the file name box and send keys – easier!

I tried emulating the mouse actions using PyAutoGUI, a Python library, and am still working on it. Until then, in the interest of time, I have implemented the second approach using another tool, “AutoIt”.

AutoIt allows me to interact with Windows components like dialogs. I can control focus on the dialogs and click on any element I want.

The simplest program in AutoIt is:

Local $hwnd = WinWait($CmdLine[1], "", 10)
ControlClick($hwnd, $CmdLine[3], int($CmdLine[4]))

I saved this file as “HandleFileUpload.au3”, which is the AutoIt script format. It can be executed directly by double-clicking, but since it uses command line arguments, it only makes sense to execute it from the command line. Further, AutoIt scripts can easily be compiled into Windows executables (exe). Refer to the AutoIt tutorials for further details.

The program above takes 4 arguments:

HandleFileUpload.exe <Window_Title_for_searching_window> <Keys_to_send_to_dialog> <Button_Text> <Button_Id>

The first argument identifies the window/dialog we need to bring focus to (for Firefox, this text is “File Upload”; for IE, it's “Choose File to Upload”). The second argument is the string to be sent to the dialog. The third argument is the text on the button; it can be “Open” or “Close”. The fourth argument is a numeric value, an ID assigned to a button on the active window. It can be determined using a built-in AutoIt tool named “AutoIt Window Info”.

Once we have the exe, we need to call it from the code. The example here is in C#, but it could be Java or any other language:

        /// <summary>
        /// Pass keys input to modal file Upload dialog
        /// There are two modes of passing values used in the function - Direct window method for Chrome and AutoIT exe for Firefox and IE
        /// AutoIT method is more reliable as it lets Autoit handle the dialog and keeps focus while sending keys. Chrome, does not support this function.
        /// The AutoIT exe is in the "ExternalTools" folder, it takes command as:
        ///     HandleFileUpload.exe "UPLOAD dialog title" "Keys to send" "Button text" "Button ID"
        ///     "Upload Dialog title" - The title of the upload window for IE, this is "Choose File to Upload" and for Firefox, it is "File Upload"
        ///     "Keys to send" - the desired keys to send
        ///     "Button Text" - Text on the button to click like "Open" and "Close"
        ///     "Button ID" - Button ID, can be found using AutoIT utility, generally, for both IE and Firefox, the ID for save button is "1" and cancel button is "2", we have used 1
        ///     Very unlikely that these values change with future versions of browsers
        /// </summary>
        /// <param name="driver">WebDriver Instance</param>
        /// <param name="strKeys">String to pass</param>
        public static void sendKeysToUploadDialog(IWebDriver driver, String strKeys)
        {
            if (getBrowserName(driver).Equals("chrome"))
            {
                // "~" is the SendKeys code for the Enter key
                System.Windows.Forms.SendKeys.SendWait(strKeys + "~");
            }
            else
            {
                ProcessStartInfo startinfo = new ProcessStartInfo();
                startinfo.FileName = "Path_To_Folder\\HandleFileUpload.exe";
                try
                {
                    // Arguments: dialog title, keys to send, button text (left empty), button ID
                    if (getBrowserName(driver).Equals("firefox"))
                        startinfo.Arguments = "\"File Upload\" \"" + strKeys + "\" \"\" \"1\"";
                    else if (getBrowserName(driver).Equals("internet explorer"))
                        startinfo.Arguments = "\"Choose File to Upload\" \"" + strKeys + "\" \"\" \"1\"";
                    using (Process exeProcess = Process.Start(startinfo))
                    {
                        // Assumed body: wait for the AutoIt executable to finish
                        exeProcess.WaitForExit();
                    }
                }
                catch (Exception e)
                {
                    //log messages or handle otherwise
                }
            }
        }

        /* Function to get the browser name as a string from the webdriver instance */
        public static String getBrowserName(IWebDriver driver)
        {
            ICapabilities cap = ((RemoteWebDriver)driver).Capabilities;
            String strBrowserName = cap.BrowserName.ToLower();
            return strBrowserName;
        }

Note: The Chrome browser has to be handled using the default Windows support in C#, as its dialog does not get handled by the AutoIt code.
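Since the caller could just as well be Java, here is a rough sketch of the same idea there. UploadDialogCommand is a hypothetical class written for this illustration; it only reuses the dialog titles and the button ID "1" mentioned above, and it assumes the compiled HandleFileUpload.exe accepts its four arguments in the order described. Passing the arguments as a list to ProcessBuilder avoids building one quoted string by hand.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for this post's AutoIt executable; not a Selenium API.
final class UploadDialogCommand {
    private UploadDialogCommand() {}

    // Dialog titles as described in the post for Firefox and IE.
    static String dialogTitleFor(String browserName) {
        switch (browserName.toLowerCase(Locale.ROOT)) {
            case "firefox":
                return "File Upload";
            case "internet explorer":
                return "Choose File to Upload";
            default:
                throw new IllegalArgumentException("No dialog title known for: " + browserName);
        }
    }

    // Order: exe, window title, keys to send, button text (left empty), button ID.
    static List<String> build(String exePath, String browserName, String keys) {
        return Arrays.asList(exePath, dialogTitleFor(browserName), keys, "", "1");
    }

    // Starts the executable and returns its exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

A call would look like UploadDialogCommand.run(UploadDialogCommand.build("Path_To_Folder\\HandleFileUpload.exe", "firefox", filePath)), where filePath is whatever file the test wants to upload.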

Thanks for reading.


What Are We to Nanak – Jaswant Zafar

Nanak, on the very first day,
rejected the school,
the fenced enclosure of learning,
and returned home
returned home in order to leave home
left home in order to widen the home
to make it vast

We cannot reject the school the way Nanak did
We can build schools in Nanak's name –
Guru Nanak School
Guru Nanak College
Guru Nanak University

We, refined and instructed by the school
benefactors of learning
prisoners of our homes
respectable gentlemen
what are we to Nanak

~ From Jaswant Zafar's book “ਅਸੀਂ ਨਾਨਕ ਦੇ ਕੀ ਲੱਗਦੇ ਹਾਂ” (“What Are We to Nanak”)


Monitoring WebDriver Actions – Using WebDriverEventListener and EventFiringWebDriver

WebDriver performs a sequence of complex actions in the background, even for tasks that seem as trivial as navigating to a web page and entering some test data. Selenium provides a useful framework that gives a peep into the busy life of WebDriver. When things become critical, debugging at the level of each event becomes crucial: these events include navigating to a URL, a web element’s value changing, a script being executed through WebDriver and, most useful of all, an event raised just when an exception occurs. TestNG provides its own interfaces for root-level event tracking, like ITestListener and ISuiteListener, but Selenium has its own way of doing this.

To throw an event, WebDriver gives us a class named EventFiringWebDriver, and to catch that event, it provides an interface named WebDriverEventListener. Together, these two can be used to trigger an event, catch it and perform the desired action.

More than one listener may be waiting for a single event, each handling it in its own way. This is done by registering multiple listeners with an EventFiringWebDriver.

The whole flow of things looks like:

  1. Create an EventListener class
  2. Create a WebDriver instance
  3. Create an instance of EventFiringWebDriver by passing driver from step 2
  4. Create an instance of EventListener
  5. Register this EventListener to the EventFiringWebDriver Instance
  6. Done !! Handle the events sent by WebDriver now
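To see what is going on behind those steps, here is a stripped-down sketch of the same wrap-and-notify idea in plain Java, without Selenium. Navigator, NavigationListener and EventFiringNavigator are hypothetical stand-ins invented for this illustration (for WebDriver, WebDriverEventListener and EventFiringWebDriver respectively); the real classes cover many more events.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WebDriver: something that can open a page.
interface Navigator {
    void get(String url);
}

// Hypothetical stand-in for WebDriverEventListener, reduced to two events.
interface NavigationListener {
    void beforeNavigateTo(String url);
    void afterNavigateTo(String url);
}

// Hypothetical stand-in for EventFiringWebDriver: wraps a Navigator and
// tells every registered listener about each call before and after it runs.
final class EventFiringNavigator implements Navigator {
    private final Navigator delegate;
    private final List<NavigationListener> listeners = new ArrayList<>();

    EventFiringNavigator(Navigator delegate) {
        this.delegate = delegate;
    }

    EventFiringNavigator register(NavigationListener listener) {
        listeners.add(listener);
        return this;
    }

    @Override
    public void get(String url) {
        for (NavigationListener listener : listeners) {
            listener.beforeNavigateTo(url);
        }
        delegate.get(url); // the real work is still done by the wrapped object
        for (NavigationListener listener : listeners) {
            listener.afterNavigateTo(url);
        }
    }
}
```

The test code only ever talks to the wrapper, which is why the listeners see everything without the wrapped object knowing they exist.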

An event listener can be created by either:

  1. Implementing WebDriverEventListener interface
  2. Extending AbstractWebDriverEventListener class

I am using the interface in the example here. Implementing the interface makes me define all of its methods. So the code for a class implementing WebDriverEventListener looks like:

package org.selenium;
import org.openqa.selenium.*;
import org.openqa.selenium.support.events.WebDriverEventListener;

public class TheEventListener implements WebDriverEventListener{

	public void afterChangeValueOf(WebElement arg0, WebDriver arg1) {
		System.out.println("After change of value :" + arg0.toString());
	}

	public void afterClickOn(WebElement arg0, WebDriver arg1) {
		System.out.println("After click on webelement: " + arg0.getText());
	}

	public void afterFindBy(By arg0, WebElement arg1, WebDriver arg2) {
		System.out.println("After find by: " + arg0.toString());
	}

	public void afterNavigateBack(WebDriver arg0) {
		System.out.println("After navigating back to : " + arg0.getCurrentUrl());
	}

	public void afterNavigateForward(WebDriver arg0) {
		System.out.println("After navigating forward to : "+ arg0.getCurrentUrl());
	}

	public void afterNavigateTo(String arg0, WebDriver arg1) {
		System.out.println("After navigating to : "+arg0);
	}

	public void afterScript(String arg0, WebDriver arg1) {
		System.out.println("After execution of script : "+ arg0);
	}

	public void beforeChangeValueOf(WebElement arg0, WebDriver arg1) {
		System.out.println("Before value change of : " + arg0.toString());
	}

	public void beforeClickOn(WebElement arg0, WebDriver arg1) {
		System.out.println("Before clicking on WebElement : " + arg0.getText());
	}

	public void beforeFindBy(By arg0, WebElement arg1, WebDriver arg2) {
		System.out.println("Before find by : " + arg0.toString());
	}

	public void beforeNavigateBack(WebDriver arg0) {
		System.out.println("Before navigating back from : " + arg0.getCurrentUrl());
	}

	public void beforeNavigateForward(WebDriver arg0) {
		System.out.println("Before navigating forward from : "+ arg0.getCurrentUrl());
	}

	public void beforeNavigateTo(String arg0, WebDriver arg1) {
		System.out.println("Before navigating to : "+ arg0);
	}

	public void beforeScript(String arg0, WebDriver arg1) {
		System.out.println("Before executing the script : " + arg0);
	}

	public void onException(Throwable arg0, WebDriver arg1) {
		System.out.println("On exception : " + arg0.getMessage());
	}
}

Now comes the event firing WebDriver. It’s the attention-seeking big brother of WebDriver who loves to brag about what he’s going to do and what he has done. Listeners that are tuned in, or registered, with this driver get to know everything and act as they want.

This is the code for an EventFiringWebDriver implementation:


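package org.selenium;
import org.openqa.selenium.*;
import org.openqa.selenium.firefox.*;
import org.openqa.selenium.support.events.EventFiringWebDriver;
import org.openqa.selenium.support.ui.Select;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class TheDriver {
	// Placeholder: the address of the page under test is missing from this listing
	static final String PAGE_URL = "<page URL>";
	WebDriver driver;

	@BeforeClass
	public void setupBrowser() {
		driver = new FirefoxDriver();
	}

	@Test
	public void BrowserTest() {
		EventFiringWebDriver eventFiringDriver = new EventFiringWebDriver(driver); //Get EventFiringWebDriver instance
		TheEventListener eventListener = new TheEventListener(); //Get Listener instance
		eventFiringDriver.register(eventListener); // Register listener to driver
		// The navigate, select and click calls are reconstructed from the console output
		eventFiringDriver.get(PAGE_URL);
		eventFiringDriver.findElement(By.id("searchInput")).sendKeys("Lorem Ipsum");
		Select selLanguage = new Select(eventFiringDriver.findElement(By.id("searchLanguage")));
		selLanguage.selectByValue("en");
		eventFiringDriver.findElement(By.xpath(".//body[@id='www-wikipedia-org']/div[2]//input[@type='submit']")).click();
	}

	@AfterClass
	public void exit() throws InterruptedException {
		driver.quit();
	}
}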

On execution, every event is listed on the console, just as we wanted. The run ends with an exception, which is also expected.

This is what the output looks like:

Before navigating to :
After navigating to :
Before find by : searchInput
After find by: searchInput
Before value change of : [[FirefoxDriver: firefox on WINDOWS (8dc5e6de-b235-452e-8884-dfb1d6ea4d0b)] -> id: searchInput]
After change of value :[[FirefoxDriver: firefox on WINDOWS (8dc5e6de-b235-452e-8884-dfb1d6ea4d0b)] -> id: searchInput]
Before find by : searchLanguage
After find by: searchLanguage
Before find by : By.xpath: .//option[@value = "en"]
After find by: By.xpath: .//option[@value = "en"]
Before find by : By.xpath: .//body[@id='www-wikipedia-org']/div[2]//input[@type='submit']
After find by: By.xpath: .//body[@id='www-wikipedia-org']/div[2]//input[@type='submit']
Before clicking on WebElement :
On exception : Element not found in the cache - perhaps the page has changed since it was looked up

Thanks for reading !!!


SpecFlow and Selenium WebDriver – An alternative approach to hybrid test automation framework

Test automation frameworks, by definition, sit between the user and the Application Under Test, and provide ground rules, reusable components, exception handling, fallback mechanisms and reporting to the users. Using an existing framework is supposed to be easy: you add test cases to the existing set. Designing and developing one is a bit more tedious and requires long-running discussions within the team. There are a couple of problems with group decision making, and they apply to any decision, not framework design alone:

  • The larger the team, the more discussions and the less productive the meetings: the “law of diminishing returns” at play
  • Individual ideas, however brilliant each one is, get outweighed by group wisdom. People feel safe in a group and shy away from presenting a good idea: the Abilene paradox

Fortunately, if you get a chance like I did, finding myself the only person responsible for the design and development of such a framework, it's time to learn. You have your skin in the game, but the learning is worth the risk. You get to think less about the group and focus more on the problem at hand.

Me and my project

I have experience in automation using the C language on a proprietary framework and with Quick Test Professional. I started learning Selenium WebDriver on my own around two years back, but the main learning started when I began working on this framework. I find it interesting that Selenium is being embraced as an industry standard: lots of help is available on the Internet, and the tool itself is powerful enough to let the user do almost anything in the never-ending world of “-JS” frameworks. My challenge was to develop the framework for a new SPA built on AngularJS with only one constraint: it had to use the Microsoft tech stack. Which means no Java 😦

But that’s fine. So the design began with the general framework expectations:

  • Data/Keyword Driven/Hybrid
  • Reusable
  • Robust
  • Well Documented
  • Intuitive
  • Easy to maintain
  • Loosely coupled components

We know a framework is not a test automation framework without these characteristics. So, where to focus? The answer lies in another question that I asked myself: how am I going to make life easy for my future self when adding and running a new test case?

So focusing on the “Intuitive” part at the start was a good choice.

These are the tools used for the development of this framework:

Considerations while designing

Consideration 1 – Data/Keyword/Hybrid?

The decision to build a data driven, keyword driven or hybrid framework has to be made at the inception. If I need to test a single workflow with varying data, I need a data driven framework. The data drives the test cases here. The flow of execution is like:

  1. Test control with value = value1
  2. Test control with value = value2
  3. Test control with value = value3
  4. ………

A keyword driven framework, on the other hand, is driven by workflows. Workflows, by definition, are sequences of steps to achieve a result. In a keyword driven framework, you define keywords that initiate a sequence of steps which complete a workflow. For example, I define “login()” as the sequence of entering a user name, entering a password and clicking the login button. Then I just need to call this keyword to perform a login. The execution flow here is:

  1. Test login function
  2. Test functionality1
  3. Test functionality2 with data1
  4. …..

Hybrid frameworks use the best of both worlds. You have data to test and workflows too.

  1. Test login function
  2. Test functionality1 with data1
  3. Test functionality2
    1. with data1
    2. with data2
    3. with data3
  4. …..
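The hybrid flow above can be sketched in a few lines of Java. This is only an illustration under my own assumptions: HybridSuite, define and run are hypothetical names, keywords are modelled as lambdas that take one data value, and the "executed" list stands in for real reporting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical miniature of a hybrid framework: keywords name workflows,
// and each workflow can be repeated over a set of data rows.
final class HybridSuite {
    private final Map<String, Consumer<String>> keywords = new HashMap<>();
    final List<String> executed = new ArrayList<>();

    // Keyword driven part: register a named sequence of steps.
    void define(String name, Consumer<String> workflow) {
        keywords.put(name, workflow);
    }

    // Data driven part: run the named workflow once per data row,
    // or exactly once (with an empty value) when no data is given.
    void run(String name, String... dataRows) {
        Consumer<String> workflow = keywords.get(name);
        if (workflow == null) {
            throw new IllegalArgumentException("Unknown keyword: " + name);
        }
        String[] rows = dataRows.length == 0 ? new String[] {""} : dataRows;
        for (String row : rows) {
            workflow.accept(row);
            executed.add(row.isEmpty() ? name : name + " with " + row);
        }
    }
}
```

With this, suite.run("login") followed by suite.run("functionality2", "data1", "data2", "data3") reproduces the shape of the numbered flow: one plain workflow, then one workflow repeated for each data row.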


Consideration 2 – Re-usability

I have to write a layered framework where the different layers interact with each other. The framework is going to be for a specific product, but at least one of the layers can be made plug-and-play for others to use. This layer can hold the basic coding standards, tools etc. supported by the organization’s standards and guidelines. Here are some suggestions:

  • Decide on the naming conventions to be used, like camel casing for function names
  • Use self-explanatory names for variables, functions etc.
  • Consider using enumerations wherever possible
  • Create separate projects within the main solution, so that each one produces a library that can be referenced elsewhere and rebuilt independently when it changes
  • Separate layers on a logical basis. For instance, put data access in one layer, basic Selenium functions with exception handling and logging in another, and test cases in yet another. This allows changes to be made easily in one layer when required

Consideration 3 – Robustness

The framework has to make sure test cases do not change behavior when external factors change, and do not raise false alarms. Such external factors include web pages being too slow, elements loading in random order, and exceptions that are not caught or not documented well enough to show what caused the failure.

For tasks related to page loading, explicit waits (waiting for a certain condition to be true before performing an action on an element) were the best solution for my problem. An implicit wait (the driver keeps polling the DOM for up to a set time whenever an element is not found immediately) is set once for all element lookups, so it can cause delays throughout the code. Explicit waits can be used where we know certain elements take more time.

This is an implicit wait called just after getting the driver instance:


Here is an explicit wait:

IWebDriver driver;
IWebElement element;
By elementLocator = By.XPath("Xpath value");

WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(DEFAULT_TIMEOUT_TIME_SEC));
element = wait.Until(ExpectedConditions.ElementIsVisible(elementLocator));

If need be, define constant wait times in a config file, like app.config in C# or an equivalent file in other languages.
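What an explicit wait does under the hood is, roughly, poll a condition until it holds or a deadline passes. Here is a simplified Java sketch of that loop. ExplicitWait.until is a hypothetical helper written for this post, not Selenium's WebDriverWait, which additionally ignores certain exceptions while polling.

```java
import java.util.function.Supplier;

// Hypothetical, simplified version of the idea behind an explicit wait.
final class ExplicitWait {
    private ExplicitWait() {}

    // Polls the condition until it returns something other than null or false,
    // or fails once the timeout has elapsed.
    static <T> T until(Supplier<T> condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T value = condition.get();
            if (value != null && !Boolean.FALSE.equals(value)) {
                return value;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Condition not met within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }
}
```

The wait returns as soon as the condition holds, so only the slow elements pay the waiting cost, which is the advantage over a blanket implicit wait.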

Custom exception messages go a long way in reducing debugging time. For instance, catching “NoSuchElementException” every time and logging only its inner message leads nowhere.

try {
    /*some code here*/
    throw new Exception("Some error occurred at this block");
} catch (Exception e) {
    // log e.Message here, together with the step that failed
}

Another step towards robustness, I thought, was implementing my Object Repository as XML instead of an Excel sheet. Based on my experience with frameworks, I can say the data stored in object repositories can sometimes be harmful (I once injected code through the repository because its contents were not validated). There was additional overhead in maintaining the integrity of the Object Repository; I have seen implementations where extra checks were added just to verify the data entered in Excel sheets. XML, however, allows me to enforce a schema (XSD) on the document. If the user edits the repository in Visual Studio, the IDE itself takes care of the enforcement. If the user edits the XML elsewhere, it can still be validated against the schema before it is loaded into memory. The data from the XML is then loaded into a Dictionary<String,String> object in the program.

XML schema validation can be implemented like this:

        public static void isValidXML(String sXML, String sXSD)
        {
            XmlReaderSettings settings = new XmlReaderSettings();
            settings.Schemas.Add(null, sXSD);
            settings.ValidationType = ValidationType.Schema;
            settings.ValidationEventHandler += new System.Xml.Schema.ValidationEventHandler(ValidationCallBack);

            XmlReader reader = XmlReader.Create(sXML, settings);

            // Reading through the whole document triggers the validation
            while (reader.Read()) ;
        }

        // Display any warnings or errors.
        private static void ValidationCallBack(object sender, ValidationEventArgs args)
        {
            if (args.Severity == XmlSeverityType.Warning)
                Console.WriteLine("\tWarning: Matching schema not found.  No validation occurred." + args.Message);
            else
                Console.WriteLine("\tValidation error: " + args.Message);
        }


A Dictionary holds unique keys whose values can be retrieved quickly when required.

Tip: I defined these functions in a data access layer, which leaves room for implementing other sources, like reading from a database or an Excel sheet.
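For readers on the Java side, the same validate-before-loading step can be sketched with the standard javax.xml.validation API. RepositoryValidator is a hypothetical class name, and taking the XML and XSD as strings (rather than file paths, as the C# version does) is an assumption made to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: checks an object repository XML against its XSD.
final class RepositoryValidator {
    private RepositoryValidator() {}

    static boolean isValid(String xml, String xsd) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema;
        try {
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // A broken schema is a setup problem, not a validation result.
            throw new IllegalArgumentException("The schema itself could not be parsed", e);
        }
        try {
            // With no custom error handler, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only when isValid returns true would the entries be copied into the in-memory map, so a malformed repository never reaches the tests.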

Consideration 4 – Well Documented

Comments and a <summary> for each member function in a class help give a quick idea of what the function does. Anyone who starts working on the framework quickly comes up to speed by reading these.

To insert the <summary> tags, just type “///” above the function. Refer to the MSDN documentation for details.

Here is an example of the summary:

       private static float fPercentageDiff;

        /// <summary>
        /// CompareImages function compares two images for similarities.
        /// </summary>
        /// <param name="imgPath">Image path</param>
        /// <param name="imgPathBaseline"> Baseline Image path</param>
        /// <param name="Tolerance">Value between 0-100 as a percentage. 0 means no tolerance, images should match exactly, while 100 means images can totally differ</param>
        /// <returns>Boolean value - If the provided images match within the specified Tolerance limit</returns>
        public static Boolean CompareImages(String imgPath, String imgPathBaseline, int Tolerance)
        {
           LoggingHelper.LogMessage(LogLevel.INFO, "Entered Method CompareImages ", typeof(Verification).ToString());
           // ... the comparison logic continues here ...
        }

Consideration 5 – Loosely Coupled

This point is not in the exact order that I specified above, but it can be discussed here. We have interesting stuff coming up later.

Loosely coupled code is the key to making code more granular and reusable. Whenever I heard water cooler discussions on this subject, one name came up repeatedly: Dependency Injection. It is basically a design pattern used to decouple two classes.

Conceptual view of the Dependency Injection pattern – source: MSDN documentation

The idea is not to use the “new” keyword inside the class that uses another class. Instead, a third class creates the instance and passes it to your code, which then only calls its member functions. This keeps the two classes independent of each other.
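A minimal Python sketch of that idea, assuming constructor injection. All class and function names here (`ConsoleLogger`, `MemoryLogger`, `Verification`, `build_verification`) are made up for illustration and are not taken from the framework.

```python
class ConsoleLogger:
    """Logger that writes to the console."""
    def log(self, message):
        print(message)

class MemoryLogger:
    """Logger that keeps messages in a list, handy in tests."""
    def __init__(self):
        self.messages = []

    def log(self, message):
        self.messages.append(message)

class Verification:
    """Uses a logger but never creates one itself."""
    def __init__(self, logger):
        self._logger = logger  # the dependency is handed in from outside

    def compare_text(self, expected, actual):
        self._logger.log("Entered compare_text")
        return expected == actual

def build_verification():
    """The only place that decides which concrete logger is used."""
    return Verification(ConsoleLogger())

memory_logger = MemoryLogger()
verification = Verification(memory_logger)
result = verification.compare_text("footer", "footer")
```

Since `Verification` only relies on an object with a `log` method, a different logger can be swapped in without changing the class.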

Consideration 6 – Intuitive

Now comes the part where things differ somewhat from conventional frameworks. My idea of adding and maintaining test cases has always been a simple “natural language to machine language mapping”. The closest analogy I found is “Behavior Driven Development”, or BDD. It is an approach mostly used in development that bridges the gap between development and business teams through common tools and a shared domain language. Business people write their specifications in an agreed format, and developers read them and develop accordingly. “Gherkin” is one popular format:

Story: Returns go to stock

In order to keep track of stock
As a store owner
I want to add items back to stock when they’re returned

Scenario 1: Refunded items should be returned to stock
Given a customer previously bought a black sweater from me
And I currently have three black sweaters left in stock
When he returns the sweater for a refund
Then I should have four black sweaters in stock


Here, the leading words (Given, And, When, Then) are keywords. This language is readable by both business people and developers. For developers working on unit tests, “Given” specifies the precondition, “When” specifies the action and “Then” specifies the desired result. “And” can be used to add more conditions under the same category.
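To make the mapping concrete, the scenario above can be sketched as a plain Python test. The `Stock` class and its methods are hypothetical, written only for this illustration.

```python
class Stock:
    """Tiny in-memory stock keeper used only for this example."""
    def __init__(self, items):
        self._items = dict(items)

    def accept_return(self, item):
        # A refunded item goes back on the shelf.
        self._items[item] = self._items.get(item, 0) + 1

    def count(self, item):
        return self._items.get(item, 0)

def test_refunded_items_return_to_stock():
    # Given: I currently have three black sweaters left in stock
    stock = Stock({"black sweater": 3})
    # When: he returns the sweater for a refund
    stock.accept_return("black sweater")
    # Then: I should have four black sweaters in stock
    assert stock.count("black sweater") == 4

test_refunded_items_return_to_stock()
```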

Isn’t this the same flow we follow while testing? Set everything up, perform the action(s) and verify the result(s)? I decided to use SpecFlow, a Given/When/Then tool for C# that is mostly used for unit testing.

The same approach can be used, with richer language, for UI-based automation testing on top of tools like Selenium WebDriver. After all, interacting with the UI is the closest a test can get to a human interacting with the system. The only thing to keep in mind is that everything, actions and their validations alike, then has to stay at the UI level.

The architecture of the framework comes out to be like this:

Framework Architecture

As shown in the diagram, there are three layers in the architecture. Selenium, the basic logging functions and data access layer are part of the common library functions. On top of that is the workflows layer, which has keywords defined as functions which further use common library functions to perform actions. So workflows are basically combinations of ground level actions.

The topmost layer contains the SpecFlow feature file and step definitions. The feature file looks like this:

Specflow Feature file

I can pass my baseline strings in this example directly in the feature file and can update them when required. Similarly, if I need to pass some Image file path for comparison, that too can be done.

This feature file is accompanied by a step file i.e. the mapping part. The code looks like:

        [Then(@"I can see footer text ""(.*)""")]
        public void ThenICanSeeFooterText(string p0)
        {
            Assert.AreEqual(p0, DashboardWorkFlows.getFooterText().ToString() , true);
        }

Further, this call here goes to our workflows layer:

        //  Get footer text by using XPath for footer from object repository
        public static String getFooterText()
        {
            return Utilities.getString(_driver,By.XPath(dictObjectRepository["lblFooter"]));
        }

This in turn calls the Selenium library functions to get the job done. The XPath passed here comes from the object repository, which was read into a Dictionary<String,String> object.

The library function is:

        // Overloaded function getValue to get string value from the given element
        public static String getString(IWebDriver driver, By locator)
        {
            LoggingHelper.LogMessage(LogLevel.INFO, "Entering method getString ", typeof(Utilities).ToString());

            try
            {
                return (driver.FindElement(locator).Text);
            }
            catch (NoSuchElementException e)
            {
                LoggingHelper.LogMessage(LogLevel.ERROR, "getString - No Element found to get text. Error: " + e.Message, typeof(Utilities).ToString());
                return null;
            }
        }
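The whole chain above (step definition, workflow, library function) can be sketched in a few lines of Python. This is only an illustration of how a regex-based step binding might work, not how SpecFlow is implemented. `FakeDriver`, the `then` decorator, `run_step` and the sample footer text are all hypothetical stand-ins.

```python
import re

STEPS = []  # registered (compiled pattern, function) pairs

def then(pattern):
    """Register a function as the handler for steps matching the pattern."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register

class FakeDriver:
    """Stand-in for a WebDriver: maps an XPath to the element's text."""
    def __init__(self, texts):
        self._texts = texts

    def find_text(self, xpath):
        return self._texts.get(xpath)  # None when nothing matches

OBJECT_REPOSITORY = {"lblFooter": "//div[@id='footer']"}
driver = FakeDriver({"//div[@id='footer']": "Copyright 2015"})

def get_string(drv, xpath):
    """Library layer: fetch text from the UI."""
    return drv.find_text(xpath)

def get_footer_text():
    """Workflow layer: look up the locator, then call the library layer."""
    return get_string(driver, OBJECT_REPOSITORY["lblFooter"])

@then(r'^I can see footer text "(.*)"$')
def then_i_can_see_footer_text(expected):
    """Step layer: case-insensitive comparison, like the C# assert above."""
    actual = get_footer_text()
    assert actual is not None and actual.lower() == expected.lower()

def run_step(text):
    """Find the first matching step and call it with the captured groups."""
    for pattern, func in STEPS:
        match = pattern.match(text)
        if match:
            func(*match.groups())
            return True
    raise LookupError("No step definition matches: " + text)

run_step('I can see footer text "copyright 2015"')
```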

On Hybrid nature:

The framework, as I said earlier, is a hybrid one. Keywords are defined in the middle layer of the architecture, and data can be given directly in the SpecFlow feature file as a data table:

Specflow feature file with data table
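For readers who have not seen such a table, here is a small Python sketch of how a pipe-delimited data table could be turned into rows of test data. The column names and values are invented for the example, and `parse_table` is a hypothetical helper, not a SpecFlow API.

```python
def parse_table(text):
    """Turn a pipe-delimited table into a list of dicts keyed by the header row."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the outer pipes, then split the remaining cells.
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(cells)
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

# Hypothetical table, written the way it might appear under a step.
TABLE = """
| field    | value  |
| username | alice  |
| password | secret |
"""

data = parse_table(TABLE)
print(data)
```

Each row becomes one dictionary, so a keyword function can loop over the rows and act on each field in turn.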

Final Thought:
I could not share most of the code/design part, but would love to help if someone needs help in understanding the implementation. Also, suggestions/comments are most welcome.

Thanks for reading.

SpecFlow and Selenium WebDriver – An alternative approach to hybrid test automation framework

Python-OpenCV for detecting colored object

I always wondered how line-following robots work. How do machines see something? I have read news about famous people warning against the implications of artificial intelligence, but always wondered how intelligence can be artificial. We humans start collecting data from all around us as soon as we are born, and form our intelligence from these empirical findings. Nature gave us sensory organs for that.

Machines have sensory organs too. The one that really fascinates me is vision. OpenCV (Open Source Computer Vision Library) is a program library that gives computers the ability to see through a camera and analyse the images.

This is my first program using OpenCV and Python (it’s easy to use Python for prototyping). It lets the computer use its camera and keep only one color from the frames it captures, filtering out everything else.

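import cv2
import numpy as np

def getthresholdedimg(hsv):
    # Keep only the pixels that fall between the LOW and HIGH slider values
    lower = np.array((cv2.getTrackbarPos('Hue_Low','Trackbars'),cv2.getTrackbarPos('Saturation_Low','Trackbars'),cv2.getTrackbarPos('Value_Low','Trackbars')))
    upper = np.array((cv2.getTrackbarPos('Hue_High','Trackbars'),cv2.getTrackbarPos('Saturation_High','Trackbars'),cv2.getTrackbarPos('Value_High','Trackbars')))
    threshImg = cv2.inRange(hsv,lower,upper)
    return threshImg

def getTrackValue(value):
    # Trackbar callback; slider positions are polled with getTrackbarPos instead
    return value

c = cv2.VideoCapture(0)
width,height = c.get(3),c.get(4)
print("frame width and height : %s %s" % (width, height))

cv2.namedWindow('Trackbars', cv2.WINDOW_NORMAL)
cv2.createTrackbar('Hue_Low','Trackbars',0,255, getTrackValue)
cv2.createTrackbar('Saturation_Low','Trackbars',0,255, getTrackValue)
cv2.createTrackbar('Value_Low','Trackbars',0,255, getTrackValue)

cv2.createTrackbar('Hue_High','Trackbars',0,255, getTrackValue)
cv2.createTrackbar('Saturation_High','Trackbars',0,255, getTrackValue)
cv2.createTrackbar('Value_High','Trackbars',0,255, getTrackValue)
cv2.createTrackbar('Caliberate','Trackbars',0,1, getTrackValue)

# NOTE: the loop header, the frame read, the drawing call, the display calls
# and the exit lines were missing from the listing. They are reconstructed
# from the description in the text; the window name 'Output' is an assumption.
while True:
    _,f = c.read()
    f = cv2.flip(f,1)
    blur = cv2.medianBlur(f,5)
    hsv = cv2.cvtColor(blur,cv2.COLOR_BGR2HSV)
    thrImg = getthresholdedimg(hsv)
    erode = cv2.erode(thrImg,None,iterations = 3)
    dilate = cv2.dilate(erode,None,iterations = 10)

    # OpenCV 2.4 returns (contours, hierarchy); pass a copy because findContours modifies its input
    contours,hierarchy = cv2.findContours(dilate.copy(),cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

    for cnt in contours:
        x,y,w,h = cv2.boundingRect(cnt)
        cx,cy = x+w/2, y+h/2
        # Draw a red box (BGR) around the detected object
        cv2.rectangle(f,(x,y),(x+w,y+h),(0,0,255),2)

    if(cv2.getTrackbarPos('Caliberate','Trackbars') == 1):
        cv2.imshow('Output',dilate)  # calibration view: thresholded image
    else:
        cv2.imshow('Output',f)       # normal view: camera image with boxes

    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

c.release()
cv2.destroyAllWindows()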



The code is written in Python 2.7 and OpenCV 2.4.10. When run, it shows a console and two windows: one shows the image while the other has the sliders on it.

OpenCV Dialogs
Open CV Program Windows

The OpenCV library does its job of identifying and spotting an object of a specific color in front of the camera. The only tricky part I found was getting the exact “Hue Saturation Value” (HSV) range for a particular color. When run, the code opens a window with sliders on it. Turn on the calibrate slider and adjust the LOW and HIGH sliders for hue, saturation and value until the object under observation is the only thing visible (it shows up as white on a black background, or black on white).

Once calibration is done, switch back to the image by turning off the calibrate slider. That’s it! The program will start identifying the object and draw a red box around it.
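If you already know the rough RGB color of your object, a starting point for the sliders can be estimated instead of found by trial and error. The sketch below uses only the Python standard library and assumes OpenCV's 8-bit HSV scale (hue 0 to 179, saturation and value 0 to 255). The helper names and the default tolerances are my own choices for illustration, not values taken from the program above.

```python
import colorsys

def to_opencv_hsv(r, g, b):
    """Convert 0-255 RGB to OpenCV's 8-bit HSV scale (H 0-179, S and V 0-255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255)))

def hsv_range(center, hue_tol=10, min_sat=100, min_val=100):
    """Build LOW and HIGH slider values around a center hue (tolerances are guesses)."""
    hue = center[0]
    low = (max(hue - hue_tol, 0), min_sat, min_val)
    high = (min(hue + hue_tol, 179), 255, 255)
    return low, high

def in_range(pixel, low, high):
    """True when every HSV channel of the pixel lies within [low, high]."""
    return all(lo <= p <= hi for p, lo, hi in zip(pixel, low, high))

green = to_opencv_hsv(0, 255, 0)
low, high = hsv_range(green)
print(green, low, high)
```

Note that red sits at both ends of the hue scale (near 0 and near 179), so a single clamped range like this one covers only part of it; two ranges would be needed for red objects.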

(I struggled a lot for getting the correct set of HSV values, hope this blog helps someone)

Edit Jan 2017 – The code is available on GitHub
