Selenium WebDriver – Handling Upload Dialogs using AutoIt

Handling an upload dialog while automating test cases with Selenium WebDriver can be a headache. The first problem is that WebDriver only works on the current web page. Native dialogs such as file upload and "Save As" are drawn by the operating system rather than by the page.

So as soon as execution crosses the web page boundary, things can get out of control. WebDriver does support popup alerts, but that only covers standard OK/Cancel responses. When a dialog needs keyboard input, a user has two options:

  1. Use mouse to browse – hard to replicate using code as mouse movements are complex and need lots of assumptions related to screen coordinates
  2. Set focus to the file name box and send keys – Easier!!

I tried emulating the mouse actions using PyAutoGUI, a Python library, and am still working on it. Until then, in the interest of time, I have implemented the second approach using another tool, AutoIt.

AutoIt lets me interact with Windows components such as dialogs. I can control focus on a dialog and click any element in it.

The simplest program in AutoIt is:

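Local $hwnd = WinWait($CmdLine[1], "", 10)          ; wait up to 10 s for the dialog titled by argument 1
WinActivate($hwnd)                                  ; bring the dialog to the front
Send($CmdLine[2])                                   ; type argument 2 into the focused file name box
ControlClick($hwnd, $CmdLine[3], Int($CmdLine[4]))  ; click the button identified by arguments 3 and 4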

I saved this file as “HandleFileUpload.au3”, which is the AutoIt script format. It can be executed by double-clicking, but since it reads command line arguments, it only makes sense to run it from the command line. AutoIt scripts can also be compiled into Windows executables (.exe) easily. Refer to the AutoIt tutorials for further details.

The program above takes 4 arguments:

HandleFileUpload.exe <Window_Title_for_searching_window> <Keys_to_send_to_dialog> <Button_Text> <Button_Id>

The first argument identifies the window/dialog we need to bring focus to (for Firefox this text is “File Upload”, for IE it's “Choose File to Upload”). The second argument is the string to be sent to the dialog. The third argument is the text on the button, such as “Open” or “Cancel”. The fourth argument is a numeric value, the ID assigned to a button on the active window. It can be determined using a built-in AutoIt tool named “AutoIt Window Info”.

Once we have the exe, we need to call it from the test code. The example here is in C#, but it could be Java or any other language:

        /// <summary>
        /// Pass keys input to the modal file upload dialog.
        /// There are two modes of passing values used in the function: direct SendKeys for Chrome, and the AutoIt exe for Firefox and IE.
        /// The AutoIt method is more reliable as it lets AutoIt handle the dialog and keep focus while sending keys. Chrome does not support this method.
        /// The AutoIt exe is in the "ExternalTools" folder; it takes its command line as:
        ///     HandleFileUpload.exe "Upload dialog title" "Keys to send" "Button text" "Button ID"
        ///     "Upload dialog title" - The title of the upload window; for IE this is "Choose File to Upload" and for Firefox it is "File Upload"
        ///     "Keys to send" - The desired keys to send
        ///     "Button text" - Text on the button to click, like "Open" or "Cancel"
        ///     "Button ID" - Button ID, can be found using the AutoIt utility; generally, for both IE and Firefox, the ID for the Open button is "1" and for the Cancel button is "2"; we have used 1
        ///     Very unlikely that these values change with future versions of browsers
        /// </summary>
        /// <param name="driver">WebDriver instance</param>
        /// <param name="strKeys">String to pass</param>
        public static void sendKeysToUploadDialog(IWebDriver driver, String strKeys)
        {
            String browserName = getBrowserName(driver);
            if (browserName.Equals("chrome"))
            {
                // "~" is the SendKeys code for the Enter key
                System.Windows.Forms.SendKeys.SendWait(strKeys + "~");
                return;
            }
            try
            {
                ProcessStartInfo startinfo = new ProcessStartInfo();
                startinfo.FileName = "Path_To_Folder\\HandleFileUpload.exe";
                // Each argument is quoted separately: title, keys, button text, button ID
                if (browserName.Equals("firefox"))
                    startinfo.Arguments = "\"File Upload\" \"" + strKeys + "\" \"\" \"1\"";
                else if (browserName.Equals("internet explorer"))
                    startinfo.Arguments = "\"Choose File to Upload\" \"" + strKeys + "\" \"\" \"1\"";
                using (Process exeProcess = Process.Start(startinfo))
                {
                    exeProcess.WaitForExit();
                }
            }
            catch (Exception e)
            {
                //log messages or handle otherwise
            }
        }

        /* Function to get the browser name as a string from the webdriver instance */
        public static String getBrowserName(IWebDriver driver)
        {
            ICapabilities cap = ((RemoteWebDriver)driver).Capabilities;
            String strBrowserName = cap.BrowserName.ToLower();
            return strBrowserName;
        }

Note: Chrome has to be handled using the default Windows support in C# (SendKeys), as its upload dialog does not get handled by the AutoIt code.
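Since the calling code could just as well be Java, here is a minimal sketch of how the same command line might be assembled there. The class `UploadDialogCommand`, its method names and the sample file path are hypothetical; only the dialog titles, the empty button text and the button ID "1" come from the article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: builds the command line for the compiled AutoIt script.
class UploadDialogCommand {

    // Dialog titles taken from the article; other browsers are not covered here.
    static String dialogTitleFor(String browserName) {
        switch (browserName.toLowerCase(Locale.ROOT)) {
            case "firefox":
                return "File Upload";
            case "internet explorer":
                return "Choose File to Upload";
            default:
                throw new IllegalArgumentException("No AutoIt dialog title known for: " + browserName);
        }
    }

    // Each argument is a separate list element, so no manual quoting is needed.
    static List<String> build(String exePath, String browserName, String filePath) {
        return Arrays.asList(exePath, dialogTitleFor(browserName), filePath, "", "1");
    }

    public static void main(String[] args) {
        List<String> command = build("Path_To_Folder\\HandleFileUpload.exe", "firefox", "C:\\temp\\report.pdf");
        System.out.println(command);
        // To actually run it on Windows (assumes the exe exists at that path):
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list to `ProcessBuilder` avoids the hand-built quoting that the string-based `Arguments` property needs.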

Thanks for reading.

Monitoring WebDriver Actions – Using WebDriverEventListener and EventFiringWebDriver

WebDriver performs a sequence of complex actions in the background, even for tasks that seem as trivial as navigating to a web page and entering some test data. Selenium provides a useful mechanism that gives a peek into the busy life of WebDriver. When things become critical, debugging at the level of each event becomes crucial. These events include navigating to a URL, a web element's value changing, a script being executed through WebDriver and, most useful of all, an event raised just when an exception occurs. TestNG provides its own interfaces for root-level event tracking, such as ITestListener and ISuiteListener, but Selenium has its own way of doing this.

To throw an event, WebDriver provides a class named EventFiringWebDriver, and to catch that event, it provides an interface named WebDriverEventListener. Together, these two can be used to trigger an event, catch it and perform the desired action.

More than one listener may wait for a single event, each handling it in its own way. This is done by registering multiple listeners with one EventFiringWebDriver.
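To picture several listeners reacting to one event, here is a stand-alone Java sketch of the pattern. It does not use Selenium: `NavigationListener`, `FiringNavigator` and `MultipleListenersDemo` are hypothetical stand-ins, kept small so the notification order is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface, far smaller than WebDriverEventListener.
interface NavigationListener {
    void beforeNavigateTo(String url);
    void afterNavigateTo(String url);
}

// Hypothetical stand-in for an event-firing wrapper: it notifies every registered listener.
class FiringNavigator {
    private final List<NavigationListener> listeners = new ArrayList<>();
    private String currentUrl = "";

    FiringNavigator register(NavigationListener listener) {
        listeners.add(listener);
        return this;
    }

    void navigateTo(String url) {
        for (NavigationListener l : listeners) l.beforeNavigateTo(url);
        currentUrl = url; // a real driver would load the page here
        for (NavigationListener l : listeners) l.afterNavigateTo(url);
    }

    String getCurrentUrl() { return currentUrl; }
}

class MultipleListenersDemo {
    // Runs one navigation with two listeners and returns what they recorded, in order.
    static List<String> run(String url) {
        List<String> log = new ArrayList<>();
        NavigationListener consoleStyle = new NavigationListener() {
            public void beforeNavigateTo(String u) { log.add("console: before " + u); }
            public void afterNavigateTo(String u) { log.add("console: after " + u); }
        };
        NavigationListener auditStyle = new NavigationListener() {
            public void beforeNavigateTo(String u) { log.add("audit: before " + u); }
            public void afterNavigateTo(String u) { log.add("audit: after " + u); }
        };
        new FiringNavigator().register(consoleStyle).register(auditStyle).navigateTo(url);
        return log;
    }

    public static void main(String[] args) {
        run("http://example.test/").forEach(System.out::println);
    }
}
```

In this sketch every listener receives the "before" call, then the action runs, then every listener receives the "after" call, in registration order.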

The whole flow of things looks like:

  1. Create an EventListener class
  2. Create a WebDriver instance
  3. Create an instance of EventFiringWebDriver by passing driver from step 2
  4. Create an instance of EventListener
  5. Register this EventListener to the EventFiringWebDriver Instance
  6. Done !! Handle the events sent by WebDriver now

An event listener can be created by either:

  1. Implementing WebDriverEventListener interface
  2. Extending AbstractWebDriverEventListener class
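The difference between the two options can be sketched with a cut-down, stand-alone Java model. The types `DriverEvents`, `DriverEventsAdapter` and `ExceptionOnlyListener` are hypothetical and only mimic the roles of the interface and the abstract class; the point is that extending an adapter with empty method bodies lets a listener override just the events it cares about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down listener contract (the real interface has many more methods).
interface DriverEvents {
    void beforeClickOn(String element);
    void afterClickOn(String element);
    void onException(Throwable error);
}

// Adapter with empty bodies, playing the role that AbstractWebDriverEventListener plays.
abstract class DriverEventsAdapter implements DriverEvents {
    public void beforeClickOn(String element) { }
    public void afterClickOn(String element) { }
    public void onException(Throwable error) { }
}

// Only the event of interest is overridden.
class ExceptionOnlyListener extends DriverEventsAdapter {
    final List<String> messages = new ArrayList<>();

    @Override
    public void onException(Throwable error) {
        messages.add("On exception : " + error.getMessage());
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        ExceptionOnlyListener listener = new ExceptionOnlyListener();
        listener.beforeClickOn("searchButton"); // inherited no-op
        listener.onException(new IllegalStateException("stale element"));
        System.out.println(listener.messages);
    }
}
```

Implementing the interface directly, as in the example that follows, trades this brevity for an explicit body for every event.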

I am using the interface in the example here. Implementing the interface forces me to define all of its methods. So the code for a class implementing WebDriverEventListener looks like:

package org.selenium;
import org.openqa.selenium.*;
import org.openqa.selenium.support.events.WebDriverEventListener;

public class TheEventListener implements WebDriverEventListener{

	public void afterChangeValueOf(WebElement arg0, WebDriver arg1) {
		System.out.println("After change of value :" + arg0.toString());
	}

	public void afterClickOn(WebElement arg0, WebDriver arg1) {
		System.out.println("After click on webelement: " + arg0.getText());
	}

	public void afterFindBy(By arg0, WebElement arg1, WebDriver arg2) {
		System.out.println("After find by: " + arg0.toString());
	}

	public void afterNavigateBack(WebDriver arg0) {
		System.out.println("After navigating back to : " + arg0.getCurrentUrl());
	}

	public void afterNavigateForward(WebDriver arg0) {
		System.out.println("After navigating forward to : "+ arg0.getCurrentUrl());
	}

	public void afterNavigateTo(String arg0, WebDriver arg1) {
		System.out.println("After navigating to : "+arg0);
	}

	public void afterScript(String arg0, WebDriver arg1) {
		System.out.println("After execution of script : "+ arg0);
	}

	public void beforeChangeValueOf(WebElement arg0, WebDriver arg1) {
		System.out.println("Before value change of : " + arg0.toString());
	}

	public void beforeClickOn(WebElement arg0, WebDriver arg1) {
		System.out.println("Before clicking on WebElement : " + arg0.getText());
	}

	public void beforeFindBy(By arg0, WebElement arg1, WebDriver arg2) {
		System.out.println("Before find by : " + arg0.toString());
	}

	public void beforeNavigateBack(WebDriver arg0) {
		System.out.println("Before navigating back from : " + arg0.getCurrentUrl());
	}

	public void beforeNavigateForward(WebDriver arg0) {
		System.out.println("Before navigating forward from : "+ arg0.getCurrentUrl());
	}

	public void beforeNavigateTo(String arg0, WebDriver arg1) {
		System.out.println("Before navigating to : "+ arg0);
	}

	public void beforeScript(String arg0, WebDriver arg1) {
		System.out.println("Before executing the script : " + arg0);
	}

	public void onException(Throwable arg0, WebDriver arg1) {
		System.out.println("On exception : " + arg0.getMessage());
	}
}

Now comes the event-firing WebDriver. It’s the attention-seeking big brother of WebDriver who loves to brag about what he’s going to do and what he has done. Listeners that are tuned in to (registered with) this driver get to know everything and can act as they want.

This is the code for an EventFiringWebdriver implementation:


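package org.selenium;
import org.openqa.selenium.*;
import org.openqa.selenium.firefox.*;
import org.openqa.selenium.support.events.EventFiringWebDriver;
import org.openqa.selenium.support.ui.Select;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class TheDriver {
	WebDriver driver;

	@BeforeClass
	public void setupBrowser()
	{
		driver = new FirefoxDriver();
	}

	@Test
	public void BrowserTest()
	{
		EventFiringWebDriver eventFiringDriver = new EventFiringWebDriver(driver); //Get EventFiringWebDriver instance
		TheEventListener eventListener = new TheEventListener(); //Get Listener instance
		eventFiringDriver.register(eventListener); // Register listener to driver
		// The navigation call (eventFiringDriver.get(...)) is missing from this listing
		eventFiringDriver.findElement(By.id("searchInput")).sendKeys("Lorem Ipsum");
		Select selLanguage = new Select(eventFiringDriver.findElement(By.id("searchLanguage")));
		// The option selection and submit click seen in the output are also missing from this listing
	}

	@AfterClass
	public void exit() throws InterruptedException
	{
		driver.quit(); // assumed cleanup; the original body is not shown
	}
}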


On execution, every event is listed on the console, just as we wanted. The run ends with an exception, which is also expected.

This is what the output looks like:

Before navigating to :
After navigating to :
Before find by : searchInput
After find by: searchInput
Before value change of : [[FirefoxDriver: firefox on WINDOWS (8dc5e6de-b235-452e-8884-dfb1d6ea4d0b)] -> id: searchInput]
After change of value :[[FirefoxDriver: firefox on WINDOWS (8dc5e6de-b235-452e-8884-dfb1d6ea4d0b)] -> id: searchInput]
Before find by : searchLanguage
After find by: searchLanguage
Before find by : By.xpath: .//option[@value = "en"]
After find by: By.xpath: .//option[@value = "en"]
Before find by : By.xpath: .//body[@id='www-wikipedia-org']/div[2]//input[@type='submit']
After find by: By.xpath: .//body[@id='www-wikipedia-org']/div[2]//input[@type='submit']
Before clicking on WebElement :
On exception : Element not found in the cache - perhaps the page has changed since it was looked up

Thanks for reading !!!

SpecFlow and Selenium WebDriver – An alternative approach to hybrid test automation framework

Test automation frameworks, by definition, sit between the user and the application under test. They provide ground rules, reusable components, exception handling, fallback mechanisms and reporting to their users. Using an existing framework is supposed to be easy: you add test cases to the existing set. Designing and developing one is more tedious and requires long-running discussions within the team. There are a couple of problems with group decision making, and they apply to any decision, not just framework design:

  • The larger the team, the more discussion and the less productive the meetings: the “law of diminishing returns” at play
  • Individual ideas, however brilliant each one is, get outweighed by group wisdom. People feel safe in a group and shy away from presenting a good idea: the Abilene paradox

Fortunately, if you get a chance like I did and find yourself the only person responsible for the design and development of such a framework, it's time to learn. You have your skin in the game, but the learning is worth the risk. You have less to think about the group and more to focus on the problem at hand.

Me and my project

I have experience in automation using C on a proprietary framework and on Quick Test Professional. I started learning Selenium WebDriver on my own around two years ago, but the main learning began when I started working on this framework. I find it interesting that Selenium is being embraced as an industry standard. There is lots of help available on the Internet, and the tool itself is powerful enough to let the user do almost anything in the never-ending world of “-JS” libraries. My challenge was to develop the framework for a new SPA built on AngularJS with only one constraint: it had to use the Microsoft tech stack. Which means no Java 😦

But it’s fine. So the design began with the general framework expectations –

  • Data/Keyword Driven/Hybrid
  • Reusable
  • Robust
  • Well Documented
  • Intuitive
  • Easy to maintain
  • Loosely coupled components

We know an automation framework is not a good test automation framework without these characteristics. So, where to focus? The answer lies in another question that I asked myself: how am I going to make it easy for my future self to add and run a new test case?

So focusing on the “Intuitive” part at the start was a good choice.

These are the tools used for the development of this framework:

Considerations while designing

Consideration 1 – Data/Keyword/Hybrid?

The decision to build a data-driven, keyword-driven or hybrid framework has to be made at inception. If I need to test a single workflow with varying data, I need a data-driven framework, where the data drives the test cases. The flow of execution is like:

  1. Test control with value = value1
  2. Test control with value = value2
  3. Test control with value = value3
  4. ………

A keyword-driven framework, on the other hand, is driven by workflows. A workflow is, by definition, a sequence of steps to achieve a result. In a keyword-driven framework, you define keywords that each initiate a sequence of steps completing a workflow. For example, I define “login()” as the sequence of entering a user name, entering a password and clicking the login button. Then I just need to call this keyword to perform a login. The execution flow here is:

  1. Test login function
  2. Test functionality1
  3. Test functionality2 with data1
  4. …..

Hybrid frameworks use the best of both worlds. You have data to test with, and workflows too.

  1. Test login function
  2. Test functionality1 with data1
  3. Test functionality2
    1. with data1
    2. with data2
    3. with data3
  4. …..
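The hybrid flow above can be sketched in a few lines of Java. This is only an illustration under simple assumptions: `HybridRunner` and its two keywords are hypothetical, and each keyword merely records what it would do instead of driving a browser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical hybrid runner: keywords name the workflow, data rows vary the input.
class HybridRunner {
    private final Map<String, BiConsumer<String, List<String>>> keywords = new LinkedHashMap<>();
    private final List<String> log = new ArrayList<>();

    HybridRunner() {
        // Each keyword stands for a sequence of UI steps; here it only records what it would do.
        keywords.put("login", (data, out) -> out.add("login as " + data));
        keywords.put("search", (data, out) -> out.add("search for " + data));
    }

    // Keyword-driven: pick the workflow by name. Data-driven: repeat it for every row.
    void run(String keyword, List<String> dataRows) {
        BiConsumer<String, List<String>> workflow = keywords.get(keyword);
        if (workflow == null) {
            throw new IllegalArgumentException("Unknown keyword: " + keyword);
        }
        for (String row : dataRows) {
            workflow.accept(row, log);
        }
    }

    List<String> getLog() { return log; }

    public static void main(String[] args) {
        HybridRunner runner = new HybridRunner();
        runner.run("login", Arrays.asList("user1"));
        runner.run("search", Arrays.asList("data1", "data2", "data3"));
        runner.getLog().forEach(System.out::println);
    }
}
```

A purely data-driven run is the special case of one keyword with many rows, and a purely keyword-driven run is many keywords with one row each.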


Consideration 2 – Re-usability

The framework has to be layered, with the different layers interacting with each other. It is going to be for a specific product, but at least one of the layers can be made plug-and-play for others to use. This layer can carry the basic coding standards, tools etc. supported by the organization's standards and guidelines. Here are some suggestions:

  • Decide on the naming conventions to be used like camel casing for function names etc
  • Use self explanatory names for variables, functions etc
  • Consider use of enumerations wherever possible
  • Create different projects within the main solution to get libraries of the individual projects which can then be referenced elsewhere and built independently when changed
  • Separate layers on logical basis. For instance, define data access layer, basic selenium functions with exception handling and logging in a different layer, test cases in a different layer. This allows changes to be made easily in one layer when required
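As one illustration of the enumeration suggestion, the browser names that tests compare as raw strings could live in an enum. The sketch below is in Java and the names `Browser` and `fromCapabilityName` are hypothetical; the three capability strings are the ones commonly reported for these browsers.

```java
import java.util.Locale;

// Hypothetical enum replacing the raw browser-name strings compared in test code.
enum Browser {
    CHROME("chrome"),
    FIREFOX("firefox"),
    INTERNET_EXPLORER("internet explorer");

    private final String capabilityName;

    Browser(String capabilityName) {
        this.capabilityName = capabilityName;
    }

    String capabilityName() { return capabilityName; }

    // Maps the name reported by the driver's capabilities to a constant.
    static Browser fromCapabilityName(String name) {
        String normalized = name.trim().toLowerCase(Locale.ROOT);
        for (Browser b : values()) {
            if (b.capabilityName.equals(normalized)) {
                return b;
            }
        }
        throw new IllegalArgumentException("Unsupported browser: " + name);
    }
}

class BrowserEnumDemo {
    public static void main(String[] args) {
        Browser browser = Browser.fromCapabilityName("Internet Explorer");
        // A switch over an enum lets the compiler and IDE flag misspelled constants.
        switch (browser) {
            case CHROME:
                System.out.println("send keys directly");
                break;
            default:
                System.out.println("delegate to an external helper");
                break;
        }
    }
}
```

A typo such as "fierfox" then fails at one well-defined place instead of silently skipping a branch.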

Consideration 3 – Robustness

The framework has to make sure test cases do not change behavior when external factors change, and do not raise false alarms. Such external factors include web pages being too slow, elements loading in random order, and exceptions that are not caught or not documented well enough to show what caused the failure.

For page-loading issues, explicit waits (waiting for a certain condition to be true before performing an action on an element) were the best solution for my problem. An implicit wait (the driver keeps polling the DOM for up to a set timeout whenever an element is not found immediately) causes delays, because it applies to every element lookup on the page. Explicit waits can be targeted at just the elements we know take more time.

This is an implicit wait called just after getting the driver instance:


Here is an explicit wait:

IWebDriver driver;   // assumed to be initialised elsewhere
IWebElement element;
By elementLocator = By.XPath("Xpath value");

WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(DEFAULT_TIMEOUT_TIME_SEC));
element = wait.Until(ExpectedConditions.ElementIsVisible(elementLocator));
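What an explicit wait does under the hood can be approximated with a small polling loop. The Java sketch below is a stand-alone model and not WebDriverWait itself: `PollingWait.until` is a hypothetical helper that retries a condition until it yields a non-null value or the timeout passes.

```java
import java.util.function.Supplier;

// Hypothetical stand-alone helper that mimics the idea of an explicit wait.
class PollingWait {

    // Polls the condition until it returns a non-null value or the timeout elapses.
    static <T> T until(Supplier<T> condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T result = condition.get();
            if (result != null) {
                return result;
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("Condition not met within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Simulated element that only "appears" after roughly 300 ms.
        String element = until(
                () -> System.currentTimeMillis() - start >= 300 ? "footer" : null,
                2000, 50);
        System.out.println("Found: " + element);
    }
}
```

Because the wait is attached to one condition, only the slow element pays the waiting cost; everything else is looked up immediately.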

If need be, define constant wait times in config files such as app.config in C#, or an equivalent file in other languages.

Custom exception messages go a long way in reducing debugging time. For instance, catching “NoSuchElementException” every time and logging only its own message leads nowhere, because it says nothing about which step failed.

try {
    /* some code here */
    throw new Exception("Some error occurred at this block");
} catch (Exception e) {
    // log e.Message together with the step and locator that failed
}
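One way to make such messages carry context is a small custom exception type. The Java sketch below is an assumption about how that might look, not the framework's own code: `StepFailedException` and `FooterReader` are hypothetical, and the failing lookup is simulated.

```java
// Hypothetical exception type that carries the context a bare driver error lacks.
class StepFailedException extends RuntimeException {
    StepFailedException(String step, String locator, Throwable cause) {
        super("Step '" + step + "' failed for locator [" + locator + "]: " + cause.getMessage(), cause);
    }
}

class FooterReader {
    // Stand-in for a driver lookup; it always fails so the wrapping can be seen.
    static String findText(String locator) {
        throw new IllegalStateException("no such element");
    }

    static String readFooter(String locator) {
        try {
            return findText(locator);
        } catch (IllegalStateException e) {
            // Re-throw with the step name and locator so the log explains itself.
            throw new StepFailedException("read footer text", locator, e);
        }
    }

    public static void main(String[] args) {
        try {
            readFooter("//footer/span");
        } catch (StepFailedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The original exception is kept as the cause, so the full stack trace is still available when needed.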

Another step towards robustness was implementing my object repository as XML instead of an Excel sheet. Based on my experience with frameworks, data stored in object repositories can sometimes be harmful (I once injected code from the repository because its content was not checked). Maintaining the integrity of the object repository was additional overhead, and I have seen implementations with extra checks on the data entered in Excel sheets. XML, however, lets me enforce a schema (XSD). If the user edits the repository in Visual Studio, the IDE itself takes care of the enforcement. If the user edits the XML elsewhere, it can be validated against the schema before it is loaded into memory. The data from the XML is then loaded into a Dictionary<String,String> object in the program.

XML schema validation can be implemented like:

public static void isValidXML(String sXML, String sXSD)
        {
            XmlReaderSettings settings = new XmlReaderSettings();
            settings.Schemas.Add(null, sXSD);
            settings.ValidationType = ValidationType.Schema;
            settings.ValidationEventHandler += new System.Xml.Schema.ValidationEventHandler(ValidationCallBack);

            using (XmlReader reader = XmlReader.Create(sXML, settings))
            {
                // Reading to the end triggers validation of the whole document.
                while (reader.Read()) ;
            }
        }

        // Display any warnings or errors. 
        private static void ValidationCallBack(object sender, ValidationEventArgs args)
        {
            if (args.Severity == XmlSeverityType.Warning)
                Console.WriteLine("\tWarning: Matching schema not found.  No validation occurred." + args.Message);
            else
                Console.WriteLine("\tValidation error: " + args.Message);
        }


A Dictionary enforces unique keys, whose values can be retrieved quickly when required.
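For readers on the Java side, the same validate-then-load idea can be sketched with the standard `javax.xml.validation` API. The schema, the element and attribute names (`repository`, `object`, `name`, `xpath`) and the class `ObjectRepositoryLoader` are assumptions made for this example; the article does not show the real repository format.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ObjectRepositoryLoader {

    // Hypothetical schema: <repository> holds <object name="..." xpath="..."/> entries.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='repository'><xs:complexType><xs:sequence>"
      + "<xs:element name='object' maxOccurs='unbounded'><xs:complexType>"
      + "<xs:attribute name='name' type='xs:string' use='required'/>"
      + "<xs:attribute name='xpath' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Validates the XML against the schema, then loads name -> xpath pairs.
    static Map<String, String> load(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        // validate() throws a SAXException on the first schema violation.
        schema.newValidator().validate(new StreamSource(new StringReader(xml)));

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList objects = doc.getElementsByTagName("object");
        Map<String, String> repository = new LinkedHashMap<>();
        for (int i = 0; i < objects.getLength(); i++) {
            Element e = (Element) objects.item(i);
            repository.put(e.getAttribute("name"), e.getAttribute("xpath"));
        }
        return repository;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<repository><object name='lblFooter' xpath='//footer/span'/></repository>";
        System.out.println(load(xml));
    }
}
```

Nothing reaches the map unless the whole document has passed validation, which is the property the XML-based repository was chosen for.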

Tip: I defined these functions in a data access layer, which leaves scope for implementing other methods such as reading from a database, an Excel sheet or anything else.

Consideration 4 – Well Documented

Comments and a <summary> for each member function in a class help to give a quick idea of what the function does. Anyone who starts working on the framework quickly comes up to speed by reading these.

To insert the <summary> tags, just type “///” on the line above the function. Refer to the MSDN documentation for details.

Here is an example of the summary:

       private static float fPercentageDiff;

        /// <summary>
        /// CompareImages function compares two images for similarities.
        /// </summary>
        /// <param name="imgPath">Image path</param>
        /// <param name="imgPathBaseline"> Baseline Image path</param>
        /// <param name="Tolerance">Value between 0-100 as a percentage. 0 means no tolerance, images should match exactly, while 100 means images can totally differ</param>
        /// <returns>Boolean value - If the provided images match within the specified Tolerance limit</returns>
        public static Boolean CompareImages(String imgPath, String imgPathBaseline, int Tolerance)
        {
           LoggingHelper.LogMessage(LogLevel.INFO, "Entered Method CompareImages ", typeof(Verification).ToString());
           // ... (comparison logic omitted)
        }


Consideration 5 – Loosely Coupled

This point is not in the order I listed above, but it can be discussed here. We have interesting stuff coming up later.

Loosely coupled code is the key to increasing the granularity and reusability of code. Whenever I heard water-cooler discussions on the subject, one name appeared repeatedly: Dependency Injection. It’s basically a design pattern used to decouple two classes.

Conceptual view of the Dependency Injection pattern (source: MSDN documentation)

The idea here is not to use the “new” keyword in the class where the other class is used. Instead, let another class create the instance and pass it to your code, where its member functions will be used. This makes the two classes independent.
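A minimal constructor-injection sketch in Java makes the idea concrete. All names here (`StepLogger`, `ConsoleStepLogger`, `RecordingStepLogger`, `LoginWorkflow`) are hypothetical and only illustrate the pattern; no particular DI container is assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator contract.
interface StepLogger {
    void log(String message);
}

// Production-style implementation.
class ConsoleStepLogger implements StepLogger {
    public void log(String message) { System.out.println(message); }
}

// Test double that records messages instead of printing them.
class RecordingStepLogger implements StepLogger {
    final List<String> messages = new ArrayList<>();
    public void log(String message) { messages.add(message); }
}

// LoginWorkflow never calls "new" on a logger; the instance is handed in from outside.
class LoginWorkflow {
    private final StepLogger logger;

    LoginWorkflow(StepLogger logger) {
        this.logger = logger;
    }

    boolean login(String user, String password) {
        logger.log("Logging in as " + user);
        return !user.isEmpty() && !password.isEmpty();
    }
}

class InjectionDemo {
    public static void main(String[] args) {
        // The composition root decides which implementation to use.
        LoginWorkflow workflow = new LoginWorkflow(new ConsoleStepLogger());
        System.out.println(workflow.login("alice", "secret"));
    }
}
```

Swapping `ConsoleStepLogger` for `RecordingStepLogger` requires no change to `LoginWorkflow`, which is the decoupling the pattern is after.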

Consideration 6 – Intuitive

Now comes the part where things differ somewhat from conventional frameworks. My idea of adding and maintaining test cases always involved a simple “natural language to machine language” mapping. The closest analogy I found is Behavior Driven Development (BDD). It’s an approach mainly used in development, which focuses on bridging the gap between development and business teams through common tools and a shared domain language. Business people write their specifications in an agreed format, and developers read them and develop accordingly. “Gherkin” is one popular format:

Story: Returns go to stock

In order to keep track of stock
As a store owner
I want to add items back to stock when they’re returned

Scenario 1: Refunded items should be returned to stock
Given a customer previously bought a black sweater from me
And I currently have three black sweaters left in stock
When he returns the sweater for a refund
Then I should have four black sweaters in stock


Here, “Given”, “And”, “When” and “Then” are keywords. This language is legible to both business people and developers. For developers working on unit tests, “Given” specifies the precondition, “When” specifies the action and “Then” specifies the desired result. “And” can be used to add more conditions under the same category.
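Mapped onto code, the sweater scenario reads as arrange, act and verify. The Java sketch below uses a hypothetical `Stock` class invented for this example; it only shows how each Gherkin keyword lines up with one part of a test.

```java
// Hypothetical domain class for the sweater scenario.
class Stock {
    private int blackSweaters;

    Stock(int blackSweaters) { this.blackSweaters = blackSweaters; }

    void acceptReturn() { blackSweaters++; }

    int blackSweaters() { return blackSweaters; }
}

class ReturnsGoToStockScenario {
    static int run() {
        // Given: I currently have three black sweaters left in stock
        Stock stock = new Stock(3);
        // When: he returns the sweater for a refund
        stock.acceptReturn();
        // Then: I should have four black sweaters in stock
        return stock.blackSweaters();
    }

    public static void main(String[] args) {
        System.out.println("Black sweaters in stock: " + run());
    }
}
```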

Isn’t this the same flow we follow while testing? Set everything up, perform the action(s) and verify the result(s)? I decided to use SpecFlow, a Given/When/Then tool for C# mainly used for unit testing.

The same tool can be used, with a richer language, for automation testing based on UI interactions on top of tools like Selenium WebDriver. After all, interacting with the UI is the closest one can get to a human interacting with a system. The only thing to keep in mind is that everything then has to stay at the UI level: all actions, their validations and so on.

The architecture of the framework comes out to be like this:

[Figure: Framework Architecture]

As shown in the diagram, there are three layers in the architecture. Selenium, the basic logging functions and data access layer are part of the common library functions. On top of that is the workflows layer, which has keywords defined as functions which further use common library functions to perform actions. So workflows are basically combinations of ground level actions.

The top most layer contains the SpecFlow feature file and steps definitions. Feature file looks like

[Figure: SpecFlow feature file]

I can pass my baseline strings in this example directly in the feature file and can update them when required. Similarly, if I need to pass some Image file path for comparison, that too can be done.

This feature file is accompanied by a step file i.e. the mapping part. The code looks like:

[Then(@"I can see footer text ""(.*)""")]
        public void ThenICanSeeFooterText(string p0)
        {
            // third argument: ignore case when comparing
            Assert.AreEqual(p0, DashboardWorkFlows.getFooterText().ToString(), true);
        }

Further, this call here goes to our workflows layer:

        //  Get footer text by using XPath for footer from object repository
        public static String getFooterText()
        {
            return Utilities.getString(_driver, By.XPath(dictObjectRepository["lblFooter"]));
        }

This in turn calls the library functions built on Selenium to get the required thing done. Here, the XPath passed in comes from the object repository, which was read into a Dictionary<String,String> object.

The library function is:

        // Overloaded function getValue to get string value from the given element
        public static String getString(IWebDriver driver, By locator)
        {
            LoggingHelper.LogMessage(LogLevel.INFO, "Entering method getString ", typeof(Utilities).ToString());
            try
            {
                return (driver.FindElement(locator).Text);
            }
            catch (NoSuchElementException e)
            {
                LoggingHelper.LogMessage(LogLevel.ERROR, "getString - No Element found to get text. Error: " + e.Message, typeof(Utilities).ToString());
                return null;
            }
        }
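The chain from step text to workflow to library call can be modelled in miniature. The Java sketch below is not SpecFlow's API: `MiniSteps`, the fake page map, the XPath value and the footer text are all hypothetical, and a plain regular expression stands in for the step binding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical miniature of the three layers; none of this is SpecFlow's own API.
class MiniSteps {
    // Library layer stand-in: a fake page keyed by XPath instead of a live browser.
    static final Map<String, String> FAKE_PAGE = new LinkedHashMap<>();
    // Object repository: logical name -> XPath.
    static final Map<String, String> OBJECT_REPOSITORY = new LinkedHashMap<>();
    static {
        OBJECT_REPOSITORY.put("lblFooter", "//footer/span");
        FAKE_PAGE.put("//footer/span", "Copyright Example Corp");
    }

    // Workflow layer: resolves the locator and asks the library layer for the text.
    static String getFooterText() {
        return FAKE_PAGE.get(OBJECT_REPOSITORY.get("lblFooter"));
    }

    // Step layer: a regex with one capture group, in the spirit of a [Then] binding.
    static final Pattern FOOTER_STEP = Pattern.compile("I can see footer text \"(.*)\"");

    // Returns true when the step text matches the binding and the expected text is found.
    static boolean runThen(String stepText) {
        Matcher m = FOOTER_STEP.matcher(stepText);
        if (!m.matches()) {
            throw new IllegalArgumentException("No binding for step: " + stepText);
        }
        // Case-insensitive comparison, mirroring an ignore-case assertion.
        return m.group(1).equalsIgnoreCase(getFooterText());
    }

    public static void main(String[] args) {
        System.out.println(runThen("I can see footer text \"Copyright Example Corp\""));
    }
}
```

The captured group plays the role of the `p0` parameter: the expected value lives in the feature text, while the locator lives in the repository.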

On Hybrid nature:

The framework, as I said earlier, is a hybrid one. Keywords are defined in the middle layer of the architecture, and data can be given directly in the SpecFlow feature file as:

[Figure: SpecFlow feature file with data table]

Final Thought:
I could not share most of the code/design part, but would love to help if someone needs help in understanding the implementation. Also, suggestions/comments are most welcome.

Thanks for reading.
